Training: 2022-03-25 22:36:07,736-rank_id: 0
Training: 2022-03-25 22:36:58,412-Speed 24752.70 samples/sec   Loss 42.4928   LearningRate 0.0000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-25 22:37:08,404-Speed 24601.27 samples/sec   Loss 42.4611   LearningRate 0.0000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-25 22:37:18,224-Speed 25033.10 samples/sec   Loss 42.4434   LearningRate 0.0000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-25 22:37:28,098-Speed 24893.11 samples/sec   Loss 42.4066   LearningRate 0.0000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-25 22:37:37,970-Speed 24897.43 samples/sec   Loss 42.3443   LearningRate 0.0000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-25 22:37:47,692-Speed 25282.15 samples/sec   Loss 42.2692   LearningRate 0.0000   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-25 22:37:57,593-Speed 24827.06 samples/sec   Loss 42.1428   LearningRate 0.0000   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-25 22:38:07,403-Speed 25055.80 samples/sec   Loss 41.9780   LearningRate 0.0000   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-25 22:38:17,218-Speed 25045.76 samples/sec   Loss 41.7694   LearningRate 0.0000   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-25 22:38:26,940-Speed 25283.01 samples/sec   Loss 41.5106   LearningRate 0.0000   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-25 22:38:36,646-Speed 25329.46 samples/sec   Loss 41.2360   LearningRate 0.0000   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-25 22:38:46,629-Speed 24623.08 samples/sec   Loss 40.9357   LearningRate 0.0000   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-25 22:38:56,435-Speed 25067.54 samples/sec   Loss 40.6204   LearningRate 0.0000   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:39:06,262-Speed 25011.07 samples/sec   Loss 40.3145   LearningRate 0.0000   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:39:15,977-Speed 25303.17 samples/sec   Loss 40.0316   LearningRate 0.0000   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:39:25,906-Speed 24753.12 samples/sec   Loss 39.7716   LearningRate 0.0000   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:39:35,660-Speed 25199.90 samples/sec   Loss 39.5489   LearningRate 0.0000   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:39:45,573-Speed 24795.62 samples/sec   Loss 39.3533   LearningRate 0.0000   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:39:55,358-Speed 25125.76 samples/sec   Loss 39.1919   LearningRate 0.0000   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 262144   Required: 19 hours
Training: 2022-03-25 22:40:05,157-Speed 25084.93 samples/sec   Loss 39.0729   LearningRate 0.0000   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-25 22:40:14,887-Speed 25261.59 samples/sec   Loss 38.9766   LearningRate 0.0000   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:40:24,604-Speed 25297.40 samples/sec   Loss 38.8966   LearningRate 0.0000   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:40:35,552-Speed 22449.68 samples/sec   Loss 38.8501   LearningRate 0.0000   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:40:45,377-Speed 25019.55 samples/sec   Loss 38.8173   LearningRate 0.0000   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:40:55,209-Speed 24999.09 samples/sec   Loss 38.7820   LearningRate 0.0000   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:41:04,964-Speed 25195.94 samples/sec   Loss 38.7643   LearningRate 0.0000   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:41:14,766-Speed 25078.41 samples/sec   Loss 38.7602   LearningRate 0.0000   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:41:24,563-Speed 25089.73 samples/sec   Loss 38.7441   LearningRate 0.0000   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:41:35,110-Speed 23305.06 samples/sec   Loss 38.7394   LearningRate 0.0000   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:41:44,998-Speed 24859.36 samples/sec   Loss 38.7302   LearningRate 0.0000   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:41:54,839-Speed 24976.39 samples/sec   Loss 38.7246   LearningRate 0.0000   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:42:04,500-Speed 25444.07 samples/sec   Loss 38.7257   LearningRate 0.0000   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:42:14,379-Speed 24881.89 samples/sec   Loss 38.7195   LearningRate 0.0000   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:42:24,089-Speed 25313.48 samples/sec   Loss 38.7181   LearningRate 0.0001   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:42:33,808-Speed 25290.30 samples/sec   Loss 38.7167   LearningRate 0.0001   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:42:43,576-Speed 25165.10 samples/sec   Loss 38.7178   LearningRate 0.0001   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:42:53,337-Speed 25179.68 samples/sec   Loss 38.7127   LearningRate 0.0001   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:43:03,190-Speed 24946.59 samples/sec   Loss 38.7139   LearningRate 0.0001   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:43:12,944-Speed 25211.06 samples/sec   Loss 38.7209   LearningRate 0.0001   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:43:22,649-Speed 25325.49 samples/sec   Loss 38.7303   LearningRate 0.0001   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:43:32,536-Speed 24861.02 samples/sec   Loss 38.7329   LearningRate 0.0001   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:43:42,371-Speed 24992.98 samples/sec   Loss 38.7332   LearningRate 0.0001   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:43:52,363-Speed 24606.66 samples/sec   Loss 38.7629   LearningRate 0.0001   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:44:02,356-Speed 24599.41 samples/sec   Loss 38.8732   LearningRate 0.0001   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:44:12,129-Speed 25150.83 samples/sec   Loss 38.7793   LearningRate 0.0001   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:44:21,837-Speed 25321.16 samples/sec   Loss 38.7960   LearningRate 0.0001   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:44:32,812-Speed 22395.31 samples/sec   Loss 38.8065   LearningRate 0.0001   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:44:42,564-Speed 25209.48 samples/sec   Loss 38.8675   LearningRate 0.0001   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:44:52,349-Speed 25120.53 samples/sec   Loss 38.8335   LearningRate 0.0001   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:45:02,109-Speed 25183.51 samples/sec   Loss 38.8670   LearningRate 0.0001   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:45:11,904-Speed 25093.14 samples/sec   Loss 38.8441   LearningRate 0.0001   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:45:21,631-Speed 25267.33 samples/sec   Loss 38.8459   LearningRate 0.0001   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:45:31,663-Speed 24500.13 samples/sec   Loss 38.8537   LearningRate 0.0001   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:45:41,489-Speed 25016.00 samples/sec   Loss 38.8839   LearningRate 0.0001   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:45:51,305-Speed 25038.74 samples/sec   Loss 38.8862   LearningRate 0.0001   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:01,105-Speed 25081.81 samples/sec   Loss 38.8807   LearningRate 0.0001   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:10,969-Speed 24918.48 samples/sec   Loss 38.8994   LearningRate 0.0001   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:20,722-Speed 25200.57 samples/sec   Loss 38.9110   LearningRate 0.0001   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:30,525-Speed 25073.68 samples/sec   Loss 38.9053   LearningRate 0.0001   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:40,177-Speed 25465.73 samples/sec   Loss 38.9112   LearningRate 0.0001   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:49,897-Speed 25287.79 samples/sec   Loss 38.9328   LearningRate 0.0001   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:46:59,527-Speed 25523.88 samples/sec   Loss 38.9337   LearningRate 0.0001   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:47:09,322-Speed 25094.00 samples/sec   Loss 38.9425   LearningRate 0.0001   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:47:19,173-Speed 24953.98 samples/sec   Loss 38.9526   LearningRate 0.0001   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:47:30,472-Speed 21753.67 samples/sec   Loss 38.9557   LearningRate 0.0001   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:47:40,195-Speed 25279.94 samples/sec   Loss 38.9613   LearningRate 0.0001   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:47:50,004-Speed 25057.28 samples/sec   Loss 38.9586   LearningRate 0.0001   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:47:59,841-Speed 24986.01 samples/sec   Loss 38.9529   LearningRate 0.0001   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:48:09,576-Speed 25248.77 samples/sec   Loss 38.9621   LearningRate 0.0001   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:48:19,331-Speed 25199.78 samples/sec   Loss 38.9775   LearningRate 0.0001   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:48:29,087-Speed 25194.89 samples/sec   Loss 38.9830   LearningRate 0.0001   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:48:38,797-Speed 25319.98 samples/sec   Loss 38.9819   LearningRate 0.0001   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:48:48,522-Speed 25272.41 samples/sec   Loss 38.9803   LearningRate 0.0001   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:48:58,343-Speed 25027.31 samples/sec   Loss 38.9782   LearningRate 0.0001   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:49:08,065-Speed 25284.50 samples/sec   Loss 38.9735   LearningRate 0.0001   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:49:17,930-Speed 24917.86 samples/sec   Loss 38.9842   LearningRate 0.0001   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:49:27,816-Speed 24867.03 samples/sec   Loss 38.9782   LearningRate 0.0001   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:49:37,695-Speed 24878.41 samples/sec   Loss 38.9766   LearningRate 0.0001   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:49:47,415-Speed 25288.16 samples/sec   Loss 38.9930   LearningRate 0.0001   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:49:57,106-Speed 25364.67 samples/sec   Loss 38.9862   LearningRate 0.0001   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:50:06,886-Speed 25130.70 samples/sec   Loss 38.9670   LearningRate 0.0001   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:50:16,699-Speed 25049.66 samples/sec   Loss 38.9634   LearningRate 0.0001   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:50:26,525-Speed 25013.46 samples/sec   Loss 38.9432   LearningRate 0.0001   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:50:37,507-Speed 22382.83 samples/sec   Loss 38.9405   LearningRate 0.0001   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:50:47,280-Speed 25149.70 samples/sec   Loss 38.9218   LearningRate 0.0001   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:50:57,092-Speed 25050.26 samples/sec   Loss 38.8972   LearningRate 0.0001   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:51:06,865-Speed 25155.23 samples/sec   Loss 38.8819   LearningRate 0.0001   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:51:16,679-Speed 25047.79 samples/sec   Loss 38.8412   LearningRate 0.0001   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:51:26,401-Speed 25279.53 samples/sec   Loss 38.8155   LearningRate 0.0001   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:51:36,468-Speed 24417.38 samples/sec   Loss 38.7776   LearningRate 0.0001   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:51:46,288-Speed 25029.46 samples/sec   Loss 38.7361   LearningRate 0.0001   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:51:56,058-Speed 25157.95 samples/sec   Loss 38.7040   LearningRate 0.0001   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:52:05,902-Speed 24968.12 samples/sec   Loss 38.6560   LearningRate 0.0001   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:52:15,684-Speed 25125.92 samples/sec   Loss 38.6261   LearningRate 0.0001   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:52:25,446-Speed 25179.22 samples/sec   Loss 38.5882   LearningRate 0.0001   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:52:35,462-Speed 24540.79 samples/sec   Loss 38.5673   LearningRate 0.0001   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:52:45,302-Speed 24986.35 samples/sec   Loss 38.5227   LearningRate 0.0001   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:52:55,141-Speed 24985.96 samples/sec   Loss 38.4887   LearningRate 0.0001   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:53:04,954-Speed 25046.79 samples/sec   Loss 38.4526   LearningRate 0.0001   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:53:14,837-Speed 24876.65 samples/sec   Loss 38.4188   LearningRate 0.0001   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:53:24,687-Speed 24955.08 samples/sec   Loss 38.3958   LearningRate 0.0001   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:53:34,689-Speed 24573.73 samples/sec   Loss 38.3687   LearningRate 0.0001   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-25 22:53:44,512-Speed 25023.17 samples/sec   Loss 38.3508   LearningRate 0.0002   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:53:54,279-Speed 25165.41 samples/sec   Loss 38.3439   LearningRate 0.0002   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:54:04,055-Speed 25144.15 samples/sec   Loss 38.2910   LearningRate 0.0002   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:54:13,922-Speed 24908.75 samples/sec   Loss 38.2325   LearningRate 0.0002   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:54:23,600-Speed 25396.45 samples/sec   Loss 38.2070   LearningRate 0.0002   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:54:34,131-Speed 23339.99 samples/sec   Loss 38.1601   LearningRate 0.0002   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:54:43,846-Speed 25305.78 samples/sec   Loss 38.1244   LearningRate 0.0002   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:54:53,591-Speed 25224.01 samples/sec   Loss 38.1062   LearningRate 0.0002   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:55:03,334-Speed 25228.52 samples/sec   Loss 38.0669   LearningRate 0.0002   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 22:55:13,187-Speed 24945.68 samples/sec   Loss 38.0845   LearningRate 0.0002   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:55:23,041-Speed 24944.14 samples/sec   Loss 38.0333   LearningRate 0.0002   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:55:32,868-Speed 25013.82 samples/sec   Loss 37.9502   LearningRate 0.0002   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:55:42,595-Speed 25268.36 samples/sec   Loss 37.9103   LearningRate 0.0002   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:55:52,386-Speed 25105.04 samples/sec   Loss 37.8745   LearningRate 0.0002   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:56:02,242-Speed 24937.07 samples/sec   Loss 37.8555   LearningRate 0.0002   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 22:56:12,045-Speed 25073.10 samples/sec   Loss 37.7807   LearningRate 0.0002   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:56:21,797-Speed 25204.43 samples/sec   Loss 37.7258   LearningRate 0.0002   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:56:31,531-Speed 25251.95 samples/sec   Loss 37.7467   LearningRate 0.0002   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:56:42,084-Speed 23291.91 samples/sec   Loss 37.6412   LearningRate 0.0002   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:56:51,906-Speed 25022.93 samples/sec   Loss 37.5869   LearningRate 0.0002   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:57:01,780-Speed 24893.63 samples/sec   Loss 37.5390   LearningRate 0.0002   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:57:11,522-Speed 25238.10 samples/sec   Loss 37.4656   LearningRate 0.0002   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:57:21,332-Speed 25056.54 samples/sec   Loss 37.3967   LearningRate 0.0002   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:57:31,079-Speed 25218.55 samples/sec   Loss 37.3382   LearningRate 0.0002   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:57:41,158-Speed 24385.51 samples/sec   Loss 37.3044   LearningRate 0.0002   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:57:50,909-Speed 25205.78 samples/sec   Loss 37.2822   LearningRate 0.0002   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:58:00,655-Speed 25220.17 samples/sec   Loss 37.2265   LearningRate 0.0002   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:58:10,395-Speed 25236.94 samples/sec   Loss 37.1217   LearningRate 0.0002   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:58:20,271-Speed 24888.00 samples/sec   Loss 37.1116   LearningRate 0.0002   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 22:58:30,195-Speed 24768.94 samples/sec   Loss 37.0782   LearningRate 0.0002   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 22:58:39,960-Speed 25171.81 samples/sec   Loss 37.0565   LearningRate 0.0002   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:58:49,774-Speed 25046.03 samples/sec   Loss 36.9526   LearningRate 0.0002   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:58:59,534-Speed 25182.55 samples/sec   Loss 36.8588   LearningRate 0.0002   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:59:09,293-Speed 25186.89 samples/sec   Loss 36.8086   LearningRate 0.0002   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:59:19,027-Speed 25252.06 samples/sec   Loss 36.7707   LearningRate 0.0002   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:59:28,767-Speed 25237.80 samples/sec   Loss 36.7119   LearningRate 0.0002   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:59:38,574-Speed 25073.98 samples/sec   Loss 36.6268   LearningRate 0.0002   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:59:48,289-Speed 25301.34 samples/sec   Loss 36.6037   LearningRate 0.0002   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 22:59:58,191-Speed 24823.01 samples/sec   Loss 36.6277   LearningRate 0.0002   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:00:07,998-Speed 25063.98 samples/sec   Loss 36.5858   LearningRate 0.0002   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:00:17,726-Speed 25272.23 samples/sec   Loss 36.5115   LearningRate 0.0002   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:00:27,473-Speed 25219.59 samples/sec   Loss 36.6534   LearningRate 0.0002   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:00:37,185-Speed 25307.38 samples/sec   Loss 36.4391   LearningRate 0.0002   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:00:47,005-Speed 25029.88 samples/sec   Loss 36.3262   LearningRate 0.0002   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:00:56,650-Speed 25486.44 samples/sec   Loss 36.2863   LearningRate 0.0002   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:01:06,397-Speed 25217.03 samples/sec   Loss 36.1859   LearningRate 0.0002   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:01:16,278-Speed 24875.33 samples/sec   Loss 36.0993   LearningRate 0.0002   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:01:26,049-Speed 25154.64 samples/sec   Loss 35.9905   LearningRate 0.0002   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:01:35,841-Speed 25103.46 samples/sec   Loss 35.9407   LearningRate 0.0002   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 1024   Required: 19 hours
Training: 2022-03-25 23:01:45,730-Speed 24860.70 samples/sec   Loss 35.9075   LearningRate 0.0002   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:01:55,543-Speed 25046.41 samples/sec   Loss 35.8568   LearningRate 0.0002   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:02:05,468-Speed 24766.90 samples/sec   Loss 35.8091   LearningRate 0.0002   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:02:15,258-Speed 25107.07 samples/sec   Loss 35.7093   LearningRate 0.0002   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:02:24,948-Speed 25365.59 samples/sec   Loss 35.6379   LearningRate 0.0002   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:02:34,600-Speed 25467.27 samples/sec   Loss 35.5819   LearningRate 0.0002   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:02:44,354-Speed 25199.03 samples/sec   Loss 35.5177   LearningRate 0.0002   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:02:54,098-Speed 25225.16 samples/sec   Loss 35.4733   LearningRate 0.0002   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:03:03,803-Speed 25326.54 samples/sec   Loss 35.3953   LearningRate 0.0002   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:03:13,656-Speed 24945.85 samples/sec   Loss 35.3190   LearningRate 0.0002   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:03:23,359-Speed 25336.58 samples/sec   Loss 35.2584   LearningRate 0.0002   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:03:33,050-Speed 25361.03 samples/sec   Loss 35.1773   LearningRate 0.0002   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:03:42,810-Speed 25183.60 samples/sec   Loss 35.1168   LearningRate 0.0002   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:03:52,591-Speed 25138.79 samples/sec   Loss 35.0389   LearningRate 0.0002   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:04:02,432-Speed 24975.44 samples/sec   Loss 34.9721   LearningRate 0.0002   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:04:12,221-Speed 25111.65 samples/sec   Loss 34.9036   LearningRate 0.0002   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:04:21,921-Speed 25338.85 samples/sec   Loss 34.8275   LearningRate 0.0002   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:04:31,677-Speed 25202.05 samples/sec   Loss 34.7599   LearningRate 0.0002   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:04:41,463-Speed 25118.88 samples/sec   Loss 34.6675   LearningRate 0.0002   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:04:51,190-Speed 25269.60 samples/sec   Loss 34.5887   LearningRate 0.0002   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 4096   Required: 18 hours
Training: 2022-03-25 23:05:49,792-Speed 4193.82 samples/sec   Loss 34.5386   LearningRate 0.0003   Epoch: 1   Global Step: 1730   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:05:59,663-Speed 24900.48 samples/sec   Loss 34.4704   LearningRate 0.0003   Epoch: 1   Global Step: 1740   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:06:09,497-Speed 24996.18 samples/sec   Loss 34.3774   LearningRate 0.0003   Epoch: 1   Global Step: 1750   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:06:19,438-Speed 24731.98 samples/sec   Loss 34.3104   LearningRate 0.0003   Epoch: 1   Global Step: 1760   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:06:29,398-Speed 24677.92 samples/sec   Loss 34.2475   LearningRate 0.0003   Epoch: 1   Global Step: 1770   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:06:39,262-Speed 24920.67 samples/sec   Loss 34.1715   LearningRate 0.0003   Epoch: 1   Global Step: 1780   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:06:49,171-Speed 24803.68 samples/sec   Loss 34.0740   LearningRate 0.0003   Epoch: 1   Global Step: 1790   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:06:58,917-Speed 25219.17 samples/sec   Loss 34.0136   LearningRate 0.0003   Epoch: 1   Global Step: 1800   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:07:08,612-Speed 25354.10 samples/sec   Loss 33.9507   LearningRate 0.0003   Epoch: 1   Global Step: 1810   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:07:18,426-Speed 25043.86 samples/sec   Loss 33.8603   LearningRate 0.0003   Epoch: 1   Global Step: 1820   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:07:28,089-Speed 25436.74 samples/sec   Loss 33.7683   LearningRate 0.0003   Epoch: 1   Global Step: 1830   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:07:37,888-Speed 25083.87 samples/sec   Loss 33.6876   LearningRate 0.0003   Epoch: 1   Global Step: 1840   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:07:47,612-Speed 25284.22 samples/sec   Loss 33.6100   LearningRate 0.0003   Epoch: 1   Global Step: 1850   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:07:57,263-Speed 25466.85 samples/sec   Loss 33.5325   LearningRate 0.0003   Epoch: 1   Global Step: 1860   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:08:06,923-Speed 25443.08 samples/sec   Loss 33.4317   LearningRate 0.0003   Epoch: 1   Global Step: 1870   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:08:16,658-Speed 25247.89 samples/sec   Loss 33.3378   LearningRate 0.0003   Epoch: 1   Global Step: 1880   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:08:26,313-Speed 25458.70 samples/sec   Loss 33.2390   LearningRate 0.0003   Epoch: 1   Global Step: 1890   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:08:36,048-Speed 25248.86 samples/sec   Loss 33.1705   LearningRate 0.0003   Epoch: 1   Global Step: 1900   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:08:45,991-Speed 24720.53 samples/sec   Loss 33.0983   LearningRate 0.0003   Epoch: 1   Global Step: 1910   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:08:55,799-Speed 25063.41 samples/sec   Loss 32.9861   LearningRate 0.0003   Epoch: 1   Global Step: 1920   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:09:05,705-Speed 24811.03 samples/sec   Loss 32.9448   LearningRate 0.0003   Epoch: 1   Global Step: 1930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:09:15,629-Speed 24767.99 samples/sec   Loss 32.8222   LearningRate 0.0003   Epoch: 1   Global Step: 1940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:09:25,504-Speed 24891.05 samples/sec   Loss 32.6941   LearningRate 0.0003   Epoch: 1   Global Step: 1950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:09:35,375-Speed 24909.39 samples/sec   Loss 32.6399   LearningRate 0.0003   Epoch: 1   Global Step: 1960   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:09:45,208-Speed 24995.24 samples/sec   Loss 32.5305   LearningRate 0.0003   Epoch: 1   Global Step: 1970   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:09:54,978-Speed 25165.76 samples/sec   Loss 32.4602   LearningRate 0.0003   Epoch: 1   Global Step: 1980   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:10:04,881-Speed 24823.34 samples/sec   Loss 32.3620   LearningRate 0.0003   Epoch: 1   Global Step: 1990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-25 23:10:14,713-Speed 24999.25 samples/sec   Loss 32.2720   LearningRate 0.0003   Epoch: 1   Global Step: 2000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:10:24,651-Speed 24732.22 samples/sec   Loss 32.1940   LearningRate 0.0003   Epoch: 1   Global Step: 2010   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:10:34,452-Speed 25077.44 samples/sec   Loss 32.0679   LearningRate 0.0003   Epoch: 1   Global Step: 2020   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:10:44,484-Speed 24502.03 samples/sec   Loss 31.9575   LearningRate 0.0003   Epoch: 1   Global Step: 2030   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:10:54,512-Speed 24511.65 samples/sec   Loss 31.9039   LearningRate 0.0003   Epoch: 1   Global Step: 2040   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:11:04,360-Speed 24955.93 samples/sec   Loss 31.7882   LearningRate 0.0003   Epoch: 1   Global Step: 2050   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:11:14,238-Speed 24882.98 samples/sec   Loss 31.7037   LearningRate 0.0003   Epoch: 1   Global Step: 2060   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:11:24,179-Speed 24725.06 samples/sec   Loss 31.5904   LearningRate 0.0003   Epoch: 1   Global Step: 2070   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:11:34,099-Speed 24777.58 samples/sec   Loss 31.4572   LearningRate 0.0003   Epoch: 1   Global Step: 2080   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:11:44,070-Speed 24650.79 samples/sec   Loss 31.3781   LearningRate 0.0003   Epoch: 1   Global Step: 2090   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:11:53,994-Speed 24767.28 samples/sec   Loss 31.3101   LearningRate 0.0003   Epoch: 1   Global Step: 2100   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:12:04,102-Speed 24316.11 samples/sec   Loss 31.1987   LearningRate 0.0003   Epoch: 1   Global Step: 2110   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:12:14,130-Speed 24510.62 samples/sec   Loss 31.0848   LearningRate 0.0003   Epoch: 1   Global Step: 2120   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:12:23,979-Speed 24959.30 samples/sec   Loss 31.0179   LearningRate 0.0003   Epoch: 1   Global Step: 2130   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:12:33,916-Speed 24740.25 samples/sec   Loss 30.9685   LearningRate 0.0003   Epoch: 1   Global Step: 2140   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:12:43,804-Speed 24856.78 samples/sec   Loss 30.8054   LearningRate 0.0003   Epoch: 1   Global Step: 2150   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:12:53,660-Speed 24941.99 samples/sec   Loss 30.7122   LearningRate 0.0003   Epoch: 1   Global Step: 2160   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:13:03,415-Speed 25199.00 samples/sec   Loss 30.6227   LearningRate 0.0003   Epoch: 1   Global Step: 2170   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:13:13,225-Speed 25055.10 samples/sec   Loss 30.5507   LearningRate 0.0003   Epoch: 1   Global Step: 2180   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:13:22,938-Speed 25305.01 samples/sec   Loss 30.3906   LearningRate 0.0003   Epoch: 1   Global Step: 2190   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:13:32,605-Speed 25426.92 samples/sec   Loss 30.2623   LearningRate 0.0003   Epoch: 1   Global Step: 2200   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:13:42,483-Speed 24882.28 samples/sec   Loss 30.1496   LearningRate 0.0003   Epoch: 1   Global Step: 2210   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:13:52,180-Speed 25348.24 samples/sec   Loss 30.0365   LearningRate 0.0003   Epoch: 1   Global Step: 2220   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:14:01,911-Speed 25259.35 samples/sec   Loss 30.0293   LearningRate 0.0003   Epoch: 1   Global Step: 2230   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:14:11,532-Speed 25546.73 samples/sec   Loss 29.9535   LearningRate 0.0003   Epoch: 1   Global Step: 2240   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:14:21,301-Speed 25161.23 samples/sec   Loss 29.7620   LearningRate 0.0003   Epoch: 1   Global Step: 2250   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:14:31,095-Speed 25095.03 samples/sec   Loss 29.6482   LearningRate 0.0003   Epoch: 1   Global Step: 2260   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:14:40,891-Speed 25091.62 samples/sec   Loss 29.5748   LearningRate 0.0003   Epoch: 1   Global Step: 2270   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:14:50,613-Speed 25280.88 samples/sec   Loss 29.4368   LearningRate 0.0003   Epoch: 1   Global Step: 2280   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:15:00,361-Speed 25215.68 samples/sec   Loss 29.3380   LearningRate 0.0003   Epoch: 1   Global Step: 2290   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:15:10,128-Speed 25164.90 samples/sec   Loss 29.2221   LearningRate 0.0003   Epoch: 1   Global Step: 2300   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:15:19,873-Speed 25221.98 samples/sec   Loss 29.1020   LearningRate 0.0003   Epoch: 1   Global Step: 2310   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:15:29,623-Speed 25210.88 samples/sec   Loss 28.9722   LearningRate 0.0003   Epoch: 1   Global Step: 2320   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:15:39,326-Speed 25332.75 samples/sec   Loss 28.9109   LearningRate 0.0003   Epoch: 1   Global Step: 2330   Fp16 Grad Scale: 2048   Required: 19 hours
Training: 2022-03-25 23:15:49,064-Speed 25239.42 samples/sec   Loss 28.7423   LearningRate 0.0003   Epoch: 1   Global Step: 2340   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:15:58,844-Speed 25133.41 samples/sec   Loss 28.6458   LearningRate 0.0003   Epoch: 1   Global Step: 2350   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:16:08,542-Speed 25345.84 samples/sec   Loss 28.5085   LearningRate 0.0003   Epoch: 1   Global Step: 2360   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:16:18,335-Speed 25097.38 samples/sec   Loss 28.4597   LearningRate 0.0003   Epoch: 1   Global Step: 2370   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:16:28,218-Speed 24871.83 samples/sec   Loss 28.3246   LearningRate 0.0003   Epoch: 1   Global Step: 2380   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:16:38,003-Speed 25120.47 samples/sec   Loss 28.1763   LearningRate 0.0003   Epoch: 1   Global Step: 2390   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:16:47,768-Speed 25173.69 samples/sec   Loss 28.0865   LearningRate 0.0003   Epoch: 1   Global Step: 2400   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:16:57,578-Speed 25053.55 samples/sec   Loss 27.9546   LearningRate 0.0003   Epoch: 1   Global Step: 2410   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:17:07,364-Speed 25117.12 samples/sec   Loss 27.8524   LearningRate 0.0004   Epoch: 1   Global Step: 2420   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:17:17,083-Speed 25289.73 samples/sec   Loss 27.7783   LearningRate 0.0004   Epoch: 1   Global Step: 2430   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:17:26,896-Speed 25050.44 samples/sec   Loss 27.6127   LearningRate 0.0004   Epoch: 1   Global Step: 2440   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:17:36,626-Speed 25260.47 samples/sec   Loss 27.5015   LearningRate 0.0004   Epoch: 1   Global Step: 2450   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:17:46,362-Speed 25247.13 samples/sec   Loss 27.3621   LearningRate 0.0004   Epoch: 1   Global Step: 2460   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:17:56,105-Speed 25225.55 samples/sec   Loss 27.2799   LearningRate 0.0004   Epoch: 1   Global Step: 2470   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:18:05,877-Speed 25154.32 samples/sec   Loss 27.1513   LearningRate 0.0004   Epoch: 1   Global Step: 2480   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:18:15,617-Speed 25234.62 samples/sec   Loss 27.0488   LearningRate 0.0004   Epoch: 1   Global Step: 2490   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:18:25,371-Speed 25199.02 samples/sec   Loss 26.9313   LearningRate 0.0004   Epoch: 1   Global Step: 2500   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:18:35,077-Speed 25323.45 samples/sec   Loss 26.8186   LearningRate 0.0004   Epoch: 1   Global Step: 2510   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:18:44,966-Speed 24854.79 samples/sec   Loss 26.7437   LearningRate 0.0004   Epoch: 1   Global Step: 2520   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:18:54,740-Speed 25148.84 samples/sec   Loss 26.5722   LearningRate 0.0004   Epoch: 1   Global Step: 2530   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:19:04,462-Speed 25280.59 samples/sec   Loss 26.3833   LearningRate 0.0004   Epoch: 1   Global Step: 2540   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:19:14,176-Speed 25304.65 samples/sec   Loss 26.3266   LearningRate 0.0004   Epoch: 1   Global Step: 2550   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:19:24,024-Speed 24957.82 samples/sec   Loss 26.2671   LearningRate 0.0004   Epoch: 1   Global Step: 2560   Fp16 Grad Scale: 4096   Required: 19 hours
Training: 2022-03-25 23:19:33,707-Speed 25382.74 samples/sec   Loss 26.0960   LearningRate 0.0004   Epoch: 1   Global Step: 2570   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:19:43,584-Speed 24885.25 samples/sec   Loss 25.9456   LearningRate 0.0004   Epoch: 1   Global Step: 2580   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:19:53,320-Speed 25245.32 samples/sec   Loss 25.8807   LearningRate 0.0004   Epoch: 1   Global Step: 2590   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:20:03,045-Speed 25274.59 samples/sec   Loss 25.7070   LearningRate 0.0004   Epoch: 1   Global Step: 2600   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:20:12,819-Speed 25145.51 samples/sec   Loss 25.5839   LearningRate 0.0004   Epoch: 1   Global Step: 2610   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:20:22,516-Speed 25347.50 samples/sec   Loss 25.4604   LearningRate 0.0004   Epoch: 1   Global Step: 2620   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:20:32,445-Speed 24754.29 samples/sec   Loss 25.3822   LearningRate 0.0004   Epoch: 1   Global Step: 2630   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:20:42,198-Speed 25200.97 samples/sec   Loss 25.1756   LearningRate 0.0004   Epoch: 1   Global Step: 2640   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:20:51,950-Speed 25203.18 samples/sec   Loss 25.1051   LearningRate 0.0004   Epoch: 1   Global Step: 2650   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:21:01,748-Speed 25084.55 samples/sec   Loss 24.9740   LearningRate 0.0004   Epoch: 1   Global Step: 2660   Fp16 Grad Scale: 8192   Required: 19 hours
Training: 2022-03-25 23:21:11,554-Speed 25064.55 samples/sec   Loss 24.8707   LearningRate 0.0004   Epoch: 1   Global Step: 2670   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:21:21,424-Speed 24902.45 samples/sec   Loss 24.7900   LearningRate 0.0004   Epoch: 1   Global Step: 2680   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:21:31,246-Speed 25025.59 samples/sec   Loss 24.6314   LearningRate 0.0004   Epoch: 1   Global Step: 2690   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-25 23:21:40,971-Speed 25274.11 samples/sec   Loss 24.4532   LearningRate 0.0004   Epoch: 1   Global Step: 2700   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:21:50,797-Speed 25013.45 samples/sec   Loss 24.3664   LearningRate 0.0004   Epoch: 1   Global Step: 2710   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:22:00,591-Speed 25097.87 samples/sec   Loss 24.2363   LearningRate 0.0004   Epoch: 1   Global Step: 2720   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:22:10,442-Speed 24950.15 samples/sec   Loss 24.1528   LearningRate 0.0004   Epoch: 1   Global Step: 2730   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:22:20,216-Speed 25149.74 samples/sec   Loss 23.9904   LearningRate 0.0004   Epoch: 1   Global Step: 2740   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:22:29,987-Speed 25152.79 samples/sec   Loss 23.8639   LearningRate 0.0004   Epoch: 1   Global Step: 2750   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:22:39,760-Speed 25156.50 samples/sec   Loss 23.7269   LearningRate 0.0004   Epoch: 1   Global Step: 2760   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:22:49,762-Speed 24573.17 samples/sec   Loss 23.6158   LearningRate 0.0004   Epoch: 1   Global Step: 2770   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:22:59,587-Speed 25018.46 samples/sec   Loss 23.4984   LearningRate 0.0004   Epoch: 1   Global Step: 2780   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:23:09,342-Speed 25195.89 samples/sec   Loss 23.3760   LearningRate 0.0004   Epoch: 1   Global Step: 2790   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:23:19,085-Speed 25227.78 samples/sec   Loss 23.2356   LearningRate 0.0004   Epoch: 1   Global Step: 2800   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:23:28,974-Speed 24855.70 samples/sec   Loss 23.1308   LearningRate 0.0004   Epoch: 1   Global Step: 2810   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:23:38,697-Speed 25278.51 samples/sec   Loss 23.0694   LearningRate 0.0004   Epoch: 1   Global Step: 2820   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:23:48,461-Speed 25173.10 samples/sec   Loss 22.9124   LearningRate 0.0004   Epoch: 1   Global Step: 2830   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-25 23:23:58,250-Speed 25108.78 samples/sec   Loss 22.7914   LearningRate 0.0004   Epoch: 1   Global Step: 2840   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:24:08,131-Speed 24873.48 samples/sec   Loss 22.6964   LearningRate 0.0004   Epoch: 1   Global Step: 2850   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:24:17,891-Speed 25181.79 samples/sec   Loss 22.5736   LearningRate 0.0004   Epoch: 1   Global Step: 2860   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:24:27,707-Speed 25039.65 samples/sec   Loss 22.3991   LearningRate 0.0004   Epoch: 1   Global Step: 2870   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:24:37,535-Speed 25015.40 samples/sec   Loss 22.3019   LearningRate 0.0004   Epoch: 1   Global Step: 2880   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:24:47,245-Speed 25319.49 samples/sec   Loss 22.1840   LearningRate 0.0004   Epoch: 1   Global Step: 2890   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:24:56,964-Speed 25288.64 samples/sec   Loss 22.0701   LearningRate 0.0004   Epoch: 1   Global Step: 2900   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:25:06,791-Speed 25009.78 samples/sec   Loss 21.8759   LearningRate 0.0004   Epoch: 1   Global Step: 2910   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:25:16,718-Speed 24763.15 samples/sec   Loss 21.8232   LearningRate 0.0004   Epoch: 1   Global Step: 2920   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:25:26,487-Speed 25160.05 samples/sec   Loss 21.7388   LearningRate 0.0004   Epoch: 1   Global Step: 2930   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:25:36,348-Speed 24926.86 samples/sec   Loss 21.5801   LearningRate 0.0004   Epoch: 1   Global Step: 2940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:25:46,084-Speed 25247.66 samples/sec   Loss 21.4782   LearningRate 0.0004   Epoch: 1   Global Step: 2950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:25:55,828-Speed 25224.72 samples/sec   Loss 21.3349   LearningRate 0.0004   Epoch: 1   Global Step: 2960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:26:05,583-Speed 25196.26 samples/sec   Loss 21.2125   LearningRate 0.0004   Epoch: 1   Global Step: 2970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:26:15,368-Speed 25121.02 samples/sec   Loss 21.1110   LearningRate 0.0004   Epoch: 1   Global Step: 2980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:26:25,104-Speed 25246.69 samples/sec   Loss 20.9764   LearningRate 0.0004   Epoch: 1   Global Step: 2990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:26:34,962-Speed 24933.38 samples/sec   Loss 20.8678   LearningRate 0.0004   Epoch: 1   Global Step: 3000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:26:44,706-Speed 25232.20 samples/sec   Loss 20.7297   LearningRate 0.0004   Epoch: 1   Global Step: 3010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:26:54,407-Speed 25336.29 samples/sec   Loss 20.6361   LearningRate 0.0004   Epoch: 1   Global Step: 3020   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:27:04,185-Speed 25139.13 samples/sec   Loss 20.5041   LearningRate 0.0004   Epoch: 1   Global Step: 3030   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:27:13,954-Speed 25159.72 samples/sec   Loss 20.4249   LearningRate 0.0004   Epoch: 1   Global Step: 3040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:27:23,674-Speed 25289.26 samples/sec   Loss 20.2864   LearningRate 0.0004   Epoch: 1   Global Step: 3050   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:27:33,494-Speed 25028.91 samples/sec   Loss 20.2047   LearningRate 0.0004   Epoch: 1   Global Step: 3060   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:27:43,282-Speed 25113.80 samples/sec   Loss 20.1353   LearningRate 0.0004   Epoch: 1   Global Step: 3070   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:27:53,064-Speed 25133.60 samples/sec   Loss 19.9164   LearningRate 0.0004   Epoch: 1   Global Step: 3080   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:28:02,854-Speed 25105.58 samples/sec   Loss 19.8458   LearningRate 0.0004   Epoch: 1   Global Step: 3090   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:28:12,713-Speed 24930.63 samples/sec   Loss 19.7557   LearningRate 0.0004   Epoch: 1   Global Step: 3100   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:28:22,443-Speed 25261.38 samples/sec   Loss 19.6165   LearningRate 0.0004   Epoch: 1   Global Step: 3110   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:28:32,217-Speed 25148.00 samples/sec   Loss 19.5574   LearningRate 0.0005   Epoch: 1   Global Step: 3120   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:28:42,247-Speed 24507.88 samples/sec   Loss 19.4151   LearningRate 0.0005   Epoch: 1   Global Step: 3130   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:28:52,265-Speed 24533.52 samples/sec   Loss 19.3071   LearningRate 0.0005   Epoch: 1   Global Step: 3140   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:29:02,288-Speed 24521.72 samples/sec   Loss 19.2118   LearningRate 0.0005   Epoch: 1   Global Step: 3150   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:29:12,310-Speed 24525.79 samples/sec   Loss 19.1147   LearningRate 0.0005   Epoch: 1   Global Step: 3160   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-25 23:29:22,324-Speed 24546.47 samples/sec   Loss 19.0137   LearningRate 0.0005   Epoch: 1   Global Step: 3170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:29:32,335-Speed 24551.03 samples/sec   Loss 18.9172   LearningRate 0.0005   Epoch: 1   Global Step: 3180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:29:42,374-Speed 24484.38 samples/sec   Loss 18.7752   LearningRate 0.0005   Epoch: 1   Global Step: 3190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:29:52,433-Speed 24433.77 samples/sec   Loss 18.6822   LearningRate 0.0005   Epoch: 1   Global Step: 3200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:30:02,430-Speed 24587.29 samples/sec   Loss 18.6161   LearningRate 0.0005   Epoch: 1   Global Step: 3210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:30:12,467-Speed 24488.55 samples/sec   Loss 18.4775   LearningRate 0.0005   Epoch: 1   Global Step: 3220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:30:22,298-Speed 25002.10 samples/sec   Loss 18.3985   LearningRate 0.0005   Epoch: 1   Global Step: 3230   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:30:32,127-Speed 25007.64 samples/sec   Loss 18.2802   LearningRate 0.0005   Epoch: 1   Global Step: 3240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:30:42,041-Speed 24794.09 samples/sec   Loss 18.2469   LearningRate 0.0005   Epoch: 1   Global Step: 3250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:30:51,821-Speed 25134.25 samples/sec   Loss 18.1722   LearningRate 0.0005   Epoch: 1   Global Step: 3260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-25 23:31:01,640-Speed 25033.96 samples/sec   Loss 18.0265   LearningRate 0.0005   Epoch: 1   Global Step: 3270   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:31:11,488-Speed 24966.02 samples/sec   Loss 17.9772   LearningRate 0.0005   Epoch: 1   Global Step: 3280   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:31:21,408-Speed 24778.64 samples/sec   Loss 17.8233   LearningRate 0.0005   Epoch: 1   Global Step: 3290   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:31:31,202-Speed 25098.75 samples/sec   Loss 17.7702   LearningRate 0.0005   Epoch: 1   Global Step: 3300   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:31:41,083-Speed 24876.48 samples/sec   Loss 17.7215   LearningRate 0.0005   Epoch: 1   Global Step: 3310   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:31:50,968-Speed 24866.10 samples/sec   Loss 17.6012   LearningRate 0.0005   Epoch: 1   Global Step: 3320   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:32:00,850-Speed 24871.41 samples/sec   Loss 17.4883   LearningRate 0.0005   Epoch: 1   Global Step: 3330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:32:10,637-Speed 25115.86 samples/sec   Loss 17.4178   LearningRate 0.0005   Epoch: 1   Global Step: 3340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:32:20,426-Speed 25106.67 samples/sec   Loss 17.3480   LearningRate 0.0005   Epoch: 1   Global Step: 3350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:32:30,319-Speed 24846.18 samples/sec   Loss 17.2066   LearningRate 0.0005   Epoch: 1   Global Step: 3360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:32:40,196-Speed 24888.77 samples/sec   Loss 17.1509   LearningRate 0.0005   Epoch: 1   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:32:50,028-Speed 25005.31 samples/sec   Loss 17.0565   LearningRate 0.0005   Epoch: 1   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:32:59,910-Speed 24873.97 samples/sec   Loss 16.9704   LearningRate 0.0005   Epoch: 1   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:33:09,769-Speed 24931.94 samples/sec   Loss 16.9162   LearningRate 0.0005   Epoch: 1   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:33:19,576-Speed 25061.83 samples/sec   Loss 16.8070   LearningRate 0.0005   Epoch: 1   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:33:29,445-Speed 24913.75 samples/sec   Loss 16.7561   LearningRate 0.0005   Epoch: 1   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:33:39,278-Speed 24995.35 samples/sec   Loss 16.6870   LearningRate 0.0005   Epoch: 1   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:33:49,088-Speed 25057.64 samples/sec   Loss 16.5930   LearningRate 0.0005   Epoch: 1   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:33:58,959-Speed 24899.82 samples/sec   Loss 16.4984   LearningRate 0.0005   Epoch: 1   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:34:57,707-Speed 4183.37 samples/sec   Loss 16.3728   LearningRate 0.0005   Epoch: 2   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:35:07,667-Speed 24679.81 samples/sec   Loss 16.3247   LearningRate 0.0005   Epoch: 2   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:35:17,465-Speed 25086.25 samples/sec   Loss 16.2328   LearningRate 0.0005   Epoch: 2   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:35:27,276-Speed 25051.62 samples/sec   Loss 16.1593   LearningRate 0.0005   Epoch: 2   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:35:37,009-Speed 25255.88 samples/sec   Loss 16.0609   LearningRate 0.0005   Epoch: 2   Global Step: 3500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:35:46,731-Speed 25279.23 samples/sec   Loss 15.9801   LearningRate 0.0005   Epoch: 2   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:35:56,502-Speed 25158.14 samples/sec   Loss 15.9200   LearningRate 0.0005   Epoch: 2   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:36:06,189-Speed 25372.74 samples/sec   Loss 15.8597   LearningRate 0.0005   Epoch: 2   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:36:15,986-Speed 25087.32 samples/sec   Loss 15.8265   LearningRate 0.0005   Epoch: 2   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:36:25,788-Speed 25074.90 samples/sec   Loss 15.7255   LearningRate 0.0005   Epoch: 2   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:36:35,561-Speed 25152.20 samples/sec   Loss 15.6217   LearningRate 0.0005   Epoch: 2   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:36:45,406-Speed 24965.75 samples/sec   Loss 15.5476   LearningRate 0.0005   Epoch: 2   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:36:55,122-Speed 25298.09 samples/sec   Loss 15.5001   LearningRate 0.0005   Epoch: 2   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:37:04,945-Speed 25021.74 samples/sec   Loss 15.4222   LearningRate 0.0005   Epoch: 2   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:37:14,671-Speed 25272.34 samples/sec   Loss 15.3599   LearningRate 0.0005   Epoch: 2   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:37:24,428-Speed 25198.43 samples/sec   Loss 15.3094   LearningRate 0.0005   Epoch: 2   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:37:34,106-Speed 25401.97 samples/sec   Loss 15.1692   LearningRate 0.0005   Epoch: 2   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:37:43,939-Speed 24999.28 samples/sec   Loss 15.1055   LearningRate 0.0005   Epoch: 2   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:37:53,773-Speed 24994.45 samples/sec   Loss 15.0738   LearningRate 0.0005   Epoch: 2   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:38:03,526-Speed 25205.36 samples/sec   Loss 15.0974   LearningRate 0.0005   Epoch: 2   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:38:13,313-Speed 25116.48 samples/sec   Loss 14.9016   LearningRate 0.0005   Epoch: 2   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:38:23,031-Speed 25292.23 samples/sec   Loss 14.8859   LearningRate 0.0005   Epoch: 2   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:38:32,843-Speed 25050.72 samples/sec   Loss 14.7951   LearningRate 0.0005   Epoch: 2   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:38:42,582-Speed 25244.15 samples/sec   Loss 14.8158   LearningRate 0.0005   Epoch: 2   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:38:52,380-Speed 25090.83 samples/sec   Loss 14.7362   LearningRate 0.0005   Epoch: 2   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:39:02,083-Speed 25329.69 samples/sec   Loss 14.7002   LearningRate 0.0005   Epoch: 2   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:39:11,820-Speed 25243.74 samples/sec   Loss 14.5990   LearningRate 0.0005   Epoch: 2   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:39:21,606-Speed 25118.53 samples/sec   Loss 14.5215   LearningRate 0.0005   Epoch: 2   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:39:31,435-Speed 25006.42 samples/sec   Loss 14.4906   LearningRate 0.0005   Epoch: 2   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:39:41,149-Speed 25302.30 samples/sec   Loss 14.4105   LearningRate 0.0005   Epoch: 2   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:39:50,888-Speed 25239.12 samples/sec   Loss 14.3434   LearningRate 0.0005   Epoch: 2   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:00,718-Speed 25003.55 samples/sec   Loss 14.2330   LearningRate 0.0005   Epoch: 2   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:10,424-Speed 25323.92 samples/sec   Loss 14.1956   LearningRate 0.0005   Epoch: 2   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:20,182-Speed 25196.42 samples/sec   Loss 14.2314   LearningRate 0.0005   Epoch: 2   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:29,971-Speed 25106.13 samples/sec   Loss 14.1366   LearningRate 0.0005   Epoch: 2   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:39,726-Speed 25197.32 samples/sec   Loss 14.0105   LearningRate 0.0006   Epoch: 2   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:49,456-Speed 25260.65 samples/sec   Loss 13.9939   LearningRate 0.0006   Epoch: 2   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:40:59,147-Speed 25361.11 samples/sec   Loss 13.9411   LearningRate 0.0006   Epoch: 2   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:41:08,842-Speed 25354.19 samples/sec   Loss 13.9214   LearningRate 0.0006   Epoch: 2   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:41:18,552-Speed 25312.57 samples/sec   Loss 13.7856   LearningRate 0.0006   Epoch: 2   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:41:28,330-Speed 25140.65 samples/sec   Loss 13.7677   LearningRate 0.0006   Epoch: 2   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:41:38,009-Speed 25393.24 samples/sec   Loss 13.7017   LearningRate 0.0006   Epoch: 2   Global Step: 3870   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-03-25 23:41:47,749-Speed 25236.41 samples/sec   Loss 13.6327   LearningRate 0.0006   Epoch: 2   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:41:57,579-Speed 25003.21 samples/sec   Loss 13.5456   LearningRate 0.0006   Epoch: 2   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:42:07,314-Speed 25248.67 samples/sec   Loss 13.5768   LearningRate 0.0006   Epoch: 2   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:42:17,079-Speed 25170.45 samples/sec   Loss 13.4974   LearningRate 0.0006   Epoch: 2   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:42:26,946-Speed 24910.98 samples/sec   Loss 13.5260   LearningRate 0.0006   Epoch: 2   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:42:36,681-Speed 25249.00 samples/sec   Loss 13.4213   LearningRate 0.0006   Epoch: 2   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:42:46,454-Speed 25151.89 samples/sec   Loss 13.3690   LearningRate 0.0006   Epoch: 2   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:42:56,184-Speed 25259.51 samples/sec   Loss 13.2970   LearningRate 0.0006   Epoch: 2   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:43:05,888-Speed 25329.59 samples/sec   Loss 13.2026   LearningRate 0.0006   Epoch: 2   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:43:15,707-Speed 25035.02 samples/sec   Loss 13.1293   LearningRate 0.0006   Epoch: 2   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:43:25,471-Speed 25173.89 samples/sec   Loss 13.1073   LearningRate 0.0006   Epoch: 2   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:43:35,303-Speed 24999.28 samples/sec   Loss 13.0804   LearningRate 0.0006   Epoch: 2   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:43:45,046-Speed 25231.76 samples/sec   Loss 13.0401   LearningRate 0.0006   Epoch: 2   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:43:54,865-Speed 25031.91 samples/sec   Loss 13.0453   LearningRate 0.0006   Epoch: 2   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:44:04,620-Speed 25198.44 samples/sec   Loss 12.9254   LearningRate 0.0006   Epoch: 2   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:44:14,465-Speed 24966.65 samples/sec   Loss 12.8736   LearningRate 0.0006   Epoch: 2   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:44:24,209-Speed 25224.69 samples/sec   Loss 12.8687   LearningRate 0.0006   Epoch: 2   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:44:33,957-Speed 25214.93 samples/sec   Loss 12.7996   LearningRate 0.0006   Epoch: 2   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:44:43,707-Speed 25208.57 samples/sec   Loss 12.7144   LearningRate 0.0006   Epoch: 2   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:44:53,418-Speed 25310.90 samples/sec   Loss 12.7112   LearningRate 0.0006   Epoch: 2   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:45:03,119-Speed 25336.58 samples/sec   Loss 12.6514   LearningRate 0.0006   Epoch: 2   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:45:12,904-Speed 25121.01 samples/sec   Loss 12.6124   LearningRate 0.0006   Epoch: 2   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:45:22,654-Speed 25210.84 samples/sec   Loss 12.5718   LearningRate 0.0006   Epoch: 2   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:45:32,360-Speed 25326.26 samples/sec   Loss 12.6235   LearningRate 0.0006   Epoch: 2   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:45:42,072-Speed 25307.38 samples/sec   Loss 12.5090   LearningRate 0.0006   Epoch: 2   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:45:51,819-Speed 25217.09 samples/sec   Loss 12.4400   LearningRate 0.0006   Epoch: 2   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:46:01,570-Speed 25211.03 samples/sec   Loss 12.4379   LearningRate 0.0006   Epoch: 2   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:46:11,289-Speed 25290.91 samples/sec   Loss 12.3433   LearningRate 0.0006   Epoch: 2   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:46:21,151-Speed 24923.95 samples/sec   Loss 12.2806   LearningRate 0.0006   Epoch: 2   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:46:30,910-Speed 25187.19 samples/sec   Loss 12.3000   LearningRate 0.0006   Epoch: 2   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:46:40,615-Speed 25325.31 samples/sec   Loss 12.2398   LearningRate 0.0006   Epoch: 2   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:46:50,560-Speed 24715.79 samples/sec   Loss 12.1789   LearningRate 0.0006   Epoch: 2   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:00,302-Speed 25235.29 samples/sec   Loss 12.1331   LearningRate 0.0006   Epoch: 2   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:10,118-Speed 25040.59 samples/sec   Loss 12.1272   LearningRate 0.0006   Epoch: 2   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:19,898-Speed 25133.54 samples/sec   Loss 12.0899   LearningRate 0.0006   Epoch: 2   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:29,693-Speed 25094.69 samples/sec   Loss 12.0265   LearningRate 0.0006   Epoch: 2   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:39,535-Speed 24974.47 samples/sec   Loss 12.0830   LearningRate 0.0006   Epoch: 2   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:49,328-Speed 25098.36 samples/sec   Loss 11.9514   LearningRate 0.0006   Epoch: 2   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:47:59,096-Speed 25162.92 samples/sec   Loss 11.9148   LearningRate 0.0006   Epoch: 2   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:48:08,934-Speed 24984.37 samples/sec   Loss 11.8832   LearningRate 0.0006   Epoch: 2   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:48:18,646-Speed 25307.81 samples/sec   Loss 11.9524   LearningRate 0.0006   Epoch: 2   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:48:28,471-Speed 25016.85 samples/sec   Loss 11.9027   LearningRate 0.0006   Epoch: 2   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:48:38,259-Speed 25112.91 samples/sec   Loss 11.7897   LearningRate 0.0006   Epoch: 2   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:48:48,109-Speed 24959.77 samples/sec   Loss 11.7582   LearningRate 0.0006   Epoch: 2   Global Step: 4310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:48:57,852-Speed 25229.31 samples/sec   Loss 11.7283   LearningRate 0.0006   Epoch: 2   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:49:07,633-Speed 25131.05 samples/sec   Loss 11.6573   LearningRate 0.0006   Epoch: 2   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:49:17,375-Speed 25231.12 samples/sec   Loss 11.6564   LearningRate 0.0006   Epoch: 2   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:49:27,208-Speed 24996.63 samples/sec   Loss 11.6620   LearningRate 0.0006   Epoch: 2   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:49:37,022-Speed 25044.90 samples/sec   Loss 11.5483   LearningRate 0.0006   Epoch: 2   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:49:46,783-Speed 25180.21 samples/sec   Loss 11.5244   LearningRate 0.0006   Epoch: 2   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:49:56,497-Speed 25303.70 samples/sec   Loss 11.5562   LearningRate 0.0006   Epoch: 2   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:50:06,302-Speed 25076.01 samples/sec   Loss 11.5065   LearningRate 0.0006   Epoch: 2   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:50:16,100-Speed 25085.51 samples/sec   Loss 11.4748   LearningRate 0.0006   Epoch: 2   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:50:25,898-Speed 25086.95 samples/sec   Loss 11.4196   LearningRate 0.0006   Epoch: 2   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:50:35,707-Speed 25060.21 samples/sec   Loss 11.2997   LearningRate 0.0006   Epoch: 2   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:50:45,503-Speed 25093.93 samples/sec   Loss 11.3310   LearningRate 0.0006   Epoch: 2   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:50:55,251-Speed 25216.23 samples/sec   Loss 11.3364   LearningRate 0.0006   Epoch: 2   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:51:05,080-Speed 25005.39 samples/sec   Loss 11.2949   LearningRate 0.0006   Epoch: 2   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:51:14,932-Speed 24948.79 samples/sec   Loss 11.2041   LearningRate 0.0006   Epoch: 2   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:51:24,668-Speed 25252.34 samples/sec   Loss 11.2298   LearningRate 0.0006   Epoch: 2   Global Step: 4470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:51:34,469-Speed 25078.05 samples/sec   Loss 11.2525   LearningRate 0.0006   Epoch: 2   Global Step: 4480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:51:44,352-Speed 24871.25 samples/sec   Loss 11.1555   LearningRate 0.0006   Epoch: 2   Global Step: 4490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:51:54,140-Speed 25113.61 samples/sec   Loss 11.1016   LearningRate 0.0007   Epoch: 2   Global Step: 4500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:52:03,951-Speed 25054.44 samples/sec   Loss 11.0671   LearningRate 0.0007   Epoch: 2   Global Step: 4510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:52:13,792-Speed 24975.34 samples/sec   Loss 11.0565   LearningRate 0.0007   Epoch: 2   Global Step: 4520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:52:23,575-Speed 25124.43 samples/sec   Loss 11.0361   LearningRate 0.0007   Epoch: 2   Global Step: 4530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:52:33,275-Speed 25341.27 samples/sec   Loss 11.0075   LearningRate 0.0007   Epoch: 2   Global Step: 4540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:52:42,960-Speed 25377.26 samples/sec   Loss 11.0614   LearningRate 0.0007   Epoch: 2   Global Step: 4550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:52:52,737-Speed 25139.88 samples/sec   Loss 11.0280   LearningRate 0.0007   Epoch: 2   Global Step: 4560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-25 23:53:02,567-Speed 25005.29 samples/sec   Loss 10.9427   LearningRate 0.0007   Epoch: 2   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:53:12,352-Speed 25119.12 samples/sec   Loss 10.8824   LearningRate 0.0007   Epoch: 2   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:53:22,106-Speed 25198.72 samples/sec   Loss 10.8102   LearningRate 0.0007   Epoch: 2   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:53:31,876-Speed 25161.07 samples/sec   Loss 10.8607   LearningRate 0.0007   Epoch: 2   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:53:41,722-Speed 24964.47 samples/sec   Loss 10.8664   LearningRate 0.0007   Epoch: 2   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:53:51,532-Speed 25055.69 samples/sec   Loss 10.8339   LearningRate 0.0007   Epoch: 2   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:01,219-Speed 25371.86 samples/sec   Loss 10.6886   LearningRate 0.0007   Epoch: 2   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:10,985-Speed 25169.49 samples/sec   Loss 10.7700   LearningRate 0.0007   Epoch: 2   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:20,850-Speed 24917.16 samples/sec   Loss 10.7013   LearningRate 0.0007   Epoch: 2   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:30,635-Speed 25118.22 samples/sec   Loss 10.6198   LearningRate 0.0007   Epoch: 2   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:40,454-Speed 25040.51 samples/sec   Loss 10.5787   LearningRate 0.0007   Epoch: 2   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:50,191-Speed 25243.10 samples/sec   Loss 10.5600   LearningRate 0.0007   Epoch: 2   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:54:59,953-Speed 25181.89 samples/sec   Loss 10.5815   LearningRate 0.0007   Epoch: 2   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:55:09,811-Speed 24933.24 samples/sec   Loss 10.5900   LearningRate 0.0007   Epoch: 2   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:55:19,663-Speed 24950.51 samples/sec   Loss 10.5495   LearningRate 0.0007   Epoch: 2   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:55:29,453-Speed 25108.22 samples/sec   Loss 10.4854   LearningRate 0.0007   Epoch: 2   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:55:39,274-Speed 25028.60 samples/sec   Loss 10.4569   LearningRate 0.0007   Epoch: 2   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:55:49,103-Speed 25008.11 samples/sec   Loss 10.4924   LearningRate 0.0007   Epoch: 2   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:55:58,931-Speed 25009.09 samples/sec   Loss 10.4289   LearningRate 0.0007   Epoch: 2   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:56:08,699-Speed 25162.58 samples/sec   Loss 10.4140   LearningRate 0.0007   Epoch: 2   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:56:18,449-Speed 25212.08 samples/sec   Loss 10.4119   LearningRate 0.0007   Epoch: 2   Global Step: 4770   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-03-25 23:56:28,305-Speed 24941.63 samples/sec   Loss 10.3793   LearningRate 0.0007   Epoch: 2   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:56:38,159-Speed 24943.38 samples/sec   Loss 10.3522   LearningRate 0.0007   Epoch: 2   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:56:47,896-Speed 25241.57 samples/sec   Loss 10.3673   LearningRate 0.0007   Epoch: 2   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:56:57,694-Speed 25087.72 samples/sec   Loss 10.2997   LearningRate 0.0007   Epoch: 2   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:57:07,452-Speed 25187.33 samples/sec   Loss 10.2656   LearningRate 0.0007   Epoch: 2   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:57:17,353-Speed 24827.43 samples/sec   Loss 10.2868   LearningRate 0.0007   Epoch: 2   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:57:27,118-Speed 25177.99 samples/sec   Loss 10.1768   LearningRate 0.0007   Epoch: 2   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:57:36,870-Speed 25205.31 samples/sec   Loss 10.1621   LearningRate 0.0007   Epoch: 2   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:57:46,737-Speed 24911.77 samples/sec   Loss 10.1530   LearningRate 0.0007   Epoch: 2   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:57:56,654-Speed 24784.97 samples/sec   Loss 10.1176   LearningRate 0.0007   Epoch: 2   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:58:06,353-Speed 25342.66 samples/sec   Loss 10.1218   LearningRate 0.0007   Epoch: 2   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:58:16,162-Speed 25057.98 samples/sec   Loss 10.0869   LearningRate 0.0007   Epoch: 2   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:58:25,999-Speed 24985.12 samples/sec   Loss 10.0806   LearningRate 0.0007   Epoch: 2   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:58:35,824-Speed 25018.33 samples/sec   Loss 10.0787   LearningRate 0.0007   Epoch: 2   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:58:45,650-Speed 25014.37 samples/sec   Loss 10.0137   LearningRate 0.0007   Epoch: 2   Global Step: 4920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:58:55,429-Speed 25134.10 samples/sec   Loss 10.0020   LearningRate 0.0007   Epoch: 2   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:59:05,193-Speed 25174.40 samples/sec   Loss 9.9455   LearningRate 0.0007   Epoch: 2   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:59:15,012-Speed 25030.26 samples/sec   Loss 9.9494   LearningRate 0.0007   Epoch: 2   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:59:24,776-Speed 25174.85 samples/sec   Loss 9.9770   LearningRate 0.0007   Epoch: 2   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:59:34,559-Speed 25124.38 samples/sec   Loss 9.9730   LearningRate 0.0007   Epoch: 2   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:59:44,439-Speed 24878.23 samples/sec   Loss 9.9573   LearningRate 0.0007   Epoch: 2   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-25 23:59:54,296-Speed 24935.87 samples/sec   Loss 9.8760   LearningRate 0.0007   Epoch: 2   Global Step: 4990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:00:04,096-Speed 25081.86 samples/sec   Loss 9.8640   LearningRate 0.0007   Epoch: 2   Global Step: 5000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:00:13,909-Speed 25049.23 samples/sec   Loss 9.8768   LearningRate 0.0007   Epoch: 2   Global Step: 5010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:00:23,719-Speed 25058.36 samples/sec   Loss 9.8387   LearningRate 0.0007   Epoch: 2   Global Step: 5020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:00:33,524-Speed 25068.39 samples/sec   Loss 9.7866   LearningRate 0.0007   Epoch: 2   Global Step: 5030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:00:43,267-Speed 25227.29 samples/sec   Loss 9.8399   LearningRate 0.0007   Epoch: 2   Global Step: 5040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:00:53,114-Speed 24962.07 samples/sec   Loss 9.7810   LearningRate 0.0007   Epoch: 2   Global Step: 5050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:01:02,893-Speed 25134.12 samples/sec   Loss 9.7603   LearningRate 0.0007   Epoch: 2   Global Step: 5060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:01:12,701-Speed 25062.07 samples/sec   Loss 9.6907   LearningRate 0.0007   Epoch: 2   Global Step: 5070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:01:22,496-Speed 25093.12 samples/sec   Loss 9.7353   LearningRate 0.0007   Epoch: 2   Global Step: 5080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:01:32,310-Speed 25050.01 samples/sec   Loss 9.7265   LearningRate 0.0007   Epoch: 2   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:01:42,061-Speed 25206.37 samples/sec   Loss 9.6970   LearningRate 0.0007   Epoch: 2   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:01:51,810-Speed 25214.50 samples/sec   Loss 9.7057   LearningRate 0.0007   Epoch: 2   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:02:01,650-Speed 24977.51 samples/sec   Loss 9.6234   LearningRate 0.0007   Epoch: 2   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:02:11,368-Speed 25290.98 samples/sec   Loss 9.7377   LearningRate 0.0007   Epoch: 2   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:02:21,061-Speed 25357.24 samples/sec   Loss 9.6508   LearningRate 0.0007   Epoch: 2   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:02:30,808-Speed 25220.80 samples/sec   Loss 9.6799   LearningRate 0.0007   Epoch: 2   Global Step: 5150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:02:40,520-Speed 25311.73 samples/sec   Loss 9.5638   LearningRate 0.0007   Epoch: 2   Global Step: 5160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:02:50,378-Speed 24933.31 samples/sec   Loss 9.5930   LearningRate 0.0007   Epoch: 2   Global Step: 5170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:03:00,179-Speed 25078.91 samples/sec   Loss 9.6003   LearningRate 0.0007   Epoch: 2   Global Step: 5180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:03:59,623-Speed 4134.44 samples/sec   Loss 9.4701   LearningRate 0.0008   Epoch: 3   Global Step: 5190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:04:09,330-Speed 25322.28 samples/sec   Loss 9.4427   LearningRate 0.0008   Epoch: 3   Global Step: 5200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:04:19,091-Speed 25181.88 samples/sec   Loss 9.4165   LearningRate 0.0008   Epoch: 3   Global Step: 5210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:04:28,786-Speed 25352.36 samples/sec   Loss 9.4297   LearningRate 0.0008   Epoch: 3   Global Step: 5220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:04:38,520-Speed 25249.93 samples/sec   Loss 9.4170   LearningRate 0.0008   Epoch: 3   Global Step: 5230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:04:48,175-Speed 25459.46 samples/sec   Loss 9.4249   LearningRate 0.0008   Epoch: 3   Global Step: 5240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:04:57,848-Speed 25410.10 samples/sec   Loss 9.3451   LearningRate 0.0008   Epoch: 3   Global Step: 5250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:05:07,582-Speed 25249.68 samples/sec   Loss 9.3547   LearningRate 0.0008   Epoch: 3   Global Step: 5260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:05:17,385-Speed 25073.18 samples/sec   Loss 9.3459   LearningRate 0.0008   Epoch: 3   Global Step: 5270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:05:27,079-Speed 25360.88 samples/sec   Loss 9.2893   LearningRate 0.0008   Epoch: 3   Global Step: 5280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:05:36,822-Speed 25230.88 samples/sec   Loss 9.2653   LearningRate 0.0008   Epoch: 3   Global Step: 5290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:05:46,598-Speed 25147.75 samples/sec   Loss 9.2566   LearningRate 0.0008   Epoch: 3   Global Step: 5300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:05:56,336-Speed 25241.67 samples/sec   Loss 9.2670   LearningRate 0.0008   Epoch: 3   Global Step: 5310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:06:06,055-Speed 25289.47 samples/sec   Loss 9.2176   LearningRate 0.0008   Epoch: 3   Global Step: 5320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:06:15,884-Speed 25006.69 samples/sec   Loss 9.1890   LearningRate 0.0008   Epoch: 3   Global Step: 5330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:06:25,617-Speed 25252.48 samples/sec   Loss 9.2091   LearningRate 0.0008   Epoch: 3   Global Step: 5340   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:06:35,440-Speed 25023.01 samples/sec   Loss 9.1918   LearningRate 0.0008   Epoch: 3   Global Step: 5350   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:06:45,288-Speed 24958.41 samples/sec   Loss 9.1846   LearningRate 0.0008   Epoch: 3   Global Step: 5360   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:06:55,258-Speed 24652.77 samples/sec   Loss 9.1476   LearningRate 0.0008   Epoch: 3   Global Step: 5370   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:07:05,070-Speed 25057.50 samples/sec   Loss 9.1563   LearningRate 0.0008   Epoch: 3   Global Step: 5380   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:07:14,833-Speed 25183.64 samples/sec   Loss 9.1902   LearningRate 0.0008   Epoch: 3   Global Step: 5390   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:07:24,636-Speed 25071.94 samples/sec   Loss 9.0847   LearningRate 0.0008   Epoch: 3   Global Step: 5400   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:07:34,359-Speed 25279.70 samples/sec   Loss 9.1499   LearningRate 0.0008   Epoch: 3   Global Step: 5410   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:07:43,993-Speed 25513.10 samples/sec   Loss 9.1415   LearningRate 0.0008   Epoch: 3   Global Step: 5420   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:07:53,669-Speed 25400.29 samples/sec   Loss 9.1243   LearningRate 0.0008   Epoch: 3   Global Step: 5430   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:08:03,445-Speed 25144.23 samples/sec   Loss 9.1109   LearningRate 0.0008   Epoch: 3   Global Step: 5440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:08:13,225-Speed 25130.78 samples/sec   Loss 9.1249   LearningRate 0.0008   Epoch: 3   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:08:22,902-Speed 25406.45 samples/sec   Loss 9.1001   LearningRate 0.0008   Epoch: 3   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:08:32,610-Speed 25318.35 samples/sec   Loss 9.0919   LearningRate 0.0008   Epoch: 3   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:08:42,298-Speed 25371.43 samples/sec   Loss 8.9636   LearningRate 0.0008   Epoch: 3   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:08:52,133-Speed 24992.26 samples/sec   Loss 8.9695   LearningRate 0.0008   Epoch: 3   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:09:01,850-Speed 25292.55 samples/sec   Loss 8.9780   LearningRate 0.0008   Epoch: 3   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:09:11,548-Speed 25346.54 samples/sec   Loss 8.9259   LearningRate 0.0008   Epoch: 3   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:09:21,316-Speed 25163.32 samples/sec   Loss 8.9477   LearningRate 0.0008   Epoch: 3   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:09:31,056-Speed 25232.77 samples/sec   Loss 8.9673   LearningRate 0.0008   Epoch: 3   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:09:40,806-Speed 25209.06 samples/sec   Loss 8.9054   LearningRate 0.0008   Epoch: 3   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:09:50,600-Speed 25097.34 samples/sec   Loss 8.8796   LearningRate 0.0008   Epoch: 3   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:00,318-Speed 25290.97 samples/sec   Loss 8.8892   LearningRate 0.0008   Epoch: 3   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:10,064-Speed 25221.00 samples/sec   Loss 8.8740   LearningRate 0.0008   Epoch: 3   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:19,910-Speed 24963.09 samples/sec   Loss 8.8371   LearningRate 0.0008   Epoch: 3   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:29,696-Speed 25117.61 samples/sec   Loss 8.8797   LearningRate 0.0008   Epoch: 3   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:39,485-Speed 25110.77 samples/sec   Loss 8.8208   LearningRate 0.0008   Epoch: 3   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:49,333-Speed 24957.71 samples/sec   Loss 8.7724   LearningRate 0.0008   Epoch: 3   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:10:59,026-Speed 25357.31 samples/sec   Loss 8.8220   LearningRate 0.0008   Epoch: 3   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:11:08,760-Speed 25252.15 samples/sec   Loss 8.7634   LearningRate 0.0008   Epoch: 3   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:11:18,505-Speed 25222.65 samples/sec   Loss 8.7438   LearningRate 0.0008   Epoch: 3   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:11:28,134-Speed 25527.61 samples/sec   Loss 8.7767   LearningRate 0.0008   Epoch: 3   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:11:37,890-Speed 25191.96 samples/sec   Loss 8.8055   LearningRate 0.0008   Epoch: 3   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:11:47,610-Speed 25289.09 samples/sec   Loss 8.7738   LearningRate 0.0008   Epoch: 3   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:11:57,371-Speed 25180.09 samples/sec   Loss 8.6217   LearningRate 0.0008   Epoch: 3   Global Step: 5680   Fp16 Grad Scale: 262144   Required: 18 hours
Training: 2022-03-26 00:12:07,069-Speed 25344.44 samples/sec   Loss 8.6296   LearningRate 0.0008   Epoch: 3   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:12:16,815-Speed 25219.41 samples/sec   Loss 8.6877   LearningRate 0.0008   Epoch: 3   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:12:26,484-Speed 25420.79 samples/sec   Loss 8.7370   LearningRate 0.0008   Epoch: 3   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:12:36,313-Speed 25008.18 samples/sec   Loss 8.6183   LearningRate 0.0008   Epoch: 3   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:12:46,057-Speed 25224.56 samples/sec   Loss 8.5889   LearningRate 0.0008   Epoch: 3   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:12:55,850-Speed 25098.72 samples/sec   Loss 8.6456   LearningRate 0.0008   Epoch: 3   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:13:05,576-Speed 25270.20 samples/sec   Loss 8.6471   LearningRate 0.0008   Epoch: 3   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:13:15,423-Speed 24961.15 samples/sec   Loss 8.5966   LearningRate 0.0008   Epoch: 3   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:13:25,148-Speed 25274.57 samples/sec   Loss 8.5759   LearningRate 0.0008   Epoch: 3   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:13:34,895-Speed 25215.66 samples/sec   Loss 8.5472   LearningRate 0.0008   Epoch: 3   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:13:44,656-Speed 25182.06 samples/sec   Loss 8.5257   LearningRate 0.0008   Epoch: 3   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:13:54,473-Speed 25042.20 samples/sec   Loss 8.4904   LearningRate 0.0008   Epoch: 3   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:14:04,264-Speed 25103.59 samples/sec   Loss 8.5452   LearningRate 0.0008   Epoch: 3   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:14:13,954-Speed 25365.60 samples/sec   Loss 8.5394   LearningRate 0.0008   Epoch: 3   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:14:23,690-Speed 25247.59 samples/sec   Loss 8.5146   LearningRate 0.0008   Epoch: 3   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:14:33,554-Speed 24917.01 samples/sec   Loss 8.4958   LearningRate 0.0008   Epoch: 3   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:14:43,362-Speed 25061.99 samples/sec   Loss 8.4932   LearningRate 0.0008   Epoch: 3   Global Step: 5850   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:14:53,127-Speed 25176.04 samples/sec   Loss 8.4356   LearningRate 0.0008   Epoch: 3   Global Step: 5860   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:15:02,892-Speed 25170.01 samples/sec   Loss 8.4244   LearningRate 0.0008   Epoch: 3   Global Step: 5870   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:15:12,778-Speed 24863.24 samples/sec   Loss 8.3853   LearningRate 0.0009   Epoch: 3   Global Step: 5880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:15:22,530-Speed 25204.48 samples/sec   Loss 8.3750   LearningRate 0.0009   Epoch: 3   Global Step: 5890   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:15:32,286-Speed 25194.25 samples/sec   Loss 8.3303   LearningRate 0.0009   Epoch: 3   Global Step: 5900   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:15:42,143-Speed 24932.52 samples/sec   Loss 8.3864   LearningRate 0.0009   Epoch: 3   Global Step: 5910   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:15:52,057-Speed 24793.18 samples/sec   Loss 8.3414   LearningRate 0.0009   Epoch: 3   Global Step: 5920   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:16:01,916-Speed 24929.77 samples/sec   Loss 8.4041   LearningRate 0.0009   Epoch: 3   Global Step: 5930   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:16:11,687-Speed 25155.50 samples/sec   Loss 8.3395   LearningRate 0.0009   Epoch: 3   Global Step: 5940   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:16:21,419-Speed 25253.53 samples/sec   Loss 8.3195   LearningRate 0.0009   Epoch: 3   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:16:31,230-Speed 25052.34 samples/sec   Loss 8.3377   LearningRate 0.0009   Epoch: 3   Global Step: 5960   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:16:40,972-Speed 25230.44 samples/sec   Loss 8.3686   LearningRate 0.0009   Epoch: 3   Global Step: 5970   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:16:50,721-Speed 25212.55 samples/sec   Loss 8.3669   LearningRate 0.0009   Epoch: 3   Global Step: 5980   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:00,519-Speed 25090.97 samples/sec   Loss 8.3554   LearningRate 0.0009   Epoch: 3   Global Step: 5990   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:10,335-Speed 25040.95 samples/sec   Loss 8.2894   LearningRate 0.0009   Epoch: 3   Global Step: 6000   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:20,109-Speed 25147.01 samples/sec   Loss 8.2254   LearningRate 0.0009   Epoch: 3   Global Step: 6010   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:29,819-Speed 25313.54 samples/sec   Loss 8.2284   LearningRate 0.0009   Epoch: 3   Global Step: 6020   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:39,574-Speed 25197.02 samples/sec   Loss 8.1942   LearningRate 0.0009   Epoch: 3   Global Step: 6030   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:49,402-Speed 25009.50 samples/sec   Loss 8.2376   LearningRate 0.0009   Epoch: 3   Global Step: 6040   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:17:59,149-Speed 25216.90 samples/sec   Loss 8.2390   LearningRate 0.0009   Epoch: 3   Global Step: 6050   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:18:09,007-Speed 24933.33 samples/sec   Loss 8.2134   LearningRate 0.0009   Epoch: 3   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:18:18,770-Speed 25176.86 samples/sec   Loss 8.1617   LearningRate 0.0009   Epoch: 3   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:18:28,554-Speed 25121.06 samples/sec   Loss 8.2070   LearningRate 0.0009   Epoch: 3   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:18:38,378-Speed 25018.68 samples/sec   Loss 8.2319   LearningRate 0.0009   Epoch: 3   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:18:48,197-Speed 25032.15 samples/sec   Loss 8.1441   LearningRate 0.0009   Epoch: 3   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:18:57,983-Speed 25114.55 samples/sec   Loss 8.1726   LearningRate 0.0009   Epoch: 3   Global Step: 6110   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:19:07,903-Speed 24778.72 samples/sec   Loss 8.1756   LearningRate 0.0009   Epoch: 3   Global Step: 6120   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:19:17,787-Speed 24871.80 samples/sec   Loss 8.1066   LearningRate 0.0009   Epoch: 3   Global Step: 6130   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:19:27,544-Speed 25190.26 samples/sec   Loss 8.0820   LearningRate 0.0009   Epoch: 3   Global Step: 6140   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:19:37,423-Speed 24880.18 samples/sec   Loss 8.1431   LearningRate 0.0009   Epoch: 3   Global Step: 6150   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:19:47,197-Speed 25145.45 samples/sec   Loss 8.1249   LearningRate 0.0009   Epoch: 3   Global Step: 6160   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:19:57,196-Speed 24584.11 samples/sec   Loss 8.1091   LearningRate 0.0009   Epoch: 3   Global Step: 6170   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:20:07,098-Speed 24821.25 samples/sec   Loss 7.9841   LearningRate 0.0009   Epoch: 3   Global Step: 6180   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:20:16,889-Speed 25112.39 samples/sec   Loss 7.9398   LearningRate 0.0009   Epoch: 3   Global Step: 6190   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:20:26,701-Speed 25049.36 samples/sec   Loss 8.0519   LearningRate 0.0009   Epoch: 3   Global Step: 6200   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:20:36,476-Speed 25145.17 samples/sec   Loss 8.0810   LearningRate 0.0009   Epoch: 3   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:20:46,225-Speed 25210.99 samples/sec   Loss 8.0225   LearningRate 0.0009   Epoch: 3   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:20:56,108-Speed 24876.74 samples/sec   Loss 8.0601   LearningRate 0.0009   Epoch: 3   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:21:06,356-Speed 23983.17 samples/sec   Loss 8.0344   LearningRate 0.0009   Epoch: 3   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:21:16,341-Speed 24615.50 samples/sec   Loss 7.9839   LearningRate 0.0009   Epoch: 3   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:21:26,420-Speed 24386.81 samples/sec   Loss 7.9602   LearningRate 0.0009   Epoch: 3   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:21:36,540-Speed 24289.46 samples/sec   Loss 7.9311   LearningRate 0.0009   Epoch: 3   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:21:46,612-Speed 24412.34 samples/sec   Loss 7.9621   LearningRate 0.0009   Epoch: 3   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:21:56,690-Speed 24389.53 samples/sec   Loss 8.0333   LearningRate 0.0009   Epoch: 3   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:22:06,812-Speed 24282.23 samples/sec   Loss 7.8987   LearningRate 0.0009   Epoch: 3   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:22:16,817-Speed 24566.48 samples/sec   Loss 7.9394   LearningRate 0.0009   Epoch: 3   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:22:27,113-Speed 23873.23 samples/sec   Loss 7.9426   LearningRate 0.0009   Epoch: 3   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:22:37,257-Speed 24229.24 samples/sec   Loss 7.9326   LearningRate 0.0009   Epoch: 3   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:22:47,303-Speed 24466.20 samples/sec   Loss 7.9177   LearningRate 0.0009   Epoch: 3   Global Step: 6340   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:22:57,283-Speed 24628.79 samples/sec   Loss 7.8422   LearningRate 0.0009   Epoch: 3   Global Step: 6350   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:23:07,345-Speed 24428.02 samples/sec   Loss 7.8939   LearningRate 0.0009   Epoch: 3   Global Step: 6360   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:23:17,382-Speed 24489.82 samples/sec   Loss 7.8812   LearningRate 0.0009   Epoch: 3   Global Step: 6370   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:23:27,390-Speed 24560.68 samples/sec   Loss 7.7782   LearningRate 0.0009   Epoch: 3   Global Step: 6380   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:23:37,332-Speed 24721.38 samples/sec   Loss 7.8224   LearningRate 0.0009   Epoch: 3   Global Step: 6390   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:23:47,414-Speed 24381.34 samples/sec   Loss 7.8082   LearningRate 0.0009   Epoch: 3   Global Step: 6400   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:23:57,609-Speed 24109.10 samples/sec   Loss 7.7722   LearningRate 0.0009   Epoch: 3   Global Step: 6410   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:24:07,675-Speed 24418.09 samples/sec   Loss 7.7975   LearningRate 0.0009   Epoch: 3   Global Step: 6420   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:24:17,595-Speed 24775.20 samples/sec   Loss 7.7403   LearningRate 0.0009   Epoch: 3   Global Step: 6430   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-26 00:24:27,614-Speed 24533.08 samples/sec   Loss 7.7146   LearningRate 0.0009   Epoch: 3   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-26 00:24:37,612-Speed 24583.15 samples/sec   Loss 7.7962   LearningRate 0.0009   Epoch: 3   Global Step: 6450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:24:47,688-Speed 24392.39 samples/sec   Loss 7.7953   LearningRate 0.0009   Epoch: 3   Global Step: 6460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:24:57,756-Speed 24413.54 samples/sec   Loss 7.7941   LearningRate 0.0009   Epoch: 3   Global Step: 6470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:25:07,792-Speed 24490.69 samples/sec   Loss 7.7030   LearningRate 0.0009   Epoch: 3   Global Step: 6480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:25:17,895-Speed 24330.83 samples/sec   Loss 7.7171   LearningRate 0.0009   Epoch: 3   Global Step: 6490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:25:27,893-Speed 24582.19 samples/sec   Loss 7.7340   LearningRate 0.0009   Epoch: 3   Global Step: 6500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:25:37,926-Speed 24498.67 samples/sec   Loss 7.6852   LearningRate 0.0009   Epoch: 3   Global Step: 6510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:25:48,012-Speed 24368.77 samples/sec   Loss 7.7294   LearningRate 0.0009   Epoch: 3   Global Step: 6520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:25:58,037-Speed 24518.46 samples/sec   Loss 7.6826   LearningRate 0.0009   Epoch: 3   Global Step: 6530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:26:08,195-Speed 24196.59 samples/sec   Loss 7.7042   LearningRate 0.0009   Epoch: 3   Global Step: 6540   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 00:26:18,226-Speed 24502.38 samples/sec   Loss 7.6462   LearningRate 0.0009   Epoch: 3   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:26:28,173-Speed 24710.91 samples/sec   Loss 7.6830   LearningRate 0.0009   Epoch: 3   Global Step: 6560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:26:37,959-Speed 25116.16 samples/sec   Loss 7.6741   LearningRate 0.0010   Epoch: 3   Global Step: 6570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:26:47,767-Speed 25059.93 samples/sec   Loss 7.6447   LearningRate 0.0010   Epoch: 3   Global Step: 6580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:26:57,484-Speed 25293.30 samples/sec   Loss 7.6585   LearningRate 0.0010   Epoch: 3   Global Step: 6590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:27:07,377-Speed 24845.76 samples/sec   Loss 7.6434   LearningRate 0.0010   Epoch: 3   Global Step: 6600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:27:17,217-Speed 24979.21 samples/sec   Loss 7.6320   LearningRate 0.0010   Epoch: 3   Global Step: 6610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:27:27,052-Speed 24993.21 samples/sec   Loss 7.6203   LearningRate 0.0010   Epoch: 3   Global Step: 6620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:27:36,867-Speed 25040.50 samples/sec   Loss 7.5565   LearningRate 0.0010   Epoch: 3   Global Step: 6630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:27:46,696-Speed 25008.15 samples/sec   Loss 7.6590   LearningRate 0.0010   Epoch: 3   Global Step: 6640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:27:56,485-Speed 25108.67 samples/sec   Loss 7.6426   LearningRate 0.0010   Epoch: 3   Global Step: 6650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:28:06,338-Speed 24946.67 samples/sec   Loss 7.5881   LearningRate 0.0010   Epoch: 3   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:28:16,303-Speed 24665.76 samples/sec   Loss 7.5379   LearningRate 0.0010   Epoch: 3   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:28:26,037-Speed 25249.66 samples/sec   Loss 7.5526   LearningRate 0.0010   Epoch: 3   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:28:35,919-Speed 24873.60 samples/sec   Loss 7.5407   LearningRate 0.0010   Epoch: 3   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:28:45,676-Speed 25190.25 samples/sec   Loss 7.5633   LearningRate 0.0010   Epoch: 3   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:28:55,337-Speed 25440.25 samples/sec   Loss 7.5421   LearningRate 0.0010   Epoch: 3   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:29:05,106-Speed 25160.23 samples/sec   Loss 7.4941   LearningRate 0.0010   Epoch: 3   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:29:14,882-Speed 25143.61 samples/sec   Loss 7.4862   LearningRate 0.0010   Epoch: 3   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:29:24,768-Speed 24863.46 samples/sec   Loss 7.5958   LearningRate 0.0010   Epoch: 3   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:29:34,634-Speed 24918.15 samples/sec   Loss 7.4902   LearningRate 0.0010   Epoch: 3   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:29:44,536-Speed 24820.67 samples/sec   Loss 7.4514   LearningRate 0.0010   Epoch: 3   Global Step: 6760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:29:54,440-Speed 24817.93 samples/sec   Loss 7.4421   LearningRate 0.0010   Epoch: 3   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:30:04,258-Speed 25034.29 samples/sec   Loss 7.4929   LearningRate 0.0010   Epoch: 3   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:30:14,132-Speed 24893.92 samples/sec   Loss 7.3748   LearningRate 0.0010   Epoch: 3   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:30:24,056-Speed 24767.21 samples/sec   Loss 7.4649   LearningRate 0.0010   Epoch: 3   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:30:33,874-Speed 25035.99 samples/sec   Loss 7.3780   LearningRate 0.0010   Epoch: 3   Global Step: 6810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:30:43,917-Speed 24474.72 samples/sec   Loss 7.4102   LearningRate 0.0010   Epoch: 3   Global Step: 6820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:30:53,933-Speed 24538.75 samples/sec   Loss 7.3705   LearningRate 0.0010   Epoch: 3   Global Step: 6830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:31:03,806-Speed 24895.82 samples/sec   Loss 7.3566   LearningRate 0.0010   Epoch: 3   Global Step: 6840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:31:13,590-Speed 25121.83 samples/sec   Loss 7.3351   LearningRate 0.0010   Epoch: 3   Global Step: 6850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:31:23,373-Speed 25124.41 samples/sec   Loss 7.4099   LearningRate 0.0010   Epoch: 3   Global Step: 6860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:31:33,309-Speed 24734.85 samples/sec   Loss 7.4066   LearningRate 0.0010   Epoch: 3   Global Step: 6870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:31:43,174-Speed 24917.49 samples/sec   Loss 7.4310   LearningRate 0.0010   Epoch: 3   Global Step: 6880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:31:52,994-Speed 25036.63 samples/sec   Loss 7.3806   LearningRate 0.0010   Epoch: 3   Global Step: 6890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:32:02,935-Speed 24724.54 samples/sec   Loss 7.3986   LearningRate 0.0010   Epoch: 3   Global Step: 6900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:32:12,860-Speed 24764.29 samples/sec   Loss 7.4012   LearningRate 0.0010   Epoch: 3   Global Step: 6910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:33:13,442-Speed 4056.76 samples/sec   Loss 7.3120   LearningRate 0.0010   Epoch: 4   Global Step: 6920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:33:23,238-Speed 25092.98 samples/sec   Loss 7.2355   LearningRate 0.0010   Epoch: 4   Global Step: 6930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:33:33,012-Speed 25148.24 samples/sec   Loss 7.2313   LearningRate 0.0010   Epoch: 4   Global Step: 6940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:33:42,883-Speed 24899.68 samples/sec   Loss 7.2063   LearningRate 0.0010   Epoch: 4   Global Step: 6950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:33:52,723-Speed 24984.04 samples/sec   Loss 7.2832   LearningRate 0.0010   Epoch: 4   Global Step: 6960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:34:02,424-Speed 25335.26 samples/sec   Loss 7.2097   LearningRate 0.0010   Epoch: 4   Global Step: 6970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:34:12,334-Speed 24803.02 samples/sec   Loss 7.2662   LearningRate 0.0010   Epoch: 4   Global Step: 6980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:34:22,272-Speed 24731.73 samples/sec   Loss 7.1920   LearningRate 0.0010   Epoch: 4   Global Step: 6990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:34:32,064-Speed 25101.05 samples/sec   Loss 7.2773   LearningRate 0.0010   Epoch: 4   Global Step: 7000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:34:41,759-Speed 25350.28 samples/sec   Loss 7.1750   LearningRate 0.0010   Epoch: 4   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:34:51,676-Speed 24787.00 samples/sec   Loss 7.2162   LearningRate 0.0010   Epoch: 4   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:35:01,449-Speed 25148.21 samples/sec   Loss 7.1763   LearningRate 0.0010   Epoch: 4   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:35:11,270-Speed 25027.46 samples/sec   Loss 7.1563   LearningRate 0.0010   Epoch: 4   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:35:21,117-Speed 24960.25 samples/sec   Loss 7.0961   LearningRate 0.0010   Epoch: 4   Global Step: 7050   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 00:35:30,907-Speed 25105.62 samples/sec   Loss 7.1668   LearningRate 0.0010   Epoch: 4   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:35:40,790-Speed 24868.34 samples/sec   Loss 7.1404   LearningRate 0.0010   Epoch: 4   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:35:50,575-Speed 25119.82 samples/sec   Loss 7.1577   LearningRate 0.0010   Epoch: 4   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:36:00,459-Speed 24868.66 samples/sec   Loss 7.1018   LearningRate 0.0010   Epoch: 4   Global Step: 7090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:36:10,244-Speed 25118.32 samples/sec   Loss 7.1295   LearningRate 0.0010   Epoch: 4   Global Step: 7100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:36:19,995-Speed 25206.64 samples/sec   Loss 7.0792   LearningRate 0.0010   Epoch: 4   Global Step: 7110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:36:29,910-Speed 24791.88 samples/sec   Loss 7.0970   LearningRate 0.0010   Epoch: 4   Global Step: 7120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:36:39,669-Speed 25184.28 samples/sec   Loss 7.0911   LearningRate 0.0010   Epoch: 4   Global Step: 7130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:36:49,428-Speed 25186.38 samples/sec   Loss 7.0420   LearningRate 0.0010   Epoch: 4   Global Step: 7140   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:36:59,259-Speed 25001.78 samples/sec   Loss 7.0417   LearningRate 0.0010   Epoch: 4   Global Step: 7150   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:37:09,000-Speed 25232.50 samples/sec   Loss 7.0684   LearningRate 0.0010   Epoch: 4   Global Step: 7160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:37:18,767-Speed 25164.48 samples/sec   Loss 7.0660   LearningRate 0.0010   Epoch: 4   Global Step: 7170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:37:28,517-Speed 25208.99 samples/sec   Loss 7.0251   LearningRate 0.0010   Epoch: 4   Global Step: 7180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:37:38,426-Speed 24804.74 samples/sec   Loss 7.0165   LearningRate 0.0010   Epoch: 4   Global Step: 7190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:37:48,191-Speed 25172.18 samples/sec   Loss 6.9950   LearningRate 0.0010   Epoch: 4   Global Step: 7200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:37:57,952-Speed 25179.39 samples/sec   Loss 6.9845   LearningRate 0.0010   Epoch: 4   Global Step: 7210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:38:07,878-Speed 24764.69 samples/sec   Loss 7.0303   LearningRate 0.0010   Epoch: 4   Global Step: 7220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:38:17,680-Speed 25075.88 samples/sec   Loss 6.9704   LearningRate 0.0010   Epoch: 4   Global Step: 7230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:38:27,402-Speed 25280.90 samples/sec   Loss 6.9641   LearningRate 0.0010   Epoch: 4   Global Step: 7240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:38:37,183-Speed 25128.71 samples/sec   Loss 6.9684   LearningRate 0.0010   Epoch: 4   Global Step: 7250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:38:46,862-Speed 25394.15 samples/sec   Loss 6.9266   LearningRate 0.0010   Epoch: 4   Global Step: 7260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:38:56,651-Speed 25110.98 samples/sec   Loss 6.9443   LearningRate 0.0010   Epoch: 4   Global Step: 7270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:39:06,465-Speed 25044.62 samples/sec   Loss 6.9226   LearningRate 0.0010   Epoch: 4   Global Step: 7280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:39:16,338-Speed 24893.78 samples/sec   Loss 6.9126   LearningRate 0.0010   Epoch: 4   Global Step: 7290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:39:26,190-Speed 24950.89 samples/sec   Loss 6.8732   LearningRate 0.0010   Epoch: 4   Global Step: 7300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:39:35,963-Speed 25149.04 samples/sec   Loss 6.9062   LearningRate 0.0010   Epoch: 4   Global Step: 7310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:39:45,813-Speed 24954.48 samples/sec   Loss 6.8869   LearningRate 0.0010   Epoch: 4   Global Step: 7320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:39:55,536-Speed 25288.26 samples/sec   Loss 6.8599   LearningRate 0.0010   Epoch: 4   Global Step: 7330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:40:05,448-Speed 24800.87 samples/sec   Loss 6.8219   LearningRate 0.0010   Epoch: 4   Global Step: 7340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:40:15,201-Speed 25200.67 samples/sec   Loss 6.8643   LearningRate 0.0010   Epoch: 4   Global Step: 7350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:40:24,965-Speed 25174.44 samples/sec   Loss 6.8795   LearningRate 0.0010   Epoch: 4   Global Step: 7360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:40:34,834-Speed 24906.00 samples/sec   Loss 6.8164   LearningRate 0.0010   Epoch: 4   Global Step: 7370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:40:44,679-Speed 24965.69 samples/sec   Loss 6.8160   LearningRate 0.0010   Epoch: 4   Global Step: 7380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:40:54,625-Speed 24713.68 samples/sec   Loss 6.8320   LearningRate 0.0010   Epoch: 4   Global Step: 7390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:41:04,409-Speed 25121.99 samples/sec   Loss 6.8038   LearningRate 0.0010   Epoch: 4   Global Step: 7400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:41:14,162-Speed 25201.41 samples/sec   Loss 6.7783   LearningRate 0.0010   Epoch: 4   Global Step: 7410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:41:23,951-Speed 25109.55 samples/sec   Loss 6.7702   LearningRate 0.0010   Epoch: 4   Global Step: 7420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:41:33,694-Speed 25226.67 samples/sec   Loss 6.7572   LearningRate 0.0010   Epoch: 4   Global Step: 7430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:41:43,446-Speed 25202.81 samples/sec   Loss 6.8119   LearningRate 0.0010   Epoch: 4   Global Step: 7440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:41:53,230-Speed 25122.74 samples/sec   Loss 6.7465   LearningRate 0.0010   Epoch: 4   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:42:03,007-Speed 25138.89 samples/sec   Loss 6.7089   LearningRate 0.0010   Epoch: 4   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:42:12,769-Speed 25178.52 samples/sec   Loss 6.7120   LearningRate 0.0010   Epoch: 4   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:42:22,589-Speed 25029.56 samples/sec   Loss 6.7161   LearningRate 0.0010   Epoch: 4   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:42:32,561-Speed 24648.26 samples/sec   Loss 6.7144   LearningRate 0.0010   Epoch: 4   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:42:42,411-Speed 24956.95 samples/sec   Loss 6.6788   LearningRate 0.0010   Epoch: 4   Global Step: 7500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:42:52,169-Speed 25186.90 samples/sec   Loss 6.6832   LearningRate 0.0010   Epoch: 4   Global Step: 7510   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 00:43:01,942-Speed 25149.93 samples/sec   Loss 6.6942   LearningRate 0.0010   Epoch: 4   Global Step: 7520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:43:11,739-Speed 25088.36 samples/sec   Loss 6.7026   LearningRate 0.0010   Epoch: 4   Global Step: 7530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:43:21,475-Speed 25245.51 samples/sec   Loss 6.6500   LearningRate 0.0010   Epoch: 4   Global Step: 7540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:43:31,307-Speed 25000.41 samples/sec   Loss 6.6177   LearningRate 0.0010   Epoch: 4   Global Step: 7550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:43:40,999-Speed 25358.90 samples/sec   Loss 6.6011   LearningRate 0.0010   Epoch: 4   Global Step: 7560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:43:50,806-Speed 25065.61 samples/sec   Loss 6.5918   LearningRate 0.0010   Epoch: 4   Global Step: 7570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:00,581-Speed 25144.75 samples/sec   Loss 6.5792   LearningRate 0.0010   Epoch: 4   Global Step: 7580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:10,320-Speed 25240.05 samples/sec   Loss 6.5603   LearningRate 0.0010   Epoch: 4   Global Step: 7590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:20,148-Speed 25009.33 samples/sec   Loss 6.5855   LearningRate 0.0010   Epoch: 4   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:29,887-Speed 25235.62 samples/sec   Loss 6.5605   LearningRate 0.0010   Epoch: 4   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:39,588-Speed 25337.60 samples/sec   Loss 6.5597   LearningRate 0.0010   Epoch: 4   Global Step: 7620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:49,332-Speed 25227.13 samples/sec   Loss 6.5594   LearningRate 0.0010   Epoch: 4   Global Step: 7630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:44:59,255-Speed 24767.62 samples/sec   Loss 6.5055   LearningRate 0.0010   Epoch: 4   Global Step: 7640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:45:09,110-Speed 24940.18 samples/sec   Loss 6.5126   LearningRate 0.0010   Epoch: 4   Global Step: 7650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:45:18,887-Speed 25139.64 samples/sec   Loss 6.4882   LearningRate 0.0010   Epoch: 4   Global Step: 7660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:45:28,690-Speed 25071.92 samples/sec   Loss 6.5267   LearningRate 0.0010   Epoch: 4   Global Step: 7670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:45:38,510-Speed 25030.03 samples/sec   Loss 6.5036   LearningRate 0.0010   Epoch: 4   Global Step: 7680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:45:48,298-Speed 25114.06 samples/sec   Loss 6.4935   LearningRate 0.0010   Epoch: 4   Global Step: 7690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:45:58,116-Speed 25035.11 samples/sec   Loss 6.5155   LearningRate 0.0010   Epoch: 4   Global Step: 7700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:46:08,010-Speed 24841.16 samples/sec   Loss 6.4691   LearningRate 0.0010   Epoch: 4   Global Step: 7710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:46:17,879-Speed 24907.08 samples/sec   Loss 6.5049   LearningRate 0.0010   Epoch: 4   Global Step: 7720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:46:27,734-Speed 24939.82 samples/sec   Loss 6.4672   LearningRate 0.0010   Epoch: 4   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:46:37,525-Speed 25103.28 samples/sec   Loss 6.4665   LearningRate 0.0010   Epoch: 4   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:46:47,286-Speed 25196.27 samples/sec   Loss 6.4196   LearningRate 0.0010   Epoch: 4   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:46:57,080-Speed 25101.16 samples/sec   Loss 6.3943   LearningRate 0.0010   Epoch: 4   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:47:06,828-Speed 25212.10 samples/sec   Loss 6.4047   LearningRate 0.0010   Epoch: 4   Global Step: 7770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:47:16,621-Speed 25100.17 samples/sec   Loss 6.3908   LearningRate 0.0010   Epoch: 4   Global Step: 7780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:47:26,390-Speed 25161.63 samples/sec   Loss 6.3924   LearningRate 0.0010   Epoch: 4   Global Step: 7790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:47:36,171-Speed 25128.48 samples/sec   Loss 6.3877   LearningRate 0.0010   Epoch: 4   Global Step: 7800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:47:45,939-Speed 25165.34 samples/sec   Loss 6.3467   LearningRate 0.0010   Epoch: 4   Global Step: 7810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:47:55,855-Speed 24786.69 samples/sec   Loss 6.3343   LearningRate 0.0010   Epoch: 4   Global Step: 7820   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 00:48:05,660-Speed 25069.68 samples/sec   Loss 6.3343   LearningRate 0.0010   Epoch: 4   Global Step: 7830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:48:15,654-Speed 24595.86 samples/sec   Loss 6.3095   LearningRate 0.0010   Epoch: 4   Global Step: 7840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:48:25,471-Speed 25038.37 samples/sec   Loss 6.3493   LearningRate 0.0010   Epoch: 4   Global Step: 7850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:48:35,363-Speed 24846.99 samples/sec   Loss 6.3432   LearningRate 0.0010   Epoch: 4   Global Step: 7860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:48:45,297-Speed 24742.73 samples/sec   Loss 6.2847   LearningRate 0.0010   Epoch: 4   Global Step: 7870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:48:55,309-Speed 24550.60 samples/sec   Loss 6.2820   LearningRate 0.0010   Epoch: 4   Global Step: 7880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:49:05,265-Speed 24690.41 samples/sec   Loss 6.2897   LearningRate 0.0010   Epoch: 4   Global Step: 7890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:49:15,089-Speed 25019.25 samples/sec   Loss 6.2793   LearningRate 0.0010   Epoch: 4   Global Step: 7900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:49:24,976-Speed 24857.00 samples/sec   Loss 6.2774   LearningRate 0.0010   Epoch: 4   Global Step: 7910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:49:34,888-Speed 24799.17 samples/sec   Loss 6.2965   LearningRate 0.0010   Epoch: 4   Global Step: 7920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:49:44,769-Speed 24875.19 samples/sec   Loss 6.2792   LearningRate 0.0010   Epoch: 4   Global Step: 7930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:49:54,964-Speed 24108.19 samples/sec   Loss 6.2517   LearningRate 0.0010   Epoch: 4   Global Step: 7940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:50:05,109-Speed 24227.45 samples/sec   Loss 6.2311   LearningRate 0.0010   Epoch: 4   Global Step: 7950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:50:14,977-Speed 24908.50 samples/sec   Loss 6.2340   LearningRate 0.0010   Epoch: 4   Global Step: 7960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:50:24,841-Speed 24919.23 samples/sec   Loss 6.1852   LearningRate 0.0010   Epoch: 4   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:50:34,751-Speed 24801.79 samples/sec   Loss 6.1886   LearningRate 0.0010   Epoch: 4   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:50:44,624-Speed 24893.90 samples/sec   Loss 6.1971   LearningRate 0.0010   Epoch: 4   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:50:54,470-Speed 24964.55 samples/sec   Loss 6.2642   LearningRate 0.0010   Epoch: 4   Global Step: 8000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:51:04,292-Speed 25026.09 samples/sec   Loss 6.1767   LearningRate 0.0010   Epoch: 4   Global Step: 8010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:51:14,211-Speed 24778.37 samples/sec   Loss 6.1355   LearningRate 0.0010   Epoch: 4   Global Step: 8020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:51:24,099-Speed 24857.83 samples/sec   Loss 6.1776   LearningRate 0.0010   Epoch: 4   Global Step: 8030   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:51:34,003-Speed 24817.98 samples/sec   Loss 6.1645   LearningRate 0.0010   Epoch: 4   Global Step: 8040   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:51:43,930-Speed 24761.19 samples/sec   Loss 6.1438   LearningRate 0.0010   Epoch: 4   Global Step: 8050   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:51:53,788-Speed 24932.35 samples/sec   Loss 6.1255   LearningRate 0.0010   Epoch: 4   Global Step: 8060   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:52:03,716-Speed 24756.53 samples/sec   Loss 6.1375   LearningRate 0.0010   Epoch: 4   Global Step: 8070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:52:13,576-Speed 24927.99 samples/sec   Loss 6.1374   LearningRate 0.0010   Epoch: 4   Global Step: 8080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:52:23,401-Speed 25023.49 samples/sec   Loss 6.1222   LearningRate 0.0010   Epoch: 4   Global Step: 8090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:52:33,386-Speed 24616.65 samples/sec   Loss 6.0808   LearningRate 0.0010   Epoch: 4   Global Step: 8100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:52:43,286-Speed 24827.71 samples/sec   Loss 6.1225   LearningRate 0.0010   Epoch: 4   Global Step: 8110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:52:53,175-Speed 24854.17 samples/sec   Loss 6.0692   LearningRate 0.0010   Epoch: 4   Global Step: 8120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 00:53:03,023-Speed 24959.91 samples/sec   Loss 6.0600   LearningRate 0.0010   Epoch: 4   Global Step: 8130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:53:13,128-Speed 24324.10 samples/sec   Loss 6.0875   LearningRate 0.0010   Epoch: 4   Global Step: 8140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:53:23,029-Speed 24822.74 samples/sec   Loss 6.0331   LearningRate 0.0010   Epoch: 4   Global Step: 8150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:53:32,933-Speed 24818.97 samples/sec   Loss 6.0501   LearningRate 0.0010   Epoch: 4   Global Step: 8160   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:53:42,852-Speed 24779.55 samples/sec   Loss 6.0627   LearningRate 0.0010   Epoch: 4   Global Step: 8170   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:53:52,650-Speed 25083.99 samples/sec   Loss 6.0554   LearningRate 0.0010   Epoch: 4   Global Step: 8180   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:54:02,632-Speed 24623.47 samples/sec   Loss 6.0250   LearningRate 0.0010   Epoch: 4   Global Step: 8190   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:54:12,415-Speed 25125.29 samples/sec   Loss 6.0207   LearningRate 0.0010   Epoch: 4   Global Step: 8200   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:54:22,376-Speed 24674.77 samples/sec   Loss 6.0172   LearningRate 0.0010   Epoch: 4   Global Step: 8210   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:54:32,243-Speed 24913.42 samples/sec   Loss 6.0350   LearningRate 0.0010   Epoch: 4   Global Step: 8220   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:54:42,269-Speed 24513.77 samples/sec   Loss 5.9769   LearningRate 0.0010   Epoch: 4   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:54:52,140-Speed 24900.47 samples/sec   Loss 5.9299   LearningRate 0.0010   Epoch: 4   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:55:01,974-Speed 24995.76 samples/sec   Loss 5.9888   LearningRate 0.0010   Epoch: 4   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:55:11,813-Speed 24985.69 samples/sec   Loss 5.9866   LearningRate 0.0010   Epoch: 4   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:55:21,718-Speed 24815.43 samples/sec   Loss 5.9250   LearningRate 0.0010   Epoch: 4   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:55:31,636-Speed 24782.82 samples/sec   Loss 5.9610   LearningRate 0.0010   Epoch: 4   Global Step: 8280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:55:41,443-Speed 25061.71 samples/sec   Loss 5.9377   LearningRate 0.0010   Epoch: 4   Global Step: 8290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:55:51,237-Speed 25095.92 samples/sec   Loss 5.9618   LearningRate 0.0010   Epoch: 4   Global Step: 8300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:56:01,242-Speed 24565.77 samples/sec   Loss 5.9636   LearningRate 0.0010   Epoch: 4   Global Step: 8310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:56:11,066-Speed 25020.01 samples/sec   Loss 5.9302   LearningRate 0.0010   Epoch: 4   Global Step: 8320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:56:20,883-Speed 25039.21 samples/sec   Loss 5.9335   LearningRate 0.0010   Epoch: 4   Global Step: 8330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:56:30,772-Speed 24854.31 samples/sec   Loss 5.8877   LearningRate 0.0010   Epoch: 4   Global Step: 8340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:56:40,586-Speed 25043.62 samples/sec   Loss 5.9028   LearningRate 0.0010   Epoch: 4   Global Step: 8350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:56:50,415-Speed 25008.99 samples/sec   Loss 5.9152   LearningRate 0.0010   Epoch: 4   Global Step: 8360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:57:00,340-Speed 24763.54 samples/sec   Loss 5.8824   LearningRate 0.0010   Epoch: 4   Global Step: 8370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:57:10,299-Speed 24682.13 samples/sec   Loss 5.8274   LearningRate 0.0010   Epoch: 4   Global Step: 8380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:57:20,408-Speed 24318.89 samples/sec   Loss 5.8898   LearningRate 0.0010   Epoch: 4   Global Step: 8390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:57:30,370-Speed 24674.75 samples/sec   Loss 5.8577   LearningRate 0.0010   Epoch: 4   Global Step: 8400   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:57:40,351-Speed 24623.98 samples/sec   Loss 5.8445   LearningRate 0.0010   Epoch: 4   Global Step: 8410   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:57:50,325-Speed 24643.27 samples/sec   Loss 5.8427   LearningRate 0.0010   Epoch: 4   Global Step: 8420   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:00,177-Speed 24950.09 samples/sec   Loss 5.8272   LearningRate 0.0010   Epoch: 4   Global Step: 8430   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:10,092-Speed 24787.69 samples/sec   Loss 5.8205   LearningRate 0.0010   Epoch: 4   Global Step: 8440   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:19,923-Speed 25004.02 samples/sec   Loss 5.8164   LearningRate 0.0010   Epoch: 4   Global Step: 8450   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:29,792-Speed 24903.38 samples/sec   Loss 5.8369   LearningRate 0.0010   Epoch: 4   Global Step: 8460   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:39,751-Speed 24679.93 samples/sec   Loss 5.8248   LearningRate 0.0010   Epoch: 4   Global Step: 8470   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:49,651-Speed 24827.36 samples/sec   Loss 5.7754   LearningRate 0.0010   Epoch: 4   Global Step: 8480   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:58:59,547-Speed 24835.21 samples/sec   Loss 5.7766   LearningRate 0.0009   Epoch: 4   Global Step: 8490   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:59:09,396-Speed 24956.56 samples/sec   Loss 5.7966   LearningRate 0.0009   Epoch: 4   Global Step: 8500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:59:19,225-Speed 25006.18 samples/sec   Loss 5.8486   LearningRate 0.0009   Epoch: 4   Global Step: 8510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:59:29,062-Speed 24985.32 samples/sec   Loss 5.7734   LearningRate 0.0009   Epoch: 4   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:59:38,969-Speed 24809.09 samples/sec   Loss 5.7463   LearningRate 0.0009   Epoch: 4   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:59:48,670-Speed 25336.39 samples/sec   Loss 5.7454   LearningRate 0.0009   Epoch: 4   Global Step: 8540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 00:59:58,423-Speed 25200.05 samples/sec   Loss 5.7176   LearningRate 0.0009   Epoch: 4   Global Step: 8550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:00:08,131-Speed 25318.19 samples/sec   Loss 5.7334   LearningRate 0.0009   Epoch: 4   Global Step: 8560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:00:17,971-Speed 24978.16 samples/sec   Loss 5.7634   LearningRate 0.0009   Epoch: 4   Global Step: 8570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:00:27,690-Speed 25289.73 samples/sec   Loss 5.7660   LearningRate 0.0009   Epoch: 4   Global Step: 8580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:00:37,492-Speed 25075.29 samples/sec   Loss 5.7667   LearningRate 0.0009   Epoch: 4   Global Step: 8590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:00:47,238-Speed 25217.85 samples/sec   Loss 5.7512   LearningRate 0.0009   Epoch: 4   Global Step: 8600   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:00:56,910-Speed 25411.08 samples/sec   Loss 5.7362   LearningRate 0.0009   Epoch: 4   Global Step: 8610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:01:06,688-Speed 25137.25 samples/sec   Loss 5.7451   LearningRate 0.0009   Epoch: 4   Global Step: 8620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:01:16,376-Speed 25370.97 samples/sec   Loss 5.7614   LearningRate 0.0009   Epoch: 4   Global Step: 8630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:01:26,107-Speed 25257.12 samples/sec   Loss 5.7384   LearningRate 0.0009   Epoch: 4   Global Step: 8640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:02:26,210-Speed 4089.12 samples/sec   Loss 5.6603   LearningRate 0.0009   Epoch: 5   Global Step: 8650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:02:35,990-Speed 25131.91 samples/sec   Loss 5.5986   LearningRate 0.0009   Epoch: 5   Global Step: 8660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:02:45,722-Speed 25258.04 samples/sec   Loss 5.5975   LearningRate 0.0009   Epoch: 5   Global Step: 8670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:02:55,417-Speed 25351.88 samples/sec   Loss 5.5957   LearningRate 0.0009   Epoch: 5   Global Step: 8680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:03:05,162-Speed 25224.47 samples/sec   Loss 5.6402   LearningRate 0.0009   Epoch: 5   Global Step: 8690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:03:14,953-Speed 25103.59 samples/sec   Loss 5.6274   LearningRate 0.0009   Epoch: 5   Global Step: 8700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:03:24,752-Speed 25084.45 samples/sec   Loss 5.6249   LearningRate 0.0009   Epoch: 5   Global Step: 8710   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:03:34,521-Speed 25159.61 samples/sec   Loss 5.6127   LearningRate 0.0009   Epoch: 5   Global Step: 8720   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:03:44,430-Speed 24804.51 samples/sec   Loss 5.5775   LearningRate 0.0009   Epoch: 5   Global Step: 8730   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:03:54,203-Speed 25151.51 samples/sec   Loss 5.5604   LearningRate 0.0009   Epoch: 5   Global Step: 8740   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:04:03,966-Speed 25174.51 samples/sec   Loss 5.5650   LearningRate 0.0009   Epoch: 5   Global Step: 8750   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:04:13,737-Speed 25162.23 samples/sec   Loss 5.6041   LearningRate 0.0009   Epoch: 5   Global Step: 8760   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:04:23,522-Speed 25118.33 samples/sec   Loss 5.5819   LearningRate 0.0009   Epoch: 5   Global Step: 8770   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:04:33,345-Speed 25024.02 samples/sec   Loss 5.5337   LearningRate 0.0009   Epoch: 5   Global Step: 8780   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:04:43,062-Speed 25295.02 samples/sec   Loss 5.5784   LearningRate 0.0009   Epoch: 5   Global Step: 8790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:04:52,829-Speed 25165.70 samples/sec   Loss 5.5639   LearningRate 0.0009   Epoch: 5   Global Step: 8800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:05:02,607-Speed 25136.82 samples/sec   Loss 5.5570   LearningRate 0.0009   Epoch: 5   Global Step: 8810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:05:12,456-Speed 24957.60 samples/sec   Loss 5.6196   LearningRate 0.0009   Epoch: 5   Global Step: 8820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:05:22,252-Speed 25089.55 samples/sec   Loss 5.5806   LearningRate 0.0009   Epoch: 5   Global Step: 8830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:05:31,986-Speed 25251.77 samples/sec   Loss 5.5476   LearningRate 0.0009   Epoch: 5   Global Step: 8840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:05:41,767-Speed 25129.37 samples/sec   Loss 5.5172   LearningRate 0.0009   Epoch: 5   Global Step: 8850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:05:51,596-Speed 25006.70 samples/sec   Loss 5.5306   LearningRate 0.0009   Epoch: 5   Global Step: 8860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:06:01,398-Speed 25073.74 samples/sec   Loss 5.5042   LearningRate 0.0009   Epoch: 5   Global Step: 8870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:06:11,145-Speed 25216.35 samples/sec   Loss 5.5931   LearningRate 0.0009   Epoch: 5   Global Step: 8880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:06:21,017-Speed 24898.69 samples/sec   Loss 5.5065   LearningRate 0.0009   Epoch: 5   Global Step: 8890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:06:30,823-Speed 25066.55 samples/sec   Loss 5.5309   LearningRate 0.0009   Epoch: 5   Global Step: 8900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:06:40,578-Speed 25197.96 samples/sec   Loss 5.5089   LearningRate 0.0009   Epoch: 5   Global Step: 8910   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:06:50,323-Speed 25220.43 samples/sec   Loss 5.5014   LearningRate 0.0009   Epoch: 5   Global Step: 8920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:00,149-Speed 25014.83 samples/sec   Loss 5.4582   LearningRate 0.0009   Epoch: 5   Global Step: 8930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:09,979-Speed 25004.36 samples/sec   Loss 5.4963   LearningRate 0.0009   Epoch: 5   Global Step: 8940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:19,774-Speed 25096.01 samples/sec   Loss 5.4979   LearningRate 0.0009   Epoch: 5   Global Step: 8950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:29,488-Speed 25299.76 samples/sec   Loss 5.4595   LearningRate 0.0009   Epoch: 5   Global Step: 8960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:39,284-Speed 25091.37 samples/sec   Loss 5.4802   LearningRate 0.0009   Epoch: 5   Global Step: 8970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:49,109-Speed 25017.17 samples/sec   Loss 5.5310   LearningRate 0.0009   Epoch: 5   Global Step: 8980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:07:58,895-Speed 25115.55 samples/sec   Loss 5.5063   LearningRate 0.0009   Epoch: 5   Global Step: 8990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:08:08,653-Speed 25188.22 samples/sec   Loss 5.4347   LearningRate 0.0009   Epoch: 5   Global Step: 9000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:08:18,432-Speed 25134.80 samples/sec   Loss 5.4446   LearningRate 0.0009   Epoch: 5   Global Step: 9010   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 01:08:28,241-Speed 25055.26 samples/sec   Loss 5.4386   LearningRate 0.0009   Epoch: 5   Global Step: 9020   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:08:38,134-Speed 24845.61 samples/sec   Loss 5.4723   LearningRate 0.0009   Epoch: 5   Global Step: 9030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:08:48,022-Speed 24858.07 samples/sec   Loss 5.4443   LearningRate 0.0009   Epoch: 5   Global Step: 9040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:08:57,939-Speed 24785.18 samples/sec   Loss 5.4305   LearningRate 0.0009   Epoch: 5   Global Step: 9050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:09:07,696-Speed 25191.02 samples/sec   Loss 5.4198   LearningRate 0.0009   Epoch: 5   Global Step: 9060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:09:17,421-Speed 25275.31 samples/sec   Loss 5.4088   LearningRate 0.0009   Epoch: 5   Global Step: 9070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:09:27,221-Speed 25082.68 samples/sec   Loss 5.4132   LearningRate 0.0009   Epoch: 5   Global Step: 9080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:09:36,942-Speed 25283.80 samples/sec   Loss 5.4190   LearningRate 0.0009   Epoch: 5   Global Step: 9090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:09:46,770-Speed 25009.50 samples/sec   Loss 5.3721   LearningRate 0.0009   Epoch: 5   Global Step: 9100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:09:56,674-Speed 24816.35 samples/sec   Loss 5.3964   LearningRate 0.0009   Epoch: 5   Global Step: 9110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:10:06,409-Speed 25246.52 samples/sec   Loss 5.4420   LearningRate 0.0009   Epoch: 5   Global Step: 9120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:10:16,292-Speed 24870.75 samples/sec   Loss 5.4162   LearningRate 0.0009   Epoch: 5   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:10:26,150-Speed 24933.46 samples/sec   Loss 5.3487   LearningRate 0.0009   Epoch: 5   Global Step: 9140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:10:35,984-Speed 24992.43 samples/sec   Loss 5.3448   LearningRate 0.0009   Epoch: 5   Global Step: 9150   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:10:45,908-Speed 24769.01 samples/sec   Loss 5.3572   LearningRate 0.0009   Epoch: 5   Global Step: 9160   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:10:55,858-Speed 24702.17 samples/sec   Loss 5.3512   LearningRate 0.0009   Epoch: 5   Global Step: 9170   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:11:05,637-Speed 25134.63 samples/sec   Loss 5.3611   LearningRate 0.0009   Epoch: 5   Global Step: 9180   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:11:15,382-Speed 25222.36 samples/sec   Loss 5.3881   LearningRate 0.0009   Epoch: 5   Global Step: 9190   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:11:25,219-Speed 24986.73 samples/sec   Loss 5.3395   LearningRate 0.0009   Epoch: 5   Global Step: 9200   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:11:34,973-Speed 25198.65 samples/sec   Loss 5.3534   LearningRate 0.0009   Epoch: 5   Global Step: 9210   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:11:44,725-Speed 25204.23 samples/sec   Loss 5.3299   LearningRate 0.0009   Epoch: 5   Global Step: 9220   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:11:54,463-Speed 25240.31 samples/sec   Loss 5.3527   LearningRate 0.0009   Epoch: 5   Global Step: 9230   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:12:04,328-Speed 24915.90 samples/sec   Loss 5.3312   LearningRate 0.0009   Epoch: 5   Global Step: 9240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:12:14,175-Speed 24961.07 samples/sec   Loss 5.3054   LearningRate 0.0009   Epoch: 5   Global Step: 9250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:12:23,984-Speed 25058.40 samples/sec   Loss 5.3071   LearningRate 0.0009   Epoch: 5   Global Step: 9260   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:12:33,881-Speed 24835.57 samples/sec   Loss 5.3231   LearningRate 0.0009   Epoch: 5   Global Step: 9270   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:12:43,621-Speed 25234.48 samples/sec   Loss 5.2881   LearningRate 0.0009   Epoch: 5   Global Step: 9280   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:12:53,511-Speed 24854.18 samples/sec   Loss 5.2966   LearningRate 0.0009   Epoch: 5   Global Step: 9290   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:13:03,397-Speed 24860.94 samples/sec   Loss 5.2919   LearningRate 0.0009   Epoch: 5   Global Step: 9300   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:13:13,274-Speed 24886.96 samples/sec   Loss 5.2931   LearningRate 0.0009   Epoch: 5   Global Step: 9310   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:13:23,291-Speed 24538.93 samples/sec   Loss 5.2845   LearningRate 0.0009   Epoch: 5   Global Step: 9320   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:13:33,180-Speed 24855.06 samples/sec   Loss 5.2519   LearningRate 0.0009   Epoch: 5   Global Step: 9330   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:13:43,145-Speed 24667.08 samples/sec   Loss 5.2290   LearningRate 0.0009   Epoch: 5   Global Step: 9340   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:13:52,885-Speed 25233.92 samples/sec   Loss 5.2907   LearningRate 0.0009   Epoch: 5   Global Step: 9350   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:14:02,620-Speed 25248.48 samples/sec   Loss 5.2755   LearningRate 0.0009   Epoch: 5   Global Step: 9360   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:14:12,409-Speed 25109.09 samples/sec   Loss 5.2152   LearningRate 0.0009   Epoch: 5   Global Step: 9370   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:14:22,177-Speed 25161.32 samples/sec   Loss 5.2867   LearningRate 0.0009   Epoch: 5   Global Step: 9380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:14:31,973-Speed 25093.01 samples/sec   Loss 5.2589   LearningRate 0.0009   Epoch: 5   Global Step: 9390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:14:41,785-Speed 25048.53 samples/sec   Loss 5.2270   LearningRate 0.0009   Epoch: 5   Global Step: 9400   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:14:51,491-Speed 25325.13 samples/sec   Loss 5.2172   LearningRate 0.0009   Epoch: 5   Global Step: 9410   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:01,251-Speed 25185.26 samples/sec   Loss 5.2493   LearningRate 0.0009   Epoch: 5   Global Step: 9420   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:10,954-Speed 25331.23 samples/sec   Loss 5.2087   LearningRate 0.0009   Epoch: 5   Global Step: 9430   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:20,799-Speed 24964.38 samples/sec   Loss 5.1892   LearningRate 0.0009   Epoch: 5   Global Step: 9440   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:30,532-Speed 25253.97 samples/sec   Loss 5.2065   LearningRate 0.0009   Epoch: 5   Global Step: 9450   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:40,461-Speed 24753.83 samples/sec   Loss 5.2090   LearningRate 0.0009   Epoch: 5   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:50,178-Speed 25295.27 samples/sec   Loss 5.2405   LearningRate 0.0009   Epoch: 5   Global Step: 9470   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:15:59,915-Speed 25242.13 samples/sec   Loss 5.2370   LearningRate 0.0009   Epoch: 5   Global Step: 9480   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:16:09,731-Speed 25040.62 samples/sec   Loss 5.1699   LearningRate 0.0009   Epoch: 5   Global Step: 9490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-26 01:16:19,592-Speed 24926.35 samples/sec   Loss 5.1767   LearningRate 0.0009   Epoch: 5   Global Step: 9500   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:16:29,452-Speed 24928.93 samples/sec   Loss 5.1738   LearningRate 0.0009   Epoch: 5   Global Step: 9510   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:16:39,236-Speed 25120.20 samples/sec   Loss 5.1320   LearningRate 0.0009   Epoch: 5   Global Step: 9520   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:16:49,130-Speed 24844.92 samples/sec   Loss 5.1637   LearningRate 0.0009   Epoch: 5   Global Step: 9530   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:16:58,851-Speed 25284.66 samples/sec   Loss 5.2133   LearningRate 0.0009   Epoch: 5   Global Step: 9540   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:17:08,644-Speed 25100.84 samples/sec   Loss 5.1445   LearningRate 0.0009   Epoch: 5   Global Step: 9550   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:17:18,374-Speed 25259.46 samples/sec   Loss 5.1772   LearningRate 0.0009   Epoch: 5   Global Step: 9560   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:17:28,144-Speed 25158.75 samples/sec   Loss 5.1412   LearningRate 0.0009   Epoch: 5   Global Step: 9570   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:17:37,906-Speed 25180.69 samples/sec   Loss 5.1571   LearningRate 0.0009   Epoch: 5   Global Step: 9580   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:17:47,677-Speed 25207.72 samples/sec   Loss 5.1605   LearningRate 0.0009   Epoch: 5   Global Step: 9590   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:17:57,645-Speed 24657.27 samples/sec   Loss 5.1429   LearningRate 0.0009   Epoch: 5   Global Step: 9600   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 01:18:07,473-Speed 25009.95 samples/sec   Loss 5.1366   LearningRate 0.0009   Epoch: 5   Global Step: 9610   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:18:17,234-Speed 25186.45 samples/sec   Loss 5.1494   LearningRate 0.0009   Epoch: 5   Global Step: 9620   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:18:27,139-Speed 24814.77 samples/sec   Loss 5.1418   LearningRate 0.0009   Epoch: 5   Global Step: 9630   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:18:36,967-Speed 25008.72 samples/sec   Loss 5.1340   LearningRate 0.0009   Epoch: 5   Global Step: 9640   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:18:46,756-Speed 25107.76 samples/sec   Loss 5.1376   LearningRate 0.0009   Epoch: 5   Global Step: 9650   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:18:56,537-Speed 25129.98 samples/sec   Loss 5.1235   LearningRate 0.0009   Epoch: 5   Global Step: 9660   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:19:06,443-Speed 24812.10 samples/sec   Loss 5.1167   LearningRate 0.0009   Epoch: 5   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:19:16,367-Speed 24769.76 samples/sec   Loss 5.0729   LearningRate 0.0009   Epoch: 5   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:19:26,200-Speed 24993.63 samples/sec   Loss 5.0992   LearningRate 0.0009   Epoch: 5   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:19:35,980-Speed 25134.61 samples/sec   Loss 5.0672   LearningRate 0.0009   Epoch: 5   Global Step: 9700   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:19:45,800-Speed 25029.06 samples/sec   Loss 5.0844   LearningRate 0.0009   Epoch: 5   Global Step: 9710   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:19:55,645-Speed 24967.37 samples/sec   Loss 5.0856   LearningRate 0.0009   Epoch: 5   Global Step: 9720   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:20:05,421-Speed 25143.95 samples/sec   Loss 5.0577   LearningRate 0.0009   Epoch: 5   Global Step: 9730   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:20:15,197-Speed 25143.46 samples/sec   Loss 5.0915   LearningRate 0.0009   Epoch: 5   Global Step: 9740   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:20:24,998-Speed 25077.05 samples/sec   Loss 5.0609   LearningRate 0.0009   Epoch: 5   Global Step: 9750   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:20:34,870-Speed 24899.10 samples/sec   Loss 5.0537   LearningRate 0.0009   Epoch: 5   Global Step: 9760   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:20:44,696-Speed 25013.55 samples/sec   Loss 5.0726   LearningRate 0.0009   Epoch: 5   Global Step: 9770   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:20:54,494-Speed 25087.64 samples/sec   Loss 5.0727   LearningRate 0.0009   Epoch: 5   Global Step: 9780   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:21:04,242-Speed 25213.01 samples/sec   Loss 5.0377   LearningRate 0.0009   Epoch: 5   Global Step: 9790   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:21:13,994-Speed 25204.58 samples/sec   Loss 5.0286   LearningRate 0.0009   Epoch: 5   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:21:23,863-Speed 24905.12 samples/sec   Loss 5.0573   LearningRate 0.0009   Epoch: 5   Global Step: 9810   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:21:33,620-Speed 25191.03 samples/sec   Loss 5.0645   LearningRate 0.0009   Epoch: 5   Global Step: 9820   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:21:43,440-Speed 25028.60 samples/sec   Loss 5.0238   LearningRate 0.0009   Epoch: 5   Global Step: 9830   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:21:53,201-Speed 25182.10 samples/sec   Loss 4.9970   LearningRate 0.0009   Epoch: 5   Global Step: 9840   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:22:02,933-Speed 25255.75 samples/sec   Loss 5.0238   LearningRate 0.0009   Epoch: 5   Global Step: 9850   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:22:12,791-Speed 24932.65 samples/sec   Loss 5.0226   LearningRate 0.0009   Epoch: 5   Global Step: 9860   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:22:22,679-Speed 24857.36 samples/sec   Loss 5.0394   LearningRate 0.0009   Epoch: 5   Global Step: 9870   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:22:32,498-Speed 25030.54 samples/sec   Loss 5.0479   LearningRate 0.0009   Epoch: 5   Global Step: 9880   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:22:42,234-Speed 25244.99 samples/sec   Loss 5.0375   LearningRate 0.0009   Epoch: 5   Global Step: 9890   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:22:52,052-Speed 25035.54 samples/sec   Loss 5.0171   LearningRate 0.0009   Epoch: 5   Global Step: 9900   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:23:01,864-Speed 25052.14 samples/sec   Loss 5.0264   LearningRate 0.0009   Epoch: 5   Global Step: 9910   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 01:23:11,628-Speed 25172.40 samples/sec   Loss 5.0160   LearningRate 0.0009   Epoch: 5   Global Step: 9920   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:23:21,388-Speed 25183.32 samples/sec   Loss 4.9695   LearningRate 0.0009   Epoch: 5   Global Step: 9930   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:23:31,273-Speed 24865.98 samples/sec   Loss 4.9663   LearningRate 0.0009   Epoch: 5   Global Step: 9940   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:23:41,039-Speed 25167.27 samples/sec   Loss 4.9489   LearningRate 0.0009   Epoch: 5   Global Step: 9950   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:23:50,830-Speed 25106.00 samples/sec   Loss 4.9552   LearningRate 0.0009   Epoch: 5   Global Step: 9960   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:24:00,689-Speed 24929.81 samples/sec   Loss 4.9910   LearningRate 0.0009   Epoch: 5   Global Step: 9970   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:24:10,525-Speed 24987.41 samples/sec   Loss 4.9541   LearningRate 0.0009   Epoch: 5   Global Step: 9980   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:24:20,382-Speed 24935.51 samples/sec   Loss 4.9766   LearningRate 0.0009   Epoch: 5   Global Step: 9990   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:24:30,218-Speed 24992.24 samples/sec   Loss 4.9503   LearningRate 0.0009   Epoch: 5   Global Step: 10000   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:24:40,082-Speed 24918.70 samples/sec   Loss 4.9143   LearningRate 0.0009   Epoch: 5   Global Step: 10010   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:24:50,012-Speed 24752.36 samples/sec   Loss 4.9498   LearningRate 0.0009   Epoch: 5   Global Step: 10020   Fp16 Grad Scale: 262144   Required: 17 hours
Training: 2022-03-26 01:25:00,002-Speed 24604.55 samples/sec   Loss 4.9312   LearningRate 0.0009   Epoch: 5   Global Step: 10030   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:25:09,780-Speed 25137.29 samples/sec   Loss 4.9378   LearningRate 0.0009   Epoch: 5   Global Step: 10040   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:25:19,611-Speed 25002.68 samples/sec   Loss 4.9471   LearningRate 0.0009   Epoch: 5   Global Step: 10050   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:25:29,435-Speed 25018.29 samples/sec   Loss 4.9703   LearningRate 0.0009   Epoch: 5   Global Step: 10060   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:25:39,263-Speed 25011.41 samples/sec   Loss 4.9218   LearningRate 0.0009   Epoch: 5   Global Step: 10070   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:25:49,036-Speed 25149.34 samples/sec   Loss 4.9557   LearningRate 0.0009   Epoch: 5   Global Step: 10080   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:25:58,861-Speed 25017.22 samples/sec   Loss 4.9536   LearningRate 0.0009   Epoch: 5   Global Step: 10090   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:26:08,887-Speed 24514.52 samples/sec   Loss 4.9330   LearningRate 0.0009   Epoch: 5   Global Step: 10100   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:26:18,729-Speed 24973.93 samples/sec   Loss 4.9093   LearningRate 0.0009   Epoch: 5   Global Step: 10110   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:26:28,472-Speed 25225.17 samples/sec   Loss 4.9694   LearningRate 0.0009   Epoch: 5   Global Step: 10120   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:26:38,254-Speed 25126.16 samples/sec   Loss 4.8996   LearningRate 0.0009   Epoch: 5   Global Step: 10130   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:26:47,947-Speed 25357.70 samples/sec   Loss 4.8953   LearningRate 0.0009   Epoch: 5   Global Step: 10140   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:26:57,759-Speed 25048.66 samples/sec   Loss 4.9043   LearningRate 0.0009   Epoch: 5   Global Step: 10150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:27:07,486-Speed 25269.80 samples/sec   Loss 4.9247   LearningRate 0.0009   Epoch: 5   Global Step: 10160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:27:17,249-Speed 25173.91 samples/sec   Loss 4.9135   LearningRate 0.0009   Epoch: 5   Global Step: 10170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:27:26,992-Speed 25233.23 samples/sec   Loss 4.9217   LearningRate 0.0009   Epoch: 5   Global Step: 10180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:27:36,737-Speed 25223.97 samples/sec   Loss 4.8752   LearningRate 0.0009   Epoch: 5   Global Step: 10190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:27:46,591-Speed 24942.68 samples/sec   Loss 4.8692   LearningRate 0.0009   Epoch: 5   Global Step: 10200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:27:56,413-Speed 25025.08 samples/sec   Loss 4.8506   LearningRate 0.0009   Epoch: 5   Global Step: 10210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:28:06,229-Speed 25046.94 samples/sec   Loss 4.8441   LearningRate 0.0009   Epoch: 5   Global Step: 10220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:28:16,029-Speed 25090.00 samples/sec   Loss 4.8768   LearningRate 0.0009   Epoch: 5   Global Step: 10230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:28:25,785-Speed 25192.10 samples/sec   Loss 4.8818   LearningRate 0.0009   Epoch: 5   Global Step: 10240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:28:35,579-Speed 25099.33 samples/sec   Loss 4.9434   LearningRate 0.0009   Epoch: 5   Global Step: 10250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:28:45,616-Speed 24488.70 samples/sec   Loss 4.8886   LearningRate 0.0009   Epoch: 5   Global Step: 10260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:28:55,415-Speed 25088.77 samples/sec   Loss 4.8710   LearningRate 0.0009   Epoch: 5   Global Step: 10270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:29:05,210-Speed 25095.19 samples/sec   Loss 4.8499   LearningRate 0.0009   Epoch: 5   Global Step: 10280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:29:14,949-Speed 25237.15 samples/sec   Loss 4.8649   LearningRate 0.0009   Epoch: 5   Global Step: 10290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:29:24,793-Speed 24969.24 samples/sec   Loss 4.8828   LearningRate 0.0009   Epoch: 5   Global Step: 10300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:29:34,658-Speed 24915.11 samples/sec   Loss 4.8906   LearningRate 0.0009   Epoch: 5   Global Step: 10310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:29:44,394-Speed 25245.39 samples/sec   Loss 4.8407   LearningRate 0.0009   Epoch: 5   Global Step: 10320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:29:54,114-Speed 25287.71 samples/sec   Loss 4.8299   LearningRate 0.0009   Epoch: 5   Global Step: 10330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:30:03,920-Speed 25067.69 samples/sec   Loss 4.8198   LearningRate 0.0009   Epoch: 5   Global Step: 10340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:30:13,726-Speed 25064.78 samples/sec   Loss 4.8559   LearningRate 0.0009   Epoch: 5   Global Step: 10350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:30:23,485-Speed 25185.84 samples/sec   Loss 4.8571   LearningRate 0.0009   Epoch: 5   Global Step: 10360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:30:33,262-Speed 25140.14 samples/sec   Loss 4.8912   LearningRate 0.0009   Epoch: 5   Global Step: 10370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:31:34,307-Speed 4026.05 samples/sec   Loss 4.7680   LearningRate 0.0009   Epoch: 6   Global Step: 10380   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:31:44,111-Speed 25072.29 samples/sec   Loss 4.7378   LearningRate 0.0009   Epoch: 6   Global Step: 10390   Fp16 Grad Scale: 131072   Required: 17 hours
Training: 2022-03-26 01:31:54,425-Speed 23830.47 samples/sec   Loss 4.7974   LearningRate 0.0009   Epoch: 6   Global Step: 10400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:32:04,176-Speed 25206.23 samples/sec   Loss 4.7783   LearningRate 0.0009   Epoch: 6   Global Step: 10410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:32:13,863-Speed 25374.89 samples/sec   Loss 4.7752   LearningRate 0.0009   Epoch: 6   Global Step: 10420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:32:23,704-Speed 24976.76 samples/sec   Loss 4.7502   LearningRate 0.0009   Epoch: 6   Global Step: 10430   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-03-26 01:32:33,337-Speed 25515.39 samples/sec   Loss 4.7876   LearningRate 0.0009   Epoch: 6   Global Step: 10440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:32:43,022-Speed 25377.67 samples/sec   Loss 4.7738   LearningRate 0.0009   Epoch: 6   Global Step: 10450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:32:52,803-Speed 25129.03 samples/sec   Loss 4.7495   LearningRate 0.0009   Epoch: 6   Global Step: 10460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:33:02,576-Speed 25149.97 samples/sec   Loss 4.7629   LearningRate 0.0009   Epoch: 6   Global Step: 10470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:33:12,274-Speed 25342.84 samples/sec   Loss 4.7466   LearningRate 0.0009   Epoch: 6   Global Step: 10480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:33:22,038-Speed 25174.18 samples/sec   Loss 4.7813   LearningRate 0.0009   Epoch: 6   Global Step: 10490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:33:31,759-Speed 25284.12 samples/sec   Loss 4.7298   LearningRate 0.0009   Epoch: 6   Global Step: 10500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:33:41,530-Speed 25157.09 samples/sec   Loss 4.7483   LearningRate 0.0009   Epoch: 6   Global Step: 10510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:33:51,259-Speed 25264.79 samples/sec   Loss 4.7582   LearningRate 0.0009   Epoch: 6   Global Step: 10520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:00,924-Speed 25430.86 samples/sec   Loss 4.7469   LearningRate 0.0009   Epoch: 6   Global Step: 10530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:10,658-Speed 25250.57 samples/sec   Loss 4.6952   LearningRate 0.0009   Epoch: 6   Global Step: 10540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:20,406-Speed 25213.56 samples/sec   Loss 4.7146   LearningRate 0.0009   Epoch: 6   Global Step: 10550   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:30,176-Speed 25157.37 samples/sec   Loss 4.7388   LearningRate 0.0009   Epoch: 6   Global Step: 10560   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:39,870-Speed 25356.16 samples/sec   Loss 4.7493   LearningRate 0.0009   Epoch: 6   Global Step: 10570   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:49,640-Speed 25158.02 samples/sec   Loss 4.7240   LearningRate 0.0009   Epoch: 6   Global Step: 10580   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:34:59,345-Speed 25326.71 samples/sec   Loss 4.7698   LearningRate 0.0009   Epoch: 6   Global Step: 10590   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:35:09,073-Speed 25275.17 samples/sec   Loss 4.7344   LearningRate 0.0009   Epoch: 6   Global Step: 10600   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:35:18,905-Speed 24997.58 samples/sec   Loss 4.7580   LearningRate 0.0009   Epoch: 6   Global Step: 10610   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:35:28,773-Speed 24908.87 samples/sec   Loss 4.6666   LearningRate 0.0009   Epoch: 6   Global Step: 10620   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:35:38,653-Speed 24876.93 samples/sec   Loss 4.6950   LearningRate 0.0009   Epoch: 6   Global Step: 10630   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:35:48,388-Speed 25248.14 samples/sec   Loss 4.7267   LearningRate 0.0009   Epoch: 6   Global Step: 10640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:35:58,108-Speed 25286.60 samples/sec   Loss 4.6916   LearningRate 0.0009   Epoch: 6   Global Step: 10650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:36:07,963-Speed 24939.36 samples/sec   Loss 4.7106   LearningRate 0.0009   Epoch: 6   Global Step: 10660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:36:17,819-Speed 24938.08 samples/sec   Loss 4.7212   LearningRate 0.0009   Epoch: 6   Global Step: 10670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:36:27,581-Speed 25178.54 samples/sec   Loss 4.6882   LearningRate 0.0009   Epoch: 6   Global Step: 10680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:36:37,294-Speed 25305.77 samples/sec   Loss 4.7096   LearningRate 0.0009   Epoch: 6   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:36:47,079-Speed 25119.95 samples/sec   Loss 4.7360   LearningRate 0.0009   Epoch: 6   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:36:56,809-Speed 25263.43 samples/sec   Loss 4.6978   LearningRate 0.0009   Epoch: 6   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:37:06,637-Speed 25009.25 samples/sec   Loss 4.6559   LearningRate 0.0009   Epoch: 6   Global Step: 10720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:37:16,471-Speed 24993.20 samples/sec   Loss 4.6793   LearningRate 0.0009   Epoch: 6   Global Step: 10730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:37:26,271-Speed 25081.52 samples/sec   Loss 4.7196   LearningRate 0.0009   Epoch: 6   Global Step: 10740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:37:35,958-Speed 25375.37 samples/sec   Loss 4.7184   LearningRate 0.0009   Epoch: 6   Global Step: 10750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:37:45,718-Speed 25182.89 samples/sec   Loss 4.7031   LearningRate 0.0009   Epoch: 6   Global Step: 10760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:37:55,428-Speed 25311.61 samples/sec   Loss 4.6504   LearningRate 0.0009   Epoch: 6   Global Step: 10770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:38:05,271-Speed 24972.27 samples/sec   Loss 4.6502   LearningRate 0.0009   Epoch: 6   Global Step: 10780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:38:15,038-Speed 25167.75 samples/sec   Loss 4.6846   LearningRate 0.0009   Epoch: 6   Global Step: 10790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:38:24,796-Speed 25187.45 samples/sec   Loss 4.6756   LearningRate 0.0009   Epoch: 6   Global Step: 10800   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:38:34,566-Speed 25157.40 samples/sec   Loss 4.6612   LearningRate 0.0009   Epoch: 6   Global Step: 10810   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:38:44,349-Speed 25125.63 samples/sec   Loss 4.6709   LearningRate 0.0009   Epoch: 6   Global Step: 10820   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:38:54,109-Speed 25192.78 samples/sec   Loss 4.6295   LearningRate 0.0009   Epoch: 6   Global Step: 10830   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:39:03,902-Speed 25098.96 samples/sec   Loss 4.6689   LearningRate 0.0009   Epoch: 6   Global Step: 10840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:39:13,662-Speed 25183.36 samples/sec   Loss 4.6505   LearningRate 0.0009   Epoch: 6   Global Step: 10850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:39:23,565-Speed 24820.90 samples/sec   Loss 4.6203   LearningRate 0.0009   Epoch: 6   Global Step: 10860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:39:33,552-Speed 24611.39 samples/sec   Loss 4.6150   LearningRate 0.0009   Epoch: 6   Global Step: 10870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:39:43,512-Speed 24678.28 samples/sec   Loss 4.6708   LearningRate 0.0009   Epoch: 6   Global Step: 10880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:39:53,477-Speed 24667.51 samples/sec   Loss 4.6387   LearningRate 0.0009   Epoch: 6   Global Step: 10890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:40:03,543-Speed 24416.83 samples/sec   Loss 4.6428   LearningRate 0.0009   Epoch: 6   Global Step: 10900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:40:13,541-Speed 24584.23 samples/sec   Loss 4.6262   LearningRate 0.0009   Epoch: 6   Global Step: 10910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:40:23,563-Speed 24533.45 samples/sec   Loss 4.6274   LearningRate 0.0009   Epoch: 6   Global Step: 10920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:40:33,543-Speed 24626.89 samples/sec   Loss 4.6108   LearningRate 0.0009   Epoch: 6   Global Step: 10930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:40:43,537-Speed 24592.67 samples/sec   Loss 4.6060   LearningRate 0.0009   Epoch: 6   Global Step: 10940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:40:53,606-Speed 24409.54 samples/sec   Loss 4.6151   LearningRate 0.0009   Epoch: 6   Global Step: 10950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:41:03,636-Speed 24507.19 samples/sec   Loss 4.5885   LearningRate 0.0009   Epoch: 6   Global Step: 10960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:41:13,578-Speed 24721.52 samples/sec   Loss 4.6097   LearningRate 0.0009   Epoch: 6   Global Step: 10970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:41:23,519-Speed 24725.72 samples/sec   Loss 4.5934   LearningRate 0.0009   Epoch: 6   Global Step: 10980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:41:33,461-Speed 24720.09 samples/sec   Loss 4.6127   LearningRate 0.0009   Epoch: 6   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:41:43,384-Speed 24771.08 samples/sec   Loss 4.6008   LearningRate 0.0009   Epoch: 6   Global Step: 11000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:41:53,328-Speed 24715.51 samples/sec   Loss 4.5858   LearningRate 0.0009   Epoch: 6   Global Step: 11010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:42:03,301-Speed 24647.46 samples/sec   Loss 4.5898   LearningRate 0.0009   Epoch: 6   Global Step: 11020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:42:13,419-Speed 24294.72 samples/sec   Loss 4.6250   LearningRate 0.0009   Epoch: 6   Global Step: 11030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:42:23,442-Speed 24521.09 samples/sec   Loss 4.5995   LearningRate 0.0009   Epoch: 6   Global Step: 11040   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:42:33,337-Speed 24837.93 samples/sec   Loss 4.5996   LearningRate 0.0009   Epoch: 6   Global Step: 11050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:42:43,376-Speed 24483.11 samples/sec   Loss 4.5944   LearningRate 0.0009   Epoch: 6   Global Step: 11060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:42:53,406-Speed 24507.02 samples/sec   Loss 4.5647   LearningRate 0.0009   Epoch: 6   Global Step: 11070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:43:03,370-Speed 24669.50 samples/sec   Loss 4.5883   LearningRate 0.0009   Epoch: 6   Global Step: 11080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:43:13,377-Speed 24560.76 samples/sec   Loss 4.5997   LearningRate 0.0009   Epoch: 6   Global Step: 11090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:43:23,353-Speed 24641.27 samples/sec   Loss 4.5845   LearningRate 0.0009   Epoch: 6   Global Step: 11100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:43:33,354-Speed 24575.47 samples/sec   Loss 4.5806   LearningRate 0.0009   Epoch: 6   Global Step: 11110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:43:43,311-Speed 24686.23 samples/sec   Loss 4.5551   LearningRate 0.0009   Epoch: 6   Global Step: 11120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:43:53,302-Speed 24603.59 samples/sec   Loss 4.5314   LearningRate 0.0009   Epoch: 6   Global Step: 11130   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-03-26 01:44:03,244-Speed 24722.96 samples/sec   Loss 4.5251   LearningRate 0.0009   Epoch: 6   Global Step: 11140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:44:13,196-Speed 24697.16 samples/sec   Loss 4.5154   LearningRate 0.0009   Epoch: 6   Global Step: 11150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:44:23,152-Speed 24686.27 samples/sec   Loss 4.4955   LearningRate 0.0009   Epoch: 6   Global Step: 11160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:44:33,098-Speed 24711.80 samples/sec   Loss 4.4818   LearningRate 0.0009   Epoch: 6   Global Step: 11170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:44:43,117-Speed 24534.04 samples/sec   Loss 4.5475   LearningRate 0.0009   Epoch: 6   Global Step: 11180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:44:53,120-Speed 24574.81 samples/sec   Loss 4.5226   LearningRate 0.0009   Epoch: 6   Global Step: 11190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:45:03,179-Speed 24435.70 samples/sec   Loss 4.5717   LearningRate 0.0009   Epoch: 6   Global Step: 11200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:45:13,274-Speed 24347.13 samples/sec   Loss 4.5935   LearningRate 0.0009   Epoch: 6   Global Step: 11210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:45:23,276-Speed 24574.58 samples/sec   Loss 4.5489   LearningRate 0.0009   Epoch: 6   Global Step: 11220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:45:33,209-Speed 24744.91 samples/sec   Loss 4.5081   LearningRate 0.0009   Epoch: 6   Global Step: 11230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:45:43,285-Speed 24392.63 samples/sec   Loss 4.5005   LearningRate 0.0009   Epoch: 6   Global Step: 11240   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:45:53,232-Speed 24709.12 samples/sec   Loss 4.4818   LearningRate 0.0009   Epoch: 6   Global Step: 11250   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:46:03,224-Speed 24598.78 samples/sec   Loss 4.4902   LearningRate 0.0009   Epoch: 6   Global Step: 11260   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:46:13,208-Speed 24619.99 samples/sec   Loss 4.4949   LearningRate 0.0009   Epoch: 6   Global Step: 11270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:46:23,234-Speed 24514.61 samples/sec   Loss 4.4678   LearningRate 0.0009   Epoch: 6   Global Step: 11280   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:46:33,308-Speed 24401.26 samples/sec   Loss 4.5010   LearningRate 0.0009   Epoch: 6   Global Step: 11290   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:46:43,300-Speed 24598.32 samples/sec   Loss 4.5647   LearningRate 0.0009   Epoch: 6   Global Step: 11300   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:46:53,270-Speed 24653.06 samples/sec   Loss 4.5287   LearningRate 0.0009   Epoch: 6   Global Step: 11310   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:47:03,260-Speed 24604.50 samples/sec   Loss 4.4874   LearningRate 0.0009   Epoch: 6   Global Step: 11320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:47:13,305-Speed 24470.25 samples/sec   Loss 4.4762   LearningRate 0.0009   Epoch: 6   Global Step: 11330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:47:23,266-Speed 24674.96 samples/sec   Loss 4.4616   LearningRate 0.0009   Epoch: 6   Global Step: 11340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:47:33,171-Speed 24814.54 samples/sec   Loss 4.4919   LearningRate 0.0009   Epoch: 6   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:47:43,321-Speed 24216.81 samples/sec   Loss 4.5003   LearningRate 0.0009   Epoch: 6   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:47:53,279-Speed 24684.75 samples/sec   Loss 4.4771   LearningRate 0.0009   Epoch: 6   Global Step: 11370   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:48:03,472-Speed 24117.78 samples/sec   Loss 4.4801   LearningRate 0.0009   Epoch: 6   Global Step: 11380   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:48:13,446-Speed 24650.80 samples/sec   Loss 4.4642   LearningRate 0.0009   Epoch: 6   Global Step: 11390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:48:23,469-Speed 24521.84 samples/sec   Loss 4.4918   LearningRate 0.0009   Epoch: 6   Global Step: 11400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:48:33,419-Speed 24703.02 samples/sec   Loss 4.4533   LearningRate 0.0009   Epoch: 6   Global Step: 11410   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:48:43,385-Speed 24664.05 samples/sec   Loss 4.4328   LearningRate 0.0009   Epoch: 6   Global Step: 11420   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:48:53,432-Speed 24462.87 samples/sec   Loss 4.4663   LearningRate 0.0009   Epoch: 6   Global Step: 11430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:49:03,498-Speed 24417.39 samples/sec   Loss 4.5124   LearningRate 0.0009   Epoch: 6   Global Step: 11440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:49:13,538-Speed 24482.53 samples/sec   Loss 4.4181   LearningRate 0.0009   Epoch: 6   Global Step: 11450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:49:23,592-Speed 24445.34 samples/sec   Loss 4.4421   LearningRate 0.0009   Epoch: 6   Global Step: 11460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:49:33,532-Speed 24726.53 samples/sec   Loss 4.4743   LearningRate 0.0009   Epoch: 6   Global Step: 11470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:49:43,519-Speed 24618.14 samples/sec   Loss 4.4535   LearningRate 0.0009   Epoch: 6   Global Step: 11480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:49:53,489-Speed 24653.54 samples/sec   Loss 4.5024   LearningRate 0.0009   Epoch: 6   Global Step: 11490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:50:03,387-Speed 24839.78 samples/sec   Loss 4.4299   LearningRate 0.0009   Epoch: 6   Global Step: 11500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:50:13,436-Speed 24458.61 samples/sec   Loss 4.4297   LearningRate 0.0009   Epoch: 6   Global Step: 11510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:50:23,354-Speed 24783.49 samples/sec   Loss 4.4284   LearningRate 0.0009   Epoch: 6   Global Step: 11520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:50:33,295-Speed 24725.92 samples/sec   Loss 4.4089   LearningRate 0.0009   Epoch: 6   Global Step: 11530   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-26 01:50:43,346-Speed 24455.31 samples/sec   Loss 4.4182   LearningRate 0.0009   Epoch: 6   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:50:53,300-Speed 24691.67 samples/sec   Loss 4.4150   LearningRate 0.0009   Epoch: 6   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:51:03,274-Speed 24642.40 samples/sec   Loss 4.4560   LearningRate 0.0009   Epoch: 6   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:51:13,247-Speed 24645.91 samples/sec   Loss 4.4260   LearningRate 0.0009   Epoch: 6   Global Step: 11570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:51:23,250-Speed 24568.93 samples/sec   Loss 4.4194   LearningRate 0.0009   Epoch: 6   Global Step: 11580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:51:33,248-Speed 24586.53 samples/sec   Loss 4.4719   LearningRate 0.0009   Epoch: 6   Global Step: 11590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:51:43,237-Speed 24604.71 samples/sec   Loss 4.4357   LearningRate 0.0009   Epoch: 6   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:51:53,257-Speed 24532.02 samples/sec   Loss 4.3827   LearningRate 0.0009   Epoch: 6   Global Step: 11610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:52:03,399-Speed 24234.94 samples/sec   Loss 4.4255   LearningRate 0.0009   Epoch: 6   Global Step: 11620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:52:13,366-Speed 24658.86 samples/sec   Loss 4.4067   LearningRate 0.0009   Epoch: 6   Global Step: 11630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 01:52:23,359-Speed 24596.28 samples/sec   Loss 4.3734   LearningRate 0.0009   Epoch: 6   Global Step: 11640   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:52:33,322-Speed 24669.64 samples/sec   Loss 4.3851   LearningRate 0.0009   Epoch: 6   Global Step: 11650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:52:43,300-Speed 24634.44 samples/sec   Loss 4.3734   LearningRate 0.0009   Epoch: 6   Global Step: 11660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:52:53,286-Speed 24613.37 samples/sec   Loss 4.4359   LearningRate 0.0009   Epoch: 6   Global Step: 11670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:53:03,310-Speed 24519.84 samples/sec   Loss 4.3985   LearningRate 0.0009   Epoch: 6   Global Step: 11680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:53:13,274-Speed 24667.82 samples/sec   Loss 4.4060   LearningRate 0.0009   Epoch: 6   Global Step: 11690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:53:23,291-Speed 24535.82 samples/sec   Loss 4.3878   LearningRate 0.0009   Epoch: 6   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:53:33,255-Speed 24668.04 samples/sec   Loss 4.3930   LearningRate 0.0009   Epoch: 6   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:53:43,520-Speed 23943.92 samples/sec   Loss 4.4142   LearningRate 0.0009   Epoch: 6   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:53:53,441-Speed 24774.58 samples/sec   Loss 4.3980   LearningRate 0.0009   Epoch: 6   Global Step: 11730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:54:03,342-Speed 24826.73 samples/sec   Loss 4.4004   LearningRate 0.0009   Epoch: 6   Global Step: 11740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:54:13,364-Speed 24525.72 samples/sec   Loss 4.3562   LearningRate 0.0009   Epoch: 6   Global Step: 11750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:54:23,345-Speed 24623.48 samples/sec   Loss 4.3470   LearningRate 0.0009   Epoch: 6   Global Step: 11760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:54:33,331-Speed 24612.99 samples/sec   Loss 4.3207   LearningRate 0.0008   Epoch: 6   Global Step: 11770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:54:43,327-Speed 24595.26 samples/sec   Loss 4.3726   LearningRate 0.0008   Epoch: 6   Global Step: 11780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:54:53,312-Speed 24616.55 samples/sec   Loss 4.3432   LearningRate 0.0008   Epoch: 6   Global Step: 11790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:55:03,277-Speed 24665.23 samples/sec   Loss 4.3913   LearningRate 0.0008   Epoch: 6   Global Step: 11800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:55:13,402-Speed 24274.71 samples/sec   Loss 4.3846   LearningRate 0.0008   Epoch: 6   Global Step: 11810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:55:23,363-Speed 24675.09 samples/sec   Loss 4.3478   LearningRate 0.0008   Epoch: 6   Global Step: 11820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:55:33,375-Speed 24556.77 samples/sec   Loss 4.3446   LearningRate 0.0008   Epoch: 6   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:55:43,306-Speed 24747.80 samples/sec   Loss 4.3450   LearningRate 0.0008   Epoch: 6   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:55:53,278-Speed 24656.11 samples/sec   Loss 4.3350   LearningRate 0.0008   Epoch: 6   Global Step: 11850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:56:03,350-Speed 24402.45 samples/sec   Loss 4.3574   LearningRate 0.0008   Epoch: 6   Global Step: 11860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:56:13,392-Speed 24476.19 samples/sec   Loss 4.3414   LearningRate 0.0008   Epoch: 6   Global Step: 11870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:56:23,386-Speed 24594.61 samples/sec   Loss 4.3272   LearningRate 0.0008   Epoch: 6   Global Step: 11880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:56:33,367-Speed 24624.42 samples/sec   Loss 4.3417   LearningRate 0.0008   Epoch: 6   Global Step: 11890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:56:43,414-Speed 24466.49 samples/sec   Loss 4.3321   LearningRate 0.0008   Epoch: 6   Global Step: 11900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:56:53,415-Speed 24576.48 samples/sec   Loss 4.3439   LearningRate 0.0008   Epoch: 6   Global Step: 11910   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:57:03,435-Speed 24529.64 samples/sec   Loss 4.3225   LearningRate 0.0008   Epoch: 6   Global Step: 11920   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:57:13,490-Speed 24444.86 samples/sec   Loss 4.3454   LearningRate 0.0008   Epoch: 6   Global Step: 11930   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:57:23,465-Speed 24642.23 samples/sec   Loss 4.2876   LearningRate 0.0008   Epoch: 6   Global Step: 11940   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:57:33,515-Speed 24462.76 samples/sec   Loss 4.3218   LearningRate 0.0008   Epoch: 6   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:57:43,351-Speed 24987.98 samples/sec   Loss 4.3497   LearningRate 0.0008   Epoch: 6   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:57:53,087-Speed 25246.48 samples/sec   Loss 4.3238   LearningRate 0.0008   Epoch: 6   Global Step: 11970   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:58:02,875-Speed 25111.72 samples/sec   Loss 4.3166   LearningRate 0.0008   Epoch: 6   Global Step: 11980   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:58:12,678-Speed 25072.54 samples/sec   Loss 4.3399   LearningRate 0.0008   Epoch: 6   Global Step: 11990   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:58:22,394-Speed 25303.62 samples/sec   Loss 4.2892   LearningRate 0.0008   Epoch: 6   Global Step: 12000   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:58:32,167-Speed 25150.77 samples/sec   Loss 4.3176   LearningRate 0.0008   Epoch: 6   Global Step: 12010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:58:41,859-Speed 25359.92 samples/sec   Loss 4.2784   LearningRate 0.0008   Epoch: 6   Global Step: 12020   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:58:51,598-Speed 25238.86 samples/sec   Loss 4.3134   LearningRate 0.0008   Epoch: 6   Global Step: 12030   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:59:01,442-Speed 24968.67 samples/sec   Loss 4.3390   LearningRate 0.0008   Epoch: 6   Global Step: 12040   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-03-26 01:59:11,291-Speed 24955.89 samples/sec   Loss 4.3710   LearningRate 0.0008   Epoch: 6   Global Step: 12050   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:59:21,087-Speed 25090.00 samples/sec   Loss 4.3340   LearningRate 0.0008   Epoch: 6   Global Step: 12060   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:59:30,814-Speed 25270.46 samples/sec   Loss 4.3224   LearningRate 0.0008   Epoch: 6   Global Step: 12070   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:59:40,536-Speed 25281.65 samples/sec   Loss 4.3359   LearningRate 0.0008   Epoch: 6   Global Step: 12080   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 01:59:50,388-Speed 24946.10 samples/sec   Loss 4.3253   LearningRate 0.0008   Epoch: 6   Global Step: 12090   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:00:50,326-Speed 4100.30 samples/sec   Loss 4.3474   LearningRate 0.0008   Epoch: 7   Global Step: 12100   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:01:00,144-Speed 25035.31 samples/sec   Loss 4.2515   LearningRate 0.0008   Epoch: 7   Global Step: 12110   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:01:09,818-Speed 25408.94 samples/sec   Loss 4.2426   LearningRate 0.0008   Epoch: 7   Global Step: 12120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:01:19,567-Speed 25214.47 samples/sec   Loss 4.2101   LearningRate 0.0008   Epoch: 7   Global Step: 12130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:01:29,294-Speed 25279.17 samples/sec   Loss 4.2349   LearningRate 0.0008   Epoch: 7   Global Step: 12140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:01:39,134-Speed 24978.73 samples/sec   Loss 4.2336   LearningRate 0.0008   Epoch: 7   Global Step: 12150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:01:48,880-Speed 25221.36 samples/sec   Loss 4.2417   LearningRate 0.0008   Epoch: 7   Global Step: 12160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:01:58,553-Speed 25415.48 samples/sec   Loss 4.3043   LearningRate 0.0008   Epoch: 7   Global Step: 12170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:02:08,351-Speed 25086.08 samples/sec   Loss 4.2236   LearningRate 0.0008   Epoch: 7   Global Step: 12180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:02:18,196-Speed 24968.88 samples/sec   Loss 4.2330   LearningRate 0.0008   Epoch: 7   Global Step: 12190   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:02:27,973-Speed 25141.18 samples/sec   Loss 4.2669   LearningRate 0.0008   Epoch: 7   Global Step: 12200   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:02:37,765-Speed 25105.64 samples/sec   Loss 4.2055   LearningRate 0.0008   Epoch: 7   Global Step: 12210   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:02:47,660-Speed 24840.78 samples/sec   Loss 4.2341   LearningRate 0.0008   Epoch: 7   Global Step: 12220   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:02:57,333-Speed 25412.04 samples/sec   Loss 4.2333   LearningRate 0.0008   Epoch: 7   Global Step: 12230   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:03:07,079-Speed 25219.68 samples/sec   Loss 4.2201   LearningRate 0.0008   Epoch: 7   Global Step: 12240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:03:16,888-Speed 25065.68 samples/sec   Loss 4.2402   LearningRate 0.0008   Epoch: 7   Global Step: 12250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:03:26,703-Speed 25051.69 samples/sec   Loss 4.2312   LearningRate 0.0008   Epoch: 7   Global Step: 12260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:03:36,427-Speed 25276.23 samples/sec   Loss 4.2331   LearningRate 0.0008   Epoch: 7   Global Step: 12270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:03:46,134-Speed 25321.26 samples/sec   Loss 4.1898   LearningRate 0.0008   Epoch: 7   Global Step: 12280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:03:55,870-Speed 25245.42 samples/sec   Loss 4.2250   LearningRate 0.0008   Epoch: 7   Global Step: 12290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:04:05,600-Speed 25262.74 samples/sec   Loss 4.2785   LearningRate 0.0008   Epoch: 7   Global Step: 12300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:04:15,316-Speed 25298.23 samples/sec   Loss 4.2343   LearningRate 0.0008   Epoch: 7   Global Step: 12310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:04:25,184-Speed 24908.90 samples/sec   Loss 4.2079   LearningRate 0.0008   Epoch: 7   Global Step: 12320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:04:35,023-Speed 24983.08 samples/sec   Loss 4.2173   LearningRate 0.0008   Epoch: 7   Global Step: 12330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:04:44,827-Speed 25069.17 samples/sec   Loss 4.2400   LearningRate 0.0008   Epoch: 7   Global Step: 12340   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-03-26 02:04:54,487-Speed 25444.95 samples/sec   Loss 4.2424   LearningRate 0.0008   Epoch: 7   Global Step: 12350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:05:04,307-Speed 25028.63 samples/sec   Loss 4.2343   LearningRate 0.0008   Epoch: 7   Global Step: 12360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:05:14,128-Speed 25029.27 samples/sec   Loss 4.2103   LearningRate 0.0008   Epoch: 7   Global Step: 12370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:05:23,952-Speed 25018.93 samples/sec   Loss 4.2564   LearningRate 0.0008   Epoch: 7   Global Step: 12380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:05:33,702-Speed 25215.33 samples/sec   Loss 4.1886   LearningRate 0.0008   Epoch: 7   Global Step: 12390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:05:43,450-Speed 25214.27 samples/sec   Loss 4.2128   LearningRate 0.0008   Epoch: 7   Global Step: 12400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:05:53,263-Speed 25047.21 samples/sec   Loss 4.1801   LearningRate 0.0008   Epoch: 7   Global Step: 12410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:06:02,936-Speed 25410.78 samples/sec   Loss 4.1842   LearningRate 0.0008   Epoch: 7   Global Step: 12420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:06:12,733-Speed 25089.44 samples/sec   Loss 4.1884   LearningRate 0.0008   Epoch: 7   Global Step: 12430   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:06:22,408-Speed 25404.14 samples/sec   Loss 4.2004   LearningRate 0.0008   Epoch: 7   Global Step: 12440   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:06:32,228-Speed 25027.47 samples/sec   Loss 4.2039   LearningRate 0.0008   Epoch: 7   Global Step: 12450   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:06:41,920-Speed 25361.41 samples/sec   Loss 4.2254   LearningRate 0.0008   Epoch: 7   Global Step: 12460   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:06:51,793-Speed 24897.58 samples/sec   Loss 4.1878   LearningRate 0.0008   Epoch: 7   Global Step: 12470   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:07:01,539-Speed 25217.92 samples/sec   Loss 4.1767   LearningRate 0.0008   Epoch: 7   Global Step: 12480   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:07:11,259-Speed 25288.75 samples/sec   Loss 4.1827   LearningRate 0.0008   Epoch: 7   Global Step: 12490   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:07:21,054-Speed 25094.17 samples/sec   Loss 4.2146   LearningRate 0.0008   Epoch: 7   Global Step: 12500   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:07:30,831-Speed 25140.93 samples/sec   Loss 4.1807   LearningRate 0.0008   Epoch: 7   Global Step: 12510   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:07:40,674-Speed 24969.26 samples/sec   Loss 4.1942   LearningRate 0.0008   Epoch: 7   Global Step: 12520   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:07:50,493-Speed 25031.08 samples/sec   Loss 4.1608   LearningRate 0.0008   Epoch: 7   Global Step: 12530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:08:00,296-Speed 25074.91 samples/sec   Loss 4.1857   LearningRate 0.0008   Epoch: 7   Global Step: 12540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:08:09,966-Speed 25416.52 samples/sec   Loss 4.1629   LearningRate 0.0008   Epoch: 7   Global Step: 12550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:08:19,772-Speed 25065.06 samples/sec   Loss 4.1643   LearningRate 0.0008   Epoch: 7   Global Step: 12560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:08:29,572-Speed 25082.46 samples/sec   Loss 4.1771   LearningRate 0.0008   Epoch: 7   Global Step: 12570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:08:39,247-Speed 25402.61 samples/sec   Loss 4.1834   LearningRate 0.0008   Epoch: 7   Global Step: 12580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:08:49,025-Speed 25136.91 samples/sec   Loss 4.1487   LearningRate 0.0008   Epoch: 7   Global Step: 12590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:08:58,730-Speed 25326.32 samples/sec   Loss 4.1200   LearningRate 0.0008   Epoch: 7   Global Step: 12600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:09:08,545-Speed 25043.87 samples/sec   Loss 4.1444   LearningRate 0.0008   Epoch: 7   Global Step: 12610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:09:18,350-Speed 25067.48 samples/sec   Loss 4.1489   LearningRate 0.0008   Epoch: 7   Global Step: 12620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:09:28,210-Speed 24928.75 samples/sec   Loss 4.1848   LearningRate 0.0008   Epoch: 7   Global Step: 12630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:09:37,976-Speed 25168.87 samples/sec   Loss 4.1680   LearningRate 0.0008   Epoch: 7   Global Step: 12640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:09:47,678-Speed 25331.47 samples/sec   Loss 4.1205   LearningRate 0.0008   Epoch: 7   Global Step: 12650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:09:57,692-Speed 24545.72 samples/sec   Loss 4.1944   LearningRate 0.0008   Epoch: 7   Global Step: 12660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:10:07,451-Speed 25184.38 samples/sec   Loss 4.1822   LearningRate 0.0008   Epoch: 7   Global Step: 12670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:10:17,257-Speed 25065.11 samples/sec   Loss 4.1333   LearningRate 0.0008   Epoch: 7   Global Step: 12680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:10:27,143-Speed 24862.17 samples/sec   Loss 4.1132   LearningRate 0.0008   Epoch: 7   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:10:36,937-Speed 25095.54 samples/sec   Loss 4.0823   LearningRate 0.0008   Epoch: 7   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:10:46,670-Speed 25251.48 samples/sec   Loss 4.1427   LearningRate 0.0008   Epoch: 7   Global Step: 12710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:10:56,422-Speed 25205.56 samples/sec   Loss 4.1560   LearningRate 0.0008   Epoch: 7   Global Step: 12720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:11:06,170-Speed 25213.50 samples/sec   Loss 4.1280   LearningRate 0.0008   Epoch: 7   Global Step: 12730   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:11:15,903-Speed 25254.14 samples/sec   Loss 4.0761   LearningRate 0.0008   Epoch: 7   Global Step: 12740   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:11:25,605-Speed 25333.63 samples/sec   Loss 4.1416   LearningRate 0.0008   Epoch: 7   Global Step: 12750   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:11:35,329-Speed 25274.59 samples/sec   Loss 4.1264   LearningRate 0.0008   Epoch: 7   Global Step: 12760   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:11:45,189-Speed 24928.38 samples/sec   Loss 4.1210   LearningRate 0.0008   Epoch: 7   Global Step: 12770   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:11:55,022-Speed 24995.74 samples/sec   Loss 4.1616   LearningRate 0.0008   Epoch: 7   Global Step: 12780   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:12:04,871-Speed 24964.30 samples/sec   Loss 4.1545   LearningRate 0.0008   Epoch: 7   Global Step: 12790   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:12:14,564-Speed 25356.03 samples/sec   Loss 4.1203   LearningRate 0.0008   Epoch: 7   Global Step: 12800   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:12:24,289-Speed 25274.66 samples/sec   Loss 4.0950   LearningRate 0.0008   Epoch: 7   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:12:34,128-Speed 24982.84 samples/sec   Loss 4.0852   LearningRate 0.0008   Epoch: 7   Global Step: 12820   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:12:43,836-Speed 25317.55 samples/sec   Loss 4.1098   LearningRate 0.0008   Epoch: 7   Global Step: 12830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:12:53,546-Speed 25312.52 samples/sec   Loss 4.0961   LearningRate 0.0008   Epoch: 7   Global Step: 12840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:13:03,262-Speed 25298.01 samples/sec   Loss 4.0929   LearningRate 0.0008   Epoch: 7   Global Step: 12850   Fp16 Grad Scale: 262144   Required: 16 hours
Training: 2022-03-26 02:13:13,005-Speed 25227.99 samples/sec   Loss 4.1123   LearningRate 0.0008   Epoch: 7   Global Step: 12860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:13:22,746-Speed 25232.19 samples/sec   Loss 4.1187   LearningRate 0.0008   Epoch: 7   Global Step: 12870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:13:32,539-Speed 25100.42 samples/sec   Loss 4.1135   LearningRate 0.0008   Epoch: 7   Global Step: 12880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:13:42,348-Speed 25057.38 samples/sec   Loss 4.0821   LearningRate 0.0008   Epoch: 7   Global Step: 12890   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:13:52,060-Speed 25307.89 samples/sec   Loss 4.1389   LearningRate 0.0008   Epoch: 7   Global Step: 12900   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:14:01,858-Speed 25086.98 samples/sec   Loss 4.1066   LearningRate 0.0008   Epoch: 7   Global Step: 12910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:14:11,665-Speed 25065.90 samples/sec   Loss 4.0640   LearningRate 0.0008   Epoch: 7   Global Step: 12920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:14:21,429-Speed 25172.95 samples/sec   Loss 4.0630   LearningRate 0.0008   Epoch: 7   Global Step: 12930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:14:31,205-Speed 25147.27 samples/sec   Loss 4.0808   LearningRate 0.0008   Epoch: 7   Global Step: 12940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:14:41,003-Speed 25087.18 samples/sec   Loss 4.0742   LearningRate 0.0008   Epoch: 7   Global Step: 12950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:14:50,849-Speed 24964.22 samples/sec   Loss 4.0925   LearningRate 0.0008   Epoch: 7   Global Step: 12960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:15:00,553-Speed 25326.19 samples/sec   Loss 4.1331   LearningRate 0.0008   Epoch: 7   Global Step: 12970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:15:10,354-Speed 25078.63 samples/sec   Loss 4.0794   LearningRate 0.0008   Epoch: 7   Global Step: 12980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:15:20,155-Speed 25078.54 samples/sec   Loss 4.0672   LearningRate 0.0008   Epoch: 7   Global Step: 12990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:15:29,948-Speed 25098.68 samples/sec   Loss 4.0788   LearningRate 0.0008   Epoch: 7   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:15:39,595-Speed 25478.39 samples/sec   Loss 4.0969   LearningRate 0.0008   Epoch: 7   Global Step: 13010   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:15:49,401-Speed 25064.35 samples/sec   Loss 4.0706   LearningRate 0.0008   Epoch: 7   Global Step: 13020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:15:59,121-Speed 25284.44 samples/sec   Loss 4.0659   LearningRate 0.0008   Epoch: 7   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:16:08,945-Speed 25025.11 samples/sec   Loss 4.0623   LearningRate 0.0008   Epoch: 7   Global Step: 13040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:16:18,649-Speed 25331.04 samples/sec   Loss 4.0657   LearningRate 0.0008   Epoch: 7   Global Step: 13050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:16:28,536-Speed 24859.17 samples/sec   Loss 4.0420   LearningRate 0.0008   Epoch: 7   Global Step: 13060   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:16:38,398-Speed 24923.52 samples/sec   Loss 4.0505   LearningRate 0.0008   Epoch: 7   Global Step: 13070   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:16:48,177-Speed 25134.22 samples/sec   Loss 4.0618   LearningRate 0.0008   Epoch: 7   Global Step: 13080   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:16:57,940-Speed 25175.96 samples/sec   Loss 4.0519   LearningRate 0.0008   Epoch: 7   Global Step: 13090   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:17:07,776-Speed 24986.40 samples/sec   Loss 4.0632   LearningRate 0.0008   Epoch: 7   Global Step: 13100   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:17:17,586-Speed 25056.30 samples/sec   Loss 4.0432   LearningRate 0.0008   Epoch: 7   Global Step: 13110   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:17:27,514-Speed 24759.10 samples/sec   Loss 4.0054   LearningRate 0.0008   Epoch: 7   Global Step: 13120   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:17:37,312-Speed 25085.46 samples/sec   Loss 4.0056   LearningRate 0.0008   Epoch: 7   Global Step: 13130   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:17:47,122-Speed 25054.58 samples/sec   Loss 4.0591   LearningRate 0.0008   Epoch: 7   Global Step: 13140   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:17:56,833-Speed 25312.25 samples/sec   Loss 4.0070   LearningRate 0.0008   Epoch: 7   Global Step: 13150   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:18:06,673-Speed 24978.59 samples/sec   Loss 4.0398   LearningRate 0.0008   Epoch: 7   Global Step: 13160   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:18:16,412-Speed 25238.07 samples/sec   Loss 4.0822   LearningRate 0.0008   Epoch: 7   Global Step: 13170   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:18:26,143-Speed 25260.49 samples/sec   Loss 4.0693   LearningRate 0.0008   Epoch: 7   Global Step: 13180   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:18:35,943-Speed 25080.46 samples/sec   Loss 4.0337   LearningRate 0.0008   Epoch: 7   Global Step: 13190   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:18:45,770-Speed 25014.74 samples/sec   Loss 4.0500   LearningRate 0.0008   Epoch: 7   Global Step: 13200   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:18:55,554-Speed 25119.28 samples/sec   Loss 4.0070   LearningRate 0.0008   Epoch: 7   Global Step: 13210   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:19:05,338-Speed 25122.34 samples/sec   Loss 4.0335   LearningRate 0.0008   Epoch: 7   Global Step: 13220   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:19:15,048-Speed 25314.54 samples/sec   Loss 3.9762   LearningRate 0.0008   Epoch: 7   Global Step: 13230   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:19:24,909-Speed 24927.18 samples/sec   Loss 3.9996   LearningRate 0.0008   Epoch: 7   Global Step: 13240   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:19:34,691-Speed 25132.98 samples/sec   Loss 4.0287   LearningRate 0.0008   Epoch: 7   Global Step: 13250   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:19:44,445-Speed 25198.42 samples/sec   Loss 3.9750   LearningRate 0.0008   Epoch: 7   Global Step: 13260   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:19:54,251-Speed 25073.94 samples/sec   Loss 3.9852   LearningRate 0.0008   Epoch: 7   Global Step: 13270   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:20:04,073-Speed 25025.43 samples/sec   Loss 4.0146   LearningRate 0.0008   Epoch: 7   Global Step: 13280   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:20:13,809-Speed 25244.31 samples/sec   Loss 4.0107   LearningRate 0.0008   Epoch: 7   Global Step: 13290   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:20:23,606-Speed 25090.31 samples/sec   Loss 3.9922   LearningRate 0.0008   Epoch: 7   Global Step: 13300   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:20:33,347-Speed 25236.32 samples/sec   Loss 4.0098   LearningRate 0.0008   Epoch: 7   Global Step: 13310   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:20:43,026-Speed 25394.50 samples/sec   Loss 3.9826   LearningRate 0.0008   Epoch: 7   Global Step: 13320   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:20:52,831-Speed 25067.93 samples/sec   Loss 3.9900   LearningRate 0.0008   Epoch: 7   Global Step: 13330   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:21:02,592-Speed 25180.81 samples/sec   Loss 3.9894   LearningRate 0.0008   Epoch: 7   Global Step: 13340   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:21:12,355-Speed 25177.62 samples/sec   Loss 3.9913   LearningRate 0.0008   Epoch: 7   Global Step: 13350   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:21:22,108-Speed 25209.48 samples/sec   Loss 4.0016   LearningRate 0.0008   Epoch: 7   Global Step: 13360   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:21:31,874-Speed 25167.88 samples/sec   Loss 3.9605   LearningRate 0.0008   Epoch: 7   Global Step: 13370   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:21:41,621-Speed 25217.77 samples/sec   Loss 4.0046   LearningRate 0.0008   Epoch: 7   Global Step: 13380   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:21:51,372-Speed 25206.42 samples/sec   Loss 3.9873   LearningRate 0.0008   Epoch: 7   Global Step: 13390   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:22:01,128-Speed 25193.42 samples/sec   Loss 4.0013   LearningRate 0.0008   Epoch: 7   Global Step: 13400   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:22:10,812-Speed 25379.38 samples/sec   Loss 3.9992   LearningRate 0.0008   Epoch: 7   Global Step: 13410   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:22:20,583-Speed 25157.38 samples/sec   Loss 3.9546   LearningRate 0.0008   Epoch: 7   Global Step: 13420   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:22:30,270-Speed 25373.15 samples/sec   Loss 3.9699   LearningRate 0.0008   Epoch: 7   Global Step: 13430   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:22:39,963-Speed 25357.83 samples/sec   Loss 3.9747   LearningRate 0.0008   Epoch: 7   Global Step: 13440   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:22:49,777-Speed 25045.67 samples/sec   Loss 3.9583   LearningRate 0.0008   Epoch: 7   Global Step: 13450   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:22:59,611-Speed 24994.76 samples/sec   Loss 3.9987   LearningRate 0.0008   Epoch: 7   Global Step: 13460   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:23:09,414-Speed 25074.55 samples/sec   Loss 3.9792   LearningRate 0.0008   Epoch: 7   Global Step: 13470   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:23:19,278-Speed 24917.46 samples/sec   Loss 4.0014   LearningRate 0.0008   Epoch: 7   Global Step: 13480   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:23:29,352-Speed 24397.79 samples/sec   Loss 3.9543   LearningRate 0.0008   Epoch: 7   Global Step: 13490   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:23:39,161-Speed 25059.60 samples/sec   Loss 3.9792   LearningRate 0.0008   Epoch: 7   Global Step: 13500   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:23:48,939-Speed 25137.78 samples/sec   Loss 3.9795   LearningRate 0.0008   Epoch: 7   Global Step: 13510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:23:58,803-Speed 24915.73 samples/sec   Loss 3.9571   LearningRate 0.0008   Epoch: 7   Global Step: 13520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:24:08,568-Speed 25170.49 samples/sec   Loss 3.9429   LearningRate 0.0008   Epoch: 7   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:24:18,358-Speed 25107.65 samples/sec   Loss 3.9727   LearningRate 0.0008   Epoch: 7   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:24:28,163-Speed 25067.33 samples/sec   Loss 3.9798   LearningRate 0.0008   Epoch: 7   Global Step: 13550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:24:37,883-Speed 25287.79 samples/sec   Loss 4.0062   LearningRate 0.0008   Epoch: 7   Global Step: 13560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:24:47,665-Speed 25127.16 samples/sec   Loss 3.9492   LearningRate 0.0008   Epoch: 7   Global Step: 13570   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:24:57,529-Speed 24918.85 samples/sec   Loss 3.9444   LearningRate 0.0008   Epoch: 7   Global Step: 13580   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:25:07,230-Speed 25338.04 samples/sec   Loss 3.9328   LearningRate 0.0008   Epoch: 7   Global Step: 13590   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:25:17,112-Speed 24871.20 samples/sec   Loss 3.9588   LearningRate 0.0008   Epoch: 7   Global Step: 13600   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:25:26,866-Speed 25199.81 samples/sec   Loss 3.9645   LearningRate 0.0008   Epoch: 7   Global Step: 13610   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:25:36,654-Speed 25112.10 samples/sec   Loss 3.9395   LearningRate 0.0008   Epoch: 7   Global Step: 13620   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:25:46,514-Speed 24931.49 samples/sec   Loss 3.9846   LearningRate 0.0008   Epoch: 7   Global Step: 13630   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:25:56,284-Speed 25157.27 samples/sec   Loss 3.9798   LearningRate 0.0008   Epoch: 7   Global Step: 13640   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:26:06,049-Speed 25170.92 samples/sec   Loss 3.9545   LearningRate 0.0008   Epoch: 7   Global Step: 13650   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:26:15,797-Speed 25215.25 samples/sec   Loss 3.9325   LearningRate 0.0008   Epoch: 7   Global Step: 13660   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:26:25,613-Speed 25041.95 samples/sec   Loss 3.9420   LearningRate 0.0008   Epoch: 7   Global Step: 13670   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:26:35,372-Speed 25186.01 samples/sec   Loss 3.9084   LearningRate 0.0008   Epoch: 7   Global Step: 13680   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:26:45,233-Speed 24923.96 samples/sec   Loss 3.9437   LearningRate 0.0008   Epoch: 7   Global Step: 13690   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:26:55,018-Speed 25119.56 samples/sec   Loss 3.9281   LearningRate 0.0008   Epoch: 7   Global Step: 13700   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:27:04,831-Speed 25054.88 samples/sec   Loss 3.9250   LearningRate 0.0008   Epoch: 7   Global Step: 13710   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:27:14,575-Speed 25224.14 samples/sec   Loss 3.9086   LearningRate 0.0008   Epoch: 7   Global Step: 13720   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:27:24,319-Speed 25224.34 samples/sec   Loss 3.9345   LearningRate 0.0008   Epoch: 7   Global Step: 13730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:27:34,149-Speed 25006.69 samples/sec   Loss 3.9747   LearningRate 0.0008   Epoch: 7   Global Step: 13740   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:27:43,913-Speed 25170.46 samples/sec   Loss 3.9478   LearningRate 0.0008   Epoch: 7   Global Step: 13750   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:27:53,674-Speed 25179.75 samples/sec   Loss 3.9211   LearningRate 0.0008   Epoch: 7   Global Step: 13760   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:28:03,477-Speed 25071.99 samples/sec   Loss 3.9136   LearningRate 0.0008   Epoch: 7   Global Step: 13770   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:28:13,267-Speed 25108.18 samples/sec   Loss 3.9201   LearningRate 0.0008   Epoch: 7   Global Step: 13780   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:28:23,014-Speed 25215.51 samples/sec   Loss 3.9430   LearningRate 0.0008   Epoch: 7   Global Step: 13790   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:28:32,757-Speed 25226.82 samples/sec   Loss 3.9645   LearningRate 0.0008   Epoch: 7   Global Step: 13800   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:28:42,670-Speed 24793.67 samples/sec   Loss 3.9645   LearningRate 0.0008   Epoch: 7   Global Step: 13810   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:28:52,440-Speed 25155.22 samples/sec   Loss 3.9465   LearningRate 0.0008   Epoch: 7   Global Step: 13820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:29:52,089-Speed 4120.21 samples/sec   Loss 3.9138   LearningRate 0.0008   Epoch: 8   Global Step: 13830   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:30:01,806-Speed 25304.58 samples/sec   Loss 3.8754   LearningRate 0.0008   Epoch: 8   Global Step: 13840   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:30:11,776-Speed 24652.69 samples/sec   Loss 3.8626   LearningRate 0.0008   Epoch: 8   Global Step: 13850   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:30:21,613-Speed 24986.52 samples/sec   Loss 3.8203   LearningRate 0.0008   Epoch: 8   Global Step: 13860   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:30:31,432-Speed 25031.82 samples/sec   Loss 3.8493   LearningRate 0.0008   Epoch: 8   Global Step: 13870   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:30:41,207-Speed 25145.81 samples/sec   Loss 3.8583   LearningRate 0.0008   Epoch: 8   Global Step: 13880   Fp16 Grad Scale: 131072   Required: 16 hours
Training: 2022-03-26 02:30:50,983-Speed 25142.01 samples/sec   Loss 3.8407   LearningRate 0.0008   Epoch: 8   Global Step: 13890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:31:00,868-Speed 24863.91 samples/sec   Loss 3.8764   LearningRate 0.0008   Epoch: 8   Global Step: 13900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:31:10,811-Speed 24720.84 samples/sec   Loss 3.9215   LearningRate 0.0008   Epoch: 8   Global Step: 13910   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:31:20,934-Speed 24280.11 samples/sec   Loss 3.8754   LearningRate 0.0008   Epoch: 8   Global Step: 13920   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:31:30,908-Speed 24644.35 samples/sec   Loss 3.8833   LearningRate 0.0008   Epoch: 8   Global Step: 13930   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:31:40,901-Speed 24596.81 samples/sec   Loss 3.8833   LearningRate 0.0008   Epoch: 8   Global Step: 13940   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:31:50,937-Speed 24490.19 samples/sec   Loss 3.8605   LearningRate 0.0008   Epoch: 8   Global Step: 13950   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:32:00,888-Speed 24702.17 samples/sec   Loss 3.8635   LearningRate 0.0008   Epoch: 8   Global Step: 13960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:32:10,846-Speed 24683.30 samples/sec   Loss 3.8599   LearningRate 0.0008   Epoch: 8   Global Step: 13970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-26 02:32:20,827-Speed 24625.59 samples/sec   Loss 3.9016   LearningRate 0.0008   Epoch: 8   Global Step: 13980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:32:30,865-Speed 24486.37 samples/sec   Loss 3.8945   LearningRate 0.0008   Epoch: 8   Global Step: 13990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:32:40,887-Speed 24524.98 samples/sec   Loss 3.8841   LearningRate 0.0008   Epoch: 8   Global Step: 14000   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:32:50,875-Speed 24609.17 samples/sec   Loss 3.8477   LearningRate 0.0008   Epoch: 8   Global Step: 14010   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:33:00,788-Speed 24795.32 samples/sec   Loss 3.8288   LearningRate 0.0008   Epoch: 8   Global Step: 14020   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:33:10,691-Speed 24819.11 samples/sec   Loss 3.8192   LearningRate 0.0008   Epoch: 8   Global Step: 14030   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:33:20,671-Speed 24628.04 samples/sec   Loss 3.9350   LearningRate 0.0008   Epoch: 8   Global Step: 14040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:33:30,763-Speed 24354.26 samples/sec   Loss 3.9634   LearningRate 0.0008   Epoch: 8   Global Step: 14050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:33:40,788-Speed 24516.19 samples/sec   Loss 3.8885   LearningRate 0.0008   Epoch: 8   Global Step: 14060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:33:50,732-Speed 24717.24 samples/sec   Loss 3.8600   LearningRate 0.0008   Epoch: 8   Global Step: 14070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:34:00,687-Speed 24688.76 samples/sec   Loss 3.8569   LearningRate 0.0008   Epoch: 8   Global Step: 14080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:34:10,684-Speed 24586.48 samples/sec   Loss 3.8499   LearningRate 0.0008   Epoch: 8   Global Step: 14090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:34:20,591-Speed 24808.97 samples/sec   Loss 3.8450   LearningRate 0.0008   Epoch: 8   Global Step: 14100   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:34:30,527-Speed 24737.10 samples/sec   Loss 3.8493   LearningRate 0.0008   Epoch: 8   Global Step: 14110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:34:40,570-Speed 24472.88 samples/sec   Loss 3.8449   LearningRate 0.0008   Epoch: 8   Global Step: 14120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:34:50,599-Speed 24508.89 samples/sec   Loss 3.8731   LearningRate 0.0008   Epoch: 8   Global Step: 14130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:35:00,590-Speed 24600.57 samples/sec   Loss 3.8488   LearningRate 0.0008   Epoch: 8   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:35:10,542-Speed 24697.02 samples/sec   Loss 3.8270   LearningRate 0.0008   Epoch: 8   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:35:20,584-Speed 24474.04 samples/sec   Loss 3.8340   LearningRate 0.0008   Epoch: 8   Global Step: 14160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:35:30,630-Speed 24465.74 samples/sec   Loss 3.8367   LearningRate 0.0008   Epoch: 8   Global Step: 14170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:35:40,815-Speed 24139.84 samples/sec   Loss 3.8717   LearningRate 0.0008   Epoch: 8   Global Step: 14180   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:35:50,817-Speed 24577.49 samples/sec   Loss 3.8553   LearningRate 0.0008   Epoch: 8   Global Step: 14190   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:36:00,839-Speed 24525.57 samples/sec   Loss 3.8867   LearningRate 0.0008   Epoch: 8   Global Step: 14200   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:36:10,832-Speed 24593.65 samples/sec   Loss 3.8680   LearningRate 0.0008   Epoch: 8   Global Step: 14210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:36:20,924-Speed 24354.74 samples/sec   Loss 3.8307   LearningRate 0.0008   Epoch: 8   Global Step: 14220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:36:31,049-Speed 24274.67 samples/sec   Loss 3.8332   LearningRate 0.0008   Epoch: 8   Global Step: 14230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:36:41,067-Speed 24533.81 samples/sec   Loss 3.8228   LearningRate 0.0008   Epoch: 8   Global Step: 14240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:36:51,094-Speed 24513.57 samples/sec   Loss 3.8476   LearningRate 0.0008   Epoch: 8   Global Step: 14250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:37:01,083-Speed 24605.72 samples/sec   Loss 3.8655   LearningRate 0.0008   Epoch: 8   Global Step: 14260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:37:11,072-Speed 24607.45 samples/sec   Loss 3.8269   LearningRate 0.0008   Epoch: 8   Global Step: 14270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:37:21,177-Speed 24322.57 samples/sec   Loss 3.8409   LearningRate 0.0008   Epoch: 8   Global Step: 14280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:37:31,311-Speed 24251.79 samples/sec   Loss 3.8134   LearningRate 0.0008   Epoch: 8   Global Step: 14290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:37:41,331-Speed 24536.65 samples/sec   Loss 3.8171   LearningRate 0.0008   Epoch: 8   Global Step: 14300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:37:51,293-Speed 24672.51 samples/sec   Loss 3.8267   LearningRate 0.0008   Epoch: 8   Global Step: 14310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:38:01,314-Speed 24527.95 samples/sec   Loss 3.8053   LearningRate 0.0008   Epoch: 8   Global Step: 14320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:38:11,294-Speed 24628.11 samples/sec   Loss 3.8315   LearningRate 0.0008   Epoch: 8   Global Step: 14330   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:38:21,239-Speed 24713.52 samples/sec   Loss 3.8011   LearningRate 0.0008   Epoch: 8   Global Step: 14340   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:38:31,201-Speed 24671.86 samples/sec   Loss 3.8117   LearningRate 0.0008   Epoch: 8   Global Step: 14350   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:38:41,176-Speed 24642.03 samples/sec   Loss 3.8597   LearningRate 0.0008   Epoch: 8   Global Step: 14360   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:38:51,180-Speed 24569.03 samples/sec   Loss 3.8187   LearningRate 0.0008   Epoch: 8   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:39:01,064-Speed 24867.62 samples/sec   Loss 3.8027   LearningRate 0.0008   Epoch: 8   Global Step: 14380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:39:10,958-Speed 24842.19 samples/sec   Loss 3.7951   LearningRate 0.0008   Epoch: 8   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:39:20,779-Speed 25025.52 samples/sec   Loss 3.7770   LearningRate 0.0008   Epoch: 8   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:39:30,587-Speed 25061.07 samples/sec   Loss 3.8071   LearningRate 0.0008   Epoch: 8   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:39:40,418-Speed 25007.46 samples/sec   Loss 3.7893   LearningRate 0.0008   Epoch: 8   Global Step: 14420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:39:50,206-Speed 25112.12 samples/sec   Loss 3.8100   LearningRate 0.0008   Epoch: 8   Global Step: 14430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:40:00,228-Speed 24523.16 samples/sec   Loss 3.7826   LearningRate 0.0008   Epoch: 8   Global Step: 14440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:40:10,298-Speed 24407.26 samples/sec   Loss 3.7731   LearningRate 0.0008   Epoch: 8   Global Step: 14450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:40:20,315-Speed 24537.68 samples/sec   Loss 3.7979   LearningRate 0.0008   Epoch: 8   Global Step: 14460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:40:30,308-Speed 24596.12 samples/sec   Loss 3.8110   LearningRate 0.0008   Epoch: 8   Global Step: 14470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:40:40,396-Speed 24363.13 samples/sec   Loss 3.8005   LearningRate 0.0008   Epoch: 8   Global Step: 14480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:40:50,403-Speed 24561.29 samples/sec   Loss 3.7814   LearningRate 0.0008   Epoch: 8   Global Step: 14490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:00,376-Speed 24643.75 samples/sec   Loss 3.7522   LearningRate 0.0008   Epoch: 8   Global Step: 14500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:10,205-Speed 25006.77 samples/sec   Loss 3.8140   LearningRate 0.0008   Epoch: 8   Global Step: 14510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:19,937-Speed 25256.13 samples/sec   Loss 3.8184   LearningRate 0.0008   Epoch: 8   Global Step: 14520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:29,772-Speed 24992.04 samples/sec   Loss 3.7823   LearningRate 0.0008   Epoch: 8   Global Step: 14530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:39,608-Speed 24989.08 samples/sec   Loss 3.7869   LearningRate 0.0008   Epoch: 8   Global Step: 14540   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:49,335-Speed 25267.75 samples/sec   Loss 3.7520   LearningRate 0.0008   Epoch: 8   Global Step: 14550   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:41:59,303-Speed 24663.76 samples/sec   Loss 3.7739   LearningRate 0.0008   Epoch: 8   Global Step: 14560   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:42:09,059-Speed 25194.16 samples/sec   Loss 3.7741   LearningRate 0.0008   Epoch: 8   Global Step: 14570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:42:19,071-Speed 24548.39 samples/sec   Loss 3.7687   LearningRate 0.0008   Epoch: 8   Global Step: 14580   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:42:29,261-Speed 24125.37 samples/sec   Loss 3.7813   LearningRate 0.0008   Epoch: 8   Global Step: 14590   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:42:39,451-Speed 24128.55 samples/sec   Loss 3.8034   LearningRate 0.0008   Epoch: 8   Global Step: 14600   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:42:49,512-Speed 24430.40 samples/sec   Loss 3.7719   LearningRate 0.0008   Epoch: 8   Global Step: 14610   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:42:59,588-Speed 24394.57 samples/sec   Loss 3.7813   LearningRate 0.0008   Epoch: 8   Global Step: 14620   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:43:09,660-Speed 24403.17 samples/sec   Loss 3.7635   LearningRate 0.0008   Epoch: 8   Global Step: 14630   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:43:19,747-Speed 24364.23 samples/sec   Loss 3.7691   LearningRate 0.0008   Epoch: 8   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:43:29,898-Speed 24212.44 samples/sec   Loss 3.7563   LearningRate 0.0008   Epoch: 8   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:43:39,952-Speed 24447.44 samples/sec   Loss 3.7880   LearningRate 0.0008   Epoch: 8   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:43:50,150-Speed 24100.34 samples/sec   Loss 3.7867   LearningRate 0.0008   Epoch: 8   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:44:00,236-Speed 24376.77 samples/sec   Loss 3.7434   LearningRate 0.0008   Epoch: 8   Global Step: 14680   Fp16 Grad Scale: 262144   Required: 15 hours
Training: 2022-03-26 02:44:10,309-Speed 24400.31 samples/sec   Loss 3.7230   LearningRate 0.0008   Epoch: 8   Global Step: 14690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:44:20,389-Speed 24382.19 samples/sec   Loss 3.7192   LearningRate 0.0008   Epoch: 8   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:44:30,465-Speed 24392.41 samples/sec   Loss 3.7501   LearningRate 0.0008   Epoch: 8   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:44:40,502-Speed 24487.07 samples/sec   Loss 3.7627   LearningRate 0.0008   Epoch: 8   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:44:50,588-Speed 24368.43 samples/sec   Loss 3.7497   LearningRate 0.0008   Epoch: 8   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:45:00,706-Speed 24293.43 samples/sec   Loss 3.7436   LearningRate 0.0008   Epoch: 8   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:45:10,798-Speed 24354.35 samples/sec   Loss 3.7419   LearningRate 0.0008   Epoch: 8   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:45:20,888-Speed 24357.15 samples/sec   Loss 3.7032   LearningRate 0.0008   Epoch: 8   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:45:30,950-Speed 24426.61 samples/sec   Loss 3.7331   LearningRate 0.0008   Epoch: 8   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:45:40,952-Speed 24575.21 samples/sec   Loss 3.7303   LearningRate 0.0008   Epoch: 8   Global Step: 14780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:45:50,785-Speed 24993.52 samples/sec   Loss 3.7397   LearningRate 0.0008   Epoch: 8   Global Step: 14790   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:46:00,586-Speed 25078.60 samples/sec   Loss 3.7116   LearningRate 0.0008   Epoch: 8   Global Step: 14800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:46:10,334-Speed 25216.87 samples/sec   Loss 3.7671   LearningRate 0.0008   Epoch: 8   Global Step: 14810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:46:20,093-Speed 25183.89 samples/sec   Loss 3.7466   LearningRate 0.0008   Epoch: 8   Global Step: 14820   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:46:29,823-Speed 25262.61 samples/sec   Loss 3.7333   LearningRate 0.0008   Epoch: 8   Global Step: 14830   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:46:39,578-Speed 25194.89 samples/sec   Loss 3.7083   LearningRate 0.0008   Epoch: 8   Global Step: 14840   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:46:49,405-Speed 25011.12 samples/sec   Loss 3.6960   LearningRate 0.0008   Epoch: 8   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:46:59,104-Speed 25341.43 samples/sec   Loss 3.7007   LearningRate 0.0008   Epoch: 8   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:47:08,878-Speed 25148.12 samples/sec   Loss 3.7545   LearningRate 0.0008   Epoch: 8   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:47:18,667-Speed 25109.12 samples/sec   Loss 3.7929   LearningRate 0.0008   Epoch: 8   Global Step: 14880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:47:28,540-Speed 24893.96 samples/sec   Loss 3.7056   LearningRate 0.0008   Epoch: 8   Global Step: 14890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:47:38,368-Speed 25006.97 samples/sec   Loss 3.6920   LearningRate 0.0008   Epoch: 8   Global Step: 14900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:47:48,138-Speed 25158.31 samples/sec   Loss 3.7382   LearningRate 0.0008   Epoch: 8   Global Step: 14910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:47:57,898-Speed 25183.25 samples/sec   Loss 3.7458   LearningRate 0.0008   Epoch: 8   Global Step: 14920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:48:07,699-Speed 25075.97 samples/sec   Loss 3.6889   LearningRate 0.0008   Epoch: 8   Global Step: 14930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:48:17,471-Speed 25152.72 samples/sec   Loss 3.6965   LearningRate 0.0008   Epoch: 8   Global Step: 14940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:48:27,190-Speed 25289.41 samples/sec   Loss 3.7136   LearningRate 0.0008   Epoch: 8   Global Step: 14950   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:48:36,961-Speed 25153.41 samples/sec   Loss 3.7224   LearningRate 0.0008   Epoch: 8   Global Step: 14960   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:48:46,755-Speed 25094.24 samples/sec   Loss 3.6817   LearningRate 0.0008   Epoch: 8   Global Step: 14970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:48:56,595-Speed 24980.00 samples/sec   Loss 3.6657   LearningRate 0.0008   Epoch: 8   Global Step: 14980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:49:06,441-Speed 24963.01 samples/sec   Loss 3.7167   LearningRate 0.0008   Epoch: 8   Global Step: 14990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:49:16,203-Speed 25178.10 samples/sec   Loss 3.6801   LearningRate 0.0008   Epoch: 8   Global Step: 15000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:49:25,921-Speed 25291.08 samples/sec   Loss 3.7028   LearningRate 0.0008   Epoch: 8   Global Step: 15010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:49:35,651-Speed 25257.63 samples/sec   Loss 3.6936   LearningRate 0.0008   Epoch: 8   Global Step: 15020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:49:45,382-Speed 25260.09 samples/sec   Loss 3.6745   LearningRate 0.0008   Epoch: 8   Global Step: 15030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:49:55,199-Speed 25034.20 samples/sec   Loss 3.6887   LearningRate 0.0008   Epoch: 8   Global Step: 15040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:50:04,976-Speed 25139.79 samples/sec   Loss 3.6938   LearningRate 0.0008   Epoch: 8   Global Step: 15050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:50:14,680-Speed 25327.32 samples/sec   Loss 3.6447   LearningRate 0.0008   Epoch: 8   Global Step: 15060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:50:24,511-Speed 25002.61 samples/sec   Loss 3.7039   LearningRate 0.0008   Epoch: 8   Global Step: 15070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:50:34,209-Speed 25343.38 samples/sec   Loss 3.7353   LearningRate 0.0008   Epoch: 8   Global Step: 15080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:50:44,120-Speed 24797.74 samples/sec   Loss 3.7074   LearningRate 0.0008   Epoch: 8   Global Step: 15090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:50:53,896-Speed 25141.83 samples/sec   Loss 3.6889   LearningRate 0.0008   Epoch: 8   Global Step: 15100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:51:03,675-Speed 25134.27 samples/sec   Loss 3.7079   LearningRate 0.0008   Epoch: 8   Global Step: 15110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:51:13,435-Speed 25182.60 samples/sec   Loss 3.7334   LearningRate 0.0008   Epoch: 8   Global Step: 15120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:51:23,229-Speed 25102.42 samples/sec   Loss 3.6983   LearningRate 0.0008   Epoch: 8   Global Step: 15130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:51:32,997-Speed 25164.22 samples/sec   Loss 3.6790   LearningRate 0.0008   Epoch: 8   Global Step: 15140   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:51:42,801-Speed 25070.06 samples/sec   Loss 3.6835   LearningRate 0.0008   Epoch: 8   Global Step: 15150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:51:52,699-Speed 24831.19 samples/sec   Loss 3.6716   LearningRate 0.0008   Epoch: 8   Global Step: 15160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:52:02,406-Speed 25323.23 samples/sec   Loss 3.6744   LearningRate 0.0008   Epoch: 8   Global Step: 15170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:52:12,179-Speed 25149.93 samples/sec   Loss 3.6643   LearningRate 0.0008   Epoch: 8   Global Step: 15180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:52:21,948-Speed 25159.69 samples/sec   Loss 3.6516   LearningRate 0.0008   Epoch: 8   Global Step: 15190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:52:31,666-Speed 25291.32 samples/sec   Loss 3.6752   LearningRate 0.0008   Epoch: 8   Global Step: 15200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:52:41,474-Speed 25060.83 samples/sec   Loss 3.6775   LearningRate 0.0008   Epoch: 8   Global Step: 15210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:52:51,209-Speed 25247.44 samples/sec   Loss 3.7118   LearningRate 0.0008   Epoch: 8   Global Step: 15220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:53:01,075-Speed 24917.95 samples/sec   Loss 3.6881   LearningRate 0.0008   Epoch: 8   Global Step: 15230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:53:10,782-Speed 25320.01 samples/sec   Loss 3.6385   LearningRate 0.0008   Epoch: 8   Global Step: 15240   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:53:20,596-Speed 25043.85 samples/sec   Loss 3.6801   LearningRate 0.0007   Epoch: 8   Global Step: 15250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:53:30,365-Speed 25164.24 samples/sec   Loss 3.6955   LearningRate 0.0007   Epoch: 8   Global Step: 15260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:53:40,058-Speed 25356.96 samples/sec   Loss 3.6569   LearningRate 0.0007   Epoch: 8   Global Step: 15270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:53:49,877-Speed 25032.87 samples/sec   Loss 3.6764   LearningRate 0.0007   Epoch: 8   Global Step: 15280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:53:59,635-Speed 25185.90 samples/sec   Loss 3.6577   LearningRate 0.0007   Epoch: 8   Global Step: 15290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:54:09,389-Speed 25200.55 samples/sec   Loss 3.6485   LearningRate 0.0007   Epoch: 8   Global Step: 15300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:54:19,192-Speed 25071.86 samples/sec   Loss 3.6554   LearningRate 0.0007   Epoch: 8   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:54:28,834-Speed 25491.21 samples/sec   Loss 3.6525   LearningRate 0.0007   Epoch: 8   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:54:38,558-Speed 25278.12 samples/sec   Loss 3.6484   LearningRate 0.0007   Epoch: 8   Global Step: 15330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:54:48,302-Speed 25227.55 samples/sec   Loss 3.6362   LearningRate 0.0007   Epoch: 8   Global Step: 15340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:54:58,078-Speed 25144.65 samples/sec   Loss 3.6567   LearningRate 0.0007   Epoch: 8   Global Step: 15350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:55:07,843-Speed 25169.14 samples/sec   Loss 3.6487   LearningRate 0.0007   Epoch: 8   Global Step: 15360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:55:17,680-Speed 24988.08 samples/sec   Loss 3.6287   LearningRate 0.0007   Epoch: 8   Global Step: 15370   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:55:27,382-Speed 25335.95 samples/sec   Loss 3.6575   LearningRate 0.0007   Epoch: 8   Global Step: 15380   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:55:37,293-Speed 24800.64 samples/sec   Loss 3.6451   LearningRate 0.0007   Epoch: 8   Global Step: 15390   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:55:47,151-Speed 24932.64 samples/sec   Loss 3.6442   LearningRate 0.0007   Epoch: 8   Global Step: 15400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:55:57,094-Speed 24718.86 samples/sec   Loss 3.6499   LearningRate 0.0007   Epoch: 8   Global Step: 15410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:56:07,007-Speed 24793.07 samples/sec   Loss 3.6162   LearningRate 0.0007   Epoch: 8   Global Step: 15420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:56:16,697-Speed 25366.30 samples/sec   Loss 3.6191   LearningRate 0.0007   Epoch: 8   Global Step: 15430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:56:26,412-Speed 25301.62 samples/sec   Loss 3.6449   LearningRate 0.0007   Epoch: 8   Global Step: 15440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:56:36,142-Speed 25260.92 samples/sec   Loss 3.6266   LearningRate 0.0007   Epoch: 8   Global Step: 15450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:56:45,903-Speed 25189.23 samples/sec   Loss 3.6670   LearningRate 0.0007   Epoch: 8   Global Step: 15460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:56:55,689-Speed 25119.13 samples/sec   Loss 3.6624   LearningRate 0.0007   Epoch: 8   Global Step: 15470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:57:05,525-Speed 24987.87 samples/sec   Loss 3.6319   LearningRate 0.0007   Epoch: 8   Global Step: 15480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:57:15,274-Speed 25211.04 samples/sec   Loss 3.6439   LearningRate 0.0007   Epoch: 8   Global Step: 15490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:57:25,097-Speed 25022.45 samples/sec   Loss 3.6710   LearningRate 0.0007   Epoch: 8   Global Step: 15500   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:57:35,067-Speed 24653.26 samples/sec   Loss 3.6369   LearningRate 0.0007   Epoch: 8   Global Step: 15510   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:57:44,932-Speed 24915.54 samples/sec   Loss 3.6461   LearningRate 0.0007   Epoch: 8   Global Step: 15520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:57:54,701-Speed 25158.56 samples/sec   Loss 3.6745   LearningRate 0.0007   Epoch: 8   Global Step: 15530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:58:04,518-Speed 25038.50 samples/sec   Loss 3.6548   LearningRate 0.0007   Epoch: 8   Global Step: 15540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:58:14,379-Speed 24924.53 samples/sec   Loss 3.6490   LearningRate 0.0007   Epoch: 8   Global Step: 15550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:59:14,502-Speed 4087.72 samples/sec   Loss 3.6441   LearningRate 0.0007   Epoch: 9   Global Step: 15560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:59:24,210-Speed 25320.27 samples/sec   Loss 3.5868   LearningRate 0.0007   Epoch: 9   Global Step: 15570   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 02:59:33,974-Speed 25172.88 samples/sec   Loss 3.5680   LearningRate 0.0007   Epoch: 9   Global Step: 15580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:59:43,811-Speed 24985.22 samples/sec   Loss 3.5681   LearningRate 0.0007   Epoch: 9   Global Step: 15590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 02:59:53,651-Speed 24979.20 samples/sec   Loss 3.5854   LearningRate 0.0007   Epoch: 9   Global Step: 15600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:00:03,445-Speed 25101.45 samples/sec   Loss 3.6064   LearningRate 0.0007   Epoch: 9   Global Step: 15610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:00:13,277-Speed 24998.75 samples/sec   Loss 3.5929   LearningRate 0.0007   Epoch: 9   Global Step: 15620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:00:23,076-Speed 25083.46 samples/sec   Loss 3.6022   LearningRate 0.0007   Epoch: 9   Global Step: 15630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:00:32,845-Speed 25160.80 samples/sec   Loss 3.5510   LearningRate 0.0007   Epoch: 9   Global Step: 15640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:00:42,567-Speed 25282.12 samples/sec   Loss 3.6098   LearningRate 0.0007   Epoch: 9   Global Step: 15650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:00:52,399-Speed 24998.20 samples/sec   Loss 3.5605   LearningRate 0.0007   Epoch: 9   Global Step: 15660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:01:02,152-Speed 25206.25 samples/sec   Loss 3.5943   LearningRate 0.0007   Epoch: 9   Global Step: 15670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:01:11,820-Speed 25424.51 samples/sec   Loss 3.6076   LearningRate 0.0007   Epoch: 9   Global Step: 15680   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:01:21,562-Speed 25229.99 samples/sec   Loss 3.5713   LearningRate 0.0007   Epoch: 9   Global Step: 15690   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:01:31,241-Speed 25396.71 samples/sec   Loss 3.5960   LearningRate 0.0007   Epoch: 9   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:01:40,977-Speed 25244.95 samples/sec   Loss 3.5886   LearningRate 0.0007   Epoch: 9   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:01:50,685-Speed 25320.07 samples/sec   Loss 3.5696   LearningRate 0.0007   Epoch: 9   Global Step: 15720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:00,458-Speed 25148.55 samples/sec   Loss 3.5917   LearningRate 0.0007   Epoch: 9   Global Step: 15730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:10,341-Speed 24870.09 samples/sec   Loss 3.6016   LearningRate 0.0007   Epoch: 9   Global Step: 15740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:20,057-Speed 25298.53 samples/sec   Loss 3.6090   LearningRate 0.0007   Epoch: 9   Global Step: 15750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:29,957-Speed 24825.32 samples/sec   Loss 3.6045   LearningRate 0.0007   Epoch: 9   Global Step: 15760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:39,779-Speed 25026.66 samples/sec   Loss 3.5775   LearningRate 0.0007   Epoch: 9   Global Step: 15770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:49,536-Speed 25189.44 samples/sec   Loss 3.5943   LearningRate 0.0007   Epoch: 9   Global Step: 15780   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:02:59,326-Speed 25106.57 samples/sec   Loss 3.5538   LearningRate 0.0007   Epoch: 9   Global Step: 15790   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:03:09,107-Speed 25129.83 samples/sec   Loss 3.5863   LearningRate 0.0007   Epoch: 9   Global Step: 15800   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:03:19,009-Speed 24822.40 samples/sec   Loss 3.5850   LearningRate 0.0007   Epoch: 9   Global Step: 15810   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:03:28,994-Speed 24617.53 samples/sec   Loss 3.5695   LearningRate 0.0007   Epoch: 9   Global Step: 15820   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:03:38,878-Speed 24867.27 samples/sec   Loss 3.5683   LearningRate 0.0007   Epoch: 9   Global Step: 15830   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:03:48,551-Speed 25409.26 samples/sec   Loss 3.5756   LearningRate 0.0007   Epoch: 9   Global Step: 15840   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:03:58,266-Speed 25308.00 samples/sec   Loss 3.5401   LearningRate 0.0007   Epoch: 9   Global Step: 15850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:04:08,051-Speed 25119.21 samples/sec   Loss 3.5696   LearningRate 0.0007   Epoch: 9   Global Step: 15860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:04:17,860-Speed 25057.85 samples/sec   Loss 3.6129   LearningRate 0.0007   Epoch: 9   Global Step: 15870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:04:27,688-Speed 25011.32 samples/sec   Loss 3.6185   LearningRate 0.0007   Epoch: 9   Global Step: 15880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:04:37,429-Speed 25231.30 samples/sec   Loss 3.5756   LearningRate 0.0007   Epoch: 9   Global Step: 15890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:04:47,229-Speed 25079.49 samples/sec   Loss 3.5473   LearningRate 0.0007   Epoch: 9   Global Step: 15900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:04:56,981-Speed 25206.56 samples/sec   Loss 3.5526   LearningRate 0.0007   Epoch: 9   Global Step: 15910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:05:06,736-Speed 25197.36 samples/sec   Loss 3.5999   LearningRate 0.0007   Epoch: 9   Global Step: 15920   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:05:16,551-Speed 25041.76 samples/sec   Loss 3.5770   LearningRate 0.0007   Epoch: 9   Global Step: 15930   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:05:26,315-Speed 25172.86 samples/sec   Loss 3.6052   LearningRate 0.0007   Epoch: 9   Global Step: 15940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:05:36,150-Speed 24992.71 samples/sec   Loss 3.5901   LearningRate 0.0007   Epoch: 9   Global Step: 15950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:05:45,905-Speed 25198.66 samples/sec   Loss 3.5432   LearningRate 0.0007   Epoch: 9   Global Step: 15960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:05:55,751-Speed 24964.61 samples/sec   Loss 3.5398   LearningRate 0.0007   Epoch: 9   Global Step: 15970   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:06:05,673-Speed 24773.63 samples/sec   Loss 3.5431   LearningRate 0.0007   Epoch: 9   Global Step: 15980   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:06:15,436-Speed 25176.95 samples/sec   Loss 3.5514   LearningRate 0.0007   Epoch: 9   Global Step: 15990   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:06:25,240-Speed 25070.74 samples/sec   Loss 3.5379   LearningRate 0.0007   Epoch: 9   Global Step: 16000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:06:35,034-Speed 25097.09 samples/sec   Loss 3.5309   LearningRate 0.0007   Epoch: 9   Global Step: 16010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:06:44,881-Speed 24961.15 samples/sec   Loss 3.5691   LearningRate 0.0007   Epoch: 9   Global Step: 16020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:06:54,755-Speed 24893.28 samples/sec   Loss 3.5359   LearningRate 0.0007   Epoch: 9   Global Step: 16030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:07:04,587-Speed 24999.47 samples/sec   Loss 3.5690   LearningRate 0.0007   Epoch: 9   Global Step: 16040   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:07:14,341-Speed 25201.69 samples/sec   Loss 3.5378   LearningRate 0.0007   Epoch: 9   Global Step: 16050   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:07:24,132-Speed 25104.14 samples/sec   Loss 3.5177   LearningRate 0.0007   Epoch: 9   Global Step: 16060   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:07:33,922-Speed 25106.00 samples/sec   Loss 3.5730   LearningRate 0.0007   Epoch: 9   Global Step: 16070   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:07:43,723-Speed 25078.82 samples/sec   Loss 3.5489   LearningRate 0.0007   Epoch: 9   Global Step: 16080   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:07:53,534-Speed 25054.51 samples/sec   Loss 3.5346   LearningRate 0.0007   Epoch: 9   Global Step: 16090   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:08:03,279-Speed 25222.09 samples/sec   Loss 3.5339   LearningRate 0.0007   Epoch: 9   Global Step: 16100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:08:13,121-Speed 24972.74 samples/sec   Loss 3.5019   LearningRate 0.0007   Epoch: 9   Global Step: 16110   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:08:22,858-Speed 25248.60 samples/sec   Loss 3.5400   LearningRate 0.0007   Epoch: 9   Global Step: 16120   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:08:32,705-Speed 24962.02 samples/sec   Loss 3.5584   LearningRate 0.0007   Epoch: 9   Global Step: 16130   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:08:42,583-Speed 24883.86 samples/sec   Loss 3.5348   LearningRate 0.0007   Epoch: 9   Global Step: 16140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:08:52,303-Speed 25287.61 samples/sec   Loss 3.5363   LearningRate 0.0007   Epoch: 9   Global Step: 16150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:09:02,093-Speed 25110.43 samples/sec   Loss 3.5282   LearningRate 0.0007   Epoch: 9   Global Step: 16160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:09:11,920-Speed 25011.38 samples/sec   Loss 3.5282   LearningRate 0.0007   Epoch: 9   Global Step: 16170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:09:21,674-Speed 25197.78 samples/sec   Loss 3.5182   LearningRate 0.0007   Epoch: 9   Global Step: 16180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:09:31,571-Speed 24834.81 samples/sec   Loss 3.5348   LearningRate 0.0007   Epoch: 9   Global Step: 16190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:09:41,486-Speed 24791.23 samples/sec   Loss 3.5445   LearningRate 0.0007   Epoch: 9   Global Step: 16200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:09:51,179-Speed 25356.99 samples/sec   Loss 3.5194   LearningRate 0.0007   Epoch: 9   Global Step: 16210   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:10:00,885-Speed 25321.73 samples/sec   Loss 3.5180   LearningRate 0.0007   Epoch: 9   Global Step: 16220   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:10:10,661-Speed 25142.62 samples/sec   Loss 3.5372   LearningRate 0.0007   Epoch: 9   Global Step: 16230   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:10:20,504-Speed 24972.12 samples/sec   Loss 3.5486   LearningRate 0.0007   Epoch: 9   Global Step: 16240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:10:30,314-Speed 25055.74 samples/sec   Loss 3.5492   LearningRate 0.0007   Epoch: 9   Global Step: 16250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:10:40,071-Speed 25190.99 samples/sec   Loss 3.5013   LearningRate 0.0007   Epoch: 9   Global Step: 16260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:10:49,872-Speed 25075.99 samples/sec   Loss 3.5129   LearningRate 0.0007   Epoch: 9   Global Step: 16270   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:10:59,540-Speed 25425.83 samples/sec   Loss 3.5030   LearningRate 0.0007   Epoch: 9   Global Step: 16280   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:11:09,362-Speed 25024.22 samples/sec   Loss 3.5164   LearningRate 0.0007   Epoch: 9   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:11:19,175-Speed 25048.14 samples/sec   Loss 3.5519   LearningRate 0.0007   Epoch: 9   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:11:28,973-Speed 25086.62 samples/sec   Loss 3.5333   LearningRate 0.0007   Epoch: 9   Global Step: 16310   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:11:38,666-Speed 25356.26 samples/sec   Loss 3.5194   LearningRate 0.0007   Epoch: 9   Global Step: 16320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:11:48,468-Speed 25076.75 samples/sec   Loss 3.5213   LearningRate 0.0007   Epoch: 9   Global Step: 16330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:11:58,151-Speed 25383.77 samples/sec   Loss 3.5094   LearningRate 0.0007   Epoch: 9   Global Step: 16340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:12:07,946-Speed 25093.16 samples/sec   Loss 3.5254   LearningRate 0.0007   Epoch: 9   Global Step: 16350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:12:17,733-Speed 25115.26 samples/sec   Loss 3.4925   LearningRate 0.0007   Epoch: 9   Global Step: 16360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:12:27,436-Speed 25330.74 samples/sec   Loss 3.4788   LearningRate 0.0007   Epoch: 9   Global Step: 16370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:12:37,176-Speed 25237.49 samples/sec   Loss 3.4760   LearningRate 0.0007   Epoch: 9   Global Step: 16380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:12:46,974-Speed 25085.88 samples/sec   Loss 3.5000   LearningRate 0.0007   Epoch: 9   Global Step: 16390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:12:56,936-Speed 24674.46 samples/sec   Loss 3.4841   LearningRate 0.0007   Epoch: 9   Global Step: 16400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:13:06,758-Speed 25024.22 samples/sec   Loss 3.4826   LearningRate 0.0007   Epoch: 9   Global Step: 16410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:13:16,565-Speed 25063.46 samples/sec   Loss 3.4806   LearningRate 0.0007   Epoch: 9   Global Step: 16420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:13:26,481-Speed 24786.46 samples/sec   Loss 3.5020   LearningRate 0.0007   Epoch: 9   Global Step: 16430   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:13:36,423-Speed 24723.35 samples/sec   Loss 3.5087   LearningRate 0.0007   Epoch: 9   Global Step: 16440   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:13:46,273-Speed 24954.28 samples/sec   Loss 3.4854   LearningRate 0.0007   Epoch: 9   Global Step: 16450   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:13:56,199-Speed 24766.14 samples/sec   Loss 3.5074   LearningRate 0.0007   Epoch: 9   Global Step: 16460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:14:05,993-Speed 25102.01 samples/sec   Loss 3.4830   LearningRate 0.0007   Epoch: 9   Global Step: 16470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:14:15,812-Speed 25037.89 samples/sec   Loss 3.5128   LearningRate 0.0007   Epoch: 9   Global Step: 16480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:14:25,642-Speed 25005.62 samples/sec   Loss 3.5047   LearningRate 0.0007   Epoch: 9   Global Step: 16490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:14:35,431-Speed 25108.51 samples/sec   Loss 3.4812   LearningRate 0.0007   Epoch: 9   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:14:45,246-Speed 25039.92 samples/sec   Loss 3.4815   LearningRate 0.0007   Epoch: 9   Global Step: 16510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:14:55,040-Speed 25097.54 samples/sec   Loss 3.4829   LearningRate 0.0007   Epoch: 9   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:15:04,923-Speed 24877.48 samples/sec   Loss 3.4708   LearningRate 0.0007   Epoch: 9   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:15:14,684-Speed 25181.96 samples/sec   Loss 3.4577   LearningRate 0.0007   Epoch: 9   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:15:24,500-Speed 25040.27 samples/sec   Loss 3.5109   LearningRate 0.0007   Epoch: 9   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:15:34,323-Speed 25023.32 samples/sec   Loss 3.4780   LearningRate 0.0007   Epoch: 9   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:15:44,118-Speed 25092.32 samples/sec   Loss 3.4836   LearningRate 0.0007   Epoch: 9   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:15:53,980-Speed 24923.32 samples/sec   Loss 3.4636   LearningRate 0.0007   Epoch: 9   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:16:03,767-Speed 25115.45 samples/sec   Loss 3.4982   LearningRate 0.0007   Epoch: 9   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:16:13,500-Speed 25253.91 samples/sec   Loss 3.4680   LearningRate 0.0007   Epoch: 9   Global Step: 16600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:16:23,364-Speed 24918.26 samples/sec   Loss 3.4590   LearningRate 0.0007   Epoch: 9   Global Step: 16610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:16:33,366-Speed 24574.61 samples/sec   Loss 3.4504   LearningRate 0.0007   Epoch: 9   Global Step: 16620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:16:43,309-Speed 24721.29 samples/sec   Loss 3.4552   LearningRate 0.0007   Epoch: 9   Global Step: 16630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:16:53,201-Speed 24849.37 samples/sec   Loss 3.4503   LearningRate 0.0007   Epoch: 9   Global Step: 16640   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:17:02,981-Speed 25131.00 samples/sec   Loss 3.4515   LearningRate 0.0007   Epoch: 9   Global Step: 16650   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:17:12,791-Speed 25058.41 samples/sec   Loss 3.4716   LearningRate 0.0007   Epoch: 9   Global Step: 16660   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:17:22,526-Speed 25247.96 samples/sec   Loss 3.4429   LearningRate 0.0007   Epoch: 9   Global Step: 16670   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:17:32,422-Speed 24837.55 samples/sec   Loss 3.4596   LearningRate 0.0007   Epoch: 9   Global Step: 16680   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:17:42,351-Speed 24755.07 samples/sec   Loss 3.4629   LearningRate 0.0007   Epoch: 9   Global Step: 16690   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:17:52,301-Speed 24701.66 samples/sec   Loss 3.4979   LearningRate 0.0007   Epoch: 9   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:18:02,052-Speed 25208.27 samples/sec   Loss 3.4627   LearningRate 0.0007   Epoch: 9   Global Step: 16710   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:18:11,871-Speed 25033.37 samples/sec   Loss 3.4730   LearningRate 0.0007   Epoch: 9   Global Step: 16720   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:18:21,706-Speed 24991.66 samples/sec   Loss 3.4408   LearningRate 0.0007   Epoch: 9   Global Step: 16730   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:18:31,473-Speed 25165.78 samples/sec   Loss 3.4367   LearningRate 0.0007   Epoch: 9   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:18:41,338-Speed 24914.58 samples/sec   Loss 3.4400   LearningRate 0.0007   Epoch: 9   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:18:51,155-Speed 25036.57 samples/sec   Loss 3.4462   LearningRate 0.0007   Epoch: 9   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:19:01,020-Speed 24914.99 samples/sec   Loss 3.4308   LearningRate 0.0007   Epoch: 9   Global Step: 16770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:19:10,823-Speed 25073.46 samples/sec   Loss 3.4172   LearningRate 0.0007   Epoch: 9   Global Step: 16780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:19:20,567-Speed 25227.43 samples/sec   Loss 3.4444   LearningRate 0.0007   Epoch: 9   Global Step: 16790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:19:30,312-Speed 25223.20 samples/sec   Loss 3.4331   LearningRate 0.0007   Epoch: 9   Global Step: 16800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:19:40,167-Speed 24942.23 samples/sec   Loss 3.4567   LearningRate 0.0007   Epoch: 9   Global Step: 16810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:19:50,026-Speed 24929.52 samples/sec   Loss 3.4350   LearningRate 0.0007   Epoch: 9   Global Step: 16820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:19:59,727-Speed 25335.86 samples/sec   Loss 3.4625   LearningRate 0.0007   Epoch: 9   Global Step: 16830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:20:09,461-Speed 25252.64 samples/sec   Loss 3.4191   LearningRate 0.0007   Epoch: 9   Global Step: 16840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:20:19,177-Speed 25298.51 samples/sec   Loss 3.4326   LearningRate 0.0007   Epoch: 9   Global Step: 16850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:20:28,845-Speed 25422.98 samples/sec   Loss 3.4223   LearningRate 0.0007   Epoch: 9   Global Step: 16860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-26 03:20:38,656-Speed 25052.15 samples/sec   Loss 3.4457   LearningRate 0.0007   Epoch: 9   Global Step: 16870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:20:48,389-Speed 25254.31 samples/sec   Loss 3.4540   LearningRate 0.0007   Epoch: 9   Global Step: 16880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:20:58,237-Speed 24958.17 samples/sec   Loss 3.4429   LearningRate 0.0007   Epoch: 9   Global Step: 16890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:21:08,115-Speed 24882.00 samples/sec   Loss 3.3960   LearningRate 0.0007   Epoch: 9   Global Step: 16900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:21:17,854-Speed 25237.47 samples/sec   Loss 3.3961   LearningRate 0.0007   Epoch: 9   Global Step: 16910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:21:27,608-Speed 25199.17 samples/sec   Loss 3.4239   LearningRate 0.0007   Epoch: 9   Global Step: 16920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:21:37,397-Speed 25108.68 samples/sec   Loss 3.4695   LearningRate 0.0007   Epoch: 9   Global Step: 16930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:21:47,207-Speed 25053.68 samples/sec   Loss 3.4916   LearningRate 0.0007   Epoch: 9   Global Step: 16940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:21:56,990-Speed 25126.37 samples/sec   Loss 3.4355   LearningRate 0.0007   Epoch: 9   Global Step: 16950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:22:06,818-Speed 25009.85 samples/sec   Loss 3.4355   LearningRate 0.0007   Epoch: 9   Global Step: 16960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:22:16,628-Speed 25056.85 samples/sec   Loss 3.4139   LearningRate 0.0007   Epoch: 9   Global Step: 16970   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:22:26,467-Speed 24982.79 samples/sec   Loss 3.4442   LearningRate 0.0007   Epoch: 9   Global Step: 16980   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:22:36,383-Speed 24786.72 samples/sec   Loss 3.4101   LearningRate 0.0007   Epoch: 9   Global Step: 16990   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:22:46,162-Speed 25135.84 samples/sec   Loss 3.4243   LearningRate 0.0007   Epoch: 9   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:22:56,144-Speed 24622.89 samples/sec   Loss 3.4181   LearningRate 0.0007   Epoch: 9   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:23:05,923-Speed 25136.28 samples/sec   Loss 3.4280   LearningRate 0.0007   Epoch: 9   Global Step: 17020   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:23:15,695-Speed 25152.91 samples/sec   Loss 3.4091   LearningRate 0.0007   Epoch: 9   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:23:25,564-Speed 24902.95 samples/sec   Loss 3.4289   LearningRate 0.0007   Epoch: 9   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:23:35,293-Speed 25276.12 samples/sec   Loss 3.4450   LearningRate 0.0007   Epoch: 9   Global Step: 17050   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:23:44,966-Speed 25410.19 samples/sec   Loss 3.4087   LearningRate 0.0007   Epoch: 9   Global Step: 17060   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:23:54,893-Speed 24763.58 samples/sec   Loss 3.4361   LearningRate 0.0007   Epoch: 9   Global Step: 17070   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:24:04,658-Speed 25178.92 samples/sec   Loss 3.4022   LearningRate 0.0007   Epoch: 9   Global Step: 17080   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:24:14,525-Speed 24909.85 samples/sec   Loss 3.4006   LearningRate 0.0007   Epoch: 9   Global Step: 17090   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:24:24,345-Speed 25031.08 samples/sec   Loss 3.3976   LearningRate 0.0007   Epoch: 9   Global Step: 17100   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:24:34,029-Speed 25379.81 samples/sec   Loss 3.4293   LearningRate 0.0007   Epoch: 9   Global Step: 17110   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:24:43,830-Speed 25078.66 samples/sec   Loss 3.4243   LearningRate 0.0007   Epoch: 9   Global Step: 17120   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:24:53,527-Speed 25350.30 samples/sec   Loss 3.4059   LearningRate 0.0007   Epoch: 9   Global Step: 17130   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:25:03,266-Speed 25235.48 samples/sec   Loss 3.4012   LearningRate 0.0007   Epoch: 9   Global Step: 17140   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:25:12,990-Speed 25278.28 samples/sec   Loss 3.4173   LearningRate 0.0007   Epoch: 9   Global Step: 17150   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:25:22,733-Speed 25228.29 samples/sec   Loss 3.4285   LearningRate 0.0007   Epoch: 9   Global Step: 17160   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:25:32,537-Speed 25069.58 samples/sec   Loss 3.3880   LearningRate 0.0007   Epoch: 9   Global Step: 17170   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:25:42,257-Speed 25286.28 samples/sec   Loss 3.3971   LearningRate 0.0007   Epoch: 9   Global Step: 17180   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:25:52,024-Speed 25168.95 samples/sec   Loss 3.4017   LearningRate 0.0007   Epoch: 9   Global Step: 17190   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:26:01,854-Speed 25004.53 samples/sec   Loss 3.4059   LearningRate 0.0007   Epoch: 9   Global Step: 17200   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:26:11,627-Speed 25148.94 samples/sec   Loss 3.4134   LearningRate 0.0007   Epoch: 9   Global Step: 17210   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:26:21,376-Speed 25212.39 samples/sec   Loss 3.4347   LearningRate 0.0007   Epoch: 9   Global Step: 17220   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:26:31,434-Speed 24437.21 samples/sec   Loss 3.3914   LearningRate 0.0007   Epoch: 9   Global Step: 17230   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:26:41,288-Speed 24942.42 samples/sec   Loss 3.4047   LearningRate 0.0007   Epoch: 9   Global Step: 17240   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:26:51,254-Speed 24665.04 samples/sec   Loss 3.4236   LearningRate 0.0007   Epoch: 9   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:27:01,081-Speed 25011.22 samples/sec   Loss 3.4293   LearningRate 0.0007   Epoch: 9   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:27:10,849-Speed 25161.59 samples/sec   Loss 3.4041   LearningRate 0.0007   Epoch: 9   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:27:20,597-Speed 25215.35 samples/sec   Loss 3.4245   LearningRate 0.0007   Epoch: 9   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:28:20,868-Speed 4077.67 samples/sec   Loss 3.4010   LearningRate 0.0007   Epoch: 10   Global Step: 17290   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:28:30,592-Speed 25277.36 samples/sec   Loss 3.3847   LearningRate 0.0007   Epoch: 10   Global Step: 17300   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:28:40,276-Speed 25381.63 samples/sec   Loss 3.3996   LearningRate 0.0007   Epoch: 10   Global Step: 17310   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:28:49,969-Speed 25360.91 samples/sec   Loss 3.3427   LearningRate 0.0007   Epoch: 10   Global Step: 17320   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:28:59,689-Speed 25285.04 samples/sec   Loss 3.3312   LearningRate 0.0007   Epoch: 10   Global Step: 17330   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:29:09,393-Speed 25331.47 samples/sec   Loss 3.3622   LearningRate 0.0007   Epoch: 10   Global Step: 17340   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:29:19,184-Speed 25103.20 samples/sec   Loss 3.3440   LearningRate 0.0007   Epoch: 10   Global Step: 17350   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:29:28,970-Speed 25116.90 samples/sec   Loss 3.3255   LearningRate 0.0007   Epoch: 10   Global Step: 17360   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:29:38,617-Speed 25479.29 samples/sec   Loss 3.3218   LearningRate 0.0007   Epoch: 10   Global Step: 17370   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:29:48,546-Speed 24754.19 samples/sec   Loss 3.3123   LearningRate 0.0007   Epoch: 10   Global Step: 17380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:29:58,346-Speed 25083.36 samples/sec   Loss 3.3355   LearningRate 0.0007   Epoch: 10   Global Step: 17390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:30:08,163-Speed 25036.11 samples/sec   Loss 3.3693   LearningRate 0.0007   Epoch: 10   Global Step: 17400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:30:17,881-Speed 25293.00 samples/sec   Loss 3.3675   LearningRate 0.0007   Epoch: 10   Global Step: 17410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:30:27,532-Speed 25468.95 samples/sec   Loss 3.3587   LearningRate 0.0007   Epoch: 10   Global Step: 17420   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:30:37,325-Speed 25097.89 samples/sec   Loss 3.3456   LearningRate 0.0007   Epoch: 10   Global Step: 17430   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:30:47,159-Speed 24994.10 samples/sec   Loss 3.3230   LearningRate 0.0007   Epoch: 10   Global Step: 17440   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:30:56,968-Speed 25059.35 samples/sec   Loss 3.3526   LearningRate 0.0007   Epoch: 10   Global Step: 17450   Fp16 Grad Scale: 131072   Required: 15 hours
Training: 2022-03-26 03:31:06,718-Speed 25208.27 samples/sec   Loss 3.3664   LearningRate 0.0007   Epoch: 10   Global Step: 17460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:31:16,422-Speed 25329.24 samples/sec   Loss 3.3669   LearningRate 0.0007   Epoch: 10   Global Step: 17470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:31:26,178-Speed 25194.55 samples/sec   Loss 3.4040   LearningRate 0.0007   Epoch: 10   Global Step: 17480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:31:35,917-Speed 25235.10 samples/sec   Loss 3.3845   LearningRate 0.0007   Epoch: 10   Global Step: 17490   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:31:45,642-Speed 25275.42 samples/sec   Loss 3.3497   LearningRate 0.0007   Epoch: 10   Global Step: 17500   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:31:55,396-Speed 25199.84 samples/sec   Loss 3.3584   LearningRate 0.0007   Epoch: 10   Global Step: 17510   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:32:05,180-Speed 25122.52 samples/sec   Loss 3.3462   LearningRate 0.0007   Epoch: 10   Global Step: 17520   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:32:14,833-Speed 25461.22 samples/sec   Loss 3.3728   LearningRate 0.0007   Epoch: 10   Global Step: 17530   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-26 03:32:24,564-Speed 25263.00 samples/sec   Loss 3.3789   LearningRate 0.0007   Epoch: 10   Global Step: 17540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:32:34,346-Speed 25125.93 samples/sec   Loss 3.3753   LearningRate 0.0007   Epoch: 10   Global Step: 17550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:32:44,110-Speed 25172.08 samples/sec   Loss 3.3254   LearningRate 0.0007   Epoch: 10   Global Step: 17560   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:32:53,859-Speed 25214.35 samples/sec   Loss 3.3272   LearningRate 0.0007   Epoch: 10   Global Step: 17570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:33:03,572-Speed 25303.90 samples/sec   Loss 3.3557   LearningRate 0.0007   Epoch: 10   Global Step: 17580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:33:13,258-Speed 25377.60 samples/sec   Loss 3.3413   LearningRate 0.0007   Epoch: 10   Global Step: 17590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:33:22,973-Speed 25298.31 samples/sec   Loss 3.3503   LearningRate 0.0007   Epoch: 10   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:33:32,788-Speed 25043.75 samples/sec   Loss 3.3338   LearningRate 0.0007   Epoch: 10   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:33:42,500-Speed 25305.63 samples/sec   Loss 3.3816   LearningRate 0.0007   Epoch: 10   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:33:52,211-Speed 25312.78 samples/sec   Loss 3.3521   LearningRate 0.0007   Epoch: 10   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:34:01,965-Speed 25199.25 samples/sec   Loss 3.3448   LearningRate 0.0007   Epoch: 10   Global Step: 17640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:34:11,643-Speed 25395.94 samples/sec   Loss 3.3006   LearningRate 0.0007   Epoch: 10   Global Step: 17650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:34:21,342-Speed 25340.79 samples/sec   Loss 3.3256   LearningRate 0.0007   Epoch: 10   Global Step: 17660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:34:31,117-Speed 25144.57 samples/sec   Loss 3.3202   LearningRate 0.0007   Epoch: 10   Global Step: 17670   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:34:40,823-Speed 25323.55 samples/sec   Loss 3.3547   LearningRate 0.0007   Epoch: 10   Global Step: 17680   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:34:50,525-Speed 25333.80 samples/sec   Loss 3.3530   LearningRate 0.0007   Epoch: 10   Global Step: 17690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:35:00,295-Speed 25161.30 samples/sec   Loss 3.3330   LearningRate 0.0007   Epoch: 10   Global Step: 17700   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:35:10,045-Speed 25209.49 samples/sec   Loss 3.3416   LearningRate 0.0007   Epoch: 10   Global Step: 17710   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:35:19,726-Speed 25390.06 samples/sec   Loss 3.3223   LearningRate 0.0007   Epoch: 10   Global Step: 17720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:35:29,432-Speed 25323.34 samples/sec   Loss 3.3367   LearningRate 0.0007   Epoch: 10   Global Step: 17730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:35:39,207-Speed 25146.60 samples/sec   Loss 3.3304   LearningRate 0.0007   Epoch: 10   Global Step: 17740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:35:48,932-Speed 25274.45 samples/sec   Loss 3.3243   LearningRate 0.0007   Epoch: 10   Global Step: 17750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:35:58,689-Speed 25195.80 samples/sec   Loss 3.3512   LearningRate 0.0007   Epoch: 10   Global Step: 17760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:36:08,502-Speed 25049.14 samples/sec   Loss 3.3303   LearningRate 0.0007   Epoch: 10   Global Step: 17770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:36:18,305-Speed 25072.42 samples/sec   Loss 3.3399   LearningRate 0.0007   Epoch: 10   Global Step: 17780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:36:28,086-Speed 25131.47 samples/sec   Loss 3.3393   LearningRate 0.0007   Epoch: 10   Global Step: 17790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:36:37,751-Speed 25436.66 samples/sec   Loss 3.3541   LearningRate 0.0007   Epoch: 10   Global Step: 17800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:36:47,526-Speed 25148.76 samples/sec   Loss 3.3173   LearningRate 0.0007   Epoch: 10   Global Step: 17810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:36:57,294-Speed 25165.65 samples/sec   Loss 3.3198   LearningRate 0.0007   Epoch: 10   Global Step: 17820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:37:07,265-Speed 24649.81 samples/sec   Loss 3.3063   LearningRate 0.0007   Epoch: 10   Global Step: 17830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:37:17,037-Speed 25153.02 samples/sec   Loss 3.3223   LearningRate 0.0007   Epoch: 10   Global Step: 17840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:37:26,739-Speed 25334.69 samples/sec   Loss 3.2830   LearningRate 0.0007   Epoch: 10   Global Step: 17850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:37:36,484-Speed 25221.43 samples/sec   Loss 3.3148   LearningRate 0.0007   Epoch: 10   Global Step: 17860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:37:46,300-Speed 25041.70 samples/sec   Loss 3.3254   LearningRate 0.0007   Epoch: 10   Global Step: 17870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:37:55,975-Speed 25405.10 samples/sec   Loss 3.2995   LearningRate 0.0007   Epoch: 10   Global Step: 17880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:38:05,774-Speed 25080.81 samples/sec   Loss 3.3011   LearningRate 0.0007   Epoch: 10   Global Step: 17890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:38:15,577-Speed 25073.38 samples/sec   Loss 3.3065   LearningRate 0.0007   Epoch: 10   Global Step: 17900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:38:25,327-Speed 25209.26 samples/sec   Loss 3.2956   LearningRate 0.0007   Epoch: 10   Global Step: 17910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:38:35,090-Speed 25177.93 samples/sec   Loss 3.3015   LearningRate 0.0007   Epoch: 10   Global Step: 17920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:38:44,948-Speed 24932.38 samples/sec   Loss 3.3092   LearningRate 0.0007   Epoch: 10   Global Step: 17930   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:38:54,848-Speed 24827.35 samples/sec   Loss 3.3309   LearningRate 0.0007   Epoch: 10   Global Step: 17940   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:39:04,581-Speed 25252.95 samples/sec   Loss 3.2962   LearningRate 0.0007   Epoch: 10   Global Step: 17950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:39:14,303-Speed 25283.70 samples/sec   Loss 3.3084   LearningRate 0.0007   Epoch: 10   Global Step: 17960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:39:24,089-Speed 25117.52 samples/sec   Loss 3.2876   LearningRate 0.0007   Epoch: 10   Global Step: 17970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:39:33,855-Speed 25168.40 samples/sec   Loss 3.3003   LearningRate 0.0007   Epoch: 10   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:39:43,678-Speed 25024.34 samples/sec   Loss 3.2913   LearningRate 0.0007   Epoch: 10   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:39:53,379-Speed 25338.83 samples/sec   Loss 3.3282   LearningRate 0.0007   Epoch: 10   Global Step: 18000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:40:03,012-Speed 25515.05 samples/sec   Loss 3.2915   LearningRate 0.0007   Epoch: 10   Global Step: 18010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:40:12,817-Speed 25069.49 samples/sec   Loss 3.2798   LearningRate 0.0007   Epoch: 10   Global Step: 18020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:40:22,515-Speed 25343.93 samples/sec   Loss 3.2742   LearningRate 0.0007   Epoch: 10   Global Step: 18030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:40:32,333-Speed 25033.50 samples/sec   Loss 3.3120   LearningRate 0.0007   Epoch: 10   Global Step: 18040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:40:42,231-Speed 24833.97 samples/sec   Loss 3.3008   LearningRate 0.0007   Epoch: 10   Global Step: 18050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:40:52,047-Speed 25039.97 samples/sec   Loss 3.2849   LearningRate 0.0007   Epoch: 10   Global Step: 18060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:41:01,820-Speed 25151.55 samples/sec   Loss 3.3317   LearningRate 0.0007   Epoch: 10   Global Step: 18070   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:41:11,636-Speed 25044.08 samples/sec   Loss 3.2941   LearningRate 0.0007   Epoch: 10   Global Step: 18080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:41:21,304-Speed 25422.89 samples/sec   Loss 3.2860   LearningRate 0.0007   Epoch: 10   Global Step: 18090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:41:30,977-Speed 25411.27 samples/sec   Loss 3.2750   LearningRate 0.0007   Epoch: 10   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:41:40,654-Speed 25400.02 samples/sec   Loss 3.2898   LearningRate 0.0007   Epoch: 10   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:41:50,372-Speed 25292.49 samples/sec   Loss 3.2869   LearningRate 0.0007   Epoch: 10   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:00,061-Speed 25369.70 samples/sec   Loss 3.3113   LearningRate 0.0007   Epoch: 10   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:09,813-Speed 25205.91 samples/sec   Loss 3.2555   LearningRate 0.0007   Epoch: 10   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:19,573-Speed 25183.05 samples/sec   Loss 3.2869   LearningRate 0.0007   Epoch: 10   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:29,288-Speed 25300.90 samples/sec   Loss 3.2910   LearningRate 0.0007   Epoch: 10   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:39,023-Speed 25251.14 samples/sec   Loss 3.2579   LearningRate 0.0007   Epoch: 10   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:48,706-Speed 25385.19 samples/sec   Loss 3.2653   LearningRate 0.0007   Epoch: 10   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:42:58,478-Speed 25152.72 samples/sec   Loss 3.2926   LearningRate 0.0007   Epoch: 10   Global Step: 18190   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:43:08,210-Speed 25257.13 samples/sec   Loss 3.2797   LearningRate 0.0007   Epoch: 10   Global Step: 18200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:43:17,982-Speed 25151.16 samples/sec   Loss 3.2819   LearningRate 0.0007   Epoch: 10   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:43:27,753-Speed 25155.48 samples/sec   Loss 3.2740   LearningRate 0.0007   Epoch: 10   Global Step: 18220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:43:37,432-Speed 25394.77 samples/sec   Loss 3.2923   LearningRate 0.0007   Epoch: 10   Global Step: 18230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:43:47,127-Speed 25351.31 samples/sec   Loss 3.2541   LearningRate 0.0007   Epoch: 10   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:43:56,832-Speed 25329.24 samples/sec   Loss 3.2437   LearningRate 0.0007   Epoch: 10   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:44:06,608-Speed 25140.70 samples/sec   Loss 3.2610   LearningRate 0.0007   Epoch: 10   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:44:16,364-Speed 25195.04 samples/sec   Loss 3.2829   LearningRate 0.0007   Epoch: 10   Global Step: 18270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:44:26,232-Speed 24906.71 samples/sec   Loss 3.2585   LearningRate 0.0007   Epoch: 10   Global Step: 18280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:44:35,936-Speed 25330.54 samples/sec   Loss 3.2616   LearningRate 0.0007   Epoch: 10   Global Step: 18290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:44:45,742-Speed 25066.54 samples/sec   Loss 3.2786   LearningRate 0.0007   Epoch: 10   Global Step: 18300   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:44:55,520-Speed 25136.43 samples/sec   Loss 3.3015   LearningRate 0.0007   Epoch: 10   Global Step: 18310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:45:05,323-Speed 25072.20 samples/sec   Loss 3.2509   LearningRate 0.0007   Epoch: 10   Global Step: 18320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:45:15,091-Speed 25164.17 samples/sec   Loss 3.2604   LearningRate 0.0007   Epoch: 10   Global Step: 18330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:45:24,881-Speed 25106.25 samples/sec   Loss 3.2736   LearningRate 0.0007   Epoch: 10   Global Step: 18340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:45:34,609-Speed 25267.45 samples/sec   Loss 3.2480   LearningRate 0.0007   Epoch: 10   Global Step: 18350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:45:44,333-Speed 25274.84 samples/sec   Loss 3.2423   LearningRate 0.0007   Epoch: 10   Global Step: 18360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:45:54,176-Speed 24971.45 samples/sec   Loss 3.2610   LearningRate 0.0007   Epoch: 10   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:46:03,893-Speed 25297.65 samples/sec   Loss 3.2422   LearningRate 0.0007   Epoch: 10   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:46:13,622-Speed 25261.87 samples/sec   Loss 3.2766   LearningRate 0.0007   Epoch: 10   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:46:23,457-Speed 24992.23 samples/sec   Loss 3.2440   LearningRate 0.0007   Epoch: 10   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:46:33,243-Speed 25118.36 samples/sec   Loss 3.2671   LearningRate 0.0007   Epoch: 10   Global Step: 18410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:46:43,223-Speed 24626.09 samples/sec   Loss 3.2554   LearningRate 0.0007   Epoch: 10   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:46:52,955-Speed 25257.67 samples/sec   Loss 3.2519   LearningRate 0.0007   Epoch: 10   Global Step: 18430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:47:02,721-Speed 25171.79 samples/sec   Loss 3.2454   LearningRate 0.0007   Epoch: 10   Global Step: 18440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:47:12,370-Speed 25474.52 samples/sec   Loss 3.2460   LearningRate 0.0007   Epoch: 10   Global Step: 18450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:47:22,068-Speed 25344.71 samples/sec   Loss 3.2139   LearningRate 0.0007   Epoch: 10   Global Step: 18460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:47:31,790-Speed 25282.56 samples/sec   Loss 3.2636   LearningRate 0.0007   Epoch: 10   Global Step: 18470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:47:41,636-Speed 24963.77 samples/sec   Loss 3.2546   LearningRate 0.0007   Epoch: 10   Global Step: 18480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:47:51,378-Speed 25230.25 samples/sec   Loss 3.2613   LearningRate 0.0007   Epoch: 10   Global Step: 18490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:48:01,094-Speed 25297.20 samples/sec   Loss 3.2269   LearningRate 0.0007   Epoch: 10   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:48:10,791-Speed 25347.13 samples/sec   Loss 3.2405   LearningRate 0.0007   Epoch: 10   Global Step: 18510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:48:20,529-Speed 25240.50 samples/sec   Loss 3.2274   LearningRate 0.0007   Epoch: 10   Global Step: 18520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:48:30,250-Speed 25283.61 samples/sec   Loss 3.2248   LearningRate 0.0007   Epoch: 10   Global Step: 18530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:48:40,123-Speed 24896.17 samples/sec   Loss 3.2405   LearningRate 0.0007   Epoch: 10   Global Step: 18540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:48:49,976-Speed 24946.18 samples/sec   Loss 3.2385   LearningRate 0.0007   Epoch: 10   Global Step: 18550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:48:59,858-Speed 24871.84 samples/sec   Loss 3.2651   LearningRate 0.0007   Epoch: 10   Global Step: 18560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:49:09,698-Speed 24977.54 samples/sec   Loss 3.2324   LearningRate 0.0007   Epoch: 10   Global Step: 18570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:49:19,478-Speed 25132.56 samples/sec   Loss 3.2135   LearningRate 0.0007   Epoch: 10   Global Step: 18580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:49:29,195-Speed 25294.78 samples/sec   Loss 3.2187   LearningRate 0.0007   Epoch: 10   Global Step: 18590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:49:38,920-Speed 25274.58 samples/sec   Loss 3.2157   LearningRate 0.0007   Epoch: 10   Global Step: 18600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:49:48,732-Speed 25048.46 samples/sec   Loss 3.2152   LearningRate 0.0007   Epoch: 10   Global Step: 18610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:49:58,549-Speed 25038.11 samples/sec   Loss 3.2025   LearningRate 0.0007   Epoch: 10   Global Step: 18620   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:50:08,406-Speed 24935.44 samples/sec   Loss 3.2187   LearningRate 0.0007   Epoch: 10   Global Step: 18630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:50:18,170-Speed 25173.37 samples/sec   Loss 3.2058   LearningRate 0.0007   Epoch: 10   Global Step: 18640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:50:27,930-Speed 25186.67 samples/sec   Loss 3.2178   LearningRate 0.0007   Epoch: 10   Global Step: 18650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:50:37,759-Speed 25004.65 samples/sec   Loss 3.2135   LearningRate 0.0007   Epoch: 10   Global Step: 18660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:50:47,582-Speed 25022.90 samples/sec   Loss 3.2414   LearningRate 0.0007   Epoch: 10   Global Step: 18670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:50:57,316-Speed 25250.72 samples/sec   Loss 3.2110   LearningRate 0.0007   Epoch: 10   Global Step: 18680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:51:07,076-Speed 25183.88 samples/sec   Loss 3.2326   LearningRate 0.0007   Epoch: 10   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:51:16,841-Speed 25171.12 samples/sec   Loss 3.2331   LearningRate 0.0007   Epoch: 10   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:51:26,750-Speed 24803.21 samples/sec   Loss 3.2055   LearningRate 0.0007   Epoch: 10   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:51:36,625-Speed 24892.01 samples/sec   Loss 3.2150   LearningRate 0.0007   Epoch: 10   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:51:46,410-Speed 25120.94 samples/sec   Loss 3.2286   LearningRate 0.0007   Epoch: 10   Global Step: 18730   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:51:56,127-Speed 25296.65 samples/sec   Loss 3.1943   LearningRate 0.0007   Epoch: 10   Global Step: 18740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:52:05,810-Speed 25385.35 samples/sec   Loss 3.2159   LearningRate 0.0007   Epoch: 10   Global Step: 18750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:52:15,508-Speed 25344.86 samples/sec   Loss 3.2011   LearningRate 0.0007   Epoch: 10   Global Step: 18760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:52:25,284-Speed 25142.00 samples/sec   Loss 3.2130   LearningRate 0.0007   Epoch: 10   Global Step: 18770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:52:34,931-Speed 25478.23 samples/sec   Loss 3.2401   LearningRate 0.0007   Epoch: 10   Global Step: 18780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:52:44,630-Speed 25344.11 samples/sec   Loss 3.2092   LearningRate 0.0007   Epoch: 10   Global Step: 18790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:52:54,318-Speed 25370.99 samples/sec   Loss 3.2085   LearningRate 0.0007   Epoch: 10   Global Step: 18800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:53:04,072-Speed 25199.67 samples/sec   Loss 3.2321   LearningRate 0.0007   Epoch: 10   Global Step: 18810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:53:13,723-Speed 25470.99 samples/sec   Loss 3.2268   LearningRate 0.0007   Epoch: 10   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:53:23,454-Speed 25258.84 samples/sec   Loss 3.2223   LearningRate 0.0007   Epoch: 10   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:53:33,256-Speed 25079.52 samples/sec   Loss 3.2007   LearningRate 0.0007   Epoch: 10   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:53:43,054-Speed 25087.23 samples/sec   Loss 3.2156   LearningRate 0.0007   Epoch: 10   Global Step: 18850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:53:52,787-Speed 25254.91 samples/sec   Loss 3.2016   LearningRate 0.0007   Epoch: 10   Global Step: 18860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:54:02,519-Speed 25255.35 samples/sec   Loss 3.1949   LearningRate 0.0007   Epoch: 10   Global Step: 18870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:54:12,248-Speed 25264.62 samples/sec   Loss 3.2222   LearningRate 0.0007   Epoch: 10   Global Step: 18880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:54:22,208-Speed 24678.34 samples/sec   Loss 3.1911   LearningRate 0.0007   Epoch: 10   Global Step: 18890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:54:32,096-Speed 24856.73 samples/sec   Loss 3.2026   LearningRate 0.0007   Epoch: 10   Global Step: 18900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:54:41,884-Speed 25111.30 samples/sec   Loss 3.1745   LearningRate 0.0007   Epoch: 10   Global Step: 18910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:54:51,587-Speed 25332.54 samples/sec   Loss 3.2018   LearningRate 0.0007   Epoch: 10   Global Step: 18920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:55:01,307-Speed 25287.24 samples/sec   Loss 3.1996   LearningRate 0.0007   Epoch: 10   Global Step: 18930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:55:11,077-Speed 25156.00 samples/sec   Loss 3.2167   LearningRate 0.0007   Epoch: 10   Global Step: 18940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:55:20,903-Speed 25015.19 samples/sec   Loss 3.2097   LearningRate 0.0007   Epoch: 10   Global Step: 18950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:55:30,608-Speed 25327.31 samples/sec   Loss 3.1941   LearningRate 0.0007   Epoch: 10   Global Step: 18960   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:55:40,376-Speed 25163.00 samples/sec   Loss 3.2075   LearningRate 0.0006   Epoch: 10   Global Step: 18970   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:55:50,119-Speed 25229.55 samples/sec   Loss 3.2546   LearningRate 0.0006   Epoch: 10   Global Step: 18980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:55:59,852-Speed 25261.72 samples/sec   Loss 3.2255   LearningRate 0.0006   Epoch: 10   Global Step: 18990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:56:09,748-Speed 24836.78 samples/sec   Loss 3.2369   LearningRate 0.0006   Epoch: 10   Global Step: 19000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:56:19,544-Speed 25091.41 samples/sec   Loss 3.2390   LearningRate 0.0006   Epoch: 10   Global Step: 19010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:57:19,611-Speed 4091.57 samples/sec   Loss 3.1713   LearningRate 0.0006   Epoch: 11   Global Step: 19020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:57:29,557-Speed 24713.64 samples/sec   Loss 3.1366   LearningRate 0.0006   Epoch: 11   Global Step: 19030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:57:39,595-Speed 24485.51 samples/sec   Loss 3.1363   LearningRate 0.0006   Epoch: 11   Global Step: 19040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:57:49,505-Speed 24802.81 samples/sec   Loss 3.1625   LearningRate 0.0006   Epoch: 11   Global Step: 19050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:57:59,427-Speed 24774.46 samples/sec   Loss 3.1829   LearningRate 0.0006   Epoch: 11   Global Step: 19060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:58:09,348-Speed 24775.91 samples/sec   Loss 3.1809   LearningRate 0.0006   Epoch: 11   Global Step: 19070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:58:19,278-Speed 24750.89 samples/sec   Loss 3.1666   LearningRate 0.0006   Epoch: 11   Global Step: 19080   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 03:58:29,187-Speed 24803.25 samples/sec   Loss 3.1787   LearningRate 0.0006   Epoch: 11   Global Step: 19090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 03:58:39,277-Speed 24361.04 samples/sec   Loss 3.1794   LearningRate 0.0006   Epoch: 11   Global Step: 19100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:58:49,212-Speed 24740.63 samples/sec   Loss 3.1315   LearningRate 0.0006   Epoch: 11   Global Step: 19110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:58:59,202-Speed 24608.36 samples/sec   Loss 3.1539   LearningRate 0.0006   Epoch: 11   Global Step: 19120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:59:09,151-Speed 24707.10 samples/sec   Loss 3.1556   LearningRate 0.0006   Epoch: 11   Global Step: 19130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:59:19,083-Speed 24746.58 samples/sec   Loss 3.1341   LearningRate 0.0006   Epoch: 11   Global Step: 19140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:59:29,012-Speed 24755.07 samples/sec   Loss 3.1834   LearningRate 0.0006   Epoch: 11   Global Step: 19150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:59:38,908-Speed 24838.29 samples/sec   Loss 3.1834   LearningRate 0.0006   Epoch: 11   Global Step: 19160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:59:48,871-Speed 24668.83 samples/sec   Loss 3.1507   LearningRate 0.0006   Epoch: 11   Global Step: 19170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 03:59:58,871-Speed 24578.50 samples/sec   Loss 3.1759   LearningRate 0.0006   Epoch: 11   Global Step: 19180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 04:00:08,855-Speed 24620.83 samples/sec   Loss 3.1266   LearningRate 0.0006   Epoch: 11   Global Step: 19190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-26 04:00:18,719-Speed 24916.97 samples/sec   Loss 3.1274   LearningRate 0.0006   Epoch: 11   Global Step: 19200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:00:28,672-Speed 24694.99 samples/sec   Loss 3.1632   LearningRate 0.0006   Epoch: 11   Global Step: 19210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:00:38,633-Speed 24676.36 samples/sec   Loss 3.1804   LearningRate 0.0006   Epoch: 11   Global Step: 19220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:00:48,557-Speed 24766.64 samples/sec   Loss 3.1676   LearningRate 0.0006   Epoch: 11   Global Step: 19230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:00:58,507-Speed 24703.95 samples/sec   Loss 3.1517   LearningRate 0.0006   Epoch: 11   Global Step: 19240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:01:08,469-Speed 24671.87 samples/sec   Loss 3.1650   LearningRate 0.0006   Epoch: 11   Global Step: 19250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:01:18,383-Speed 24791.52 samples/sec   Loss 3.1661   LearningRate 0.0006   Epoch: 11   Global Step: 19260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:01:28,325-Speed 24723.15 samples/sec   Loss 3.1791   LearningRate 0.0006   Epoch: 11   Global Step: 19270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:01:38,252-Speed 24766.61 samples/sec   Loss 3.1685   LearningRate 0.0006   Epoch: 11   Global Step: 19280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:01:48,130-Speed 24881.38 samples/sec   Loss 3.1468   LearningRate 0.0006   Epoch: 11   Global Step: 19290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:01:58,033-Speed 24820.57 samples/sec   Loss 3.1689   LearningRate 0.0006   Epoch: 11   Global Step: 19300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:02:07,989-Speed 24685.98 samples/sec   Loss 3.1376   LearningRate 0.0006   Epoch: 11   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:02:17,970-Speed 24631.75 samples/sec   Loss 3.1573   LearningRate 0.0006   Epoch: 11   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:02:27,845-Speed 24892.24 samples/sec   Loss 3.1586   LearningRate 0.0006   Epoch: 11   Global Step: 19330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:02:37,866-Speed 24526.28 samples/sec   Loss 3.1363   LearningRate 0.0006   Epoch: 11   Global Step: 19340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:02:47,840-Speed 24648.84 samples/sec   Loss 3.1909   LearningRate 0.0006   Epoch: 11   Global Step: 19350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:02:57,910-Speed 24408.96 samples/sec   Loss 3.1534   LearningRate 0.0006   Epoch: 11   Global Step: 19360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:03:07,869-Speed 24680.19 samples/sec   Loss 3.1894   LearningRate 0.0006   Epoch: 11   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:03:17,751-Speed 24872.37 samples/sec   Loss 3.1585   LearningRate 0.0006   Epoch: 11   Global Step: 19380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:03:27,675-Speed 24767.24 samples/sec   Loss 3.1625   LearningRate 0.0006   Epoch: 11   Global Step: 19390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:03:37,639-Speed 24669.02 samples/sec   Loss 3.1495   LearningRate 0.0006   Epoch: 11   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:03:47,599-Speed 24678.50 samples/sec   Loss 3.1373   LearningRate 0.0006   Epoch: 11   Global Step: 19410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:03:57,629-Speed 24505.74 samples/sec   Loss 3.1564   LearningRate 0.0006   Epoch: 11   Global Step: 19420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:04:07,517-Speed 24856.77 samples/sec   Loss 3.1430   LearningRate 0.0006   Epoch: 11   Global Step: 19430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:04:17,533-Speed 24540.42 samples/sec   Loss 3.1203   LearningRate 0.0006   Epoch: 11   Global Step: 19440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:04:27,543-Speed 24554.41 samples/sec   Loss 3.1331   LearningRate 0.0006   Epoch: 11   Global Step: 19450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:04:37,458-Speed 24790.14 samples/sec   Loss 3.1638   LearningRate 0.0006   Epoch: 11   Global Step: 19460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:04:47,443-Speed 24614.56 samples/sec   Loss 3.1712   LearningRate 0.0006   Epoch: 11   Global Step: 19470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:04:57,451-Speed 24560.78 samples/sec   Loss 3.1531   LearningRate 0.0006   Epoch: 11   Global Step: 19480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:05:07,467-Speed 24538.97 samples/sec   Loss 3.1083   LearningRate 0.0006   Epoch: 11   Global Step: 19490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:05:17,480-Speed 24547.41 samples/sec   Loss 3.1749   LearningRate 0.0006   Epoch: 11   Global Step: 19500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:05:27,483-Speed 24570.59 samples/sec   Loss 3.1435   LearningRate 0.0006   Epoch: 11   Global Step: 19510   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:05:37,410-Speed 24760.58 samples/sec   Loss 3.1357   LearningRate 0.0006   Epoch: 11   Global Step: 19520   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:05:47,383-Speed 24645.58 samples/sec   Loss 3.1498   LearningRate 0.0006   Epoch: 11   Global Step: 19530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:05:57,516-Speed 24258.27 samples/sec   Loss 3.1452   LearningRate 0.0006   Epoch: 11   Global Step: 19540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:06:07,525-Speed 24557.55 samples/sec   Loss 3.1043   LearningRate 0.0006   Epoch: 11   Global Step: 19550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:06:17,484-Speed 24687.38 samples/sec   Loss 3.1331   LearningRate 0.0006   Epoch: 11   Global Step: 19560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:06:27,522-Speed 24485.42 samples/sec   Loss 3.1041   LearningRate 0.0006   Epoch: 11   Global Step: 19570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:06:37,495-Speed 24645.47 samples/sec   Loss 3.1562   LearningRate 0.0006   Epoch: 11   Global Step: 19580   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:06:47,455-Speed 24678.00 samples/sec   Loss 3.1271   LearningRate 0.0006   Epoch: 11   Global Step: 19590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:06:57,406-Speed 24701.44 samples/sec   Loss 3.1064   LearningRate 0.0006   Epoch: 11   Global Step: 19600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:07:07,342-Speed 24736.89 samples/sec   Loss 3.1397   LearningRate 0.0006   Epoch: 11   Global Step: 19610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:07:17,284-Speed 24723.16 samples/sec   Loss 3.1127   LearningRate 0.0006   Epoch: 11   Global Step: 19620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:07:27,280-Speed 24590.29 samples/sec   Loss 3.1217   LearningRate 0.0006   Epoch: 11   Global Step: 19630   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:07:37,179-Speed 24829.35 samples/sec   Loss 3.1211   LearningRate 0.0006   Epoch: 11   Global Step: 19640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:07:47,094-Speed 24791.64 samples/sec   Loss 3.1101   LearningRate 0.0006   Epoch: 11   Global Step: 19650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:07:57,141-Speed 24468.25 samples/sec   Loss 3.1072   LearningRate 0.0006   Epoch: 11   Global Step: 19660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:08:07,105-Speed 24669.08 samples/sec   Loss 3.1083   LearningRate 0.0006   Epoch: 11   Global Step: 19670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:08:17,109-Speed 24568.53 samples/sec   Loss 3.1285   LearningRate 0.0006   Epoch: 11   Global Step: 19680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:08:27,025-Speed 24784.77 samples/sec   Loss 3.1110   LearningRate 0.0006   Epoch: 11   Global Step: 19690   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:08:36,968-Speed 24721.55 samples/sec   Loss 3.1002   LearningRate 0.0006   Epoch: 11   Global Step: 19700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:08:47,162-Speed 24111.49 samples/sec   Loss 3.0987   LearningRate 0.0006   Epoch: 11   Global Step: 19710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:08:57,083-Speed 24775.07 samples/sec   Loss 3.1136   LearningRate 0.0006   Epoch: 11   Global Step: 19720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:09:07,130-Speed 24461.55 samples/sec   Loss 3.0880   LearningRate 0.0006   Epoch: 11   Global Step: 19730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:09:17,192-Speed 24429.41 samples/sec   Loss 3.1224   LearningRate 0.0006   Epoch: 11   Global Step: 19740   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:09:27,079-Speed 24866.49 samples/sec   Loss 3.1192   LearningRate 0.0006   Epoch: 11   Global Step: 19750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:09:37,009-Speed 24751.82 samples/sec   Loss 3.1129   LearningRate 0.0006   Epoch: 11   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:09:46,952-Speed 24718.26 samples/sec   Loss 3.1419   LearningRate 0.0006   Epoch: 11   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:09:56,918-Speed 24669.41 samples/sec   Loss 3.1101   LearningRate 0.0006   Epoch: 11   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:10:06,937-Speed 24532.19 samples/sec   Loss 3.1335   LearningRate 0.0006   Epoch: 11   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:10:16,797-Speed 24929.37 samples/sec   Loss 3.0933   LearningRate 0.0006   Epoch: 11   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:10:26,707-Speed 24801.52 samples/sec   Loss 3.1262   LearningRate 0.0006   Epoch: 11   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:10:36,777-Speed 24408.53 samples/sec   Loss 3.0907   LearningRate 0.0006   Epoch: 11   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:10:46,782-Speed 24567.03 samples/sec   Loss 3.0863   LearningRate 0.0006   Epoch: 11   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:10:56,785-Speed 24571.02 samples/sec   Loss 3.1282   LearningRate 0.0006   Epoch: 11   Global Step: 19840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:11:06,809-Speed 24522.02 samples/sec   Loss 3.1073   LearningRate 0.0006   Epoch: 11   Global Step: 19850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:11:16,820-Speed 24550.99 samples/sec   Loss 3.0919   LearningRate 0.0006   Epoch: 11   Global Step: 19860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:11:26,735-Speed 24791.78 samples/sec   Loss 3.0934   LearningRate 0.0006   Epoch: 11   Global Step: 19870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:11:36,761-Speed 24515.46 samples/sec   Loss 3.0842   LearningRate 0.0006   Epoch: 11   Global Step: 19880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:11:46,783-Speed 24525.68 samples/sec   Loss 3.0707   LearningRate 0.0006   Epoch: 11   Global Step: 19890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:11:56,729-Speed 24718.08 samples/sec   Loss 3.0747   LearningRate 0.0006   Epoch: 11   Global Step: 19900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:12:06,725-Speed 24589.07 samples/sec   Loss 3.0985   LearningRate 0.0006   Epoch: 11   Global Step: 19910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:12:16,549-Speed 25020.21 samples/sec   Loss 3.1289   LearningRate 0.0006   Epoch: 11   Global Step: 19920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:12:26,360-Speed 25052.61 samples/sec   Loss 3.1126   LearningRate 0.0006   Epoch: 11   Global Step: 19930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:12:36,254-Speed 24842.01 samples/sec   Loss 3.0736   LearningRate 0.0006   Epoch: 11   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:12:46,253-Speed 24583.32 samples/sec   Loss 3.0865   LearningRate 0.0006   Epoch: 11   Global Step: 19950   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:12:56,302-Speed 24458.51 samples/sec   Loss 3.0981   LearningRate 0.0006   Epoch: 11   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:13:06,381-Speed 24386.46 samples/sec   Loss 3.0791   LearningRate 0.0006   Epoch: 11   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:13:16,481-Speed 24334.65 samples/sec   Loss 3.1331   LearningRate 0.0006   Epoch: 11   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:13:26,591-Speed 24312.46 samples/sec   Loss 3.0679   LearningRate 0.0006   Epoch: 11   Global Step: 19990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:13:36,651-Speed 24431.77 samples/sec   Loss 3.0579   LearningRate 0.0006   Epoch: 11   Global Step: 20000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:13:46,772-Speed 24286.72 samples/sec   Loss 3.1125   LearningRate 0.0006   Epoch: 11   Global Step: 20010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:13:56,880-Speed 24316.64 samples/sec   Loss 3.0919   LearningRate 0.0006   Epoch: 11   Global Step: 20020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:14:06,997-Speed 24292.94 samples/sec   Loss 3.0730   LearningRate 0.0006   Epoch: 11   Global Step: 20030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:14:17,086-Speed 24364.31 samples/sec   Loss 3.0871   LearningRate 0.0006   Epoch: 11   Global Step: 20040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:14:27,123-Speed 24485.14 samples/sec   Loss 3.0547   LearningRate 0.0006   Epoch: 11   Global Step: 20050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:14:37,253-Speed 24264.53 samples/sec   Loss 3.1004   LearningRate 0.0006   Epoch: 11   Global Step: 20060   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:14:47,394-Speed 24237.90 samples/sec   Loss 3.1108   LearningRate 0.0006   Epoch: 11   Global Step: 20070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:14:57,409-Speed 24542.74 samples/sec   Loss 3.1054   LearningRate 0.0006   Epoch: 11   Global Step: 20080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:15:07,607-Speed 24104.01 samples/sec   Loss 3.0705   LearningRate 0.0006   Epoch: 11   Global Step: 20090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:15:17,680-Speed 24400.16 samples/sec   Loss 3.0476   LearningRate 0.0006   Epoch: 11   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:15:27,815-Speed 24252.44 samples/sec   Loss 3.0982   LearningRate 0.0006   Epoch: 11   Global Step: 20110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:15:37,862-Speed 24464.16 samples/sec   Loss 3.0711   LearningRate 0.0006   Epoch: 11   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:15:48,016-Speed 24206.64 samples/sec   Loss 3.0659   LearningRate 0.0006   Epoch: 11   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:15:58,137-Speed 24285.22 samples/sec   Loss 3.0573   LearningRate 0.0006   Epoch: 11   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:16:08,239-Speed 24331.39 samples/sec   Loss 3.0555   LearningRate 0.0006   Epoch: 11   Global Step: 20150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:16:18,303-Speed 24422.14 samples/sec   Loss 3.0792   LearningRate 0.0006   Epoch: 11   Global Step: 20160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:16:28,339-Speed 24491.27 samples/sec   Loss 3.0608   LearningRate 0.0006   Epoch: 11   Global Step: 20170   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:16:38,468-Speed 24266.36 samples/sec   Loss 3.0646   LearningRate 0.0006   Epoch: 11   Global Step: 20180   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:16:48,503-Speed 24493.68 samples/sec   Loss 3.0744   LearningRate 0.0006   Epoch: 11   Global Step: 20190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:16:58,528-Speed 24517.65 samples/sec   Loss 3.0476   LearningRate 0.0006   Epoch: 11   Global Step: 20200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:17:08,581-Speed 24448.54 samples/sec   Loss 3.0500   LearningRate 0.0006   Epoch: 11   Global Step: 20210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:17:18,626-Speed 24470.27 samples/sec   Loss 3.0545   LearningRate 0.0006   Epoch: 11   Global Step: 20220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:17:28,684-Speed 24446.02 samples/sec   Loss 3.0939   LearningRate 0.0006   Epoch: 11   Global Step: 20230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:17:38,791-Speed 24318.54 samples/sec   Loss 3.0761   LearningRate 0.0006   Epoch: 11   Global Step: 20240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:17:48,850-Speed 24433.96 samples/sec   Loss 3.0278   LearningRate 0.0006   Epoch: 11   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:17:58,983-Speed 24256.20 samples/sec   Loss 3.0758   LearningRate 0.0006   Epoch: 11   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:18:09,073-Speed 24360.61 samples/sec   Loss 3.0326   LearningRate 0.0006   Epoch: 11   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:18:19,182-Speed 24313.59 samples/sec   Loss 3.0358   LearningRate 0.0006   Epoch: 11   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:18:29,307-Speed 24276.65 samples/sec   Loss 3.0735   LearningRate 0.0006   Epoch: 11   Global Step: 20290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:18:39,361-Speed 24445.68 samples/sec   Loss 3.0612   LearningRate 0.0006   Epoch: 11   Global Step: 20300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:18:49,476-Speed 24301.07 samples/sec   Loss 3.0623   LearningRate 0.0006   Epoch: 11   Global Step: 20310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:18:59,591-Speed 24299.75 samples/sec   Loss 3.0567   LearningRate 0.0006   Epoch: 11   Global Step: 20320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:19:09,764-Speed 24161.94 samples/sec   Loss 3.0488   LearningRate 0.0006   Epoch: 11   Global Step: 20330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:19:19,925-Speed 24189.75 samples/sec   Loss 3.0604   LearningRate 0.0006   Epoch: 11   Global Step: 20340   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:19:30,108-Speed 24138.17 samples/sec   Loss 3.0917   LearningRate 0.0006   Epoch: 11   Global Step: 20350   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:19:40,202-Speed 24348.13 samples/sec   Loss 3.0688   LearningRate 0.0006   Epoch: 11   Global Step: 20360   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:19:50,301-Speed 24339.19 samples/sec   Loss 3.0525   LearningRate 0.0006   Epoch: 11   Global Step: 20370   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:20:00,337-Speed 24490.44 samples/sec   Loss 3.0639   LearningRate 0.0006   Epoch: 11   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:20:10,475-Speed 24244.29 samples/sec   Loss 3.0527   LearningRate 0.0006   Epoch: 11   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:20:20,691-Speed 24059.31 samples/sec   Loss 3.0693   LearningRate 0.0006   Epoch: 11   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:20:31,126-Speed 23555.19 samples/sec   Loss 3.0483   LearningRate 0.0006   Epoch: 11   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:20:41,318-Speed 24114.41 samples/sec   Loss 3.0523   LearningRate 0.0006   Epoch: 11   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:20:51,395-Speed 24393.30 samples/sec   Loss 3.0465   LearningRate 0.0006   Epoch: 11   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:21:01,509-Speed 24300.41 samples/sec   Loss 3.0403   LearningRate 0.0006   Epoch: 11   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:21:11,510-Speed 24578.44 samples/sec   Loss 3.0378   LearningRate 0.0006   Epoch: 11   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:21:21,538-Speed 24508.62 samples/sec   Loss 3.0477   LearningRate 0.0006   Epoch: 11   Global Step: 20460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:21:31,582-Speed 24472.52 samples/sec   Loss 3.1058   LearningRate 0.0006   Epoch: 11   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:21:41,736-Speed 24207.39 samples/sec   Loss 3.0924   LearningRate 0.0006   Epoch: 11   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:21:51,791-Speed 24445.88 samples/sec   Loss 3.0640   LearningRate 0.0006   Epoch: 11   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:22:01,887-Speed 24345.18 samples/sec   Loss 3.0594   LearningRate 0.0006   Epoch: 11   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:22:12,006-Speed 24289.85 samples/sec   Loss 3.0284   LearningRate 0.0006   Epoch: 11   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:22:22,060-Speed 24445.97 samples/sec   Loss 3.0316   LearningRate 0.0006   Epoch: 11   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:22:32,149-Speed 24362.69 samples/sec   Loss 3.0304   LearningRate 0.0006   Epoch: 11   Global Step: 20530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:22:42,205-Speed 24441.66 samples/sec   Loss 3.0370   LearningRate 0.0006   Epoch: 11   Global Step: 20540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:22:52,303-Speed 24338.70 samples/sec   Loss 3.0451   LearningRate 0.0006   Epoch: 11   Global Step: 20550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:23:02,369-Speed 24417.26 samples/sec   Loss 3.0348   LearningRate 0.0006   Epoch: 11   Global Step: 20560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:23:12,598-Speed 24029.39 samples/sec   Loss 3.0221   LearningRate 0.0006   Epoch: 11   Global Step: 20570   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:23:22,764-Speed 24175.07 samples/sec   Loss 3.0447   LearningRate 0.0006   Epoch: 11   Global Step: 20580   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:23:32,798-Speed 24495.30 samples/sec   Loss 3.0286   LearningRate 0.0006   Epoch: 11   Global Step: 20590   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:23:42,910-Speed 24306.26 samples/sec   Loss 3.0303   LearningRate 0.0006   Epoch: 11   Global Step: 20600   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:23:52,983-Speed 24400.39 samples/sec   Loss 3.0288   LearningRate 0.0006   Epoch: 11   Global Step: 20610   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:24:03,038-Speed 24445.00 samples/sec   Loss 3.0136   LearningRate 0.0006   Epoch: 11   Global Step: 20620   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:24:13,076-Speed 24491.30 samples/sec   Loss 3.0277   LearningRate 0.0006   Epoch: 11   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:24:23,145-Speed 24409.87 samples/sec   Loss 3.0217   LearningRate 0.0006   Epoch: 11   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:24:33,183-Speed 24483.65 samples/sec   Loss 3.0388   LearningRate 0.0006   Epoch: 11   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:24:43,278-Speed 24348.51 samples/sec   Loss 3.0306   LearningRate 0.0006   Epoch: 11   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:24:53,325-Speed 24462.45 samples/sec   Loss 3.0454   LearningRate 0.0006   Epoch: 11   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:25:03,411-Speed 24370.62 samples/sec   Loss 3.0345   LearningRate 0.0006   Epoch: 11   Global Step: 20680   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:25:13,481-Speed 24406.09 samples/sec   Loss 3.0257   LearningRate 0.0006   Epoch: 11   Global Step: 20690   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:25:23,556-Speed 24395.48 samples/sec   Loss 3.0511   LearningRate 0.0006   Epoch: 11   Global Step: 20700   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:25:33,658-Speed 24329.39 samples/sec   Loss 3.0511   LearningRate 0.0006   Epoch: 11   Global Step: 20710   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:25:43,768-Speed 24311.17 samples/sec   Loss 3.0934   LearningRate 0.0006   Epoch: 11   Global Step: 20720   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:25:53,884-Speed 24295.65 samples/sec   Loss 3.0764   LearningRate 0.0006   Epoch: 11   Global Step: 20730   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:26:04,020-Speed 24247.99 samples/sec   Loss 3.0576   LearningRate 0.0006   Epoch: 11   Global Step: 20740   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:27:04,743-Speed 4047.35 samples/sec   Loss 2.9859   LearningRate 0.0006   Epoch: 12   Global Step: 20750   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:27:14,860-Speed 24294.11 samples/sec   Loss 2.9824   LearningRate 0.0006   Epoch: 12   Global Step: 20760   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:27:24,962-Speed 24331.34 samples/sec   Loss 3.0063   LearningRate 0.0006   Epoch: 12   Global Step: 20770   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:27:35,074-Speed 24305.31 samples/sec   Loss 2.9893   LearningRate 0.0006   Epoch: 12   Global Step: 20780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:27:45,111-Speed 24488.05 samples/sec   Loss 2.9919   LearningRate 0.0006   Epoch: 12   Global Step: 20790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:27:55,160-Speed 24461.22 samples/sec   Loss 3.0102   LearningRate 0.0006   Epoch: 12   Global Step: 20800   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:28:05,200-Speed 24480.81 samples/sec   Loss 3.0232   LearningRate 0.0006   Epoch: 12   Global Step: 20810   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:28:15,446-Speed 23989.02 samples/sec   Loss 2.9986   LearningRate 0.0006   Epoch: 12   Global Step: 20820   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:28:25,500-Speed 24448.75 samples/sec   Loss 3.0042   LearningRate 0.0006   Epoch: 12   Global Step: 20830   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:28:35,571-Speed 24403.24 samples/sec   Loss 3.0100   LearningRate 0.0006   Epoch: 12   Global Step: 20840   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:28:45,672-Speed 24333.81 samples/sec   Loss 2.9812   LearningRate 0.0006   Epoch: 12   Global Step: 20850   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:28:55,787-Speed 24299.24 samples/sec   Loss 2.9886   LearningRate 0.0006   Epoch: 12   Global Step: 20860   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:29:05,816-Speed 24506.74 samples/sec   Loss 2.9964   LearningRate 0.0006   Epoch: 12   Global Step: 20870   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:29:15,975-Speed 24196.57 samples/sec   Loss 2.9742   LearningRate 0.0006   Epoch: 12   Global Step: 20880   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:29:26,093-Speed 24294.09 samples/sec   Loss 2.9626   LearningRate 0.0006   Epoch: 12   Global Step: 20890   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:29:36,221-Speed 24267.34 samples/sec   Loss 2.9755   LearningRate 0.0006   Epoch: 12   Global Step: 20900   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:29:46,266-Speed 24468.97 samples/sec   Loss 2.9944   LearningRate 0.0006   Epoch: 12   Global Step: 20910   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:29:56,404-Speed 24245.55 samples/sec   Loss 3.0137   LearningRate 0.0006   Epoch: 12   Global Step: 20920   Fp16 Grad Scale: 131072   Required: 14 hours
Training: 2022-03-26 04:30:06,475-Speed 24404.71 samples/sec   Loss 3.0002   LearningRate 0.0006   Epoch: 12   Global Step: 20930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:30:16,575-Speed 24336.21 samples/sec   Loss 3.0071   LearningRate 0.0006   Epoch: 12   Global Step: 20940   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:30:26,624-Speed 24458.66 samples/sec   Loss 2.9953   LearningRate 0.0006   Epoch: 12   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:30:36,655-Speed 24503.85 samples/sec   Loss 2.9787   LearningRate 0.0006   Epoch: 12   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:30:46,813-Speed 24197.15 samples/sec   Loss 3.0053   LearningRate 0.0006   Epoch: 12   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:30:56,861-Speed 24464.53 samples/sec   Loss 3.0121   LearningRate 0.0006   Epoch: 12   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:31:06,973-Speed 24305.69 samples/sec   Loss 2.9909   LearningRate 0.0006   Epoch: 12   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:31:17,152-Speed 24146.95 samples/sec   Loss 3.0030   LearningRate 0.0006   Epoch: 12   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:31:27,211-Speed 24441.51 samples/sec   Loss 2.9760   LearningRate 0.0006   Epoch: 12   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:31:37,305-Speed 24349.42 samples/sec   Loss 2.9760   LearningRate 0.0006   Epoch: 12   Global Step: 21020   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:31:47,369-Speed 24421.65 samples/sec   Loss 2.9848   LearningRate 0.0006   Epoch: 12   Global Step: 21030   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:31:57,428-Speed 24435.41 samples/sec   Loss 2.9583   LearningRate 0.0006   Epoch: 12   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:32:07,524-Speed 24343.94 samples/sec   Loss 3.0251   LearningRate 0.0006   Epoch: 12   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:32:17,689-Speed 24179.42 samples/sec   Loss 3.0376   LearningRate 0.0006   Epoch: 12   Global Step: 21060   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:32:27,861-Speed 24169.97 samples/sec   Loss 2.9859   LearningRate 0.0006   Epoch: 12   Global Step: 21070   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:32:37,901-Speed 24480.73 samples/sec   Loss 2.9892   LearningRate 0.0006   Epoch: 12   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:32:48,008-Speed 24319.17 samples/sec   Loss 2.9771   LearningRate 0.0006   Epoch: 12   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:32:58,158-Speed 24214.60 samples/sec   Loss 2.9897   LearningRate 0.0006   Epoch: 12   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:33:08,232-Speed 24399.60 samples/sec   Loss 2.9842   LearningRate 0.0006   Epoch: 12   Global Step: 21110   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:33:18,298-Speed 24417.04 samples/sec   Loss 2.9761   LearningRate 0.0006   Epoch: 12   Global Step: 21120   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:33:28,345-Speed 24464.61 samples/sec   Loss 2.9580   LearningRate 0.0006   Epoch: 12   Global Step: 21130   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:33:38,393-Speed 24467.42 samples/sec   Loss 2.9885   LearningRate 0.0006   Epoch: 12   Global Step: 21140   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:33:48,471-Speed 24388.32 samples/sec   Loss 3.0159   LearningRate 0.0006   Epoch: 12   Global Step: 21150   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:33:58,532-Speed 24430.08 samples/sec   Loss 2.9758   LearningRate 0.0006   Epoch: 12   Global Step: 21160   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-26 04:34:08,645-Speed 24304.43 samples/sec   Loss 2.9787   LearningRate 0.0006   Epoch: 12   Global Step: 21170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:34:18,757-Speed 24305.44 samples/sec   Loss 2.9942   LearningRate 0.0006   Epoch: 12   Global Step: 21180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:34:28,861-Speed 24334.39 samples/sec   Loss 2.9462   LearningRate 0.0006   Epoch: 12   Global Step: 21190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:34:39,083-Speed 24046.69 samples/sec   Loss 2.9690   LearningRate 0.0006   Epoch: 12   Global Step: 21200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:34:49,141-Speed 24435.69 samples/sec   Loss 2.9921   LearningRate 0.0006   Epoch: 12   Global Step: 21210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:34:59,276-Speed 24250.43 samples/sec   Loss 3.0048   LearningRate 0.0006   Epoch: 12   Global Step: 21220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:35:09,314-Speed 24486.55 samples/sec   Loss 2.9900   LearningRate 0.0006   Epoch: 12   Global Step: 21230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:35:19,349-Speed 24493.37 samples/sec   Loss 2.9583   LearningRate 0.0006   Epoch: 12   Global Step: 21240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:35:29,415-Speed 24418.51 samples/sec   Loss 2.9648   LearningRate 0.0006   Epoch: 12   Global Step: 21250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:35:39,468-Speed 24449.81 samples/sec   Loss 2.9396   LearningRate 0.0006   Epoch: 12   Global Step: 21260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:35:49,602-Speed 24254.81 samples/sec   Loss 2.9920   LearningRate 0.0006   Epoch: 12   Global Step: 21270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:35:59,745-Speed 24234.06 samples/sec   Loss 2.9439   LearningRate 0.0006   Epoch: 12   Global Step: 21280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:36:09,872-Speed 24271.51 samples/sec   Loss 2.9857   LearningRate 0.0006   Epoch: 12   Global Step: 21290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:36:19,932-Speed 24432.76 samples/sec   Loss 2.9595   LearningRate 0.0006   Epoch: 12   Global Step: 21300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:36:30,031-Speed 24337.29 samples/sec   Loss 2.9600   LearningRate 0.0006   Epoch: 12   Global Step: 21310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:36:40,126-Speed 24350.51 samples/sec   Loss 2.9542   LearningRate 0.0006   Epoch: 12   Global Step: 21320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:36:50,227-Speed 24333.72 samples/sec   Loss 2.9801   LearningRate 0.0006   Epoch: 12   Global Step: 21330   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:37:00,268-Speed 24477.10 samples/sec   Loss 2.9919   LearningRate 0.0006   Epoch: 12   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:37:10,337-Speed 24410.56 samples/sec   Loss 2.9514   LearningRate 0.0006   Epoch: 12   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:37:20,362-Speed 24520.91 samples/sec   Loss 2.9651   LearningRate 0.0006   Epoch: 12   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:37:30,439-Speed 24389.66 samples/sec   Loss 2.9295   LearningRate 0.0006   Epoch: 12   Global Step: 21370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:37:40,475-Speed 24490.74 samples/sec   Loss 2.9823   LearningRate 0.0006   Epoch: 12   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:37:50,560-Speed 24372.21 samples/sec   Loss 2.9661   LearningRate 0.0006   Epoch: 12   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:38:00,665-Speed 24325.60 samples/sec   Loss 2.9663   LearningRate 0.0006   Epoch: 12   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:38:10,771-Speed 24320.87 samples/sec   Loss 2.9444   LearningRate 0.0006   Epoch: 12   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:38:20,830-Speed 24433.95 samples/sec   Loss 2.9500   LearningRate 0.0006   Epoch: 12   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:38:30,951-Speed 24287.15 samples/sec   Loss 2.9573   LearningRate 0.0006   Epoch: 12   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:38:40,949-Speed 24581.82 samples/sec   Loss 2.9438   LearningRate 0.0006   Epoch: 12   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:38:51,161-Speed 24068.29 samples/sec   Loss 2.9643   LearningRate 0.0006   Epoch: 12   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:39:01,298-Speed 24250.10 samples/sec   Loss 2.9615   LearningRate 0.0006   Epoch: 12   Global Step: 21460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:39:11,365-Speed 24417.22 samples/sec   Loss 2.9684   LearningRate 0.0006   Epoch: 12   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:39:21,466-Speed 24335.20 samples/sec   Loss 2.9319   LearningRate 0.0006   Epoch: 12   Global Step: 21480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:39:31,535-Speed 24411.63 samples/sec   Loss 2.9167   LearningRate 0.0006   Epoch: 12   Global Step: 21490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:39:41,596-Speed 24429.97 samples/sec   Loss 2.9709   LearningRate 0.0006   Epoch: 12   Global Step: 21500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:39:51,654-Speed 24438.63 samples/sec   Loss 2.9438   LearningRate 0.0006   Epoch: 12   Global Step: 21510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:40:01,785-Speed 24267.77 samples/sec   Loss 2.9749   LearningRate 0.0006   Epoch: 12   Global Step: 21520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:40:11,976-Speed 24117.18 samples/sec   Loss 2.9629   LearningRate 0.0006   Epoch: 12   Global Step: 21530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:40:22,100-Speed 24277.94 samples/sec   Loss 2.9474   LearningRate 0.0006   Epoch: 12   Global Step: 21540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:40:32,171-Speed 24408.52 samples/sec   Loss 2.9403   LearningRate 0.0006   Epoch: 12   Global Step: 21550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:40:42,225-Speed 24446.21 samples/sec   Loss 2.9509   LearningRate 0.0006   Epoch: 12   Global Step: 21560   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:40:52,332-Speed 24319.49 samples/sec   Loss 2.9862   LearningRate 0.0006   Epoch: 12   Global Step: 21570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:41:02,495-Speed 24184.74 samples/sec   Loss 2.9766   LearningRate 0.0006   Epoch: 12   Global Step: 21580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:41:12,627-Speed 24257.74 samples/sec   Loss 2.9669   LearningRate 0.0006   Epoch: 12   Global Step: 21590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:41:22,673-Speed 24468.57 samples/sec   Loss 2.9505   LearningRate 0.0006   Epoch: 12   Global Step: 21600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:41:32,768-Speed 24346.45 samples/sec   Loss 2.9295   LearningRate 0.0006   Epoch: 12   Global Step: 21610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:41:42,813-Speed 24469.70 samples/sec   Loss 2.9100   LearningRate 0.0006   Epoch: 12   Global Step: 21620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:41:52,946-Speed 24256.62 samples/sec   Loss 2.9462   LearningRate 0.0006   Epoch: 12   Global Step: 21630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:42:03,063-Speed 24295.32 samples/sec   Loss 2.9719   LearningRate 0.0006   Epoch: 12   Global Step: 21640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:42:13,161-Speed 24341.94 samples/sec   Loss 2.9224   LearningRate 0.0006   Epoch: 12   Global Step: 21650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:42:23,318-Speed 24197.61 samples/sec   Loss 2.9063   LearningRate 0.0006   Epoch: 12   Global Step: 21660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:42:33,448-Speed 24263.32 samples/sec   Loss 2.9278   LearningRate 0.0006   Epoch: 12   Global Step: 21670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:42:43,553-Speed 24324.26 samples/sec   Loss 2.9154   LearningRate 0.0006   Epoch: 12   Global Step: 21680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:42:53,605-Speed 24451.79 samples/sec   Loss 2.9573   LearningRate 0.0006   Epoch: 12   Global Step: 21690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:43:03,716-Speed 24310.19 samples/sec   Loss 2.9258   LearningRate 0.0006   Epoch: 12   Global Step: 21700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:43:13,866-Speed 24221.31 samples/sec   Loss 2.9353   LearningRate 0.0006   Epoch: 12   Global Step: 21710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:43:24,021-Speed 24201.19 samples/sec   Loss 2.9417   LearningRate 0.0006   Epoch: 12   Global Step: 21720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:43:34,073-Speed 24453.30 samples/sec   Loss 2.9268   LearningRate 0.0006   Epoch: 12   Global Step: 21730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:43:44,151-Speed 24393.76 samples/sec   Loss 2.9343   LearningRate 0.0006   Epoch: 12   Global Step: 21740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:43:54,311-Speed 24191.59 samples/sec   Loss 2.9454   LearningRate 0.0006   Epoch: 12   Global Step: 21750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:44:04,355-Speed 24471.75 samples/sec   Loss 2.9426   LearningRate 0.0006   Epoch: 12   Global Step: 21760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:44:14,408-Speed 24449.11 samples/sec   Loss 2.9026   LearningRate 0.0006   Epoch: 12   Global Step: 21770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:44:24,563-Speed 24203.73 samples/sec   Loss 2.9497   LearningRate 0.0006   Epoch: 12   Global Step: 21780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:44:34,619-Speed 24443.95 samples/sec   Loss 2.9319   LearningRate 0.0006   Epoch: 12   Global Step: 21790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:44:44,625-Speed 24570.61 samples/sec   Loss 2.8972   LearningRate 0.0006   Epoch: 12   Global Step: 21800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:44:54,797-Speed 24162.52 samples/sec   Loss 2.8992   LearningRate 0.0006   Epoch: 12   Global Step: 21810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:45:04,884-Speed 24365.10 samples/sec   Loss 2.9019   LearningRate 0.0006   Epoch: 12   Global Step: 21820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:45:14,948-Speed 24423.62 samples/sec   Loss 2.9118   LearningRate 0.0006   Epoch: 12   Global Step: 21830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:45:24,996-Speed 24462.07 samples/sec   Loss 2.9087   LearningRate 0.0006   Epoch: 12   Global Step: 21840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:45:35,073-Speed 24391.90 samples/sec   Loss 2.9090   LearningRate 0.0006   Epoch: 12   Global Step: 21850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:45:45,126-Speed 24448.96 samples/sec   Loss 2.8940   LearningRate 0.0006   Epoch: 12   Global Step: 21860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:45:55,218-Speed 24356.03 samples/sec   Loss 2.9031   LearningRate 0.0006   Epoch: 12   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:46:05,349-Speed 24259.91 samples/sec   Loss 2.9469   LearningRate 0.0006   Epoch: 12   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:46:15,395-Speed 24468.32 samples/sec   Loss 2.9221   LearningRate 0.0006   Epoch: 12   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:46:25,496-Speed 24331.27 samples/sec   Loss 2.8948   LearningRate 0.0006   Epoch: 12   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:46:35,539-Speed 24473.84 samples/sec   Loss 2.9156   LearningRate 0.0006   Epoch: 12   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:46:45,596-Speed 24441.63 samples/sec   Loss 2.9112   LearningRate 0.0006   Epoch: 12   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:46:55,677-Speed 24381.33 samples/sec   Loss 2.9208   LearningRate 0.0006   Epoch: 12   Global Step: 21930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:47:05,775-Speed 24339.09 samples/sec   Loss 2.9327   LearningRate 0.0006   Epoch: 12   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:47:15,841-Speed 24416.56 samples/sec   Loss 2.9425   LearningRate 0.0006   Epoch: 12   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:47:25,907-Speed 24420.63 samples/sec   Loss 2.9042   LearningRate 0.0006   Epoch: 12   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:47:36,058-Speed 24215.03 samples/sec   Loss 2.9042   LearningRate 0.0006   Epoch: 12   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:47:46,113-Speed 24450.76 samples/sec   Loss 2.9219   LearningRate 0.0006   Epoch: 12   Global Step: 21980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:47:56,197-Speed 24376.91 samples/sec   Loss 2.8956   LearningRate 0.0006   Epoch: 12   Global Step: 21990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:48:06,306-Speed 24312.95 samples/sec   Loss 2.8922   LearningRate 0.0006   Epoch: 12   Global Step: 22000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:48:16,369-Speed 24426.28 samples/sec   Loss 2.8813   LearningRate 0.0006   Epoch: 12   Global Step: 22010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:48:26,403-Speed 24498.04 samples/sec   Loss 2.8894   LearningRate 0.0006   Epoch: 12   Global Step: 22020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:48:36,503-Speed 24335.39 samples/sec   Loss 2.9351   LearningRate 0.0006   Epoch: 12   Global Step: 22030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:48:46,578-Speed 24395.65 samples/sec   Loss 2.9147   LearningRate 0.0006   Epoch: 12   Global Step: 22040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:48:56,724-Speed 24225.26 samples/sec   Loss 2.8902   LearningRate 0.0006   Epoch: 12   Global Step: 22050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:49:06,814-Speed 24357.30 samples/sec   Loss 2.8894   LearningRate 0.0006   Epoch: 12   Global Step: 22060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:49:16,890-Speed 24394.12 samples/sec   Loss 2.8770   LearningRate 0.0006   Epoch: 12   Global Step: 22070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:49:26,902-Speed 24551.51 samples/sec   Loss 2.8653   LearningRate 0.0006   Epoch: 12   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:49:37,053-Speed 24214.11 samples/sec   Loss 2.8850   LearningRate 0.0006   Epoch: 12   Global Step: 22090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:49:47,155-Speed 24331.07 samples/sec   Loss 2.8971   LearningRate 0.0006   Epoch: 12   Global Step: 22100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:49:57,490-Speed 23780.16 samples/sec   Loss 2.9006   LearningRate 0.0006   Epoch: 12   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:50:07,570-Speed 24384.65 samples/sec   Loss 2.9088   LearningRate 0.0006   Epoch: 12   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:50:17,766-Speed 24106.69 samples/sec   Loss 2.8975   LearningRate 0.0006   Epoch: 12   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:50:27,842-Speed 24394.55 samples/sec   Loss 2.8893   LearningRate 0.0006   Epoch: 12   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:50:38,006-Speed 24181.59 samples/sec   Loss 2.9115   LearningRate 0.0006   Epoch: 12   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:50:48,064-Speed 24437.26 samples/sec   Loss 2.8977   LearningRate 0.0006   Epoch: 12   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:50:58,319-Speed 23967.39 samples/sec   Loss 2.9000   LearningRate 0.0006   Epoch: 12   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:51:08,409-Speed 24361.61 samples/sec   Loss 2.8870   LearningRate 0.0006   Epoch: 12   Global Step: 22180   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:51:18,505-Speed 24343.37 samples/sec   Loss 2.8795   LearningRate 0.0006   Epoch: 12   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:51:28,671-Speed 24179.07 samples/sec   Loss 2.8898   LearningRate 0.0006   Epoch: 12   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:51:38,712-Speed 24478.35 samples/sec   Loss 2.8947   LearningRate 0.0006   Epoch: 12   Global Step: 22210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:51:48,795-Speed 24376.09 samples/sec   Loss 2.9184   LearningRate 0.0006   Epoch: 12   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:51:58,848-Speed 24449.38 samples/sec   Loss 2.8768   LearningRate 0.0006   Epoch: 12   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:52:08,928-Speed 24383.41 samples/sec   Loss 2.8678   LearningRate 0.0006   Epoch: 12   Global Step: 22240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:52:19,013-Speed 24370.33 samples/sec   Loss 2.9157   LearningRate 0.0006   Epoch: 12   Global Step: 22250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:52:29,081-Speed 24413.79 samples/sec   Loss 2.9061   LearningRate 0.0006   Epoch: 12   Global Step: 22260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:52:39,184-Speed 24329.29 samples/sec   Loss 2.9005   LearningRate 0.0006   Epoch: 12   Global Step: 22270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:52:49,275-Speed 24356.09 samples/sec   Loss 2.9081   LearningRate 0.0006   Epoch: 12   Global Step: 22280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:52:59,324-Speed 24461.15 samples/sec   Loss 2.8718   LearningRate 0.0006   Epoch: 12   Global Step: 22290   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:53:09,410-Speed 24367.76 samples/sec   Loss 2.8978   LearningRate 0.0006   Epoch: 12   Global Step: 22300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:53:19,574-Speed 24185.92 samples/sec   Loss 2.8877   LearningRate 0.0006   Epoch: 12   Global Step: 22310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:53:29,705-Speed 24262.73 samples/sec   Loss 2.8900   LearningRate 0.0006   Epoch: 12   Global Step: 22320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:53:39,784-Speed 24388.78 samples/sec   Loss 2.9241   LearningRate 0.0006   Epoch: 12   Global Step: 22330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:53:49,860-Speed 24392.80 samples/sec   Loss 2.8810   LearningRate 0.0006   Epoch: 12   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:54:00,011-Speed 24212.86 samples/sec   Loss 2.8685   LearningRate 0.0006   Epoch: 12   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:54:10,062-Speed 24460.21 samples/sec   Loss 2.8859   LearningRate 0.0006   Epoch: 12   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:54:20,138-Speed 24394.70 samples/sec   Loss 2.8751   LearningRate 0.0006   Epoch: 12   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:54:30,204-Speed 24419.82 samples/sec   Loss 2.9083   LearningRate 0.0006   Epoch: 12   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:54:40,359-Speed 24204.43 samples/sec   Loss 2.8999   LearningRate 0.0006   Epoch: 12   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:54:50,446-Speed 24365.36 samples/sec   Loss 2.9186   LearningRate 0.0006   Epoch: 12   Global Step: 22400   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:55:00,518-Speed 24405.31 samples/sec   Loss 2.9153   LearningRate 0.0006   Epoch: 12   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:55:10,556-Speed 24487.09 samples/sec   Loss 2.8987   LearningRate 0.0006   Epoch: 12   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:55:20,660-Speed 24327.91 samples/sec   Loss 2.8886   LearningRate 0.0006   Epoch: 12   Global Step: 22430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:55:30,738-Speed 24388.24 samples/sec   Loss 2.9215   LearningRate 0.0006   Epoch: 12   Global Step: 22440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:55:40,894-Speed 24201.07 samples/sec   Loss 2.9287   LearningRate 0.0006   Epoch: 12   Global Step: 22450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:55:51,018-Speed 24277.31 samples/sec   Loss 2.9278   LearningRate 0.0006   Epoch: 12   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:56:51,722-Speed 4048.60 samples/sec   Loss 2.8908   LearningRate 0.0006   Epoch: 13   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:57:01,797-Speed 24398.27 samples/sec   Loss 2.8484   LearningRate 0.0006   Epoch: 13   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:57:11,929-Speed 24261.98 samples/sec   Loss 2.8220   LearningRate 0.0006   Epoch: 13   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:57:21,966-Speed 24489.56 samples/sec   Loss 2.8487   LearningRate 0.0006   Epoch: 13   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:57:32,021-Speed 24446.19 samples/sec   Loss 2.8366   LearningRate 0.0006   Epoch: 13   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:57:42,103-Speed 24381.62 samples/sec   Loss 2.8374   LearningRate 0.0006   Epoch: 13   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:57:52,180-Speed 24389.99 samples/sec   Loss 2.8343   LearningRate 0.0006   Epoch: 13   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:58:02,267-Speed 24365.35 samples/sec   Loss 2.8228   LearningRate 0.0006   Epoch: 13   Global Step: 22540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:58:12,394-Speed 24271.48 samples/sec   Loss 2.8635   LearningRate 0.0006   Epoch: 13   Global Step: 22550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:58:22,500-Speed 24321.28 samples/sec   Loss 2.8457   LearningRate 0.0006   Epoch: 13   Global Step: 22560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:58:32,695-Speed 24108.04 samples/sec   Loss 2.8342   LearningRate 0.0006   Epoch: 13   Global Step: 22570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:58:42,823-Speed 24268.92 samples/sec   Loss 2.8373   LearningRate 0.0006   Epoch: 13   Global Step: 22580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:58:52,883-Speed 24434.62 samples/sec   Loss 2.8274   LearningRate 0.0006   Epoch: 13   Global Step: 22590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:59:02,948-Speed 24423.83 samples/sec   Loss 2.8353   LearningRate 0.0006   Epoch: 13   Global Step: 22600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:59:13,043-Speed 24349.19 samples/sec   Loss 2.8585   LearningRate 0.0006   Epoch: 13   Global Step: 22610   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 04:59:23,191-Speed 24220.85 samples/sec   Loss 2.8325   LearningRate 0.0006   Epoch: 13   Global Step: 22620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:59:33,265-Speed 24397.95 samples/sec   Loss 2.8376   LearningRate 0.0006   Epoch: 13   Global Step: 22630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:59:43,359-Speed 24354.49 samples/sec   Loss 2.8723   LearningRate 0.0006   Epoch: 13   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 04:59:53,426-Speed 24414.65 samples/sec   Loss 2.8422   LearningRate 0.0006   Epoch: 13   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:00:03,547-Speed 24287.03 samples/sec   Loss 2.8495   LearningRate 0.0006   Epoch: 13   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:00:13,793-Speed 23986.85 samples/sec   Loss 2.8460   LearningRate 0.0006   Epoch: 13   Global Step: 22670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:00:23,970-Speed 24152.81 samples/sec   Loss 2.8827   LearningRate 0.0006   Epoch: 13   Global Step: 22680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:00:34,022-Speed 24450.41 samples/sec   Loss 2.8695   LearningRate 0.0006   Epoch: 13   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:00:44,135-Speed 24303.62 samples/sec   Loss 2.8707   LearningRate 0.0006   Epoch: 13   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:00:54,261-Speed 24274.15 samples/sec   Loss 2.8750   LearningRate 0.0006   Epoch: 13   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:01:04,346-Speed 24370.80 samples/sec   Loss 2.8462   LearningRate 0.0006   Epoch: 13   Global Step: 22720   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 05:01:14,387-Speed 24480.30 samples/sec   Loss 2.8294   LearningRate 0.0006   Epoch: 13   Global Step: 22730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:01:24,451-Speed 24422.79 samples/sec   Loss 2.8348   LearningRate 0.0006   Epoch: 13   Global Step: 22740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:01:34,539-Speed 24365.26 samples/sec   Loss 2.8415   LearningRate 0.0006   Epoch: 13   Global Step: 22750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:01:44,817-Speed 23912.26 samples/sec   Loss 2.8421   LearningRate 0.0006   Epoch: 13   Global Step: 22760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:01:54,885-Speed 24414.25 samples/sec   Loss 2.8537   LearningRate 0.0006   Epoch: 13   Global Step: 22770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:02:04,974-Speed 24360.80 samples/sec   Loss 2.8614   LearningRate 0.0006   Epoch: 13   Global Step: 22780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:02:15,101-Speed 24272.80 samples/sec   Loss 2.8599   LearningRate 0.0006   Epoch: 13   Global Step: 22790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:02:25,209-Speed 24315.12 samples/sec   Loss 2.8381   LearningRate 0.0006   Epoch: 13   Global Step: 22800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:02:35,339-Speed 24264.68 samples/sec   Loss 2.8419   LearningRate 0.0006   Epoch: 13   Global Step: 22810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:02:45,436-Speed 24341.08 samples/sec   Loss 2.8196   LearningRate 0.0006   Epoch: 13   Global Step: 22820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:02:55,597-Speed 24190.97 samples/sec   Loss 2.8222   LearningRate 0.0006   Epoch: 13   Global Step: 22830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:03:05,736-Speed 24249.25 samples/sec   Loss 2.8512   LearningRate 0.0006   Epoch: 13   Global Step: 22840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:03:15,834-Speed 24339.78 samples/sec   Loss 2.8723   LearningRate 0.0006   Epoch: 13   Global Step: 22850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:03:25,927-Speed 24360.16 samples/sec   Loss 2.8624   LearningRate 0.0006   Epoch: 13   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:03:35,940-Speed 24546.52 samples/sec   Loss 2.8894   LearningRate 0.0006   Epoch: 13   Global Step: 22870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:03:46,078-Speed 24244.47 samples/sec   Loss 2.8498   LearningRate 0.0006   Epoch: 13   Global Step: 22880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:03:56,138-Speed 24432.27 samples/sec   Loss 2.8142   LearningRate 0.0006   Epoch: 13   Global Step: 22890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:04:06,201-Speed 24424.63 samples/sec   Loss 2.8106   LearningRate 0.0006   Epoch: 13   Global Step: 22900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:04:16,320-Speed 24289.15 samples/sec   Loss 2.8290   LearningRate 0.0006   Epoch: 13   Global Step: 22910   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:04:26,368-Speed 24462.47 samples/sec   Loss 2.8240   LearningRate 0.0006   Epoch: 13   Global Step: 22920   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:04:36,534-Speed 24177.55 samples/sec   Loss 2.8375   LearningRate 0.0006   Epoch: 13   Global Step: 22930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:04:46,650-Speed 24298.70 samples/sec   Loss 2.8296   LearningRate 0.0006   Epoch: 13   Global Step: 22940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:04:56,762-Speed 24306.04 samples/sec   Loss 2.8526   LearningRate 0.0006   Epoch: 13   Global Step: 22950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:05:06,890-Speed 24270.04 samples/sec   Loss 2.8079   LearningRate 0.0006   Epoch: 13   Global Step: 22960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:05:16,992-Speed 24330.77 samples/sec   Loss 2.8060   LearningRate 0.0006   Epoch: 13   Global Step: 22970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:05:27,123-Speed 24261.10 samples/sec   Loss 2.8418   LearningRate 0.0006   Epoch: 13   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:05:37,192-Speed 24411.82 samples/sec   Loss 2.8270   LearningRate 0.0005   Epoch: 13   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:05:47,376-Speed 24134.68 samples/sec   Loss 2.8263   LearningRate 0.0005   Epoch: 13   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:05:57,470-Speed 24349.63 samples/sec   Loss 2.8115   LearningRate 0.0005   Epoch: 13   Global Step: 23010   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:06:07,578-Speed 24319.22 samples/sec   Loss 2.8223   LearningRate 0.0005   Epoch: 13   Global Step: 23020   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:06:17,631-Speed 24447.53 samples/sec   Loss 2.8342   LearningRate 0.0005   Epoch: 13   Global Step: 23030   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:06:27,718-Speed 24366.17 samples/sec   Loss 2.8140   LearningRate 0.0005   Epoch: 13   Global Step: 23040   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:06:37,776-Speed 24439.43 samples/sec   Loss 2.8265   LearningRate 0.0005   Epoch: 13   Global Step: 23050   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:06:47,866-Speed 24360.21 samples/sec   Loss 2.8255   LearningRate 0.0005   Epoch: 13   Global Step: 23060   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:06:57,886-Speed 24530.11 samples/sec   Loss 2.8290   LearningRate 0.0005   Epoch: 13   Global Step: 23070   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:07:07,987-Speed 24336.86 samples/sec   Loss 2.8078   LearningRate 0.0005   Epoch: 13   Global Step: 23080   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:07:18,133-Speed 24226.07 samples/sec   Loss 2.8061   LearningRate 0.0005   Epoch: 13   Global Step: 23090   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:07:28,302-Speed 24170.31 samples/sec   Loss 2.8332   LearningRate 0.0005   Epoch: 13   Global Step: 23100   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:07:38,425-Speed 24283.96 samples/sec   Loss 2.8114   LearningRate 0.0005   Epoch: 13   Global Step: 23110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:07:48,480-Speed 24446.98 samples/sec   Loss 2.8152   LearningRate 0.0005   Epoch: 13   Global Step: 23120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:07:58,565-Speed 24370.10 samples/sec   Loss 2.8321   LearningRate 0.0005   Epoch: 13   Global Step: 23130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:08:08,731-Speed 24181.61 samples/sec   Loss 2.8077   LearningRate 0.0005   Epoch: 13   Global Step: 23140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:08:18,819-Speed 24365.15 samples/sec   Loss 2.8059   LearningRate 0.0005   Epoch: 13   Global Step: 23150   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:08:28,882-Speed 24423.33 samples/sec   Loss 2.8159   LearningRate 0.0005   Epoch: 13   Global Step: 23160   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:08:38,926-Speed 24474.05 samples/sec   Loss 2.8033   LearningRate 0.0005   Epoch: 13   Global Step: 23170   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:08:49,040-Speed 24302.76 samples/sec   Loss 2.8173   LearningRate 0.0005   Epoch: 13   Global Step: 23180   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:08:59,112-Speed 24402.63 samples/sec   Loss 2.8303   LearningRate 0.0005   Epoch: 13   Global Step: 23190   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:09:09,235-Speed 24280.78 samples/sec   Loss 2.8225   LearningRate 0.0005   Epoch: 13   Global Step: 23200   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:09:19,345-Speed 24312.61 samples/sec   Loss 2.8082   LearningRate 0.0005   Epoch: 13   Global Step: 23210   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:09:29,426-Speed 24383.27 samples/sec   Loss 2.8027   LearningRate 0.0005   Epoch: 13   Global Step: 23220   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:09:39,465-Speed 24488.86 samples/sec   Loss 2.8474   LearningRate 0.0005   Epoch: 13   Global Step: 23230   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:09:49,549-Speed 24373.00 samples/sec   Loss 2.8269   LearningRate 0.0005   Epoch: 13   Global Step: 23240   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:09:59,623-Speed 24398.74 samples/sec   Loss 2.8001   LearningRate 0.0005   Epoch: 13   Global Step: 23250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:10:09,668-Speed 24468.71 samples/sec   Loss 2.7818   LearningRate 0.0005   Epoch: 13   Global Step: 23260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:10:19,835-Speed 24176.31 samples/sec   Loss 2.8421   LearningRate 0.0005   Epoch: 13   Global Step: 23270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:10:29,983-Speed 24218.79 samples/sec   Loss 2.7797   LearningRate 0.0005   Epoch: 13   Global Step: 23280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:10:40,013-Speed 24504.66 samples/sec   Loss 2.7777   LearningRate 0.0005   Epoch: 13   Global Step: 23290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:10:50,074-Speed 24431.02 samples/sec   Loss 2.7758   LearningRate 0.0005   Epoch: 13   Global Step: 23300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:11:00,043-Speed 24658.60 samples/sec   Loss 2.8101   LearningRate 0.0005   Epoch: 13   Global Step: 23310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:11:10,147-Speed 24325.22 samples/sec   Loss 2.8052   LearningRate 0.0005   Epoch: 13   Global Step: 23320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:11:20,288-Speed 24236.99 samples/sec   Loss 2.8100   LearningRate 0.0005   Epoch: 13   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:11:30,405-Speed 24293.84 samples/sec   Loss 2.7710   LearningRate 0.0005   Epoch: 13   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:11:40,530-Speed 24275.90 samples/sec   Loss 2.8046   LearningRate 0.0005   Epoch: 13   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:11:50,689-Speed 24202.53 samples/sec   Loss 2.7874   LearningRate 0.0005   Epoch: 13   Global Step: 23360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:12:00,787-Speed 24344.25 samples/sec   Loss 2.8086   LearningRate 0.0005   Epoch: 13   Global Step: 23370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:12:10,861-Speed 24396.85 samples/sec   Loss 2.8140   LearningRate 0.0005   Epoch: 13   Global Step: 23380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:12:20,979-Speed 24296.00 samples/sec   Loss 2.7807   LearningRate 0.0005   Epoch: 13   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:12:31,051-Speed 24403.26 samples/sec   Loss 2.7702   LearningRate 0.0005   Epoch: 13   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:12:41,194-Speed 24232.21 samples/sec   Loss 2.7950   LearningRate 0.0005   Epoch: 13   Global Step: 23410   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 05:12:51,335-Speed 24238.22 samples/sec   Loss 2.7948   LearningRate 0.0005   Epoch: 13   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:13:01,435-Speed 24336.41 samples/sec   Loss 2.7874   LearningRate 0.0005   Epoch: 13   Global Step: 23430   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:13:11,530-Speed 24347.30 samples/sec   Loss 2.7802   LearningRate 0.0005   Epoch: 13   Global Step: 23440   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:13:21,736-Speed 24081.94 samples/sec   Loss 2.7837   LearningRate 0.0005   Epoch: 13   Global Step: 23450   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:13:31,874-Speed 24244.14 samples/sec   Loss 2.7842   LearningRate 0.0005   Epoch: 13   Global Step: 23460   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:13:41,960-Speed 24368.95 samples/sec   Loss 2.7973   LearningRate 0.0005   Epoch: 13   Global Step: 23470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:13:52,061-Speed 24332.77 samples/sec   Loss 2.7953   LearningRate 0.0005   Epoch: 13   Global Step: 23480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:14:02,147-Speed 24368.71 samples/sec   Loss 2.8044   LearningRate 0.0005   Epoch: 13   Global Step: 23490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:14:12,291-Speed 24230.34 samples/sec   Loss 2.8066   LearningRate 0.0005   Epoch: 13   Global Step: 23500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:14:22,390-Speed 24339.08 samples/sec   Loss 2.7849   LearningRate 0.0005   Epoch: 13   Global Step: 23510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:14:32,506-Speed 24296.67 samples/sec   Loss 2.7811   LearningRate 0.0005   Epoch: 13   Global Step: 23520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:14:42,636-Speed 24264.66 samples/sec   Loss 2.7811   LearningRate 0.0005   Epoch: 13   Global Step: 23530   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:14:52,680-Speed 24471.18 samples/sec   Loss 2.7798   LearningRate 0.0005   Epoch: 13   Global Step: 23540   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:15:02,722-Speed 24478.04 samples/sec   Loss 2.7702   LearningRate 0.0005   Epoch: 13   Global Step: 23550   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:15:12,838-Speed 24298.51 samples/sec   Loss 2.7906   LearningRate 0.0005   Epoch: 13   Global Step: 23560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:15:22,912-Speed 24397.07 samples/sec   Loss 2.8172   LearningRate 0.0005   Epoch: 13   Global Step: 23570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:15:33,120-Speed 24077.83 samples/sec   Loss 2.7865   LearningRate 0.0005   Epoch: 13   Global Step: 23580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:15:43,242-Speed 24284.89 samples/sec   Loss 2.7799   LearningRate 0.0005   Epoch: 13   Global Step: 23590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:15:53,327-Speed 24369.87 samples/sec   Loss 2.7806   LearningRate 0.0005   Epoch: 13   Global Step: 23600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:16:03,516-Speed 24124.92 samples/sec   Loss 2.7959   LearningRate 0.0005   Epoch: 13   Global Step: 23610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:16:13,632-Speed 24297.65 samples/sec   Loss 2.7721   LearningRate 0.0005   Epoch: 13   Global Step: 23620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:16:23,705-Speed 24397.90 samples/sec   Loss 2.7681   LearningRate 0.0005   Epoch: 13   Global Step: 23630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:16:33,823-Speed 24292.66 samples/sec   Loss 2.7570   LearningRate 0.0005   Epoch: 13   Global Step: 23640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:16:43,929-Speed 24323.36 samples/sec   Loss 2.7793   LearningRate 0.0005   Epoch: 13   Global Step: 23650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:16:53,988-Speed 24434.40 samples/sec   Loss 2.7991   LearningRate 0.0005   Epoch: 13   Global Step: 23660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:17:04,115-Speed 24271.63 samples/sec   Loss 2.7878   LearningRate 0.0005   Epoch: 13   Global Step: 23670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:17:14,132-Speed 24537.87 samples/sec   Loss 2.7824   LearningRate 0.0005   Epoch: 13   Global Step: 23680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:17:24,240-Speed 24315.97 samples/sec   Loss 2.7594   LearningRate 0.0005   Epoch: 13   Global Step: 23690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:17:34,328-Speed 24364.36 samples/sec   Loss 2.7759   LearningRate 0.0005   Epoch: 13   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:17:44,453-Speed 24275.75 samples/sec   Loss 2.7873   LearningRate 0.0005   Epoch: 13   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:17:54,562-Speed 24315.47 samples/sec   Loss 2.7712   LearningRate 0.0005   Epoch: 13   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:18:04,667-Speed 24322.73 samples/sec   Loss 2.7574   LearningRate 0.0005   Epoch: 13   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:18:14,827-Speed 24193.40 samples/sec   Loss 2.7841   LearningRate 0.0005   Epoch: 13   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:18:24,985-Speed 24196.79 samples/sec   Loss 2.7884   LearningRate 0.0005   Epoch: 13   Global Step: 23750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:18:35,064-Speed 24386.15 samples/sec   Loss 2.7362   LearningRate 0.0005   Epoch: 13   Global Step: 23760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:18:45,145-Speed 24387.13 samples/sec   Loss 2.7608   LearningRate 0.0005   Epoch: 13   Global Step: 23770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:18:55,213-Speed 24411.75 samples/sec   Loss 2.7635   LearningRate 0.0005   Epoch: 13   Global Step: 23780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:19:05,239-Speed 24514.03 samples/sec   Loss 2.7705   LearningRate 0.0005   Epoch: 13   Global Step: 23790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:19:15,333-Speed 24349.66 samples/sec   Loss 2.7840   LearningRate 0.0005   Epoch: 13   Global Step: 23800   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 05:19:25,349-Speed 24540.21 samples/sec   Loss 2.7560   LearningRate 0.0005   Epoch: 13   Global Step: 23810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:19:35,383-Speed 24494.03 samples/sec   Loss 2.8086   LearningRate 0.0005   Epoch: 13   Global Step: 23820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:19:45,438-Speed 24446.46 samples/sec   Loss 2.7589   LearningRate 0.0005   Epoch: 13   Global Step: 23830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:19:55,535-Speed 24342.17 samples/sec   Loss 2.7494   LearningRate 0.0005   Epoch: 13   Global Step: 23840   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:20:05,647-Speed 24306.34 samples/sec   Loss 2.7837   LearningRate 0.0005   Epoch: 13   Global Step: 23850   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:20:15,717-Speed 24407.41 samples/sec   Loss 2.7428   LearningRate 0.0005   Epoch: 13   Global Step: 23860   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:20:25,792-Speed 24396.89 samples/sec   Loss 2.7686   LearningRate 0.0005   Epoch: 13   Global Step: 23870   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:20:35,847-Speed 24449.29 samples/sec   Loss 2.7445   LearningRate 0.0005   Epoch: 13   Global Step: 23880   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:20:45,945-Speed 24341.43 samples/sec   Loss 2.7407   LearningRate 0.0005   Epoch: 13   Global Step: 23890   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:20:56,045-Speed 24336.44 samples/sec   Loss 2.7516   LearningRate 0.0005   Epoch: 13   Global Step: 23900   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:21:06,126-Speed 24381.13 samples/sec   Loss 2.7537   LearningRate 0.0005   Epoch: 13   Global Step: 23910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:21:16,342-Speed 24058.88 samples/sec   Loss 2.7631   LearningRate 0.0005   Epoch: 13   Global Step: 23920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:21:26,435-Speed 24360.94 samples/sec   Loss 2.7545   LearningRate 0.0005   Epoch: 13   Global Step: 23930   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:21:36,582-Speed 24222.67 samples/sec   Loss 2.7649   LearningRate 0.0005   Epoch: 13   Global Step: 23940   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:21:46,667-Speed 24371.51 samples/sec   Loss 2.7400   LearningRate 0.0005   Epoch: 13   Global Step: 23950   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:21:56,876-Speed 24074.79 samples/sec   Loss 2.7221   LearningRate 0.0005   Epoch: 13   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:22:06,979-Speed 24328.47 samples/sec   Loss 2.7580   LearningRate 0.0005   Epoch: 13   Global Step: 23970   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:22:17,112-Speed 24253.43 samples/sec   Loss 2.7694   LearningRate 0.0005   Epoch: 13   Global Step: 23980   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:22:27,321-Speed 24076.42 samples/sec   Loss 2.7435   LearningRate 0.0005   Epoch: 13   Global Step: 23990   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:22:37,376-Speed 24445.39 samples/sec   Loss 2.7312   LearningRate 0.0005   Epoch: 13   Global Step: 24000   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:22:47,462-Speed 24369.13 samples/sec   Loss 2.7472   LearningRate 0.0005   Epoch: 13   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:22:57,682-Speed 24051.02 samples/sec   Loss 2.7403   LearningRate 0.0005   Epoch: 13   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:23:07,948-Speed 23944.87 samples/sec   Loss 2.7323   LearningRate 0.0005   Epoch: 13   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:23:17,817-Speed 24905.15 samples/sec   Loss 2.7289   LearningRate 0.0005   Epoch: 13   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:23:27,791-Speed 24646.85 samples/sec   Loss 2.7514   LearningRate 0.0005   Epoch: 13   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:23:37,594-Speed 25072.87 samples/sec   Loss 2.7494   LearningRate 0.0005   Epoch: 13   Global Step: 24060   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:23:47,598-Speed 24568.89 samples/sec   Loss 2.7376   LearningRate 0.0005   Epoch: 13   Global Step: 24070   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:23:57,500-Speed 24822.60 samples/sec   Loss 2.7465   LearningRate 0.0005   Epoch: 13   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:24:07,345-Speed 24965.02 samples/sec   Loss 2.7600   LearningRate 0.0005   Epoch: 13   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:24:17,127-Speed 25126.85 samples/sec   Loss 2.7553   LearningRate 0.0005   Epoch: 13   Global Step: 24100   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:24:26,939-Speed 25050.09 samples/sec   Loss 2.7687   LearningRate 0.0005   Epoch: 13   Global Step: 24110   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:24:36,687-Speed 25213.97 samples/sec   Loss 2.7613   LearningRate 0.0005   Epoch: 13   Global Step: 24120   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:24:46,504-Speed 25038.41 samples/sec   Loss 2.7639   LearningRate 0.0005   Epoch: 13   Global Step: 24130   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:24:56,292-Speed 25112.63 samples/sec   Loss 2.7341   LearningRate 0.0005   Epoch: 13   Global Step: 24140   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:25:06,218-Speed 24761.23 samples/sec   Loss 2.7630   LearningRate 0.0005   Epoch: 13   Global Step: 24150   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:25:16,184-Speed 24665.31 samples/sec   Loss 2.7803   LearningRate 0.0005   Epoch: 13   Global Step: 24160   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:25:26,010-Speed 25011.99 samples/sec   Loss 2.7879   LearningRate 0.0005   Epoch: 13   Global Step: 24170   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:25:35,744-Speed 25252.80 samples/sec   Loss 2.7593   LearningRate 0.0005   Epoch: 13   Global Step: 24180   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:25:45,600-Speed 24939.15 samples/sec   Loss 2.7623   LearningRate 0.0005   Epoch: 13   Global Step: 24190   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:26:45,634-Speed 4093.72 samples/sec   Loss 2.7564   LearningRate 0.0005   Epoch: 14   Global Step: 24200   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:26:55,614-Speed 24635.58 samples/sec   Loss 2.6942   LearningRate 0.0005   Epoch: 14   Global Step: 24210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:27:05,585-Speed 24649.85 samples/sec   Loss 2.6881   LearningRate 0.0005   Epoch: 14   Global Step: 24220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:27:15,513-Speed 24757.11 samples/sec   Loss 2.6925   LearningRate 0.0005   Epoch: 14   Global Step: 24230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:27:25,454-Speed 24727.38 samples/sec   Loss 2.7048   LearningRate 0.0005   Epoch: 14   Global Step: 24240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:27:35,406-Speed 24698.15 samples/sec   Loss 2.7079   LearningRate 0.0005   Epoch: 14   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:27:45,352-Speed 24711.83 samples/sec   Loss 2.6933   LearningRate 0.0005   Epoch: 14   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:27:55,380-Speed 24511.26 samples/sec   Loss 2.7143   LearningRate 0.0005   Epoch: 14   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:28:05,362-Speed 24630.03 samples/sec   Loss 2.6985   LearningRate 0.0005   Epoch: 14   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:28:15,454-Speed 24355.38 samples/sec   Loss 2.7338   LearningRate 0.0005   Epoch: 14   Global Step: 24290   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:28:25,523-Speed 24409.94 samples/sec   Loss 2.7478   LearningRate 0.0005   Epoch: 14   Global Step: 24300   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:28:35,533-Speed 24555.06 samples/sec   Loss 2.7431   LearningRate 0.0005   Epoch: 14   Global Step: 24310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:28:45,463-Speed 24751.35 samples/sec   Loss 2.7055   LearningRate 0.0005   Epoch: 14   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:28:55,351-Speed 24856.19 samples/sec   Loss 2.6957   LearningRate 0.0005   Epoch: 14   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:29:05,145-Speed 25097.38 samples/sec   Loss 2.7085   LearningRate 0.0005   Epoch: 14   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:29:14,968-Speed 25020.31 samples/sec   Loss 2.7242   LearningRate 0.0005   Epoch: 14   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:29:24,814-Speed 24963.85 samples/sec   Loss 2.7154   LearningRate 0.0005   Epoch: 14   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:29:34,661-Speed 24960.93 samples/sec   Loss 2.7636   LearningRate 0.0005   Epoch: 14   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:29:44,646-Speed 24622.19 samples/sec   Loss 2.7751   LearningRate 0.0005   Epoch: 14   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:29:54,478-Speed 24998.33 samples/sec   Loss 2.6945   LearningRate 0.0005   Epoch: 14   Global Step: 24390   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:30:04,301-Speed 25022.61 samples/sec   Loss 2.7132   LearningRate 0.0005   Epoch: 14   Global Step: 24400   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:30:14,081-Speed 25131.12 samples/sec   Loss 2.7118   LearningRate 0.0005   Epoch: 14   Global Step: 24410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:30:23,831-Speed 25211.93 samples/sec   Loss 2.7147   LearningRate 0.0005   Epoch: 14   Global Step: 24420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:30:33,655-Speed 25019.78 samples/sec   Loss 2.7186   LearningRate 0.0005   Epoch: 14   Global Step: 24430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:30:43,421-Speed 25168.11 samples/sec   Loss 2.7163   LearningRate 0.0005   Epoch: 14   Global Step: 24440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:30:53,122-Speed 25340.20 samples/sec   Loss 2.7368   LearningRate 0.0005   Epoch: 14   Global Step: 24450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:31:02,960-Speed 24982.04 samples/sec   Loss 2.7254   LearningRate 0.0005   Epoch: 14   Global Step: 24460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:31:12,707-Speed 25217.03 samples/sec   Loss 2.7254   LearningRate 0.0005   Epoch: 14   Global Step: 24470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:31:22,556-Speed 24956.02 samples/sec   Loss 2.7139   LearningRate 0.0005   Epoch: 14   Global Step: 24480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-26 05:31:32,399-Speed 24970.95 samples/sec   Loss 2.7184   LearningRate 0.0005   Epoch: 14   Global Step: 24490   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:31:42,420-Speed 24527.05 samples/sec   Loss 2.7227   LearningRate 0.0005   Epoch: 14   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:31:52,425-Speed 24568.24 samples/sec   Loss 2.7024   LearningRate 0.0005   Epoch: 14   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:32:02,309-Speed 24867.23 samples/sec   Loss 2.7113   LearningRate 0.0005   Epoch: 14   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:32:12,086-Speed 25139.34 samples/sec   Loss 2.6929   LearningRate 0.0005   Epoch: 14   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:32:21,845-Speed 25187.58 samples/sec   Loss 2.7137   LearningRate 0.0005   Epoch: 14   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:32:31,719-Speed 24892.15 samples/sec   Loss 2.7135   LearningRate 0.0005   Epoch: 14   Global Step: 24550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:32:41,592-Speed 24894.09 samples/sec   Loss 2.7307   LearningRate 0.0005   Epoch: 14   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:32:51,463-Speed 24902.23 samples/sec   Loss 2.7001   LearningRate 0.0005   Epoch: 14   Global Step: 24570   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:33:01,300-Speed 24985.30 samples/sec   Loss 2.7206   LearningRate 0.0005   Epoch: 14   Global Step: 24580   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:33:11,121-Speed 25029.45 samples/sec   Loss 2.7116   LearningRate 0.0005   Epoch: 14   Global Step: 24590   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:33:20,882-Speed 25180.02 samples/sec   Loss 2.7024   LearningRate 0.0005   Epoch: 14   Global Step: 24600   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:33:30,779-Speed 24833.95 samples/sec   Loss 2.7012   LearningRate 0.0005   Epoch: 14   Global Step: 24610   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:33:40,532-Speed 25201.20 samples/sec   Loss 2.6993   LearningRate 0.0005   Epoch: 14   Global Step: 24620   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:33:50,371-Speed 24981.23 samples/sec   Loss 2.7427   LearningRate 0.0005   Epoch: 14   Global Step: 24630   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:34:00,266-Speed 24840.23 samples/sec   Loss 2.7351   LearningRate 0.0005   Epoch: 14   Global Step: 24640   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:34:10,067-Speed 25077.20 samples/sec   Loss 2.6975   LearningRate 0.0005   Epoch: 14   Global Step: 24650   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:34:19,939-Speed 24899.10 samples/sec   Loss 2.7000   LearningRate 0.0005   Epoch: 14   Global Step: 24660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:34:29,822-Speed 24869.84 samples/sec   Loss 2.7264   LearningRate 0.0005   Epoch: 14   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:34:39,627-Speed 25066.76 samples/sec   Loss 2.7228   LearningRate 0.0005   Epoch: 14   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:34:49,626-Speed 24587.93 samples/sec   Loss 2.6740   LearningRate 0.0005   Epoch: 14   Global Step: 24690   Fp16 Grad Scale: 131072   Required: 13 hours
Training: 2022-03-26 05:34:59,417-Speed 25103.18 samples/sec   Loss 2.6843   LearningRate 0.0005   Epoch: 14   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:35:09,183-Speed 25166.98 samples/sec   Loss 2.7011   LearningRate 0.0005   Epoch: 14   Global Step: 24710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:35:18,939-Speed 25194.34 samples/sec   Loss 2.7261   LearningRate 0.0005   Epoch: 14   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:35:28,747-Speed 25061.75 samples/sec   Loss 2.6883   LearningRate 0.0005   Epoch: 14   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:35:38,615-Speed 24906.96 samples/sec   Loss 2.6872   LearningRate 0.0005   Epoch: 14   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:35:48,390-Speed 25151.73 samples/sec   Loss 2.6966   LearningRate 0.0005   Epoch: 14   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:35:58,141-Speed 25204.50 samples/sec   Loss 2.6841   LearningRate 0.0005   Epoch: 14   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:36:07,997-Speed 24937.90 samples/sec   Loss 2.6684   LearningRate 0.0005   Epoch: 14   Global Step: 24770   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:36:17,922-Speed 24766.19 samples/sec   Loss 2.6855   LearningRate 0.0005   Epoch: 14   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:36:27,785-Speed 24918.03 samples/sec   Loss 2.6937   LearningRate 0.0005   Epoch: 14   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:36:37,594-Speed 25057.31 samples/sec   Loss 2.7297   LearningRate 0.0005   Epoch: 14   Global Step: 24800   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:36:47,423-Speed 25007.53 samples/sec   Loss 2.7012   LearningRate 0.0005   Epoch: 14   Global Step: 24810   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:36:57,351-Speed 24756.80 samples/sec   Loss 2.6865   LearningRate 0.0005   Epoch: 14   Global Step: 24820   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:37:07,107-Speed 25193.08 samples/sec   Loss 2.6698   LearningRate 0.0005   Epoch: 14   Global Step: 24830   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-26 05:37:16,932-Speed 25015.90 samples/sec   Loss 2.6925   LearningRate 0.0005   Epoch: 14   Global Step: 24840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:37:26,734-Speed 25077.93 samples/sec   Loss 2.6784   LearningRate 0.0005   Epoch: 14   Global Step: 24850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:37:36,521-Speed 25113.04 samples/sec   Loss 2.6629   LearningRate 0.0005   Epoch: 14   Global Step: 24860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:37:46,475-Speed 24693.19 samples/sec   Loss 2.6569   LearningRate 0.0005   Epoch: 14   Global Step: 24870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:37:56,355-Speed 24876.71 samples/sec   Loss 2.6589   LearningRate 0.0005   Epoch: 14   Global Step: 24880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:38:06,356-Speed 24581.93 samples/sec   Loss 2.6824   LearningRate 0.0005   Epoch: 14   Global Step: 24890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:38:16,198-Speed 24971.51 samples/sec   Loss 2.6909   LearningRate 0.0005   Epoch: 14   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:38:26,086-Speed 24860.32 samples/sec   Loss 2.7082   LearningRate 0.0005   Epoch: 14   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:38:35,977-Speed 24854.90 samples/sec   Loss 2.6714   LearningRate 0.0005   Epoch: 14   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:38:45,764-Speed 25113.52 samples/sec   Loss 2.6940   LearningRate 0.0005   Epoch: 14   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:38:55,565-Speed 25078.90 samples/sec   Loss 2.6737   LearningRate 0.0005   Epoch: 14   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:39:05,375-Speed 25053.42 samples/sec   Loss 2.6739   LearningRate 0.0005   Epoch: 14   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:39:15,089-Speed 25304.31 samples/sec   Loss 2.6405   LearningRate 0.0005   Epoch: 14   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:39:24,940-Speed 24950.07 samples/sec   Loss 2.6954   LearningRate 0.0005   Epoch: 14   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:39:34,784-Speed 24967.38 samples/sec   Loss 2.6629   LearningRate 0.0005   Epoch: 14   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:39:44,580-Speed 25089.33 samples/sec   Loss 2.6713   LearningRate 0.0005   Epoch: 14   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:39:54,370-Speed 25106.40 samples/sec   Loss 2.7029   LearningRate 0.0005   Epoch: 14   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:40:04,258-Speed 24854.70 samples/sec   Loss 2.6748   LearningRate 0.0005   Epoch: 14   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:40:14,196-Speed 24733.05 samples/sec   Loss 2.6761   LearningRate 0.0005   Epoch: 14   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:40:24,105-Speed 24808.90 samples/sec   Loss 2.6584   LearningRate 0.0005   Epoch: 14   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:40:33,998-Speed 24843.81 samples/sec   Loss 2.6653   LearningRate 0.0005   Epoch: 14   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:40:43,787-Speed 25107.92 samples/sec   Loss 2.6829   LearningRate 0.0005   Epoch: 14   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:40:53,697-Speed 24801.24 samples/sec   Loss 2.6934   LearningRate 0.0005   Epoch: 14   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:41:03,571-Speed 24893.32 samples/sec   Loss 2.6800   LearningRate 0.0005   Epoch: 14   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:41:13,386-Speed 25039.46 samples/sec   Loss 2.6747   LearningRate 0.0005   Epoch: 14   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:41:23,199-Speed 25047.63 samples/sec   Loss 2.6494   LearningRate 0.0005   Epoch: 14   Global Step: 25090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:41:33,128-Speed 24754.47 samples/sec   Loss 2.6785   LearningRate 0.0005   Epoch: 14   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-03-26 05:41:42,975-Speed 24960.67 samples/sec   Loss 2.6697   LearningRate 0.0005   Epoch: 14   Global Step: 25110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:41:52,835-Speed 24926.06 samples/sec   Loss 2.6589   LearningRate 0.0005   Epoch: 14   Global Step: 25120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:42:02,643-Speed 25062.17 samples/sec   Loss 2.7021   LearningRate 0.0005   Epoch: 14   Global Step: 25130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:42:12,506-Speed 24919.55 samples/sec   Loss 2.6772   LearningRate 0.0005   Epoch: 14   Global Step: 25140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:42:22,530-Speed 24519.38 samples/sec   Loss 2.6483   LearningRate 0.0005   Epoch: 14   Global Step: 25150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:42:32,430-Speed 24828.33 samples/sec   Loss 2.6607   LearningRate 0.0005   Epoch: 14   Global Step: 25160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:42:42,258-Speed 25008.45 samples/sec   Loss 2.6744   LearningRate 0.0005   Epoch: 14   Global Step: 25170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:42:52,053-Speed 25094.36 samples/sec   Loss 2.6565   LearningRate 0.0005   Epoch: 14   Global Step: 25180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:43:01,889-Speed 24986.77 samples/sec   Loss 2.6636   LearningRate 0.0005   Epoch: 14   Global Step: 25190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:43:11,773-Speed 24868.11 samples/sec   Loss 2.6449   LearningRate 0.0005   Epoch: 14   Global Step: 25200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:43:21,637-Speed 24919.73 samples/sec   Loss 2.6724   LearningRate 0.0005   Epoch: 14   Global Step: 25210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:43:31,646-Speed 24556.59 samples/sec   Loss 2.6768   LearningRate 0.0005   Epoch: 14   Global Step: 25220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:43:41,722-Speed 24392.80 samples/sec   Loss 2.6582   LearningRate 0.0005   Epoch: 14   Global Step: 25230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:43:51,748-Speed 24516.35 samples/sec   Loss 2.6408   LearningRate 0.0005   Epoch: 14   Global Step: 25240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:44:01,803-Speed 24445.07 samples/sec   Loss 2.6707   LearningRate 0.0005   Epoch: 14   Global Step: 25250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:44:11,947-Speed 24230.39 samples/sec   Loss 2.6691   LearningRate 0.0005   Epoch: 14   Global Step: 25260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:44:21,967-Speed 24529.93 samples/sec   Loss 2.6609   LearningRate 0.0005   Epoch: 14   Global Step: 25270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:44:31,965-Speed 24582.86 samples/sec   Loss 2.6375   LearningRate 0.0005   Epoch: 14   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:44:42,011-Speed 24467.24 samples/sec   Loss 2.6426   LearningRate 0.0005   Epoch: 14   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:44:52,084-Speed 24401.07 samples/sec   Loss 2.6701   LearningRate 0.0005   Epoch: 14   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:45:02,044-Speed 24676.01 samples/sec   Loss 2.6711   LearningRate 0.0005   Epoch: 14   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:45:12,285-Speed 24007.55 samples/sec   Loss 2.6267   LearningRate 0.0005   Epoch: 14   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:45:22,284-Speed 24582.76 samples/sec   Loss 2.6528   LearningRate 0.0005   Epoch: 14   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:45:32,196-Speed 24796.47 samples/sec   Loss 2.6351   LearningRate 0.0005   Epoch: 14   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:45:42,183-Speed 24608.69 samples/sec   Loss 2.6516   LearningRate 0.0005   Epoch: 14   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:45:52,158-Speed 24641.44 samples/sec   Loss 2.6684   LearningRate 0.0005   Epoch: 14   Global Step: 25360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:46:02,123-Speed 24665.01 samples/sec   Loss 2.6415   LearningRate 0.0005   Epoch: 14   Global Step: 25370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:46:12,208-Speed 24376.52 samples/sec   Loss 2.6415   LearningRate 0.0005   Epoch: 14   Global Step: 25380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:46:22,261-Speed 24448.64 samples/sec   Loss 2.6562   LearningRate 0.0005   Epoch: 14   Global Step: 25390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:46:32,138-Speed 24885.62 samples/sec   Loss 2.6320   LearningRate 0.0005   Epoch: 14   Global Step: 25400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:46:42,007-Speed 24902.68 samples/sec   Loss 2.6499   LearningRate 0.0005   Epoch: 14   Global Step: 25410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:46:52,001-Speed 24597.07 samples/sec   Loss 2.7587   LearningRate 0.0005   Epoch: 14   Global Step: 25420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:47:01,953-Speed 24696.52 samples/sec   Loss 2.6638   LearningRate 0.0005   Epoch: 14   Global Step: 25430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:47:11,786-Speed 24997.33 samples/sec   Loss 2.6521   LearningRate 0.0005   Epoch: 14   Global Step: 25440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:47:21,747-Speed 24678.60 samples/sec   Loss 2.6340   LearningRate 0.0005   Epoch: 14   Global Step: 25450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:47:31,693-Speed 24712.09 samples/sec   Loss 2.6573   LearningRate 0.0005   Epoch: 14   Global Step: 25460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:47:41,598-Speed 24817.69 samples/sec   Loss 2.6625   LearningRate 0.0005   Epoch: 14   Global Step: 25470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:47:51,727-Speed 24266.19 samples/sec   Loss 2.6437   LearningRate 0.0005   Epoch: 14   Global Step: 25480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:48:01,707-Speed 24628.92 samples/sec   Loss 2.6432   LearningRate 0.0005   Epoch: 14   Global Step: 25490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:48:11,986-Speed 23912.94 samples/sec   Loss 2.6762   LearningRate 0.0005   Epoch: 14   Global Step: 25500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:48:22,068-Speed 24380.76 samples/sec   Loss 2.6463   LearningRate 0.0005   Epoch: 14   Global Step: 25510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:48:32,069-Speed 24574.14 samples/sec   Loss 2.6237   LearningRate 0.0005   Epoch: 14   Global Step: 25520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:48:42,109-Speed 24481.95 samples/sec   Loss 2.6271   LearningRate 0.0005   Epoch: 14   Global Step: 25530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:48:52,074-Speed 24671.18 samples/sec   Loss 2.6399   LearningRate 0.0005   Epoch: 14   Global Step: 25540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:49:02,212-Speed 24244.51 samples/sec   Loss 2.6573   LearningRate 0.0005   Epoch: 14   Global Step: 25550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:49:12,259-Speed 24464.89 samples/sec   Loss 2.6373   LearningRate 0.0005   Epoch: 14   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:49:22,256-Speed 24588.17 samples/sec   Loss 2.6261   LearningRate 0.0005   Epoch: 14   Global Step: 25570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:49:32,427-Speed 24166.35 samples/sec   Loss 2.6395   LearningRate 0.0005   Epoch: 14   Global Step: 25580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:49:42,402-Speed 24640.12 samples/sec   Loss 2.6273   LearningRate 0.0005   Epoch: 14   Global Step: 25590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:49:52,340-Speed 24733.34 samples/sec   Loss 2.6600   LearningRate 0.0005   Epoch: 14   Global Step: 25600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:50:02,456-Speed 24296.96 samples/sec   Loss 2.6389   LearningRate 0.0005   Epoch: 14   Global Step: 25610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:50:12,537-Speed 24382.89 samples/sec   Loss 2.6212   LearningRate 0.0005   Epoch: 14   Global Step: 25620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:50:22,505-Speed 24658.45 samples/sec   Loss 2.6441   LearningRate 0.0005   Epoch: 14   Global Step: 25630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:50:32,380-Speed 24888.83 samples/sec   Loss 2.6456   LearningRate 0.0005   Epoch: 14   Global Step: 25640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:50:42,191-Speed 25053.76 samples/sec   Loss 2.6677   LearningRate 0.0005   Epoch: 14   Global Step: 25650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:50:52,020-Speed 25011.83 samples/sec   Loss 2.6470   LearningRate 0.0005   Epoch: 14   Global Step: 25660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:51:01,826-Speed 25067.70 samples/sec   Loss 2.6291   LearningRate 0.0005   Epoch: 14   Global Step: 25670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:51:11,784-Speed 24683.66 samples/sec   Loss 2.6336   LearningRate 0.0005   Epoch: 14   Global Step: 25680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:51:21,534-Speed 25210.56 samples/sec   Loss 2.6826   LearningRate 0.0005   Epoch: 14   Global Step: 25690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:51:31,371-Speed 24987.15 samples/sec   Loss 2.6609   LearningRate 0.0005   Epoch: 14   Global Step: 25700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:51:41,252-Speed 24875.91 samples/sec   Loss 2.6286   LearningRate 0.0005   Epoch: 14   Global Step: 25710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:51:51,124-Speed 24898.48 samples/sec   Loss 2.6112   LearningRate 0.0005   Epoch: 14   Global Step: 25720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:52:00,867-Speed 25229.78 samples/sec   Loss 2.6420   LearningRate 0.0005   Epoch: 14   Global Step: 25730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:52:10,647-Speed 25130.61 samples/sec   Loss 2.6493   LearningRate 0.0005   Epoch: 14   Global Step: 25740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:52:20,429-Speed 25128.09 samples/sec   Loss 2.6337   LearningRate 0.0005   Epoch: 14   Global Step: 25750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:52:30,237-Speed 25060.94 samples/sec   Loss 2.6426   LearningRate 0.0005   Epoch: 14   Global Step: 25760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:52:40,106-Speed 24904.24 samples/sec   Loss 2.6300   LearningRate 0.0005   Epoch: 14   Global Step: 25770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:52:49,853-Speed 25217.76 samples/sec   Loss 2.6183   LearningRate 0.0005   Epoch: 14   Global Step: 25780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:52:59,596-Speed 25226.72 samples/sec   Loss 2.6093   LearningRate 0.0005   Epoch: 14   Global Step: 25790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:53:09,453-Speed 24934.93 samples/sec   Loss 2.6125   LearningRate 0.0005   Epoch: 14   Global Step: 25800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:53:19,181-Speed 25266.82 samples/sec   Loss 2.6036   LearningRate 0.0005   Epoch: 14   Global Step: 25810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:53:29,054-Speed 24899.70 samples/sec   Loss 2.6302   LearningRate 0.0005   Epoch: 14   Global Step: 25820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:53:38,838-Speed 25121.01 samples/sec   Loss 2.6289   LearningRate 0.0005   Epoch: 14   Global Step: 25830   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-03-26 05:53:48,706-Speed 24907.12 samples/sec   Loss 2.6167   LearningRate 0.0005   Epoch: 14   Global Step: 25840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:53:58,393-Speed 25373.09 samples/sec   Loss 2.6369   LearningRate 0.0005   Epoch: 14   Global Step: 25850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:54:08,156-Speed 25177.87 samples/sec   Loss 2.6267   LearningRate 0.0005   Epoch: 14   Global Step: 25860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:54:18,001-Speed 24964.95 samples/sec   Loss 2.6360   LearningRate 0.0005   Epoch: 14   Global Step: 25870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:54:27,699-Speed 25344.35 samples/sec   Loss 2.6405   LearningRate 0.0005   Epoch: 14   Global Step: 25880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:54:37,664-Speed 24666.84 samples/sec   Loss 2.5990   LearningRate 0.0005   Epoch: 14   Global Step: 25890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:54:47,509-Speed 24965.45 samples/sec   Loss 2.6346   LearningRate 0.0005   Epoch: 14   Global Step: 25900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:54:57,346-Speed 24986.94 samples/sec   Loss 2.6609   LearningRate 0.0005   Epoch: 14   Global Step: 25910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:55:07,166-Speed 25029.72 samples/sec   Loss 2.6734   LearningRate 0.0005   Epoch: 14   Global Step: 25920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:56:06,582-Speed 4136.32 samples/sec   Loss 2.6628   LearningRate 0.0005   Epoch: 15   Global Step: 25930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:56:16,332-Speed 25209.83 samples/sec   Loss 2.5840   LearningRate 0.0005   Epoch: 15   Global Step: 25940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:56:26,171-Speed 24989.19 samples/sec   Loss 2.5836   LearningRate 0.0005   Epoch: 15   Global Step: 25950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:56:35,985-Speed 25047.20 samples/sec   Loss 2.5923   LearningRate 0.0005   Epoch: 15   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:56:45,712-Speed 25268.27 samples/sec   Loss 2.5880   LearningRate 0.0005   Epoch: 15   Global Step: 25970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:56:55,502-Speed 25108.25 samples/sec   Loss 2.5899   LearningRate 0.0005   Epoch: 15   Global Step: 25980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:57:05,280-Speed 25138.40 samples/sec   Loss 2.5763   LearningRate 0.0005   Epoch: 15   Global Step: 25990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:57:15,117-Speed 24984.41 samples/sec   Loss 2.5734   LearningRate 0.0005   Epoch: 15   Global Step: 26000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:57:24,902-Speed 25124.46 samples/sec   Loss 2.6070   LearningRate 0.0005   Epoch: 15   Global Step: 26010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:57:34,777-Speed 24889.64 samples/sec   Loss 2.6153   LearningRate 0.0005   Epoch: 15   Global Step: 26020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:57:44,563-Speed 25115.49 samples/sec   Loss 2.5837   LearningRate 0.0005   Epoch: 15   Global Step: 26030   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:57:54,329-Speed 25168.76 samples/sec   Loss 2.6019   LearningRate 0.0005   Epoch: 15   Global Step: 26040   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:58:04,121-Speed 25103.53 samples/sec   Loss 2.6042   LearningRate 0.0005   Epoch: 15   Global Step: 26050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:58:14,050-Speed 24754.29 samples/sec   Loss 2.5588   LearningRate 0.0005   Epoch: 15   Global Step: 26060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 05:58:23,999-Speed 24704.88 samples/sec   Loss 2.5912   LearningRate 0.0005   Epoch: 15   Global Step: 26070   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:58:33,777-Speed 25137.22 samples/sec   Loss 2.5909   LearningRate 0.0005   Epoch: 15   Global Step: 26080   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:58:43,637-Speed 24927.48 samples/sec   Loss 2.5887   LearningRate 0.0005   Epoch: 15   Global Step: 26090   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:58:53,423-Speed 25118.61 samples/sec   Loss 2.5766   LearningRate 0.0005   Epoch: 15   Global Step: 26100   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:59:03,312-Speed 24856.72 samples/sec   Loss 2.5868   LearningRate 0.0005   Epoch: 15   Global Step: 26110   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:59:13,114-Speed 25075.99 samples/sec   Loss 2.5835   LearningRate 0.0005   Epoch: 15   Global Step: 26120   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:59:22,837-Speed 25280.05 samples/sec   Loss 2.6213   LearningRate 0.0005   Epoch: 15   Global Step: 26130   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:59:32,699-Speed 24921.86 samples/sec   Loss 2.6000   LearningRate 0.0005   Epoch: 15   Global Step: 26140   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:59:42,618-Speed 24780.78 samples/sec   Loss 2.5892   LearningRate 0.0005   Epoch: 15   Global Step: 26150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 05:59:52,435-Speed 25037.29 samples/sec   Loss 2.5975   LearningRate 0.0005   Epoch: 15   Global Step: 26160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:00:02,298-Speed 24920.51 samples/sec   Loss 2.5750   LearningRate 0.0005   Epoch: 15   Global Step: 26170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:00:12,409-Speed 24308.93 samples/sec   Loss 2.6049   LearningRate 0.0005   Epoch: 15   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:00:22,373-Speed 24668.90 samples/sec   Loss 2.6113   LearningRate 0.0005   Epoch: 15   Global Step: 26190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:00:32,300-Speed 24760.25 samples/sec   Loss 2.5962   LearningRate 0.0005   Epoch: 15   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:00:42,304-Speed 24567.82 samples/sec   Loss 2.6086   LearningRate 0.0005   Epoch: 15   Global Step: 26210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:00:52,285-Speed 24626.67 samples/sec   Loss 2.6036   LearningRate 0.0005   Epoch: 15   Global Step: 26220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:01:02,049-Speed 25174.75 samples/sec   Loss 2.5998   LearningRate 0.0005   Epoch: 15   Global Step: 26230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:01:11,843-Speed 25096.52 samples/sec   Loss 2.6076   LearningRate 0.0005   Epoch: 15   Global Step: 26240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:01:21,702-Speed 24930.82 samples/sec   Loss 2.5972   LearningRate 0.0005   Epoch: 15   Global Step: 26250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:01:31,439-Speed 25243.77 samples/sec   Loss 2.5875   LearningRate 0.0005   Epoch: 15   Global Step: 26260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:01:41,279-Speed 24982.96 samples/sec   Loss 2.6010   LearningRate 0.0005   Epoch: 15   Global Step: 26270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:01:51,245-Speed 24664.34 samples/sec   Loss 2.5860   LearningRate 0.0005   Epoch: 15   Global Step: 26280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:02:01,012-Speed 25164.37 samples/sec   Loss 2.5833   LearningRate 0.0005   Epoch: 15   Global Step: 26290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:02:10,956-Speed 24720.64 samples/sec   Loss 2.6154   LearningRate 0.0005   Epoch: 15   Global Step: 26300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:02:20,809-Speed 24946.90 samples/sec   Loss 2.5921   LearningRate 0.0005   Epoch: 15   Global Step: 26310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:02:30,563-Speed 25198.45 samples/sec   Loss 2.5848   LearningRate 0.0005   Epoch: 15   Global Step: 26320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:02:40,277-Speed 25303.31 samples/sec   Loss 2.6008   LearningRate 0.0005   Epoch: 15   Global Step: 26330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:02:50,179-Speed 24824.18 samples/sec   Loss 2.5764   LearningRate 0.0005   Epoch: 15   Global Step: 26340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:00,142-Speed 24670.32 samples/sec   Loss 2.5832   LearningRate 0.0005   Epoch: 15   Global Step: 26350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:09,922-Speed 25133.66 samples/sec   Loss 2.5961   LearningRate 0.0005   Epoch: 15   Global Step: 26360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:19,699-Speed 25141.36 samples/sec   Loss 2.6052   LearningRate 0.0005   Epoch: 15   Global Step: 26370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:29,443-Speed 25223.54 samples/sec   Loss 2.5774   LearningRate 0.0005   Epoch: 15   Global Step: 26380   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:39,190-Speed 25216.93 samples/sec   Loss 2.5754   LearningRate 0.0005   Epoch: 15   Global Step: 26390   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:49,049-Speed 24932.35 samples/sec   Loss 2.5857   LearningRate 0.0005   Epoch: 15   Global Step: 26400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:03:58,945-Speed 24835.19 samples/sec   Loss 2.5816   LearningRate 0.0005   Epoch: 15   Global Step: 26410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:04:08,829-Speed 24868.11 samples/sec   Loss 2.5658   LearningRate 0.0005   Epoch: 15   Global Step: 26420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:04:18,629-Speed 25078.80 samples/sec   Loss 2.5955   LearningRate 0.0005   Epoch: 15   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:04:28,475-Speed 24965.52 samples/sec   Loss 2.5935   LearningRate 0.0005   Epoch: 15   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:04:38,411-Speed 24735.37 samples/sec   Loss 2.5539   LearningRate 0.0005   Epoch: 15   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:04:48,224-Speed 25048.69 samples/sec   Loss 2.5698   LearningRate 0.0005   Epoch: 15   Global Step: 26460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:04:57,978-Speed 25199.15 samples/sec   Loss 2.5611   LearningRate 0.0005   Epoch: 15   Global Step: 26470   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:05:07,759-Speed 25128.01 samples/sec   Loss 2.5609   LearningRate 0.0005   Epoch: 15   Global Step: 26480   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:05:17,585-Speed 25015.21 samples/sec   Loss 2.5546   LearningRate 0.0005   Epoch: 15   Global Step: 26490   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:05:27,434-Speed 24956.99 samples/sec   Loss 2.5628   LearningRate 0.0005   Epoch: 15   Global Step: 26500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:05:37,388-Speed 24693.42 samples/sec   Loss 2.5955   LearningRate 0.0005   Epoch: 15   Global Step: 26510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:05:47,215-Speed 25010.51 samples/sec   Loss 2.6105   LearningRate 0.0005   Epoch: 15   Global Step: 26520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:05:57,038-Speed 25020.46 samples/sec   Loss 2.5974   LearningRate 0.0005   Epoch: 15   Global Step: 26530   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:06:06,794-Speed 25195.72 samples/sec   Loss 2.5887   LearningRate 0.0005   Epoch: 15   Global Step: 26540   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:06:16,581-Speed 25114.20 samples/sec   Loss 2.6489   LearningRate 0.0005   Epoch: 15   Global Step: 26550   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:06:26,478-Speed 24833.23 samples/sec   Loss 2.6007   LearningRate 0.0005   Epoch: 15   Global Step: 26560   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:06:36,319-Speed 24977.77 samples/sec   Loss 2.5665   LearningRate 0.0005   Epoch: 15   Global Step: 26570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:06:46,188-Speed 24904.40 samples/sec   Loss 2.5658   LearningRate 0.0005   Epoch: 15   Global Step: 26580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:06:55,976-Speed 25110.85 samples/sec   Loss 2.5847   LearningRate 0.0005   Epoch: 15   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:07:05,690-Speed 25302.22 samples/sec   Loss 2.5812   LearningRate 0.0005   Epoch: 15   Global Step: 26600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:07:15,419-Speed 25264.20 samples/sec   Loss 2.5651   LearningRate 0.0005   Epoch: 15   Global Step: 26610   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-03-26 06:07:25,502-Speed 24387.09 samples/sec   Loss 2.5680   LearningRate 0.0005   Epoch: 15   Global Step: 26620   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:07:35,477-Speed 24642.13 samples/sec   Loss 2.6076   LearningRate 0.0005   Epoch: 15   Global Step: 26630   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:07:45,511-Speed 24494.55 samples/sec   Loss 2.5661   LearningRate 0.0005   Epoch: 15   Global Step: 26640   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:07:55,519-Speed 24558.86 samples/sec   Loss 2.5472   LearningRate 0.0005   Epoch: 15   Global Step: 26650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:08:05,529-Speed 24554.25 samples/sec   Loss 2.5544   LearningRate 0.0005   Epoch: 15   Global Step: 26660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:08:15,457-Speed 24757.15 samples/sec   Loss 2.5547   LearningRate 0.0005   Epoch: 15   Global Step: 26670   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:08:25,361-Speed 24817.22 samples/sec   Loss 2.5723   LearningRate 0.0005   Epoch: 15   Global Step: 26680   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:08:35,524-Speed 24185.78 samples/sec   Loss 2.5619   LearningRate 0.0005   Epoch: 15   Global Step: 26690   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:08:45,638-Speed 24301.93 samples/sec   Loss 2.5630   LearningRate 0.0005   Epoch: 15   Global Step: 26700   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:08:55,683-Speed 24465.74 samples/sec   Loss 2.5666   LearningRate 0.0005   Epoch: 15   Global Step: 26710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:09:05,644-Speed 24678.01 samples/sec   Loss 2.5347   LearningRate 0.0005   Epoch: 15   Global Step: 26720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:09:15,679-Speed 24498.89 samples/sec   Loss 2.5531   LearningRate 0.0005   Epoch: 15   Global Step: 26730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:09:25,638-Speed 24682.79 samples/sec   Loss 2.5494   LearningRate 0.0005   Epoch: 15   Global Step: 26740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:09:35,551-Speed 24795.31 samples/sec   Loss 2.5412   LearningRate 0.0005   Epoch: 15   Global Step: 26750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:09:45,491-Speed 24727.82 samples/sec   Loss 2.5705   LearningRate 0.0005   Epoch: 15   Global Step: 26760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:09:55,500-Speed 24556.02 samples/sec   Loss 2.5705   LearningRate 0.0005   Epoch: 15   Global Step: 26770   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:10:05,541-Speed 24479.47 samples/sec   Loss 2.5586   LearningRate 0.0005   Epoch: 15   Global Step: 26780   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:10:15,584-Speed 24473.44 samples/sec   Loss 2.5495   LearningRate 0.0005   Epoch: 15   Global Step: 26790   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:10:25,525-Speed 24725.27 samples/sec   Loss 2.5407   LearningRate 0.0005   Epoch: 15   Global Step: 26800   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:10:35,651-Speed 24273.39 samples/sec   Loss 2.5442   LearningRate 0.0005   Epoch: 15   Global Step: 26810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:10:45,761-Speed 24311.58 samples/sec   Loss 2.5512   LearningRate 0.0005   Epoch: 15   Global Step: 26820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:10:55,795-Speed 24496.00 samples/sec   Loss 2.5431   LearningRate 0.0005   Epoch: 15   Global Step: 26830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:11:05,828-Speed 24509.50 samples/sec   Loss 2.5904   LearningRate 0.0005   Epoch: 15   Global Step: 26840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:11:15,811-Speed 24622.75 samples/sec   Loss 2.5845   LearningRate 0.0005   Epoch: 15   Global Step: 26850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:11:25,748-Speed 24737.28 samples/sec   Loss 2.5765   LearningRate 0.0005   Epoch: 15   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:11:35,746-Speed 24584.30 samples/sec   Loss 2.5961   LearningRate 0.0005   Epoch: 15   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:11:45,730-Speed 24619.07 samples/sec   Loss 2.5543   LearningRate 0.0005   Epoch: 15   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:11:55,818-Speed 24364.15 samples/sec   Loss 2.5467   LearningRate 0.0005   Epoch: 15   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:12:05,802-Speed 24617.19 samples/sec   Loss 2.5360   LearningRate 0.0005   Epoch: 15   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:12:15,778-Speed 24640.45 samples/sec   Loss 2.5203   LearningRate 0.0005   Epoch: 15   Global Step: 26910   Fp16 Grad Scale: 131072   Required: 12 hours
Training: 2022-03-26 06:12:25,788-Speed 24553.34 samples/sec   Loss 2.5656   LearningRate 0.0005   Epoch: 15   Global Step: 26920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:12:35,796-Speed 24559.73 samples/sec   Loss 2.5597   LearningRate 0.0005   Epoch: 15   Global Step: 26930   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:12:45,788-Speed 24604.50 samples/sec   Loss 2.5408   LearningRate 0.0005   Epoch: 15   Global Step: 26940   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:12:55,745-Speed 24683.92 samples/sec   Loss 2.5311   LearningRate 0.0005   Epoch: 15   Global Step: 26950   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:13:05,734-Speed 24605.46 samples/sec   Loss 2.5476   LearningRate 0.0005   Epoch: 15   Global Step: 26960   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:13:15,749-Speed 24542.30 samples/sec   Loss 2.5397   LearningRate 0.0005   Epoch: 15   Global Step: 26970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:13:25,927-Speed 24147.79 samples/sec   Loss 2.5304   LearningRate 0.0005   Epoch: 15   Global Step: 26980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:13:35,987-Speed 24432.28 samples/sec   Loss 2.5293   LearningRate 0.0005   Epoch: 15   Global Step: 26990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:13:46,013-Speed 24515.33 samples/sec   Loss 2.5522   LearningRate 0.0005   Epoch: 15   Global Step: 27000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:13:56,049-Speed 24489.87 samples/sec   Loss 2.5784   LearningRate 0.0005   Epoch: 15   Global Step: 27010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:14:06,126-Speed 24394.23 samples/sec   Loss 2.5696   LearningRate 0.0005   Epoch: 15   Global Step: 27020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:14:16,133-Speed 24569.02 samples/sec   Loss 2.5473   LearningRate 0.0005   Epoch: 15   Global Step: 27030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:14:26,151-Speed 24538.69 samples/sec   Loss 2.5394   LearningRate 0.0005   Epoch: 15   Global Step: 27040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:14:36,102-Speed 24699.45 samples/sec   Loss 2.5639   LearningRate 0.0005   Epoch: 15   Global Step: 27050   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:14:46,058-Speed 24688.17 samples/sec   Loss 2.5183   LearningRate 0.0005   Epoch: 15   Global Step: 27060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:14:56,135-Speed 24392.27 samples/sec   Loss 2.5361   LearningRate 0.0005   Epoch: 15   Global Step: 27070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:15:06,218-Speed 24376.61 samples/sec   Loss 2.5390   LearningRate 0.0005   Epoch: 15   Global Step: 27080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:15:16,250-Speed 24498.95 samples/sec   Loss 2.5530   LearningRate 0.0005   Epoch: 15   Global Step: 27090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:15:26,228-Speed 24633.02 samples/sec   Loss 2.5382   LearningRate 0.0005   Epoch: 15   Global Step: 27100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:15:36,270-Speed 24476.18 samples/sec   Loss 2.5266   LearningRate 0.0005   Epoch: 15   Global Step: 27110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:15:46,386-Speed 24295.05 samples/sec   Loss 2.5196   LearningRate 0.0005   Epoch: 15   Global Step: 27120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:15:56,419-Speed 24499.24 samples/sec   Loss 2.5392   LearningRate 0.0005   Epoch: 15   Global Step: 27130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:16:06,373-Speed 24693.00 samples/sec   Loss 2.5087   LearningRate 0.0005   Epoch: 15   Global Step: 27140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:16:16,361-Speed 24606.75 samples/sec   Loss 2.5463   LearningRate 0.0005   Epoch: 15   Global Step: 27150   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:16:26,412-Speed 24455.17 samples/sec   Loss 2.5579   LearningRate 0.0005   Epoch: 15   Global Step: 27160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:16:36,509-Speed 24342.73 samples/sec   Loss 2.5295   LearningRate 0.0005   Epoch: 15   Global Step: 27170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:16:46,560-Speed 24454.15 samples/sec   Loss 2.5528   LearningRate 0.0005   Epoch: 15   Global Step: 27180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:16:56,524-Speed 24669.50 samples/sec   Loss 2.5119   LearningRate 0.0005   Epoch: 15   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:17:06,701-Speed 24149.85 samples/sec   Loss 2.5214   LearningRate 0.0005   Epoch: 15   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:17:16,779-Speed 24389.40 samples/sec   Loss 2.5504   LearningRate 0.0005   Epoch: 15   Global Step: 27210   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:17:26,787-Speed 24560.21 samples/sec   Loss 2.5249   LearningRate 0.0005   Epoch: 15   Global Step: 27220   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:17:36,851-Speed 24421.00 samples/sec   Loss 2.5057   LearningRate 0.0005   Epoch: 15   Global Step: 27230   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:17:46,868-Speed 24538.24 samples/sec   Loss 2.5348   LearningRate 0.0005   Epoch: 15   Global Step: 27240   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:17:56,887-Speed 24531.84 samples/sec   Loss 2.5256   LearningRate 0.0005   Epoch: 15   Global Step: 27250   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:18:06,856-Speed 24664.91 samples/sec   Loss 2.5483   LearningRate 0.0005   Epoch: 15   Global Step: 27260   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:18:16,891-Speed 24492.34 samples/sec   Loss 2.5316   LearningRate 0.0005   Epoch: 15   Global Step: 27270   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:18:26,922-Speed 24504.02 samples/sec   Loss 2.5145   LearningRate 0.0005   Epoch: 15   Global Step: 27280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:18:37,022-Speed 24334.15 samples/sec   Loss 2.5041   LearningRate 0.0005   Epoch: 15   Global Step: 27290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:18:47,127-Speed 24322.87 samples/sec   Loss 2.5244   LearningRate 0.0005   Epoch: 15   Global Step: 27300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:18:57,138-Speed 24557.72 samples/sec   Loss 2.5318   LearningRate 0.0005   Epoch: 15   Global Step: 27310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:19:07,158-Speed 24530.15 samples/sec   Loss 2.5131   LearningRate 0.0005   Epoch: 15   Global Step: 27320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:19:17,132-Speed 24642.28 samples/sec   Loss 2.5236   LearningRate 0.0005   Epoch: 15   Global Step: 27330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:19:27,288-Speed 24201.52 samples/sec   Loss 2.5167   LearningRate 0.0005   Epoch: 15   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:19:37,323-Speed 24493.06 samples/sec   Loss 2.5231   LearningRate 0.0005   Epoch: 15   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:19:47,302-Speed 24630.61 samples/sec   Loss 2.5089   LearningRate 0.0005   Epoch: 15   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:19:57,336-Speed 24497.67 samples/sec   Loss 2.5165   LearningRate 0.0005   Epoch: 15   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:20:07,350-Speed 24543.70 samples/sec   Loss 2.5128   LearningRate 0.0005   Epoch: 15   Global Step: 27380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:20:17,329-Speed 24631.88 samples/sec   Loss 2.5373   LearningRate 0.0004   Epoch: 15   Global Step: 27390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:20:27,325-Speed 24594.50 samples/sec   Loss 2.5126   LearningRate 0.0004   Epoch: 15   Global Step: 27400   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:20:37,391-Speed 24418.93 samples/sec   Loss 2.5310   LearningRate 0.0004   Epoch: 15   Global Step: 27410   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:20:47,385-Speed 24593.46 samples/sec   Loss 2.5269   LearningRate 0.0004   Epoch: 15   Global Step: 27420   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:20:57,431-Speed 24465.74 samples/sec   Loss 2.5957   LearningRate 0.0004   Epoch: 15   Global Step: 27430   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:21:07,545-Speed 24302.40 samples/sec   Loss 2.6208   LearningRate 0.0004   Epoch: 15   Global Step: 27440   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:21:17,558-Speed 24546.18 samples/sec   Loss 2.5146   LearningRate 0.0004   Epoch: 15   Global Step: 27450   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:21:27,628-Speed 24408.35 samples/sec   Loss 2.4942   LearningRate 0.0004   Epoch: 15   Global Step: 27460   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:21:37,629-Speed 24576.48 samples/sec   Loss 2.5041   LearningRate 0.0004   Epoch: 15   Global Step: 27470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:21:47,611-Speed 24622.38 samples/sec   Loss 2.5022   LearningRate 0.0004   Epoch: 15   Global Step: 27480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:21:57,458-Speed 24959.67 samples/sec   Loss 2.5127   LearningRate 0.0004   Epoch: 15   Global Step: 27490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:22:07,220-Speed 25179.60 samples/sec   Loss 2.4902   LearningRate 0.0004   Epoch: 15   Global Step: 27500   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:22:17,002-Speed 25124.60 samples/sec   Loss 2.5103   LearningRate 0.0004   Epoch: 15   Global Step: 27510   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:22:26,783-Speed 25129.33 samples/sec   Loss 2.5264   LearningRate 0.0004   Epoch: 15   Global Step: 27520   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:22:36,558-Speed 25146.26 samples/sec   Loss 2.5383   LearningRate 0.0004   Epoch: 15   Global Step: 27530   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:22:46,276-Speed 25291.95 samples/sec   Loss 2.5319   LearningRate 0.0004   Epoch: 15   Global Step: 27540   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:22:56,148-Speed 24895.62 samples/sec   Loss 2.5396   LearningRate 0.0004   Epoch: 15   Global Step: 27550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:23:05,894-Speed 25219.70 samples/sec   Loss 2.5327   LearningRate 0.0004   Epoch: 15   Global Step: 27560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:23:15,729-Speed 24991.90 samples/sec   Loss 2.5231   LearningRate 0.0004   Epoch: 15   Global Step: 27570   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:23:25,424-Speed 25350.50 samples/sec   Loss 2.5173   LearningRate 0.0004   Epoch: 15   Global Step: 27580   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:23:35,185-Speed 25182.49 samples/sec   Loss 2.5123   LearningRate 0.0004   Epoch: 15   Global Step: 27590   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:23:44,911-Speed 25271.08 samples/sec   Loss 2.5093   LearningRate 0.0004   Epoch: 15   Global Step: 27600   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:23:54,650-Speed 25238.23 samples/sec   Loss 2.5060   LearningRate 0.0004   Epoch: 15   Global Step: 27610   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:24:04,445-Speed 25094.70 samples/sec   Loss 2.5299   LearningRate 0.0004   Epoch: 15   Global Step: 27620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:24:14,328-Speed 24872.81 samples/sec   Loss 2.5341   LearningRate 0.0004   Epoch: 15   Global Step: 27630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:24:24,113-Speed 25116.48 samples/sec   Loss 2.5436   LearningRate 0.0004   Epoch: 15   Global Step: 27640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:24:33,918-Speed 25068.21 samples/sec   Loss 2.5325   LearningRate 0.0004   Epoch: 15   Global Step: 27650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:25:33,874-Speed 4099.15 samples/sec   Loss 2.4976   LearningRate 0.0004   Epoch: 16   Global Step: 27660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:25:43,613-Speed 25237.95 samples/sec   Loss 2.4591   LearningRate 0.0004   Epoch: 16   Global Step: 27670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:25:53,336-Speed 25284.15 samples/sec   Loss 2.4822   LearningRate 0.0004   Epoch: 16   Global Step: 27680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:26:03,068-Speed 25256.01 samples/sec   Loss 2.4693   LearningRate 0.0004   Epoch: 16   Global Step: 27690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:26:12,908-Speed 24978.59 samples/sec   Loss 2.4737   LearningRate 0.0004   Epoch: 16   Global Step: 27700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:26:22,673-Speed 25169.77 samples/sec   Loss 2.4759   LearningRate 0.0004   Epoch: 16   Global Step: 27710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:26:32,507-Speed 24993.79 samples/sec   Loss 2.4937   LearningRate 0.0004   Epoch: 16   Global Step: 27720   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:26:42,344-Speed 24985.68 samples/sec   Loss 2.4836   LearningRate 0.0004   Epoch: 16   Global Step: 27730   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:26:52,101-Speed 25191.44 samples/sec   Loss 2.5108   LearningRate 0.0004   Epoch: 16   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:27:01,950-Speed 24956.35 samples/sec   Loss 2.4789   LearningRate 0.0004   Epoch: 16   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:27:11,772-Speed 25025.13 samples/sec   Loss 2.4870   LearningRate 0.0004   Epoch: 16   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:27:21,580-Speed 25059.13 samples/sec   Loss 2.4858   LearningRate 0.0004   Epoch: 16   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:27:31,488-Speed 24818.15 samples/sec   Loss 2.4824   LearningRate 0.0004   Epoch: 16   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:27:41,309-Speed 25027.33 samples/sec   Loss 2.4695   LearningRate 0.0004   Epoch: 16   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:27:51,041-Speed 25256.77 samples/sec   Loss 2.5035   LearningRate 0.0004   Epoch: 16   Global Step: 27800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:00,856-Speed 25049.20 samples/sec   Loss 2.4733   LearningRate 0.0004   Epoch: 16   Global Step: 27810   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:10,681-Speed 25015.53 samples/sec   Loss 2.4821   LearningRate 0.0004   Epoch: 16   Global Step: 27820   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:20,536-Speed 24948.38 samples/sec   Loss 2.4696   LearningRate 0.0004   Epoch: 16   Global Step: 27830   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:30,264-Speed 25264.97 samples/sec   Loss 2.4592   LearningRate 0.0004   Epoch: 16   Global Step: 27840   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:40,103-Speed 24984.27 samples/sec   Loss 2.4878   LearningRate 0.0004   Epoch: 16   Global Step: 27850   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:49,852-Speed 25211.68 samples/sec   Loss 2.4989   LearningRate 0.0004   Epoch: 16   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:28:59,649-Speed 25087.55 samples/sec   Loss 2.4698   LearningRate 0.0004   Epoch: 16   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:29:09,423-Speed 25151.81 samples/sec   Loss 2.4811   LearningRate 0.0004   Epoch: 16   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:29:19,328-Speed 24817.81 samples/sec   Loss 2.4966   LearningRate 0.0004   Epoch: 16   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:29:29,335-Speed 24569.00 samples/sec   Loss 2.4885   LearningRate 0.0004   Epoch: 16   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:29:39,380-Speed 24470.03 samples/sec   Loss 2.5056   LearningRate 0.0004   Epoch: 16   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:29:49,426-Speed 24468.61 samples/sec   Loss 2.4892   LearningRate 0.0004   Epoch: 16   Global Step: 27920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:29:59,338-Speed 24798.10 samples/sec   Loss 2.4814   LearningRate 0.0004   Epoch: 16   Global Step: 27930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:30:09,179-Speed 24977.38 samples/sec   Loss 2.4770   LearningRate 0.0004   Epoch: 16   Global Step: 27940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:30:18,934-Speed 25196.86 samples/sec   Loss 2.5021   LearningRate 0.0004   Epoch: 16   Global Step: 27950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:30:28,645-Speed 25311.15 samples/sec   Loss 2.5087   LearningRate 0.0004   Epoch: 16   Global Step: 27960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:30:38,531-Speed 24862.60 samples/sec   Loss 2.4922   LearningRate 0.0004   Epoch: 16   Global Step: 27970   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:30:48,358-Speed 25019.80 samples/sec   Loss 2.4857   LearningRate 0.0004   Epoch: 16   Global Step: 27980   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:30:58,113-Speed 25196.54 samples/sec   Loss 2.4761   LearningRate 0.0004   Epoch: 16   Global Step: 27990   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:31:07,896-Speed 25123.36 samples/sec   Loss 2.4781   LearningRate 0.0004   Epoch: 16   Global Step: 28000   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:31:17,706-Speed 25057.93 samples/sec   Loss 2.4693   LearningRate 0.0004   Epoch: 16   Global Step: 28010   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:31:27,405-Speed 25344.19 samples/sec   Loss 2.4857   LearningRate 0.0004   Epoch: 16   Global Step: 28020   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:31:37,226-Speed 25025.65 samples/sec   Loss 2.4951   LearningRate 0.0004   Epoch: 16   Global Step: 28030   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:31:46,946-Speed 25291.79 samples/sec   Loss 2.5407   LearningRate 0.0004   Epoch: 16   Global Step: 28040   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:31:56,672-Speed 25273.41 samples/sec   Loss 2.4787   LearningRate 0.0004   Epoch: 16   Global Step: 28050   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:32:06,504-Speed 25001.15 samples/sec   Loss 2.4834   LearningRate 0.0004   Epoch: 16   Global Step: 28060   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:32:16,422-Speed 24785.36 samples/sec   Loss 2.4543   LearningRate 0.0004   Epoch: 16   Global Step: 28070   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:32:26,396-Speed 24648.09 samples/sec   Loss 2.4687   LearningRate 0.0004   Epoch: 16   Global Step: 28080   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:32:36,378-Speed 24627.19 samples/sec   Loss 2.4688   LearningRate 0.0004   Epoch: 16   Global Step: 28090   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:32:46,361-Speed 24629.43 samples/sec   Loss 2.4938   LearningRate 0.0004   Epoch: 16   Global Step: 28100   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:32:56,463-Speed 24333.19 samples/sec   Loss 2.4625   LearningRate 0.0004   Epoch: 16   Global Step: 28110   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:33:06,461-Speed 24587.29 samples/sec   Loss 2.4648   LearningRate 0.0004   Epoch: 16   Global Step: 28120   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:33:16,372-Speed 24799.51 samples/sec   Loss 2.4645   LearningRate 0.0004   Epoch: 16   Global Step: 28130   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:33:26,403-Speed 24504.43 samples/sec   Loss 2.4872   LearningRate 0.0004   Epoch: 16   Global Step: 28140   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:33:36,361-Speed 24683.31 samples/sec   Loss 2.4677   LearningRate 0.0004   Epoch: 16   Global Step: 28150   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-26 06:33:46,126-Speed 25180.28 samples/sec   Loss 2.4641   LearningRate 0.0004   Epoch: 16   Global Step: 28160   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:33:55,961-Speed 24991.49 samples/sec   Loss 2.4431   LearningRate 0.0004   Epoch: 16   Global Step: 28170   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:34:05,720-Speed 25188.43 samples/sec   Loss 2.4665   LearningRate 0.0004   Epoch: 16   Global Step: 28180   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:34:15,550-Speed 25005.89 samples/sec   Loss 2.4825   LearningRate 0.0004   Epoch: 16   Global Step: 28190   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:34:25,481-Speed 24748.24 samples/sec   Loss 2.4716   LearningRate 0.0004   Epoch: 16   Global Step: 28200   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:34:35,370-Speed 24856.35 samples/sec   Loss 2.4468   LearningRate 0.0004   Epoch: 16   Global Step: 28210   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:34:45,099-Speed 25269.19 samples/sec   Loss 2.4795   LearningRate 0.0004   Epoch: 16   Global Step: 28220   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:34:54,870-Speed 25156.62 samples/sec   Loss 2.4934   LearningRate 0.0004   Epoch: 16   Global Step: 28230   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:35:04,645-Speed 25145.42 samples/sec   Loss 2.4904   LearningRate 0.0004   Epoch: 16   Global Step: 28240   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:35:14,421-Speed 25142.84 samples/sec   Loss 2.4551   LearningRate 0.0004   Epoch: 16   Global Step: 28250   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:35:24,213-Speed 25118.20 samples/sec   Loss 2.4747   LearningRate 0.0004   Epoch: 16   Global Step: 28260   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:35:34,029-Speed 25039.60 samples/sec   Loss 2.4722   LearningRate 0.0004   Epoch: 16   Global Step: 28270   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:35:43,810-Speed 25129.90 samples/sec   Loss 2.4607   LearningRate 0.0004   Epoch: 16   Global Step: 28280   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:35:53,634-Speed 25019.41 samples/sec   Loss 2.4563   LearningRate 0.0004   Epoch: 16   Global Step: 28290   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:36:03,372-Speed 25240.95 samples/sec   Loss 2.5356   LearningRate 0.0004   Epoch: 16   Global Step: 28300   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:36:13,046-Speed 25407.87 samples/sec   Loss 2.4844   LearningRate 0.0004   Epoch: 16   Global Step: 28310   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:36:22,815-Speed 25162.14 samples/sec   Loss 2.4680   LearningRate 0.0004   Epoch: 16   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:36:32,585-Speed 25156.30 samples/sec   Loss 2.4467   LearningRate 0.0004   Epoch: 16   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:36:42,468-Speed 24872.07 samples/sec   Loss 2.5125   LearningRate 0.0004   Epoch: 16   Global Step: 28340   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:36:52,223-Speed 25201.83 samples/sec   Loss 2.4568   LearningRate 0.0004   Epoch: 16   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:37:01,993-Speed 25157.03 samples/sec   Loss 2.4514   LearningRate 0.0004   Epoch: 16   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:37:11,764-Speed 25161.67 samples/sec   Loss 2.4371   LearningRate 0.0004   Epoch: 16   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:37:21,532-Speed 25161.39 samples/sec   Loss 2.4575   LearningRate 0.0004   Epoch: 16   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-26 06:37:31,273-Speed 25234.07 samples/sec   Loss 2.4532   LearningRate 0.0004   Epoch: 16   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:37:41,035-Speed 25177.19 samples/sec   Loss 2.4473   LearningRate 0.0004   Epoch: 16   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:37:50,754-Speed 25289.48 samples/sec   Loss 2.4342   LearningRate 0.0004   Epoch: 16   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:38:00,528-Speed 25148.75 samples/sec   Loss 2.4380   LearningRate 0.0004   Epoch: 16   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:38:10,224-Speed 25351.05 samples/sec   Loss 2.4584   LearningRate 0.0004   Epoch: 16   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:38:19,994-Speed 25157.95 samples/sec   Loss 2.4495   LearningRate 0.0004   Epoch: 16   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:38:29,776-Speed 25124.75 samples/sec   Loss 2.4495   LearningRate 0.0004   Epoch: 16   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:38:39,543-Speed 25165.94 samples/sec   Loss 2.4397   LearningRate 0.0004   Epoch: 16   Global Step: 28460   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-26 06:38:49,315-Speed 25153.29 samples/sec   Loss 2.4261   LearningRate 0.0004   Epoch: 16   Global Step: 28470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:38:59,078-Speed 25178.91 samples/sec   Loss 2.4464   LearningRate 0.0004   Epoch: 16   Global Step: 28480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:39:08,822-Speed 25226.45 samples/sec   Loss 2.4645   LearningRate 0.0004   Epoch: 16   Global Step: 28490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:39:18,578-Speed 25194.57 samples/sec   Loss 2.4436   LearningRate 0.0004   Epoch: 16   Global Step: 28500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:39:28,382-Speed 25071.98 samples/sec   Loss 2.4278   LearningRate 0.0004   Epoch: 16   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:39:38,138-Speed 25199.10 samples/sec   Loss 2.4541   LearningRate 0.0004   Epoch: 16   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:39:47,866-Speed 25266.64 samples/sec   Loss 2.4383   LearningRate 0.0004   Epoch: 16   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:39:57,667-Speed 25077.53 samples/sec   Loss 2.4335   LearningRate 0.0004   Epoch: 16   Global Step: 28540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:40:07,586-Speed 24779.37 samples/sec   Loss 2.4193   LearningRate 0.0004   Epoch: 16   Global Step: 28550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:40:17,366-Speed 25131.36 samples/sec   Loss 2.4330   LearningRate 0.0004   Epoch: 16   Global Step: 28560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:40:27,100-Speed 25250.36 samples/sec   Loss 2.4376   LearningRate 0.0004   Epoch: 16   Global Step: 28570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:40:36,871-Speed 25156.48 samples/sec   Loss 2.4410   LearningRate 0.0004   Epoch: 16   Global Step: 28580   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:40:46,627-Speed 25192.75 samples/sec   Loss 2.4319   LearningRate 0.0004   Epoch: 16   Global Step: 28590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:40:56,405-Speed 25137.32 samples/sec   Loss 2.4419   LearningRate 0.0004   Epoch: 16   Global Step: 28600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:41:06,175-Speed 25156.57 samples/sec   Loss 2.4254   LearningRate 0.0004   Epoch: 16   Global Step: 28610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:41:15,995-Speed 25030.38 samples/sec   Loss 2.4423   LearningRate 0.0004   Epoch: 16   Global Step: 28620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:41:25,714-Speed 25289.19 samples/sec   Loss 2.4473   LearningRate 0.0004   Epoch: 16   Global Step: 28630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:41:35,450-Speed 25247.24 samples/sec   Loss 2.4403   LearningRate 0.0004   Epoch: 16   Global Step: 28640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:41:45,241-Speed 25109.80 samples/sec   Loss 2.4490   LearningRate 0.0004   Epoch: 16   Global Step: 28650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:41:55,096-Speed 24940.17 samples/sec   Loss 2.4318   LearningRate 0.0004   Epoch: 16   Global Step: 28660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:42:04,823-Speed 25270.70 samples/sec   Loss 2.4411   LearningRate 0.0004   Epoch: 16   Global Step: 28670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:42:14,641-Speed 25033.15 samples/sec   Loss 2.4489   LearningRate 0.0004   Epoch: 16   Global Step: 28680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:42:24,344-Speed 25332.66 samples/sec   Loss 2.4612   LearningRate 0.0004   Epoch: 16   Global Step: 28690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:42:34,161-Speed 25039.35 samples/sec   Loss 2.4602   LearningRate 0.0004   Epoch: 16   Global Step: 28700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:42:43,937-Speed 25141.91 samples/sec   Loss 2.4245   LearningRate 0.0004   Epoch: 16   Global Step: 28710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:42:53,686-Speed 25213.13 samples/sec   Loss 2.4318   LearningRate 0.0004   Epoch: 16   Global Step: 28720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:43:03,384-Speed 25344.52 samples/sec   Loss 2.4303   LearningRate 0.0004   Epoch: 16   Global Step: 28730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:43:13,132-Speed 25216.57 samples/sec   Loss 2.4358   LearningRate 0.0004   Epoch: 16   Global Step: 28740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:43:22,908-Speed 25141.93 samples/sec   Loss 2.4393   LearningRate 0.0004   Epoch: 16   Global Step: 28750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:43:32,672-Speed 25172.13 samples/sec   Loss 2.4237   LearningRate 0.0004   Epoch: 16   Global Step: 28760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:43:42,397-Speed 25273.68 samples/sec   Loss 2.4345   LearningRate 0.0004   Epoch: 16   Global Step: 28770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:43:52,267-Speed 24902.71 samples/sec   Loss 2.4670   LearningRate 0.0004   Epoch: 16   Global Step: 28780   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:44:02,135-Speed 24907.48 samples/sec   Loss 2.4311   LearningRate 0.0004   Epoch: 16   Global Step: 28790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:44:11,853-Speed 25294.16 samples/sec   Loss 2.4250   LearningRate 0.0004   Epoch: 16   Global Step: 28800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:44:21,604-Speed 25206.35 samples/sec   Loss 2.4181   LearningRate 0.0004   Epoch: 16   Global Step: 28810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:44:31,453-Speed 24957.02 samples/sec   Loss 2.4294   LearningRate 0.0004   Epoch: 16   Global Step: 28820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:44:41,249-Speed 25091.31 samples/sec   Loss 2.4304   LearningRate 0.0004   Epoch: 16   Global Step: 28830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:44:51,035-Speed 25118.25 samples/sec   Loss 2.4392   LearningRate 0.0004   Epoch: 16   Global Step: 28840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:45:00,799-Speed 25172.36 samples/sec   Loss 2.4145   LearningRate 0.0004   Epoch: 16   Global Step: 28850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:45:10,593-Speed 25095.46 samples/sec   Loss 2.4282   LearningRate 0.0004   Epoch: 16   Global Step: 28860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:45:20,479-Speed 24864.30 samples/sec   Loss 2.4323   LearningRate 0.0004   Epoch: 16   Global Step: 28870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:45:30,485-Speed 24565.28 samples/sec   Loss 2.4246   LearningRate 0.0004   Epoch: 16   Global Step: 28880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:45:40,439-Speed 24694.04 samples/sec   Loss 2.4288   LearningRate 0.0004   Epoch: 16   Global Step: 28890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:45:50,345-Speed 24814.57 samples/sec   Loss 2.4057   LearningRate 0.0004   Epoch: 16   Global Step: 28900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:00,310-Speed 24664.67 samples/sec   Loss 2.4292   LearningRate 0.0004   Epoch: 16   Global Step: 28910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:10,257-Speed 24710.72 samples/sec   Loss 2.4229   LearningRate 0.0004   Epoch: 16   Global Step: 28920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:20,201-Speed 24718.70 samples/sec   Loss 2.4212   LearningRate 0.0004   Epoch: 16   Global Step: 28930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:30,002-Speed 25078.54 samples/sec   Loss 2.4135   LearningRate 0.0004   Epoch: 16   Global Step: 28940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:39,796-Speed 25104.05 samples/sec   Loss 2.4205   LearningRate 0.0004   Epoch: 16   Global Step: 28950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:49,574-Speed 25138.04 samples/sec   Loss 2.4205   LearningRate 0.0004   Epoch: 16   Global Step: 28960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:46:59,273-Speed 25340.84 samples/sec   Loss 2.4392   LearningRate 0.0004   Epoch: 16   Global Step: 28970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:47:08,987-Speed 25302.01 samples/sec   Loss 2.4160   LearningRate 0.0004   Epoch: 16   Global Step: 28980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:47:18,697-Speed 25315.32 samples/sec   Loss 2.4143   LearningRate 0.0004   Epoch: 16   Global Step: 28990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:47:28,394-Speed 25349.11 samples/sec   Loss 2.4057   LearningRate 0.0004   Epoch: 16   Global Step: 29000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:47:38,129-Speed 25250.19 samples/sec   Loss 2.4181   LearningRate 0.0004   Epoch: 16   Global Step: 29010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:47:47,929-Speed 25080.64 samples/sec   Loss 2.4056   LearningRate 0.0004   Epoch: 16   Global Step: 29020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:47:57,638-Speed 25318.55 samples/sec   Loss 2.4099   LearningRate 0.0004   Epoch: 16   Global Step: 29030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:48:07,420-Speed 25124.17 samples/sec   Loss 2.4054   LearningRate 0.0004   Epoch: 16   Global Step: 29040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:48:17,321-Speed 24827.43 samples/sec   Loss 2.4008   LearningRate 0.0004   Epoch: 16   Global Step: 29050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:48:27,152-Speed 25003.38 samples/sec   Loss 2.4028   LearningRate 0.0004   Epoch: 16   Global Step: 29060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:48:36,878-Speed 25271.36 samples/sec   Loss 2.4113   LearningRate 0.0004   Epoch: 16   Global Step: 29070   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-26 06:48:46,594-Speed 25296.93 samples/sec   Loss 2.4366   LearningRate 0.0004   Epoch: 16   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:48:56,368-Speed 25147.34 samples/sec   Loss 2.4258   LearningRate 0.0004   Epoch: 16   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:49:06,226-Speed 24933.65 samples/sec   Loss 2.4383   LearningRate 0.0004   Epoch: 16   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:49:16,090-Speed 24917.61 samples/sec   Loss 2.4264   LearningRate 0.0004   Epoch: 16   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:49:25,865-Speed 25145.35 samples/sec   Loss 2.4045   LearningRate 0.0004   Epoch: 16   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:49:35,555-Speed 25366.10 samples/sec   Loss 2.4331   LearningRate 0.0004   Epoch: 16   Global Step: 29130   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:49:45,280-Speed 25274.04 samples/sec   Loss 2.4302   LearningRate 0.0004   Epoch: 16   Global Step: 29140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:49:55,318-Speed 24485.65 samples/sec   Loss 2.4158   LearningRate 0.0004   Epoch: 16   Global Step: 29150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:50:05,122-Speed 25072.43 samples/sec   Loss 2.4249   LearningRate 0.0004   Epoch: 16   Global Step: 29160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:50:14,845-Speed 25277.83 samples/sec   Loss 2.4169   LearningRate 0.0004   Epoch: 16   Global Step: 29170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:50:24,523-Speed 25397.65 samples/sec   Loss 2.3961   LearningRate 0.0004   Epoch: 16   Global Step: 29180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:50:34,229-Speed 25324.37 samples/sec   Loss 2.4177   LearningRate 0.0004   Epoch: 16   Global Step: 29190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:50:44,113-Speed 24867.81 samples/sec   Loss 2.4159   LearningRate 0.0004   Epoch: 16   Global Step: 29200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:50:53,903-Speed 25106.29 samples/sec   Loss 2.4242   LearningRate 0.0004   Epoch: 16   Global Step: 29210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:51:03,723-Speed 25029.86 samples/sec   Loss 2.4029   LearningRate 0.0004   Epoch: 16   Global Step: 29220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:51:13,440-Speed 25293.32 samples/sec   Loss 2.4238   LearningRate 0.0004   Epoch: 16   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:51:23,177-Speed 25244.50 samples/sec   Loss 2.4265   LearningRate 0.0004   Epoch: 16   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:51:32,934-Speed 25195.83 samples/sec   Loss 2.4154   LearningRate 0.0004   Epoch: 16   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:51:42,761-Speed 25013.66 samples/sec   Loss 2.3963   LearningRate 0.0004   Epoch: 16   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:51:52,471-Speed 25312.35 samples/sec   Loss 2.3893   LearningRate 0.0004   Epoch: 16   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:52:02,250-Speed 25133.75 samples/sec   Loss 2.4088   LearningRate 0.0004   Epoch: 16   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:52:11,978-Speed 25269.15 samples/sec   Loss 2.4265   LearningRate 0.0004   Epoch: 16   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:52:21,779-Speed 25079.59 samples/sec   Loss 2.4141   LearningRate 0.0004   Epoch: 16   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:52:31,593-Speed 25042.63 samples/sec   Loss 2.4065   LearningRate 0.0004   Epoch: 16   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:52:41,309-Speed 25299.23 samples/sec   Loss 2.4120   LearningRate 0.0004   Epoch: 16   Global Step: 29320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:52:51,072-Speed 25175.60 samples/sec   Loss 2.3980   LearningRate 0.0004   Epoch: 16   Global Step: 29330   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:53:00,787-Speed 25301.34 samples/sec   Loss 2.4094   LearningRate 0.0004   Epoch: 16   Global Step: 29340   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:53:10,523-Speed 25246.80 samples/sec   Loss 2.4026   LearningRate 0.0004   Epoch: 16   Global Step: 29350   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:53:20,348-Speed 25017.48 samples/sec   Loss 2.4113   LearningRate 0.0004   Epoch: 16   Global Step: 29360   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:53:30,116-Speed 25163.67 samples/sec   Loss 2.4156   LearningRate 0.0004   Epoch: 16   Global Step: 29370   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:53:39,796-Speed 25390.43 samples/sec   Loss 2.4393   LearningRate 0.0004   Epoch: 16   Global Step: 29380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:54:39,729-Speed 4100.69 samples/sec   Loss 2.4503   LearningRate 0.0004   Epoch: 17   Global Step: 29390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:54:49,503-Speed 25149.01 samples/sec   Loss 2.3802   LearningRate 0.0004   Epoch: 17   Global Step: 29400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:54:59,220-Speed 25293.83 samples/sec   Loss 2.3769   LearningRate 0.0004   Epoch: 17   Global Step: 29410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:55:08,911-Speed 25362.66 samples/sec   Loss 2.4014   LearningRate 0.0004   Epoch: 17   Global Step: 29420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:55:18,750-Speed 24982.06 samples/sec   Loss 2.4158   LearningRate 0.0004   Epoch: 17   Global Step: 29430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:55:28,500-Speed 25211.13 samples/sec   Loss 2.3934   LearningRate 0.0004   Epoch: 17   Global Step: 29440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:55:38,243-Speed 25227.00 samples/sec   Loss 2.3708   LearningRate 0.0004   Epoch: 17   Global Step: 29450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:55:48,068-Speed 25018.64 samples/sec   Loss 2.3767   LearningRate 0.0004   Epoch: 17   Global Step: 29460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:55:57,825-Speed 25191.91 samples/sec   Loss 2.3449   LearningRate 0.0004   Epoch: 17   Global Step: 29470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:56:07,545-Speed 25286.38 samples/sec   Loss 2.3650   LearningRate 0.0004   Epoch: 17   Global Step: 29480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:56:17,248-Speed 25336.70 samples/sec   Loss 2.3733   LearningRate 0.0004   Epoch: 17   Global Step: 29490   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:56:26,914-Speed 25426.92 samples/sec   Loss 2.3787   LearningRate 0.0004   Epoch: 17   Global Step: 29500   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:56:36,707-Speed 25100.94 samples/sec   Loss 2.3735   LearningRate 0.0004   Epoch: 17   Global Step: 29510   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:56:46,494-Speed 25114.86 samples/sec   Loss 2.3847   LearningRate 0.0004   Epoch: 17   Global Step: 29520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:56:56,239-Speed 25224.99 samples/sec   Loss 2.3554   LearningRate 0.0004   Epoch: 17   Global Step: 29530   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:57:05,977-Speed 25238.56 samples/sec   Loss 2.3972   LearningRate 0.0004   Epoch: 17   Global Step: 29540   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:57:15,723-Speed 25221.84 samples/sec   Loss 2.3791   LearningRate 0.0004   Epoch: 17   Global Step: 29550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:57:25,587-Speed 24921.29 samples/sec   Loss 2.3726   LearningRate 0.0004   Epoch: 17   Global Step: 29560   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:57:35,279-Speed 25360.11 samples/sec   Loss 2.3677   LearningRate 0.0004   Epoch: 17   Global Step: 29570   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:57:45,057-Speed 25138.82 samples/sec   Loss 2.3840   LearningRate 0.0004   Epoch: 17   Global Step: 29580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:57:54,859-Speed 25074.96 samples/sec   Loss 2.3712   LearningRate 0.0004   Epoch: 17   Global Step: 29590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:58:04,569-Speed 25312.48 samples/sec   Loss 2.4153   LearningRate 0.0004   Epoch: 17   Global Step: 29600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:58:14,300-Speed 25257.50 samples/sec   Loss 2.4927   LearningRate 0.0004   Epoch: 17   Global Step: 29610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:58:24,020-Speed 25289.09 samples/sec   Loss 2.4426   LearningRate 0.0004   Epoch: 17   Global Step: 29620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:58:33,756-Speed 25248.22 samples/sec   Loss 2.3818   LearningRate 0.0004   Epoch: 17   Global Step: 29630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:58:43,540-Speed 25122.31 samples/sec   Loss 2.3676   LearningRate 0.0004   Epoch: 17   Global Step: 29640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:58:53,252-Speed 25309.22 samples/sec   Loss 2.3609   LearningRate 0.0004   Epoch: 17   Global Step: 29650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:59:02,990-Speed 25240.43 samples/sec   Loss 2.3778   LearningRate 0.0004   Epoch: 17   Global Step: 29660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:59:12,800-Speed 25053.18 samples/sec   Loss 2.3862   LearningRate 0.0004   Epoch: 17   Global Step: 29670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 06:59:22,653-Speed 24947.51 samples/sec   Loss 2.4062   LearningRate 0.0004   Epoch: 17   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:59:32,419-Speed 25171.44 samples/sec   Loss 2.3968   LearningRate 0.0004   Epoch: 17   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:59:42,195-Speed 25141.23 samples/sec   Loss 2.3803   LearningRate 0.0004   Epoch: 17   Global Step: 29700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 06:59:51,933-Speed 25240.71 samples/sec   Loss 2.3871   LearningRate 0.0004   Epoch: 17   Global Step: 29710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:00:01,638-Speed 25325.34 samples/sec   Loss 2.3795   LearningRate 0.0004   Epoch: 17   Global Step: 29720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:00:11,389-Speed 25207.26 samples/sec   Loss 2.4096   LearningRate 0.0004   Epoch: 17   Global Step: 29730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:00:21,192-Speed 25072.92 samples/sec   Loss 2.3869   LearningRate 0.0004   Epoch: 17   Global Step: 29740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:00:30,969-Speed 25138.57 samples/sec   Loss 2.3791   LearningRate 0.0004   Epoch: 17   Global Step: 29750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:00:40,781-Speed 25049.83 samples/sec   Loss 2.3851   LearningRate 0.0004   Epoch: 17   Global Step: 29760   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:00:50,485-Speed 25330.85 samples/sec   Loss 2.3677   LearningRate 0.0004   Epoch: 17   Global Step: 29770   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:01:00,250-Speed 25172.25 samples/sec   Loss 2.3891   LearningRate 0.0004   Epoch: 17   Global Step: 29780   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-26 07:01:09,905-Speed 25459.36 samples/sec   Loss 2.3908   LearningRate 0.0004   Epoch: 17   Global Step: 29790   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:01:19,619-Speed 25302.57 samples/sec   Loss 2.3671   LearningRate 0.0004   Epoch: 17   Global Step: 29800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:01:29,418-Speed 25083.04 samples/sec   Loss 2.3820   LearningRate 0.0004   Epoch: 17   Global Step: 29810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:01:39,181-Speed 25174.72 samples/sec   Loss 2.3741   LearningRate 0.0004   Epoch: 17   Global Step: 29820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:01:48,905-Speed 25278.42 samples/sec   Loss 2.3858   LearningRate 0.0004   Epoch: 17   Global Step: 29830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:01:58,583-Speed 25397.37 samples/sec   Loss 2.4747   LearningRate 0.0004   Epoch: 17   Global Step: 29840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:02:08,366-Speed 25126.20 samples/sec   Loss 2.4165   LearningRate 0.0004   Epoch: 17   Global Step: 29850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:02:18,116-Speed 25210.02 samples/sec   Loss 2.3894   LearningRate 0.0004   Epoch: 17   Global Step: 29860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:02:27,828-Speed 25307.93 samples/sec   Loss 2.3766   LearningRate 0.0004   Epoch: 17   Global Step: 29870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:02:37,502-Speed 25411.15 samples/sec   Loss 2.3640   LearningRate 0.0004   Epoch: 17   Global Step: 29880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:02:47,261-Speed 25185.33 samples/sec   Loss 2.3750   LearningRate 0.0004   Epoch: 17   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:02:56,999-Speed 25242.32 samples/sec   Loss 2.3416   LearningRate 0.0004   Epoch: 17   Global Step: 29900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:03:06,781-Speed 25128.02 samples/sec   Loss 2.3550   LearningRate 0.0004   Epoch: 17   Global Step: 29910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:03:16,468-Speed 25372.45 samples/sec   Loss 2.3560   LearningRate 0.0004   Epoch: 17   Global Step: 29920   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:03:26,216-Speed 25217.52 samples/sec   Loss 2.3729   LearningRate 0.0004   Epoch: 17   Global Step: 29930   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:03:35,998-Speed 25127.96 samples/sec   Loss 2.3658   LearningRate 0.0004   Epoch: 17   Global Step: 29940   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:03:45,658-Speed 25442.56 samples/sec   Loss 2.3532   LearningRate 0.0004   Epoch: 17   Global Step: 29950   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:03:55,400-Speed 25234.72 samples/sec   Loss 2.3420   LearningRate 0.0004   Epoch: 17   Global Step: 29960   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:04:05,163-Speed 25175.54 samples/sec   Loss 2.3742   LearningRate 0.0004   Epoch: 17   Global Step: 29970   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:04:14,926-Speed 25175.62 samples/sec   Loss 2.3778   LearningRate 0.0004   Epoch: 17   Global Step: 29980   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:04:24,625-Speed 25342.94 samples/sec   Loss 2.3855   LearningRate 0.0004   Epoch: 17   Global Step: 29990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:04:34,353-Speed 25269.36 samples/sec   Loss 2.4152   LearningRate 0.0004   Epoch: 17   Global Step: 30000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:04:44,059-Speed 25321.81 samples/sec   Loss 2.3813   LearningRate 0.0004   Epoch: 17   Global Step: 30010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:04:53,763-Speed 25329.30 samples/sec   Loss 2.3651   LearningRate 0.0004   Epoch: 17   Global Step: 30020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:05:03,493-Speed 25263.90 samples/sec   Loss 2.3489   LearningRate 0.0004   Epoch: 17   Global Step: 30030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:05:13,256-Speed 25179.97 samples/sec   Loss 2.3567   LearningRate 0.0004   Epoch: 17   Global Step: 30040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:05:22,935-Speed 25397.27 samples/sec   Loss 2.3742   LearningRate 0.0004   Epoch: 17   Global Step: 30050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:05:32,715-Speed 25131.54 samples/sec   Loss 2.3545   LearningRate 0.0004   Epoch: 17   Global Step: 30060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:05:42,466-Speed 25208.68 samples/sec   Loss 2.3492   LearningRate 0.0004   Epoch: 17   Global Step: 30070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:05:52,221-Speed 25196.70 samples/sec   Loss 2.3715   LearningRate 0.0004   Epoch: 17   Global Step: 30080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:06:02,001-Speed 25131.72 samples/sec   Loss 2.3562   LearningRate 0.0004   Epoch: 17   Global Step: 30090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:06:11,723-Speed 25284.50 samples/sec   Loss 2.3601   LearningRate 0.0004   Epoch: 17   Global Step: 30100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:06:21,439-Speed 25303.37 samples/sec   Loss 2.3404   LearningRate 0.0004   Epoch: 17   Global Step: 30110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:06:31,178-Speed 25237.89 samples/sec   Loss 2.3582   LearningRate 0.0004   Epoch: 17   Global Step: 30120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:06:40,927-Speed 25212.45 samples/sec   Loss 2.3665   LearningRate 0.0004   Epoch: 17   Global Step: 30130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:06:50,722-Speed 25097.37 samples/sec   Loss 2.3438   LearningRate 0.0004   Epoch: 17   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:00,412-Speed 25366.55 samples/sec   Loss 2.3616   LearningRate 0.0004   Epoch: 17   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:10,213-Speed 25079.28 samples/sec   Loss 2.3618   LearningRate 0.0004   Epoch: 17   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:20,043-Speed 25004.06 samples/sec   Loss 2.3566   LearningRate 0.0004   Epoch: 17   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:29,924-Speed 24880.52 samples/sec   Loss 2.3632   LearningRate 0.0004   Epoch: 17   Global Step: 30180   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:39,685-Speed 25179.85 samples/sec   Loss 2.3675   LearningRate 0.0004   Epoch: 17   Global Step: 30190   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:49,443-Speed 25190.53 samples/sec   Loss 2.3551   LearningRate 0.0004   Epoch: 17   Global Step: 30200   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:07:59,227-Speed 25129.70 samples/sec   Loss 2.3483   LearningRate 0.0004   Epoch: 17   Global Step: 30210   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:08:08,992-Speed 25169.69 samples/sec   Loss 2.3607   LearningRate 0.0004   Epoch: 17   Global Step: 30220   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:08:18,726-Speed 25252.03 samples/sec   Loss 2.3389   LearningRate 0.0004   Epoch: 17   Global Step: 30230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:08:28,421-Speed 25354.27 samples/sec   Loss 2.3445   LearningRate 0.0004   Epoch: 17   Global Step: 30240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:08:38,187-Speed 25168.03 samples/sec   Loss 2.3298   LearningRate 0.0004   Epoch: 17   Global Step: 30250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:08:48,034-Speed 24963.13 samples/sec   Loss 2.3256   LearningRate 0.0004   Epoch: 17   Global Step: 30260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:08:58,019-Speed 24615.18 samples/sec   Loss 2.3617   LearningRate 0.0004   Epoch: 17   Global Step: 30270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:09:08,069-Speed 24457.66 samples/sec   Loss 2.3551   LearningRate 0.0004   Epoch: 17   Global Step: 30280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:09:18,045-Speed 24638.56 samples/sec   Loss 2.3505   LearningRate 0.0004   Epoch: 17   Global Step: 30290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:09:28,143-Speed 24342.49 samples/sec   Loss 2.3499   LearningRate 0.0004   Epoch: 17   Global Step: 30300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:09:38,182-Speed 24484.56 samples/sec   Loss 2.3392   LearningRate 0.0004   Epoch: 17   Global Step: 30310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:09:48,281-Speed 24336.46 samples/sec   Loss 2.3246   LearningRate 0.0004   Epoch: 17   Global Step: 30320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:09:58,386-Speed 24323.01 samples/sec   Loss 2.3215   LearningRate 0.0004   Epoch: 17   Global Step: 30330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:10:08,580-Speed 24115.67 samples/sec   Loss 2.3538   LearningRate 0.0004   Epoch: 17   Global Step: 30340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:10:18,815-Speed 24012.94 samples/sec   Loss 2.3456   LearningRate 0.0004   Epoch: 17   Global Step: 30350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:10:28,882-Speed 24418.31 samples/sec   Loss 2.3163   LearningRate 0.0004   Epoch: 17   Global Step: 30360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:10:38,840-Speed 24684.32 samples/sec   Loss 2.3269   LearningRate 0.0004   Epoch: 17   Global Step: 30370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:10:48,919-Speed 24383.84 samples/sec   Loss 2.3405   LearningRate 0.0004   Epoch: 17   Global Step: 30380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:10:58,944-Speed 24521.15 samples/sec   Loss 2.3281   LearningRate 0.0004   Epoch: 17   Global Step: 30390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:11:09,112-Speed 24180.19 samples/sec   Loss 2.3446   LearningRate 0.0004   Epoch: 17   Global Step: 30400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:11:19,134-Speed 24524.14 samples/sec   Loss 2.3860   LearningRate 0.0004   Epoch: 17   Global Step: 30410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:11:29,122-Speed 24607.99 samples/sec   Loss 2.3743   LearningRate 0.0004   Epoch: 17   Global Step: 30420   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:11:39,177-Speed 24445.20 samples/sec   Loss 2.3253   LearningRate 0.0004   Epoch: 17   Global Step: 30430   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:11:49,186-Speed 24558.75 samples/sec   Loss 2.3560   LearningRate 0.0004   Epoch: 17   Global Step: 30440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:11:59,113-Speed 24758.97 samples/sec   Loss 2.3537   LearningRate 0.0004   Epoch: 17   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:12:09,151-Speed 24486.17 samples/sec   Loss 2.3289   LearningRate 0.0004   Epoch: 17   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:12:19,070-Speed 24779.15 samples/sec   Loss 2.3521   LearningRate 0.0004   Epoch: 17   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:12:29,154-Speed 24376.27 samples/sec   Loss 2.3299   LearningRate 0.0004   Epoch: 17   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:12:39,111-Speed 24686.89 samples/sec   Loss 2.4597   LearningRate 0.0004   Epoch: 17   Global Step: 30490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:12:49,300-Speed 24125.30 samples/sec   Loss 2.3501   LearningRate 0.0004   Epoch: 17   Global Step: 30500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:12:59,466-Speed 24177.37 samples/sec   Loss 2.3340   LearningRate 0.0004   Epoch: 17   Global Step: 30510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:13:09,494-Speed 24510.93 samples/sec   Loss 2.3235   LearningRate 0.0004   Epoch: 17   Global Step: 30520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:13:19,439-Speed 24714.95 samples/sec   Loss 2.3107   LearningRate 0.0004   Epoch: 17   Global Step: 30530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:13:29,476-Speed 24488.08 samples/sec   Loss 2.3313   LearningRate 0.0004   Epoch: 17   Global Step: 30540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:13:39,449-Speed 24645.98 samples/sec   Loss 2.3517   LearningRate 0.0004   Epoch: 17   Global Step: 30550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:13:49,411-Speed 24675.19 samples/sec   Loss 2.3786   LearningRate 0.0004   Epoch: 17   Global Step: 30560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:13:59,339-Speed 24757.02 samples/sec   Loss 2.3485   LearningRate 0.0004   Epoch: 17   Global Step: 30570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:14:09,274-Speed 24740.23 samples/sec   Loss 2.3175   LearningRate 0.0004   Epoch: 17   Global Step: 30580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:14:19,352-Speed 24388.51 samples/sec   Loss 2.3332   LearningRate 0.0004   Epoch: 17   Global Step: 30590   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:14:29,362-Speed 24555.67 samples/sec   Loss 2.3159   LearningRate 0.0004   Epoch: 17   Global Step: 30600   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:14:39,346-Speed 24619.22 samples/sec   Loss 2.3281   LearningRate 0.0004   Epoch: 17   Global Step: 30610   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:14:49,430-Speed 24376.08 samples/sec   Loss 2.3580   LearningRate 0.0004   Epoch: 17   Global Step: 30620   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:14:59,449-Speed 24531.13 samples/sec   Loss 2.3165   LearningRate 0.0004   Epoch: 17   Global Step: 30630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:15:09,445-Speed 24592.24 samples/sec   Loss 2.3192   LearningRate 0.0004   Epoch: 17   Global Step: 30640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:15:19,422-Speed 24637.01 samples/sec   Loss 2.3335   LearningRate 0.0004   Epoch: 17   Global Step: 30650   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:15:29,382-Speed 24677.80 samples/sec   Loss 2.3141   LearningRate 0.0004   Epoch: 17   Global Step: 30660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:15:39,345-Speed 24673.18 samples/sec   Loss 2.3034   LearningRate 0.0004   Epoch: 17   Global Step: 30670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:15:49,395-Speed 24458.14 samples/sec   Loss 2.2976   LearningRate 0.0004   Epoch: 17   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:15:59,463-Speed 24413.41 samples/sec   Loss 2.3111   LearningRate 0.0004   Epoch: 17   Global Step: 30690   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-26 07:16:09,359-Speed 24844.56 samples/sec   Loss 2.3116   LearningRate 0.0004   Epoch: 17   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:16:19,452-Speed 24353.89 samples/sec   Loss 2.2950   LearningRate 0.0004   Epoch: 17   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:16:29,613-Speed 24188.91 samples/sec   Loss 2.3117   LearningRate 0.0004   Epoch: 17   Global Step: 30720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:16:39,595-Speed 24624.84 samples/sec   Loss 2.2996   LearningRate 0.0004   Epoch: 17   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:16:49,759-Speed 24184.69 samples/sec   Loss 2.3491   LearningRate 0.0004   Epoch: 17   Global Step: 30740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:16:59,776-Speed 24537.60 samples/sec   Loss 2.3441   LearningRate 0.0004   Epoch: 17   Global Step: 30750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:17:10,043-Speed 23943.14 samples/sec   Loss 2.3434   LearningRate 0.0004   Epoch: 17   Global Step: 30760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:17:20,086-Speed 24475.51 samples/sec   Loss 2.2887   LearningRate 0.0004   Epoch: 17   Global Step: 30770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:17:30,263-Speed 24151.63 samples/sec   Loss 2.2841   LearningRate 0.0004   Epoch: 17   Global Step: 30780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:17:40,287-Speed 24524.12 samples/sec   Loss 2.3114   LearningRate 0.0004   Epoch: 17   Global Step: 30790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:17:50,232-Speed 24715.09 samples/sec   Loss 2.2999   LearningRate 0.0004   Epoch: 17   Global Step: 30800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:18:00,257-Speed 24518.80 samples/sec   Loss 2.3071   LearningRate 0.0004   Epoch: 17   Global Step: 30810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:18:10,237-Speed 24628.99 samples/sec   Loss 2.3350   LearningRate 0.0004   Epoch: 17   Global Step: 30820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:18:20,208-Speed 24650.65 samples/sec   Loss 2.3064   LearningRate 0.0004   Epoch: 17   Global Step: 30830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:18:30,155-Speed 24714.28 samples/sec   Loss 2.2959   LearningRate 0.0004   Epoch: 17   Global Step: 30840   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:18:40,147-Speed 24600.49 samples/sec   Loss 2.3112   LearningRate 0.0004   Epoch: 17   Global Step: 30850   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:18:50,069-Speed 24773.12 samples/sec   Loss 2.3295   LearningRate 0.0004   Epoch: 17   Global Step: 30860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:19:00,006-Speed 24735.22 samples/sec   Loss 2.4019   LearningRate 0.0004   Epoch: 17   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:19:09,986-Speed 24627.49 samples/sec   Loss 2.3159   LearningRate 0.0004   Epoch: 17   Global Step: 30880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:19:20,026-Speed 24481.82 samples/sec   Loss 2.3041   LearningRate 0.0004   Epoch: 17   Global Step: 30890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:19:30,030-Speed 24575.25 samples/sec   Loss 2.3284   LearningRate 0.0004   Epoch: 17   Global Step: 30900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:19:40,042-Speed 24555.57 samples/sec   Loss 2.3210   LearningRate 0.0004   Epoch: 17   Global Step: 30910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:19:49,961-Speed 24779.02 samples/sec   Loss 2.3384   LearningRate 0.0004   Epoch: 17   Global Step: 30920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:00,063-Speed 24331.63 samples/sec   Loss 2.3283   LearningRate 0.0004   Epoch: 17   Global Step: 30930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:09,994-Speed 24749.42 samples/sec   Loss 2.3133   LearningRate 0.0004   Epoch: 17   Global Step: 30940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:20,021-Speed 24515.19 samples/sec   Loss 2.3149   LearningRate 0.0004   Epoch: 17   Global Step: 30950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:29,991-Speed 24652.43 samples/sec   Loss 2.3263   LearningRate 0.0004   Epoch: 17   Global Step: 30960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:39,898-Speed 24809.37 samples/sec   Loss 2.3186   LearningRate 0.0004   Epoch: 17   Global Step: 30970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:49,834-Speed 24736.13 samples/sec   Loss 2.3181   LearningRate 0.0004   Epoch: 17   Global Step: 30980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:20:59,717-Speed 24870.79 samples/sec   Loss 2.3132   LearningRate 0.0004   Epoch: 17   Global Step: 30990   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:21:09,570-Speed 24947.59 samples/sec   Loss 2.3023   LearningRate 0.0004   Epoch: 17   Global Step: 31000   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:21:19,448-Speed 24882.77 samples/sec   Loss 2.3133   LearningRate 0.0004   Epoch: 17   Global Step: 31010   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:21:29,243-Speed 25094.63 samples/sec   Loss 2.3122   LearningRate 0.0004   Epoch: 17   Global Step: 31020   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:21:39,011-Speed 25162.82 samples/sec   Loss 2.3285   LearningRate 0.0004   Epoch: 17   Global Step: 31030   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:21:48,692-Speed 25389.66 samples/sec   Loss 2.3229   LearningRate 0.0004   Epoch: 17   Global Step: 31040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:21:58,423-Speed 25258.26 samples/sec   Loss 2.3173   LearningRate 0.0004   Epoch: 17   Global Step: 31050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:22:08,302-Speed 24881.10 samples/sec   Loss 2.2929   LearningRate 0.0004   Epoch: 17   Global Step: 31060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:22:18,079-Speed 25137.80 samples/sec   Loss 2.3490   LearningRate 0.0004   Epoch: 17   Global Step: 31070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:22:27,873-Speed 25098.48 samples/sec   Loss 2.3349   LearningRate 0.0004   Epoch: 17   Global Step: 31080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:22:37,733-Speed 24927.56 samples/sec   Loss 2.3291   LearningRate 0.0004   Epoch: 17   Global Step: 31090   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:22:47,504-Speed 25154.70 samples/sec   Loss 2.3468   LearningRate 0.0004   Epoch: 17   Global Step: 31100   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:22:57,272-Speed 25165.88 samples/sec   Loss 2.3312   LearningRate 0.0004   Epoch: 17   Global Step: 31110   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:23:57,530-Speed 4078.48 samples/sec   Loss 2.2830   LearningRate 0.0004   Epoch: 18   Global Step: 31120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:24:07,350-Speed 25031.85 samples/sec   Loss 2.2754   LearningRate 0.0004   Epoch: 18   Global Step: 31130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:24:17,102-Speed 25204.97 samples/sec   Loss 2.2783   LearningRate 0.0004   Epoch: 18   Global Step: 31140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:24:26,846-Speed 25222.61 samples/sec   Loss 2.2922   LearningRate 0.0004   Epoch: 18   Global Step: 31150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:24:36,624-Speed 25138.10 samples/sec   Loss 2.2497   LearningRate 0.0004   Epoch: 18   Global Step: 31160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:24:46,337-Speed 25307.07 samples/sec   Loss 2.2556   LearningRate 0.0004   Epoch: 18   Global Step: 31170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:24:56,074-Speed 25242.79 samples/sec   Loss 2.2685   LearningRate 0.0004   Epoch: 18   Global Step: 31180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:25:05,819-Speed 25223.10 samples/sec   Loss 2.2858   LearningRate 0.0004   Epoch: 18   Global Step: 31190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:25:15,666-Speed 24962.03 samples/sec   Loss 2.2648   LearningRate 0.0004   Epoch: 18   Global Step: 31200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:25:25,476-Speed 25053.19 samples/sec   Loss 2.2820   LearningRate 0.0004   Epoch: 18   Global Step: 31210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:25:35,343-Speed 24910.98 samples/sec   Loss 2.2609   LearningRate 0.0004   Epoch: 18   Global Step: 31220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:25:45,271-Speed 24756.95 samples/sec   Loss 2.2817   LearningRate 0.0004   Epoch: 18   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:25:55,180-Speed 24806.16 samples/sec   Loss 2.2854   LearningRate 0.0004   Epoch: 18   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:26:05,117-Speed 24735.19 samples/sec   Loss 2.2855   LearningRate 0.0004   Epoch: 18   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:26:15,007-Speed 24850.25 samples/sec   Loss 2.2772   LearningRate 0.0004   Epoch: 18   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:26:24,868-Speed 24927.12 samples/sec   Loss 2.2771   LearningRate 0.0004   Epoch: 18   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:26:34,582-Speed 25302.13 samples/sec   Loss 2.2945   LearningRate 0.0004   Epoch: 18   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:26:44,470-Speed 24857.53 samples/sec   Loss 2.2760   LearningRate 0.0004   Epoch: 18   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:26:54,199-Speed 25264.81 samples/sec   Loss 2.2859   LearningRate 0.0004   Epoch: 18   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:27:04,013-Speed 25044.15 samples/sec   Loss 2.2912   LearningRate 0.0004   Epoch: 18   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:27:13,759-Speed 25219.94 samples/sec   Loss 2.2769   LearningRate 0.0004   Epoch: 18   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:27:23,495-Speed 25245.14 samples/sec   Loss 2.3055   LearningRate 0.0004   Epoch: 18   Global Step: 31330   Fp16 Grad Scale: 131072   Required: 11 hours
Training: 2022-03-26 07:27:33,339-Speed 24968.80 samples/sec   Loss 2.2923   LearningRate 0.0004   Epoch: 18   Global Step: 31340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:27:43,307-Speed 24657.51 samples/sec   Loss 2.2936   LearningRate 0.0004   Epoch: 18   Global Step: 31350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:27:53,230-Speed 24769.88 samples/sec   Loss 2.2818   LearningRate 0.0004   Epoch: 18   Global Step: 31360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:28:03,191-Speed 24676.19 samples/sec   Loss 2.2888   LearningRate 0.0004   Epoch: 18   Global Step: 31370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:28:13,143-Speed 24695.31 samples/sec   Loss 2.2690   LearningRate 0.0004   Epoch: 18   Global Step: 31380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:28:23,132-Speed 24607.18 samples/sec   Loss 2.2761   LearningRate 0.0004   Epoch: 18   Global Step: 31390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:28:33,145-Speed 24547.99 samples/sec   Loss 2.2776   LearningRate 0.0004   Epoch: 18   Global Step: 31400   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:28:43,066-Speed 24771.85 samples/sec   Loss 2.2725   LearningRate 0.0004   Epoch: 18   Global Step: 31410   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:28:53,087-Speed 24527.40 samples/sec   Loss 2.2694   LearningRate 0.0004   Epoch: 18   Global Step: 31420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:29:02,975-Speed 24857.90 samples/sec   Loss 2.3129   LearningRate 0.0004   Epoch: 18   Global Step: 31430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:29:12,883-Speed 24808.65 samples/sec   Loss 2.2736   LearningRate 0.0004   Epoch: 18   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:29:22,852-Speed 24655.48 samples/sec   Loss 2.2830   LearningRate 0.0004   Epoch: 18   Global Step: 31450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:29:32,735-Speed 24868.88 samples/sec   Loss 2.2961   LearningRate 0.0004   Epoch: 18   Global Step: 31460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:29:42,503-Speed 25162.32 samples/sec   Loss 2.3036   LearningRate 0.0004   Epoch: 18   Global Step: 31470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:29:52,202-Speed 25341.65 samples/sec   Loss 2.2589   LearningRate 0.0004   Epoch: 18   Global Step: 31480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:30:01,875-Speed 25411.39 samples/sec   Loss 2.2630   LearningRate 0.0004   Epoch: 18   Global Step: 31490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:30:11,614-Speed 25240.67 samples/sec   Loss 2.2709   LearningRate 0.0004   Epoch: 18   Global Step: 31500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:30:21,353-Speed 25238.84 samples/sec   Loss 2.2846   LearningRate 0.0004   Epoch: 18   Global Step: 31510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:30:31,159-Speed 25064.49 samples/sec   Loss 2.2735   LearningRate 0.0004   Epoch: 18   Global Step: 31520   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:30:40,997-Speed 24984.38 samples/sec   Loss 2.2679   LearningRate 0.0004   Epoch: 18   Global Step: 31530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:30:50,864-Speed 24909.41 samples/sec   Loss 2.2939   LearningRate 0.0004   Epoch: 18   Global Step: 31540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:31:00,692-Speed 25008.11 samples/sec   Loss 2.2796   LearningRate 0.0004   Epoch: 18   Global Step: 31550   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:31:10,369-Speed 25400.38 samples/sec   Loss 2.2582   LearningRate 0.0004   Epoch: 18   Global Step: 31560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:31:20,060-Speed 25362.65 samples/sec   Loss 2.2561   LearningRate 0.0004   Epoch: 18   Global Step: 31570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:31:29,846-Speed 25118.18 samples/sec   Loss 2.2749   LearningRate 0.0004   Epoch: 18   Global Step: 31580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:31:39,603-Speed 25191.09 samples/sec   Loss 2.2780   LearningRate 0.0004   Epoch: 18   Global Step: 31590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:31:49,354-Speed 25207.61 samples/sec   Loss 2.2838   LearningRate 0.0004   Epoch: 18   Global Step: 31600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:31:59,022-Speed 25424.14 samples/sec   Loss 2.2802   LearningRate 0.0004   Epoch: 18   Global Step: 31610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:32:08,747-Speed 25275.21 samples/sec   Loss 2.2732   LearningRate 0.0004   Epoch: 18   Global Step: 31620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:32:18,437-Speed 25363.66 samples/sec   Loss 2.2656   LearningRate 0.0004   Epoch: 18   Global Step: 31630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:32:28,116-Speed 25394.52 samples/sec   Loss 2.2711   LearningRate 0.0004   Epoch: 18   Global Step: 31640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:32:37,806-Speed 25364.61 samples/sec   Loss 2.3369   LearningRate 0.0004   Epoch: 18   Global Step: 31650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:32:47,653-Speed 24961.86 samples/sec   Loss 2.3132   LearningRate 0.0004   Epoch: 18   Global Step: 31660   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:32:57,340-Speed 25371.77 samples/sec   Loss 2.2890   LearningRate 0.0004   Epoch: 18   Global Step: 31670   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:33:07,039-Speed 25341.14 samples/sec   Loss 2.2634   LearningRate 0.0004   Epoch: 18   Global Step: 31680   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:33:16,776-Speed 25242.83 samples/sec   Loss 2.2720   LearningRate 0.0004   Epoch: 18   Global Step: 31690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:33:26,472-Speed 25351.19 samples/sec   Loss 2.2714   LearningRate 0.0004   Epoch: 18   Global Step: 31700   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:33:36,177-Speed 25326.83 samples/sec   Loss 2.2627   LearningRate 0.0004   Epoch: 18   Global Step: 31710   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:33:46,010-Speed 24997.34 samples/sec   Loss 2.2647   LearningRate 0.0004   Epoch: 18   Global Step: 31720   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:33:55,724-Speed 25302.37 samples/sec   Loss 2.2743   LearningRate 0.0004   Epoch: 18   Global Step: 31730   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:34:05,400-Speed 25408.33 samples/sec   Loss 2.2592   LearningRate 0.0004   Epoch: 18   Global Step: 31740   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:34:15,081-Speed 25389.20 samples/sec   Loss 2.2468   LearningRate 0.0004   Epoch: 18   Global Step: 31750   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:34:24,782-Speed 25336.00 samples/sec   Loss 2.2484   LearningRate 0.0004   Epoch: 18   Global Step: 31760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:34:34,510-Speed 25265.34 samples/sec   Loss 2.2599   LearningRate 0.0004   Epoch: 18   Global Step: 31770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:34:44,293-Speed 25127.29 samples/sec   Loss 2.2887   LearningRate 0.0004   Epoch: 18   Global Step: 31780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:34:54,022-Speed 25263.29 samples/sec   Loss 2.2519   LearningRate 0.0004   Epoch: 18   Global Step: 31790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:35:03,853-Speed 25003.79 samples/sec   Loss 2.2703   LearningRate 0.0004   Epoch: 18   Global Step: 31800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:35:13,681-Speed 25008.67 samples/sec   Loss 2.2451   LearningRate 0.0004   Epoch: 18   Global Step: 31810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:35:23,449-Speed 25162.44 samples/sec   Loss 2.2664   LearningRate 0.0004   Epoch: 18   Global Step: 31820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:35:33,337-Speed 24857.46 samples/sec   Loss 2.2812   LearningRate 0.0004   Epoch: 18   Global Step: 31830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:35:43,329-Speed 24598.61 samples/sec   Loss 2.2730   LearningRate 0.0004   Epoch: 18   Global Step: 31840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:35:53,203-Speed 24894.83 samples/sec   Loss 2.2630   LearningRate 0.0004   Epoch: 18   Global Step: 31850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:36:03,044-Speed 24975.78 samples/sec   Loss 2.2605   LearningRate 0.0004   Epoch: 18   Global Step: 31860   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:36:12,849-Speed 25064.93 samples/sec   Loss 2.2575   LearningRate 0.0004   Epoch: 18   Global Step: 31870   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:36:22,624-Speed 25145.13 samples/sec   Loss 2.2641   LearningRate 0.0004   Epoch: 18   Global Step: 31880   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-26 07:36:32,364-Speed 25235.62 samples/sec   Loss 2.2340   LearningRate 0.0004   Epoch: 18   Global Step: 31890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:36:42,111-Speed 25216.84 samples/sec   Loss 2.2466   LearningRate 0.0004   Epoch: 18   Global Step: 31900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:36:51,826-Speed 25301.05 samples/sec   Loss 2.2627   LearningRate 0.0004   Epoch: 18   Global Step: 31910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-26 07:37:01,589-Speed 25174.02 samples/sec   Loss 2.3177   LearningRate 0.0004   Epoch: 18   Global Step: 31920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:37:11,497-Speed 24808.69 samples/sec   Loss 2.2947   LearningRate 0.0004   Epoch: 18   Global Step: 31930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:37:21,200-Speed 25331.44 samples/sec   Loss 2.2764   LearningRate 0.0004   Epoch: 18   Global Step: 31940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:37:31,054-Speed 24943.58 samples/sec   Loss 2.2860   LearningRate 0.0004   Epoch: 18   Global Step: 31950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:37:40,866-Speed 25052.90 samples/sec   Loss 2.2709   LearningRate 0.0004   Epoch: 18   Global Step: 31960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:37:50,618-Speed 25204.52 samples/sec   Loss 2.2246   LearningRate 0.0004   Epoch: 18   Global Step: 31970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:38:00,359-Speed 25231.15 samples/sec   Loss 2.2322   LearningRate 0.0004   Epoch: 18   Global Step: 31980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:38:10,164-Speed 25067.80 samples/sec   Loss 2.2317   LearningRate 0.0004   Epoch: 18   Global Step: 31990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:38:20,014-Speed 24952.76 samples/sec   Loss 2.2299   LearningRate 0.0004   Epoch: 18   Global Step: 32000   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:38:29,842-Speed 25007.42 samples/sec   Loss 2.2397   LearningRate 0.0004   Epoch: 18   Global Step: 32010   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:38:39,818-Speed 24638.58 samples/sec   Loss 2.2653   LearningRate 0.0004   Epoch: 18   Global Step: 32020   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:38:49,577-Speed 25186.64 samples/sec   Loss 2.2773   LearningRate 0.0004   Epoch: 18   Global Step: 32030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:38:59,368-Speed 25104.13 samples/sec   Loss 2.2288   LearningRate 0.0004   Epoch: 18   Global Step: 32040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:39:09,168-Speed 25080.19 samples/sec   Loss 2.2177   LearningRate 0.0004   Epoch: 18   Global Step: 32050   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:39:18,933-Speed 25171.48 samples/sec   Loss 2.2312   LearningRate 0.0004   Epoch: 18   Global Step: 32060   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:39:28,755-Speed 25026.60 samples/sec   Loss 2.2271   LearningRate 0.0004   Epoch: 18   Global Step: 32070   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:39:38,676-Speed 24774.41 samples/sec   Loss 2.2669   LearningRate 0.0004   Epoch: 18   Global Step: 32080   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:39:48,553-Speed 24886.70 samples/sec   Loss 2.2665   LearningRate 0.0004   Epoch: 18   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:39:58,410-Speed 24934.76 samples/sec   Loss 2.2446   LearningRate 0.0004   Epoch: 18   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:40:08,165-Speed 25196.41 samples/sec   Loss 2.2569   LearningRate 0.0004   Epoch: 18   Global Step: 32110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:40:17,866-Speed 25336.30 samples/sec   Loss 2.2621   LearningRate 0.0004   Epoch: 18   Global Step: 32120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:40:27,652-Speed 25114.78 samples/sec   Loss 2.2534   LearningRate 0.0004   Epoch: 18   Global Step: 32130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:40:37,430-Speed 25138.06 samples/sec   Loss 2.2353   LearningRate 0.0004   Epoch: 18   Global Step: 32140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:40:47,230-Speed 25080.31 samples/sec   Loss 2.2297   LearningRate 0.0004   Epoch: 18   Global Step: 32150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:40:56,968-Speed 25240.84 samples/sec   Loss 2.2421   LearningRate 0.0004   Epoch: 18   Global Step: 32160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:41:06,748-Speed 25132.84 samples/sec   Loss 2.2332   LearningRate 0.0004   Epoch: 18   Global Step: 32170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:41:16,543-Speed 25093.28 samples/sec   Loss 2.2594   LearningRate 0.0004   Epoch: 18   Global Step: 32180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:41:26,310-Speed 25166.54 samples/sec   Loss 2.2545   LearningRate 0.0004   Epoch: 18   Global Step: 32190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:41:36,067-Speed 25191.09 samples/sec   Loss 2.2541   LearningRate 0.0004   Epoch: 18   Global Step: 32200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:41:46,187-Speed 24288.11 samples/sec   Loss 2.2264   LearningRate 0.0004   Epoch: 18   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:41:56,029-Speed 24972.80 samples/sec   Loss 2.2311   LearningRate 0.0004   Epoch: 18   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:42:05,747-Speed 25293.97 samples/sec   Loss 2.2594   LearningRate 0.0004   Epoch: 18   Global Step: 32230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:42:15,557-Speed 25053.57 samples/sec   Loss 2.2421   LearningRate 0.0004   Epoch: 18   Global Step: 32240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:42:25,399-Speed 24974.73 samples/sec   Loss 2.2386   LearningRate 0.0004   Epoch: 18   Global Step: 32250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:42:35,215-Speed 25038.14 samples/sec   Loss 2.2492   LearningRate 0.0004   Epoch: 18   Global Step: 32260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:42:45,012-Speed 25090.04 samples/sec   Loss 2.2527   LearningRate 0.0004   Epoch: 18   Global Step: 32270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:42:54,767-Speed 25197.62 samples/sec   Loss 2.3107   LearningRate 0.0004   Epoch: 18   Global Step: 32280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:43:04,540-Speed 25150.56 samples/sec   Loss 2.2709   LearningRate 0.0004   Epoch: 18   Global Step: 32290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:43:14,297-Speed 25191.25 samples/sec   Loss 2.2331   LearningRate 0.0004   Epoch: 18   Global Step: 32300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:43:24,146-Speed 24955.57 samples/sec   Loss 2.2402   LearningRate 0.0004   Epoch: 18   Global Step: 32310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:43:33,921-Speed 25146.39 samples/sec   Loss 2.2258   LearningRate 0.0003   Epoch: 18   Global Step: 32320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:43:43,780-Speed 24930.01 samples/sec   Loss 2.2334   LearningRate 0.0003   Epoch: 18   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:43:53,574-Speed 25096.22 samples/sec   Loss 2.2382   LearningRate 0.0003   Epoch: 18   Global Step: 32340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:44:03,360-Speed 25118.34 samples/sec   Loss 2.2190   LearningRate 0.0003   Epoch: 18   Global Step: 32350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:44:13,081-Speed 25281.55 samples/sec   Loss 2.2070   LearningRate 0.0003   Epoch: 18   Global Step: 32360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:44:22,793-Speed 25308.39 samples/sec   Loss 2.2030   LearningRate 0.0003   Epoch: 18   Global Step: 32370   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:44:32,587-Speed 25095.07 samples/sec   Loss 2.2161   LearningRate 0.0003   Epoch: 18   Global Step: 32380   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:44:42,338-Speed 25209.45 samples/sec   Loss 2.2499   LearningRate 0.0003   Epoch: 18   Global Step: 32390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:44:52,151-Speed 25047.59 samples/sec   Loss 2.2346   LearningRate 0.0003   Epoch: 18   Global Step: 32400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:45:02,005-Speed 24944.04 samples/sec   Loss 2.2348   LearningRate 0.0003   Epoch: 18   Global Step: 32410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:45:11,820-Speed 25041.82 samples/sec   Loss 2.2245   LearningRate 0.0003   Epoch: 18   Global Step: 32420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:45:21,520-Speed 25340.75 samples/sec   Loss 2.2321   LearningRate 0.0003   Epoch: 18   Global Step: 32430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:45:31,394-Speed 24893.34 samples/sec   Loss 2.2343   LearningRate 0.0003   Epoch: 18   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:45:41,432-Speed 24486.09 samples/sec   Loss 2.2366   LearningRate 0.0003   Epoch: 18   Global Step: 32450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:45:51,536-Speed 24325.98 samples/sec   Loss 2.2136   LearningRate 0.0003   Epoch: 18   Global Step: 32460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:46:01,624-Speed 24367.43 samples/sec   Loss 2.2389   LearningRate 0.0003   Epoch: 18   Global Step: 32470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:46:11,802-Speed 24152.37 samples/sec   Loss 2.2584   LearningRate 0.0003   Epoch: 18   Global Step: 32480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:46:21,901-Speed 24337.27 samples/sec   Loss 2.2407   LearningRate 0.0003   Epoch: 18   Global Step: 32490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:46:32,057-Speed 24201.58 samples/sec   Loss 2.2257   LearningRate 0.0003   Epoch: 18   Global Step: 32500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:46:42,153-Speed 24343.72 samples/sec   Loss 2.2074   LearningRate 0.0003   Epoch: 18   Global Step: 32510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:46:52,167-Speed 24544.65 samples/sec   Loss 2.2084   LearningRate 0.0003   Epoch: 18   Global Step: 32520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:47:01,937-Speed 25156.34 samples/sec   Loss 2.2580   LearningRate 0.0003   Epoch: 18   Global Step: 32530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:47:11,685-Speed 25214.39 samples/sec   Loss 2.2405   LearningRate 0.0003   Epoch: 18   Global Step: 32540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:47:21,442-Speed 25192.76 samples/sec   Loss 2.2231   LearningRate 0.0003   Epoch: 18   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:47:31,158-Speed 25296.21 samples/sec   Loss 2.2114   LearningRate 0.0003   Epoch: 18   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:47:40,947-Speed 25109.35 samples/sec   Loss 2.1928   LearningRate 0.0003   Epoch: 18   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:47:50,704-Speed 25190.91 samples/sec   Loss 2.2122   LearningRate 0.0003   Epoch: 18   Global Step: 32580   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:00,427-Speed 25280.19 samples/sec   Loss 2.2461   LearningRate 0.0003   Epoch: 18   Global Step: 32590   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:10,279-Speed 24948.17 samples/sec   Loss 2.2413   LearningRate 0.0003   Epoch: 18   Global Step: 32600   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:20,021-Speed 25230.34 samples/sec   Loss 2.2342   LearningRate 0.0003   Epoch: 18   Global Step: 32610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:29,805-Speed 25124.23 samples/sec   Loss 2.2189   LearningRate 0.0003   Epoch: 18   Global Step: 32620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:39,596-Speed 25102.53 samples/sec   Loss 2.2219   LearningRate 0.0003   Epoch: 18   Global Step: 32630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:49,361-Speed 25171.67 samples/sec   Loss 2.2166   LearningRate 0.0003   Epoch: 18   Global Step: 32640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:48:59,207-Speed 24963.51 samples/sec   Loss 2.2087   LearningRate 0.0003   Epoch: 18   Global Step: 32650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:49:09,046-Speed 24981.79 samples/sec   Loss 2.2136   LearningRate 0.0003   Epoch: 18   Global Step: 32660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:49:18,818-Speed 25152.62 samples/sec   Loss 2.2022   LearningRate 0.0003   Epoch: 18   Global Step: 32670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:49:28,615-Speed 25089.39 samples/sec   Loss 2.2099   LearningRate 0.0003   Epoch: 18   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:49:38,589-Speed 24642.83 samples/sec   Loss 2.2288   LearningRate 0.0003   Epoch: 18   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:49:48,339-Speed 25208.70 samples/sec   Loss 2.2183   LearningRate 0.0003   Epoch: 18   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:49:58,075-Speed 25247.24 samples/sec   Loss 2.2290   LearningRate 0.0003   Epoch: 18   Global Step: 32710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:50:07,922-Speed 24958.82 samples/sec   Loss 2.2280   LearningRate 0.0003   Epoch: 18   Global Step: 32720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:50:17,707-Speed 25120.56 samples/sec   Loss 2.2171   LearningRate 0.0003   Epoch: 18   Global Step: 32730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:50:27,440-Speed 25252.19 samples/sec   Loss 2.2232   LearningRate 0.0003   Epoch: 18   Global Step: 32740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:50:37,208-Speed 25162.20 samples/sec   Loss 2.2245   LearningRate 0.0003   Epoch: 18   Global Step: 32750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:50:46,929-Speed 25285.43 samples/sec   Loss 2.2344   LearningRate 0.0003   Epoch: 18   Global Step: 32760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:50:56,621-Speed 25359.77 samples/sec   Loss 2.2155   LearningRate 0.0003   Epoch: 18   Global Step: 32770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:51:06,404-Speed 25123.18 samples/sec   Loss 2.2335   LearningRate 0.0003   Epoch: 18   Global Step: 32780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:51:16,114-Speed 25313.36 samples/sec   Loss 2.2329   LearningRate 0.0003   Epoch: 18   Global Step: 32790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:51:25,965-Speed 24951.49 samples/sec   Loss 2.2243   LearningRate 0.0003   Epoch: 18   Global Step: 32800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:51:35,776-Speed 25051.60 samples/sec   Loss 2.2434   LearningRate 0.0003   Epoch: 18   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:51:45,511-Speed 25252.78 samples/sec   Loss 2.2560   LearningRate 0.0003   Epoch: 18   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:51:55,313-Speed 25073.84 samples/sec   Loss 2.2174   LearningRate 0.0003   Epoch: 18   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:52:54,455-Speed 4155.53 samples/sec   Loss 2.2084   LearningRate 0.0003   Epoch: 19   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:53:04,298-Speed 24970.91 samples/sec   Loss 2.1768   LearningRate 0.0003   Epoch: 19   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:53:14,377-Speed 24386.73 samples/sec   Loss 2.1892   LearningRate 0.0003   Epoch: 19   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:53:24,254-Speed 24885.19 samples/sec   Loss 2.1956   LearningRate 0.0003   Epoch: 19   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:53:34,119-Speed 24916.00 samples/sec   Loss 2.1751   LearningRate 0.0003   Epoch: 19   Global Step: 32880   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:53:44,006-Speed 24860.29 samples/sec   Loss 2.1754   LearningRate 0.0003   Epoch: 19   Global Step: 32890   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:53:53,969-Speed 24668.54 samples/sec   Loss 2.1890   LearningRate 0.0003   Epoch: 19   Global Step: 32900   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:54:03,856-Speed 24859.69 samples/sec   Loss 2.1869   LearningRate 0.0003   Epoch: 19   Global Step: 32910   Fp16 Grad Scale: 131072   Required: 10 hours
Training: 2022-03-26 07:54:13,842-Speed 24613.05 samples/sec   Loss 2.2137   LearningRate 0.0003   Epoch: 19   Global Step: 32920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:54:23,814-Speed 24648.29 samples/sec   Loss 2.1923   LearningRate 0.0003   Epoch: 19   Global Step: 32930   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:54:33,761-Speed 24710.11 samples/sec   Loss 2.2051   LearningRate 0.0003   Epoch: 19   Global Step: 32940   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:54:43,574-Speed 25046.23 samples/sec   Loss 2.2021   LearningRate 0.0003   Epoch: 19   Global Step: 32950   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:54:53,314-Speed 25235.36 samples/sec   Loss 2.2014   LearningRate 0.0003   Epoch: 19   Global Step: 32960   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:55:03,180-Speed 24918.26 samples/sec   Loss 2.1867   LearningRate 0.0003   Epoch: 19   Global Step: 32970   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:55:13,216-Speed 24527.05 samples/sec   Loss 2.1707   LearningRate 0.0003   Epoch: 19   Global Step: 32980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:55:23,035-Speed 25031.44 samples/sec   Loss 2.1495   LearningRate 0.0003   Epoch: 19   Global Step: 32990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:55:32,860-Speed 25047.89 samples/sec   Loss 2.1818   LearningRate 0.0003   Epoch: 19   Global Step: 33000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:55:42,707-Speed 24959.65 samples/sec   Loss 2.2648   LearningRate 0.0003   Epoch: 19   Global Step: 33010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:55:52,514-Speed 25062.77 samples/sec   Loss 2.2271   LearningRate 0.0003   Epoch: 19   Global Step: 33020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:56:02,400-Speed 24861.90 samples/sec   Loss 2.1789   LearningRate 0.0003   Epoch: 19   Global Step: 33030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:56:12,367-Speed 24660.93 samples/sec   Loss 2.1775   LearningRate 0.0003   Epoch: 19   Global Step: 33040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:56:22,555-Speed 24123.73 samples/sec   Loss 2.1951   LearningRate 0.0003   Epoch: 19   Global Step: 33050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:56:32,825-Speed 23931.69 samples/sec   Loss 2.2083   LearningRate 0.0003   Epoch: 19   Global Step: 33060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:56:43,135-Speed 23840.24 samples/sec   Loss 2.1897   LearningRate 0.0003   Epoch: 19   Global Step: 33070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:56:53,597-Speed 23490.79 samples/sec   Loss 2.2032   LearningRate 0.0003   Epoch: 19   Global Step: 33080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:57:03,943-Speed 23758.39 samples/sec   Loss 2.2287   LearningRate 0.0003   Epoch: 19   Global Step: 33090   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:57:14,327-Speed 23670.95 samples/sec   Loss 2.2025   LearningRate 0.0003   Epoch: 19   Global Step: 33100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:57:24,497-Speed 24166.80 samples/sec   Loss 2.2003   LearningRate 0.0003   Epoch: 19   Global Step: 33110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:57:34,453-Speed 24687.36 samples/sec   Loss 2.1761   LearningRate 0.0003   Epoch: 19   Global Step: 33120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:57:44,402-Speed 24705.48 samples/sec   Loss 2.1964   LearningRate 0.0003   Epoch: 19   Global Step: 33130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:57:54,334-Speed 24745.85 samples/sec   Loss 2.1875   LearningRate 0.0003   Epoch: 19   Global Step: 33140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:58:04,385-Speed 24457.13 samples/sec   Loss 2.1921   LearningRate 0.0003   Epoch: 19   Global Step: 33150   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:58:14,425-Speed 24479.96 samples/sec   Loss 2.2102   LearningRate 0.0003   Epoch: 19   Global Step: 33160   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:58:24,693-Speed 23938.48 samples/sec   Loss 2.1864   LearningRate 0.0003   Epoch: 19   Global Step: 33170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:58:34,844-Speed 24211.48 samples/sec   Loss 2.1867   LearningRate 0.0003   Epoch: 19   Global Step: 33180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:58:44,969-Speed 24275.70 samples/sec   Loss 2.1938   LearningRate 0.0003   Epoch: 19   Global Step: 33190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:58:55,063-Speed 24351.61 samples/sec   Loss 2.1906   LearningRate 0.0003   Epoch: 19   Global Step: 33200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:59:05,151-Speed 24365.16 samples/sec   Loss 2.1992   LearningRate 0.0003   Epoch: 19   Global Step: 33210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:59:15,218-Speed 24421.41 samples/sec   Loss 2.2027   LearningRate 0.0003   Epoch: 19   Global Step: 33220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 07:59:25,374-Speed 24201.62 samples/sec   Loss 2.2057   LearningRate 0.0003   Epoch: 19   Global Step: 33230   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:59:35,335-Speed 24718.81 samples/sec   Loss 2.1982   LearningRate 0.0003   Epoch: 19   Global Step: 33240   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:59:45,212-Speed 24886.00 samples/sec   Loss 2.1867   LearningRate 0.0003   Epoch: 19   Global Step: 33250   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 07:59:55,115-Speed 24820.18 samples/sec   Loss 2.1691   LearningRate 0.0003   Epoch: 19   Global Step: 33260   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:00:04,900-Speed 25156.19 samples/sec   Loss 2.1822   LearningRate 0.0003   Epoch: 19   Global Step: 33270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:00:14,837-Speed 25079.65 samples/sec   Loss 2.1750   LearningRate 0.0003   Epoch: 19   Global Step: 33280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:00:24,766-Speed 24757.83 samples/sec   Loss 2.1863   LearningRate 0.0003   Epoch: 19   Global Step: 33290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:00:34,700-Speed 24740.71 samples/sec   Loss 2.2046   LearningRate 0.0003   Epoch: 19   Global Step: 33300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:00:44,687-Speed 24631.15 samples/sec   Loss 2.1950   LearningRate 0.0003   Epoch: 19   Global Step: 33310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:00:54,817-Speed 24263.72 samples/sec   Loss 2.1722   LearningRate 0.0003   Epoch: 19   Global Step: 33320   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:01:05,035-Speed 24053.87 samples/sec   Loss 2.1935   LearningRate 0.0003   Epoch: 19   Global Step: 33330   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:01:15,301-Speed 23940.55 samples/sec   Loss 2.1864   LearningRate 0.0003   Epoch: 19   Global Step: 33340   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:01:25,750-Speed 23523.07 samples/sec   Loss 2.1956   LearningRate 0.0003   Epoch: 19   Global Step: 33350   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:01:36,132-Speed 23674.85 samples/sec   Loss 2.1906   LearningRate 0.0003   Epoch: 19   Global Step: 33360   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:01:46,352-Speed 24051.02 samples/sec   Loss 2.1795   LearningRate 0.0003   Epoch: 19   Global Step: 33370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:01:56,727-Speed 23689.70 samples/sec   Loss 2.1941   LearningRate 0.0003   Epoch: 19   Global Step: 33380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:02:06,993-Speed 23941.08 samples/sec   Loss 2.1833   LearningRate 0.0003   Epoch: 19   Global Step: 33390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:02:17,288-Speed 23876.31 samples/sec   Loss 2.1698   LearningRate 0.0003   Epoch: 19   Global Step: 33400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:02:27,520-Speed 24022.48 samples/sec   Loss 2.1653   LearningRate 0.0003   Epoch: 19   Global Step: 33410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:02:37,864-Speed 23760.89 samples/sec   Loss 2.1766   LearningRate 0.0003   Epoch: 19   Global Step: 33420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:02:48,206-Speed 23767.45 samples/sec   Loss 2.1904   LearningRate 0.0003   Epoch: 19   Global Step: 33430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:02:58,520-Speed 23830.20 samples/sec   Loss 2.1616   LearningRate 0.0003   Epoch: 19   Global Step: 33440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:03:08,793-Speed 23928.52 samples/sec   Loss 2.1811   LearningRate 0.0003   Epoch: 19   Global Step: 33450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:03:19,053-Speed 23956.54 samples/sec   Loss 2.2003   LearningRate 0.0003   Epoch: 19   Global Step: 33460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:03:29,657-Speed 23177.76 samples/sec   Loss 2.1731   LearningRate 0.0003   Epoch: 19   Global Step: 33470   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:03:40,315-Speed 23061.63 samples/sec   Loss 2.1963   LearningRate 0.0003   Epoch: 19   Global Step: 33480   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:03:51,099-Speed 22792.53 samples/sec   Loss 2.1852   LearningRate 0.0003   Epoch: 19   Global Step: 33490   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:04:01,821-Speed 22923.05 samples/sec   Loss 2.1592   LearningRate 0.0003   Epoch: 19   Global Step: 33500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:04:12,136-Speed 23830.37 samples/sec   Loss 2.1702   LearningRate 0.0003   Epoch: 19   Global Step: 33510   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:04:22,441-Speed 23852.48 samples/sec   Loss 2.1738   LearningRate 0.0003   Epoch: 19   Global Step: 33520   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:04:32,711-Speed 23931.96 samples/sec   Loss 2.1463   LearningRate 0.0003   Epoch: 19   Global Step: 33530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:04:42,911-Speed 24097.30 samples/sec   Loss 2.1686   LearningRate 0.0003   Epoch: 19   Global Step: 33540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:04:53,089-Speed 24150.23 samples/sec   Loss 2.1698   LearningRate 0.0003   Epoch: 19   Global Step: 33550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:05:03,262-Speed 24160.53 samples/sec   Loss 2.1701   LearningRate 0.0003   Epoch: 19   Global Step: 33560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:05:13,587-Speed 23804.95 samples/sec   Loss 2.1690   LearningRate 0.0003   Epoch: 19   Global Step: 33570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:05:23,827-Speed 24003.06 samples/sec   Loss 2.1555   LearningRate 0.0003   Epoch: 19   Global Step: 33580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:05:34,053-Speed 24036.07 samples/sec   Loss 2.1521   LearningRate 0.0003   Epoch: 19   Global Step: 33590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:05:44,229-Speed 24155.36 samples/sec   Loss 2.1598   LearningRate 0.0003   Epoch: 19   Global Step: 33600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:05:54,312-Speed 24377.46 samples/sec   Loss 2.1696   LearningRate 0.0003   Epoch: 19   Global Step: 33610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:06:04,618-Speed 23848.39 samples/sec   Loss 2.1498   LearningRate 0.0003   Epoch: 19   Global Step: 33620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:06:14,852-Speed 24018.18 samples/sec   Loss 2.1568   LearningRate 0.0003   Epoch: 19   Global Step: 33630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:06:25,093-Speed 24002.62 samples/sec   Loss 2.1920   LearningRate 0.0003   Epoch: 19   Global Step: 33640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:06:35,360-Speed 23939.33 samples/sec   Loss 2.1926   LearningRate 0.0003   Epoch: 19   Global Step: 33650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:06:45,583-Speed 24045.18 samples/sec   Loss 2.1420   LearningRate 0.0003   Epoch: 19   Global Step: 33660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:06:55,859-Speed 23917.99 samples/sec   Loss 2.1564   LearningRate 0.0003   Epoch: 19   Global Step: 33670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:07:06,150-Speed 23885.77 samples/sec   Loss 2.1550   LearningRate 0.0003   Epoch: 19   Global Step: 33680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:07:16,342-Speed 24116.13 samples/sec   Loss 2.1613   LearningRate 0.0003   Epoch: 19   Global Step: 33690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:07:26,480-Speed 24241.91 samples/sec   Loss 2.1659   LearningRate 0.0003   Epoch: 19   Global Step: 33700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:07:36,758-Speed 23916.09 samples/sec   Loss 2.1858   LearningRate 0.0003   Epoch: 19   Global Step: 33710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:07:46,976-Speed 24053.67 samples/sec   Loss 2.1742   LearningRate 0.0003   Epoch: 19   Global Step: 33720   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:07:57,138-Speed 24188.65 samples/sec   Loss 2.1274   LearningRate 0.0003   Epoch: 19   Global Step: 33730   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:08:07,306-Speed 24172.46 samples/sec   Loss 2.1395   LearningRate 0.0003   Epoch: 19   Global Step: 33740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:08:17,440-Speed 24259.62 samples/sec   Loss 2.1697   LearningRate 0.0003   Epoch: 19   Global Step: 33750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:08:27,602-Speed 24186.14 samples/sec   Loss 2.1914   LearningRate 0.0003   Epoch: 19   Global Step: 33760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:08:37,808-Speed 24084.72 samples/sec   Loss 2.1603   LearningRate 0.0003   Epoch: 19   Global Step: 33770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:08:48,020-Speed 24069.61 samples/sec   Loss 2.1396   LearningRate 0.0003   Epoch: 19   Global Step: 33780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:08:58,295-Speed 23921.72 samples/sec   Loss 2.1544   LearningRate 0.0003   Epoch: 19   Global Step: 33790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:09:08,570-Speed 23921.95 samples/sec   Loss 2.1715   LearningRate 0.0003   Epoch: 19   Global Step: 33800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:09:18,985-Speed 23602.48 samples/sec   Loss 2.1269   LearningRate 0.0003   Epoch: 19   Global Step: 33810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-26 08:09:29,381-Speed 23643.17 samples/sec   Loss 2.1452   LearningRate 0.0003   Epoch: 19   Global Step: 33820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:09:39,596-Speed 24062.76 samples/sec   Loss 2.1443   LearningRate 0.0003   Epoch: 19   Global Step: 33830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:09:49,858-Speed 23953.87 samples/sec   Loss 2.1358   LearningRate 0.0003   Epoch: 19   Global Step: 33840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:10:00,118-Speed 23956.13 samples/sec   Loss 2.1395   LearningRate 0.0003   Epoch: 19   Global Step: 33850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:10:10,477-Speed 23729.69 samples/sec   Loss 2.1359   LearningRate 0.0003   Epoch: 19   Global Step: 33860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:10:20,817-Speed 23771.86 samples/sec   Loss 2.1520   LearningRate 0.0003   Epoch: 19   Global Step: 33870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:10:31,094-Speed 23916.29 samples/sec   Loss 2.1617   LearningRate 0.0003   Epoch: 19   Global Step: 33880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:10:41,415-Speed 23814.95 samples/sec   Loss 2.1621   LearningRate 0.0003   Epoch: 19   Global Step: 33890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:10:51,727-Speed 23835.44 samples/sec   Loss 2.1552   LearningRate 0.0003   Epoch: 19   Global Step: 33900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:11:01,945-Speed 24055.10 samples/sec   Loss 2.1547   LearningRate 0.0003   Epoch: 19   Global Step: 33910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:11:12,377-Speed 23561.70 samples/sec   Loss 2.1309   LearningRate 0.0003   Epoch: 19   Global Step: 33920   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:11:22,624-Speed 23988.10 samples/sec   Loss 2.1577   LearningRate 0.0003   Epoch: 19   Global Step: 33930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:11:32,864-Speed 24000.95 samples/sec   Loss 2.1417   LearningRate 0.0003   Epoch: 19   Global Step: 33940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:11:43,183-Speed 23819.75 samples/sec   Loss 2.1324   LearningRate 0.0003   Epoch: 19   Global Step: 33950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:11:53,504-Speed 23814.23 samples/sec   Loss 2.1271   LearningRate 0.0003   Epoch: 19   Global Step: 33960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:12:03,884-Speed 23678.83 samples/sec   Loss 2.1310   LearningRate 0.0003   Epoch: 19   Global Step: 33970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:12:14,208-Speed 23807.01 samples/sec   Loss 2.1435   LearningRate 0.0003   Epoch: 19   Global Step: 33980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:12:24,591-Speed 23673.62 samples/sec   Loss 2.1453   LearningRate 0.0003   Epoch: 19   Global Step: 33990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:12:34,956-Speed 23711.30 samples/sec   Loss 2.1611   LearningRate 0.0003   Epoch: 19   Global Step: 34000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:12:45,311-Speed 23736.92 samples/sec   Loss 2.1832   LearningRate 0.0003   Epoch: 19   Global Step: 34010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:12:55,731-Speed 23588.27 samples/sec   Loss 2.1806   LearningRate 0.0003   Epoch: 19   Global Step: 34020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:13:06,066-Speed 23782.72 samples/sec   Loss 2.1586   LearningRate 0.0003   Epoch: 19   Global Step: 34030   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:13:16,491-Speed 23577.62 samples/sec   Loss 2.1583   LearningRate 0.0003   Epoch: 19   Global Step: 34040   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:13:26,747-Speed 23967.47 samples/sec   Loss 2.1666   LearningRate 0.0003   Epoch: 19   Global Step: 34050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:13:37,120-Speed 23698.16 samples/sec   Loss 2.1269   LearningRate 0.0003   Epoch: 19   Global Step: 34060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:13:47,429-Speed 23843.76 samples/sec   Loss 2.1506   LearningRate 0.0003   Epoch: 19   Global Step: 34070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:13:57,612-Speed 24140.79 samples/sec   Loss 2.1325   LearningRate 0.0003   Epoch: 19   Global Step: 34080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:14:07,715-Speed 24328.09 samples/sec   Loss 2.1531   LearningRate 0.0003   Epoch: 19   Global Step: 34090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:14:17,931-Speed 24060.26 samples/sec   Loss 2.1706   LearningRate 0.0003   Epoch: 19   Global Step: 34100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:14:28,292-Speed 23722.94 samples/sec   Loss 2.1522   LearningRate 0.0003   Epoch: 19   Global Step: 34110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:14:38,700-Speed 23614.36 samples/sec   Loss 2.1345   LearningRate 0.0003   Epoch: 19   Global Step: 34120   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:14:48,999-Speed 23868.10 samples/sec   Loss 2.1348   LearningRate 0.0003   Epoch: 19   Global Step: 34130   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:14:59,423-Speed 23579.09 samples/sec   Loss 2.1451   LearningRate 0.0003   Epoch: 19   Global Step: 34140   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:15:09,721-Speed 23868.23 samples/sec   Loss 2.1500   LearningRate 0.0003   Epoch: 19   Global Step: 34150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:15:20,067-Speed 23756.61 samples/sec   Loss 2.1564   LearningRate 0.0003   Epoch: 19   Global Step: 34160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:15:30,477-Speed 23613.05 samples/sec   Loss 2.1409   LearningRate 0.0003   Epoch: 19   Global Step: 34170   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:15:40,793-Speed 23826.02 samples/sec   Loss 2.1728   LearningRate 0.0003   Epoch: 19   Global Step: 34180   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:15:51,157-Speed 23714.85 samples/sec   Loss 2.1595   LearningRate 0.0003   Epoch: 19   Global Step: 34190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:16:01,604-Speed 23528.44 samples/sec   Loss 2.1319   LearningRate 0.0003   Epoch: 19   Global Step: 34200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:16:12,049-Speed 23532.15 samples/sec   Loss 2.1189   LearningRate 0.0003   Epoch: 19   Global Step: 34210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:16:22,413-Speed 23718.65 samples/sec   Loss 2.1216   LearningRate 0.0003   Epoch: 19   Global Step: 34220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:16:32,854-Speed 23541.29 samples/sec   Loss 2.1476   LearningRate 0.0003   Epoch: 19   Global Step: 34230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:16:43,203-Speed 23750.43 samples/sec   Loss 2.1378   LearningRate 0.0003   Epoch: 19   Global Step: 34240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:16:53,597-Speed 23646.72 samples/sec   Loss 2.1515   LearningRate 0.0003   Epoch: 19   Global Step: 34250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:17:03,935-Speed 23774.64 samples/sec   Loss 2.1474   LearningRate 0.0003   Epoch: 19   Global Step: 34260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:17:14,414-Speed 23457.56 samples/sec   Loss 2.1128   LearningRate 0.0003   Epoch: 19   Global Step: 34270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:17:24,757-Speed 23763.11 samples/sec   Loss 2.1198   LearningRate 0.0003   Epoch: 19   Global Step: 34280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:17:34,945-Speed 24123.93 samples/sec   Loss 2.1340   LearningRate 0.0003   Epoch: 19   Global Step: 34290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:17:45,167-Speed 24047.26 samples/sec   Loss 2.1291   LearningRate 0.0003   Epoch: 19   Global Step: 34300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:17:55,558-Speed 23653.73 samples/sec   Loss 2.1341   LearningRate 0.0003   Epoch: 19   Global Step: 34310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:18:05,886-Speed 23807.19 samples/sec   Loss 2.1368   LearningRate 0.0003   Epoch: 19   Global Step: 34320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:18:16,217-Speed 23790.60 samples/sec   Loss 2.1404   LearningRate 0.0003   Epoch: 19   Global Step: 34330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:18:26,514-Speed 23872.23 samples/sec   Loss 2.1511   LearningRate 0.0003   Epoch: 19   Global Step: 34340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:18:36,808-Speed 23876.64 samples/sec   Loss 2.1447   LearningRate 0.0003   Epoch: 19   Global Step: 34350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:18:47,192-Speed 23675.41 samples/sec   Loss 2.1474   LearningRate 0.0003   Epoch: 19   Global Step: 34360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:18:57,502-Speed 23840.79 samples/sec   Loss 2.1071   LearningRate 0.0003   Epoch: 19   Global Step: 34370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:19:07,791-Speed 23888.15 samples/sec   Loss 2.1611   LearningRate 0.0003   Epoch: 19   Global Step: 34380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:19:18,072-Speed 23908.65 samples/sec   Loss 2.1820   LearningRate 0.0003   Epoch: 19   Global Step: 34390   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:19:28,377-Speed 23852.52 samples/sec   Loss 2.1686   LearningRate 0.0003   Epoch: 19   Global Step: 34400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:19:38,717-Speed 23769.99 samples/sec   Loss 2.2115   LearningRate 0.0003   Epoch: 19   Global Step: 34410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:19:48,985-Speed 23939.08 samples/sec   Loss 2.1346   LearningRate 0.0003   Epoch: 19   Global Step: 34420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:19:59,270-Speed 23896.72 samples/sec   Loss 2.1229   LearningRate 0.0003   Epoch: 19   Global Step: 34430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:20:09,532-Speed 23952.76 samples/sec   Loss 2.1267   LearningRate 0.0003   Epoch: 19   Global Step: 34440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:20:19,811-Speed 23911.81 samples/sec   Loss 2.1339   LearningRate 0.0003   Epoch: 19   Global Step: 34450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:20:30,134-Speed 23810.32 samples/sec   Loss 2.1166   LearningRate 0.0003   Epoch: 19   Global Step: 34460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:20:40,491-Speed 23737.28 samples/sec   Loss 2.1234   LearningRate 0.0003   Epoch: 19   Global Step: 34470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:20:50,827-Speed 23781.06 samples/sec   Loss 2.1218   LearningRate 0.0003   Epoch: 19   Global Step: 34480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:21:01,159-Speed 23790.79 samples/sec   Loss 2.1144   LearningRate 0.0003   Epoch: 19   Global Step: 34490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:21:11,514-Speed 23736.46 samples/sec   Loss 2.1203   LearningRate 0.0003   Epoch: 19   Global Step: 34500   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:21:21,812-Speed 23870.00 samples/sec   Loss 2.1447   LearningRate 0.0003   Epoch: 19   Global Step: 34510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:21:32,202-Speed 23656.20 samples/sec   Loss 2.1355   LearningRate 0.0003   Epoch: 19   Global Step: 34520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:21:42,515-Speed 23833.59 samples/sec   Loss 2.1384   LearningRate 0.0003   Epoch: 19   Global Step: 34530   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:21:52,822-Speed 23846.60 samples/sec   Loss 2.1108   LearningRate 0.0003   Epoch: 19   Global Step: 34540   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:22:03,202-Speed 23679.06 samples/sec   Loss 2.1298   LearningRate 0.0003   Epoch: 19   Global Step: 34550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:22:13,589-Speed 23662.39 samples/sec   Loss 2.1388   LearningRate 0.0003   Epoch: 19   Global Step: 34560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:23:13,869-Speed 4077.11 samples/sec   Loss 2.1554   LearningRate 0.0003   Epoch: 20   Global Step: 34570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:23:23,990-Speed 24283.98 samples/sec   Loss 2.1092   LearningRate 0.0003   Epoch: 20   Global Step: 34580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:23:34,124-Speed 24256.55 samples/sec   Loss 2.0875   LearningRate 0.0003   Epoch: 20   Global Step: 34590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:23:44,252-Speed 24267.66 samples/sec   Loss 2.0993   LearningRate 0.0003   Epoch: 20   Global Step: 34600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:23:54,490-Speed 24007.94 samples/sec   Loss 2.1105   LearningRate 0.0003   Epoch: 20   Global Step: 34610   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:24:04,791-Speed 23860.94 samples/sec   Loss 2.0890   LearningRate 0.0003   Epoch: 20   Global Step: 34620   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:24:14,987-Speed 24111.66 samples/sec   Loss 2.0888   LearningRate 0.0003   Epoch: 20   Global Step: 34630   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:24:25,185-Speed 24101.98 samples/sec   Loss 2.0839   LearningRate 0.0003   Epoch: 20   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:24:35,212-Speed 24514.59 samples/sec   Loss 2.1078   LearningRate 0.0003   Epoch: 20   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:24:45,301-Speed 24362.50 samples/sec   Loss 2.1324   LearningRate 0.0003   Epoch: 20   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:24:55,562-Speed 23954.13 samples/sec   Loss 2.1259   LearningRate 0.0003   Epoch: 20   Global Step: 34670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:25:05,867-Speed 23852.79 samples/sec   Loss 2.0840   LearningRate 0.0003   Epoch: 20   Global Step: 34680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:25:16,042-Speed 24156.90 samples/sec   Loss 2.1094   LearningRate 0.0003   Epoch: 20   Global Step: 34690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:25:26,214-Speed 24163.40 samples/sec   Loss 2.1047   LearningRate 0.0003   Epoch: 20   Global Step: 34700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:25:36,493-Speed 23913.49 samples/sec   Loss 2.1330   LearningRate 0.0003   Epoch: 20   Global Step: 34710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:25:46,690-Speed 24103.46 samples/sec   Loss 2.0937   LearningRate 0.0003   Epoch: 20   Global Step: 34720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:25:57,039-Speed 23757.41 samples/sec   Loss 2.1037   LearningRate 0.0003   Epoch: 20   Global Step: 34730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:26:07,224-Speed 24132.59 samples/sec   Loss 2.1321   LearningRate 0.0003   Epoch: 20   Global Step: 34740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:26:17,227-Speed 24572.05 samples/sec   Loss 2.1320   LearningRate 0.0003   Epoch: 20   Global Step: 34750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:26:27,291-Speed 24422.08 samples/sec   Loss 2.1223   LearningRate 0.0003   Epoch: 20   Global Step: 34760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:26:36,990-Speed 25343.29 samples/sec   Loss 2.1033   LearningRate 0.0003   Epoch: 20   Global Step: 34770   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:26:46,712-Speed 25282.54 samples/sec   Loss 2.0886   LearningRate 0.0003   Epoch: 20   Global Step: 34780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:26:56,445-Speed 25259.27 samples/sec   Loss 2.1236   LearningRate 0.0003   Epoch: 20   Global Step: 34790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:27:06,186-Speed 25235.27 samples/sec   Loss 2.1543   LearningRate 0.0003   Epoch: 20   Global Step: 34800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:27:16,090-Speed 24816.86 samples/sec   Loss 2.1217   LearningRate 0.0003   Epoch: 20   Global Step: 34810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:27:25,931-Speed 24976.55 samples/sec   Loss 2.1179   LearningRate 0.0003   Epoch: 20   Global Step: 34820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:27:35,791-Speed 24930.72 samples/sec   Loss 2.1146   LearningRate 0.0003   Epoch: 20   Global Step: 34830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:27:45,712-Speed 24775.40 samples/sec   Loss 2.1041   LearningRate 0.0003   Epoch: 20   Global Step: 34840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:27:55,404-Speed 25360.06 samples/sec   Loss 2.0841   LearningRate 0.0003   Epoch: 20   Global Step: 34850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:28:05,250-Speed 24963.17 samples/sec   Loss 2.0795   LearningRate 0.0003   Epoch: 20   Global Step: 34860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:28:14,999-Speed 25213.20 samples/sec   Loss 2.0841   LearningRate 0.0003   Epoch: 20   Global Step: 34870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:28:24,642-Speed 25488.20 samples/sec   Loss 2.1096   LearningRate 0.0003   Epoch: 20   Global Step: 34880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:28:34,334-Speed 25359.91 samples/sec   Loss 2.1166   LearningRate 0.0003   Epoch: 20   Global Step: 34890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:28:44,122-Speed 25112.45 samples/sec   Loss 2.1297   LearningRate 0.0003   Epoch: 20   Global Step: 34900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:28:53,943-Speed 25027.36 samples/sec   Loss 2.0974   LearningRate 0.0003   Epoch: 20   Global Step: 34910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:29:03,697-Speed 25199.74 samples/sec   Loss 2.1170   LearningRate 0.0003   Epoch: 20   Global Step: 34920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:29:13,483-Speed 25123.15 samples/sec   Loss 2.1097   LearningRate 0.0003   Epoch: 20   Global Step: 34930   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:29:23,298-Speed 25041.15 samples/sec   Loss 2.1575   LearningRate 0.0003   Epoch: 20   Global Step: 34940   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:29:33,025-Speed 25270.08 samples/sec   Loss 2.0977   LearningRate 0.0003   Epoch: 20   Global Step: 34950   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:29:42,705-Speed 25392.45 samples/sec   Loss 2.1042   LearningRate 0.0003   Epoch: 20   Global Step: 34960   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:29:52,386-Speed 25389.42 samples/sec   Loss 2.0955   LearningRate 0.0003   Epoch: 20   Global Step: 34970   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:30:02,114-Speed 25265.89 samples/sec   Loss 2.1014   LearningRate 0.0003   Epoch: 20   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:30:11,827-Speed 25305.11 samples/sec   Loss 2.1108   LearningRate 0.0003   Epoch: 20   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:30:21,613-Speed 25117.10 samples/sec   Loss 2.1071   LearningRate 0.0003   Epoch: 20   Global Step: 35000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:30:31,332-Speed 25293.23 samples/sec   Loss 2.1075   LearningRate 0.0003   Epoch: 20   Global Step: 35010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:30:41,010-Speed 25396.66 samples/sec   Loss 2.0848   LearningRate 0.0003   Epoch: 20   Global Step: 35020   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:30:50,719-Speed 25316.22 samples/sec   Loss 2.1011   LearningRate 0.0003   Epoch: 20   Global Step: 35030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:00,679-Speed 24676.86 samples/sec   Loss 2.1075   LearningRate 0.0003   Epoch: 20   Global Step: 35040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:10,473-Speed 25096.73 samples/sec   Loss 2.1625   LearningRate 0.0003   Epoch: 20   Global Step: 35050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:20,202-Speed 25261.83 samples/sec   Loss 2.1603   LearningRate 0.0003   Epoch: 20   Global Step: 35060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:29,901-Speed 25342.57 samples/sec   Loss 2.1140   LearningRate 0.0003   Epoch: 20   Global Step: 35070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:39,692-Speed 25104.22 samples/sec   Loss 2.1023   LearningRate 0.0003   Epoch: 20   Global Step: 35080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:49,462-Speed 25159.54 samples/sec   Loss 2.0854   LearningRate 0.0003   Epoch: 20   Global Step: 35090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:31:59,229-Speed 25166.94 samples/sec   Loss 2.1018   LearningRate 0.0003   Epoch: 20   Global Step: 35100   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:32:08,946-Speed 25294.79 samples/sec   Loss 2.1066   LearningRate 0.0003   Epoch: 20   Global Step: 35110   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:32:18,654-Speed 25317.52 samples/sec   Loss 2.1188   LearningRate 0.0003   Epoch: 20   Global Step: 35120   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:32:28,358-Speed 25331.13 samples/sec   Loss 2.1257   LearningRate 0.0003   Epoch: 20   Global Step: 35130   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:32:38,071-Speed 25307.70 samples/sec   Loss 2.0764   LearningRate 0.0003   Epoch: 20   Global Step: 35140   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:32:47,816-Speed 25225.26 samples/sec   Loss 2.0710   LearningRate 0.0003   Epoch: 20   Global Step: 35150   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:32:57,530-Speed 25301.13 samples/sec   Loss 2.0838   LearningRate 0.0003   Epoch: 20   Global Step: 35160   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:33:07,313-Speed 25126.64 samples/sec   Loss 2.0852   LearningRate 0.0003   Epoch: 20   Global Step: 35170   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:33:17,052-Speed 25240.74 samples/sec   Loss 2.0976   LearningRate 0.0003   Epoch: 20   Global Step: 35180   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:33:26,808-Speed 25194.42 samples/sec   Loss 2.0972   LearningRate 0.0003   Epoch: 20   Global Step: 35190   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:33:36,569-Speed 25180.46 samples/sec   Loss 2.0906   LearningRate 0.0003   Epoch: 20   Global Step: 35200   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:33:46,355-Speed 25116.20 samples/sec   Loss 2.1047   LearningRate 0.0003   Epoch: 20   Global Step: 35210   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:33:56,053-Speed 25343.65 samples/sec   Loss 2.0916   LearningRate 0.0003   Epoch: 20   Global Step: 35220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:34:05,758-Speed 25326.99 samples/sec   Loss 2.1071   LearningRate 0.0003   Epoch: 20   Global Step: 35230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:34:15,516-Speed 25189.55 samples/sec   Loss 2.0894   LearningRate 0.0003   Epoch: 20   Global Step: 35240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:34:25,222-Speed 25325.53 samples/sec   Loss 2.0941   LearningRate 0.0003   Epoch: 20   Global Step: 35250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:34:35,106-Speed 24866.61 samples/sec   Loss 2.0808   LearningRate 0.0003   Epoch: 20   Global Step: 35260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:34:44,876-Speed 25157.64 samples/sec   Loss 2.1146   LearningRate 0.0003   Epoch: 20   Global Step: 35270   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:34:54,657-Speed 25130.86 samples/sec   Loss 2.0928   LearningRate 0.0003   Epoch: 20   Global Step: 35280   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:35:04,598-Speed 24727.41 samples/sec   Loss 2.0840   LearningRate 0.0003   Epoch: 20   Global Step: 35290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:35:14,351-Speed 25201.23 samples/sec   Loss 2.0661   LearningRate 0.0003   Epoch: 20   Global Step: 35300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:35:24,064-Speed 25303.53 samples/sec   Loss 2.0603   LearningRate 0.0003   Epoch: 20   Global Step: 35310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:35:33,816-Speed 25206.45 samples/sec   Loss 2.0995   LearningRate 0.0003   Epoch: 20   Global Step: 35320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:35:43,569-Speed 25203.86 samples/sec   Loss 2.1039   LearningRate 0.0003   Epoch: 20   Global Step: 35330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:35:53,287-Speed 25290.30 samples/sec   Loss 2.1026   LearningRate 0.0003   Epoch: 20   Global Step: 35340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:36:03,084-Speed 25090.16 samples/sec   Loss 2.0791   LearningRate 0.0003   Epoch: 20   Global Step: 35350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:36:12,788-Speed 25327.83 samples/sec   Loss 2.0610   LearningRate 0.0003   Epoch: 20   Global Step: 35360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:36:22,491-Speed 25330.73 samples/sec   Loss 2.0707   LearningRate 0.0003   Epoch: 20   Global Step: 35370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:36:32,230-Speed 25237.48 samples/sec   Loss 2.0788   LearningRate 0.0003   Epoch: 20   Global Step: 35380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:36:41,935-Speed 25327.83 samples/sec   Loss 2.0896   LearningRate 0.0003   Epoch: 20   Global Step: 35390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:36:51,738-Speed 25072.51 samples/sec   Loss 2.0718   LearningRate 0.0003   Epoch: 20   Global Step: 35400   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:37:01,489-Speed 25205.31 samples/sec   Loss 2.0707   LearningRate 0.0003   Epoch: 20   Global Step: 35410   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-26 08:37:11,221-Speed 25255.11 samples/sec   Loss 2.0883   LearningRate 0.0003   Epoch: 20   Global Step: 35420   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:37:20,995-Speed 25148.15 samples/sec   Loss 2.0946   LearningRate 0.0003   Epoch: 20   Global Step: 35430   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:37:30,762-Speed 25164.93 samples/sec   Loss 2.0641   LearningRate 0.0003   Epoch: 20   Global Step: 35440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:37:40,515-Speed 25201.78 samples/sec   Loss 2.0806   LearningRate 0.0003   Epoch: 20   Global Step: 35450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:37:50,218-Speed 25331.35 samples/sec   Loss 2.0787   LearningRate 0.0003   Epoch: 20   Global Step: 35460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:37:59,940-Speed 25282.28 samples/sec   Loss 2.0657   LearningRate 0.0003   Epoch: 20   Global Step: 35470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:38:09,743-Speed 25076.49 samples/sec   Loss 2.0626   LearningRate 0.0003   Epoch: 20   Global Step: 35480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:38:19,576-Speed 24995.73 samples/sec   Loss 2.0819   LearningRate 0.0003   Epoch: 20   Global Step: 35490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:38:29,382-Speed 25064.51 samples/sec   Loss 2.0669   LearningRate 0.0003   Epoch: 20   Global Step: 35500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-26 08:38:39,082-Speed 25340.42 samples/sec   Loss 2.0714   LearningRate 0.0003   Epoch: 20   Global Step: 35510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:38:49,031-Speed 24704.66 samples/sec   Loss 2.0728   LearningRate 0.0003   Epoch: 20   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:38:58,851-Speed 25027.76 samples/sec   Loss 2.0561   LearningRate 0.0003   Epoch: 20   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:39:08,685-Speed 24992.89 samples/sec   Loss 2.0633   LearningRate 0.0003   Epoch: 20   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:39:18,379-Speed 25356.68 samples/sec   Loss 2.0709   LearningRate 0.0003   Epoch: 20   Global Step: 35550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:39:28,146-Speed 25163.89 samples/sec   Loss 2.0843   LearningRate 0.0003   Epoch: 20   Global Step: 35560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:39:37,875-Speed 25265.45 samples/sec   Loss 2.0771   LearningRate 0.0003   Epoch: 20   Global Step: 35570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:39:47,601-Speed 25271.43 samples/sec   Loss 2.0813   LearningRate 0.0003   Epoch: 20   Global Step: 35580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:39:57,406-Speed 25067.16 samples/sec   Loss 2.0808   LearningRate 0.0003   Epoch: 20   Global Step: 35590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:40:07,184-Speed 25138.24 samples/sec   Loss 2.0750   LearningRate 0.0003   Epoch: 20   Global Step: 35600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:40:16,966-Speed 25127.89 samples/sec   Loss 2.0861   LearningRate 0.0003   Epoch: 20   Global Step: 35610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:40:26,708-Speed 25230.03 samples/sec   Loss 2.0446   LearningRate 0.0003   Epoch: 20   Global Step: 35620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:40:36,445-Speed 25242.03 samples/sec   Loss 2.0574   LearningRate 0.0003   Epoch: 20   Global Step: 35630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:40:46,198-Speed 25201.11 samples/sec   Loss 2.0866   LearningRate 0.0003   Epoch: 20   Global Step: 35640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:40:56,036-Speed 24984.59 samples/sec   Loss 2.0684   LearningRate 0.0003   Epoch: 20   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:41:05,739-Speed 25332.97 samples/sec   Loss 2.0685   LearningRate 0.0003   Epoch: 20   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:41:15,572-Speed 24996.60 samples/sec   Loss 2.0708   LearningRate 0.0003   Epoch: 20   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:41:25,277-Speed 25326.73 samples/sec   Loss 2.0767   LearningRate 0.0003   Epoch: 20   Global Step: 35680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:41:35,034-Speed 25193.58 samples/sec   Loss 2.1068   LearningRate 0.0003   Epoch: 20   Global Step: 35690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:41:44,818-Speed 25123.81 samples/sec   Loss 2.0662   LearningRate 0.0003   Epoch: 20   Global Step: 35700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:41:54,513-Speed 25352.01 samples/sec   Loss 2.0510   LearningRate 0.0003   Epoch: 20   Global Step: 35710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:42:04,292-Speed 25136.86 samples/sec   Loss 2.0493   LearningRate 0.0003   Epoch: 20   Global Step: 35720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:42:14,006-Speed 25303.75 samples/sec   Loss 2.0561   LearningRate 0.0003   Epoch: 20   Global Step: 35730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:42:23,754-Speed 25214.18 samples/sec   Loss 2.0491   LearningRate 0.0003   Epoch: 20   Global Step: 35740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:42:33,567-Speed 25046.02 samples/sec   Loss 2.0405   LearningRate 0.0003   Epoch: 20   Global Step: 35750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:42:43,355-Speed 25114.25 samples/sec   Loss 2.0643   LearningRate 0.0003   Epoch: 20   Global Step: 35760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:42:53,046-Speed 25363.57 samples/sec   Loss 2.0654   LearningRate 0.0003   Epoch: 20   Global Step: 35770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:43:02,812-Speed 25169.18 samples/sec   Loss 2.0594   LearningRate 0.0003   Epoch: 20   Global Step: 35780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:43:12,492-Speed 25395.60 samples/sec   Loss 2.0571   LearningRate 0.0003   Epoch: 20   Global Step: 35790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:43:22,260-Speed 25163.76 samples/sec   Loss 2.0458   LearningRate 0.0003   Epoch: 20   Global Step: 35800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:43:32,012-Speed 25207.91 samples/sec   Loss 2.0829   LearningRate 0.0003   Epoch: 20   Global Step: 35810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:43:41,742-Speed 25261.06 samples/sec   Loss 2.0388   LearningRate 0.0003   Epoch: 20   Global Step: 35820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:43:51,570-Speed 25009.13 samples/sec   Loss 2.0420   LearningRate 0.0003   Epoch: 20   Global Step: 35830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:44:01,382-Speed 25051.96 samples/sec   Loss 2.0509   LearningRate 0.0003   Epoch: 20   Global Step: 35840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:44:11,191-Speed 25057.86 samples/sec   Loss 2.0454   LearningRate 0.0003   Epoch: 20   Global Step: 35850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:44:20,962-Speed 25154.09 samples/sec   Loss 2.0605   LearningRate 0.0003   Epoch: 20   Global Step: 35860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:44:30,683-Speed 25283.70 samples/sec   Loss 2.0823   LearningRate 0.0003   Epoch: 20   Global Step: 35870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:44:40,438-Speed 25199.40 samples/sec   Loss 2.0795   LearningRate 0.0003   Epoch: 20   Global Step: 35880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:44:50,160-Speed 25280.26 samples/sec   Loss 2.0650   LearningRate 0.0003   Epoch: 20   Global Step: 35890   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:44:59,880-Speed 25288.03 samples/sec   Loss 2.0554   LearningRate 0.0003   Epoch: 20   Global Step: 35900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:45:09,636-Speed 25192.64 samples/sec   Loss 2.0529   LearningRate 0.0003   Epoch: 20   Global Step: 35910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:45:19,429-Speed 25099.17 samples/sec   Loss 2.0538   LearningRate 0.0003   Epoch: 20   Global Step: 35920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:45:29,211-Speed 25127.71 samples/sec   Loss 2.0540   LearningRate 0.0003   Epoch: 20   Global Step: 35930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:45:39,066-Speed 24943.66 samples/sec   Loss 2.0815   LearningRate 0.0003   Epoch: 20   Global Step: 35940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:45:48,795-Speed 25265.31 samples/sec   Loss 2.0513   LearningRate 0.0003   Epoch: 20   Global Step: 35950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:45:58,673-Speed 24882.30 samples/sec   Loss 2.0335   LearningRate 0.0003   Epoch: 20   Global Step: 35960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:46:08,563-Speed 24853.00 samples/sec   Loss 2.0362   LearningRate 0.0003   Epoch: 20   Global Step: 35970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:46:18,491-Speed 24757.97 samples/sec   Loss 2.0545   LearningRate 0.0003   Epoch: 20   Global Step: 35980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:46:28,211-Speed 25285.92 samples/sec   Loss 2.0465   LearningRate 0.0003   Epoch: 20   Global Step: 35990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:46:37,969-Speed 25189.74 samples/sec   Loss 2.0496   LearningRate 0.0003   Epoch: 20   Global Step: 36000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:46:47,757-Speed 25112.11 samples/sec   Loss 2.0545   LearningRate 0.0003   Epoch: 20   Global Step: 36010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:46:57,515-Speed 25190.77 samples/sec   Loss 2.0658   LearningRate 0.0003   Epoch: 20   Global Step: 36020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:47:07,279-Speed 25174.95 samples/sec   Loss 2.0595   LearningRate 0.0003   Epoch: 20   Global Step: 36030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:47:16,988-Speed 25315.75 samples/sec   Loss 2.0415   LearningRate 0.0003   Epoch: 20   Global Step: 36040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:47:26,686-Speed 25345.34 samples/sec   Loss 2.0471   LearningRate 0.0003   Epoch: 20   Global Step: 36050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:47:36,462-Speed 25140.39 samples/sec   Loss 2.0415   LearningRate 0.0003   Epoch: 20   Global Step: 36060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:47:46,162-Speed 25341.06 samples/sec   Loss 2.0658   LearningRate 0.0003   Epoch: 20   Global Step: 36070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:47:55,892-Speed 25259.81 samples/sec   Loss 2.0457   LearningRate 0.0003   Epoch: 20   Global Step: 36080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:48:05,620-Speed 25267.90 samples/sec   Loss 2.0694   LearningRate 0.0003   Epoch: 20   Global Step: 36090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:48:15,394-Speed 25147.44 samples/sec   Loss 2.0816   LearningRate 0.0003   Epoch: 20   Global Step: 36100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:48:25,284-Speed 24853.16 samples/sec   Loss 2.0746   LearningRate 0.0003   Epoch: 20   Global Step: 36110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:48:35,122-Speed 24983.46 samples/sec   Loss 2.0610   LearningRate 0.0003   Epoch: 20   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:48:44,929-Speed 25065.95 samples/sec   Loss 2.0201   LearningRate 0.0003   Epoch: 20   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:48:54,773-Speed 24967.31 samples/sec   Loss 2.0404   LearningRate 0.0003   Epoch: 20   Global Step: 36140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:49:04,503-Speed 25261.38 samples/sec   Loss 2.0461   LearningRate 0.0003   Epoch: 20   Global Step: 36150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:49:14,257-Speed 25200.80 samples/sec   Loss 2.0490   LearningRate 0.0003   Epoch: 20   Global Step: 36160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:49:28,933-Speed 16746.54 samples/sec   Loss 2.0672   LearningRate 0.0003   Epoch: 20   Global Step: 36170   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:49:38,701-Speed 25162.29 samples/sec   Loss 2.0603   LearningRate 0.0003   Epoch: 20   Global Step: 36180   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:49:48,480-Speed 25134.59 samples/sec   Loss 2.0658   LearningRate 0.0003   Epoch: 20   Global Step: 36190   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:49:58,316-Speed 24990.57 samples/sec   Loss 2.0686   LearningRate 0.0003   Epoch: 20   Global Step: 36200   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:50:08,148-Speed 24998.08 samples/sec   Loss 2.0366   LearningRate 0.0003   Epoch: 20   Global Step: 36210   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:50:17,957-Speed 25059.87 samples/sec   Loss 2.0484   LearningRate 0.0003   Epoch: 20   Global Step: 36220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:50:27,707-Speed 25208.35 samples/sec   Loss 2.0412   LearningRate 0.0003   Epoch: 20   Global Step: 36230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:50:37,391-Speed 25383.28 samples/sec   Loss 2.0499   LearningRate 0.0003   Epoch: 20   Global Step: 36240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:50:47,215-Speed 25019.47 samples/sec   Loss 2.0522   LearningRate 0.0003   Epoch: 20   Global Step: 36250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:50:56,945-Speed 25262.71 samples/sec   Loss 2.0409   LearningRate 0.0003   Epoch: 20   Global Step: 36260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:51:06,692-Speed 25216.17 samples/sec   Loss 2.0684   LearningRate 0.0003   Epoch: 20   Global Step: 36270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:51:16,503-Speed 25052.39 samples/sec   Loss 2.0721   LearningRate 0.0003   Epoch: 20   Global Step: 36280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:51:26,322-Speed 25032.76 samples/sec   Loss 2.0744   LearningRate 0.0003   Epoch: 20   Global Step: 36290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:52:26,282-Speed 4098.86 samples/sec   Loss 2.0421   LearningRate 0.0003   Epoch: 21   Global Step: 36300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:52:35,970-Speed 25370.06 samples/sec   Loss 2.0375   LearningRate 0.0003   Epoch: 21   Global Step: 36310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:52:45,720-Speed 25210.71 samples/sec   Loss 2.0483   LearningRate 0.0003   Epoch: 21   Global Step: 36320   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:52:55,448-Speed 25265.19 samples/sec   Loss 2.0159   LearningRate 0.0003   Epoch: 21   Global Step: 36330   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:53:05,171-Speed 25282.79 samples/sec   Loss 2.0003   LearningRate 0.0003   Epoch: 21   Global Step: 36340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:53:14,904-Speed 25252.93 samples/sec   Loss 2.0138   LearningRate 0.0003   Epoch: 21   Global Step: 36350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:53:24,673-Speed 25161.31 samples/sec   Loss 2.0388   LearningRate 0.0003   Epoch: 21   Global Step: 36360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:53:34,487-Speed 25048.03 samples/sec   Loss 2.0245   LearningRate 0.0003   Epoch: 21   Global Step: 36370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:53:44,281-Speed 25095.20 samples/sec   Loss 2.0384   LearningRate 0.0003   Epoch: 21   Global Step: 36380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:53:54,043-Speed 25178.92 samples/sec   Loss 2.0288   LearningRate 0.0003   Epoch: 21   Global Step: 36390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:54:03,775-Speed 25255.05 samples/sec   Loss 2.0245   LearningRate 0.0003   Epoch: 21   Global Step: 36400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:54:13,505-Speed 25262.31 samples/sec   Loss 2.0130   LearningRate 0.0003   Epoch: 21   Global Step: 36410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:54:23,253-Speed 25213.76 samples/sec   Loss 2.0141   LearningRate 0.0003   Epoch: 21   Global Step: 36420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:54:33,002-Speed 25214.53 samples/sec   Loss 2.0190   LearningRate 0.0003   Epoch: 21   Global Step: 36430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:54:42,992-Speed 24603.65 samples/sec   Loss 2.0403   LearningRate 0.0003   Epoch: 21   Global Step: 36440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:54:52,865-Speed 24897.76 samples/sec   Loss 2.0273   LearningRate 0.0003   Epoch: 21   Global Step: 36450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:55:02,620-Speed 25204.72 samples/sec   Loss 2.0211   LearningRate 0.0003   Epoch: 21   Global Step: 36460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:55:12,350-Speed 25262.23 samples/sec   Loss 2.0313   LearningRate 0.0003   Epoch: 21   Global Step: 36470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:55:22,212-Speed 24924.20 samples/sec   Loss 2.0652   LearningRate 0.0003   Epoch: 21   Global Step: 36480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:55:32,220-Speed 24559.36 samples/sec   Loss 2.0237   LearningRate 0.0003   Epoch: 21   Global Step: 36490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:55:42,336-Speed 24297.51 samples/sec   Loss 2.0112   LearningRate 0.0003   Epoch: 21   Global Step: 36500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:55:52,346-Speed 24557.71 samples/sec   Loss 2.0408   LearningRate 0.0003   Epoch: 21   Global Step: 36510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:56:02,428-Speed 24380.10 samples/sec   Loss 2.0353   LearningRate 0.0003   Epoch: 21   Global Step: 36520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:56:12,412-Speed 24617.13 samples/sec   Loss 2.0189   LearningRate 0.0003   Epoch: 21   Global Step: 36530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:56:22,559-Speed 24224.85 samples/sec   Loss 2.0258   LearningRate 0.0003   Epoch: 21   Global Step: 36540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:56:32,597-Speed 24485.90 samples/sec   Loss 2.0419   LearningRate 0.0003   Epoch: 21   Global Step: 36550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:56:42,552-Speed 24692.28 samples/sec   Loss 2.0325   LearningRate 0.0003   Epoch: 21   Global Step: 36560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:56:52,590-Speed 24488.25 samples/sec   Loss 2.0309   LearningRate 0.0003   Epoch: 21   Global Step: 36570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:57:02,701-Speed 24308.78 samples/sec   Loss 2.0162   LearningRate 0.0003   Epoch: 21   Global Step: 36580   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:57:12,734-Speed 24502.24 samples/sec   Loss 2.0432   LearningRate 0.0003   Epoch: 21   Global Step: 36590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:57:22,871-Speed 24244.93 samples/sec   Loss 2.0407   LearningRate 0.0003   Epoch: 21   Global Step: 36600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:57:32,964-Speed 24355.72 samples/sec   Loss 2.0402   LearningRate 0.0003   Epoch: 21   Global Step: 36610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:57:43,012-Speed 24463.52 samples/sec   Loss 2.0361   LearningRate 0.0003   Epoch: 21   Global Step: 36620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:57:52,950-Speed 24731.03 samples/sec   Loss 2.0328   LearningRate 0.0003   Epoch: 21   Global Step: 36630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:58:02,971-Speed 24527.80 samples/sec   Loss 2.0157   LearningRate 0.0003   Epoch: 21   Global Step: 36640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:58:13,124-Speed 24209.47 samples/sec   Loss 2.0054   LearningRate 0.0003   Epoch: 21   Global Step: 36650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:58:23,223-Speed 24337.96 samples/sec   Loss 2.0362   LearningRate 0.0003   Epoch: 21   Global Step: 36660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:58:33,178-Speed 24688.78 samples/sec   Loss 2.0386   LearningRate 0.0003   Epoch: 21   Global Step: 36670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:58:43,346-Speed 24174.57 samples/sec   Loss 2.0218   LearningRate 0.0003   Epoch: 21   Global Step: 36680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:58:53,536-Speed 24121.58 samples/sec   Loss 2.0167   LearningRate 0.0003   Epoch: 21   Global Step: 36690   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:59:03,770-Speed 24014.79 samples/sec   Loss 2.0338   LearningRate 0.0003   Epoch: 21   Global Step: 36700   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 08:59:13,963-Speed 24114.97 samples/sec   Loss 2.0371   LearningRate 0.0003   Epoch: 21   Global Step: 36710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:59:23,998-Speed 24494.91 samples/sec   Loss 2.0196   LearningRate 0.0003   Epoch: 21   Global Step: 36720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:59:34,010-Speed 24549.98 samples/sec   Loss 2.0175   LearningRate 0.0003   Epoch: 21   Global Step: 36730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:59:44,006-Speed 24589.83 samples/sec   Loss 2.0222   LearningRate 0.0003   Epoch: 21   Global Step: 36740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 08:59:53,999-Speed 24595.40 samples/sec   Loss 2.0194   LearningRate 0.0003   Epoch: 21   Global Step: 36750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:00:04,082-Speed 24375.92 samples/sec   Loss 2.0338   LearningRate 0.0003   Epoch: 21   Global Step: 36760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:00:14,144-Speed 24429.03 samples/sec   Loss 2.0035   LearningRate 0.0003   Epoch: 21   Global Step: 36770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:00:24,124-Speed 24626.41 samples/sec   Loss 2.0011   LearningRate 0.0003   Epoch: 21   Global Step: 36780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:00:34,112-Speed 24612.63 samples/sec   Loss 2.0084   LearningRate 0.0003   Epoch: 21   Global Step: 36790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:00:44,168-Speed 24441.98 samples/sec   Loss 2.0169   LearningRate 0.0003   Epoch: 21   Global Step: 36800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:00:54,395-Speed 24033.52 samples/sec   Loss 2.0139   LearningRate 0.0003   Epoch: 21   Global Step: 36810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:01:04,214-Speed 25031.89 samples/sec   Loss 2.0290   LearningRate 0.0003   Epoch: 21   Global Step: 36820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:01:13,913-Speed 25343.11 samples/sec   Loss 2.0156   LearningRate 0.0003   Epoch: 21   Global Step: 36830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:01:23,687-Speed 25146.69 samples/sec   Loss 2.0222   LearningRate 0.0003   Epoch: 21   Global Step: 36840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:01:33,468-Speed 25131.10 samples/sec   Loss 2.0080   LearningRate 0.0003   Epoch: 21   Global Step: 36850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:01:43,200-Speed 25257.09 samples/sec   Loss 2.0141   LearningRate 0.0003   Epoch: 21   Global Step: 36860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:01:52,887-Speed 25373.50 samples/sec   Loss 2.0072   LearningRate 0.0003   Epoch: 21   Global Step: 36870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:02:02,603-Speed 25302.69 samples/sec   Loss 1.9984   LearningRate 0.0003   Epoch: 21   Global Step: 36880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:02:12,301-Speed 25351.57 samples/sec   Loss 2.0114   LearningRate 0.0003   Epoch: 21   Global Step: 36890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:02:22,062-Speed 25182.24 samples/sec   Loss 2.0384   LearningRate 0.0003   Epoch: 21   Global Step: 36900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:02:31,905-Speed 24970.71 samples/sec   Loss 2.0183   LearningRate 0.0003   Epoch: 21   Global Step: 36910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:02:41,790-Speed 24868.46 samples/sec   Loss 2.0223   LearningRate 0.0003   Epoch: 21   Global Step: 36920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:02:51,526-Speed 25246.05 samples/sec   Loss 2.0032   LearningRate 0.0003   Epoch: 21   Global Step: 36930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:01,217-Speed 25368.05 samples/sec   Loss 2.0065   LearningRate 0.0003   Epoch: 21   Global Step: 36940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:10,934-Speed 25296.64 samples/sec   Loss 2.0127   LearningRate 0.0003   Epoch: 21   Global Step: 36950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:20,641-Speed 25321.54 samples/sec   Loss 2.0202   LearningRate 0.0003   Epoch: 21   Global Step: 36960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:30,348-Speed 25322.05 samples/sec   Loss 2.0113   LearningRate 0.0003   Epoch: 21   Global Step: 36970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:40,154-Speed 25067.79 samples/sec   Loss 2.0091   LearningRate 0.0003   Epoch: 21   Global Step: 36980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:49,877-Speed 25278.49 samples/sec   Loss 2.0139   LearningRate 0.0003   Epoch: 21   Global Step: 36990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:03:59,563-Speed 25377.53 samples/sec   Loss 1.9990   LearningRate 0.0003   Epoch: 21   Global Step: 37000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:04:09,269-Speed 25323.53 samples/sec   Loss 2.0106   LearningRate 0.0003   Epoch: 21   Global Step: 37010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:04:18,988-Speed 25289.47 samples/sec   Loss 2.0321   LearningRate 0.0003   Epoch: 21   Global Step: 37020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:04:28,763-Speed 25144.15 samples/sec   Loss 1.9917   LearningRate 0.0003   Epoch: 21   Global Step: 37030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:04:38,534-Speed 25154.68 samples/sec   Loss 2.0029   LearningRate 0.0003   Epoch: 21   Global Step: 37040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:04:48,314-Speed 25133.06 samples/sec   Loss 1.9664   LearningRate 0.0003   Epoch: 21   Global Step: 37050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:04:58,057-Speed 25226.86 samples/sec   Loss 1.9774   LearningRate 0.0003   Epoch: 21   Global Step: 37060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:05:07,805-Speed 25215.03 samples/sec   Loss 1.9796   LearningRate 0.0003   Epoch: 21   Global Step: 37070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:05:17,591-Speed 25118.78 samples/sec   Loss 1.9974   LearningRate 0.0003   Epoch: 21   Global Step: 37080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:05:27,298-Speed 25323.13 samples/sec   Loss 1.9936   LearningRate 0.0003   Epoch: 21   Global Step: 37090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:05:37,051-Speed 25201.55 samples/sec   Loss 1.9958   LearningRate 0.0003   Epoch: 21   Global Step: 37100   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:05:46,789-Speed 25238.96 samples/sec   Loss 1.9765   LearningRate 0.0003   Epoch: 21   Global Step: 37110   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:05:56,517-Speed 25267.40 samples/sec   Loss 1.9888   LearningRate 0.0003   Epoch: 21   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:06:06,275-Speed 25190.34 samples/sec   Loss 1.9841   LearningRate 0.0003   Epoch: 21   Global Step: 37130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:06:16,015-Speed 25234.18 samples/sec   Loss 1.9982   LearningRate 0.0003   Epoch: 21   Global Step: 37140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:06:25,711-Speed 25351.83 samples/sec   Loss 1.9805   LearningRate 0.0003   Epoch: 21   Global Step: 37150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:06:35,504-Speed 25099.31 samples/sec   Loss 1.9774   LearningRate 0.0003   Epoch: 21   Global Step: 37160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:06:45,336-Speed 25000.00 samples/sec   Loss 1.9989   LearningRate 0.0003   Epoch: 21   Global Step: 37170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:06:55,062-Speed 25270.81 samples/sec   Loss 2.0107   LearningRate 0.0003   Epoch: 21   Global Step: 37180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:07:04,800-Speed 25239.43 samples/sec   Loss 1.9926   LearningRate 0.0003   Epoch: 21   Global Step: 37190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:07:14,515-Speed 25303.07 samples/sec   Loss 2.0089   LearningRate 0.0003   Epoch: 21   Global Step: 37200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:07:24,230-Speed 25301.57 samples/sec   Loss 1.9904   LearningRate 0.0003   Epoch: 21   Global Step: 37210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:07:33,958-Speed 25266.91 samples/sec   Loss 2.0088   LearningRate 0.0003   Epoch: 21   Global Step: 37220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:07:43,813-Speed 24940.18 samples/sec   Loss 1.9836   LearningRate 0.0003   Epoch: 21   Global Step: 37230   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:07:53,596-Speed 25125.70 samples/sec   Loss 1.9933   LearningRate 0.0003   Epoch: 21   Global Step: 37240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:08:03,405-Speed 25057.06 samples/sec   Loss 1.9889   LearningRate 0.0003   Epoch: 21   Global Step: 37250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:08:13,181-Speed 25141.86 samples/sec   Loss 2.0077   LearningRate 0.0003   Epoch: 21   Global Step: 37260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:08:22,943-Speed 25179.21 samples/sec   Loss 1.9921   LearningRate 0.0003   Epoch: 21   Global Step: 37270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:08:32,644-Speed 25334.96 samples/sec   Loss 1.9694   LearningRate 0.0003   Epoch: 21   Global Step: 37280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:08:42,398-Speed 25199.32 samples/sec   Loss 1.9914   LearningRate 0.0003   Epoch: 21   Global Step: 37290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:08:52,129-Speed 25262.93 samples/sec   Loss 1.9904   LearningRate 0.0003   Epoch: 21   Global Step: 37300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:09:01,877-Speed 25216.53 samples/sec   Loss 1.9856   LearningRate 0.0003   Epoch: 21   Global Step: 37310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:09:11,623-Speed 25220.50 samples/sec   Loss 1.9921   LearningRate 0.0003   Epoch: 21   Global Step: 37320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:09:21,396-Speed 25151.99 samples/sec   Loss 2.0018   LearningRate 0.0003   Epoch: 21   Global Step: 37330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:09:31,266-Speed 24904.02 samples/sec   Loss 2.0103   LearningRate 0.0003   Epoch: 21   Global Step: 37340   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:09:41,095-Speed 25004.83 samples/sec   Loss 2.0132   LearningRate 0.0003   Epoch: 21   Global Step: 37350   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:09:50,957-Speed 24925.18 samples/sec   Loss 1.9738   LearningRate 0.0003   Epoch: 21   Global Step: 37360   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:10:00,722-Speed 25172.42 samples/sec   Loss 1.9861   LearningRate 0.0003   Epoch: 21   Global Step: 37370   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:10:10,448-Speed 25273.33 samples/sec   Loss 1.9936   LearningRate 0.0003   Epoch: 21   Global Step: 37380   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:10:20,132-Speed 25381.43 samples/sec   Loss 1.9948   LearningRate 0.0003   Epoch: 21   Global Step: 37390   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:10:29,908-Speed 25141.74 samples/sec   Loss 1.9921   LearningRate 0.0003   Epoch: 21   Global Step: 37400   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:10:39,708-Speed 25081.08 samples/sec   Loss 1.9814   LearningRate 0.0003   Epoch: 21   Global Step: 37410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:10:49,403-Speed 25354.52 samples/sec   Loss 1.9775   LearningRate 0.0003   Epoch: 21   Global Step: 37420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:10:59,173-Speed 25158.14 samples/sec   Loss 1.9945   LearningRate 0.0003   Epoch: 21   Global Step: 37430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:11:08,874-Speed 25337.66 samples/sec   Loss 1.9652   LearningRate 0.0003   Epoch: 21   Global Step: 37440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:11:18,594-Speed 25285.82 samples/sec   Loss 1.9799   LearningRate 0.0003   Epoch: 21   Global Step: 37450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:11:28,467-Speed 24896.84 samples/sec   Loss 1.9862   LearningRate 0.0003   Epoch: 21   Global Step: 37460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:11:38,311-Speed 24969.18 samples/sec   Loss 1.9918   LearningRate 0.0003   Epoch: 21   Global Step: 37470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:11:48,075-Speed 25173.24 samples/sec   Loss 1.9860   LearningRate 0.0003   Epoch: 21   Global Step: 37480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:11:57,917-Speed 24972.49 samples/sec   Loss 1.9706   LearningRate 0.0003   Epoch: 21   Global Step: 37490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:12:07,663-Speed 25219.65 samples/sec   Loss 1.9575   LearningRate 0.0003   Epoch: 21   Global Step: 37500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:12:17,431-Speed 25164.47 samples/sec   Loss 1.9839   LearningRate 0.0003   Epoch: 21   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:12:27,187-Speed 25192.25 samples/sec   Loss 1.9677   LearningRate 0.0003   Epoch: 21   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:12:36,927-Speed 25237.22 samples/sec   Loss 1.9718   LearningRate 0.0003   Epoch: 21   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:12:46,595-Speed 25422.11 samples/sec   Loss 1.9848   LearningRate 0.0003   Epoch: 21   Global Step: 37540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:12:56,257-Speed 25440.59 samples/sec   Loss 1.9651   LearningRate 0.0003   Epoch: 21   Global Step: 37550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:13:06,026-Speed 25162.18 samples/sec   Loss 1.9670   LearningRate 0.0003   Epoch: 21   Global Step: 37560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:13:15,778-Speed 25203.95 samples/sec   Loss 1.9908   LearningRate 0.0003   Epoch: 21   Global Step: 37570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:13:25,496-Speed 25292.49 samples/sec   Loss 1.9886   LearningRate 0.0003   Epoch: 21   Global Step: 37580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:13:35,229-Speed 25252.21 samples/sec   Loss 1.9706   LearningRate 0.0003   Epoch: 21   Global Step: 37590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:13:45,003-Speed 25147.94 samples/sec   Loss 1.9953   LearningRate 0.0003   Epoch: 21   Global Step: 37600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:13:54,767-Speed 25175.15 samples/sec   Loss 1.9691   LearningRate 0.0003   Epoch: 21   Global Step: 37610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:14:04,528-Speed 25181.05 samples/sec   Loss 1.9790   LearningRate 0.0003   Epoch: 21   Global Step: 37620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:14:14,247-Speed 25290.09 samples/sec   Loss 1.9899   LearningRate 0.0003   Epoch: 21   Global Step: 37630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:14:24,015-Speed 25164.91 samples/sec   Loss 1.9946   LearningRate 0.0003   Epoch: 21   Global Step: 37640   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:14:33,707-Speed 25361.18 samples/sec   Loss 1.9820   LearningRate 0.0003   Epoch: 21   Global Step: 37650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:14:43,491-Speed 25124.94 samples/sec   Loss 1.9823   LearningRate 0.0003   Epoch: 21   Global Step: 37660   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:14:53,223-Speed 25256.23 samples/sec   Loss 1.9921   LearningRate 0.0003   Epoch: 21   Global Step: 37670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:15:03,029-Speed 25065.92 samples/sec   Loss 1.9721   LearningRate 0.0003   Epoch: 21   Global Step: 37680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:15:12,797-Speed 25160.31 samples/sec   Loss 1.9820   LearningRate 0.0003   Epoch: 21   Global Step: 37690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:15:22,569-Speed 25154.29 samples/sec   Loss 1.9648   LearningRate 0.0003   Epoch: 21   Global Step: 37700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:15:32,410-Speed 24977.12 samples/sec   Loss 1.9920   LearningRate 0.0003   Epoch: 21   Global Step: 37710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:15:42,113-Speed 25331.15 samples/sec   Loss 2.0188   LearningRate 0.0003   Epoch: 21   Global Step: 37720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:15:51,777-Speed 25436.01 samples/sec   Loss 2.0006   LearningRate 0.0003   Epoch: 21   Global Step: 37730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:16:01,510-Speed 25251.49 samples/sec   Loss 1.9626   LearningRate 0.0003   Epoch: 21   Global Step: 37740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:16:11,244-Speed 25251.02 samples/sec   Loss 1.9622   LearningRate 0.0003   Epoch: 21   Global Step: 37750   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:16:20,967-Speed 25279.80 samples/sec   Loss 1.9586   LearningRate 0.0003   Epoch: 21   Global Step: 37760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:16:30,778-Speed 25053.05 samples/sec   Loss 1.9644   LearningRate 0.0003   Epoch: 21   Global Step: 37770   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:16:40,580-Speed 25076.82 samples/sec   Loss 1.9666   LearningRate 0.0003   Epoch: 21   Global Step: 37780   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:16:50,437-Speed 24937.41 samples/sec   Loss 1.9743   LearningRate 0.0003   Epoch: 21   Global Step: 37790   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:00,224-Speed 25112.95 samples/sec   Loss 1.9576   LearningRate 0.0003   Epoch: 21   Global Step: 37800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:10,081-Speed 24937.39 samples/sec   Loss 1.9682   LearningRate 0.0003   Epoch: 21   Global Step: 37810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:19,857-Speed 25142.23 samples/sec   Loss 1.9778   LearningRate 0.0003   Epoch: 21   Global Step: 37820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:29,578-Speed 25287.21 samples/sec   Loss 1.9769   LearningRate 0.0003   Epoch: 21   Global Step: 37830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:39,374-Speed 25091.14 samples/sec   Loss 1.9869   LearningRate 0.0003   Epoch: 21   Global Step: 37840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:49,122-Speed 25213.10 samples/sec   Loss 1.9642   LearningRate 0.0003   Epoch: 21   Global Step: 37850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:17:58,830-Speed 25318.66 samples/sec   Loss 1.9739   LearningRate 0.0003   Epoch: 21   Global Step: 37860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:18:08,617-Speed 25115.00 samples/sec   Loss 1.9688   LearningRate 0.0003   Epoch: 21   Global Step: 37870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:18:18,328-Speed 25310.66 samples/sec   Loss 1.9533   LearningRate 0.0003   Epoch: 21   Global Step: 37880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:18:28,074-Speed 25220.52 samples/sec   Loss 1.9930   LearningRate 0.0003   Epoch: 21   Global Step: 37890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:18:37,892-Speed 25032.56 samples/sec   Loss 1.9572   LearningRate 0.0003   Epoch: 21   Global Step: 37900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:18:47,611-Speed 25291.37 samples/sec   Loss 1.9589   LearningRate 0.0003   Epoch: 21   Global Step: 37910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:18:57,350-Speed 25238.80 samples/sec   Loss 1.9800   LearningRate 0.0003   Epoch: 21   Global Step: 37920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:19:07,070-Speed 25286.66 samples/sec   Loss 1.9808   LearningRate 0.0003   Epoch: 21   Global Step: 37930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:19:16,769-Speed 25341.16 samples/sec   Loss 1.9779   LearningRate 0.0003   Epoch: 21   Global Step: 37940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:19:26,560-Speed 25104.58 samples/sec   Loss 1.9951   LearningRate 0.0003   Epoch: 21   Global Step: 37950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:19:36,304-Speed 25228.29 samples/sec   Loss 1.9838   LearningRate 0.0003   Epoch: 21   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:19:46,031-Speed 25269.94 samples/sec   Loss 1.9506   LearningRate 0.0003   Epoch: 21   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:19:55,706-Speed 25407.72 samples/sec   Loss 1.9765   LearningRate 0.0003   Epoch: 21   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:20:05,498-Speed 25100.96 samples/sec   Loss 1.9846   LearningRate 0.0003   Epoch: 21   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:20:15,244-Speed 25219.72 samples/sec   Loss 1.9739   LearningRate 0.0003   Epoch: 21   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:20:24,997-Speed 25202.78 samples/sec   Loss 1.9811   LearningRate 0.0003   Epoch: 21   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:20:34,831-Speed 24995.54 samples/sec   Loss 1.9992   LearningRate 0.0002   Epoch: 21   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:21:33,695-Speed 4175.13 samples/sec   Loss 1.9528   LearningRate 0.0002   Epoch: 22   Global Step: 38030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:21:43,872-Speed 24151.72 samples/sec   Loss 1.9513   LearningRate 0.0002   Epoch: 22   Global Step: 38040   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:21:53,626-Speed 25200.17 samples/sec   Loss 1.9722   LearningRate 0.0002   Epoch: 22   Global Step: 38050   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:22:03,394-Speed 25164.25 samples/sec   Loss 1.9267   LearningRate 0.0002   Epoch: 22   Global Step: 38060   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:22:13,152-Speed 25189.20 samples/sec   Loss 1.9408   LearningRate 0.0002   Epoch: 22   Global Step: 38070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:22:22,909-Speed 25191.32 samples/sec   Loss 1.9375   LearningRate 0.0002   Epoch: 22   Global Step: 38080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:22:32,733-Speed 25020.14 samples/sec   Loss 1.9119   LearningRate 0.0002   Epoch: 22   Global Step: 38090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:22:42,502-Speed 25160.86 samples/sec   Loss 1.9430   LearningRate 0.0002   Epoch: 22   Global Step: 38100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:22:52,358-Speed 24937.15 samples/sec   Loss 1.9440   LearningRate 0.0002   Epoch: 22   Global Step: 38110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:23:02,084-Speed 25270.87 samples/sec   Loss 1.9516   LearningRate 0.0002   Epoch: 22   Global Step: 38120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:23:11,881-Speed 25089.57 samples/sec   Loss 1.9654   LearningRate 0.0002   Epoch: 22   Global Step: 38130   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:23:21,607-Speed 25273.61 samples/sec   Loss 1.9494   LearningRate 0.0002   Epoch: 22   Global Step: 38140   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:23:31,471-Speed 24919.95 samples/sec   Loss 1.9553   LearningRate 0.0002   Epoch: 22   Global Step: 38150   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:23:41,175-Speed 25330.47 samples/sec   Loss 1.9276   LearningRate 0.0002   Epoch: 22   Global Step: 38160   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:23:50,918-Speed 25228.52 samples/sec   Loss 1.9509   LearningRate 0.0002   Epoch: 22   Global Step: 38170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:00,719-Speed 25080.09 samples/sec   Loss 1.9645   LearningRate 0.0002   Epoch: 22   Global Step: 38180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:10,535-Speed 25038.83 samples/sec   Loss 1.9492   LearningRate 0.0002   Epoch: 22   Global Step: 38190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:20,312-Speed 25140.87 samples/sec   Loss 1.9388   LearningRate 0.0002   Epoch: 22   Global Step: 38200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:30,137-Speed 25019.48 samples/sec   Loss 1.9251   LearningRate 0.0002   Epoch: 22   Global Step: 38210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:39,819-Speed 25385.84 samples/sec   Loss 1.9271   LearningRate 0.0002   Epoch: 22   Global Step: 38220   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:49,506-Speed 25375.22 samples/sec   Loss 1.9571   LearningRate 0.0002   Epoch: 22   Global Step: 38230   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:24:59,219-Speed 25307.92 samples/sec   Loss 1.9212   LearningRate 0.0002   Epoch: 22   Global Step: 38240   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:25:09,003-Speed 25122.48 samples/sec   Loss 1.9594   LearningRate 0.0002   Epoch: 22   Global Step: 38250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:25:18,824-Speed 25028.23 samples/sec   Loss 1.9347   LearningRate 0.0002   Epoch: 22   Global Step: 38260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:25:28,750-Speed 24760.73 samples/sec   Loss 1.9277   LearningRate 0.0002   Epoch: 22   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:25:38,622-Speed 24900.54 samples/sec   Loss 1.9601   LearningRate 0.0002   Epoch: 22   Global Step: 38280   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:25:48,358-Speed 25245.99 samples/sec   Loss 1.9541   LearningRate 0.0002   Epoch: 22   Global Step: 38290   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:25:58,265-Speed 24809.32 samples/sec   Loss 1.9555   LearningRate 0.0002   Epoch: 22   Global Step: 38300   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:26:08,039-Speed 25147.22 samples/sec   Loss 1.9619   LearningRate 0.0002   Epoch: 22   Global Step: 38310   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:26:17,883-Speed 24968.61 samples/sec   Loss 1.9403   LearningRate 0.0002   Epoch: 22   Global Step: 38320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:26:27,706-Speed 25022.63 samples/sec   Loss 1.9452   LearningRate 0.0002   Epoch: 22   Global Step: 38330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:26:37,560-Speed 24945.37 samples/sec   Loss 1.9478   LearningRate 0.0002   Epoch: 22   Global Step: 38340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:26:47,244-Speed 25380.80 samples/sec   Loss 1.9515   LearningRate 0.0002   Epoch: 22   Global Step: 38350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:26:57,000-Speed 25194.62 samples/sec   Loss 1.9475   LearningRate 0.0002   Epoch: 22   Global Step: 38360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:27:06,853-Speed 24946.84 samples/sec   Loss 1.9490   LearningRate 0.0002   Epoch: 22   Global Step: 38370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:27:16,575-Speed 25283.69 samples/sec   Loss 1.9645   LearningRate 0.0002   Epoch: 22   Global Step: 38380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:27:26,327-Speed 25202.82 samples/sec   Loss 1.9421   LearningRate 0.0002   Epoch: 22   Global Step: 38390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:27:36,115-Speed 25112.56 samples/sec   Loss 1.9242   LearningRate 0.0002   Epoch: 22   Global Step: 38400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:27:45,956-Speed 24976.01 samples/sec   Loss 1.9469   LearningRate 0.0002   Epoch: 22   Global Step: 38410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:27:55,654-Speed 25343.85 samples/sec   Loss 1.9405   LearningRate 0.0002   Epoch: 22   Global Step: 38420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:28:05,530-Speed 24889.64 samples/sec   Loss 1.9438   LearningRate 0.0002   Epoch: 22   Global Step: 38430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:28:15,301-Speed 25154.34 samples/sec   Loss 1.9484   LearningRate 0.0002   Epoch: 22   Global Step: 38440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:28:25,157-Speed 24946.18 samples/sec   Loss 1.9301   LearningRate 0.0002   Epoch: 22   Global Step: 38450   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:28:34,957-Speed 25081.33 samples/sec   Loss 1.9278   LearningRate 0.0002   Epoch: 22   Global Step: 38460   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:28:44,729-Speed 25151.68 samples/sec   Loss 1.9352   LearningRate 0.0002   Epoch: 22   Global Step: 38470   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:28:54,592-Speed 24921.38 samples/sec   Loss 1.9415   LearningRate 0.0002   Epoch: 22   Global Step: 38480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:29:04,491-Speed 24829.69 samples/sec   Loss 1.9622   LearningRate 0.0002   Epoch: 22   Global Step: 38490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:29:14,228-Speed 25242.04 samples/sec   Loss 1.9460   LearningRate 0.0002   Epoch: 22   Global Step: 38500   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:29:24,038-Speed 25055.24 samples/sec   Loss 1.9419   LearningRate 0.0002   Epoch: 22   Global Step: 38510   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:29:33,861-Speed 25023.53 samples/sec   Loss 1.9391   LearningRate 0.0002   Epoch: 22   Global Step: 38520   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:29:43,655-Speed 25096.64 samples/sec   Loss 1.9363   LearningRate 0.0002   Epoch: 22   Global Step: 38530   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:29:53,528-Speed 24893.99 samples/sec   Loss 1.9588   LearningRate 0.0002   Epoch: 22   Global Step: 38540   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:30:03,298-Speed 25158.39 samples/sec   Loss 1.9757   LearningRate 0.0002   Epoch: 22   Global Step: 38550   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:30:13,114-Speed 25042.68 samples/sec   Loss 1.9422   LearningRate 0.0002   Epoch: 22   Global Step: 38560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:30:22,824-Speed 25310.88 samples/sec   Loss 1.9132   LearningRate 0.0002   Epoch: 22   Global Step: 38570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:30:32,769-Speed 24716.88 samples/sec   Loss 1.9326   LearningRate 0.0002   Epoch: 22   Global Step: 38580   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:30:42,517-Speed 25213.86 samples/sec   Loss 1.9304   LearningRate 0.0002   Epoch: 22   Global Step: 38590   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:30:52,385-Speed 24908.29 samples/sec   Loss 1.9351   LearningRate 0.0002   Epoch: 22   Global Step: 38600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:31:02,209-Speed 25021.29 samples/sec   Loss 1.9299   LearningRate 0.0002   Epoch: 22   Global Step: 38610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:31:12,181-Speed 24650.36 samples/sec   Loss 1.9531   LearningRate 0.0002   Epoch: 22   Global Step: 38620   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:31:21,902-Speed 25287.03 samples/sec   Loss 1.9231   LearningRate 0.0002   Epoch: 22   Global Step: 38630   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:31:31,597-Speed 25354.82 samples/sec   Loss 1.9183   LearningRate 0.0002   Epoch: 22   Global Step: 38640   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:31:41,457-Speed 24929.57 samples/sec   Loss 1.9279   LearningRate 0.0002   Epoch: 22   Global Step: 38650   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:31:51,181-Speed 25279.01 samples/sec   Loss 1.9393   LearningRate 0.0002   Epoch: 22   Global Step: 38660   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:32:00,914-Speed 25254.41 samples/sec   Loss 1.9280   LearningRate 0.0002   Epoch: 22   Global Step: 38670   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:32:10,767-Speed 24945.94 samples/sec   Loss 1.9504   LearningRate 0.0002   Epoch: 22   Global Step: 38680   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:32:20,560-Speed 25097.76 samples/sec   Loss 1.9381   LearningRate 0.0002   Epoch: 22   Global Step: 38690   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:32:30,429-Speed 24906.50 samples/sec   Loss 1.9255   LearningRate 0.0002   Epoch: 22   Global Step: 38700   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:32:40,223-Speed 25093.97 samples/sec   Loss 1.9446   LearningRate 0.0002   Epoch: 22   Global Step: 38710   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:32:50,080-Speed 24937.84 samples/sec   Loss 1.9580   LearningRate 0.0002   Epoch: 22   Global Step: 38720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:32:59,966-Speed 24860.25 samples/sec   Loss 1.9499   LearningRate 0.0002   Epoch: 22   Global Step: 38730   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:33:09,758-Speed 25101.89 samples/sec   Loss 1.9183   LearningRate 0.0002   Epoch: 22   Global Step: 38740   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:33:19,581-Speed 25022.53 samples/sec   Loss 1.9128   LearningRate 0.0002   Epoch: 22   Global Step: 38750   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:33:29,304-Speed 25279.92 samples/sec   Loss 1.9129   LearningRate 0.0002   Epoch: 22   Global Step: 38760   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:33:39,030-Speed 25272.59 samples/sec   Loss 1.9185   LearningRate 0.0002   Epoch: 22   Global Step: 38770   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:33:48,832-Speed 25073.83 samples/sec   Loss 1.9202   LearningRate 0.0002   Epoch: 22   Global Step: 38780   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:33:58,727-Speed 24839.83 samples/sec   Loss 1.9282   LearningRate 0.0002   Epoch: 22   Global Step: 38790   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:34:08,463-Speed 25246.78 samples/sec   Loss 1.9165   LearningRate 0.0002   Epoch: 22   Global Step: 38800   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:34:18,305-Speed 24975.48 samples/sec   Loss 1.9270   LearningRate 0.0002   Epoch: 22   Global Step: 38810   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:34:28,077-Speed 25152.68 samples/sec   Loss 1.9223   LearningRate 0.0002   Epoch: 22   Global Step: 38820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:34:37,848-Speed 25154.43 samples/sec   Loss 1.9409   LearningRate 0.0002   Epoch: 22   Global Step: 38830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:34:47,560-Speed 25309.41 samples/sec   Loss 1.9218   LearningRate 0.0002   Epoch: 22   Global Step: 38840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:34:57,275-Speed 25302.19 samples/sec   Loss 1.9194   LearningRate 0.0002   Epoch: 22   Global Step: 38850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:35:07,045-Speed 25155.67 samples/sec   Loss 1.9263   LearningRate 0.0002   Epoch: 22   Global Step: 38860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:35:16,855-Speed 25056.93 samples/sec   Loss 1.9425   LearningRate 0.0002   Epoch: 22   Global Step: 38870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:35:26,642-Speed 25114.38 samples/sec   Loss 1.9268   LearningRate 0.0002   Epoch: 22   Global Step: 38880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:35:36,458-Speed 25040.57 samples/sec   Loss 1.9178   LearningRate 0.0002   Epoch: 22   Global Step: 38890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:35:46,267-Speed 25060.04 samples/sec   Loss 1.9242   LearningRate 0.0002   Epoch: 22   Global Step: 38900   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:35:56,152-Speed 24863.33 samples/sec   Loss 1.9276   LearningRate 0.0002   Epoch: 22   Global Step: 38910   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:36:05,973-Speed 25026.03 samples/sec   Loss 1.9243   LearningRate 0.0002   Epoch: 22   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:36:15,744-Speed 25154.94 samples/sec   Loss 1.9106   LearningRate 0.0002   Epoch: 22   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:36:25,489-Speed 25224.19 samples/sec   Loss 1.8922   LearningRate 0.0002   Epoch: 22   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:36:35,212-Speed 25279.30 samples/sec   Loss 1.8996   LearningRate 0.0002   Epoch: 22   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:36:44,949-Speed 25243.49 samples/sec   Loss 1.9050   LearningRate 0.0002   Epoch: 22   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:36:54,699-Speed 25208.39 samples/sec   Loss 1.9411   LearningRate 0.0002   Epoch: 22   Global Step: 38970   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:37:04,455-Speed 25199.97 samples/sec   Loss 1.9176   LearningRate 0.0002   Epoch: 22   Global Step: 38980   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:37:14,231-Speed 25142.51 samples/sec   Loss 1.9389   LearningRate 0.0002   Epoch: 22   Global Step: 38990   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-26 09:37:23,999-Speed 25162.23 samples/sec   Loss 1.9154   LearningRate 0.0002   Epoch: 22   Global Step: 39000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:37:33,728-Speed 25262.07 samples/sec   Loss 1.9000   LearningRate 0.0002   Epoch: 22   Global Step: 39010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-26 09:37:43,444-Speed 25296.69 samples/sec   Loss 1.9079   LearningRate 0.0002   Epoch: 22   Global Step: 39020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:37:53,373-Speed 24756.31 samples/sec   Loss 1.9207   LearningRate 0.0002   Epoch: 22   Global Step: 39030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:38:03,496-Speed 24287.23 samples/sec   Loss 1.9137   LearningRate 0.0002   Epoch: 22   Global Step: 39040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:38:13,390-Speed 24841.46 samples/sec   Loss 1.9076   LearningRate 0.0002   Epoch: 22   Global Step: 39050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:38:23,243-Speed 24943.50 samples/sec   Loss 1.9339   LearningRate 0.0002   Epoch: 22   Global Step: 39060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:38:33,104-Speed 24926.10 samples/sec   Loss 1.9242   LearningRate 0.0002   Epoch: 22   Global Step: 39070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:38:43,072-Speed 24657.05 samples/sec   Loss 1.9227   LearningRate 0.0002   Epoch: 22   Global Step: 39080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:38:52,793-Speed 25286.21 samples/sec   Loss 1.9074   LearningRate 0.0002   Epoch: 22   Global Step: 39090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:39:02,488-Speed 25350.59 samples/sec   Loss 1.9009   LearningRate 0.0002   Epoch: 22   Global Step: 39100   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:39:12,397-Speed 24805.62 samples/sec   Loss 1.9007   LearningRate 0.0002   Epoch: 22   Global Step: 39110   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:39:22,146-Speed 25216.96 samples/sec   Loss 1.9141   LearningRate 0.0002   Epoch: 22   Global Step: 39120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:39:31,965-Speed 25034.63 samples/sec   Loss 1.9036   LearningRate 0.0002   Epoch: 22   Global Step: 39130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:39:41,905-Speed 24727.91 samples/sec   Loss 1.9003   LearningRate 0.0002   Epoch: 22   Global Step: 39140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:39:51,696-Speed 25102.93 samples/sec   Loss 1.8990   LearningRate 0.0002   Epoch: 22   Global Step: 39150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:40:01,505-Speed 25058.72 samples/sec   Loss 1.9427   LearningRate 0.0002   Epoch: 22   Global Step: 39160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:40:11,366-Speed 24924.38 samples/sec   Loss 1.9160   LearningRate 0.0002   Epoch: 22   Global Step: 39170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:40:21,131-Speed 25172.06 samples/sec   Loss 1.9182   LearningRate 0.0002   Epoch: 22   Global Step: 39180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:40:30,904-Speed 25150.89 samples/sec   Loss 1.9155   LearningRate 0.0002   Epoch: 22   Global Step: 39190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:40:40,616-Speed 25308.32 samples/sec   Loss 1.9127   LearningRate 0.0002   Epoch: 22   Global Step: 39200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:40:50,487-Speed 24899.28 samples/sec   Loss 1.9095   LearningRate 0.0002   Epoch: 22   Global Step: 39210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:41:00,356-Speed 24912.78 samples/sec   Loss 1.9244   LearningRate 0.0002   Epoch: 22   Global Step: 39220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:41:10,282-Speed 24762.09 samples/sec   Loss 1.9146   LearningRate 0.0002   Epoch: 22   Global Step: 39230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:41:20,113-Speed 25002.73 samples/sec   Loss 1.9021   LearningRate 0.0002   Epoch: 22   Global Step: 39240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:41:29,810-Speed 25347.66 samples/sec   Loss 1.8928   LearningRate 0.0002   Epoch: 22   Global Step: 39250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:41:39,543-Speed 25253.43 samples/sec   Loss 1.8772   LearningRate 0.0002   Epoch: 22   Global Step: 39260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:41:49,274-Speed 25258.01 samples/sec   Loss 1.8944   LearningRate 0.0002   Epoch: 22   Global Step: 39270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:41:59,091-Speed 25037.87 samples/sec   Loss 1.9123   LearningRate 0.0002   Epoch: 22   Global Step: 39280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:42:08,939-Speed 24958.15 samples/sec   Loss 1.9054   LearningRate 0.0002   Epoch: 22   Global Step: 39290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:42:18,714-Speed 25146.28 samples/sec   Loss 1.9287   LearningRate 0.0002   Epoch: 22   Global Step: 39300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:42:28,422-Speed 25317.40 samples/sec   Loss 1.9049   LearningRate 0.0002   Epoch: 22   Global Step: 39310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:42:38,263-Speed 24979.24 samples/sec   Loss 1.8913   LearningRate 0.0002   Epoch: 22   Global Step: 39320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:42:47,974-Speed 25309.07 samples/sec   Loss 1.8888   LearningRate 0.0002   Epoch: 22   Global Step: 39330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:42:57,851-Speed 24885.15 samples/sec   Loss 1.9054   LearningRate 0.0002   Epoch: 22   Global Step: 39340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:43:07,582-Speed 25261.79 samples/sec   Loss 1.8868   LearningRate 0.0002   Epoch: 22   Global Step: 39350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:43:17,291-Speed 25315.93 samples/sec   Loss 1.8930   LearningRate 0.0002   Epoch: 22   Global Step: 39360   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 09:43:27,028-Speed 25240.35 samples/sec   Loss 1.8948   LearningRate 0.0002   Epoch: 22   Global Step: 39370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:43:36,795-Speed 25166.66 samples/sec   Loss 1.9054   LearningRate 0.0002   Epoch: 22   Global Step: 39380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:43:46,561-Speed 25170.84 samples/sec   Loss 1.8889   LearningRate 0.0002   Epoch: 22   Global Step: 39390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:43:56,515-Speed 24698.69 samples/sec   Loss 1.9078   LearningRate 0.0002   Epoch: 22   Global Step: 39400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:44:06,214-Speed 25341.36 samples/sec   Loss 1.9311   LearningRate 0.0002   Epoch: 22   Global Step: 39410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:44:15,903-Speed 25370.88 samples/sec   Loss 1.8994   LearningRate 0.0002   Epoch: 22   Global Step: 39420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:44:25,610-Speed 25323.12 samples/sec   Loss 1.8621   LearningRate 0.0002   Epoch: 22   Global Step: 39430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:44:35,362-Speed 25204.28 samples/sec   Loss 1.8831   LearningRate 0.0002   Epoch: 22   Global Step: 39440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:44:45,093-Speed 25258.43 samples/sec   Loss 1.9166   LearningRate 0.0002   Epoch: 22   Global Step: 39450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:44:54,829-Speed 25246.06 samples/sec   Loss 1.8940   LearningRate 0.0002   Epoch: 22   Global Step: 39460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:45:04,591-Speed 25178.87 samples/sec   Loss 1.9021   LearningRate 0.0002   Epoch: 22   Global Step: 39470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:45:14,306-Speed 25300.28 samples/sec   Loss 1.8932   LearningRate 0.0002   Epoch: 22   Global Step: 39480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:45:24,181-Speed 24891.65 samples/sec   Loss 1.9092   LearningRate 0.0002   Epoch: 22   Global Step: 39490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:45:34,076-Speed 24839.00 samples/sec   Loss 1.9102   LearningRate 0.0002   Epoch: 22   Global Step: 39500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:45:43,935-Speed 24931.31 samples/sec   Loss 1.9103   LearningRate 0.0002   Epoch: 22   Global Step: 39510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:45:53,702-Speed 25166.48 samples/sec   Loss 1.9368   LearningRate 0.0002   Epoch: 22   Global Step: 39520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:46:03,494-Speed 25102.35 samples/sec   Loss 1.8893   LearningRate 0.0002   Epoch: 22   Global Step: 39530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:46:13,297-Speed 25070.76 samples/sec   Loss 1.8845   LearningRate 0.0002   Epoch: 22   Global Step: 39540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:46:22,999-Speed 25333.99 samples/sec   Loss 1.8900   LearningRate 0.0002   Epoch: 22   Global Step: 39550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:46:32,669-Speed 25419.53 samples/sec   Loss 1.9028   LearningRate 0.0002   Epoch: 22   Global Step: 39560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:46:42,387-Speed 25292.81 samples/sec   Loss 1.8967   LearningRate 0.0002   Epoch: 22   Global Step: 39570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:46:52,167-Speed 25131.04 samples/sec   Loss 1.9012   LearningRate 0.0002   Epoch: 22   Global Step: 39580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:47:02,034-Speed 24911.98 samples/sec   Loss 1.9024   LearningRate 0.0002   Epoch: 22   Global Step: 39590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:47:11,754-Speed 25289.26 samples/sec   Loss 1.9147   LearningRate 0.0002   Epoch: 22   Global Step: 39600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:47:21,559-Speed 25069.05 samples/sec   Loss 1.9007   LearningRate 0.0002   Epoch: 22   Global Step: 39610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:47:31,333-Speed 25146.34 samples/sec   Loss 1.8860   LearningRate 0.0002   Epoch: 22   Global Step: 39620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:47:41,186-Speed 24945.89 samples/sec   Loss 1.8932   LearningRate 0.0002   Epoch: 22   Global Step: 39630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:47:51,189-Speed 24573.52 samples/sec   Loss 1.8836   LearningRate 0.0002   Epoch: 22   Global Step: 39640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:48:01,137-Speed 24707.84 samples/sec   Loss 1.8820   LearningRate 0.0002   Epoch: 22   Global Step: 39650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:48:11,108-Speed 24648.83 samples/sec   Loss 1.8959   LearningRate 0.0002   Epoch: 22   Global Step: 39660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:48:21,024-Speed 24788.62 samples/sec   Loss 1.9007   LearningRate 0.0002   Epoch: 22   Global Step: 39670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:48:31,006-Speed 24622.04 samples/sec   Loss 1.8920   LearningRate 0.0002   Epoch: 22   Global Step: 39680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:48:40,949-Speed 24720.01 samples/sec   Loss 1.8983   LearningRate 0.0002   Epoch: 22   Global Step: 39690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:48:50,754-Speed 25070.39 samples/sec   Loss 1.9092   LearningRate 0.0002   Epoch: 22   Global Step: 39700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:49:00,512-Speed 25188.97 samples/sec   Loss 1.9050   LearningRate 0.0002   Epoch: 22   Global Step: 39710   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:49:10,230-Speed 25293.07 samples/sec   Loss 1.9142   LearningRate 0.0002   Epoch: 22   Global Step: 39720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:49:20,120-Speed 24851.20 samples/sec   Loss 1.9108   LearningRate 0.0002   Epoch: 22   Global Step: 39730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:49:30,037-Speed 24786.43 samples/sec   Loss 1.9039   LearningRate 0.0002   Epoch: 22   Global Step: 39740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:49:39,954-Speed 24784.70 samples/sec   Loss 1.8955   LearningRate 0.0002   Epoch: 22   Global Step: 39750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:50:39,013-Speed 4161.34 samples/sec   Loss 1.8816   LearningRate 0.0002   Epoch: 23   Global Step: 39760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:50:48,909-Speed 24840.35 samples/sec   Loss 1.8625   LearningRate 0.0002   Epoch: 23   Global Step: 39770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:50:58,746-Speed 24988.70 samples/sec   Loss 1.8682   LearningRate 0.0002   Epoch: 23   Global Step: 39780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:51:08,513-Speed 25164.91 samples/sec   Loss 1.8815   LearningRate 0.0002   Epoch: 23   Global Step: 39790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:51:18,422-Speed 24805.99 samples/sec   Loss 1.8805   LearningRate 0.0002   Epoch: 23   Global Step: 39800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:51:28,283-Speed 24926.25 samples/sec   Loss 1.8912   LearningRate 0.0002   Epoch: 23   Global Step: 39810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:51:38,056-Speed 25148.61 samples/sec   Loss 1.8926   LearningRate 0.0002   Epoch: 23   Global Step: 39820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:51:47,758-Speed 25335.50 samples/sec   Loss 1.8639   LearningRate 0.0002   Epoch: 23   Global Step: 39830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:51:57,500-Speed 25228.72 samples/sec   Loss 1.8722   LearningRate 0.0002   Epoch: 23   Global Step: 39840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:52:07,190-Speed 25365.37 samples/sec   Loss 1.8694   LearningRate 0.0002   Epoch: 23   Global Step: 39850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:52:16,975-Speed 25119.89 samples/sec   Loss 1.8809   LearningRate 0.0002   Epoch: 23   Global Step: 39860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:52:26,743-Speed 25163.29 samples/sec   Loss 1.8600   LearningRate 0.0002   Epoch: 23   Global Step: 39870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:52:36,437-Speed 25354.21 samples/sec   Loss 1.8867   LearningRate 0.0002   Epoch: 23   Global Step: 39880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:52:46,197-Speed 25184.34 samples/sec   Loss 1.8658   LearningRate 0.0002   Epoch: 23   Global Step: 39890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:52:55,990-Speed 25105.36 samples/sec   Loss 1.8606   LearningRate 0.0002   Epoch: 23   Global Step: 39900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:53:05,779-Speed 25110.81 samples/sec   Loss 1.8570   LearningRate 0.0002   Epoch: 23   Global Step: 39910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:53:15,567-Speed 25112.37 samples/sec   Loss 1.8655   LearningRate 0.0002   Epoch: 23   Global Step: 39920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:53:25,315-Speed 25216.03 samples/sec   Loss 1.8715   LearningRate 0.0002   Epoch: 23   Global Step: 39930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:53:35,091-Speed 25142.44 samples/sec   Loss 1.8473   LearningRate 0.0002   Epoch: 23   Global Step: 39940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:53:44,807-Speed 25297.51 samples/sec   Loss 1.8782   LearningRate 0.0002   Epoch: 23   Global Step: 39950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:53:54,598-Speed 25103.81 samples/sec   Loss 1.8676   LearningRate 0.0002   Epoch: 23   Global Step: 39960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:54:04,305-Speed 25323.01 samples/sec   Loss 1.8768   LearningRate 0.0002   Epoch: 23   Global Step: 39970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:54:14,134-Speed 25004.72 samples/sec   Loss 1.8811   LearningRate 0.0002   Epoch: 23   Global Step: 39980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:54:23,832-Speed 25345.22 samples/sec   Loss 1.8821   LearningRate 0.0002   Epoch: 23   Global Step: 39990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:54:33,584-Speed 25202.57 samples/sec   Loss 1.8777   LearningRate 0.0002   Epoch: 23   Global Step: 40000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:54:43,391-Speed 25064.49 samples/sec   Loss 1.8845   LearningRate 0.0002   Epoch: 23   Global Step: 40010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:54:53,184-Speed 25098.09 samples/sec   Loss 1.8797   LearningRate 0.0002   Epoch: 23   Global Step: 40020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:55:02,940-Speed 25196.32 samples/sec   Loss 1.9045   LearningRate 0.0002   Epoch: 23   Global Step: 40030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:55:12,646-Speed 25325.42 samples/sec   Loss 1.8886   LearningRate 0.0002   Epoch: 23   Global Step: 40040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:55:22,320-Speed 25408.10 samples/sec   Loss 1.8742   LearningRate 0.0002   Epoch: 23   Global Step: 40050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:55:32,027-Speed 25321.34 samples/sec   Loss 1.8554   LearningRate 0.0002   Epoch: 23   Global Step: 40060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:55:41,830-Speed 25074.46 samples/sec   Loss 1.8656   LearningRate 0.0002   Epoch: 23   Global Step: 40070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:55:51,573-Speed 25232.12 samples/sec   Loss 1.8592   LearningRate 0.0002   Epoch: 23   Global Step: 40080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:56:01,391-Speed 25035.46 samples/sec   Loss 1.8782   LearningRate 0.0002   Epoch: 23   Global Step: 40090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:56:11,184-Speed 25099.82 samples/sec   Loss 1.8751   LearningRate 0.0002   Epoch: 23   Global Step: 40100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:56:20,903-Speed 25296.17 samples/sec   Loss 1.8568   LearningRate 0.0002   Epoch: 23   Global Step: 40110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:56:30,584-Speed 25387.15 samples/sec   Loss 1.8774   LearningRate 0.0002   Epoch: 23   Global Step: 40120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:56:40,320-Speed 25247.46 samples/sec   Loss 1.8962   LearningRate 0.0002   Epoch: 23   Global Step: 40130   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 09:56:50,105-Speed 25117.26 samples/sec   Loss 1.8678   LearningRate 0.0002   Epoch: 23   Global Step: 40140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:56:59,843-Speed 25241.04 samples/sec   Loss 1.8570   LearningRate 0.0002   Epoch: 23   Global Step: 40150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:57:09,573-Speed 25262.85 samples/sec   Loss 1.8808   LearningRate 0.0002   Epoch: 23   Global Step: 40160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:57:19,392-Speed 25030.98 samples/sec   Loss 1.8806   LearningRate 0.0002   Epoch: 23   Global Step: 40170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:57:29,129-Speed 25244.17 samples/sec   Loss 1.8646   LearningRate 0.0002   Epoch: 23   Global Step: 40180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:57:38,956-Speed 25013.45 samples/sec   Loss 1.8667   LearningRate 0.0002   Epoch: 23   Global Step: 40190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:57:48,764-Speed 25060.93 samples/sec   Loss 1.8797   LearningRate 0.0002   Epoch: 23   Global Step: 40200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:57:58,629-Speed 24912.97 samples/sec   Loss 1.8742   LearningRate 0.0002   Epoch: 23   Global Step: 40210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:58:08,396-Speed 25165.85 samples/sec   Loss 1.8850   LearningRate 0.0002   Epoch: 23   Global Step: 40220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:58:18,204-Speed 25061.83 samples/sec   Loss 1.8613   LearningRate 0.0002   Epoch: 23   Global Step: 40230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:58:28,008-Speed 25070.92 samples/sec   Loss 1.8713   LearningRate 0.0002   Epoch: 23   Global Step: 40240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:58:37,761-Speed 25199.90 samples/sec   Loss 1.8741   LearningRate 0.0002   Epoch: 23   Global Step: 40250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:58:47,550-Speed 25108.28 samples/sec   Loss 1.8645   LearningRate 0.0002   Epoch: 23   Global Step: 40260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:58:57,265-Speed 25301.71 samples/sec   Loss 1.8647   LearningRate 0.0002   Epoch: 23   Global Step: 40270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:59:06,996-Speed 25260.56 samples/sec   Loss 1.8530   LearningRate 0.0002   Epoch: 23   Global Step: 40280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:59:16,728-Speed 25256.05 samples/sec   Loss 1.8745   LearningRate 0.0002   Epoch: 23   Global Step: 40290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:59:26,458-Speed 25262.27 samples/sec   Loss 1.8890   LearningRate 0.0002   Epoch: 23   Global Step: 40300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:59:36,185-Speed 25268.52 samples/sec   Loss 1.8916   LearningRate 0.0002   Epoch: 23   Global Step: 40310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:59:45,993-Speed 25059.10 samples/sec   Loss 1.8656   LearningRate 0.0002   Epoch: 23   Global Step: 40320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 09:59:55,741-Speed 25215.42 samples/sec   Loss 1.8499   LearningRate 0.0002   Epoch: 23   Global Step: 40330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:00:05,480-Speed 25237.35 samples/sec   Loss 1.8500   LearningRate 0.0002   Epoch: 23   Global Step: 40340   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:00:15,194-Speed 25304.37 samples/sec   Loss 1.8708   LearningRate 0.0002   Epoch: 23   Global Step: 40350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:00:24,994-Speed 25079.79 samples/sec   Loss 1.8691   LearningRate 0.0002   Epoch: 23   Global Step: 40360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:00:34,808-Speed 25044.16 samples/sec   Loss 1.8623   LearningRate 0.0002   Epoch: 23   Global Step: 40370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:00:44,550-Speed 25231.56 samples/sec   Loss 1.8717   LearningRate 0.0002   Epoch: 23   Global Step: 40380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:00:54,285-Speed 25246.88 samples/sec   Loss 1.8640   LearningRate 0.0002   Epoch: 23   Global Step: 40390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:01:04,019-Speed 25251.84 samples/sec   Loss 1.8518   LearningRate 0.0002   Epoch: 23   Global Step: 40400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:01:13,795-Speed 25143.37 samples/sec   Loss 1.8624   LearningRate 0.0002   Epoch: 23   Global Step: 40410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:01:23,562-Speed 25164.73 samples/sec   Loss 1.8408   LearningRate 0.0002   Epoch: 23   Global Step: 40420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:01:33,301-Speed 25240.24 samples/sec   Loss 1.8661   LearningRate 0.0002   Epoch: 23   Global Step: 40430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:01:43,033-Speed 25254.68 samples/sec   Loss 1.8596   LearningRate 0.0002   Epoch: 23   Global Step: 40440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:01:52,769-Speed 25244.39 samples/sec   Loss 1.8490   LearningRate 0.0002   Epoch: 23   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:02:02,523-Speed 25199.24 samples/sec   Loss 1.8613   LearningRate 0.0002   Epoch: 23   Global Step: 40460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:02:12,262-Speed 25240.09 samples/sec   Loss 1.8493   LearningRate 0.0002   Epoch: 23   Global Step: 40470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:02:22,010-Speed 25212.84 samples/sec   Loss 1.8494   LearningRate 0.0002   Epoch: 23   Global Step: 40480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:02:31,781-Speed 25155.02 samples/sec   Loss 1.8551   LearningRate 0.0002   Epoch: 23   Global Step: 40490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:02:41,566-Speed 25118.46 samples/sec   Loss 1.8684   LearningRate 0.0002   Epoch: 23   Global Step: 40500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:02:51,403-Speed 24986.24 samples/sec   Loss 1.8605   LearningRate 0.0002   Epoch: 23   Global Step: 40510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:03:01,165-Speed 25177.45 samples/sec   Loss 1.8458   LearningRate 0.0002   Epoch: 23   Global Step: 40520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:03:10,923-Speed 25188.20 samples/sec   Loss 1.8534   LearningRate 0.0002   Epoch: 23   Global Step: 40530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:03:20,639-Speed 25297.43 samples/sec   Loss 1.8806   LearningRate 0.0002   Epoch: 23   Global Step: 40540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:03:30,462-Speed 25022.85 samples/sec   Loss 1.8554   LearningRate 0.0002   Epoch: 23   Global Step: 40550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:03:40,329-Speed 24911.87 samples/sec   Loss 1.8435   LearningRate 0.0002   Epoch: 23   Global Step: 40560   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:03:50,015-Speed 25374.69 samples/sec   Loss 1.8644   LearningRate 0.0002   Epoch: 23   Global Step: 40570   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:03:59,854-Speed 24982.00 samples/sec   Loss 1.8428   LearningRate 0.0002   Epoch: 23   Global Step: 40580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:04:09,586-Speed 25258.69 samples/sec   Loss 1.8763   LearningRate 0.0002   Epoch: 23   Global Step: 40590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:04:19,358-Speed 25153.30 samples/sec   Loss 1.8576   LearningRate 0.0002   Epoch: 23   Global Step: 40600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:04:29,074-Speed 25296.94 samples/sec   Loss 1.8506   LearningRate 0.0002   Epoch: 23   Global Step: 40610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:04:38,808-Speed 25252.84 samples/sec   Loss 1.8511   LearningRate 0.0002   Epoch: 23   Global Step: 40620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:04:48,499-Speed 25361.28 samples/sec   Loss 1.8663   LearningRate 0.0002   Epoch: 23   Global Step: 40630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:04:58,261-Speed 25182.49 samples/sec   Loss 1.8570   LearningRate 0.0002   Epoch: 23   Global Step: 40640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:05:08,011-Speed 25211.43 samples/sec   Loss 1.8682   LearningRate 0.0002   Epoch: 23   Global Step: 40650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:05:17,769-Speed 25188.73 samples/sec   Loss 1.8687   LearningRate 0.0002   Epoch: 23   Global Step: 40660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:05:27,514-Speed 25221.47 samples/sec   Loss 1.8394   LearningRate 0.0002   Epoch: 23   Global Step: 40670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:05:37,377-Speed 24921.47 samples/sec   Loss 1.8274   LearningRate 0.0002   Epoch: 23   Global Step: 40680   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:05:47,115-Speed 25240.75 samples/sec   Loss 1.8445   LearningRate 0.0002   Epoch: 23   Global Step: 40690   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:05:56,835-Speed 25286.93 samples/sec   Loss 1.8650   LearningRate 0.0002   Epoch: 23   Global Step: 40700   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:06:06,587-Speed 25204.40 samples/sec   Loss 1.8333   LearningRate 0.0002   Epoch: 23   Global Step: 40710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:06:16,320-Speed 25252.14 samples/sec   Loss 1.8611   LearningRate 0.0002   Epoch: 23   Global Step: 40720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:06:26,079-Speed 25184.21 samples/sec   Loss 1.8416   LearningRate 0.0002   Epoch: 23   Global Step: 40730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:06:35,836-Speed 25193.44 samples/sec   Loss 1.8378   LearningRate 0.0002   Epoch: 23   Global Step: 40740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:06:45,731-Speed 24838.26 samples/sec   Loss 1.8167   LearningRate 0.0002   Epoch: 23   Global Step: 40750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:06:55,553-Speed 25025.09 samples/sec   Loss 1.8330   LearningRate 0.0002   Epoch: 23   Global Step: 40760   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:07:05,346-Speed 25097.85 samples/sec   Loss 1.8540   LearningRate 0.0002   Epoch: 23   Global Step: 40770   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:07:15,054-Speed 25318.58 samples/sec   Loss 1.8576   LearningRate 0.0002   Epoch: 23   Global Step: 40780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:07:24,860-Speed 25064.02 samples/sec   Loss 1.8460   LearningRate 0.0002   Epoch: 23   Global Step: 40790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:07:34,692-Speed 25000.38 samples/sec   Loss 1.8389   LearningRate 0.0002   Epoch: 23   Global Step: 40800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:07:44,421-Speed 25263.44 samples/sec   Loss 1.8325   LearningRate 0.0002   Epoch: 23   Global Step: 40810   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:07:54,134-Speed 25305.73 samples/sec   Loss 1.8377   LearningRate 0.0002   Epoch: 23   Global Step: 40820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:08:03,899-Speed 25170.34 samples/sec   Loss 1.8615   LearningRate 0.0002   Epoch: 23   Global Step: 40830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:08:13,633-Speed 25251.99 samples/sec   Loss 1.8373   LearningRate 0.0002   Epoch: 23   Global Step: 40840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:08:23,410-Speed 25137.15 samples/sec   Loss 1.8459   LearningRate 0.0002   Epoch: 23   Global Step: 40850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:08:33,399-Speed 24606.84 samples/sec   Loss 1.8551   LearningRate 0.0002   Epoch: 23   Global Step: 40860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:08:43,188-Speed 25107.99 samples/sec   Loss 1.8474   LearningRate 0.0002   Epoch: 23   Global Step: 40870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:08:52,911-Speed 25280.60 samples/sec   Loss 1.8551   LearningRate 0.0002   Epoch: 23   Global Step: 40880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:09:02,813-Speed 24821.46 samples/sec   Loss 1.8461   LearningRate 0.0002   Epoch: 23   Global Step: 40890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:09:12,650-Speed 24983.68 samples/sec   Loss 1.8491   LearningRate 0.0002   Epoch: 23   Global Step: 40900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:09:22,602-Speed 24697.61 samples/sec   Loss 1.8467   LearningRate 0.0002   Epoch: 23   Global Step: 40910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:09:32,460-Speed 24933.38 samples/sec   Loss 1.8326   LearningRate 0.0002   Epoch: 23   Global Step: 40920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:09:42,191-Speed 25261.63 samples/sec   Loss 1.8457   LearningRate 0.0002   Epoch: 23   Global Step: 40930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:09:51,988-Speed 25087.13 samples/sec   Loss 1.8468   LearningRate 0.0002   Epoch: 23   Global Step: 40940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:10:01,772-Speed 25122.46 samples/sec   Loss 1.8331   LearningRate 0.0002   Epoch: 23   Global Step: 40950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:10:11,651-Speed 24879.53 samples/sec   Loss 1.8201   LearningRate 0.0002   Epoch: 23   Global Step: 40960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:10:21,371-Speed 25286.78 samples/sec   Loss 1.8343   LearningRate 0.0002   Epoch: 23   Global Step: 40970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:10:31,202-Speed 25001.56 samples/sec   Loss 1.8451   LearningRate 0.0002   Epoch: 23   Global Step: 40980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:10:41,010-Speed 25057.95 samples/sec   Loss 1.8401   LearningRate 0.0002   Epoch: 23   Global Step: 40990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:10:50,777-Speed 25173.56 samples/sec   Loss 1.8308   LearningRate 0.0002   Epoch: 23   Global Step: 41000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:00,616-Speed 24981.17 samples/sec   Loss 1.8302   LearningRate 0.0002   Epoch: 23   Global Step: 41010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:10,456-Speed 24977.15 samples/sec   Loss 1.8266   LearningRate 0.0002   Epoch: 23   Global Step: 41020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:20,345-Speed 24854.43 samples/sec   Loss 1.8450   LearningRate 0.0002   Epoch: 23   Global Step: 41030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:30,345-Speed 24578.85 samples/sec   Loss 1.8387   LearningRate 0.0002   Epoch: 23   Global Step: 41040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:40,267-Speed 24771.46 samples/sec   Loss 1.8495   LearningRate 0.0002   Epoch: 23   Global Step: 41050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:50,000-Speed 25253.53 samples/sec   Loss 1.8484   LearningRate 0.0002   Epoch: 23   Global Step: 41060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:11:59,944-Speed 24716.48 samples/sec   Loss 1.8368   LearningRate 0.0002   Epoch: 23   Global Step: 41070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:12:09,657-Speed 25306.62 samples/sec   Loss 1.8302   LearningRate 0.0002   Epoch: 23   Global Step: 41080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:12:19,446-Speed 25107.69 samples/sec   Loss 1.8424   LearningRate 0.0002   Epoch: 23   Global Step: 41090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:12:29,298-Speed 24949.54 samples/sec   Loss 1.8319   LearningRate 0.0002   Epoch: 23   Global Step: 41100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:12:39,070-Speed 25153.66 samples/sec   Loss 1.8073   LearningRate 0.0002   Epoch: 23   Global Step: 41110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:12:49,011-Speed 24728.54 samples/sec   Loss 1.8115   LearningRate 0.0002   Epoch: 23   Global Step: 41120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:12:58,749-Speed 25245.62 samples/sec   Loss 1.8135   LearningRate 0.0002   Epoch: 23   Global Step: 41130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:13:08,452-Speed 25338.47 samples/sec   Loss 1.8352   LearningRate 0.0002   Epoch: 23   Global Step: 41140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:13:18,297-Speed 24966.47 samples/sec   Loss 1.8246   LearningRate 0.0002   Epoch: 23   Global Step: 41150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:13:28,037-Speed 25240.14 samples/sec   Loss 1.8201   LearningRate 0.0002   Epoch: 23   Global Step: 41160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:13:37,821-Speed 25121.11 samples/sec   Loss 1.8238   LearningRate 0.0002   Epoch: 23   Global Step: 41170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:13:47,503-Speed 25388.65 samples/sec   Loss 1.8306   LearningRate 0.0002   Epoch: 23   Global Step: 41180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:13:57,283-Speed 25136.00 samples/sec   Loss 1.8294   LearningRate 0.0002   Epoch: 23   Global Step: 41190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:14:07,062-Speed 25134.70 samples/sec   Loss 1.8404   LearningRate 0.0002   Epoch: 23   Global Step: 41200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:14:16,845-Speed 25125.94 samples/sec   Loss 1.8390   LearningRate 0.0002   Epoch: 23   Global Step: 41210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:14:26,603-Speed 25187.27 samples/sec   Loss 1.8129   LearningRate 0.0002   Epoch: 23   Global Step: 41220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:14:36,488-Speed 24872.54 samples/sec   Loss 1.8208   LearningRate 0.0002   Epoch: 23   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:14:46,219-Speed 25256.93 samples/sec   Loss 1.8175   LearningRate 0.0002   Epoch: 23   Global Step: 41240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:14:55,991-Speed 25154.62 samples/sec   Loss 1.8315   LearningRate 0.0002   Epoch: 23   Global Step: 41250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:15:05,736-Speed 25220.44 samples/sec   Loss 1.8344   LearningRate 0.0002   Epoch: 23   Global Step: 41260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:15:15,469-Speed 25252.99 samples/sec   Loss 1.8202   LearningRate 0.0002   Epoch: 23   Global Step: 41270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:15:25,285-Speed 25040.35 samples/sec   Loss 1.8211   LearningRate 0.0002   Epoch: 23   Global Step: 41280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:15:35,268-Speed 24620.58 samples/sec   Loss 1.8242   LearningRate 0.0002   Epoch: 23   Global Step: 41290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:15:45,314-Speed 24467.62 samples/sec   Loss 1.8269   LearningRate 0.0002   Epoch: 23   Global Step: 41300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:15:55,179-Speed 24913.28 samples/sec   Loss 1.8313   LearningRate 0.0002   Epoch: 23   Global Step: 41310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:16:04,994-Speed 25043.19 samples/sec   Loss 1.8472   LearningRate 0.0002   Epoch: 23   Global Step: 41320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:16:14,801-Speed 25062.04 samples/sec   Loss 1.8327   LearningRate 0.0002   Epoch: 23   Global Step: 41330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:16:24,573-Speed 25153.07 samples/sec   Loss 1.8351   LearningRate 0.0002   Epoch: 23   Global Step: 41340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:16:34,322-Speed 25209.86 samples/sec   Loss 1.8382   LearningRate 0.0002   Epoch: 23   Global Step: 41350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:16:44,193-Speed 24901.24 samples/sec   Loss 1.8323   LearningRate 0.0002   Epoch: 23   Global Step: 41360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:16:54,014-Speed 25028.07 samples/sec   Loss 1.8338   LearningRate 0.0002   Epoch: 23   Global Step: 41370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:17:03,863-Speed 24957.73 samples/sec   Loss 1.8335   LearningRate 0.0002   Epoch: 23   Global Step: 41380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:17:13,636-Speed 25149.01 samples/sec   Loss 1.8307   LearningRate 0.0002   Epoch: 23   Global Step: 41390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:17:23,578-Speed 24731.44 samples/sec   Loss 1.8201   LearningRate 0.0002   Epoch: 23   Global Step: 41400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:17:33,426-Speed 24961.39 samples/sec   Loss 1.8349   LearningRate 0.0002   Epoch: 23   Global Step: 41410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:17:43,304-Speed 24883.66 samples/sec   Loss 1.8325   LearningRate 0.0002   Epoch: 23   Global Step: 41420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:17:53,138-Speed 24994.31 samples/sec   Loss 1.8162   LearningRate 0.0002   Epoch: 23   Global Step: 41430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:18:02,989-Speed 24950.75 samples/sec   Loss 1.8275   LearningRate 0.0002   Epoch: 23   Global Step: 41440   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:18:12,802-Speed 25047.83 samples/sec   Loss 1.8300   LearningRate 0.0002   Epoch: 23   Global Step: 41450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:18:22,602-Speed 25081.40 samples/sec   Loss 1.8100   LearningRate 0.0002   Epoch: 23   Global Step: 41460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:18:32,357-Speed 25198.54 samples/sec   Loss 1.8288   LearningRate 0.0002   Epoch: 23   Global Step: 41470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:18:42,181-Speed 25019.15 samples/sec   Loss 1.8159   LearningRate 0.0002   Epoch: 23   Global Step: 41480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:19:43,135-Speed 4032.18 samples/sec   Loss 1.8087   LearningRate 0.0002   Epoch: 24   Global Step: 41490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:19:52,961-Speed 25013.41 samples/sec   Loss 1.8088   LearningRate 0.0002   Epoch: 24   Global Step: 41500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:20:02,829-Speed 24908.98 samples/sec   Loss 1.8191   LearningRate 0.0002   Epoch: 24   Global Step: 41510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:20:12,621-Speed 25099.67 samples/sec   Loss 1.8117   LearningRate 0.0002   Epoch: 24   Global Step: 41520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:20:22,474-Speed 24946.74 samples/sec   Loss 1.8293   LearningRate 0.0002   Epoch: 24   Global Step: 41530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:20:32,335-Speed 24927.54 samples/sec   Loss 1.8380   LearningRate 0.0002   Epoch: 24   Global Step: 41540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:20:42,195-Speed 24931.17 samples/sec   Loss 1.8017   LearningRate 0.0002   Epoch: 24   Global Step: 41550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:20:52,143-Speed 24709.48 samples/sec   Loss 1.7987   LearningRate 0.0002   Epoch: 24   Global Step: 41560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:21:01,988-Speed 24968.77 samples/sec   Loss 1.7898   LearningRate 0.0002   Epoch: 24   Global Step: 41570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:21:11,761-Speed 25152.75 samples/sec   Loss 1.8086   LearningRate 0.0002   Epoch: 24   Global Step: 41580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:21:21,630-Speed 24904.85 samples/sec   Loss 1.8203   LearningRate 0.0002   Epoch: 24   Global Step: 41590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:21:31,514-Speed 24868.16 samples/sec   Loss 1.7947   LearningRate 0.0002   Epoch: 24   Global Step: 41600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:21:41,329-Speed 25042.97 samples/sec   Loss 1.8154   LearningRate 0.0002   Epoch: 24   Global Step: 41610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:21:51,078-Speed 25211.54 samples/sec   Loss 1.8022   LearningRate 0.0002   Epoch: 24   Global Step: 41620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:22:00,916-Speed 24985.29 samples/sec   Loss 1.7988   LearningRate 0.0002   Epoch: 24   Global Step: 41630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:22:10,792-Speed 24887.00 samples/sec   Loss 1.8180   LearningRate 0.0002   Epoch: 24   Global Step: 41640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:22:20,714-Speed 24772.46 samples/sec   Loss 1.8184   LearningRate 0.0002   Epoch: 24   Global Step: 41650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:22:30,476-Speed 25180.49 samples/sec   Loss 1.8073   LearningRate 0.0002   Epoch: 24   Global Step: 41660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:22:40,447-Speed 24650.84 samples/sec   Loss 1.8108   LearningRate 0.0002   Epoch: 24   Global Step: 41670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:22:50,363-Speed 24786.56 samples/sec   Loss 1.8133   LearningRate 0.0002   Epoch: 24   Global Step: 41680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:00,276-Speed 24793.33 samples/sec   Loss 1.8404   LearningRate 0.0002   Epoch: 24   Global Step: 41690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:10,109-Speed 24998.95 samples/sec   Loss 1.8135   LearningRate 0.0002   Epoch: 24   Global Step: 41700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:19,898-Speed 25109.44 samples/sec   Loss 1.7990   LearningRate 0.0002   Epoch: 24   Global Step: 41710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:29,825-Speed 24760.99 samples/sec   Loss 1.7855   LearningRate 0.0002   Epoch: 24   Global Step: 41720   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:39,702-Speed 24885.83 samples/sec   Loss 1.7743   LearningRate 0.0002   Epoch: 24   Global Step: 41730   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:49,542-Speed 24979.98 samples/sec   Loss 1.8007   LearningRate 0.0002   Epoch: 24   Global Step: 41740   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:23:59,476-Speed 24749.48 samples/sec   Loss 1.7996   LearningRate 0.0002   Epoch: 24   Global Step: 41750   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:24:09,324-Speed 24957.48 samples/sec   Loss 1.8026   LearningRate 0.0002   Epoch: 24   Global Step: 41760   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:24:19,119-Speed 25094.61 samples/sec   Loss 1.8178   LearningRate 0.0002   Epoch: 24   Global Step: 41770   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:24:28,940-Speed 25027.80 samples/sec   Loss 1.8317   LearningRate 0.0002   Epoch: 24   Global Step: 41780   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:24:38,917-Speed 24635.14 samples/sec   Loss 1.7987   LearningRate 0.0002   Epoch: 24   Global Step: 41790   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:24:48,778-Speed 24925.12 samples/sec   Loss 1.8089   LearningRate 0.0002   Epoch: 24   Global Step: 41800   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:24:58,534-Speed 25195.14 samples/sec   Loss 1.8080   LearningRate 0.0002   Epoch: 24   Global Step: 41810   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:25:08,345-Speed 25052.40 samples/sec   Loss 1.8073   LearningRate 0.0002   Epoch: 24   Global Step: 41820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:25:18,169-Speed 25020.50 samples/sec   Loss 1.8271   LearningRate 0.0002   Epoch: 24   Global Step: 41830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:25:28,193-Speed 24521.53 samples/sec   Loss 1.8039   LearningRate 0.0002   Epoch: 24   Global Step: 41840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:25:37,931-Speed 25239.97 samples/sec   Loss 1.8124   LearningRate 0.0002   Epoch: 24   Global Step: 41850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:25:47,710-Speed 25134.71 samples/sec   Loss 1.8508   LearningRate 0.0002   Epoch: 24   Global Step: 41860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:25:57,486-Speed 25142.82 samples/sec   Loss 1.8186   LearningRate 0.0002   Epoch: 24   Global Step: 41870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:26:07,357-Speed 24907.54 samples/sec   Loss 1.8081   LearningRate 0.0002   Epoch: 24   Global Step: 41880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:26:17,103-Speed 25217.80 samples/sec   Loss 1.7998   LearningRate 0.0002   Epoch: 24   Global Step: 41890   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:26:26,892-Speed 25110.44 samples/sec   Loss 1.8012   LearningRate 0.0002   Epoch: 24   Global Step: 41900   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:26:36,830-Speed 24732.10 samples/sec   Loss 1.8106   LearningRate 0.0002   Epoch: 24   Global Step: 41910   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:26:46,668-Speed 24992.11 samples/sec   Loss 1.8037   LearningRate 0.0002   Epoch: 24   Global Step: 41920   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:26:56,569-Speed 24826.94 samples/sec   Loss 1.7841   LearningRate 0.0002   Epoch: 24   Global Step: 41930   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:27:06,345-Speed 25142.22 samples/sec   Loss 1.7924   LearningRate 0.0002   Epoch: 24   Global Step: 41940   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:27:16,138-Speed 25099.98 samples/sec   Loss 1.7833   LearningRate 0.0002   Epoch: 24   Global Step: 41950   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:27:25,920-Speed 25127.18 samples/sec   Loss 1.8067   LearningRate 0.0002   Epoch: 24   Global Step: 41960   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:27:35,811-Speed 24853.72 samples/sec   Loss 1.7806   LearningRate 0.0002   Epoch: 24   Global Step: 41970   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-26 10:27:45,684-Speed 24896.84 samples/sec   Loss 1.7914   LearningRate 0.0002   Epoch: 24   Global Step: 41980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:27:55,647-Speed 24672.21 samples/sec   Loss 1.8130   LearningRate 0.0002   Epoch: 24   Global Step: 41990   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:28:05,434-Speed 25115.06 samples/sec   Loss 1.7954   LearningRate 0.0002   Epoch: 24   Global Step: 42000   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:28:15,291-Speed 24933.85 samples/sec   Loss 1.8092   LearningRate 0.0002   Epoch: 24   Global Step: 42010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:28:25,149-Speed 24934.81 samples/sec   Loss 1.8024   LearningRate 0.0002   Epoch: 24   Global Step: 42020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:28:35,028-Speed 24881.28 samples/sec   Loss 1.7928   LearningRate 0.0002   Epoch: 24   Global Step: 42030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:28:44,873-Speed 24965.24 samples/sec   Loss 1.8078   LearningRate 0.0002   Epoch: 24   Global Step: 42040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:28:54,577-Speed 25331.52 samples/sec   Loss 1.7798   LearningRate 0.0002   Epoch: 24   Global Step: 42050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:29:04,389-Speed 25051.10 samples/sec   Loss 1.7896   LearningRate 0.0002   Epoch: 24   Global Step: 42060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:29:14,201-Speed 25052.47 samples/sec   Loss 1.8013   LearningRate 0.0002   Epoch: 24   Global Step: 42070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:29:24,045-Speed 24974.44 samples/sec   Loss 1.7983   LearningRate 0.0002   Epoch: 24   Global Step: 42080   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:29:33,887-Speed 24976.22 samples/sec   Loss 1.7928   LearningRate 0.0002   Epoch: 24   Global Step: 42090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:29:43,823-Speed 24736.44 samples/sec   Loss 1.8037   LearningRate 0.0002   Epoch: 24   Global Step: 42100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:29:53,745-Speed 24774.49 samples/sec   Loss 1.7958   LearningRate 0.0002   Epoch: 24   Global Step: 42110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:30:03,517-Speed 25152.30 samples/sec   Loss 1.8115   LearningRate 0.0002   Epoch: 24   Global Step: 42120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:30:13,291-Speed 25146.34 samples/sec   Loss 1.7803   LearningRate 0.0002   Epoch: 24   Global Step: 42130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:30:23,026-Speed 25245.22 samples/sec   Loss 1.8075   LearningRate 0.0002   Epoch: 24   Global Step: 42140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:30:32,784-Speed 25190.08 samples/sec   Loss 1.8037   LearningRate 0.0002   Epoch: 24   Global Step: 42150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:30:42,664-Speed 24879.36 samples/sec   Loss 1.8112   LearningRate 0.0002   Epoch: 24   Global Step: 42160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:30:52,398-Speed 25250.71 samples/sec   Loss 1.7968   LearningRate 0.0002   Epoch: 24   Global Step: 42170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:31:02,191-Speed 25099.33 samples/sec   Loss 1.7849   LearningRate 0.0002   Epoch: 24   Global Step: 42180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:31:11,937-Speed 25219.82 samples/sec   Loss 1.7659   LearningRate 0.0002   Epoch: 24   Global Step: 42190   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:31:21,714-Speed 25140.08 samples/sec   Loss 1.7931   LearningRate 0.0002   Epoch: 24   Global Step: 42200   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:31:31,529-Speed 25042.29 samples/sec   Loss 1.7918   LearningRate 0.0002   Epoch: 24   Global Step: 42210   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:31:41,423-Speed 24841.57 samples/sec   Loss 1.7836   LearningRate 0.0002   Epoch: 24   Global Step: 42220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:31:51,235-Speed 25051.88 samples/sec   Loss 1.7816   LearningRate 0.0002   Epoch: 24   Global Step: 42230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:01,095-Speed 24927.45 samples/sec   Loss 1.8196   LearningRate 0.0002   Epoch: 24   Global Step: 42240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:10,917-Speed 25023.37 samples/sec   Loss 1.7939   LearningRate 0.0002   Epoch: 24   Global Step: 42250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:20,691-Speed 25149.23 samples/sec   Loss 1.7642   LearningRate 0.0002   Epoch: 24   Global Step: 42260   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:30,406-Speed 25298.68 samples/sec   Loss 1.7667   LearningRate 0.0002   Epoch: 24   Global Step: 42270   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:40,245-Speed 24980.96 samples/sec   Loss 1.7792   LearningRate 0.0002   Epoch: 24   Global Step: 42280   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:50,024-Speed 25134.78 samples/sec   Loss 1.7827   LearningRate 0.0002   Epoch: 24   Global Step: 42290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:32:59,744-Speed 25287.00 samples/sec   Loss 1.7823   LearningRate 0.0002   Epoch: 24   Global Step: 42300   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:33:09,513-Speed 25162.21 samples/sec   Loss 1.7887   LearningRate 0.0002   Epoch: 24   Global Step: 42310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:33:19,321-Speed 25061.44 samples/sec   Loss 1.7873   LearningRate 0.0002   Epoch: 24   Global Step: 42320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:33:29,188-Speed 24910.26 samples/sec   Loss 1.7984   LearningRate 0.0002   Epoch: 24   Global Step: 42330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:33:39,069-Speed 24882.96 samples/sec   Loss 1.7755   LearningRate 0.0002   Epoch: 24   Global Step: 42340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:33:48,928-Speed 24930.01 samples/sec   Loss 1.7932   LearningRate 0.0002   Epoch: 24   Global Step: 42350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:33:58,710-Speed 25131.93 samples/sec   Loss 1.7971   LearningRate 0.0002   Epoch: 24   Global Step: 42360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:34:08,492-Speed 25125.29 samples/sec   Loss 1.7969   LearningRate 0.0002   Epoch: 24   Global Step: 42370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:34:18,236-Speed 25226.39 samples/sec   Loss 1.7779   LearningRate 0.0002   Epoch: 24   Global Step: 42380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:34:28,130-Speed 24843.64 samples/sec   Loss 1.7696   LearningRate 0.0002   Epoch: 24   Global Step: 42390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:34:37,950-Speed 25036.17 samples/sec   Loss 1.7617   LearningRate 0.0002   Epoch: 24   Global Step: 42400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:34:47,714-Speed 25172.08 samples/sec   Loss 1.7676   LearningRate 0.0002   Epoch: 24   Global Step: 42410   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-26 10:34:57,502-Speed 25118.85 samples/sec   Loss 1.7839   LearningRate 0.0002   Epoch: 24   Global Step: 42420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:35:07,296-Speed 25098.63 samples/sec   Loss 1.7908   LearningRate 0.0002   Epoch: 24   Global Step: 42430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:35:17,153-Speed 24933.85 samples/sec   Loss 1.7740   LearningRate 0.0002   Epoch: 24   Global Step: 42440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:35:27,018-Speed 24914.70 samples/sec   Loss 1.7760   LearningRate 0.0002   Epoch: 24   Global Step: 42450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:35:36,917-Speed 24829.43 samples/sec   Loss 1.8010   LearningRate 0.0002   Epoch: 24   Global Step: 42460   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:35:46,826-Speed 24804.63 samples/sec   Loss 1.7851   LearningRate 0.0002   Epoch: 24   Global Step: 42470   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:35:56,723-Speed 24836.97 samples/sec   Loss 1.7980   LearningRate 0.0002   Epoch: 24   Global Step: 42480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:36:06,575-Speed 24949.39 samples/sec   Loss 1.7699   LearningRate 0.0002   Epoch: 24   Global Step: 42490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:36:16,373-Speed 25085.07 samples/sec   Loss 1.7628   LearningRate 0.0002   Epoch: 24   Global Step: 42500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:36:26,232-Speed 24931.46 samples/sec   Loss 1.7694   LearningRate 0.0002   Epoch: 24   Global Step: 42510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:36:36,020-Speed 25118.79 samples/sec   Loss 1.7732   LearningRate 0.0002   Epoch: 24   Global Step: 42520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:36:45,834-Speed 25045.71 samples/sec   Loss 1.7610   LearningRate 0.0002   Epoch: 24   Global Step: 42530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:36:55,729-Speed 24839.39 samples/sec   Loss 1.7696   LearningRate 0.0002   Epoch: 24   Global Step: 42540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-26 10:37:05,927-Speed 24103.61 samples/sec   Loss 1.7769   LearningRate 0.0002   Epoch: 24   Global Step: 42550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:37:15,951-Speed 24519.27 samples/sec   Loss 1.7771   LearningRate 0.0002   Epoch: 24   Global Step: 42560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:37:25,860-Speed 24806.44 samples/sec   Loss 1.7990   LearningRate 0.0002   Epoch: 24   Global Step: 42570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:37:35,870-Speed 24555.06 samples/sec   Loss 1.7826   LearningRate 0.0002   Epoch: 24   Global Step: 42580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:37:45,998-Speed 24267.27 samples/sec   Loss 1.7693   LearningRate 0.0002   Epoch: 24   Global Step: 42590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:37:55,945-Speed 24709.09 samples/sec   Loss 1.7671   LearningRate 0.0002   Epoch: 24   Global Step: 42600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:38:05,929-Speed 24620.78 samples/sec   Loss 1.7539   LearningRate 0.0002   Epoch: 24   Global Step: 42610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:38:15,987-Speed 24436.95 samples/sec   Loss 1.7694   LearningRate 0.0002   Epoch: 24   Global Step: 42620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:38:26,065-Speed 24387.91 samples/sec   Loss 1.7660   LearningRate 0.0002   Epoch: 24   Global Step: 42630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:38:36,091-Speed 24516.44 samples/sec   Loss 1.7669   LearningRate 0.0002   Epoch: 24   Global Step: 42640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:38:45,960-Speed 24906.76 samples/sec   Loss 1.7758   LearningRate 0.0002   Epoch: 24   Global Step: 42650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:38:55,865-Speed 24816.09 samples/sec   Loss 1.7715   LearningRate 0.0002   Epoch: 24   Global Step: 42660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:39:05,769-Speed 24819.58 samples/sec   Loss 1.7848   LearningRate 0.0002   Epoch: 24   Global Step: 42670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:39:15,610-Speed 24977.35 samples/sec   Loss 1.7905   LearningRate 0.0002   Epoch: 24   Global Step: 42680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:39:25,411-Speed 25078.08 samples/sec   Loss 1.7757   LearningRate 0.0002   Epoch: 24   Global Step: 42690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:39:35,216-Speed 25069.98 samples/sec   Loss 1.7834   LearningRate 0.0002   Epoch: 24   Global Step: 42700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:39:45,043-Speed 25012.01 samples/sec   Loss 1.7866   LearningRate 0.0002   Epoch: 24   Global Step: 42710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:39:54,817-Speed 25148.94 samples/sec   Loss 1.7760   LearningRate 0.0002   Epoch: 24   Global Step: 42720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:40:04,640-Speed 25021.71 samples/sec   Loss 1.7899   LearningRate 0.0002   Epoch: 24   Global Step: 42730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:40:14,510-Speed 24904.29 samples/sec   Loss 1.7645   LearningRate 0.0002   Epoch: 24   Global Step: 42740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:40:24,302-Speed 25103.04 samples/sec   Loss 1.7672   LearningRate 0.0002   Epoch: 24   Global Step: 42750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:40:34,070-Speed 25164.93 samples/sec   Loss 1.7691   LearningRate 0.0002   Epoch: 24   Global Step: 42760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:40:43,840-Speed 25159.60 samples/sec   Loss 1.7627   LearningRate 0.0002   Epoch: 24   Global Step: 42770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:40:53,591-Speed 25205.61 samples/sec   Loss 1.7534   LearningRate 0.0002   Epoch: 24   Global Step: 42780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:41:03,311-Speed 25285.98 samples/sec   Loss 1.7538   LearningRate 0.0002   Epoch: 24   Global Step: 42790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:41:13,070-Speed 25188.10 samples/sec   Loss 1.7639   LearningRate 0.0002   Epoch: 24   Global Step: 42800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:41:22,851-Speed 25128.95 samples/sec   Loss 1.7787   LearningRate 0.0002   Epoch: 24   Global Step: 42810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:41:32,621-Speed 25157.13 samples/sec   Loss 1.7719   LearningRate 0.0002   Epoch: 24   Global Step: 42820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:41:42,495-Speed 24892.25 samples/sec   Loss 1.7732   LearningRate 0.0002   Epoch: 24   Global Step: 42830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:41:52,301-Speed 25066.77 samples/sec   Loss 1.7732   LearningRate 0.0002   Epoch: 24   Global Step: 42840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:42:02,083-Speed 25125.66 samples/sec   Loss 1.7663   LearningRate 0.0002   Epoch: 24   Global Step: 42850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:42:11,836-Speed 25203.64 samples/sec   Loss 1.7710   LearningRate 0.0002   Epoch: 24   Global Step: 42860   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:42:21,517-Speed 25388.17 samples/sec   Loss 1.7699   LearningRate 0.0002   Epoch: 24   Global Step: 42870   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:42:31,334-Speed 25050.75 samples/sec   Loss 1.7678   LearningRate 0.0002   Epoch: 24   Global Step: 42880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:42:41,177-Speed 24973.90 samples/sec   Loss 1.7523   LearningRate 0.0002   Epoch: 24   Global Step: 42890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:42:50,992-Speed 25041.53 samples/sec   Loss 1.7543   LearningRate 0.0002   Epoch: 24   Global Step: 42900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 10:43:00,798-Speed 25066.93 samples/sec   Loss 1.7624   LearningRate 0.0002   Epoch: 24   Global Step: 42910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:43:10,725-Speed 24760.07 samples/sec   Loss 1.7545   LearningRate 0.0002   Epoch: 24   Global Step: 42920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:43:20,599-Speed 24892.64 samples/sec   Loss 1.7638   LearningRate 0.0002   Epoch: 24   Global Step: 42930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:43:30,471-Speed 24898.33 samples/sec   Loss 1.7844   LearningRate 0.0002   Epoch: 24   Global Step: 42940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:43:40,247-Speed 25141.65 samples/sec   Loss 1.7564   LearningRate 0.0002   Epoch: 24   Global Step: 42950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:43:50,018-Speed 25159.74 samples/sec   Loss 1.7824   LearningRate 0.0002   Epoch: 24   Global Step: 42960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:43:59,766-Speed 25216.34 samples/sec   Loss 1.7607   LearningRate 0.0002   Epoch: 24   Global Step: 42970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:44:09,572-Speed 25064.68 samples/sec   Loss 1.7560   LearningRate 0.0002   Epoch: 24   Global Step: 42980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:44:19,315-Speed 25227.75 samples/sec   Loss 1.7702   LearningRate 0.0002   Epoch: 24   Global Step: 42990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:44:29,043-Speed 25265.52 samples/sec   Loss 1.7707   LearningRate 0.0002   Epoch: 24   Global Step: 43000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:44:38,806-Speed 25176.93 samples/sec   Loss 1.7702   LearningRate 0.0002   Epoch: 24   Global Step: 43010   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 10:44:48,664-Speed 24935.55 samples/sec   Loss 1.7653   LearningRate 0.0002   Epoch: 24   Global Step: 43020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:44:58,382-Speed 25291.80 samples/sec   Loss 1.7659   LearningRate 0.0002   Epoch: 24   Global Step: 43030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:45:08,213-Speed 25002.06 samples/sec   Loss 1.7639   LearningRate 0.0002   Epoch: 24   Global Step: 43040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:45:18,175-Speed 24673.68 samples/sec   Loss 1.8043   LearningRate 0.0002   Epoch: 24   Global Step: 43050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:45:27,999-Speed 25019.76 samples/sec   Loss 1.7683   LearningRate 0.0002   Epoch: 24   Global Step: 43060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:45:37,690-Speed 25362.15 samples/sec   Loss 1.7623   LearningRate 0.0002   Epoch: 24   Global Step: 43070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:45:47,515-Speed 25016.83 samples/sec   Loss 1.7470   LearningRate 0.0002   Epoch: 24   Global Step: 43080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:45:57,283-Speed 25162.26 samples/sec   Loss 1.7452   LearningRate 0.0002   Epoch: 24   Global Step: 43090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:46:07,132-Speed 24962.27 samples/sec   Loss 1.7597   LearningRate 0.0002   Epoch: 24   Global Step: 43100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:46:17,071-Speed 24728.61 samples/sec   Loss 1.7675   LearningRate 0.0002   Epoch: 24   Global Step: 43110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:46:26,836-Speed 25169.73 samples/sec   Loss 1.7837   LearningRate 0.0002   Epoch: 24   Global Step: 43120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:46:36,662-Speed 25018.56 samples/sec   Loss 1.7686   LearningRate 0.0002   Epoch: 24   Global Step: 43130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:46:46,665-Speed 24577.95 samples/sec   Loss 1.7602   LearningRate 0.0002   Epoch: 24   Global Step: 43140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:46:56,409-Speed 25224.33 samples/sec   Loss 1.7577   LearningRate 0.0002   Epoch: 24   Global Step: 43150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:47:06,188-Speed 25135.47 samples/sec   Loss 1.7399   LearningRate 0.0002   Epoch: 24   Global Step: 43160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:47:16,043-Speed 24942.90 samples/sec   Loss 1.7605   LearningRate 0.0002   Epoch: 24   Global Step: 43170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:47:25,963-Speed 24776.52 samples/sec   Loss 1.7646   LearningRate 0.0002   Epoch: 24   Global Step: 43180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:47:35,799-Speed 24988.58 samples/sec   Loss 1.7615   LearningRate 0.0002   Epoch: 24   Global Step: 43190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:47:45,662-Speed 24921.62 samples/sec   Loss 1.7789   LearningRate 0.0002   Epoch: 24   Global Step: 43200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:48:45,049-Speed 4138.40 samples/sec   Loss 1.7710   LearningRate 0.0002   Epoch: 25   Global Step: 43210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:48:55,074-Speed 24517.22 samples/sec   Loss 1.7387   LearningRate 0.0002   Epoch: 25   Global Step: 43220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:49:05,226-Speed 24221.79 samples/sec   Loss 1.7415   LearningRate 0.0002   Epoch: 25   Global Step: 43230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:49:15,321-Speed 24347.98 samples/sec   Loss 1.7497   LearningRate 0.0002   Epoch: 25   Global Step: 43240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:49:25,413-Speed 24354.73 samples/sec   Loss 1.7403   LearningRate 0.0002   Epoch: 25   Global Step: 43250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:49:35,562-Speed 24218.19 samples/sec   Loss 1.7490   LearningRate 0.0002   Epoch: 25   Global Step: 43260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:49:45,856-Speed 23877.48 samples/sec   Loss 1.7445   LearningRate 0.0002   Epoch: 25   Global Step: 43270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:49:55,996-Speed 24241.35 samples/sec   Loss 1.7444   LearningRate 0.0002   Epoch: 25   Global Step: 43280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:50:06,167-Speed 24165.39 samples/sec   Loss 1.7416   LearningRate 0.0002   Epoch: 25   Global Step: 43290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:50:15,940-Speed 25158.57 samples/sec   Loss 1.7476   LearningRate 0.0002   Epoch: 25   Global Step: 43300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:50:25,872-Speed 24749.58 samples/sec   Loss 1.7364   LearningRate 0.0002   Epoch: 25   Global Step: 43310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:50:35,777-Speed 24820.44 samples/sec   Loss 1.7313   LearningRate 0.0002   Epoch: 25   Global Step: 43320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:50:45,686-Speed 24806.00 samples/sec   Loss 1.7480   LearningRate 0.0002   Epoch: 25   Global Step: 43330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:50:55,434-Speed 25213.34 samples/sec   Loss 1.7428   LearningRate 0.0002   Epoch: 25   Global Step: 43340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:51:05,246-Speed 25049.68 samples/sec   Loss 1.7430   LearningRate 0.0002   Epoch: 25   Global Step: 43350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:51:15,068-Speed 25026.99 samples/sec   Loss 1.7398   LearningRate 0.0002   Epoch: 25   Global Step: 43360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:51:24,885-Speed 25038.72 samples/sec   Loss 1.7418   LearningRate 0.0002   Epoch: 25   Global Step: 43370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:51:34,635-Speed 25208.37 samples/sec   Loss 1.7255   LearningRate 0.0002   Epoch: 25   Global Step: 43380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:51:44,501-Speed 24915.18 samples/sec   Loss 1.7467   LearningRate 0.0002   Epoch: 25   Global Step: 43390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:51:54,362-Speed 24923.99 samples/sec   Loss 1.7589   LearningRate 0.0002   Epoch: 25   Global Step: 43400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:52:04,149-Speed 25114.49 samples/sec   Loss 1.7499   LearningRate 0.0002   Epoch: 25   Global Step: 43410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:52:13,978-Speed 25008.16 samples/sec   Loss 1.7594   LearningRate 0.0002   Epoch: 25   Global Step: 43420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:52:23,869-Speed 24848.31 samples/sec   Loss 1.7464   LearningRate 0.0002   Epoch: 25   Global Step: 43430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:52:33,693-Speed 25019.45 samples/sec   Loss 1.7341   LearningRate 0.0002   Epoch: 25   Global Step: 43440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:52:43,464-Speed 25154.42 samples/sec   Loss 1.7320   LearningRate 0.0002   Epoch: 25   Global Step: 43450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:52:53,257-Speed 25099.61 samples/sec   Loss 1.7315   LearningRate 0.0002   Epoch: 25   Global Step: 43460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:53:03,087-Speed 25004.98 samples/sec   Loss 1.7295   LearningRate 0.0002   Epoch: 25   Global Step: 43470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:53:13,031-Speed 24716.92 samples/sec   Loss 1.7374   LearningRate 0.0002   Epoch: 25   Global Step: 43480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:53:22,842-Speed 25051.68 samples/sec   Loss 1.7422   LearningRate 0.0002   Epoch: 25   Global Step: 43490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:53:32,618-Speed 25144.38 samples/sec   Loss 1.7361   LearningRate 0.0002   Epoch: 25   Global Step: 43500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:53:42,434-Speed 25040.34 samples/sec   Loss 1.7462   LearningRate 0.0002   Epoch: 25   Global Step: 43510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:53:52,145-Speed 25311.40 samples/sec   Loss 1.7570   LearningRate 0.0002   Epoch: 25   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 10:54:01,961-Speed 25045.73 samples/sec   Loss 1.7485   LearningRate 0.0002   Epoch: 25   Global Step: 43530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:54:11,826-Speed 24915.88 samples/sec   Loss 1.7462   LearningRate 0.0002   Epoch: 25   Global Step: 43540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:54:21,695-Speed 24905.79 samples/sec   Loss 1.7400   LearningRate 0.0002   Epoch: 25   Global Step: 43550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:54:31,620-Speed 24763.64 samples/sec   Loss 1.7404   LearningRate 0.0002   Epoch: 25   Global Step: 43560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:54:41,518-Speed 24831.86 samples/sec   Loss 1.7468   LearningRate 0.0002   Epoch: 25   Global Step: 43570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:54:51,309-Speed 25103.78 samples/sec   Loss 1.7339   LearningRate 0.0002   Epoch: 25   Global Step: 43580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:55:01,078-Speed 25158.50 samples/sec   Loss 1.7356   LearningRate 0.0002   Epoch: 25   Global Step: 43590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:55:10,892-Speed 25046.21 samples/sec   Loss 1.7364   LearningRate 0.0002   Epoch: 25   Global Step: 43600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:55:20,676-Speed 25120.57 samples/sec   Loss 1.7379   LearningRate 0.0002   Epoch: 25   Global Step: 43610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:55:30,549-Speed 24895.93 samples/sec   Loss 1.7468   LearningRate 0.0002   Epoch: 25   Global Step: 43620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:55:40,326-Speed 25142.49 samples/sec   Loss 1.7335   LearningRate 0.0002   Epoch: 25   Global Step: 43630   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 10:55:50,234-Speed 24806.38 samples/sec   Loss 1.7291   LearningRate 0.0002   Epoch: 25   Global Step: 43640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:55:59,993-Speed 25187.74 samples/sec   Loss 1.7293   LearningRate 0.0002   Epoch: 25   Global Step: 43650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:56:09,878-Speed 24863.15 samples/sec   Loss 1.7439   LearningRate 0.0002   Epoch: 25   Global Step: 43660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:56:19,678-Speed 25081.04 samples/sec   Loss 1.7325   LearningRate 0.0002   Epoch: 25   Global Step: 43670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:56:29,537-Speed 24931.88 samples/sec   Loss 1.7374   LearningRate 0.0002   Epoch: 25   Global Step: 43680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:56:39,446-Speed 24807.95 samples/sec   Loss 1.7336   LearningRate 0.0002   Epoch: 25   Global Step: 43690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:56:49,262-Speed 25037.29 samples/sec   Loss 1.7436   LearningRate 0.0002   Epoch: 25   Global Step: 43700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:56:59,145-Speed 24870.77 samples/sec   Loss 1.7511   LearningRate 0.0002   Epoch: 25   Global Step: 43710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:57:08,919-Speed 25147.80 samples/sec   Loss 1.7479   LearningRate 0.0002   Epoch: 25   Global Step: 43720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:57:18,673-Speed 25199.09 samples/sec   Loss 1.7311   LearningRate 0.0002   Epoch: 25   Global Step: 43730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:57:28,480-Speed 25063.79 samples/sec   Loss 1.7551   LearningRate 0.0002   Epoch: 25   Global Step: 43740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:57:38,376-Speed 24840.10 samples/sec   Loss 1.7392   LearningRate 0.0002   Epoch: 25   Global Step: 43750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:57:48,233-Speed 24934.75 samples/sec   Loss 1.7447   LearningRate 0.0002   Epoch: 25   Global Step: 43760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:57:58,009-Speed 25141.66 samples/sec   Loss 1.7322   LearningRate 0.0002   Epoch: 25   Global Step: 43770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:58:07,727-Speed 25298.71 samples/sec   Loss 1.7395   LearningRate 0.0002   Epoch: 25   Global Step: 43780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:58:17,474-Speed 25218.58 samples/sec   Loss 1.7322   LearningRate 0.0002   Epoch: 25   Global Step: 43790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:58:27,237-Speed 25185.26 samples/sec   Loss 1.7562   LearningRate 0.0002   Epoch: 25   Global Step: 43800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:58:37,051-Speed 25045.91 samples/sec   Loss 1.7332   LearningRate 0.0002   Epoch: 25   Global Step: 43810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:58:46,852-Speed 25078.67 samples/sec   Loss 1.7263   LearningRate 0.0002   Epoch: 25   Global Step: 43820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:58:56,732-Speed 24879.84 samples/sec   Loss 1.7247   LearningRate 0.0002   Epoch: 25   Global Step: 43830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:59:06,503-Speed 25153.63 samples/sec   Loss 1.7384   LearningRate 0.0002   Epoch: 25   Global Step: 43840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:59:16,303-Speed 25082.79 samples/sec   Loss 1.7342   LearningRate 0.0002   Epoch: 25   Global Step: 43850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:59:26,140-Speed 24989.40 samples/sec   Loss 1.7252   LearningRate 0.0002   Epoch: 25   Global Step: 43860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:59:36,051-Speed 24800.20 samples/sec   Loss 1.7361   LearningRate 0.0002   Epoch: 25   Global Step: 43870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:59:45,951-Speed 24826.58 samples/sec   Loss 1.7342   LearningRate 0.0002   Epoch: 25   Global Step: 43880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 10:59:55,727-Speed 25143.14 samples/sec   Loss 1.7310   LearningRate 0.0002   Epoch: 25   Global Step: 43890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:00:05,622-Speed 24841.96 samples/sec   Loss 1.7314   LearningRate 0.0002   Epoch: 25   Global Step: 43900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:00:15,486-Speed 24917.67 samples/sec   Loss 1.7345   LearningRate 0.0002   Epoch: 25   Global Step: 43910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:00:25,303-Speed 25038.34 samples/sec   Loss 1.7392   LearningRate 0.0002   Epoch: 25   Global Step: 43920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:00:35,067-Speed 25173.20 samples/sec   Loss 1.7708   LearningRate 0.0002   Epoch: 25   Global Step: 43930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:00:44,834-Speed 25164.76 samples/sec   Loss 1.7444   LearningRate 0.0002   Epoch: 25   Global Step: 43940   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:00:54,594-Speed 25183.35 samples/sec   Loss 1.7359   LearningRate 0.0002   Epoch: 25   Global Step: 43950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:01:04,408-Speed 25045.52 samples/sec   Loss 1.7287   LearningRate 0.0002   Epoch: 25   Global Step: 43960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:01:14,168-Speed 25182.78 samples/sec   Loss 1.7170   LearningRate 0.0002   Epoch: 25   Global Step: 43970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:01:23,907-Speed 25245.27 samples/sec   Loss 1.7138   LearningRate 0.0002   Epoch: 25   Global Step: 43980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:01:33,666-Speed 25188.21 samples/sec   Loss 1.7157   LearningRate 0.0002   Epoch: 25   Global Step: 43990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:01:43,456-Speed 25106.76 samples/sec   Loss 1.7356   LearningRate 0.0002   Epoch: 25   Global Step: 44000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:01:53,289-Speed 24997.08 samples/sec   Loss 1.7257   LearningRate 0.0002   Epoch: 25   Global Step: 44010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:02:03,192-Speed 24819.73 samples/sec   Loss 1.7250   LearningRate 0.0002   Epoch: 25   Global Step: 44020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:02:13,052-Speed 24929.04 samples/sec   Loss 1.7284   LearningRate 0.0002   Epoch: 25   Global Step: 44030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:02:22,846-Speed 25094.51 samples/sec   Loss 1.7154   LearningRate 0.0002   Epoch: 25   Global Step: 44040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:02:32,727-Speed 24876.26 samples/sec   Loss 1.7168   LearningRate 0.0002   Epoch: 25   Global Step: 44050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:02:42,545-Speed 25036.65 samples/sec   Loss 1.7129   LearningRate 0.0002   Epoch: 25   Global Step: 44060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:02:52,334-Speed 25106.41 samples/sec   Loss 1.6997   LearningRate 0.0002   Epoch: 25   Global Step: 44070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:03:02,206-Speed 24906.19 samples/sec   Loss 1.7075   LearningRate 0.0002   Epoch: 25   Global Step: 44080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:03:12,023-Speed 25037.10 samples/sec   Loss 1.7230   LearningRate 0.0002   Epoch: 25   Global Step: 44090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:03:21,775-Speed 25202.89 samples/sec   Loss 1.7290   LearningRate 0.0002   Epoch: 25   Global Step: 44100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:03:31,535-Speed 25191.29 samples/sec   Loss 1.7243   LearningRate 0.0002   Epoch: 25   Global Step: 44110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:03:41,433-Speed 24833.49 samples/sec   Loss 1.7156   LearningRate 0.0002   Epoch: 25   Global Step: 44120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:03:51,401-Speed 24663.71 samples/sec   Loss 1.7090   LearningRate 0.0002   Epoch: 25   Global Step: 44130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:04:01,313-Speed 24796.51 samples/sec   Loss 1.7176   LearningRate 0.0002   Epoch: 25   Global Step: 44140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:04:11,186-Speed 24896.07 samples/sec   Loss 1.7172   LearningRate 0.0002   Epoch: 25   Global Step: 44150   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:04:21,097-Speed 24798.92 samples/sec   Loss 1.7160   LearningRate 0.0002   Epoch: 25   Global Step: 44160   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:04:30,895-Speed 25085.24 samples/sec   Loss 1.7267   LearningRate 0.0002   Epoch: 25   Global Step: 44170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:04:40,794-Speed 24831.30 samples/sec   Loss 1.7204   LearningRate 0.0002   Epoch: 25   Global Step: 44180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:04:50,552-Speed 25187.70 samples/sec   Loss 1.7196   LearningRate 0.0002   Epoch: 25   Global Step: 44190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:00,292-Speed 25234.80 samples/sec   Loss 1.7141   LearningRate 0.0002   Epoch: 25   Global Step: 44200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:10,101-Speed 25057.78 samples/sec   Loss 1.7203   LearningRate 0.0002   Epoch: 25   Global Step: 44210   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:19,878-Speed 25140.64 samples/sec   Loss 1.7315   LearningRate 0.0002   Epoch: 25   Global Step: 44220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:29,772-Speed 24844.63 samples/sec   Loss 1.7247   LearningRate 0.0002   Epoch: 25   Global Step: 44230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:39,575-Speed 25075.61 samples/sec   Loss 1.7141   LearningRate 0.0002   Epoch: 25   Global Step: 44240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:49,488-Speed 24793.29 samples/sec   Loss 1.7113   LearningRate 0.0002   Epoch: 25   Global Step: 44250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:05:59,335-Speed 24967.63 samples/sec   Loss 1.7091   LearningRate 0.0002   Epoch: 25   Global Step: 44260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:06:09,234-Speed 24832.19 samples/sec   Loss 1.7169   LearningRate 0.0002   Epoch: 25   Global Step: 44270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:06:19,123-Speed 24856.38 samples/sec   Loss 1.7172   LearningRate 0.0002   Epoch: 25   Global Step: 44280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:06:28,867-Speed 25223.34 samples/sec   Loss 1.7129   LearningRate 0.0002   Epoch: 25   Global Step: 44290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:06:38,733-Speed 24912.72 samples/sec   Loss 1.7156   LearningRate 0.0002   Epoch: 25   Global Step: 44300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:06:48,644-Speed 24799.15 samples/sec   Loss 1.7144   LearningRate 0.0002   Epoch: 25   Global Step: 44310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:06:58,516-Speed 24898.86 samples/sec   Loss 1.7185   LearningRate 0.0002   Epoch: 25   Global Step: 44320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:07:08,373-Speed 24934.19 samples/sec   Loss 1.7064   LearningRate 0.0002   Epoch: 25   Global Step: 44330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:07:18,109-Speed 25246.97 samples/sec   Loss 1.7206   LearningRate 0.0002   Epoch: 25   Global Step: 44340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:07:28,010-Speed 24825.09 samples/sec   Loss 1.7189   LearningRate 0.0002   Epoch: 25   Global Step: 44350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:07:38,038-Speed 24511.16 samples/sec   Loss 1.7155   LearningRate 0.0002   Epoch: 25   Global Step: 44360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:07:47,831-Speed 25101.53 samples/sec   Loss 1.7014   LearningRate 0.0002   Epoch: 25   Global Step: 44370   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:07:57,745-Speed 24791.55 samples/sec   Loss 1.7131   LearningRate 0.0002   Epoch: 25   Global Step: 44380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:08:07,816-Speed 24405.19 samples/sec   Loss 1.7205   LearningRate 0.0002   Epoch: 25   Global Step: 44390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:08:17,900-Speed 24372.96 samples/sec   Loss 1.7108   LearningRate 0.0002   Epoch: 25   Global Step: 44400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:08:28,009-Speed 24315.13 samples/sec   Loss 1.7249   LearningRate 0.0002   Epoch: 25   Global Step: 44410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:08:38,122-Speed 24303.83 samples/sec   Loss 1.7232   LearningRate 0.0002   Epoch: 25   Global Step: 44420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:08:48,203-Speed 24381.36 samples/sec   Loss 1.7110   LearningRate 0.0002   Epoch: 25   Global Step: 44430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:08:58,290-Speed 24369.73 samples/sec   Loss 1.7059   LearningRate 0.0002   Epoch: 25   Global Step: 44440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:09:08,404-Speed 24302.30 samples/sec   Loss 1.7093   LearningRate 0.0002   Epoch: 25   Global Step: 44450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:09:18,493-Speed 24361.82 samples/sec   Loss 1.6872   LearningRate 0.0002   Epoch: 25   Global Step: 44460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:09:28,597-Speed 24323.64 samples/sec   Loss 1.6922   LearningRate 0.0002   Epoch: 25   Global Step: 44470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:09:38,675-Speed 24390.96 samples/sec   Loss 1.7059   LearningRate 0.0002   Epoch: 25   Global Step: 44480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:09:48,806-Speed 24262.91 samples/sec   Loss 1.7182   LearningRate 0.0002   Epoch: 25   Global Step: 44490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:09:58,936-Speed 24263.71 samples/sec   Loss 1.6978   LearningRate 0.0002   Epoch: 25   Global Step: 44500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:10:09,039-Speed 24328.31 samples/sec   Loss 1.7075   LearningRate 0.0002   Epoch: 25   Global Step: 44510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:10:19,124-Speed 24373.84 samples/sec   Loss 1.7175   LearningRate 0.0002   Epoch: 25   Global Step: 44520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:10:29,215-Speed 24356.44 samples/sec   Loss 1.7110   LearningRate 0.0002   Epoch: 25   Global Step: 44530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:10:39,339-Speed 24278.16 samples/sec   Loss 1.7098   LearningRate 0.0002   Epoch: 25   Global Step: 44540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:10:49,489-Speed 24216.26 samples/sec   Loss 1.6981   LearningRate 0.0002   Epoch: 25   Global Step: 44550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:10:59,711-Speed 24046.13 samples/sec   Loss 1.7010   LearningRate 0.0002   Epoch: 25   Global Step: 44560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:11:09,812-Speed 24331.85 samples/sec   Loss 1.7219   LearningRate 0.0002   Epoch: 25   Global Step: 44570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:11:19,960-Speed 24220.93 samples/sec   Loss 1.7057   LearningRate 0.0002   Epoch: 25   Global Step: 44580   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:11:30,064-Speed 24331.09 samples/sec   Loss 1.6970   LearningRate 0.0002   Epoch: 25   Global Step: 44590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:11:40,190-Speed 24274.89 samples/sec   Loss 1.7073   LearningRate 0.0002   Epoch: 25   Global Step: 44600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:11:50,107-Speed 24784.99 samples/sec   Loss 1.7033   LearningRate 0.0002   Epoch: 25   Global Step: 44610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:11:59,916-Speed 25059.66 samples/sec   Loss 1.7029   LearningRate 0.0002   Epoch: 25   Global Step: 44620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:12:09,682-Speed 25168.62 samples/sec   Loss 1.7033   LearningRate 0.0002   Epoch: 25   Global Step: 44630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:12:19,427-Speed 25221.55 samples/sec   Loss 1.6998   LearningRate 0.0002   Epoch: 25   Global Step: 44640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:12:29,227-Speed 25078.61 samples/sec   Loss 1.7030   LearningRate 0.0002   Epoch: 25   Global Step: 44650   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:12:38,967-Speed 25237.28 samples/sec   Loss 1.7107   LearningRate 0.0002   Epoch: 25   Global Step: 44660   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:12:48,879-Speed 24798.17 samples/sec   Loss 1.7109   LearningRate 0.0002   Epoch: 25   Global Step: 44670   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:12:58,686-Speed 25061.40 samples/sec   Loss 1.7034   LearningRate 0.0002   Epoch: 25   Global Step: 44680   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:13:08,494-Speed 25060.88 samples/sec   Loss 1.6905   LearningRate 0.0002   Epoch: 25   Global Step: 44690   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:13:18,296-Speed 25074.99 samples/sec   Loss 1.6956   LearningRate 0.0002   Epoch: 25   Global Step: 44700   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:13:28,055-Speed 25186.50 samples/sec   Loss 1.6844   LearningRate 0.0002   Epoch: 25   Global Step: 44710   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:13:37,813-Speed 25187.73 samples/sec   Loss 1.6969   LearningRate 0.0002   Epoch: 25   Global Step: 44720   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:13:47,615-Speed 25077.75 samples/sec   Loss 1.6916   LearningRate 0.0002   Epoch: 25   Global Step: 44730   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:13:57,365-Speed 25207.47 samples/sec   Loss 1.6875   LearningRate 0.0002   Epoch: 25   Global Step: 44740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:14:07,205-Speed 24980.02 samples/sec   Loss 1.6968   LearningRate 0.0002   Epoch: 25   Global Step: 44750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:14:17,042-Speed 24987.72 samples/sec   Loss 1.6937   LearningRate 0.0002   Epoch: 25   Global Step: 44760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:14:26,811-Speed 25160.55 samples/sec   Loss 1.6970   LearningRate 0.0002   Epoch: 25   Global Step: 44770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:14:36,546-Speed 25250.51 samples/sec   Loss 1.6948   LearningRate 0.0002   Epoch: 25   Global Step: 44780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:14:46,374-Speed 25011.35 samples/sec   Loss 1.6799   LearningRate 0.0002   Epoch: 25   Global Step: 44790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:14:56,129-Speed 25195.44 samples/sec   Loss 1.6898   LearningRate 0.0002   Epoch: 25   Global Step: 44800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:15:05,898-Speed 25161.32 samples/sec   Loss 1.7023   LearningRate 0.0002   Epoch: 25   Global Step: 44810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:15:15,650-Speed 25204.32 samples/sec   Loss 1.6946   LearningRate 0.0002   Epoch: 25   Global Step: 44820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:15:25,468-Speed 25039.23 samples/sec   Loss 1.6984   LearningRate 0.0002   Epoch: 25   Global Step: 44830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:15:35,226-Speed 25196.17 samples/sec   Loss 1.7074   LearningRate 0.0002   Epoch: 25   Global Step: 44840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:15:45,002-Speed 25142.02 samples/sec   Loss 1.6890   LearningRate 0.0002   Epoch: 25   Global Step: 44850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:15:54,833-Speed 25001.31 samples/sec   Loss 1.6842   LearningRate 0.0002   Epoch: 25   Global Step: 44860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:16:04,586-Speed 25202.74 samples/sec   Loss 1.6940   LearningRate 0.0002   Epoch: 25   Global Step: 44870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:16:14,315-Speed 25263.08 samples/sec   Loss 1.6948   LearningRate 0.0002   Epoch: 25   Global Step: 44880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:16:24,041-Speed 25272.75 samples/sec   Loss 1.6996   LearningRate 0.0002   Epoch: 25   Global Step: 44890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:16:33,715-Speed 25408.56 samples/sec   Loss 1.7020   LearningRate 0.0002   Epoch: 25   Global Step: 44900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:16:43,499-Speed 25122.38 samples/sec   Loss 1.7183   LearningRate 0.0002   Epoch: 25   Global Step: 44910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:16:53,366-Speed 24910.03 samples/sec   Loss 1.7138   LearningRate 0.0002   Epoch: 25   Global Step: 44920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:17:03,072-Speed 25331.48 samples/sec   Loss 1.6925   LearningRate 0.0002   Epoch: 25   Global Step: 44930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:18:03,268-Speed 4082.79 samples/sec   Loss 1.6847   LearningRate 0.0002   Epoch: 26   Global Step: 44940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:18:13,051-Speed 25124.77 samples/sec   Loss 1.6737   LearningRate 0.0002   Epoch: 26   Global Step: 44950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:18:22,749-Speed 25343.41 samples/sec   Loss 1.6880   LearningRate 0.0002   Epoch: 26   Global Step: 44960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:18:32,469-Speed 25287.61 samples/sec   Loss 1.6894   LearningRate 0.0002   Epoch: 26   Global Step: 44970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:18:42,246-Speed 25141.26 samples/sec   Loss 1.6993   LearningRate 0.0002   Epoch: 26   Global Step: 44980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:18:51,923-Speed 25400.56 samples/sec   Loss 1.6904   LearningRate 0.0002   Epoch: 26   Global Step: 44990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:19:01,652-Speed 25262.47 samples/sec   Loss 1.6859   LearningRate 0.0002   Epoch: 26   Global Step: 45000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:19:11,381-Speed 25266.68 samples/sec   Loss 1.6863   LearningRate 0.0002   Epoch: 26   Global Step: 45010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:19:21,191-Speed 25053.30 samples/sec   Loss 1.6910   LearningRate 0.0002   Epoch: 26   Global Step: 45020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:19:31,068-Speed 24885.78 samples/sec   Loss 1.7008   LearningRate 0.0001   Epoch: 26   Global Step: 45030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:19:40,889-Speed 25024.59 samples/sec   Loss 1.6758   LearningRate 0.0001   Epoch: 26   Global Step: 45040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:19:50,721-Speed 24999.36 samples/sec   Loss 1.6794   LearningRate 0.0001   Epoch: 26   Global Step: 45050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:20:00,574-Speed 24949.63 samples/sec   Loss 1.6895   LearningRate 0.0001   Epoch: 26   Global Step: 45060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:20:10,293-Speed 25290.35 samples/sec   Loss 1.6738   LearningRate 0.0001   Epoch: 26   Global Step: 45070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:20:20,027-Speed 25251.55 samples/sec   Loss 1.6837   LearningRate 0.0001   Epoch: 26   Global Step: 45080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:20:29,792-Speed 25170.41 samples/sec   Loss 1.6908   LearningRate 0.0001   Epoch: 26   Global Step: 45090   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:20:39,528-Speed 25247.11 samples/sec   Loss 1.6799   LearningRate 0.0001   Epoch: 26   Global Step: 45100   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:20:49,319-Speed 25105.09 samples/sec   Loss 1.6691   LearningRate 0.0001   Epoch: 26   Global Step: 45110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:20:59,053-Speed 25248.88 samples/sec   Loss 1.6888   LearningRate 0.0001   Epoch: 26   Global Step: 45120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:21:08,804-Speed 25206.34 samples/sec   Loss 1.6888   LearningRate 0.0001   Epoch: 26   Global Step: 45130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:21:18,504-Speed 25339.68 samples/sec   Loss 1.6843   LearningRate 0.0001   Epoch: 26   Global Step: 45140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:21:28,246-Speed 25229.98 samples/sec   Loss 1.6822   LearningRate 0.0001   Epoch: 26   Global Step: 45150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:21:37,981-Speed 25246.88 samples/sec   Loss 1.6833   LearningRate 0.0001   Epoch: 26   Global Step: 45160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:21:47,708-Speed 25266.61 samples/sec   Loss 1.6959   LearningRate 0.0001   Epoch: 26   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:21:57,426-Speed 25299.49 samples/sec   Loss 1.6967   LearningRate 0.0001   Epoch: 26   Global Step: 45180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:22:07,161-Speed 25249.05 samples/sec   Loss 1.6975   LearningRate 0.0001   Epoch: 26   Global Step: 45190   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:22:16,983-Speed 25024.49 samples/sec   Loss 1.6921   LearningRate 0.0001   Epoch: 26   Global Step: 45200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:22:26,744-Speed 25181.93 samples/sec   Loss 1.6585   LearningRate 0.0001   Epoch: 26   Global Step: 45210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:22:36,439-Speed 25350.87 samples/sec   Loss 1.6578   LearningRate 0.0001   Epoch: 26   Global Step: 45220   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:22:46,228-Speed 25107.47 samples/sec   Loss 1.6826   LearningRate 0.0001   Epoch: 26   Global Step: 45230   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:22:55,927-Speed 25341.03 samples/sec   Loss 1.6725   LearningRate 0.0001   Epoch: 26   Global Step: 45240   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:23:05,631-Speed 25328.79 samples/sec   Loss 1.6711   LearningRate 0.0001   Epoch: 26   Global Step: 45250   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:23:15,328-Speed 25348.16 samples/sec   Loss 1.6821   LearningRate 0.0001   Epoch: 26   Global Step: 45260   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:23:25,035-Speed 25318.75 samples/sec   Loss 1.6921   LearningRate 0.0001   Epoch: 26   Global Step: 45270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:23:34,733-Speed 25343.97 samples/sec   Loss 1.6941   LearningRate 0.0001   Epoch: 26   Global Step: 45280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:23:44,575-Speed 24975.44 samples/sec   Loss 1.6774   LearningRate 0.0001   Epoch: 26   Global Step: 45290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:23:54,333-Speed 25187.35 samples/sec   Loss 1.6912   LearningRate 0.0001   Epoch: 26   Global Step: 45300   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:24:04,034-Speed 25336.30 samples/sec   Loss 1.6938   LearningRate 0.0001   Epoch: 26   Global Step: 45310   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:24:13,805-Speed 25155.29 samples/sec   Loss 1.6889   LearningRate 0.0001   Epoch: 26   Global Step: 45320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:24:23,524-Speed 25288.62 samples/sec   Loss 1.6818   LearningRate 0.0001   Epoch: 26   Global Step: 45330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:24:33,295-Speed 25153.15 samples/sec   Loss 1.6850   LearningRate 0.0001   Epoch: 26   Global Step: 45340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:24:43,051-Speed 25193.36 samples/sec   Loss 1.6688   LearningRate 0.0001   Epoch: 26   Global Step: 45350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:24:52,868-Speed 25039.02 samples/sec   Loss 1.6820   LearningRate 0.0001   Epoch: 26   Global Step: 45360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:25:02,608-Speed 25234.16 samples/sec   Loss 1.6829   LearningRate 0.0001   Epoch: 26   Global Step: 45370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:25:12,355-Speed 25216.43 samples/sec   Loss 1.6759   LearningRate 0.0001   Epoch: 26   Global Step: 45380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:25:22,053-Speed 25343.78 samples/sec   Loss 1.6786   LearningRate 0.0001   Epoch: 26   Global Step: 45390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:25:31,734-Speed 25387.98 samples/sec   Loss 1.6681   LearningRate 0.0001   Epoch: 26   Global Step: 45400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:25:41,452-Speed 25290.64 samples/sec   Loss 1.6965   LearningRate 0.0001   Epoch: 26   Global Step: 45410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:25:51,168-Speed 25299.19 samples/sec   Loss 1.6729   LearningRate 0.0001   Epoch: 26   Global Step: 45420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:00,885-Speed 25293.62 samples/sec   Loss 1.6782   LearningRate 0.0001   Epoch: 26   Global Step: 45430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:10,686-Speed 25078.33 samples/sec   Loss 1.6820   LearningRate 0.0001   Epoch: 26   Global Step: 45440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:20,410-Speed 25276.30 samples/sec   Loss 1.6791   LearningRate 0.0001   Epoch: 26   Global Step: 45450   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:30,281-Speed 24902.61 samples/sec   Loss 1.6851   LearningRate 0.0001   Epoch: 26   Global Step: 45460   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:39,942-Speed 25439.99 samples/sec   Loss 1.6792   LearningRate 0.0001   Epoch: 26   Global Step: 45470   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:49,688-Speed 25218.44 samples/sec   Loss 1.6777   LearningRate 0.0001   Epoch: 26   Global Step: 45480   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:26:59,397-Speed 25316.26 samples/sec   Loss 1.6789   LearningRate 0.0001   Epoch: 26   Global Step: 45490   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:27:09,122-Speed 25272.21 samples/sec   Loss 1.6655   LearningRate 0.0001   Epoch: 26   Global Step: 45500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:27:18,893-Speed 25155.67 samples/sec   Loss 1.6717   LearningRate 0.0001   Epoch: 26   Global Step: 45510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:27:28,690-Speed 25087.96 samples/sec   Loss 1.6756   LearningRate 0.0001   Epoch: 26   Global Step: 45520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:27:38,484-Speed 25095.46 samples/sec   Loss 1.6727   LearningRate 0.0001   Epoch: 26   Global Step: 45530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:27:48,476-Speed 24597.75 samples/sec   Loss 1.6674   LearningRate 0.0001   Epoch: 26   Global Step: 45540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:27:58,527-Speed 24455.78 samples/sec   Loss 1.6683   LearningRate 0.0001   Epoch: 26   Global Step: 45550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:28:08,564-Speed 24487.17 samples/sec   Loss 1.6612   LearningRate 0.0001   Epoch: 26   Global Step: 45560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:28:18,587-Speed 24522.39 samples/sec   Loss 1.6666   LearningRate 0.0001   Epoch: 26   Global Step: 45570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:28:28,573-Speed 24613.14 samples/sec   Loss 1.6689   LearningRate 0.0001   Epoch: 26   Global Step: 45580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:28:38,638-Speed 24419.11 samples/sec   Loss 1.6763   LearningRate 0.0001   Epoch: 26   Global Step: 45590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:28:48,704-Speed 24417.34 samples/sec   Loss 1.6827   LearningRate 0.0001   Epoch: 26   Global Step: 45600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:28:58,711-Speed 24562.71 samples/sec   Loss 1.6732   LearningRate 0.0001   Epoch: 26   Global Step: 45610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:29:08,766-Speed 24443.30 samples/sec   Loss 1.6852   LearningRate 0.0001   Epoch: 26   Global Step: 45620   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:29:18,747-Speed 24624.42 samples/sec   Loss 1.6661   LearningRate 0.0001   Epoch: 26   Global Step: 45630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:29:28,799-Speed 24452.82 samples/sec   Loss 1.6700   LearningRate 0.0001   Epoch: 26   Global Step: 45640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:29:38,855-Speed 24441.25 samples/sec   Loss 1.6678   LearningRate 0.0001   Epoch: 26   Global Step: 45650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:29:48,817-Speed 24670.87 samples/sec   Loss 1.6582   LearningRate 0.0001   Epoch: 26   Global Step: 45660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:29:58,850-Speed 24497.90 samples/sec   Loss 1.6739   LearningRate 0.0001   Epoch: 26   Global Step: 45670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:30:08,879-Speed 24507.25 samples/sec   Loss 1.6606   LearningRate 0.0001   Epoch: 26   Global Step: 45680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:30:18,918-Speed 24484.24 samples/sec   Loss 1.6559   LearningRate 0.0001   Epoch: 26   Global Step: 45690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:30:28,887-Speed 24654.61 samples/sec   Loss 1.6685   LearningRate 0.0001   Epoch: 26   Global Step: 45700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:30:38,871-Speed 24617.60 samples/sec   Loss 1.6639   LearningRate 0.0001   Epoch: 26   Global Step: 45710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:30:48,890-Speed 24529.94 samples/sec   Loss 1.6717   LearningRate 0.0001   Epoch: 26   Global Step: 45720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:30:58,859-Speed 24655.55 samples/sec   Loss 1.6609   LearningRate 0.0001   Epoch: 26   Global Step: 45730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:31:08,864-Speed 24565.14 samples/sec   Loss 1.6712   LearningRate 0.0001   Epoch: 26   Global Step: 45740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:31:18,833-Speed 24654.98 samples/sec   Loss 1.6635   LearningRate 0.0001   Epoch: 26   Global Step: 45750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:31:28,830-Speed 24586.50 samples/sec   Loss 1.6671   LearningRate 0.0001   Epoch: 26   Global Step: 45760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:31:38,881-Speed 24460.13 samples/sec   Loss 1.6704   LearningRate 0.0001   Epoch: 26   Global Step: 45770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:31:48,861-Speed 24627.37 samples/sec   Loss 1.6606   LearningRate 0.0001   Epoch: 26   Global Step: 45780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:31:58,837-Speed 24636.58 samples/sec   Loss 1.6656   LearningRate 0.0001   Epoch: 26   Global Step: 45790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:32:08,879-Speed 24475.32 samples/sec   Loss 1.6651   LearningRate 0.0001   Epoch: 26   Global Step: 45800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:32:18,992-Speed 24305.08 samples/sec   Loss 1.6505   LearningRate 0.0001   Epoch: 26   Global Step: 45810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:32:29,039-Speed 24465.13 samples/sec   Loss 1.6567   LearningRate 0.0001   Epoch: 26   Global Step: 45820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:32:39,028-Speed 24606.28 samples/sec   Loss 1.6673   LearningRate 0.0001   Epoch: 26   Global Step: 45830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:32:49,131-Speed 24331.63 samples/sec   Loss 1.6728   LearningRate 0.0001   Epoch: 26   Global Step: 45840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:32:59,123-Speed 24604.24 samples/sec   Loss 1.6493   LearningRate 0.0001   Epoch: 26   Global Step: 45850   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:33:08,934-Speed 25053.24 samples/sec   Loss 1.6548   LearningRate 0.0001   Epoch: 26   Global Step: 45860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:33:18,654-Speed 25288.00 samples/sec   Loss 1.6662   LearningRate 0.0001   Epoch: 26   Global Step: 45870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:33:28,392-Speed 25239.45 samples/sec   Loss 1.6586   LearningRate 0.0001   Epoch: 26   Global Step: 45880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:33:38,075-Speed 25384.67 samples/sec   Loss 1.6646   LearningRate 0.0001   Epoch: 26   Global Step: 45890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:33:47,841-Speed 25169.80 samples/sec   Loss 1.6657   LearningRate 0.0001   Epoch: 26   Global Step: 45900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:33:57,598-Speed 25191.74 samples/sec   Loss 1.6729   LearningRate 0.0001   Epoch: 26   Global Step: 45910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:34:07,415-Speed 25037.52 samples/sec   Loss 1.6682   LearningRate 0.0001   Epoch: 26   Global Step: 45920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:34:17,187-Speed 25152.93 samples/sec   Loss 1.6727   LearningRate 0.0001   Epoch: 26   Global Step: 45930   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:34:27,002-Speed 25042.92 samples/sec   Loss 1.6502   LearningRate 0.0001   Epoch: 26   Global Step: 45940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:34:36,780-Speed 25137.57 samples/sec   Loss 1.6642   LearningRate 0.0001   Epoch: 26   Global Step: 45950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:34:46,719-Speed 24729.93 samples/sec   Loss 1.6589   LearningRate 0.0001   Epoch: 26   Global Step: 45960   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:34:56,917-Speed 24102.90 samples/sec   Loss 1.6577   LearningRate 0.0001   Epoch: 26   Global Step: 45970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:35:06,819-Speed 24823.00 samples/sec   Loss 1.6622   LearningRate 0.0001   Epoch: 26   Global Step: 45980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:35:16,720-Speed 24824.85 samples/sec   Loss 1.6406   LearningRate 0.0001   Epoch: 26   Global Step: 45990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:35:26,713-Speed 24595.32 samples/sec   Loss 1.6548   LearningRate 0.0001   Epoch: 26   Global Step: 46000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:35:36,745-Speed 24502.94 samples/sec   Loss 1.6588   LearningRate 0.0001   Epoch: 26   Global Step: 46010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:35:46,658-Speed 24793.17 samples/sec   Loss 1.6578   LearningRate 0.0001   Epoch: 26   Global Step: 46020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:35:56,605-Speed 24712.47 samples/sec   Loss 1.6690   LearningRate 0.0001   Epoch: 26   Global Step: 46030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:36:06,615-Speed 24555.12 samples/sec   Loss 1.6490   LearningRate 0.0001   Epoch: 26   Global Step: 46040   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-26 11:36:16,594-Speed 24629.13 samples/sec   Loss 1.6601   LearningRate 0.0001   Epoch: 26   Global Step: 46050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-26 11:36:26,495-Speed 24827.02 samples/sec   Loss 1.6671   LearningRate 0.0001   Epoch: 26   Global Step: 46060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:36:36,374-Speed 24882.09 samples/sec   Loss 1.6491   LearningRate 0.0001   Epoch: 26   Global Step: 46070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:36:46,357-Speed 24626.09 samples/sec   Loss 1.6465   LearningRate 0.0001   Epoch: 26   Global Step: 46080   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-26 11:36:56,347-Speed 24604.31 samples/sec   Loss 1.6625   LearningRate 0.0001   Epoch: 26   Global Step: 46090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:37:06,272-Speed 24764.77 samples/sec   Loss 1.6570   LearningRate 0.0001   Epoch: 26   Global Step: 46100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:37:16,190-Speed 24782.53 samples/sec   Loss 1.6464   LearningRate 0.0001   Epoch: 26   Global Step: 46110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:37:26,163-Speed 24645.75 samples/sec   Loss 1.6462   LearningRate 0.0001   Epoch: 26   Global Step: 46120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:37:36,125-Speed 24671.91 samples/sec   Loss 1.6526   LearningRate 0.0001   Epoch: 26   Global Step: 46130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:37:46,076-Speed 24702.00 samples/sec   Loss 1.6496   LearningRate 0.0001   Epoch: 26   Global Step: 46140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:37:56,019-Speed 24719.39 samples/sec   Loss 1.6567   LearningRate 0.0001   Epoch: 26   Global Step: 46150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:38:05,897-Speed 24882.59 samples/sec   Loss 1.6555   LearningRate 0.0001   Epoch: 26   Global Step: 46160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:38:15,887-Speed 24602.23 samples/sec   Loss 1.6552   LearningRate 0.0001   Epoch: 26   Global Step: 46170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:38:25,764-Speed 24884.78 samples/sec   Loss 1.6425   LearningRate 0.0001   Epoch: 26   Global Step: 46180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:38:35,640-Speed 24888.76 samples/sec   Loss 1.6439   LearningRate 0.0001   Epoch: 26   Global Step: 46190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:38:45,541-Speed 24825.39 samples/sec   Loss 1.6604   LearningRate 0.0001   Epoch: 26   Global Step: 46200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:38:55,608-Speed 24416.42 samples/sec   Loss 1.6685   LearningRate 0.0001   Epoch: 26   Global Step: 46210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:39:05,537-Speed 24754.09 samples/sec   Loss 1.6423   LearningRate 0.0001   Epoch: 26   Global Step: 46220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:39:15,500-Speed 24668.82 samples/sec   Loss 1.6413   LearningRate 0.0001   Epoch: 26   Global Step: 46230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:39:25,521-Speed 24529.83 samples/sec   Loss 1.6553   LearningRate 0.0001   Epoch: 26   Global Step: 46240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:39:35,386-Speed 24914.40 samples/sec   Loss 1.6470   LearningRate 0.0001   Epoch: 26   Global Step: 46250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:39:45,331-Speed 24715.82 samples/sec   Loss 1.6613   LearningRate 0.0001   Epoch: 26   Global Step: 46260   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 11:39:55,273-Speed 24722.80 samples/sec   Loss 1.6483   LearningRate 0.0001   Epoch: 26   Global Step: 46270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:40:05,283-Speed 24555.64 samples/sec   Loss 1.6534   LearningRate 0.0001   Epoch: 26   Global Step: 46280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:40:15,176-Speed 24845.00 samples/sec   Loss 1.6445   LearningRate 0.0001   Epoch: 26   Global Step: 46290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:40:25,041-Speed 24916.03 samples/sec   Loss 1.6504   LearningRate 0.0001   Epoch: 26   Global Step: 46300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:40:35,096-Speed 24444.91 samples/sec   Loss 1.6529   LearningRate 0.0001   Epoch: 26   Global Step: 46310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:40:44,980-Speed 24869.00 samples/sec   Loss 1.6403   LearningRate 0.0001   Epoch: 26   Global Step: 46320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:40:54,979-Speed 24582.59 samples/sec   Loss 1.6453   LearningRate 0.0001   Epoch: 26   Global Step: 46330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:41:04,856-Speed 24884.14 samples/sec   Loss 1.6447   LearningRate 0.0001   Epoch: 26   Global Step: 46340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:41:14,732-Speed 24888.30 samples/sec   Loss 1.6401   LearningRate 0.0001   Epoch: 26   Global Step: 46350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:41:24,654-Speed 24770.33 samples/sec   Loss 1.6419   LearningRate 0.0001   Epoch: 26   Global Step: 46360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:41:34,541-Speed 24860.98 samples/sec   Loss 1.6314   LearningRate 0.0001   Epoch: 26   Global Step: 46370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:41:44,459-Speed 24783.99 samples/sec   Loss 1.6532   LearningRate 0.0001   Epoch: 26   Global Step: 46380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:41:54,442-Speed 24619.81 samples/sec   Loss 1.6425   LearningRate 0.0001   Epoch: 26   Global Step: 46390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:42:04,419-Speed 24635.88 samples/sec   Loss 1.6590   LearningRate 0.0001   Epoch: 26   Global Step: 46400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:42:14,363-Speed 24718.03 samples/sec   Loss 1.6391   LearningRate 0.0001   Epoch: 26   Global Step: 46410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:42:24,280-Speed 24784.77 samples/sec   Loss 1.6376   LearningRate 0.0001   Epoch: 26   Global Step: 46420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:42:34,200-Speed 24777.08 samples/sec   Loss 1.6395   LearningRate 0.0001   Epoch: 26   Global Step: 46430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:42:44,153-Speed 24693.47 samples/sec   Loss 1.6514   LearningRate 0.0001   Epoch: 26   Global Step: 46440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:42:54,063-Speed 24802.58 samples/sec   Loss 1.6383   LearningRate 0.0001   Epoch: 26   Global Step: 46450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:43:03,994-Speed 24754.99 samples/sec   Loss 1.6485   LearningRate 0.0001   Epoch: 26   Global Step: 46460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:43:14,019-Speed 24519.13 samples/sec   Loss 1.6515   LearningRate 0.0001   Epoch: 26   Global Step: 46470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:43:23,963-Speed 24718.20 samples/sec   Loss 1.6493   LearningRate 0.0001   Epoch: 26   Global Step: 46480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:43:34,148-Speed 24131.98 samples/sec   Loss 1.6437   LearningRate 0.0001   Epoch: 26   Global Step: 46490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:43:43,954-Speed 25065.33 samples/sec   Loss 1.6405   LearningRate 0.0001   Epoch: 26   Global Step: 46500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:43:53,692-Speed 25240.35 samples/sec   Loss 1.6482   LearningRate 0.0001   Epoch: 26   Global Step: 46510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:44:03,505-Speed 25047.32 samples/sec   Loss 1.6383   LearningRate 0.0001   Epoch: 26   Global Step: 46520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:44:13,210-Speed 25326.99 samples/sec   Loss 1.6352   LearningRate 0.0001   Epoch: 26   Global Step: 46530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:44:22,961-Speed 25206.34 samples/sec   Loss 1.6328   LearningRate 0.0001   Epoch: 26   Global Step: 46540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:44:32,729-Speed 25164.03 samples/sec   Loss 1.6320   LearningRate 0.0001   Epoch: 26   Global Step: 46550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:44:42,464-Speed 25245.43 samples/sec   Loss 1.6433   LearningRate 0.0001   Epoch: 26   Global Step: 46560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:44:52,290-Speed 25017.37 samples/sec   Loss 1.6536   LearningRate 0.0001   Epoch: 26   Global Step: 46570   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 11:45:02,254-Speed 24669.51 samples/sec   Loss 1.6515   LearningRate 0.0001   Epoch: 26   Global Step: 46580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:45:12,198-Speed 24715.96 samples/sec   Loss 1.6417   LearningRate 0.0001   Epoch: 26   Global Step: 46590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:45:22,131-Speed 24745.88 samples/sec   Loss 1.6458   LearningRate 0.0001   Epoch: 26   Global Step: 46600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:45:32,023-Speed 24847.04 samples/sec   Loss 1.6464   LearningRate 0.0001   Epoch: 26   Global Step: 46610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:45:41,945-Speed 24774.54 samples/sec   Loss 1.6530   LearningRate 0.0001   Epoch: 26   Global Step: 46620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:45:51,888-Speed 24720.90 samples/sec   Loss 1.6449   LearningRate 0.0001   Epoch: 26   Global Step: 46630   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:46:01,814-Speed 24761.91 samples/sec   Loss 1.6481   LearningRate 0.0001   Epoch: 26   Global Step: 46640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:46:11,724-Speed 24803.05 samples/sec   Loss 1.6426   LearningRate 0.0001   Epoch: 26   Global Step: 46650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:46:21,658-Speed 24740.54 samples/sec   Loss 1.6314   LearningRate 0.0001   Epoch: 26   Global Step: 46660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:47:21,087-Speed 4135.45 samples/sec   Loss 1.6455   LearningRate 0.0001   Epoch: 27   Global Step: 46670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:47:30,807-Speed 25289.16 samples/sec   Loss 1.6234   LearningRate 0.0001   Epoch: 27   Global Step: 46680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:47:40,672-Speed 24913.97 samples/sec   Loss 1.6218   LearningRate 0.0001   Epoch: 27   Global Step: 46690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:47:50,821-Speed 24217.37 samples/sec   Loss 1.6430   LearningRate 0.0001   Epoch: 27   Global Step: 46700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:48:00,782-Speed 24675.54 samples/sec   Loss 1.6321   LearningRate 0.0001   Epoch: 27   Global Step: 46710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:48:10,702-Speed 24778.33 samples/sec   Loss 1.6279   LearningRate 0.0001   Epoch: 27   Global Step: 46720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:48:20,724-Speed 24524.67 samples/sec   Loss 1.6175   LearningRate 0.0001   Epoch: 27   Global Step: 46730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:48:30,735-Speed 24551.72 samples/sec   Loss 1.6221   LearningRate 0.0001   Epoch: 27   Global Step: 46740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:48:40,768-Speed 24499.66 samples/sec   Loss 1.6227   LearningRate 0.0001   Epoch: 27   Global Step: 46750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:48:50,822-Speed 24445.66 samples/sec   Loss 1.6339   LearningRate 0.0001   Epoch: 27   Global Step: 46760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:49:00,865-Speed 24473.81 samples/sec   Loss 1.6465   LearningRate 0.0001   Epoch: 27   Global Step: 46770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:49:10,974-Speed 24316.40 samples/sec   Loss 1.6339   LearningRate 0.0001   Epoch: 27   Global Step: 46780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:49:20,881-Speed 24811.15 samples/sec   Loss 1.6295   LearningRate 0.0001   Epoch: 27   Global Step: 46790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:49:30,971-Speed 24357.66 samples/sec   Loss 1.6297   LearningRate 0.0001   Epoch: 27   Global Step: 46800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:49:41,130-Speed 24198.46 samples/sec   Loss 1.6328   LearningRate 0.0001   Epoch: 27   Global Step: 46810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:49:51,038-Speed 24807.00 samples/sec   Loss 1.6321   LearningRate 0.0001   Epoch: 27   Global Step: 46820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:50:00,998-Speed 24678.26 samples/sec   Loss 1.6310   LearningRate 0.0001   Epoch: 27   Global Step: 46830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:50:11,098-Speed 24339.20 samples/sec   Loss 1.6321   LearningRate 0.0001   Epoch: 27   Global Step: 46840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:50:21,205-Speed 24319.39 samples/sec   Loss 1.6208   LearningRate 0.0001   Epoch: 27   Global Step: 46850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:50:31,332-Speed 24269.76 samples/sec   Loss 1.6340   LearningRate 0.0001   Epoch: 27   Global Step: 46860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:50:41,330-Speed 24584.32 samples/sec   Loss 1.6302   LearningRate 0.0001   Epoch: 27   Global Step: 46870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:50:51,451-Speed 24287.60 samples/sec   Loss 1.6497   LearningRate 0.0001   Epoch: 27   Global Step: 46880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:51:01,427-Speed 24639.43 samples/sec   Loss 1.6224   LearningRate 0.0001   Epoch: 27   Global Step: 46890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:51:11,539-Speed 24306.62 samples/sec   Loss 1.6228   LearningRate 0.0001   Epoch: 27   Global Step: 46900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:51:21,579-Speed 24481.09 samples/sec   Loss 1.6169   LearningRate 0.0001   Epoch: 27   Global Step: 46910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:51:31,518-Speed 24728.87 samples/sec   Loss 1.6330   LearningRate 0.0001   Epoch: 27   Global Step: 46920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:51:41,673-Speed 24204.34 samples/sec   Loss 1.6180   LearningRate 0.0001   Epoch: 27   Global Step: 46930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:51:51,737-Speed 24422.64 samples/sec   Loss 1.6419   LearningRate 0.0001   Epoch: 27   Global Step: 46940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:52:01,962-Speed 24037.60 samples/sec   Loss 1.6300   LearningRate 0.0001   Epoch: 27   Global Step: 46950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:52:12,037-Speed 24394.46 samples/sec   Loss 1.6179   LearningRate 0.0001   Epoch: 27   Global Step: 46960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:52:22,126-Speed 24363.68 samples/sec   Loss 1.6330   LearningRate 0.0001   Epoch: 27   Global Step: 46970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:52:32,020-Speed 24842.27 samples/sec   Loss 1.6225   LearningRate 0.0001   Epoch: 27   Global Step: 46980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:52:42,088-Speed 24415.21 samples/sec   Loss 1.6277   LearningRate 0.0001   Epoch: 27   Global Step: 46990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:52:52,085-Speed 24587.94 samples/sec   Loss 1.6313   LearningRate 0.0001   Epoch: 27   Global Step: 47000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:53:02,145-Speed 24433.10 samples/sec   Loss 1.6434   LearningRate 0.0001   Epoch: 27   Global Step: 47010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:53:12,133-Speed 24609.70 samples/sec   Loss 1.6239   LearningRate 0.0001   Epoch: 27   Global Step: 47020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:53:22,289-Speed 24202.50 samples/sec   Loss 1.6245   LearningRate 0.0001   Epoch: 27   Global Step: 47030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:53:32,404-Speed 24299.62 samples/sec   Loss 1.6355   LearningRate 0.0001   Epoch: 27   Global Step: 47040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:53:42,387-Speed 24623.01 samples/sec   Loss 1.6188   LearningRate 0.0001   Epoch: 27   Global Step: 47050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:53:52,261-Speed 24892.31 samples/sec   Loss 1.6227   LearningRate 0.0001   Epoch: 27   Global Step: 47060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:54:02,197-Speed 24736.37 samples/sec   Loss 1.6387   LearningRate 0.0001   Epoch: 27   Global Step: 47070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:54:12,153-Speed 24689.91 samples/sec   Loss 1.6109   LearningRate 0.0001   Epoch: 27   Global Step: 47080   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 11:54:22,146-Speed 24599.21 samples/sec   Loss 1.6169   LearningRate 0.0001   Epoch: 27   Global Step: 47090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:54:32,204-Speed 24436.11 samples/sec   Loss 1.6302   LearningRate 0.0001   Epoch: 27   Global Step: 47100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:54:42,273-Speed 24411.30 samples/sec   Loss 1.6311   LearningRate 0.0001   Epoch: 27   Global Step: 47110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:54:52,384-Speed 24310.12 samples/sec   Loss 1.6270   LearningRate 0.0001   Epoch: 27   Global Step: 47120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:55:02,423-Speed 24485.35 samples/sec   Loss 1.6323   LearningRate 0.0001   Epoch: 27   Global Step: 47130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:55:12,431-Speed 24559.98 samples/sec   Loss 1.6146   LearningRate 0.0001   Epoch: 27   Global Step: 47140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:55:22,411-Speed 24628.62 samples/sec   Loss 1.6207   LearningRate 0.0001   Epoch: 27   Global Step: 47150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:55:32,489-Speed 24388.33 samples/sec   Loss 1.6256   LearningRate 0.0001   Epoch: 27   Global Step: 47160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:55:42,504-Speed 24541.79 samples/sec   Loss 1.6145   LearningRate 0.0001   Epoch: 27   Global Step: 47170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:55:52,484-Speed 24631.29 samples/sec   Loss 1.6196   LearningRate 0.0001   Epoch: 27   Global Step: 47180   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:56:02,596-Speed 24306.02 samples/sec   Loss 1.6165   LearningRate 0.0001   Epoch: 27   Global Step: 47190   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:56:12,655-Speed 24435.84 samples/sec   Loss 1.6109   LearningRate 0.0001   Epoch: 27   Global Step: 47200   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:56:22,703-Speed 24461.14 samples/sec   Loss 1.6252   LearningRate 0.0001   Epoch: 27   Global Step: 47210   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 11:56:32,656-Speed 24695.71 samples/sec   Loss 1.6059   LearningRate 0.0001   Epoch: 27   Global Step: 47220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:56:42,588-Speed 24747.31 samples/sec   Loss 1.6324   LearningRate 0.0001   Epoch: 27   Global Step: 47230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:56:52,585-Speed 24587.02 samples/sec   Loss 1.6173   LearningRate 0.0001   Epoch: 27   Global Step: 47240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:57:02,657-Speed 24402.00 samples/sec   Loss 1.6283   LearningRate 0.0001   Epoch: 27   Global Step: 47250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:57:12,400-Speed 25227.13 samples/sec   Loss 1.6239   LearningRate 0.0001   Epoch: 27   Global Step: 47260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:57:22,283-Speed 24869.66 samples/sec   Loss 1.6068   LearningRate 0.0001   Epoch: 27   Global Step: 47270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:57:32,044-Speed 25181.69 samples/sec   Loss 1.6157   LearningRate 0.0001   Epoch: 27   Global Step: 47280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:57:41,890-Speed 24963.61 samples/sec   Loss 1.6222   LearningRate 0.0001   Epoch: 27   Global Step: 47290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:57:51,758-Speed 24908.45 samples/sec   Loss 1.6128   LearningRate 0.0001   Epoch: 27   Global Step: 47300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:58:01,494-Speed 25246.25 samples/sec   Loss 1.6147   LearningRate 0.0001   Epoch: 27   Global Step: 47310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:58:11,269-Speed 25145.03 samples/sec   Loss 1.6149   LearningRate 0.0001   Epoch: 27   Global Step: 47320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:58:21,050-Speed 25137.26 samples/sec   Loss 1.6210   LearningRate 0.0001   Epoch: 27   Global Step: 47330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:58:30,820-Speed 25160.18 samples/sec   Loss 1.6144   LearningRate 0.0001   Epoch: 27   Global Step: 47340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:58:40,615-Speed 25094.49 samples/sec   Loss 1.6184   LearningRate 0.0001   Epoch: 27   Global Step: 47350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:58:50,376-Speed 25181.01 samples/sec   Loss 1.6072   LearningRate 0.0001   Epoch: 27   Global Step: 47360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:00,232-Speed 24937.70 samples/sec   Loss 1.6125   LearningRate 0.0001   Epoch: 27   Global Step: 47370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:10,029-Speed 25087.84 samples/sec   Loss 1.6128   LearningRate 0.0001   Epoch: 27   Global Step: 47380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:19,863-Speed 24994.12 samples/sec   Loss 1.6040   LearningRate 0.0001   Epoch: 27   Global Step: 47390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:29,726-Speed 24920.52 samples/sec   Loss 1.6186   LearningRate 0.0001   Epoch: 27   Global Step: 47400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:39,471-Speed 25223.06 samples/sec   Loss 1.6127   LearningRate 0.0001   Epoch: 27   Global Step: 47410   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:49,271-Speed 25080.34 samples/sec   Loss 1.6178   LearningRate 0.0001   Epoch: 27   Global Step: 47420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 11:59:58,978-Speed 25323.54 samples/sec   Loss 1.6139   LearningRate 0.0001   Epoch: 27   Global Step: 47430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:00:08,744-Speed 25166.57 samples/sec   Loss 1.6040   LearningRate 0.0001   Epoch: 27   Global Step: 47440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:00:18,565-Speed 25028.00 samples/sec   Loss 1.5972   LearningRate 0.0001   Epoch: 27   Global Step: 47450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:00:28,517-Speed 24698.51 samples/sec   Loss 1.6131   LearningRate 0.0001   Epoch: 27   Global Step: 47460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:00:38,436-Speed 24779.97 samples/sec   Loss 1.6099   LearningRate 0.0001   Epoch: 27   Global Step: 47470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:00:48,255-Speed 25034.40 samples/sec   Loss 1.6105   LearningRate 0.0001   Epoch: 27   Global Step: 47480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:00:58,233-Speed 24631.92 samples/sec   Loss 1.6110   LearningRate 0.0001   Epoch: 27   Global Step: 47490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:01:08,091-Speed 24936.12 samples/sec   Loss 1.6014   LearningRate 0.0001   Epoch: 27   Global Step: 47500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:01:17,892-Speed 25076.94 samples/sec   Loss 1.6172   LearningRate 0.0001   Epoch: 27   Global Step: 47510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:01:27,846-Speed 24693.46 samples/sec   Loss 1.6013   LearningRate 0.0001   Epoch: 27   Global Step: 47520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:01:37,708-Speed 24924.77 samples/sec   Loss 1.5993   LearningRate 0.0001   Epoch: 27   Global Step: 47530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:01:47,497-Speed 25108.86 samples/sec   Loss 1.6109   LearningRate 0.0001   Epoch: 27   Global Step: 47540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:01:57,302-Speed 25066.96 samples/sec   Loss 1.6069   LearningRate 0.0001   Epoch: 27   Global Step: 47550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:02:07,098-Speed 25093.25 samples/sec   Loss 1.6064   LearningRate 0.0001   Epoch: 27   Global Step: 47560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:02:16,970-Speed 24897.24 samples/sec   Loss 1.6161   LearningRate 0.0001   Epoch: 27   Global Step: 47570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:02:26,930-Speed 24676.37 samples/sec   Loss 1.6174   LearningRate 0.0001   Epoch: 27   Global Step: 47580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:02:36,805-Speed 24890.09 samples/sec   Loss 1.6178   LearningRate 0.0001   Epoch: 27   Global Step: 47590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:02:46,689-Speed 24866.50 samples/sec   Loss 1.5927   LearningRate 0.0001   Epoch: 27   Global Step: 47600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:02:56,600-Speed 24802.07 samples/sec   Loss 1.5951   LearningRate 0.0001   Epoch: 27   Global Step: 47610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:03:06,419-Speed 25030.93 samples/sec   Loss 1.6154   LearningRate 0.0001   Epoch: 27   Global Step: 47620   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 12:03:16,149-Speed 25262.41 samples/sec   Loss 1.6009   LearningRate 0.0001   Epoch: 27   Global Step: 47630   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 12:03:25,944-Speed 25093.86 samples/sec   Loss 1.6104   LearningRate 0.0001   Epoch: 27   Global Step: 47640   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:03:35,704-Speed 25183.16 samples/sec   Loss 1.6104   LearningRate 0.0001   Epoch: 27   Global Step: 47650   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:03:45,503-Speed 25084.83 samples/sec   Loss 1.6129   LearningRate 0.0001   Epoch: 27   Global Step: 47660   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:03:55,327-Speed 25017.67 samples/sec   Loss 1.6027   LearningRate 0.0001   Epoch: 27   Global Step: 47670   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:04:05,042-Speed 25301.14 samples/sec   Loss 1.6194   LearningRate 0.0001   Epoch: 27   Global Step: 47680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:04:14,899-Speed 24934.56 samples/sec   Loss 1.6036   LearningRate 0.0001   Epoch: 27   Global Step: 47690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:04:24,677-Speed 25137.73 samples/sec   Loss 1.5942   LearningRate 0.0001   Epoch: 27   Global Step: 47700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:04:34,416-Speed 25238.31 samples/sec   Loss 1.6061   LearningRate 0.0001   Epoch: 27   Global Step: 47710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:04:44,211-Speed 25092.83 samples/sec   Loss 1.5987   LearningRate 0.0001   Epoch: 27   Global Step: 47720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:04:54,051-Speed 24981.23 samples/sec   Loss 1.5993   LearningRate 0.0001   Epoch: 27   Global Step: 47730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:05:03,880-Speed 25005.43 samples/sec   Loss 1.5906   LearningRate 0.0001   Epoch: 27   Global Step: 47740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:05:13,692-Speed 25050.95 samples/sec   Loss 1.5939   LearningRate 0.0001   Epoch: 27   Global Step: 47750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:05:23,419-Speed 25269.02 samples/sec   Loss 1.5983   LearningRate 0.0001   Epoch: 27   Global Step: 47760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:05:33,138-Speed 25287.72 samples/sec   Loss 1.6038   LearningRate 0.0001   Epoch: 27   Global Step: 47770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:05:42,952-Speed 25046.53 samples/sec   Loss 1.6066   LearningRate 0.0001   Epoch: 27   Global Step: 47780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:05:52,663-Speed 25316.54 samples/sec   Loss 1.6017   LearningRate 0.0001   Epoch: 27   Global Step: 47790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:06:02,355-Speed 25359.86 samples/sec   Loss 1.5992   LearningRate 0.0001   Epoch: 27   Global Step: 47800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:06:12,049-Speed 25356.30 samples/sec   Loss 1.6006   LearningRate 0.0001   Epoch: 27   Global Step: 47810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:06:21,762-Speed 25305.95 samples/sec   Loss 1.5984   LearningRate 0.0001   Epoch: 27   Global Step: 47820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:06:31,570-Speed 25060.53 samples/sec   Loss 1.6062   LearningRate 0.0001   Epoch: 27   Global Step: 47830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:06:41,294-Speed 25275.23 samples/sec   Loss 1.6037   LearningRate 0.0001   Epoch: 27   Global Step: 47840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:06:51,120-Speed 25016.30 samples/sec   Loss 1.5947   LearningRate 0.0001   Epoch: 27   Global Step: 47850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:00,913-Speed 25098.68 samples/sec   Loss 1.5821   LearningRate 0.0001   Epoch: 27   Global Step: 47860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:10,642-Speed 25263.29 samples/sec   Loss 1.5925   LearningRate 0.0001   Epoch: 27   Global Step: 47870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:20,419-Speed 25139.82 samples/sec   Loss 1.6015   LearningRate 0.0001   Epoch: 27   Global Step: 47880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:30,162-Speed 25227.40 samples/sec   Loss 1.5987   LearningRate 0.0001   Epoch: 27   Global Step: 47890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:39,910-Speed 25215.68 samples/sec   Loss 1.5988   LearningRate 0.0001   Epoch: 27   Global Step: 47900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:49,627-Speed 25293.74 samples/sec   Loss 1.5901   LearningRate 0.0001   Epoch: 27   Global Step: 47910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:07:59,360-Speed 25255.77 samples/sec   Loss 1.5953   LearningRate 0.0001   Epoch: 27   Global Step: 47920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:08:09,160-Speed 25080.86 samples/sec   Loss 1.6057   LearningRate 0.0001   Epoch: 27   Global Step: 47930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:08:18,956-Speed 25089.43 samples/sec   Loss 1.5954   LearningRate 0.0001   Epoch: 27   Global Step: 47940   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 12:08:28,811-Speed 24941.95 samples/sec   Loss 1.5924   LearningRate 0.0001   Epoch: 27   Global Step: 47950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:08:38,583-Speed 25154.84 samples/sec   Loss 1.6050   LearningRate 0.0001   Epoch: 27   Global Step: 47960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:08:48,434-Speed 24949.14 samples/sec   Loss 1.5996   LearningRate 0.0001   Epoch: 27   Global Step: 47970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:08:58,226-Speed 25105.74 samples/sec   Loss 1.5954   LearningRate 0.0001   Epoch: 27   Global Step: 47980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:09:08,017-Speed 25104.23 samples/sec   Loss 1.5762   LearningRate 0.0001   Epoch: 27   Global Step: 47990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:09:17,764-Speed 25217.71 samples/sec   Loss 1.5998   LearningRate 0.0001   Epoch: 27   Global Step: 48000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:09:27,478-Speed 25303.43 samples/sec   Loss 1.5921   LearningRate 0.0001   Epoch: 27   Global Step: 48010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:09:37,282-Speed 25071.56 samples/sec   Loss 1.5853   LearningRate 0.0001   Epoch: 27   Global Step: 48020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:09:46,997-Speed 25300.50 samples/sec   Loss 1.6017   LearningRate 0.0001   Epoch: 27   Global Step: 48030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:09:56,737-Speed 25235.61 samples/sec   Loss 1.5902   LearningRate 0.0001   Epoch: 27   Global Step: 48040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:10:06,455-Speed 25293.67 samples/sec   Loss 1.5922   LearningRate 0.0001   Epoch: 27   Global Step: 48050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:10:16,227-Speed 25153.88 samples/sec   Loss 1.5967   LearningRate 0.0001   Epoch: 27   Global Step: 48060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:10:25,975-Speed 25216.13 samples/sec   Loss 1.5870   LearningRate 0.0001   Epoch: 27   Global Step: 48070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:10:35,821-Speed 24965.19 samples/sec   Loss 1.5875   LearningRate 0.0001   Epoch: 27   Global Step: 48080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:10:45,631-Speed 25055.80 samples/sec   Loss 1.5873   LearningRate 0.0001   Epoch: 27   Global Step: 48090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:10:55,322-Speed 25363.73 samples/sec   Loss 1.5996   LearningRate 0.0001   Epoch: 27   Global Step: 48100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:11:05,096-Speed 25148.59 samples/sec   Loss 1.5984   LearningRate 0.0001   Epoch: 27   Global Step: 48110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:11:14,882-Speed 25116.36 samples/sec   Loss 1.5971   LearningRate 0.0001   Epoch: 27   Global Step: 48120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:11:24,702-Speed 25034.36 samples/sec   Loss 1.5841   LearningRate 0.0001   Epoch: 27   Global Step: 48130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:11:34,428-Speed 25280.22 samples/sec   Loss 1.5975   LearningRate 0.0001   Epoch: 27   Global Step: 48140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:11:44,170-Speed 25229.99 samples/sec   Loss 1.5818   LearningRate 0.0001   Epoch: 27   Global Step: 48150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:11:53,894-Speed 25278.41 samples/sec   Loss 1.5814   LearningRate 0.0001   Epoch: 27   Global Step: 48160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:12:03,703-Speed 25061.95 samples/sec   Loss 1.5917   LearningRate 0.0001   Epoch: 27   Global Step: 48170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:12:13,417-Speed 25304.04 samples/sec   Loss 1.6077   LearningRate 0.0001   Epoch: 27   Global Step: 48180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:12:23,175-Speed 25189.06 samples/sec   Loss 1.5959   LearningRate 0.0001   Epoch: 27   Global Step: 48190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:12:32,959-Speed 25119.66 samples/sec   Loss 1.5947   LearningRate 0.0001   Epoch: 27   Global Step: 48200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:12:42,748-Speed 25109.89 samples/sec   Loss 1.5964   LearningRate 0.0001   Epoch: 27   Global Step: 48210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:12:52,480-Speed 25255.53 samples/sec   Loss 1.5865   LearningRate 0.0001   Epoch: 27   Global Step: 48220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:13:02,199-Speed 25292.27 samples/sec   Loss 1.5895   LearningRate 0.0001   Epoch: 27   Global Step: 48230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:13:11,995-Speed 25088.99 samples/sec   Loss 1.5900   LearningRate 0.0001   Epoch: 27   Global Step: 48240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:13:21,688-Speed 25359.16 samples/sec   Loss 1.5901   LearningRate 0.0001   Epoch: 27   Global Step: 48250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:13:31,421-Speed 25251.98 samples/sec   Loss 1.5906   LearningRate 0.0001   Epoch: 27   Global Step: 48260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:13:41,168-Speed 25218.64 samples/sec   Loss 1.5954   LearningRate 0.0001   Epoch: 27   Global Step: 48270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:13:50,884-Speed 25296.34 samples/sec   Loss 1.5987   LearningRate 0.0001   Epoch: 27   Global Step: 48280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:00,727-Speed 24970.91 samples/sec   Loss 1.5903   LearningRate 0.0001   Epoch: 27   Global Step: 48290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:10,660-Speed 24746.10 samples/sec   Loss 1.5795   LearningRate 0.0001   Epoch: 27   Global Step: 48300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:20,478-Speed 25033.69 samples/sec   Loss 1.5871   LearningRate 0.0001   Epoch: 27   Global Step: 48310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:30,284-Speed 25068.32 samples/sec   Loss 1.5840   LearningRate 0.0001   Epoch: 27   Global Step: 48320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:40,081-Speed 25087.17 samples/sec   Loss 1.5903   LearningRate 0.0001   Epoch: 27   Global Step: 48330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:49,826-Speed 25223.33 samples/sec   Loss 1.5731   LearningRate 0.0001   Epoch: 27   Global Step: 48340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:14:59,658-Speed 25001.29 samples/sec   Loss 1.5970   LearningRate 0.0001   Epoch: 27   Global Step: 48350   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-26 12:15:09,508-Speed 24952.39 samples/sec   Loss 1.5935   LearningRate 0.0001   Epoch: 27   Global Step: 48360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:15:19,221-Speed 25307.54 samples/sec   Loss 1.6015   LearningRate 0.0001   Epoch: 27   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:15:28,985-Speed 25175.27 samples/sec   Loss 1.6010   LearningRate 0.0001   Epoch: 27   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:15:38,787-Speed 25076.03 samples/sec   Loss 1.6009   LearningRate 0.0001   Epoch: 27   Global Step: 48390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:16:37,813-Speed 4163.70 samples/sec   Loss 1.5899   LearningRate 0.0001   Epoch: 28   Global Step: 48400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:16:47,604-Speed 25103.76 samples/sec   Loss 1.5760   LearningRate 0.0001   Epoch: 28   Global Step: 48410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:16:57,372-Speed 25164.28 samples/sec   Loss 1.5611   LearningRate 0.0001   Epoch: 28   Global Step: 48420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:17:07,039-Speed 25426.76 samples/sec   Loss 1.5867   LearningRate 0.0001   Epoch: 28   Global Step: 48430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:17:16,760-Speed 25284.99 samples/sec   Loss 1.5773   LearningRate 0.0001   Epoch: 28   Global Step: 48440   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:17:26,453-Speed 25357.53 samples/sec   Loss 1.5700   LearningRate 0.0001   Epoch: 28   Global Step: 48450   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:17:36,176-Speed 25277.68 samples/sec   Loss 1.5691   LearningRate 0.0001   Epoch: 28   Global Step: 48460   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:17:45,890-Speed 25304.46 samples/sec   Loss 1.5778   LearningRate 0.0001   Epoch: 28   Global Step: 48470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:17:55,574-Speed 25382.40 samples/sec   Loss 1.5822   LearningRate 0.0001   Epoch: 28   Global Step: 48480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:18:05,329-Speed 25195.49 samples/sec   Loss 1.5712   LearningRate 0.0001   Epoch: 28   Global Step: 48490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:18:15,047-Speed 25292.17 samples/sec   Loss 1.5610   LearningRate 0.0001   Epoch: 28   Global Step: 48500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:18:24,776-Speed 25264.72 samples/sec   Loss 1.5714   LearningRate 0.0001   Epoch: 28   Global Step: 48510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:18:34,468-Speed 25360.17 samples/sec   Loss 1.5714   LearningRate 0.0001   Epoch: 28   Global Step: 48520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:18:44,207-Speed 25239.10 samples/sec   Loss 1.5732   LearningRate 0.0001   Epoch: 28   Global Step: 48530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:18:53,947-Speed 25234.43 samples/sec   Loss 1.5613   LearningRate 0.0001   Epoch: 28   Global Step: 48540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:19:03,829-Speed 24872.19 samples/sec   Loss 1.5816   LearningRate 0.0001   Epoch: 28   Global Step: 48550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:19:13,545-Speed 25300.92 samples/sec   Loss 1.5626   LearningRate 0.0001   Epoch: 28   Global Step: 48560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:19:23,368-Speed 25020.70 samples/sec   Loss 1.5757   LearningRate 0.0001   Epoch: 28   Global Step: 48570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:19:33,188-Speed 25031.71 samples/sec   Loss 1.5847   LearningRate 0.0001   Epoch: 28   Global Step: 48580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:19:42,914-Speed 25271.32 samples/sec   Loss 1.5802   LearningRate 0.0001   Epoch: 28   Global Step: 48590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:19:52,680-Speed 25169.55 samples/sec   Loss 1.5911   LearningRate 0.0001   Epoch: 28   Global Step: 48600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:20:02,735-Speed 24444.00 samples/sec   Loss 1.5778   LearningRate 0.0001   Epoch: 28   Global Step: 48610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:20:12,827-Speed 24356.29 samples/sec   Loss 1.5782   LearningRate 0.0001   Epoch: 28   Global Step: 48620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:20:22,915-Speed 24362.78 samples/sec   Loss 1.5726   LearningRate 0.0001   Epoch: 28   Global Step: 48630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:20:32,991-Speed 24394.75 samples/sec   Loss 1.5865   LearningRate 0.0001   Epoch: 28   Global Step: 48640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:20:43,056-Speed 24426.46 samples/sec   Loss 1.5720   LearningRate 0.0001   Epoch: 28   Global Step: 48650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:20:53,136-Speed 24383.58 samples/sec   Loss 1.5807   LearningRate 0.0001   Epoch: 28   Global Step: 48660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:21:02,935-Speed 25083.15 samples/sec   Loss 1.5741   LearningRate 0.0001   Epoch: 28   Global Step: 48670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:21:12,700-Speed 25171.02 samples/sec   Loss 1.5639   LearningRate 0.0001   Epoch: 28   Global Step: 48680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:21:22,470-Speed 25162.10 samples/sec   Loss 1.5785   LearningRate 0.0001   Epoch: 28   Global Step: 48690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:21:32,196-Speed 25270.75 samples/sec   Loss 1.5625   LearningRate 0.0001   Epoch: 28   Global Step: 48700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:21:41,880-Speed 25382.62 samples/sec   Loss 1.5660   LearningRate 0.0001   Epoch: 28   Global Step: 48710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:21:51,577-Speed 25347.06 samples/sec   Loss 1.5696   LearningRate 0.0001   Epoch: 28   Global Step: 48720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:22:01,421-Speed 24968.83 samples/sec   Loss 1.5799   LearningRate 0.0001   Epoch: 28   Global Step: 48730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:22:11,305-Speed 24873.76 samples/sec   Loss 1.5751   LearningRate 0.0001   Epoch: 28   Global Step: 48740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:22:21,145-Speed 24980.03 samples/sec   Loss 1.5772   LearningRate 0.0001   Epoch: 28   Global Step: 48750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:22:31,043-Speed 24831.20 samples/sec   Loss 1.5645   LearningRate 0.0001   Epoch: 28   Global Step: 48760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:22:40,962-Speed 24779.56 samples/sec   Loss 1.5722   LearningRate 0.0001   Epoch: 28   Global Step: 48770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:22:50,771-Speed 25059.42 samples/sec   Loss 1.5793   LearningRate 0.0001   Epoch: 28   Global Step: 48780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:00,673-Speed 24823.42 samples/sec   Loss 1.5667   LearningRate 0.0001   Epoch: 28   Global Step: 48790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:10,491-Speed 25034.18 samples/sec   Loss 1.5698   LearningRate 0.0001   Epoch: 28   Global Step: 48800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:20,223-Speed 25257.69 samples/sec   Loss 1.5820   LearningRate 0.0001   Epoch: 28   Global Step: 48810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:29,993-Speed 25156.84 samples/sec   Loss 1.5805   LearningRate 0.0001   Epoch: 28   Global Step: 48820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:39,707-Speed 25303.06 samples/sec   Loss 1.5752   LearningRate 0.0001   Epoch: 28   Global Step: 48830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:49,545-Speed 24985.06 samples/sec   Loss 1.5763   LearningRate 0.0001   Epoch: 28   Global Step: 48840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:23:59,418-Speed 24895.35 samples/sec   Loss 1.5717   LearningRate 0.0001   Epoch: 28   Global Step: 48850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:24:09,222-Speed 25069.42 samples/sec   Loss 1.5688   LearningRate 0.0001   Epoch: 28   Global Step: 48860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:24:18,931-Speed 25316.62 samples/sec   Loss 1.5724   LearningRate 0.0001   Epoch: 28   Global Step: 48870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:24:28,624-Speed 25358.29 samples/sec   Loss 1.5480   LearningRate 0.0001   Epoch: 28   Global Step: 48880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:24:38,371-Speed 25216.98 samples/sec   Loss 1.5682   LearningRate 0.0001   Epoch: 28   Global Step: 48890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:24:48,044-Speed 25411.27 samples/sec   Loss 1.5672   LearningRate 0.0001   Epoch: 28   Global Step: 48900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:24:57,770-Speed 25271.06 samples/sec   Loss 1.5702   LearningRate 0.0001   Epoch: 28   Global Step: 48910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:25:07,500-Speed 25261.76 samples/sec   Loss 1.5734   LearningRate 0.0001   Epoch: 28   Global Step: 48920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:25:17,220-Speed 25288.79 samples/sec   Loss 1.5780   LearningRate 0.0001   Epoch: 28   Global Step: 48930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:25:26,911-Speed 25364.84 samples/sec   Loss 1.5689   LearningRate 0.0001   Epoch: 28   Global Step: 48940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:25:36,697-Speed 25117.66 samples/sec   Loss 1.5684   LearningRate 0.0001   Epoch: 28   Global Step: 48950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:25:46,417-Speed 25285.86 samples/sec   Loss 1.5619   LearningRate 0.0001   Epoch: 28   Global Step: 48960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:25:56,168-Speed 25207.61 samples/sec   Loss 1.5554   LearningRate 0.0001   Epoch: 28   Global Step: 48970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:26:05,888-Speed 25285.95 samples/sec   Loss 1.5667   LearningRate 0.0001   Epoch: 28   Global Step: 48980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:26:15,737-Speed 24955.73 samples/sec   Loss 1.5777   LearningRate 0.0001   Epoch: 28   Global Step: 48990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:26:25,426-Speed 25369.03 samples/sec   Loss 1.5663   LearningRate 0.0001   Epoch: 28   Global Step: 49000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:26:35,149-Speed 25278.05 samples/sec   Loss 1.5489   LearningRate 0.0001   Epoch: 28   Global Step: 49010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:26:44,907-Speed 25189.68 samples/sec   Loss 1.5740   LearningRate 0.0001   Epoch: 28   Global Step: 49020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:26:54,639-Speed 25253.64 samples/sec   Loss 1.5656   LearningRate 0.0001   Epoch: 28   Global Step: 49030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:27:04,310-Speed 25417.43 samples/sec   Loss 1.5712   LearningRate 0.0001   Epoch: 28   Global Step: 49040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:27:14,075-Speed 25171.18 samples/sec   Loss 1.5815   LearningRate 0.0001   Epoch: 28   Global Step: 49050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:27:23,819-Speed 25225.89 samples/sec   Loss 1.5700   LearningRate 0.0001   Epoch: 28   Global Step: 49060   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:27:33,653-Speed 24995.46 samples/sec   Loss 1.5780   LearningRate 0.0001   Epoch: 28   Global Step: 49070   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:27:43,412-Speed 25186.13 samples/sec   Loss 1.5680   LearningRate 0.0001   Epoch: 28   Global Step: 49080   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:27:53,198-Speed 25115.78 samples/sec   Loss 1.5736   LearningRate 0.0001   Epoch: 28   Global Step: 49090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:28:02,890-Speed 25360.08 samples/sec   Loss 1.5651   LearningRate 0.0001   Epoch: 28   Global Step: 49100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:28:12,647-Speed 25194.14 samples/sec   Loss 1.5608   LearningRate 0.0001   Epoch: 28   Global Step: 49110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:28:22,364-Speed 25294.53 samples/sec   Loss 1.5727   LearningRate 0.0001   Epoch: 28   Global Step: 49120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:28:32,137-Speed 25149.16 samples/sec   Loss 1.5650   LearningRate 0.0001   Epoch: 28   Global Step: 49130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:28:41,947-Speed 25057.78 samples/sec   Loss 1.5694   LearningRate 0.0001   Epoch: 28   Global Step: 49140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:28:51,649-Speed 25336.82 samples/sec   Loss 1.5641   LearningRate 0.0001   Epoch: 28   Global Step: 49150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:29:01,412-Speed 25175.33 samples/sec   Loss 1.5786   LearningRate 0.0001   Epoch: 28   Global Step: 49160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:29:11,206-Speed 25097.96 samples/sec   Loss 1.5552   LearningRate 0.0001   Epoch: 28   Global Step: 49170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:29:20,921-Speed 25300.96 samples/sec   Loss 1.5559   LearningRate 0.0001   Epoch: 28   Global Step: 49180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:29:30,695-Speed 25149.44 samples/sec   Loss 1.5668   LearningRate 0.0001   Epoch: 28   Global Step: 49190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:29:40,413-Speed 25293.93 samples/sec   Loss 1.5679   LearningRate 0.0001   Epoch: 28   Global Step: 49200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:29:50,181-Speed 25162.38 samples/sec   Loss 1.5552   LearningRate 0.0001   Epoch: 28   Global Step: 49210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:29:59,968-Speed 25114.17 samples/sec   Loss 1.5554   LearningRate 0.0001   Epoch: 28   Global Step: 49220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:30:09,712-Speed 25227.19 samples/sec   Loss 1.5587   LearningRate 0.0001   Epoch: 28   Global Step: 49230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:30:19,573-Speed 24923.38 samples/sec   Loss 1.5565   LearningRate 0.0001   Epoch: 28   Global Step: 49240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:30:29,329-Speed 25196.49 samples/sec   Loss 1.5540   LearningRate 0.0001   Epoch: 28   Global Step: 49250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:30:39,067-Speed 25239.41 samples/sec   Loss 1.5499   LearningRate 0.0001   Epoch: 28   Global Step: 49260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:30:48,774-Speed 25320.76 samples/sec   Loss 1.5586   LearningRate 0.0001   Epoch: 28   Global Step: 49270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:30:58,641-Speed 24912.50 samples/sec   Loss 1.5567   LearningRate 0.0001   Epoch: 28   Global Step: 49280   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:31:08,386-Speed 25222.00 samples/sec   Loss 1.5441   LearningRate 0.0001   Epoch: 28   Global Step: 49290   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:31:18,168-Speed 25127.77 samples/sec   Loss 1.5499   LearningRate 0.0001   Epoch: 28   Global Step: 49300   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:31:27,939-Speed 25163.42 samples/sec   Loss 1.5515   LearningRate 0.0001   Epoch: 28   Global Step: 49310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:31:37,619-Speed 25391.44 samples/sec   Loss 1.5516   LearningRate 0.0001   Epoch: 28   Global Step: 49320   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:31:47,416-Speed 25088.82 samples/sec   Loss 1.5567   LearningRate 0.0001   Epoch: 28   Global Step: 49330   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:31:57,117-Speed 25337.03 samples/sec   Loss 1.5520   LearningRate 0.0001   Epoch: 28   Global Step: 49340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:32:06,803-Speed 25375.70 samples/sec   Loss 1.5529   LearningRate 0.0001   Epoch: 28   Global Step: 49350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:32:16,508-Speed 25327.28 samples/sec   Loss 1.5510   LearningRate 0.0001   Epoch: 28   Global Step: 49360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:32:26,311-Speed 25073.71 samples/sec   Loss 1.5313   LearningRate 0.0001   Epoch: 28   Global Step: 49370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:32:36,064-Speed 25203.99 samples/sec   Loss 1.5433   LearningRate 0.0001   Epoch: 28   Global Step: 49380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:32:45,849-Speed 25118.26 samples/sec   Loss 1.5561   LearningRate 0.0001   Epoch: 28   Global Step: 49390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:32:55,585-Speed 25245.47 samples/sec   Loss 1.5522   LearningRate 0.0001   Epoch: 28   Global Step: 49400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:33:05,571-Speed 24615.02 samples/sec   Loss 1.5480   LearningRate 0.0001   Epoch: 28   Global Step: 49410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-26 12:33:15,618-Speed 24464.57 samples/sec   Loss 1.5506   LearningRate 0.0001   Epoch: 28   Global Step: 49420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:33:25,595-Speed 24636.27 samples/sec   Loss 1.5663   LearningRate 0.0001   Epoch: 28   Global Step: 49430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:33:35,505-Speed 24801.39 samples/sec   Loss 1.5458   LearningRate 0.0001   Epoch: 28   Global Step: 49440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:33:45,522-Speed 24537.86 samples/sec   Loss 1.5548   LearningRate 0.0001   Epoch: 28   Global Step: 49450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:33:55,497-Speed 24639.69 samples/sec   Loss 1.5564   LearningRate 0.0001   Epoch: 28   Global Step: 49460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:34:05,495-Speed 24584.90 samples/sec   Loss 1.5579   LearningRate 0.0001   Epoch: 28   Global Step: 49470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:34:15,545-Speed 24458.43 samples/sec   Loss 1.5528   LearningRate 0.0001   Epoch: 28   Global Step: 49480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:34:25,563-Speed 24534.00 samples/sec   Loss 1.5555   LearningRate 0.0001   Epoch: 28   Global Step: 49490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:34:35,581-Speed 24535.46 samples/sec   Loss 1.5523   LearningRate 0.0001   Epoch: 28   Global Step: 49500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:34:45,690-Speed 24314.61 samples/sec   Loss 1.5497   LearningRate 0.0001   Epoch: 28   Global Step: 49510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:34:55,699-Speed 24558.64 samples/sec   Loss 1.5486   LearningRate 0.0001   Epoch: 28   Global Step: 49520   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:35:05,712-Speed 24546.85 samples/sec   Loss 1.5445   LearningRate 0.0001   Epoch: 28   Global Step: 49530   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:35:15,748-Speed 24491.79 samples/sec   Loss 1.5503   LearningRate 0.0001   Epoch: 28   Global Step: 49540   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:35:25,865-Speed 24295.41 samples/sec   Loss 1.5555   LearningRate 0.0001   Epoch: 28   Global Step: 49550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:35:36,037-Speed 24162.71 samples/sec   Loss 1.5453   LearningRate 0.0001   Epoch: 28   Global Step: 49560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:35:46,018-Speed 24626.68 samples/sec   Loss 1.5551   LearningRate 0.0001   Epoch: 28   Global Step: 49570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:35:56,016-Speed 24584.38 samples/sec   Loss 1.5461   LearningRate 0.0001   Epoch: 28   Global Step: 49580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:36:06,166-Speed 24217.18 samples/sec   Loss 1.5473   LearningRate 0.0001   Epoch: 28   Global Step: 49590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:36:16,190-Speed 24521.34 samples/sec   Loss 1.5530   LearningRate 0.0001   Epoch: 28   Global Step: 49600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:36:26,173-Speed 24620.78 samples/sec   Loss 1.5509   LearningRate 0.0001   Epoch: 28   Global Step: 49610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:36:36,286-Speed 24305.03 samples/sec   Loss 1.5430   LearningRate 0.0001   Epoch: 28   Global Step: 49620   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-26 12:36:46,507-Speed 24045.26 samples/sec   Loss 1.5331   LearningRate 0.0001   Epoch: 28   Global Step: 49630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:36:56,504-Speed 24589.67 samples/sec   Loss 1.5468   LearningRate 0.0001   Epoch: 28   Global Step: 49640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:37:06,576-Speed 24403.09 samples/sec   Loss 1.5348   LearningRate 0.0001   Epoch: 28   Global Step: 49650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:37:16,625-Speed 24461.21 samples/sec   Loss 1.5546   LearningRate 0.0001   Epoch: 28   Global Step: 49660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:37:26,699-Speed 24405.01 samples/sec   Loss 1.5486   LearningRate 0.0001   Epoch: 28   Global Step: 49670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:37:36,743-Speed 24473.54 samples/sec   Loss 1.5455   LearningRate 0.0001   Epoch: 28   Global Step: 49680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:37:46,710-Speed 24659.08 samples/sec   Loss 1.5468   LearningRate 0.0001   Epoch: 28   Global Step: 49690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:37:56,828-Speed 24295.88 samples/sec   Loss 1.5448   LearningRate 0.0001   Epoch: 28   Global Step: 49700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:38:06,844-Speed 24540.00 samples/sec   Loss 1.5518   LearningRate 0.0001   Epoch: 28   Global Step: 49710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:38:16,944-Speed 24334.45 samples/sec   Loss 1.5416   LearningRate 0.0001   Epoch: 28   Global Step: 49720   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 12:38:26,996-Speed 24453.07 samples/sec   Loss 1.5311   LearningRate 0.0001   Epoch: 28   Global Step: 49730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:38:37,030-Speed 24497.31 samples/sec   Loss 1.5385   LearningRate 0.0001   Epoch: 28   Global Step: 49740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:38:46,998-Speed 24657.09 samples/sec   Loss 1.5460   LearningRate 0.0001   Epoch: 28   Global Step: 49750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:38:57,050-Speed 24454.47 samples/sec   Loss 1.5562   LearningRate 0.0001   Epoch: 28   Global Step: 49760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:39:07,156-Speed 24320.42 samples/sec   Loss 1.5407   LearningRate 0.0001   Epoch: 28   Global Step: 49770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:39:17,137-Speed 24625.72 samples/sec   Loss 1.5367   LearningRate 0.0001   Epoch: 28   Global Step: 49780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:39:27,144-Speed 24566.15 samples/sec   Loss 1.5350   LearningRate 0.0001   Epoch: 28   Global Step: 49790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:39:37,136-Speed 24597.41 samples/sec   Loss 1.5380   LearningRate 0.0001   Epoch: 28   Global Step: 49800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:39:47,173-Speed 24490.40 samples/sec   Loss 1.5445   LearningRate 0.0001   Epoch: 28   Global Step: 49810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:39:57,195-Speed 24525.40 samples/sec   Loss 1.5258   LearningRate 0.0001   Epoch: 28   Global Step: 49820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:40:07,170-Speed 24639.64 samples/sec   Loss 1.5395   LearningRate 0.0001   Epoch: 28   Global Step: 49830   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 12:40:17,329-Speed 24195.43 samples/sec   Loss 1.5360   LearningRate 0.0001   Epoch: 28   Global Step: 49840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:40:27,354-Speed 24518.52 samples/sec   Loss 1.5465   LearningRate 0.0001   Epoch: 28   Global Step: 49850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:40:37,623-Speed 23936.20 samples/sec   Loss 1.5403   LearningRate 0.0001   Epoch: 28   Global Step: 49860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:40:47,701-Speed 24387.38 samples/sec   Loss 1.5424   LearningRate 0.0001   Epoch: 28   Global Step: 49870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:40:57,789-Speed 24366.67 samples/sec   Loss 1.5474   LearningRate 0.0001   Epoch: 28   Global Step: 49880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:41:07,875-Speed 24368.94 samples/sec   Loss 1.5294   LearningRate 0.0001   Epoch: 28   Global Step: 49890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:41:17,901-Speed 24518.64 samples/sec   Loss 1.5329   LearningRate 0.0001   Epoch: 28   Global Step: 49900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:41:27,896-Speed 24593.02 samples/sec   Loss 1.5482   LearningRate 0.0001   Epoch: 28   Global Step: 49910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:41:37,961-Speed 24420.60 samples/sec   Loss 1.5459   LearningRate 0.0001   Epoch: 28   Global Step: 49920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:41:48,008-Speed 24463.51 samples/sec   Loss 1.5384   LearningRate 0.0001   Epoch: 28   Global Step: 49930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:41:58,221-Speed 24066.25 samples/sec   Loss 1.5360   LearningRate 0.0001   Epoch: 28   Global Step: 49940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:42:08,253-Speed 24500.61 samples/sec   Loss 1.5427   LearningRate 0.0001   Epoch: 28   Global Step: 49950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:42:18,286-Speed 24498.51 samples/sec   Loss 1.5409   LearningRate 0.0001   Epoch: 28   Global Step: 49960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:42:28,414-Speed 24269.40 samples/sec   Loss 1.5388   LearningRate 0.0001   Epoch: 28   Global Step: 49970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:42:38,420-Speed 24567.26 samples/sec   Loss 1.5388   LearningRate 0.0001   Epoch: 28   Global Step: 49980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:42:48,594-Speed 24161.41 samples/sec   Loss 1.5361   LearningRate 0.0001   Epoch: 28   Global Step: 49990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:42:58,668-Speed 24403.67 samples/sec   Loss 1.5462   LearningRate 0.0001   Epoch: 28   Global Step: 50000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:43:08,671-Speed 24573.02 samples/sec   Loss 1.5464   LearningRate 0.0001   Epoch: 28   Global Step: 50010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:43:18,776-Speed 24325.24 samples/sec   Loss 1.5319   LearningRate 0.0001   Epoch: 28   Global Step: 50020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:43:28,828-Speed 24452.77 samples/sec   Loss 1.5335   LearningRate 0.0001   Epoch: 28   Global Step: 50030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:43:38,812-Speed 24618.55 samples/sec   Loss 1.5496   LearningRate 0.0001   Epoch: 28   Global Step: 50040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:43:48,770-Speed 24680.51 samples/sec   Loss 1.5328   LearningRate 0.0001   Epoch: 28   Global Step: 50050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:43:58,910-Speed 24240.51 samples/sec   Loss 1.5416   LearningRate 0.0001   Epoch: 28   Global Step: 50060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:44:08,891-Speed 24629.07 samples/sec   Loss 1.5377   LearningRate 0.0001   Epoch: 28   Global Step: 50070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:44:19,002-Speed 24310.41 samples/sec   Loss 1.5222   LearningRate 0.0001   Epoch: 28   Global Step: 50080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:44:29,087-Speed 24371.20 samples/sec   Loss 1.5476   LearningRate 0.0001   Epoch: 28   Global Step: 50090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:44:39,098-Speed 24552.23 samples/sec   Loss 1.5396   LearningRate 0.0001   Epoch: 28   Global Step: 50100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:44:49,284-Speed 24137.81 samples/sec   Loss 1.5568   LearningRate 0.0001   Epoch: 28   Global Step: 50110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:44:59,253-Speed 24655.78 samples/sec   Loss 1.5450   LearningRate 0.0001   Epoch: 28   Global Step: 50120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:45:58,998-Speed 4113.58 samples/sec   Loss 1.5272   LearningRate 0.0001   Epoch: 29   Global Step: 50130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:46:09,063-Speed 24421.63 samples/sec   Loss 1.5333   LearningRate 0.0001   Epoch: 29   Global Step: 50140   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 12:46:19,015-Speed 24695.60 samples/sec   Loss 1.5133   LearningRate 0.0001   Epoch: 29   Global Step: 50150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:46:28,954-Speed 24733.01 samples/sec   Loss 1.5270   LearningRate 0.0001   Epoch: 29   Global Step: 50160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:46:38,953-Speed 24580.85 samples/sec   Loss 1.5468   LearningRate 0.0001   Epoch: 29   Global Step: 50170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:46:49,007-Speed 24448.24 samples/sec   Loss 1.5252   LearningRate 0.0001   Epoch: 29   Global Step: 50180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:46:59,086-Speed 24387.81 samples/sec   Loss 1.5286   LearningRate 0.0001   Epoch: 29   Global Step: 50190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:47:09,093-Speed 24561.79 samples/sec   Loss 1.5280   LearningRate 0.0001   Epoch: 29   Global Step: 50200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:47:19,153-Speed 24434.06 samples/sec   Loss 1.5183   LearningRate 0.0001   Epoch: 29   Global Step: 50210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:47:29,247-Speed 24351.11 samples/sec   Loss 1.5151   LearningRate 0.0001   Epoch: 29   Global Step: 50220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:47:39,315-Speed 24411.51 samples/sec   Loss 1.5147   LearningRate 0.0001   Epoch: 29   Global Step: 50230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:47:49,304-Speed 24606.45 samples/sec   Loss 1.5285   LearningRate 0.0001   Epoch: 29   Global Step: 50240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:47:59,273-Speed 24656.19 samples/sec   Loss 1.5256   LearningRate 0.0001   Epoch: 29   Global Step: 50250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:48:09,314-Speed 24478.93 samples/sec   Loss 1.5314   LearningRate 0.0001   Epoch: 29   Global Step: 50260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:48:19,439-Speed 24275.20 samples/sec   Loss 1.5275   LearningRate 0.0001   Epoch: 29   Global Step: 50270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:48:29,439-Speed 24579.62 samples/sec   Loss 1.5160   LearningRate 0.0001   Epoch: 29   Global Step: 50280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:48:39,422-Speed 24621.59 samples/sec   Loss 1.5210   LearningRate 0.0001   Epoch: 29   Global Step: 50290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:48:49,487-Speed 24420.67 samples/sec   Loss 1.5334   LearningRate 0.0001   Epoch: 29   Global Step: 50300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:48:59,574-Speed 24365.87 samples/sec   Loss 1.5243   LearningRate 0.0001   Epoch: 29   Global Step: 50310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:49:09,763-Speed 24123.90 samples/sec   Loss 1.5244   LearningRate 0.0001   Epoch: 29   Global Step: 50320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:49:19,751-Speed 24608.26 samples/sec   Loss 1.5380   LearningRate 0.0001   Epoch: 29   Global Step: 50330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:49:29,714-Speed 24671.17 samples/sec   Loss 1.5292   LearningRate 0.0001   Epoch: 29   Global Step: 50340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:49:39,763-Speed 24460.40 samples/sec   Loss 1.5302   LearningRate 0.0001   Epoch: 29   Global Step: 50350   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 12:49:49,812-Speed 24460.92 samples/sec   Loss 1.5271   LearningRate 0.0001   Epoch: 29   Global Step: 50360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:49:59,967-Speed 24206.04 samples/sec   Loss 1.5296   LearningRate 0.0001   Epoch: 29   Global Step: 50370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:50:09,943-Speed 24638.27 samples/sec   Loss 1.5226   LearningRate 0.0001   Epoch: 29   Global Step: 50380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:50:20,276-Speed 23786.57 samples/sec   Loss 1.5195   LearningRate 0.0001   Epoch: 29   Global Step: 50390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:50:30,235-Speed 24683.45 samples/sec   Loss 1.5283   LearningRate 0.0001   Epoch: 29   Global Step: 50400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:50:40,293-Speed 24436.72 samples/sec   Loss 1.5216   LearningRate 0.0001   Epoch: 29   Global Step: 50410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:50:50,385-Speed 24356.11 samples/sec   Loss 1.5232   LearningRate 0.0001   Epoch: 29   Global Step: 50420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:51:00,423-Speed 24486.08 samples/sec   Loss 1.5328   LearningRate 0.0001   Epoch: 29   Global Step: 50430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:51:10,435-Speed 24549.78 samples/sec   Loss 1.5264   LearningRate 0.0001   Epoch: 29   Global Step: 50440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:51:20,413-Speed 24631.88 samples/sec   Loss 1.5222   LearningRate 0.0001   Epoch: 29   Global Step: 50450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:51:30,371-Speed 24682.17 samples/sec   Loss 1.5186   LearningRate 0.0001   Epoch: 29   Global Step: 50460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:51:40,449-Speed 24390.69 samples/sec   Loss 1.5371   LearningRate 0.0001   Epoch: 29   Global Step: 50470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:51:50,588-Speed 24240.78 samples/sec   Loss 1.5249   LearningRate 0.0001   Epoch: 29   Global Step: 50480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:52:00,650-Speed 24426.65 samples/sec   Loss 1.5182   LearningRate 0.0001   Epoch: 29   Global Step: 50490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:52:10,809-Speed 24196.66 samples/sec   Loss 1.5323   LearningRate 0.0001   Epoch: 29   Global Step: 50500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:52:21,032-Speed 24043.42 samples/sec   Loss 1.5233   LearningRate 0.0001   Epoch: 29   Global Step: 50510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:52:31,167-Speed 24252.42 samples/sec   Loss 1.5223   LearningRate 0.0001   Epoch: 29   Global Step: 50520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:52:41,400-Speed 24020.64 samples/sec   Loss 1.5171   LearningRate 0.0001   Epoch: 29   Global Step: 50530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:52:51,603-Speed 24089.15 samples/sec   Loss 1.5280   LearningRate 0.0001   Epoch: 29   Global Step: 50540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:53:01,715-Speed 24314.42 samples/sec   Loss 1.5264   LearningRate 0.0001   Epoch: 29   Global Step: 50550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:53:11,730-Speed 24541.97 samples/sec   Loss 1.5308   LearningRate 0.0001   Epoch: 29   Global Step: 50560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:53:21,765-Speed 24493.00 samples/sec   Loss 1.5295   LearningRate 0.0001   Epoch: 29   Global Step: 50570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:53:31,960-Speed 24108.99 samples/sec   Loss 1.5322   LearningRate 0.0001   Epoch: 29   Global Step: 50580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:53:42,014-Speed 24446.16 samples/sec   Loss 1.5137   LearningRate 0.0001   Epoch: 29   Global Step: 50590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:53:52,192-Speed 24150.39 samples/sec   Loss 1.5222   LearningRate 0.0001   Epoch: 29   Global Step: 50600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:54:02,242-Speed 24455.32 samples/sec   Loss 1.5213   LearningRate 0.0001   Epoch: 29   Global Step: 50610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:54:12,362-Speed 24289.71 samples/sec   Loss 1.5238   LearningRate 0.0001   Epoch: 29   Global Step: 50620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:54:22,505-Speed 24232.85 samples/sec   Loss 1.5173   LearningRate 0.0001   Epoch: 29   Global Step: 50630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:54:32,555-Speed 24460.92 samples/sec   Loss 1.5099   LearningRate 0.0001   Epoch: 29   Global Step: 50640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:54:42,537-Speed 24625.30 samples/sec   Loss 1.5167   LearningRate 0.0001   Epoch: 29   Global Step: 50650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:54:52,455-Speed 24784.34 samples/sec   Loss 1.5131   LearningRate 0.0001   Epoch: 29   Global Step: 50660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:55:02,464-Speed 24559.50 samples/sec   Loss 1.5132   LearningRate 0.0001   Epoch: 29   Global Step: 50670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:55:12,407-Speed 24718.63 samples/sec   Loss 1.5222   LearningRate 0.0001   Epoch: 29   Global Step: 50680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:55:22,285-Speed 24885.38 samples/sec   Loss 1.5212   LearningRate 0.0001   Epoch: 29   Global Step: 50690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:55:32,120-Speed 25000.31 samples/sec   Loss 1.5092   LearningRate 0.0001   Epoch: 29   Global Step: 50700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:55:41,906-Speed 25117.48 samples/sec   Loss 1.5285   LearningRate 0.0001   Epoch: 29   Global Step: 50710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:55:51,799-Speed 24847.17 samples/sec   Loss 1.5154   LearningRate 0.0001   Epoch: 29   Global Step: 50720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:56:01,678-Speed 24879.63 samples/sec   Loss 1.5224   LearningRate 0.0001   Epoch: 29   Global Step: 50730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:56:11,486-Speed 25061.29 samples/sec   Loss 1.5138   LearningRate 0.0001   Epoch: 29   Global Step: 50740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:56:21,253-Speed 25170.53 samples/sec   Loss 1.5110   LearningRate 0.0001   Epoch: 29   Global Step: 50750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:56:30,975-Speed 25283.64 samples/sec   Loss 1.5142   LearningRate 0.0001   Epoch: 29   Global Step: 50760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:56:40,819-Speed 24968.93 samples/sec   Loss 1.5149   LearningRate 0.0001   Epoch: 29   Global Step: 50770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:56:50,643-Speed 25021.97 samples/sec   Loss 1.5228   LearningRate 0.0001   Epoch: 29   Global Step: 50780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:00,415-Speed 25152.33 samples/sec   Loss 1.5178   LearningRate 0.0001   Epoch: 29   Global Step: 50790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:10,255-Speed 24978.95 samples/sec   Loss 1.5142   LearningRate 0.0001   Epoch: 29   Global Step: 50800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:19,989-Speed 25249.87 samples/sec   Loss 1.5180   LearningRate 0.0001   Epoch: 29   Global Step: 50810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:29,771-Speed 25127.23 samples/sec   Loss 1.5213   LearningRate 0.0001   Epoch: 29   Global Step: 50820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:39,553-Speed 25125.57 samples/sec   Loss 1.5203   LearningRate 0.0001   Epoch: 29   Global Step: 50830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:49,393-Speed 24979.15 samples/sec   Loss 1.4959   LearningRate 0.0001   Epoch: 29   Global Step: 50840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:57:59,201-Speed 25061.35 samples/sec   Loss 1.5105   LearningRate 0.0001   Epoch: 29   Global Step: 50850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:58:08,961-Speed 25184.49 samples/sec   Loss 1.5167   LearningRate 0.0001   Epoch: 29   Global Step: 50860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:58:18,830-Speed 24906.77 samples/sec   Loss 1.5176   LearningRate 0.0001   Epoch: 29   Global Step: 50870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:58:28,580-Speed 25212.13 samples/sec   Loss 1.5100   LearningRate 0.0001   Epoch: 29   Global Step: 50880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:58:38,410-Speed 25005.55 samples/sec   Loss 1.5110   LearningRate 0.0001   Epoch: 29   Global Step: 50890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:58:48,226-Speed 25040.69 samples/sec   Loss 1.5156   LearningRate 0.0001   Epoch: 29   Global Step: 50900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:58:57,928-Speed 25335.47 samples/sec   Loss 1.5069   LearningRate 0.0001   Epoch: 29   Global Step: 50910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:59:07,763-Speed 24990.96 samples/sec   Loss 1.5173   LearningRate 0.0001   Epoch: 29   Global Step: 50920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:59:17,469-Speed 25324.58 samples/sec   Loss 1.5060   LearningRate 0.0001   Epoch: 29   Global Step: 50930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:59:27,239-Speed 25161.82 samples/sec   Loss 1.4971   LearningRate 0.0001   Epoch: 29   Global Step: 50940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:59:36,968-Speed 25263.19 samples/sec   Loss 1.5175   LearningRate 0.0001   Epoch: 29   Global Step: 50950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 12:59:46,701-Speed 25255.57 samples/sec   Loss 1.5021   LearningRate 0.0001   Epoch: 29   Global Step: 50960   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 12:59:56,446-Speed 25222.96 samples/sec   Loss 1.5133   LearningRate 0.0001   Epoch: 29   Global Step: 50970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:00:06,204-Speed 25189.56 samples/sec   Loss 1.5151   LearningRate 0.0001   Epoch: 29   Global Step: 50980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:00:16,040-Speed 24990.18 samples/sec   Loss 1.5121   LearningRate 0.0001   Epoch: 29   Global Step: 50990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:00:25,760-Speed 25286.61 samples/sec   Loss 1.5164   LearningRate 0.0001   Epoch: 29   Global Step: 51000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:00:35,572-Speed 25051.28 samples/sec   Loss 1.5074   LearningRate 0.0001   Epoch: 29   Global Step: 51010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:00:45,446-Speed 24891.21 samples/sec   Loss 1.5056   LearningRate 0.0001   Epoch: 29   Global Step: 51020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:00:55,214-Speed 25166.19 samples/sec   Loss 1.4991   LearningRate 0.0001   Epoch: 29   Global Step: 51030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:01:05,119-Speed 24814.22 samples/sec   Loss 1.5142   LearningRate 0.0001   Epoch: 29   Global Step: 51040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:01:14,937-Speed 25033.06 samples/sec   Loss 1.5079   LearningRate 0.0001   Epoch: 29   Global Step: 51050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:01:24,729-Speed 25105.65 samples/sec   Loss 1.5149   LearningRate 0.0001   Epoch: 29   Global Step: 51060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:01:34,568-Speed 24984.52 samples/sec   Loss 1.5048   LearningRate 0.0001   Epoch: 29   Global Step: 51070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:01:44,329-Speed 25180.36 samples/sec   Loss 1.5026   LearningRate 0.0001   Epoch: 29   Global Step: 51080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:01:54,086-Speed 25196.19 samples/sec   Loss 1.4920   LearningRate 0.0001   Epoch: 29   Global Step: 51090   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:02:03,823-Speed 25246.25 samples/sec   Loss 1.5023   LearningRate 0.0001   Epoch: 29   Global Step: 51100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:02:13,568-Speed 25219.77 samples/sec   Loss 1.5102   LearningRate 0.0001   Epoch: 29   Global Step: 51110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:02:23,328-Speed 25182.93 samples/sec   Loss 1.5040   LearningRate 0.0001   Epoch: 29   Global Step: 51120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:02:33,073-Speed 25223.16 samples/sec   Loss 1.5072   LearningRate 0.0001   Epoch: 29   Global Step: 51130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:02:43,020-Speed 24710.71 samples/sec   Loss 1.4996   LearningRate 0.0001   Epoch: 29   Global Step: 51140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:02:52,740-Speed 25285.82 samples/sec   Loss 1.5129   LearningRate 0.0001   Epoch: 29   Global Step: 51150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:03:02,509-Speed 25162.29 samples/sec   Loss 1.4999   LearningRate 0.0001   Epoch: 29   Global Step: 51160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:03:12,243-Speed 25251.30 samples/sec   Loss 1.5122   LearningRate 0.0001   Epoch: 29   Global Step: 51170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:03:21,936-Speed 25358.55 samples/sec   Loss 1.4987   LearningRate 0.0001   Epoch: 29   Global Step: 51180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:03:31,693-Speed 25190.87 samples/sec   Loss 1.4953   LearningRate 0.0001   Epoch: 29   Global Step: 51190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:03:41,531-Speed 24983.48 samples/sec   Loss 1.5076   LearningRate 0.0001   Epoch: 29   Global Step: 51200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:03:51,310-Speed 25136.33 samples/sec   Loss 1.5077   LearningRate 0.0001   Epoch: 29   Global Step: 51210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:01,158-Speed 24959.34 samples/sec   Loss 1.5118   LearningRate 0.0001   Epoch: 29   Global Step: 51220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:10,973-Speed 25041.66 samples/sec   Loss 1.5043   LearningRate 0.0001   Epoch: 29   Global Step: 51230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:20,819-Speed 24965.95 samples/sec   Loss 1.4994   LearningRate 0.0001   Epoch: 29   Global Step: 51240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:30,520-Speed 25336.55 samples/sec   Loss 1.5016   LearningRate 0.0001   Epoch: 29   Global Step: 51250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:40,236-Speed 25296.90 samples/sec   Loss 1.5056   LearningRate 0.0001   Epoch: 29   Global Step: 51260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:49,957-Speed 25286.65 samples/sec   Loss 1.4975   LearningRate 0.0001   Epoch: 29   Global Step: 51270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:04:59,706-Speed 25210.51 samples/sec   Loss 1.5046   LearningRate 0.0001   Epoch: 29   Global Step: 51280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:05:09,484-Speed 25140.68 samples/sec   Loss 1.5044   LearningRate 0.0001   Epoch: 29   Global Step: 51290   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:05:19,229-Speed 25222.81 samples/sec   Loss 1.4994   LearningRate 0.0001   Epoch: 29   Global Step: 51300   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:05:28,983-Speed 25202.44 samples/sec   Loss 1.5090   LearningRate 0.0001   Epoch: 29   Global Step: 51310   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:05:38,730-Speed 25218.04 samples/sec   Loss 1.4949   LearningRate 0.0001   Epoch: 29   Global Step: 51320   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:05:48,471-Speed 25232.42 samples/sec   Loss 1.4935   LearningRate 0.0001   Epoch: 29   Global Step: 51330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:05:58,213-Speed 25230.55 samples/sec   Loss 1.4966   LearningRate 0.0001   Epoch: 29   Global Step: 51340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:06:08,016-Speed 25074.18 samples/sec   Loss 1.5045   LearningRate 0.0001   Epoch: 29   Global Step: 51350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:06:17,790-Speed 25148.26 samples/sec   Loss 1.5005   LearningRate 0.0001   Epoch: 29   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:06:27,508-Speed 25290.61 samples/sec   Loss 1.4918   LearningRate 0.0001   Epoch: 29   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:06:37,263-Speed 25195.70 samples/sec   Loss 1.4906   LearningRate 0.0001   Epoch: 29   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:06:47,103-Speed 24980.38 samples/sec   Loss 1.4885   LearningRate 0.0001   Epoch: 29   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:06:56,851-Speed 25214.67 samples/sec   Loss 1.4989   LearningRate 0.0001   Epoch: 29   Global Step: 51400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:07:06,717-Speed 24914.87 samples/sec   Loss 1.4918   LearningRate 0.0001   Epoch: 29   Global Step: 51410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:07:16,448-Speed 25259.63 samples/sec   Loss 1.4984   LearningRate 0.0001   Epoch: 29   Global Step: 51420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:07:26,215-Speed 25167.27 samples/sec   Loss 1.4984   LearningRate 0.0001   Epoch: 29   Global Step: 51430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:07:35,991-Speed 25145.22 samples/sec   Loss 1.4846   LearningRate 0.0001   Epoch: 29   Global Step: 51440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:07:45,735-Speed 25226.33 samples/sec   Loss 1.4898   LearningRate 0.0001   Epoch: 29   Global Step: 51450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:07:55,510-Speed 25143.32 samples/sec   Loss 1.5113   LearningRate 0.0001   Epoch: 29   Global Step: 51460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:08:05,182-Speed 25414.70 samples/sec   Loss 1.4948   LearningRate 0.0001   Epoch: 29   Global Step: 51470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:08:14,862-Speed 25389.24 samples/sec   Loss 1.4896   LearningRate 0.0001   Epoch: 29   Global Step: 51480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:08:24,594-Speed 25260.20 samples/sec   Loss 1.4930   LearningRate 0.0001   Epoch: 29   Global Step: 51490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:08:34,468-Speed 24893.13 samples/sec   Loss 1.4890   LearningRate 0.0001   Epoch: 29   Global Step: 51500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:08:44,237-Speed 25159.74 samples/sec   Loss 1.5067   LearningRate 0.0001   Epoch: 29   Global Step: 51510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:08:53,948-Speed 25311.17 samples/sec   Loss 1.5017   LearningRate 0.0001   Epoch: 29   Global Step: 51520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:09:03,802-Speed 24942.34 samples/sec   Loss 1.4941   LearningRate 0.0001   Epoch: 29   Global Step: 51530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:09:13,484-Speed 25388.32 samples/sec   Loss 1.4908   LearningRate 0.0001   Epoch: 29   Global Step: 51540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:09:23,261-Speed 25138.57 samples/sec   Loss 1.4987   LearningRate 0.0001   Epoch: 29   Global Step: 51550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:09:32,987-Speed 25272.63 samples/sec   Loss 1.5005   LearningRate 0.0001   Epoch: 29   Global Step: 51560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:09:42,731-Speed 25224.72 samples/sec   Loss 1.4867   LearningRate 0.0001   Epoch: 29   Global Step: 51570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:09:52,522-Speed 25103.69 samples/sec   Loss 1.4951   LearningRate 0.0001   Epoch: 29   Global Step: 51580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:10:02,269-Speed 25217.63 samples/sec   Loss 1.5002   LearningRate 0.0001   Epoch: 29   Global Step: 51590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:10:12,036-Speed 25166.78 samples/sec   Loss 1.4840   LearningRate 0.0001   Epoch: 29   Global Step: 51600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:10:21,797-Speed 25180.11 samples/sec   Loss 1.4946   LearningRate 0.0001   Epoch: 29   Global Step: 51610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:10:31,559-Speed 25179.14 samples/sec   Loss 1.4979   LearningRate 0.0001   Epoch: 29   Global Step: 51620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:10:41,304-Speed 25222.21 samples/sec   Loss 1.4928   LearningRate 0.0001   Epoch: 29   Global Step: 51630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:10:51,065-Speed 25179.87 samples/sec   Loss 1.4932   LearningRate 0.0001   Epoch: 29   Global Step: 51640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:11:00,823-Speed 25191.04 samples/sec   Loss 1.4829   LearningRate 0.0001   Epoch: 29   Global Step: 51650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:11:10,562-Speed 25238.63 samples/sec   Loss 1.5028   LearningRate 0.0001   Epoch: 29   Global Step: 51660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:11:20,270-Speed 25317.56 samples/sec   Loss 1.4865   LearningRate 0.0001   Epoch: 29   Global Step: 51670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:11:30,139-Speed 24906.51 samples/sec   Loss 1.4915   LearningRate 0.0001   Epoch: 29   Global Step: 51680   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 13:11:39,855-Speed 25296.98 samples/sec   Loss 1.4997   LearningRate 0.0001   Epoch: 29   Global Step: 51690   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:11:49,638-Speed 25126.15 samples/sec   Loss 1.4902   LearningRate 0.0001   Epoch: 29   Global Step: 51700   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:11:59,485-Speed 24966.98 samples/sec   Loss 1.5029   LearningRate 0.0001   Epoch: 29   Global Step: 51710   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:12:09,240-Speed 25200.29 samples/sec   Loss 1.5078   LearningRate 0.0001   Epoch: 29   Global Step: 51720   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:12:18,986-Speed 25220.87 samples/sec   Loss 1.4870   LearningRate 0.0001   Epoch: 29   Global Step: 51730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:12:28,859-Speed 24897.13 samples/sec   Loss 1.4939   LearningRate 0.0001   Epoch: 29   Global Step: 51740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:12:38,660-Speed 25079.58 samples/sec   Loss 1.4944   LearningRate 0.0001   Epoch: 29   Global Step: 51750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:12:48,533-Speed 24893.22 samples/sec   Loss 1.4965   LearningRate 0.0001   Epoch: 29   Global Step: 51760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:12:58,300-Speed 25167.33 samples/sec   Loss 1.4864   LearningRate 0.0001   Epoch: 29   Global Step: 51770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:13:08,056-Speed 25192.34 samples/sec   Loss 1.5059   LearningRate 0.0001   Epoch: 29   Global Step: 51780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:13:17,734-Speed 25396.70 samples/sec   Loss 1.4820   LearningRate 0.0001   Epoch: 29   Global Step: 51790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:13:27,497-Speed 25176.74 samples/sec   Loss 1.4955   LearningRate 0.0001   Epoch: 29   Global Step: 51800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:13:37,273-Speed 25144.26 samples/sec   Loss 1.4918   LearningRate 0.0001   Epoch: 29   Global Step: 51810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:13:47,022-Speed 25212.54 samples/sec   Loss 1.5001   LearningRate 0.0001   Epoch: 29   Global Step: 51820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:13:56,786-Speed 25173.00 samples/sec   Loss 1.4929   LearningRate 0.0001   Epoch: 29   Global Step: 51830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:14:06,521-Speed 25252.25 samples/sec   Loss 1.4877   LearningRate 0.0001   Epoch: 29   Global Step: 51840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:15:06,020-Speed 4130.55 samples/sec   Loss 1.4964   LearningRate 0.0001   Epoch: 30   Global Step: 51850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:15:15,719-Speed 25342.26 samples/sec   Loss 1.4801   LearningRate 0.0001   Epoch: 30   Global Step: 51860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:15:25,535-Speed 25042.81 samples/sec   Loss 1.4875   LearningRate 0.0001   Epoch: 30   Global Step: 51870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:15:35,346-Speed 25056.32 samples/sec   Loss 1.4809   LearningRate 0.0001   Epoch: 30   Global Step: 51880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:15:45,046-Speed 25342.62 samples/sec   Loss 1.4813   LearningRate 0.0001   Epoch: 30   Global Step: 51890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:15:54,880-Speed 24993.17 samples/sec   Loss 1.4706   LearningRate 0.0001   Epoch: 30   Global Step: 51900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:16:04,571-Speed 25363.33 samples/sec   Loss 1.4827   LearningRate 0.0001   Epoch: 30   Global Step: 51910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:16:14,287-Speed 25297.78 samples/sec   Loss 1.4851   LearningRate 0.0001   Epoch: 30   Global Step: 51920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:16:23,995-Speed 25317.10 samples/sec   Loss 1.4832   LearningRate 0.0001   Epoch: 30   Global Step: 51930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:16:33,697-Speed 25340.49 samples/sec   Loss 1.4806   LearningRate 0.0001   Epoch: 30   Global Step: 51940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:16:43,499-Speed 25085.27 samples/sec   Loss 1.4880   LearningRate 0.0001   Epoch: 30   Global Step: 51950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:16:53,291-Speed 25100.06 samples/sec   Loss 1.4738   LearningRate 0.0001   Epoch: 30   Global Step: 51960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:17:03,058-Speed 25165.54 samples/sec   Loss 1.4821   LearningRate 0.0001   Epoch: 30   Global Step: 51970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:17:12,891-Speed 24999.32 samples/sec   Loss 1.4746   LearningRate 0.0001   Epoch: 30   Global Step: 51980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:17:22,544-Speed 25461.95 samples/sec   Loss 1.4768   LearningRate 0.0001   Epoch: 30   Global Step: 51990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:17:32,282-Speed 25240.71 samples/sec   Loss 1.4677   LearningRate 0.0001   Epoch: 30   Global Step: 52000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:17:41,996-Speed 25306.96 samples/sec   Loss 1.4789   LearningRate 0.0001   Epoch: 30   Global Step: 52010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:17:51,653-Speed 25453.96 samples/sec   Loss 1.4841   LearningRate 0.0001   Epoch: 30   Global Step: 52020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:18:01,388-Speed 25247.09 samples/sec   Loss 1.4809   LearningRate 0.0001   Epoch: 30   Global Step: 52030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:18:11,177-Speed 25111.70 samples/sec   Loss 1.4800   LearningRate 0.0001   Epoch: 30   Global Step: 52040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:18:20,967-Speed 25107.30 samples/sec   Loss 1.4802   LearningRate 0.0001   Epoch: 30   Global Step: 52050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:18:30,796-Speed 25006.83 samples/sec   Loss 1.4761   LearningRate 0.0001   Epoch: 30   Global Step: 52060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:18:40,506-Speed 25312.10 samples/sec   Loss 1.4846   LearningRate 0.0001   Epoch: 30   Global Step: 52070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:18:50,292-Speed 25119.26 samples/sec   Loss 1.4773   LearningRate 0.0001   Epoch: 30   Global Step: 52080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:00,022-Speed 25260.70 samples/sec   Loss 1.4855   LearningRate 0.0001   Epoch: 30   Global Step: 52090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:09,760-Speed 25239.78 samples/sec   Loss 1.4795   LearningRate 0.0001   Epoch: 30   Global Step: 52100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:19,582-Speed 25024.48 samples/sec   Loss 1.4786   LearningRate 0.0001   Epoch: 30   Global Step: 52110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:29,408-Speed 25013.61 samples/sec   Loss 1.4883   LearningRate 0.0001   Epoch: 30   Global Step: 52120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:39,186-Speed 25144.15 samples/sec   Loss 1.4782   LearningRate 0.0001   Epoch: 30   Global Step: 52130   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:48,998-Speed 25051.23 samples/sec   Loss 1.4762   LearningRate 0.0001   Epoch: 30   Global Step: 52140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:19:58,690-Speed 25361.28 samples/sec   Loss 1.4848   LearningRate 0.0001   Epoch: 30   Global Step: 52150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:20:08,481-Speed 25106.82 samples/sec   Loss 1.4727   LearningRate 0.0001   Epoch: 30   Global Step: 52160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:20:18,230-Speed 25211.61 samples/sec   Loss 1.4711   LearningRate 0.0001   Epoch: 30   Global Step: 52170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:20:27,979-Speed 25212.22 samples/sec   Loss 1.4771   LearningRate 0.0001   Epoch: 30   Global Step: 52180   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:20:37,755-Speed 25144.87 samples/sec   Loss 1.4765   LearningRate 0.0001   Epoch: 30   Global Step: 52190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:20:47,623-Speed 24910.76 samples/sec   Loss 1.4793   LearningRate 0.0001   Epoch: 30   Global Step: 52200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:20:57,349-Speed 25269.81 samples/sec   Loss 1.4888   LearningRate 0.0001   Epoch: 30   Global Step: 52210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:21:07,097-Speed 25213.85 samples/sec   Loss 1.4818   LearningRate 0.0001   Epoch: 30   Global Step: 52220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:21:16,790-Speed 25360.73 samples/sec   Loss 1.4811   LearningRate 0.0001   Epoch: 30   Global Step: 52230   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:21:26,550-Speed 25185.85 samples/sec   Loss 1.4822   LearningRate 0.0001   Epoch: 30   Global Step: 52240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:21:36,319-Speed 25158.50 samples/sec   Loss 1.4848   LearningRate 0.0001   Epoch: 30   Global Step: 52250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:21:45,981-Speed 25439.59 samples/sec   Loss 1.4755   LearningRate 0.0001   Epoch: 30   Global Step: 52260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:21:55,774-Speed 25098.97 samples/sec   Loss 1.4713   LearningRate 0.0001   Epoch: 30   Global Step: 52270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:22:05,582-Speed 25060.60 samples/sec   Loss 1.4772   LearningRate 0.0001   Epoch: 30   Global Step: 52280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:22:15,371-Speed 25108.68 samples/sec   Loss 1.4705   LearningRate 0.0001   Epoch: 30   Global Step: 52290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:22:25,127-Speed 25196.31 samples/sec   Loss 1.4783   LearningRate 0.0001   Epoch: 30   Global Step: 52300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:22:34,934-Speed 25064.36 samples/sec   Loss 1.4838   LearningRate 0.0001   Epoch: 30   Global Step: 52310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:22:44,726-Speed 25106.64 samples/sec   Loss 1.4773   LearningRate 0.0001   Epoch: 30   Global Step: 52320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:22:54,478-Speed 25204.95 samples/sec   Loss 1.4702   LearningRate 0.0001   Epoch: 30   Global Step: 52330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:23:04,202-Speed 25280.13 samples/sec   Loss 1.4810   LearningRate 0.0001   Epoch: 30   Global Step: 52340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:23:14,025-Speed 25023.19 samples/sec   Loss 1.4584   LearningRate 0.0001   Epoch: 30   Global Step: 52350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:23:23,825-Speed 25080.37 samples/sec   Loss 1.4776   LearningRate 0.0001   Epoch: 30   Global Step: 52360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:23:33,572-Speed 25218.06 samples/sec   Loss 1.4712   LearningRate 0.0001   Epoch: 30   Global Step: 52370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:23:43,277-Speed 25327.34 samples/sec   Loss 1.4781   LearningRate 0.0001   Epoch: 30   Global Step: 52380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:23:53,163-Speed 24866.93 samples/sec   Loss 1.4770   LearningRate 0.0001   Epoch: 30   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:24:02,869-Speed 25332.30 samples/sec   Loss 1.4707   LearningRate 0.0001   Epoch: 30   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:24:12,610-Speed 25234.74 samples/sec   Loss 1.4776   LearningRate 0.0001   Epoch: 30   Global Step: 52410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:24:22,331-Speed 25284.36 samples/sec   Loss 1.4743   LearningRate 0.0001   Epoch: 30   Global Step: 52420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:24:32,106-Speed 25146.34 samples/sec   Loss 1.4766   LearningRate 0.0001   Epoch: 30   Global Step: 52430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:24:41,898-Speed 25104.48 samples/sec   Loss 1.4702   LearningRate 0.0001   Epoch: 30   Global Step: 52440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:24:51,658-Speed 25182.25 samples/sec   Loss 1.4698   LearningRate 0.0001   Epoch: 30   Global Step: 52450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:01,422-Speed 25173.84 samples/sec   Loss 1.4682   LearningRate 0.0001   Epoch: 30   Global Step: 52460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:11,167-Speed 25223.80 samples/sec   Loss 1.4727   LearningRate 0.0001   Epoch: 30   Global Step: 52470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:20,907-Speed 25235.58 samples/sec   Loss 1.4769   LearningRate 0.0001   Epoch: 30   Global Step: 52480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:30,708-Speed 25076.60 samples/sec   Loss 1.4778   LearningRate 0.0001   Epoch: 30   Global Step: 52490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:40,432-Speed 25276.50 samples/sec   Loss 1.4759   LearningRate 0.0001   Epoch: 30   Global Step: 52500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:50,232-Speed 25084.82 samples/sec   Loss 1.4740   LearningRate 0.0001   Epoch: 30   Global Step: 52510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:25:59,947-Speed 25301.04 samples/sec   Loss 1.4709   LearningRate 0.0001   Epoch: 30   Global Step: 52520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:26:09,691-Speed 25225.17 samples/sec   Loss 1.4751   LearningRate 0.0001   Epoch: 30   Global Step: 52530   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:26:19,512-Speed 25026.11 samples/sec   Loss 1.4786   LearningRate 0.0001   Epoch: 30   Global Step: 52540   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:26:29,219-Speed 25319.95 samples/sec   Loss 1.4680   LearningRate 0.0001   Epoch: 30   Global Step: 52550   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:26:39,026-Speed 25065.49 samples/sec   Loss 1.4556   LearningRate 0.0001   Epoch: 30   Global Step: 52560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:26:48,749-Speed 25278.05 samples/sec   Loss 1.4670   LearningRate 0.0001   Epoch: 30   Global Step: 52570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:26:58,458-Speed 25315.22 samples/sec   Loss 1.4729   LearningRate 0.0001   Epoch: 30   Global Step: 52580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:27:08,312-Speed 24944.34 samples/sec   Loss 1.4632   LearningRate 0.0001   Epoch: 30   Global Step: 52590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:27:18,139-Speed 25012.19 samples/sec   Loss 1.4694   LearningRate 0.0001   Epoch: 30   Global Step: 52600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:27:28,078-Speed 24730.57 samples/sec   Loss 1.4717   LearningRate 0.0001   Epoch: 30   Global Step: 52610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:27:37,818-Speed 25234.61 samples/sec   Loss 1.4666   LearningRate 0.0001   Epoch: 30   Global Step: 52620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:27:47,898-Speed 24386.03 samples/sec   Loss 1.4756   LearningRate 0.0001   Epoch: 30   Global Step: 52630   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:27:57,989-Speed 24358.89 samples/sec   Loss 1.4681   LearningRate 0.0001   Epoch: 30   Global Step: 52640   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:28:08,060-Speed 24407.59 samples/sec   Loss 1.4674   LearningRate 0.0001   Epoch: 30   Global Step: 52650   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:28:18,178-Speed 24291.51 samples/sec   Loss 1.4704   LearningRate 0.0001   Epoch: 30   Global Step: 52660   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:28:28,291-Speed 24305.34 samples/sec   Loss 1.4666   LearningRate 0.0001   Epoch: 30   Global Step: 52670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:28:38,426-Speed 24250.89 samples/sec   Loss 1.4616   LearningRate 0.0001   Epoch: 30   Global Step: 52680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:28:48,499-Speed 24402.22 samples/sec   Loss 1.4639   LearningRate 0.0001   Epoch: 30   Global Step: 52690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:28:58,592-Speed 24352.61 samples/sec   Loss 1.4684   LearningRate 0.0001   Epoch: 30   Global Step: 52700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:29:08,757-Speed 24182.51 samples/sec   Loss 1.4547   LearningRate 0.0001   Epoch: 30   Global Step: 52710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:29:18,824-Speed 24414.60 samples/sec   Loss 1.4483   LearningRate 0.0001   Epoch: 30   Global Step: 52720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:29:28,960-Speed 24250.14 samples/sec   Loss 1.4706   LearningRate 0.0001   Epoch: 30   Global Step: 52730   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:29:39,037-Speed 24390.24 samples/sec   Loss 1.4690   LearningRate 0.0001   Epoch: 30   Global Step: 52740   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:29:49,120-Speed 24377.19 samples/sec   Loss 1.4537   LearningRate 0.0001   Epoch: 30   Global Step: 52750   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:29:59,195-Speed 24396.03 samples/sec   Loss 1.4597   LearningRate 0.0001   Epoch: 30   Global Step: 52760   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:30:09,266-Speed 24406.77 samples/sec   Loss 1.4662   LearningRate 0.0001   Epoch: 30   Global Step: 52770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:30:19,324-Speed 24438.26 samples/sec   Loss 1.4683   LearningRate 0.0001   Epoch: 30   Global Step: 52780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:30:29,420-Speed 24346.40 samples/sec   Loss 1.4660   LearningRate 0.0001   Epoch: 30   Global Step: 52790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:30:39,507-Speed 24365.57 samples/sec   Loss 1.4654   LearningRate 0.0001   Epoch: 30   Global Step: 52800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:30:49,546-Speed 24485.59 samples/sec   Loss 1.4505   LearningRate 0.0001   Epoch: 30   Global Step: 52810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:30:59,595-Speed 24459.15 samples/sec   Loss 1.4610   LearningRate 0.0001   Epoch: 30   Global Step: 52820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:31:09,664-Speed 24409.02 samples/sec   Loss 1.4621   LearningRate 0.0001   Epoch: 30   Global Step: 52830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:31:19,785-Speed 24285.54 samples/sec   Loss 1.4640   LearningRate 0.0001   Epoch: 30   Global Step: 52840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:31:29,840-Speed 24445.29 samples/sec   Loss 1.4691   LearningRate 0.0001   Epoch: 30   Global Step: 52850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:31:39,928-Speed 24366.00 samples/sec   Loss 1.4530   LearningRate 0.0001   Epoch: 30   Global Step: 52860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:31:50,025-Speed 24342.11 samples/sec   Loss 1.4652   LearningRate 0.0001   Epoch: 30   Global Step: 52870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:32:00,069-Speed 24472.04 samples/sec   Loss 1.4621   LearningRate 0.0001   Epoch: 30   Global Step: 52880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:32:10,171-Speed 24331.45 samples/sec   Loss 1.4533   LearningRate 0.0001   Epoch: 30   Global Step: 52890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:32:20,253-Speed 24378.57 samples/sec   Loss 1.4637   LearningRate 0.0001   Epoch: 30   Global Step: 52900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:32:30,288-Speed 24494.40 samples/sec   Loss 1.4604   LearningRate 0.0001   Epoch: 30   Global Step: 52910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:32:40,390-Speed 24331.84 samples/sec   Loss 1.4665   LearningRate 0.0001   Epoch: 30   Global Step: 52920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:32:50,477-Speed 24366.90 samples/sec   Loss 1.4672   LearningRate 0.0001   Epoch: 30   Global Step: 52930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:33:00,553-Speed 24398.79 samples/sec   Loss 1.4733   LearningRate 0.0001   Epoch: 30   Global Step: 52940   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:33:10,658-Speed 24324.05 samples/sec   Loss 1.4546   LearningRate 0.0001   Epoch: 30   Global Step: 52950   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:33:20,738-Speed 24386.56 samples/sec   Loss 1.4608   LearningRate 0.0001   Epoch: 30   Global Step: 52960   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:33:30,839-Speed 24332.76 samples/sec   Loss 1.4540   LearningRate 0.0001   Epoch: 30   Global Step: 52970   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:33:40,929-Speed 24364.86 samples/sec   Loss 1.4548   LearningRate 0.0001   Epoch: 30   Global Step: 52980   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:33:51,122-Speed 24120.13 samples/sec   Loss 1.4603   LearningRate 0.0001   Epoch: 30   Global Step: 52990   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:34:01,254-Speed 24258.44 samples/sec   Loss 1.4591   LearningRate 0.0001   Epoch: 30   Global Step: 53000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:34:11,332-Speed 24388.60 samples/sec   Loss 1.4556   LearningRate 0.0001   Epoch: 30   Global Step: 53010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:34:21,549-Speed 24058.61 samples/sec   Loss 1.4522   LearningRate 0.0001   Epoch: 30   Global Step: 53020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-26 13:34:31,656-Speed 24318.28 samples/sec   Loss 1.4656   LearningRate 0.0001   Epoch: 30   Global Step: 53030   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:34:41,704-Speed 24461.43 samples/sec   Loss 1.4644   LearningRate 0.0001   Epoch: 30   Global Step: 53040   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:34:51,825-Speed 24286.21 samples/sec   Loss 1.4592   LearningRate 0.0001   Epoch: 30   Global Step: 53050   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:35:01,725-Speed 24828.30 samples/sec   Loss 1.4575   LearningRate 0.0001   Epoch: 30   Global Step: 53060   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:35:11,585-Speed 24927.15 samples/sec   Loss 1.4518   LearningRate 0.0001   Epoch: 30   Global Step: 53070   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:35:21,471-Speed 24864.17 samples/sec   Loss 1.4560   LearningRate 0.0001   Epoch: 30   Global Step: 53080   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:35:31,315-Speed 24970.94 samples/sec   Loss 1.4552   LearningRate 0.0001   Epoch: 30   Global Step: 53090   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:35:41,087-Speed 25151.34 samples/sec   Loss 1.4529   LearningRate 0.0001   Epoch: 30   Global Step: 53100   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:35:50,868-Speed 25132.49 samples/sec   Loss 1.4614   LearningRate 0.0001   Epoch: 30   Global Step: 53110   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:36:00,630-Speed 25179.70 samples/sec   Loss 1.4515   LearningRate 0.0001   Epoch: 30   Global Step: 53120   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:36:10,332-Speed 25332.55 samples/sec   Loss 1.4636   LearningRate 0.0001   Epoch: 30   Global Step: 53130   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-26 13:36:20,271-Speed 24731.06 samples/sec   Loss 1.4468   LearningRate 0.0001   Epoch: 30   Global Step: 53140   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:36:30,046-Speed 25147.30 samples/sec   Loss 1.4524   LearningRate 0.0001   Epoch: 30   Global Step: 53150   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:36:39,876-Speed 25002.51 samples/sec   Loss 1.4557   LearningRate 0.0001   Epoch: 30   Global Step: 53160   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:36:49,732-Speed 24937.50 samples/sec   Loss 1.4627   LearningRate 0.0001   Epoch: 30   Global Step: 53170   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-26 13:36:59,516-Speed 25122.34 samples/sec   Loss 1.4487   LearningRate 0.0001   Epoch: 30   Global Step: 53180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:37:09,311-Speed 25095.26 samples/sec   Loss 1.4473   LearningRate 0.0001   Epoch: 30   Global Step: 53190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:37:19,129-Speed 25034.13 samples/sec   Loss 1.4555   LearningRate 0.0001   Epoch: 30   Global Step: 53200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:37:28,929-Speed 25084.64 samples/sec   Loss 1.4498   LearningRate 0.0001   Epoch: 30   Global Step: 53210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:37:38,616-Speed 25372.53 samples/sec   Loss 1.4518   LearningRate 0.0001   Epoch: 30   Global Step: 53220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:37:48,332-Speed 25296.45 samples/sec   Loss 1.4496   LearningRate 0.0001   Epoch: 30   Global Step: 53230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:37:58,211-Speed 24881.67 samples/sec   Loss 1.4595   LearningRate 0.0001   Epoch: 30   Global Step: 53240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:38:08,110-Speed 24831.03 samples/sec   Loss 1.4471   LearningRate 0.0001   Epoch: 30   Global Step: 53250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:38:17,971-Speed 24925.89 samples/sec   Loss 1.4463   LearningRate 0.0001   Epoch: 30   Global Step: 53260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:38:27,737-Speed 25167.54 samples/sec   Loss 1.4545   LearningRate 0.0001   Epoch: 30   Global Step: 53270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:38:37,549-Speed 25052.11 samples/sec   Loss 1.4489   LearningRate 0.0001   Epoch: 30   Global Step: 53280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:38:47,471-Speed 24771.78 samples/sec   Loss 1.4549   LearningRate 0.0001   Epoch: 30   Global Step: 53290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:38:57,384-Speed 24796.65 samples/sec   Loss 1.4423   LearningRate 0.0001   Epoch: 30   Global Step: 53300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:39:07,299-Speed 24788.55 samples/sec   Loss 1.4447   LearningRate 0.0001   Epoch: 30   Global Step: 53310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:39:17,180-Speed 24880.96 samples/sec   Loss 1.4573   LearningRate 0.0001   Epoch: 30   Global Step: 53320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:39:26,948-Speed 25163.90 samples/sec   Loss 1.4583   LearningRate 0.0001   Epoch: 30   Global Step: 53330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:39:36,753-Speed 25068.52 samples/sec   Loss 1.4538   LearningRate 0.0001   Epoch: 30   Global Step: 53340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:39:46,559-Speed 25066.61 samples/sec   Loss 1.4441   LearningRate 0.0001   Epoch: 30   Global Step: 53350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:39:56,361-Speed 25074.44 samples/sec   Loss 1.4470   LearningRate 0.0001   Epoch: 30   Global Step: 53360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:40:06,170-Speed 25057.67 samples/sec   Loss 1.4592   LearningRate 0.0001   Epoch: 30   Global Step: 53370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:40:16,087-Speed 24785.44 samples/sec   Loss 1.4500   LearningRate 0.0001   Epoch: 30   Global Step: 53380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:40:25,828-Speed 25233.48 samples/sec   Loss 1.4450   LearningRate 0.0001   Epoch: 30   Global Step: 53390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:40:35,633-Speed 25074.04 samples/sec   Loss 1.4612   LearningRate 0.0001   Epoch: 30   Global Step: 53400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:40:45,547-Speed 24792.22 samples/sec   Loss 1.4474   LearningRate 0.0001   Epoch: 30   Global Step: 53410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:40:55,422-Speed 24888.49 samples/sec   Loss 1.4485   LearningRate 0.0001   Epoch: 30   Global Step: 53420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:41:05,161-Speed 25237.38 samples/sec   Loss 1.4385   LearningRate 0.0001   Epoch: 30   Global Step: 53430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:41:14,922-Speed 25181.94 samples/sec   Loss 1.4519   LearningRate 0.0001   Epoch: 30   Global Step: 53440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:41:24,646-Speed 25277.18 samples/sec   Loss 1.4483   LearningRate 0.0001   Epoch: 30   Global Step: 53450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:41:34,389-Speed 25230.03 samples/sec   Loss 1.4453   LearningRate 0.0001   Epoch: 30   Global Step: 53460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:41:44,174-Speed 25117.06 samples/sec   Loss 1.4481   LearningRate 0.0001   Epoch: 30   Global Step: 53470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:41:53,969-Speed 25093.95 samples/sec   Loss 1.4524   LearningRate 0.0001   Epoch: 30   Global Step: 53480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:42:03,734-Speed 25172.72 samples/sec   Loss 1.4419   LearningRate 0.0001   Epoch: 30   Global Step: 53490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:42:13,500-Speed 25165.52 samples/sec   Loss 1.4450   LearningRate 0.0001   Epoch: 30   Global Step: 53500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:42:23,283-Speed 25126.96 samples/sec   Loss 1.4432   LearningRate 0.0001   Epoch: 30   Global Step: 53510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:42:33,055-Speed 25152.58 samples/sec   Loss 1.4438   LearningRate 0.0001   Epoch: 30   Global Step: 53520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:42:42,843-Speed 25112.54 samples/sec   Loss 1.4593   LearningRate 0.0001   Epoch: 30   Global Step: 53530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:42:52,617-Speed 25146.07 samples/sec   Loss 1.4535   LearningRate 0.0001   Epoch: 30   Global Step: 53540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:43:02,346-Speed 25270.57 samples/sec   Loss 1.4591   LearningRate 0.0001   Epoch: 30   Global Step: 53550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:43:12,115-Speed 25159.92 samples/sec   Loss 1.4644   LearningRate 0.0001   Epoch: 30   Global Step: 53560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:43:22,000-Speed 24871.07 samples/sec   Loss 1.4646   LearningRate 0.0001   Epoch: 30   Global Step: 53570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:44:20,973-Speed 4167.43 samples/sec   Loss 1.4470   LearningRate 0.0001   Epoch: 31   Global Step: 53580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:44:30,811-Speed 24985.54 samples/sec   Loss 1.4410   LearningRate 0.0001   Epoch: 31   Global Step: 53590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:44:40,584-Speed 25154.31 samples/sec   Loss 1.4436   LearningRate 0.0001   Epoch: 31   Global Step: 53600   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-26 13:44:50,337-Speed 25203.82 samples/sec   Loss 1.4445   LearningRate 0.0001   Epoch: 31   Global Step: 53610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:00,093-Speed 25193.43 samples/sec   Loss 1.4370   LearningRate 0.0001   Epoch: 31   Global Step: 53620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:09,996-Speed 24819.00 samples/sec   Loss 1.4382   LearningRate 0.0001   Epoch: 31   Global Step: 53630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:19,845-Speed 24959.07 samples/sec   Loss 1.4356   LearningRate 0.0001   Epoch: 31   Global Step: 53640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:29,710-Speed 24922.22 samples/sec   Loss 1.4372   LearningRate 0.0001   Epoch: 31   Global Step: 53650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:39,572-Speed 24925.40 samples/sec   Loss 1.4342   LearningRate 0.0001   Epoch: 31   Global Step: 53660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:49,404-Speed 25003.54 samples/sec   Loss 1.4462   LearningRate 0.0001   Epoch: 31   Global Step: 53670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:45:59,094-Speed 25365.12 samples/sec   Loss 1.4342   LearningRate 0.0001   Epoch: 31   Global Step: 53680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:46:08,941-Speed 24963.97 samples/sec   Loss 1.4398   LearningRate 0.0001   Epoch: 31   Global Step: 53690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:46:18,704-Speed 25175.57 samples/sec   Loss 1.4416   LearningRate 0.0001   Epoch: 31   Global Step: 53700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:46:28,495-Speed 25105.82 samples/sec   Loss 1.4419   LearningRate 0.0001   Epoch: 31   Global Step: 53710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:46:38,279-Speed 25121.33 samples/sec   Loss 1.4336   LearningRate 0.0001   Epoch: 31   Global Step: 53720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:46:48,083-Speed 25072.63 samples/sec   Loss 1.4318   LearningRate 0.0001   Epoch: 31   Global Step: 53730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:46:58,099-Speed 24542.09 samples/sec   Loss 1.4446   LearningRate 0.0001   Epoch: 31   Global Step: 53740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:47:08,151-Speed 24451.06 samples/sec   Loss 1.4346   LearningRate 0.0001   Epoch: 31   Global Step: 53750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:47:18,143-Speed 24599.93 samples/sec   Loss 1.4385   LearningRate 0.0001   Epoch: 31   Global Step: 53760   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:47:28,195-Speed 24452.10 samples/sec   Loss 1.4476   LearningRate 0.0001   Epoch: 31   Global Step: 53770   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:47:38,267-Speed 24402.65 samples/sec   Loss 1.4289   LearningRate 0.0001   Epoch: 31   Global Step: 53780   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:47:48,341-Speed 24399.26 samples/sec   Loss 1.4341   LearningRate 0.0001   Epoch: 31   Global Step: 53790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:47:58,482-Speed 24238.15 samples/sec   Loss 1.4306   LearningRate 0.0001   Epoch: 31   Global Step: 53800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:48:08,572-Speed 24360.47 samples/sec   Loss 1.4392   LearningRate 0.0001   Epoch: 31   Global Step: 53810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:48:18,601-Speed 24509.53 samples/sec   Loss 1.4452   LearningRate 0.0001   Epoch: 31   Global Step: 53820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:48:28,669-Speed 24413.10 samples/sec   Loss 1.4474   LearningRate 0.0001   Epoch: 31   Global Step: 53830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:48:38,689-Speed 24530.29 samples/sec   Loss 1.4385   LearningRate 0.0001   Epoch: 31   Global Step: 53840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:48:48,858-Speed 24170.34 samples/sec   Loss 1.4325   LearningRate 0.0001   Epoch: 31   Global Step: 53850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:48:58,822-Speed 24669.45 samples/sec   Loss 1.4548   LearningRate 0.0001   Epoch: 31   Global Step: 53860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:49:08,897-Speed 24396.55 samples/sec   Loss 1.4370   LearningRate 0.0001   Epoch: 31   Global Step: 53870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:49:18,995-Speed 24339.47 samples/sec   Loss 1.4524   LearningRate 0.0001   Epoch: 31   Global Step: 53880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:49:29,094-Speed 24339.60 samples/sec   Loss 1.4393   LearningRate 0.0001   Epoch: 31   Global Step: 53890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:49:39,163-Speed 24410.67 samples/sec   Loss 1.4351   LearningRate 0.0001   Epoch: 31   Global Step: 53900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:49:49,185-Speed 24523.83 samples/sec   Loss 1.4326   LearningRate 0.0001   Epoch: 31   Global Step: 53910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:49:59,121-Speed 24747.25 samples/sec   Loss 1.4443   LearningRate 0.0001   Epoch: 31   Global Step: 53920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:50:09,310-Speed 24123.93 samples/sec   Loss 1.4491   LearningRate 0.0001   Epoch: 31   Global Step: 53930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:50:19,388-Speed 24390.44 samples/sec   Loss 1.4442   LearningRate 0.0001   Epoch: 31   Global Step: 53940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:50:29,387-Speed 24581.73 samples/sec   Loss 1.4415   LearningRate 0.0001   Epoch: 31   Global Step: 53950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 13:50:39,423-Speed 24490.71 samples/sec   Loss 1.4295   LearningRate 0.0001   Epoch: 31   Global Step: 53960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:50:49,542-Speed 24291.14 samples/sec   Loss 1.4456   LearningRate 0.0001   Epoch: 31   Global Step: 53970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:50:59,559-Speed 24538.80 samples/sec   Loss 1.4272   LearningRate 0.0001   Epoch: 31   Global Step: 53980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:51:09,560-Speed 24577.02 samples/sec   Loss 1.4347   LearningRate 0.0001   Epoch: 31   Global Step: 53990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:51:19,740-Speed 24146.38 samples/sec   Loss 1.4358   LearningRate 0.0001   Epoch: 31   Global Step: 54000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:51:29,843-Speed 24327.20 samples/sec   Loss 1.4329   LearningRate 0.0001   Epoch: 31   Global Step: 54010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:51:39,961-Speed 24294.13 samples/sec   Loss 1.4487   LearningRate 0.0001   Epoch: 31   Global Step: 54020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:51:49,932-Speed 24650.92 samples/sec   Loss 1.4388   LearningRate 0.0001   Epoch: 31   Global Step: 54030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:52:00,129-Speed 24106.54 samples/sec   Loss 1.4410   LearningRate 0.0001   Epoch: 31   Global Step: 54040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:52:10,265-Speed 24248.95 samples/sec   Loss 1.4315   LearningRate 0.0001   Epoch: 31   Global Step: 54050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:52:20,282-Speed 24537.38 samples/sec   Loss 1.4364   LearningRate 0.0001   Epoch: 31   Global Step: 54060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:52:30,319-Speed 24490.45 samples/sec   Loss 1.4395   LearningRate 0.0001   Epoch: 31   Global Step: 54070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:52:40,369-Speed 24459.07 samples/sec   Loss 1.4380   LearningRate 0.0001   Epoch: 31   Global Step: 54080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:52:50,497-Speed 24268.93 samples/sec   Loss 1.4376   LearningRate 0.0001   Epoch: 31   Global Step: 54090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:53:00,501-Speed 24567.97 samples/sec   Loss 1.4396   LearningRate 0.0001   Epoch: 31   Global Step: 54100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:53:10,464-Speed 24671.15 samples/sec   Loss 1.4259   LearningRate 0.0001   Epoch: 31   Global Step: 54110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:53:20,528-Speed 24422.29 samples/sec   Loss 1.4400   LearningRate 0.0001   Epoch: 31   Global Step: 54120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:53:30,556-Speed 24510.82 samples/sec   Loss 1.4275   LearningRate 0.0001   Epoch: 31   Global Step: 54130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:53:40,647-Speed 24358.27 samples/sec   Loss 1.4255   LearningRate 0.0001   Epoch: 31   Global Step: 54140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:53:50,698-Speed 24456.06 samples/sec   Loss 1.4249   LearningRate 0.0001   Epoch: 31   Global Step: 54150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:54:00,720-Speed 24524.83 samples/sec   Loss 1.4376   LearningRate 0.0001   Epoch: 31   Global Step: 54160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:54:10,775-Speed 24453.93 samples/sec   Loss 1.4295   LearningRate 0.0001   Epoch: 31   Global Step: 54170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:54:20,828-Speed 24450.16 samples/sec   Loss 1.4314   LearningRate 0.0001   Epoch: 31   Global Step: 54180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:54:30,894-Speed 24417.86 samples/sec   Loss 1.4275   LearningRate 0.0001   Epoch: 31   Global Step: 54190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:54:40,901-Speed 24561.72 samples/sec   Loss 1.4308   LearningRate 0.0001   Epoch: 31   Global Step: 54200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:54:50,938-Speed 24490.36 samples/sec   Loss 1.4307   LearningRate 0.0001   Epoch: 31   Global Step: 54210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:55:01,042-Speed 24325.90 samples/sec   Loss 1.4321   LearningRate 0.0001   Epoch: 31   Global Step: 54220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:55:11,202-Speed 24190.34 samples/sec   Loss 1.4281   LearningRate 0.0001   Epoch: 31   Global Step: 54230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:55:21,365-Speed 24184.36 samples/sec   Loss 1.4318   LearningRate 0.0001   Epoch: 31   Global Step: 54240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:55:31,523-Speed 24198.15 samples/sec   Loss 1.4403   LearningRate 0.0001   Epoch: 31   Global Step: 54250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:55:41,573-Speed 24455.79 samples/sec   Loss 1.4386   LearningRate 0.0001   Epoch: 31   Global Step: 54260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:55:51,687-Speed 24301.77 samples/sec   Loss 1.4270   LearningRate 0.0001   Epoch: 31   Global Step: 54270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:56:01,685-Speed 24583.24 samples/sec   Loss 1.4330   LearningRate 0.0001   Epoch: 31   Global Step: 54280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:56:11,668-Speed 24620.11 samples/sec   Loss 1.4365   LearningRate 0.0001   Epoch: 31   Global Step: 54290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:56:21,780-Speed 24307.10 samples/sec   Loss 1.4286   LearningRate 0.0001   Epoch: 31   Global Step: 54300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:56:31,789-Speed 24557.30 samples/sec   Loss 1.4387   LearningRate 0.0001   Epoch: 31   Global Step: 54310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:56:41,814-Speed 24520.77 samples/sec   Loss 1.4350   LearningRate 0.0001   Epoch: 31   Global Step: 54320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:56:51,845-Speed 24501.87 samples/sec   Loss 1.4262   LearningRate 0.0001   Epoch: 31   Global Step: 54330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:57:01,885-Speed 24481.66 samples/sec   Loss 1.4290   LearningRate 0.0001   Epoch: 31   Global Step: 54340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:57:11,976-Speed 24357.16 samples/sec   Loss 1.4248   LearningRate 0.0001   Epoch: 31   Global Step: 54350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:57:22,030-Speed 24447.40 samples/sec   Loss 1.4268   LearningRate 0.0001   Epoch: 31   Global Step: 54360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:57:32,029-Speed 24580.49 samples/sec   Loss 1.4260   LearningRate 0.0001   Epoch: 31   Global Step: 54370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:57:42,034-Speed 24567.66 samples/sec   Loss 1.4312   LearningRate 0.0001   Epoch: 31   Global Step: 54380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:57:52,150-Speed 24297.64 samples/sec   Loss 1.4258   LearningRate 0.0001   Epoch: 31   Global Step: 54390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:58:02,155-Speed 24567.09 samples/sec   Loss 1.4264   LearningRate 0.0001   Epoch: 31   Global Step: 54400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:58:12,148-Speed 24596.38 samples/sec   Loss 1.4307   LearningRate 0.0001   Epoch: 31   Global Step: 54410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:58:22,264-Speed 24298.58 samples/sec   Loss 1.4279   LearningRate 0.0001   Epoch: 31   Global Step: 54420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:58:32,318-Speed 24448.52 samples/sec   Loss 1.4202   LearningRate 0.0001   Epoch: 31   Global Step: 54430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:58:42,402-Speed 24373.83 samples/sec   Loss 1.4115   LearningRate 0.0001   Epoch: 31   Global Step: 54440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:58:52,515-Speed 24305.99 samples/sec   Loss 1.4198   LearningRate 0.0001   Epoch: 31   Global Step: 54450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:59:02,536-Speed 24528.30 samples/sec   Loss 1.4185   LearningRate 0.0001   Epoch: 31   Global Step: 54460   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-26 13:59:12,535-Speed 24581.19 samples/sec   Loss 1.4271   LearningRate 0.0001   Epoch: 31   Global Step: 54470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:59:22,251-Speed 25298.11 samples/sec   Loss 1.4166   LearningRate 0.0001   Epoch: 31   Global Step: 54480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:59:32,079-Speed 25009.58 samples/sec   Loss 1.4083   LearningRate 0.0001   Epoch: 31   Global Step: 54490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:59:41,820-Speed 25233.25 samples/sec   Loss 1.4167   LearningRate 0.0001   Epoch: 31   Global Step: 54500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 13:59:51,625-Speed 25068.08 samples/sec   Loss 1.4208   LearningRate 0.0001   Epoch: 31   Global Step: 54510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:00:01,486-Speed 24926.81 samples/sec   Loss 1.4237   LearningRate 0.0001   Epoch: 31   Global Step: 54520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:00:11,277-Speed 25106.10 samples/sec   Loss 1.4249   LearningRate 0.0001   Epoch: 31   Global Step: 54530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:00:21,071-Speed 25095.17 samples/sec   Loss 1.4242   LearningRate 0.0001   Epoch: 31   Global Step: 54540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:00:30,856-Speed 25120.02 samples/sec   Loss 1.4186   LearningRate 0.0001   Epoch: 31   Global Step: 54550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:00:40,685-Speed 25007.95 samples/sec   Loss 1.4190   LearningRate 0.0001   Epoch: 31   Global Step: 54560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:00:50,486-Speed 25079.99 samples/sec   Loss 1.4251   LearningRate 0.0001   Epoch: 31   Global Step: 54570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:00,245-Speed 25184.80 samples/sec   Loss 1.4196   LearningRate 0.0001   Epoch: 31   Global Step: 54580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:10,179-Speed 24744.17 samples/sec   Loss 1.4201   LearningRate 0.0001   Epoch: 31   Global Step: 54590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:19,967-Speed 25110.18 samples/sec   Loss 1.4136   LearningRate 0.0001   Epoch: 31   Global Step: 54600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:29,863-Speed 24844.15 samples/sec   Loss 1.4318   LearningRate 0.0001   Epoch: 31   Global Step: 54610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:39,641-Speed 25136.60 samples/sec   Loss 1.4370   LearningRate 0.0001   Epoch: 31   Global Step: 54620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:49,502-Speed 24926.37 samples/sec   Loss 1.4156   LearningRate 0.0001   Epoch: 31   Global Step: 54630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:01:59,308-Speed 25064.19 samples/sec   Loss 1.4201   LearningRate 0.0001   Epoch: 31   Global Step: 54640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:02:09,126-Speed 25041.28 samples/sec   Loss 1.4202   LearningRate 0.0001   Epoch: 31   Global Step: 54650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:02:18,886-Speed 25184.28 samples/sec   Loss 1.4140   LearningRate 0.0001   Epoch: 31   Global Step: 54660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:02:28,591-Speed 25329.35 samples/sec   Loss 1.4288   LearningRate 0.0001   Epoch: 31   Global Step: 54670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:02:38,249-Speed 25449.30 samples/sec   Loss 1.4229   LearningRate 0.0001   Epoch: 31   Global Step: 54680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:02:47,985-Speed 25248.88 samples/sec   Loss 1.4177   LearningRate 0.0001   Epoch: 31   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:02:57,812-Speed 25012.64 samples/sec   Loss 1.4217   LearningRate 0.0001   Epoch: 31   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:03:07,572-Speed 25183.38 samples/sec   Loss 1.4206   LearningRate 0.0001   Epoch: 31   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:03:17,331-Speed 25188.12 samples/sec   Loss 1.4082   LearningRate 0.0001   Epoch: 31   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:03:27,064-Speed 25255.26 samples/sec   Loss 1.4164   LearningRate 0.0001   Epoch: 31   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:03:36,869-Speed 25071.01 samples/sec   Loss 1.4189   LearningRate 0.0001   Epoch: 31   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:03:46,687-Speed 25037.28 samples/sec   Loss 1.4239   LearningRate 0.0001   Epoch: 31   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:03:56,470-Speed 25125.25 samples/sec   Loss 1.4158   LearningRate 0.0001   Epoch: 31   Global Step: 54760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:04:06,215-Speed 25222.65 samples/sec   Loss 1.4162   LearningRate 0.0001   Epoch: 31   Global Step: 54770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:04:15,950-Speed 25246.92 samples/sec   Loss 1.4173   LearningRate 0.0001   Epoch: 31   Global Step: 54780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:04:25,658-Speed 25319.58 samples/sec   Loss 1.4093   LearningRate 0.0001   Epoch: 31   Global Step: 54790   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:04:35,405-Speed 25218.84 samples/sec   Loss 1.4065   LearningRate 0.0001   Epoch: 31   Global Step: 54800   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:04:45,316-Speed 24803.10 samples/sec   Loss 1.4137   LearningRate 0.0001   Epoch: 31   Global Step: 54810   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:04:55,050-Speed 25253.72 samples/sec   Loss 1.4120   LearningRate 0.0001   Epoch: 31   Global Step: 54820   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:05:04,814-Speed 25172.37 samples/sec   Loss 1.4144   LearningRate 0.0001   Epoch: 31   Global Step: 54830   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:05:14,553-Speed 25242.80 samples/sec   Loss 1.4151   LearningRate 0.0001   Epoch: 31   Global Step: 54840   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:05:24,285-Speed 25258.99 samples/sec   Loss 1.4147   LearningRate 0.0001   Epoch: 31   Global Step: 54850   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:05:34,024-Speed 25238.49 samples/sec   Loss 1.4257   LearningRate 0.0001   Epoch: 31   Global Step: 54860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:05:43,806-Speed 25126.45 samples/sec   Loss 1.4140   LearningRate 0.0001   Epoch: 31   Global Step: 54870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:05:53,512-Speed 25325.47 samples/sec   Loss 1.4232   LearningRate 0.0001   Epoch: 31   Global Step: 54880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:06:03,415-Speed 24820.51 samples/sec   Loss 1.4187   LearningRate 0.0001   Epoch: 31   Global Step: 54890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:06:13,098-Speed 25384.40 samples/sec   Loss 1.4064   LearningRate 0.0001   Epoch: 31   Global Step: 54900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:06:22,902-Speed 25070.78 samples/sec   Loss 1.4277   LearningRate 0.0001   Epoch: 31   Global Step: 54910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:06:32,699-Speed 25088.38 samples/sec   Loss 1.4046   LearningRate 0.0001   Epoch: 31   Global Step: 54920   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:06:42,460-Speed 25189.22 samples/sec   Loss 1.4241   LearningRate 0.0001   Epoch: 31   Global Step: 54930   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:06:52,393-Speed 24746.00 samples/sec   Loss 1.4119   LearningRate 0.0001   Epoch: 31   Global Step: 54940   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:07:02,264-Speed 24900.02 samples/sec   Loss 1.4011   LearningRate 0.0001   Epoch: 31   Global Step: 54950   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:07:11,925-Speed 25442.64 samples/sec   Loss 1.3991   LearningRate 0.0001   Epoch: 31   Global Step: 54960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:07:21,654-Speed 25265.02 samples/sec   Loss 1.4089   LearningRate 0.0001   Epoch: 31   Global Step: 54970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:07:31,451-Speed 25088.33 samples/sec   Loss 1.4078   LearningRate 0.0001   Epoch: 31   Global Step: 54980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:07:41,175-Speed 25278.87 samples/sec   Loss 1.4090   LearningRate 0.0001   Epoch: 31   Global Step: 54990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:07:50,921-Speed 25220.07 samples/sec   Loss 1.4120   LearningRate 0.0001   Epoch: 31   Global Step: 55000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:00,690-Speed 25160.74 samples/sec   Loss 1.4043   LearningRate 0.0001   Epoch: 31   Global Step: 55010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:10,534-Speed 24967.28 samples/sec   Loss 1.4231   LearningRate 0.0001   Epoch: 31   Global Step: 55020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:20,359-Speed 25015.92 samples/sec   Loss 1.4148   LearningRate 0.0001   Epoch: 31   Global Step: 55030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:30,198-Speed 24982.71 samples/sec   Loss 1.4186   LearningRate 0.0001   Epoch: 31   Global Step: 55040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:39,902-Speed 25329.16 samples/sec   Loss 1.4136   LearningRate 0.0001   Epoch: 31   Global Step: 55050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:49,692-Speed 25108.14 samples/sec   Loss 1.4137   LearningRate 0.0001   Epoch: 31   Global Step: 55060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:08:59,456-Speed 25174.99 samples/sec   Loss 1.4190   LearningRate 0.0001   Epoch: 31   Global Step: 55070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:09:09,214-Speed 25187.41 samples/sec   Loss 1.4052   LearningRate 0.0001   Epoch: 31   Global Step: 55080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:09:18,978-Speed 25174.24 samples/sec   Loss 1.4058   LearningRate 0.0001   Epoch: 31   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-26 14:09:28,695-Speed 25297.23 samples/sec   Loss 1.4128   LearningRate 0.0001   Epoch: 31   Global Step: 55100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:09:38,446-Speed 25209.88 samples/sec   Loss 1.4171   LearningRate 0.0001   Epoch: 31   Global Step: 55110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:09:48,211-Speed 25173.44 samples/sec   Loss 1.3989   LearningRate 0.0001   Epoch: 31   Global Step: 55120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:09:58,115-Speed 24817.85 samples/sec   Loss 1.4084   LearningRate 0.0001   Epoch: 31   Global Step: 55130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:10:07,951-Speed 24989.06 samples/sec   Loss 1.4104   LearningRate 0.0001   Epoch: 31   Global Step: 55140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:10:17,803-Speed 24949.45 samples/sec   Loss 1.4145   LearningRate 0.0001   Epoch: 31   Global Step: 55150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:10:27,598-Speed 25094.16 samples/sec   Loss 1.4126   LearningRate 0.0001   Epoch: 31   Global Step: 55160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:10:37,316-Speed 25294.28 samples/sec   Loss 1.4089   LearningRate 0.0001   Epoch: 31   Global Step: 55170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:10:47,036-Speed 25288.04 samples/sec   Loss 1.4088   LearningRate 0.0001   Epoch: 31   Global Step: 55180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:10:56,809-Speed 25148.55 samples/sec   Loss 1.4201   LearningRate 0.0001   Epoch: 31   Global Step: 55190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:11:06,608-Speed 25091.20 samples/sec   Loss 1.4126   LearningRate 0.0001   Epoch: 31   Global Step: 55200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:11:16,314-Speed 25324.59 samples/sec   Loss 1.3962   LearningRate 0.0000   Epoch: 31   Global Step: 55210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:11:25,986-Speed 25413.08 samples/sec   Loss 1.4090   LearningRate 0.0000   Epoch: 31   Global Step: 55220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:11:35,728-Speed 25229.97 samples/sec   Loss 1.4146   LearningRate 0.0000   Epoch: 31   Global Step: 55230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:11:45,444-Speed 25297.60 samples/sec   Loss 1.4079   LearningRate 0.0000   Epoch: 31   Global Step: 55240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:11:55,201-Speed 25192.26 samples/sec   Loss 1.4070   LearningRate 0.0000   Epoch: 31   Global Step: 55250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:12:05,067-Speed 24913.69 samples/sec   Loss 1.4197   LearningRate 0.0000   Epoch: 31   Global Step: 55260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:12:14,895-Speed 25009.03 samples/sec   Loss 1.4137   LearningRate 0.0000   Epoch: 31   Global Step: 55270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:12:24,604-Speed 25315.40 samples/sec   Loss 1.4118   LearningRate 0.0000   Epoch: 31   Global Step: 55280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:12:34,411-Speed 25062.99 samples/sec   Loss 1.4244   LearningRate 0.0000   Epoch: 31   Global Step: 55290   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:12:44,099-Speed 25370.65 samples/sec   Loss 1.4049   LearningRate 0.0000   Epoch: 31   Global Step: 55300   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:13:43,195-Speed 4158.73 samples/sec   Loss 1.4113   LearningRate 0.0000   Epoch: 32   Global Step: 55310   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:13:53,057-Speed 24924.48 samples/sec   Loss 1.3985   LearningRate 0.0000   Epoch: 32   Global Step: 55320   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:14:02,960-Speed 24820.61 samples/sec   Loss 1.4182   LearningRate 0.0000   Epoch: 32   Global Step: 55330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:14:12,947-Speed 24612.17 samples/sec   Loss 1.3976   LearningRate 0.0000   Epoch: 32   Global Step: 55340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:14:22,854-Speed 24807.86 samples/sec   Loss 1.4093   LearningRate 0.0000   Epoch: 32   Global Step: 55350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:14:32,777-Speed 24769.02 samples/sec   Loss 1.3937   LearningRate 0.0000   Epoch: 32   Global Step: 55360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:14:42,636-Speed 24932.34 samples/sec   Loss 1.4034   LearningRate 0.0000   Epoch: 32   Global Step: 55370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:14:52,574-Speed 24732.47 samples/sec   Loss 1.3991   LearningRate 0.0000   Epoch: 32   Global Step: 55380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:15:02,505-Speed 24750.56 samples/sec   Loss 1.4051   LearningRate 0.0000   Epoch: 32   Global Step: 55390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:15:12,504-Speed 24580.38 samples/sec   Loss 1.3998   LearningRate 0.0000   Epoch: 32   Global Step: 55400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:15:22,460-Speed 24686.46 samples/sec   Loss 1.4012   LearningRate 0.0000   Epoch: 32   Global Step: 55410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:15:32,262-Speed 25077.05 samples/sec   Loss 1.3935   LearningRate 0.0000   Epoch: 32   Global Step: 55420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:15:42,045-Speed 25123.86 samples/sec   Loss 1.4041   LearningRate 0.0000   Epoch: 32   Global Step: 55430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:15:51,808-Speed 25177.03 samples/sec   Loss 1.4034   LearningRate 0.0000   Epoch: 32   Global Step: 55440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:16:01,577-Speed 25158.35 samples/sec   Loss 1.4019   LearningRate 0.0000   Epoch: 32   Global Step: 55450   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:16:11,388-Speed 25054.23 samples/sec   Loss 1.3957   LearningRate 0.0000   Epoch: 32   Global Step: 55460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:16:21,124-Speed 25246.74 samples/sec   Loss 1.4014   LearningRate 0.0000   Epoch: 32   Global Step: 55470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:16:30,945-Speed 25026.18 samples/sec   Loss 1.3966   LearningRate 0.0000   Epoch: 32   Global Step: 55480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:16:40,666-Speed 25285.34 samples/sec   Loss 1.3986   LearningRate 0.0000   Epoch: 32   Global Step: 55490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:16:50,426-Speed 25184.22 samples/sec   Loss 1.4006   LearningRate 0.0000   Epoch: 32   Global Step: 55500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:17:00,277-Speed 24954.55 samples/sec   Loss 1.4015   LearningRate 0.0000   Epoch: 32   Global Step: 55510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:17:09,975-Speed 25344.54 samples/sec   Loss 1.3987   LearningRate 0.0000   Epoch: 32   Global Step: 55520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:17:19,748-Speed 25149.53 samples/sec   Loss 1.4072   LearningRate 0.0000   Epoch: 32   Global Step: 55530   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:17:29,470-Speed 25284.54 samples/sec   Loss 1.3991   LearningRate 0.0000   Epoch: 32   Global Step: 55540   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:17:39,304-Speed 24993.36 samples/sec   Loss 1.3950   LearningRate 0.0000   Epoch: 32   Global Step: 55550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:17:49,202-Speed 24831.92 samples/sec   Loss 1.4003   LearningRate 0.0000   Epoch: 32   Global Step: 55560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:17:59,064-Speed 24925.24 samples/sec   Loss 1.4077   LearningRate 0.0000   Epoch: 32   Global Step: 55570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:18:08,957-Speed 24843.71 samples/sec   Loss 1.4019   LearningRate 0.0000   Epoch: 32   Global Step: 55580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:18:18,833-Speed 24890.06 samples/sec   Loss 1.4011   LearningRate 0.0000   Epoch: 32   Global Step: 55590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:18:28,601-Speed 25162.16 samples/sec   Loss 1.4056   LearningRate 0.0000   Epoch: 32   Global Step: 55600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:18:38,462-Speed 24927.57 samples/sec   Loss 1.4085   LearningRate 0.0000   Epoch: 32   Global Step: 55610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:18:48,171-Speed 25316.58 samples/sec   Loss 1.4091   LearningRate 0.0000   Epoch: 32   Global Step: 55620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:18:57,919-Speed 25212.55 samples/sec   Loss 1.3927   LearningRate 0.0000   Epoch: 32   Global Step: 55630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:19:07,664-Speed 25224.50 samples/sec   Loss 1.4013   LearningRate 0.0000   Epoch: 32   Global Step: 55640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:19:17,563-Speed 24828.78 samples/sec   Loss 1.3976   LearningRate 0.0000   Epoch: 32   Global Step: 55650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:19:27,342-Speed 25134.08 samples/sec   Loss 1.3994   LearningRate 0.0000   Epoch: 32   Global Step: 55660   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:19:37,090-Speed 25215.47 samples/sec   Loss 1.3956   LearningRate 0.0000   Epoch: 32   Global Step: 55670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:19:46,849-Speed 25186.70 samples/sec   Loss 1.4009   LearningRate 0.0000   Epoch: 32   Global Step: 55680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:19:56,640-Speed 25103.42 samples/sec   Loss 1.3994   LearningRate 0.0000   Epoch: 32   Global Step: 55690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:20:06,398-Speed 25191.64 samples/sec   Loss 1.3892   LearningRate 0.0000   Epoch: 32   Global Step: 55700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:20:16,214-Speed 25042.70 samples/sec   Loss 1.4066   LearningRate 0.0000   Epoch: 32   Global Step: 55710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:20:25,957-Speed 25226.95 samples/sec   Loss 1.3994   LearningRate 0.0000   Epoch: 32   Global Step: 55720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:20:35,829-Speed 24899.26 samples/sec   Loss 1.4010   LearningRate 0.0000   Epoch: 32   Global Step: 55730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:20:45,709-Speed 24879.56 samples/sec   Loss 1.4005   LearningRate 0.0000   Epoch: 32   Global Step: 55740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:20:55,538-Speed 25004.72 samples/sec   Loss 1.3948   LearningRate 0.0000   Epoch: 32   Global Step: 55750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:21:05,299-Speed 25183.12 samples/sec   Loss 1.3979   LearningRate 0.0000   Epoch: 32   Global Step: 55760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:21:15,098-Speed 25084.79 samples/sec   Loss 1.3970   LearningRate 0.0000   Epoch: 32   Global Step: 55770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:21:24,976-Speed 24881.39 samples/sec   Loss 1.4043   LearningRate 0.0000   Epoch: 32   Global Step: 55780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:21:34,836-Speed 24928.52 samples/sec   Loss 1.4001   LearningRate 0.0000   Epoch: 32   Global Step: 55790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:21:44,570-Speed 25252.05 samples/sec   Loss 1.4024   LearningRate 0.0000   Epoch: 32   Global Step: 55800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:21:54,351-Speed 25131.37 samples/sec   Loss 1.4050   LearningRate 0.0000   Epoch: 32   Global Step: 55810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:22:04,135-Speed 25123.24 samples/sec   Loss 1.3950   LearningRate 0.0000   Epoch: 32   Global Step: 55820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:22:13,922-Speed 25115.55 samples/sec   Loss 1.3907   LearningRate 0.0000   Epoch: 32   Global Step: 55830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:22:23,700-Speed 25138.02 samples/sec   Loss 1.3922   LearningRate 0.0000   Epoch: 32   Global Step: 55840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:22:33,539-Speed 24980.65 samples/sec   Loss 1.3875   LearningRate 0.0000   Epoch: 32   Global Step: 55850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:22:43,307-Speed 25163.66 samples/sec   Loss 1.3926   LearningRate 0.0000   Epoch: 32   Global Step: 55860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:22:53,199-Speed 24849.39 samples/sec   Loss 1.3878   LearningRate 0.0000   Epoch: 32   Global Step: 55870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:23:03,010-Speed 25054.05 samples/sec   Loss 1.3827   LearningRate 0.0000   Epoch: 32   Global Step: 55880   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:23:12,754-Speed 25225.15 samples/sec   Loss 1.4008   LearningRate 0.0000   Epoch: 32   Global Step: 55890   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:23:22,473-Speed 25290.13 samples/sec   Loss 1.3986   LearningRate 0.0000   Epoch: 32   Global Step: 55900   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:23:32,209-Speed 25244.50 samples/sec   Loss 1.3933   LearningRate 0.0000   Epoch: 32   Global Step: 55910   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:23:41,938-Speed 25265.20 samples/sec   Loss 1.3938   LearningRate 0.0000   Epoch: 32   Global Step: 55920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:23:51,689-Speed 25207.08 samples/sec   Loss 1.3926   LearningRate 0.0000   Epoch: 32   Global Step: 55930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:24:01,560-Speed 24905.49 samples/sec   Loss 1.3864   LearningRate 0.0000   Epoch: 32   Global Step: 55940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:24:11,356-Speed 25092.96 samples/sec   Loss 1.4017   LearningRate 0.0000   Epoch: 32   Global Step: 55950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:24:21,144-Speed 25111.77 samples/sec   Loss 1.4050   LearningRate 0.0000   Epoch: 32   Global Step: 55960   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:24:30,890-Speed 25221.54 samples/sec   Loss 1.3943   LearningRate 0.0000   Epoch: 32   Global Step: 55970   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:24:40,652-Speed 25178.10 samples/sec   Loss 1.3901   LearningRate 0.0000   Epoch: 32   Global Step: 55980   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:24:50,427-Speed 25145.82 samples/sec   Loss 1.4003   LearningRate 0.0000   Epoch: 32   Global Step: 55990   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:00,295-Speed 24907.24 samples/sec   Loss 1.3968   LearningRate 0.0000   Epoch: 32   Global Step: 56000   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:10,165-Speed 24904.41 samples/sec   Loss 1.4048   LearningRate 0.0000   Epoch: 32   Global Step: 56010   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:19,921-Speed 25196.05 samples/sec   Loss 1.3886   LearningRate 0.0000   Epoch: 32   Global Step: 56020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:29,690-Speed 25159.58 samples/sec   Loss 1.3908   LearningRate 0.0000   Epoch: 32   Global Step: 56030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:39,456-Speed 25169.21 samples/sec   Loss 1.3949   LearningRate 0.0000   Epoch: 32   Global Step: 56040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:49,221-Speed 25171.11 samples/sec   Loss 1.4000   LearningRate 0.0000   Epoch: 32   Global Step: 56050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:25:58,896-Speed 25404.33 samples/sec   Loss 1.3860   LearningRate 0.0000   Epoch: 32   Global Step: 56060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:26:08,569-Speed 25410.03 samples/sec   Loss 1.3919   LearningRate 0.0000   Epoch: 32   Global Step: 56070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:26:18,358-Speed 25109.01 samples/sec   Loss 1.3931   LearningRate 0.0000   Epoch: 32   Global Step: 56080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:26:28,164-Speed 25064.75 samples/sec   Loss 1.3942   LearningRate 0.0000   Epoch: 32   Global Step: 56090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:26:37,885-Speed 25283.71 samples/sec   Loss 1.3841   LearningRate 0.0000   Epoch: 32   Global Step: 56100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:26:47,665-Speed 25138.83 samples/sec   Loss 1.3848   LearningRate 0.0000   Epoch: 32   Global Step: 56110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:26:57,647-Speed 24623.42 samples/sec   Loss 1.4011   LearningRate 0.0000   Epoch: 32   Global Step: 56120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:27:07,390-Speed 25227.70 samples/sec   Loss 1.3823   LearningRate 0.0000   Epoch: 32   Global Step: 56130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:27:17,157-Speed 25164.99 samples/sec   Loss 1.3910   LearningRate 0.0000   Epoch: 32   Global Step: 56140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:27:26,943-Speed 25114.92 samples/sec   Loss 1.3857   LearningRate 0.0000   Epoch: 32   Global Step: 56150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:27:36,719-Speed 25143.87 samples/sec   Loss 1.3812   LearningRate 0.0000   Epoch: 32   Global Step: 56160   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:27:46,470-Speed 25207.23 samples/sec   Loss 1.3836   LearningRate 0.0000   Epoch: 32   Global Step: 56170   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:27:56,413-Speed 24721.54 samples/sec   Loss 1.3850   LearningRate 0.0000   Epoch: 32   Global Step: 56180   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:28:06,184-Speed 25154.23 samples/sec   Loss 1.3864   LearningRate 0.0000   Epoch: 32   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:28:15,885-Speed 25338.97 samples/sec   Loss 1.3805   LearningRate 0.0000   Epoch: 32   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:28:25,620-Speed 25248.67 samples/sec   Loss 1.3842   LearningRate 0.0000   Epoch: 32   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:28:35,421-Speed 25079.36 samples/sec   Loss 1.3810   LearningRate 0.0000   Epoch: 32   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:28:45,155-Speed 25251.81 samples/sec   Loss 1.3886   LearningRate 0.0000   Epoch: 32   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:28:54,938-Speed 25124.38 samples/sec   Loss 1.3868   LearningRate 0.0000   Epoch: 32   Global Step: 56240   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:29:04,647-Speed 25316.86 samples/sec   Loss 1.3895   LearningRate 0.0000   Epoch: 32   Global Step: 56250   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:29:14,373-Speed 25273.41 samples/sec   Loss 1.3858   LearningRate 0.0000   Epoch: 32   Global Step: 56260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:29:24,166-Speed 25104.45 samples/sec   Loss 1.3873   LearningRate 0.0000   Epoch: 32   Global Step: 56270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:29:33,932-Speed 25168.07 samples/sec   Loss 1.3787   LearningRate 0.0000   Epoch: 32   Global Step: 56280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:29:43,752-Speed 25028.56 samples/sec   Loss 1.3922   LearningRate 0.0000   Epoch: 32   Global Step: 56290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:29:53,473-Speed 25285.51 samples/sec   Loss 1.3874   LearningRate 0.0000   Epoch: 32   Global Step: 56300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:30:03,361-Speed 24857.37 samples/sec   Loss 1.3779   LearningRate 0.0000   Epoch: 32   Global Step: 56310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:30:13,168-Speed 25065.27 samples/sec   Loss 1.3818   LearningRate 0.0000   Epoch: 32   Global Step: 56320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:30:23,058-Speed 24852.94 samples/sec   Loss 1.3777   LearningRate 0.0000   Epoch: 32   Global Step: 56330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:30:32,804-Speed 25221.54 samples/sec   Loss 1.3881   LearningRate 0.0000   Epoch: 32   Global Step: 56340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:30:42,597-Speed 25100.23 samples/sec   Loss 1.3904   LearningRate 0.0000   Epoch: 32   Global Step: 56350   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:30:52,359-Speed 25177.73 samples/sec   Loss 1.3917   LearningRate 0.0000   Epoch: 32   Global Step: 56360   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:31:02,151-Speed 25106.31 samples/sec   Loss 1.3860   LearningRate 0.0000   Epoch: 32   Global Step: 56370   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:31:11,875-Speed 25275.84 samples/sec   Loss 1.3859   LearningRate 0.0000   Epoch: 32   Global Step: 56380   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:31:21,712-Speed 24993.66 samples/sec   Loss 1.3814   LearningRate 0.0000   Epoch: 32   Global Step: 56390   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:31:31,494-Speed 25126.80 samples/sec   Loss 1.3814   LearningRate 0.0000   Epoch: 32   Global Step: 56400   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:31:41,255-Speed 25181.90 samples/sec   Loss 1.3770   LearningRate 0.0000   Epoch: 32   Global Step: 56410   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:31:51,089-Speed 24993.33 samples/sec   Loss 1.3757   LearningRate 0.0000   Epoch: 32   Global Step: 56420   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:32:00,894-Speed 25067.07 samples/sec   Loss 1.3849   LearningRate 0.0000   Epoch: 32   Global Step: 56430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:32:10,617-Speed 25280.58 samples/sec   Loss 1.3794   LearningRate 0.0000   Epoch: 32   Global Step: 56440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:32:20,341-Speed 25277.16 samples/sec   Loss 1.3877   LearningRate 0.0000   Epoch: 32   Global Step: 56450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:32:30,114-Speed 25147.54 samples/sec   Loss 1.3787   LearningRate 0.0000   Epoch: 32   Global Step: 56460   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:32:39,835-Speed 25283.86 samples/sec   Loss 1.3883   LearningRate 0.0000   Epoch: 32   Global Step: 56470   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:32:49,645-Speed 25057.71 samples/sec   Loss 1.3749   LearningRate 0.0000   Epoch: 32   Global Step: 56480   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:32:59,345-Speed 25338.21 samples/sec   Loss 1.3839   LearningRate 0.0000   Epoch: 32   Global Step: 56490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:33:09,040-Speed 25362.34 samples/sec   Loss 1.3777   LearningRate 0.0000   Epoch: 32   Global Step: 56500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:33:18,818-Speed 25137.69 samples/sec   Loss 1.3810   LearningRate 0.0000   Epoch: 32   Global Step: 56510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:33:28,522-Speed 25329.07 samples/sec   Loss 1.3806   LearningRate 0.0000   Epoch: 32   Global Step: 56520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:33:38,237-Speed 25308.61 samples/sec   Loss 1.3764   LearningRate 0.0000   Epoch: 32   Global Step: 56530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:33:47,995-Speed 25188.17 samples/sec   Loss 1.3771   LearningRate 0.0000   Epoch: 32   Global Step: 56540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:33:57,736-Speed 25234.75 samples/sec   Loss 1.3854   LearningRate 0.0000   Epoch: 32   Global Step: 56550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:34:07,524-Speed 25112.27 samples/sec   Loss 1.3862   LearningRate 0.0000   Epoch: 32   Global Step: 56560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:34:17,324-Speed 25080.39 samples/sec   Loss 1.3787   LearningRate 0.0000   Epoch: 32   Global Step: 56570   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:34:27,007-Speed 25381.21 samples/sec   Loss 1.3888   LearningRate 0.0000   Epoch: 32   Global Step: 56580   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:34:36,855-Speed 24958.93 samples/sec   Loss 1.3893   LearningRate 0.0000   Epoch: 32   Global Step: 56590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:34:46,652-Speed 25090.77 samples/sec   Loss 1.3746   LearningRate 0.0000   Epoch: 32   Global Step: 56600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:34:56,410-Speed 25189.70 samples/sec   Loss 1.3734   LearningRate 0.0000   Epoch: 32   Global Step: 56610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:35:06,327-Speed 24784.34 samples/sec   Loss 1.3763   LearningRate 0.0000   Epoch: 32   Global Step: 56620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-26 14:35:16,185-Speed 24935.49 samples/sec   Loss 1.3842   LearningRate 0.0000   Epoch: 32   Global Step: 56630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:35:26,021-Speed 24989.87 samples/sec   Loss 1.3834   LearningRate 0.0000   Epoch: 32   Global Step: 56640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:35:35,830-Speed 25058.74 samples/sec   Loss 1.3764   LearningRate 0.0000   Epoch: 32   Global Step: 56650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:35:45,669-Speed 24982.46 samples/sec   Loss 1.3856   LearningRate 0.0000   Epoch: 32   Global Step: 56660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:35:55,471-Speed 25074.18 samples/sec   Loss 1.3786   LearningRate 0.0000   Epoch: 32   Global Step: 56670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:36:05,307-Speed 24989.49 samples/sec   Loss 1.3742   LearningRate 0.0000   Epoch: 32   Global Step: 56680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:36:15,138-Speed 25020.35 samples/sec   Loss 1.3793   LearningRate 0.0000   Epoch: 32   Global Step: 56690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:36:24,996-Speed 24931.19 samples/sec   Loss 1.3675   LearningRate 0.0000   Epoch: 32   Global Step: 56700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:36:34,695-Speed 25343.42 samples/sec   Loss 1.3817   LearningRate 0.0000   Epoch: 32   Global Step: 56710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-26 14:36:44,453-Speed 25188.44 samples/sec   Loss 1.3697   LearningRate 0.0000   Epoch: 32   Global Step: 56720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:36:54,349-Speed 24837.46 samples/sec   Loss 1.3762   LearningRate 0.0000   Epoch: 32   Global Step: 56730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:37:04,180-Speed 25002.39 samples/sec   Loss 1.3801   LearningRate 0.0000   Epoch: 32   Global Step: 56740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:37:13,992-Speed 25050.10 samples/sec   Loss 1.3703   LearningRate 0.0000   Epoch: 32   Global Step: 56750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:37:23,778-Speed 25116.24 samples/sec   Loss 1.3812   LearningRate 0.0000   Epoch: 32   Global Step: 56760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:37:33,661-Speed 24871.13 samples/sec   Loss 1.3713   LearningRate 0.0000   Epoch: 32   Global Step: 56770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:37:43,528-Speed 24909.72 samples/sec   Loss 1.3864   LearningRate 0.0000   Epoch: 32   Global Step: 56780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:37:53,276-Speed 25212.73 samples/sec   Loss 1.3796   LearningRate 0.0000   Epoch: 32   Global Step: 56790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:38:03,048-Speed 25152.65 samples/sec   Loss 1.3833   LearningRate 0.0000   Epoch: 32   Global Step: 56800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:38:12,808-Speed 25184.87 samples/sec   Loss 1.3799   LearningRate 0.0000   Epoch: 32   Global Step: 56810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:38:22,591-Speed 25130.93 samples/sec   Loss 1.3824   LearningRate 0.0000   Epoch: 32   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:38:32,328-Speed 25243.99 samples/sec   Loss 1.3719   LearningRate 0.0000   Epoch: 32   Global Step: 56830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:38:42,119-Speed 25105.56 samples/sec   Loss 1.3764   LearningRate 0.0000   Epoch: 32   Global Step: 56840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:38:51,943-Speed 25018.73 samples/sec   Loss 1.3747   LearningRate 0.0000   Epoch: 32   Global Step: 56850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:39:01,651-Speed 25320.49 samples/sec   Loss 1.3763   LearningRate 0.0000   Epoch: 32   Global Step: 56860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:39:11,500-Speed 24954.83 samples/sec   Loss 1.3732   LearningRate 0.0000   Epoch: 32   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:39:21,205-Speed 25331.48 samples/sec   Loss 1.3768   LearningRate 0.0000   Epoch: 32   Global Step: 56880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:39:30,923-Speed 25291.86 samples/sec   Loss 1.3759   LearningRate 0.0000   Epoch: 32   Global Step: 56890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:39:40,719-Speed 25092.27 samples/sec   Loss 1.3815   LearningRate 0.0000   Epoch: 32   Global Step: 56900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:39:50,506-Speed 25115.02 samples/sec   Loss 1.3802   LearningRate 0.0000   Epoch: 32   Global Step: 56910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:00,279-Speed 25152.31 samples/sec   Loss 1.3789   LearningRate 0.0000   Epoch: 32   Global Step: 56920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:10,169-Speed 24851.25 samples/sec   Loss 1.3734   LearningRate 0.0000   Epoch: 32   Global Step: 56930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:19,953-Speed 25122.44 samples/sec   Loss 1.3722   LearningRate 0.0000   Epoch: 32   Global Step: 56940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:29,959-Speed 24563.43 samples/sec   Loss 1.3745   LearningRate 0.0000   Epoch: 32   Global Step: 56950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:39,831-Speed 24899.48 samples/sec   Loss 1.3679   LearningRate 0.0000   Epoch: 32   Global Step: 56960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:49,725-Speed 24843.27 samples/sec   Loss 1.3723   LearningRate 0.0000   Epoch: 32   Global Step: 56970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:40:59,608-Speed 24870.38 samples/sec   Loss 1.3818   LearningRate 0.0000   Epoch: 32   Global Step: 56980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:41:09,403-Speed 25094.20 samples/sec   Loss 1.3766   LearningRate 0.0000   Epoch: 32   Global Step: 56990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:41:19,178-Speed 25146.97 samples/sec   Loss 1.3772   LearningRate 0.0000   Epoch: 32   Global Step: 57000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:41:28,997-Speed 25036.02 samples/sec   Loss 1.3886   LearningRate 0.0000   Epoch: 32   Global Step: 57010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:41:38,765-Speed 25162.44 samples/sec   Loss 1.3743   LearningRate 0.0000   Epoch: 32   Global Step: 57020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:41:48,565-Speed 25080.97 samples/sec   Loss 1.3833   LearningRate 0.0000   Epoch: 32   Global Step: 57030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:42:47,872-Speed 4144.01 samples/sec   Loss 1.3745   LearningRate 0.0000   Epoch: 33   Global Step: 57040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:42:57,593-Speed 25283.73 samples/sec   Loss 1.3729   LearningRate 0.0000   Epoch: 33   Global Step: 57050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:43:07,349-Speed 25196.45 samples/sec   Loss 1.3652   LearningRate 0.0000   Epoch: 33   Global Step: 57060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:43:17,110-Speed 25180.81 samples/sec   Loss 1.3704   LearningRate 0.0000   Epoch: 33   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:43:26,907-Speed 25087.23 samples/sec   Loss 1.3774   LearningRate 0.0000   Epoch: 33   Global Step: 57080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:43:36,716-Speed 25058.52 samples/sec   Loss 1.3660   LearningRate 0.0000   Epoch: 33   Global Step: 57090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:43:46,429-Speed 25305.16 samples/sec   Loss 1.3629   LearningRate 0.0000   Epoch: 33   Global Step: 57100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:43:56,223-Speed 25095.29 samples/sec   Loss 1.3696   LearningRate 0.0000   Epoch: 33   Global Step: 57110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:44:05,989-Speed 25167.96 samples/sec   Loss 1.3643   LearningRate 0.0000   Epoch: 33   Global Step: 57120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:44:15,762-Speed 25152.32 samples/sec   Loss 1.3653   LearningRate 0.0000   Epoch: 33   Global Step: 57130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:44:25,515-Speed 25199.75 samples/sec   Loss 1.3636   LearningRate 0.0000   Epoch: 33   Global Step: 57140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:44:35,315-Speed 25082.52 samples/sec   Loss 1.3612   LearningRate 0.0000   Epoch: 33   Global Step: 57150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:44:45,150-Speed 24990.18 samples/sec   Loss 1.3716   LearningRate 0.0000   Epoch: 33   Global Step: 57160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:44:54,908-Speed 25189.57 samples/sec   Loss 1.3780   LearningRate 0.0000   Epoch: 33   Global Step: 57170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:45:04,750-Speed 24973.76 samples/sec   Loss 1.3748   LearningRate 0.0000   Epoch: 33   Global Step: 57180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:45:14,623-Speed 24894.57 samples/sec   Loss 1.3666   LearningRate 0.0000   Epoch: 33   Global Step: 57190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:45:24,383-Speed 25189.66 samples/sec   Loss 1.3666   LearningRate 0.0000   Epoch: 33   Global Step: 57200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:45:34,125-Speed 25230.85 samples/sec   Loss 1.3593   LearningRate 0.0000   Epoch: 33   Global Step: 57210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:45:43,849-Speed 25276.60 samples/sec   Loss 1.3671   LearningRate 0.0000   Epoch: 33   Global Step: 57220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:45:53,607-Speed 25187.22 samples/sec   Loss 1.3596   LearningRate 0.0000   Epoch: 33   Global Step: 57230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:46:03,394-Speed 25116.49 samples/sec   Loss 1.3736   LearningRate 0.0000   Epoch: 33   Global Step: 57240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:46:13,195-Speed 25077.80 samples/sec   Loss 1.3684   LearningRate 0.0000   Epoch: 33   Global Step: 57250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:46:23,124-Speed 24753.94 samples/sec   Loss 1.3728   LearningRate 0.0000   Epoch: 33   Global Step: 57260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:46:32,818-Speed 25357.43 samples/sec   Loss 1.3634   LearningRate 0.0000   Epoch: 33   Global Step: 57270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:46:42,565-Speed 25216.17 samples/sec   Loss 1.3617   LearningRate 0.0000   Epoch: 33   Global Step: 57280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:46:52,418-Speed 24946.29 samples/sec   Loss 1.3571   LearningRate 0.0000   Epoch: 33   Global Step: 57290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:47:02,182-Speed 25174.91 samples/sec   Loss 1.3562   LearningRate 0.0000   Epoch: 33   Global Step: 57300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:47:11,950-Speed 25163.11 samples/sec   Loss 1.3723   LearningRate 0.0000   Epoch: 33   Global Step: 57310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:47:21,856-Speed 24809.23 samples/sec   Loss 1.3709   LearningRate 0.0000   Epoch: 33   Global Step: 57320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:47:31,690-Speed 24995.49 samples/sec   Loss 1.3807   LearningRate 0.0000   Epoch: 33   Global Step: 57330   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-26 14:47:41,455-Speed 25169.71 samples/sec   Loss 1.3695   LearningRate 0.0000   Epoch: 33   Global Step: 57340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:47:51,298-Speed 24970.33 samples/sec   Loss 1.3725   LearningRate 0.0000   Epoch: 33   Global Step: 57350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:01,122-Speed 25020.18 samples/sec   Loss 1.3651   LearningRate 0.0000   Epoch: 33   Global Step: 57360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:10,831-Speed 25323.69 samples/sec   Loss 1.3704   LearningRate 0.0000   Epoch: 33   Global Step: 57370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:20,509-Speed 25394.41 samples/sec   Loss 1.3685   LearningRate 0.0000   Epoch: 33   Global Step: 57380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:30,292-Speed 25123.43 samples/sec   Loss 1.3676   LearningRate 0.0000   Epoch: 33   Global Step: 57390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:40,029-Speed 25251.30 samples/sec   Loss 1.3705   LearningRate 0.0000   Epoch: 33   Global Step: 57400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:49,842-Speed 25048.74 samples/sec   Loss 1.3648   LearningRate 0.0000   Epoch: 33   Global Step: 57410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:48:59,625-Speed 25122.51 samples/sec   Loss 1.3698   LearningRate 0.0000   Epoch: 33   Global Step: 57420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:49:09,519-Speed 24841.95 samples/sec   Loss 1.3659   LearningRate 0.0000   Epoch: 33   Global Step: 57430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:49:19,246-Speed 25267.49 samples/sec   Loss 1.3614   LearningRate 0.0000   Epoch: 33   Global Step: 57440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:49:29,103-Speed 24934.31 samples/sec   Loss 1.3645   LearningRate 0.0000   Epoch: 33   Global Step: 57450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:49:38,891-Speed 25111.16 samples/sec   Loss 1.3648   LearningRate 0.0000   Epoch: 33   Global Step: 57460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:49:48,675-Speed 25122.03 samples/sec   Loss 1.3650   LearningRate 0.0000   Epoch: 33   Global Step: 57470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:49:58,398-Speed 25279.18 samples/sec   Loss 1.3727   LearningRate 0.0000   Epoch: 33   Global Step: 57480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:50:08,132-Speed 25251.55 samples/sec   Loss 1.3763   LearningRate 0.0000   Epoch: 33   Global Step: 57490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:50:17,827-Speed 25351.94 samples/sec   Loss 1.3600   LearningRate 0.0000   Epoch: 33   Global Step: 57500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:50:27,617-Speed 25105.69 samples/sec   Loss 1.3709   LearningRate 0.0000   Epoch: 33   Global Step: 57510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:50:37,301-Speed 25381.85 samples/sec   Loss 1.3684   LearningRate 0.0000   Epoch: 33   Global Step: 57520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:50:47,058-Speed 25193.34 samples/sec   Loss 1.3639   LearningRate 0.0000   Epoch: 33   Global Step: 57530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:50:56,817-Speed 25185.06 samples/sec   Loss 1.3641   LearningRate 0.0000   Epoch: 33   Global Step: 57540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:51:06,598-Speed 25128.55 samples/sec   Loss 1.3626   LearningRate 0.0000   Epoch: 33   Global Step: 57550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:51:16,297-Speed 25343.20 samples/sec   Loss 1.3586   LearningRate 0.0000   Epoch: 33   Global Step: 57560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:51:26,099-Speed 25075.83 samples/sec   Loss 1.3688   LearningRate 0.0000   Epoch: 33   Global Step: 57570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:51:35,854-Speed 25198.00 samples/sec   Loss 1.3598   LearningRate 0.0000   Epoch: 33   Global Step: 57580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:51:45,700-Speed 24964.56 samples/sec   Loss 1.3579   LearningRate 0.0000   Epoch: 33   Global Step: 57590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:51:55,557-Speed 24936.00 samples/sec   Loss 1.3564   LearningRate 0.0000   Epoch: 33   Global Step: 57600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:52:05,328-Speed 25155.21 samples/sec   Loss 1.3584   LearningRate 0.0000   Epoch: 33   Global Step: 57610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:52:15,130-Speed 25075.66 samples/sec   Loss 1.3702   LearningRate 0.0000   Epoch: 33   Global Step: 57620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:52:24,904-Speed 25146.86 samples/sec   Loss 1.3629   LearningRate 0.0000   Epoch: 33   Global Step: 57630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:52:34,604-Speed 25341.43 samples/sec   Loss 1.3662   LearningRate 0.0000   Epoch: 33   Global Step: 57640   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:52:44,491-Speed 24862.47 samples/sec   Loss 1.3584   LearningRate 0.0000   Epoch: 33   Global Step: 57650   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:52:54,422-Speed 24749.60 samples/sec   Loss 1.3646   LearningRate 0.0000   Epoch: 33   Global Step: 57660   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:53:04,173-Speed 25205.47 samples/sec   Loss 1.3615   LearningRate 0.0000   Epoch: 33   Global Step: 57670   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:53:13,844-Speed 25415.97 samples/sec   Loss 1.3645   LearningRate 0.0000   Epoch: 33   Global Step: 57680   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:53:23,569-Speed 25275.12 samples/sec   Loss 1.3640   LearningRate 0.0000   Epoch: 33   Global Step: 57690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:53:33,353-Speed 25122.57 samples/sec   Loss 1.3643   LearningRate 0.0000   Epoch: 33   Global Step: 57700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:53:43,171-Speed 25034.02 samples/sec   Loss 1.3629   LearningRate 0.0000   Epoch: 33   Global Step: 57710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:53:52,952-Speed 25128.11 samples/sec   Loss 1.3580   LearningRate 0.0000   Epoch: 33   Global Step: 57720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:54:02,769-Speed 25037.58 samples/sec   Loss 1.3628   LearningRate 0.0000   Epoch: 33   Global Step: 57730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 14:54:12,598-Speed 25008.00 samples/sec   Loss 1.3667   LearningRate 0.0000   Epoch: 33   Global Step: 57740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:54:22,340-Speed 25237.08 samples/sec   Loss 1.3557   LearningRate 0.0000   Epoch: 33   Global Step: 57750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:54:32,086-Speed 25219.02 samples/sec   Loss 1.3566   LearningRate 0.0000   Epoch: 33   Global Step: 57760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:54:41,858-Speed 25158.87 samples/sec   Loss 1.3594   LearningRate 0.0000   Epoch: 33   Global Step: 57770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:54:51,741-Speed 24871.63 samples/sec   Loss 1.3511   LearningRate 0.0000   Epoch: 33   Global Step: 57780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:55:01,601-Speed 24929.15 samples/sec   Loss 1.3487   LearningRate 0.0000   Epoch: 33   Global Step: 57790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:55:11,374-Speed 25151.91 samples/sec   Loss 1.3741   LearningRate 0.0000   Epoch: 33   Global Step: 57800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:55:21,131-Speed 25190.67 samples/sec   Loss 1.3644   LearningRate 0.0000   Epoch: 33   Global Step: 57810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:55:30,872-Speed 25236.72 samples/sec   Loss 1.3610   LearningRate 0.0000   Epoch: 33   Global Step: 57820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:55:40,684-Speed 25048.86 samples/sec   Loss 1.3651   LearningRate 0.0000   Epoch: 33   Global Step: 57830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:55:50,432-Speed 25215.44 samples/sec   Loss 1.3576   LearningRate 0.0000   Epoch: 33   Global Step: 57840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:00,288-Speed 24945.99 samples/sec   Loss 1.3575   LearningRate 0.0000   Epoch: 33   Global Step: 57850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:10,094-Speed 25065.97 samples/sec   Loss 1.3553   LearningRate 0.0000   Epoch: 33   Global Step: 57860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:20,016-Speed 24772.24 samples/sec   Loss 1.3626   LearningRate 0.0000   Epoch: 33   Global Step: 57870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:29,841-Speed 25017.60 samples/sec   Loss 1.3530   LearningRate 0.0000   Epoch: 33   Global Step: 57880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:39,632-Speed 25110.07 samples/sec   Loss 1.3568   LearningRate 0.0000   Epoch: 33   Global Step: 57890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:49,348-Speed 25298.75 samples/sec   Loss 1.3459   LearningRate 0.0000   Epoch: 33   Global Step: 57900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:56:59,149-Speed 25080.28 samples/sec   Loss 1.3593   LearningRate 0.0000   Epoch: 33   Global Step: 57910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:57:08,896-Speed 25217.04 samples/sec   Loss 1.3448   LearningRate 0.0000   Epoch: 33   Global Step: 57920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:57:18,695-Speed 25090.79 samples/sec   Loss 1.3535   LearningRate 0.0000   Epoch: 33   Global Step: 57930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:57:28,455-Speed 25182.17 samples/sec   Loss 1.3479   LearningRate 0.0000   Epoch: 33   Global Step: 57940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-26 14:57:38,169-Speed 25302.23 samples/sec   Loss 1.3584   LearningRate 0.0000   Epoch: 33   Global Step: 57950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:57:47,942-Speed 25150.79 samples/sec   Loss 1.3479   LearningRate 0.0000   Epoch: 33   Global Step: 57960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:57:57,690-Speed 25215.58 samples/sec   Loss 1.3505   LearningRate 0.0000   Epoch: 33   Global Step: 57970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:58:07,519-Speed 25006.84 samples/sec   Loss 1.3627   LearningRate 0.0000   Epoch: 33   Global Step: 57980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:58:17,308-Speed 25109.73 samples/sec   Loss 1.3495   LearningRate 0.0000   Epoch: 33   Global Step: 57990   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:58:27,095-Speed 25112.58 samples/sec   Loss 1.3493   LearningRate 0.0000   Epoch: 33   Global Step: 58000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:58:36,869-Speed 25147.23 samples/sec   Loss 1.3547   LearningRate 0.0000   Epoch: 33   Global Step: 58010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:58:46,612-Speed 25228.37 samples/sec   Loss 1.3571   LearningRate 0.0000   Epoch: 33   Global Step: 58020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:58:56,336-Speed 25279.04 samples/sec   Loss 1.3471   LearningRate 0.0000   Epoch: 33   Global Step: 58030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:59:06,023-Speed 25372.89 samples/sec   Loss 1.3424   LearningRate 0.0000   Epoch: 33   Global Step: 58040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:59:15,819-Speed 25092.53 samples/sec   Loss 1.3529   LearningRate 0.0000   Epoch: 33   Global Step: 58050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:59:25,647-Speed 25008.93 samples/sec   Loss 1.3577   LearningRate 0.0000   Epoch: 33   Global Step: 58060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:59:35,398-Speed 25206.21 samples/sec   Loss 1.3407   LearningRate 0.0000   Epoch: 33   Global Step: 58070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:59:45,264-Speed 24914.52 samples/sec   Loss 1.3512   LearningRate 0.0000   Epoch: 33   Global Step: 58080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 14:59:55,046-Speed 25125.34 samples/sec   Loss 1.3606   LearningRate 0.0000   Epoch: 33   Global Step: 58090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:00:04,822-Speed 25141.13 samples/sec   Loss 1.3499   LearningRate 0.0000   Epoch: 33   Global Step: 58100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:00:14,677-Speed 24941.51 samples/sec   Loss 1.3526   LearningRate 0.0000   Epoch: 33   Global Step: 58110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:00:24,447-Speed 25159.31 samples/sec   Loss 1.3568   LearningRate 0.0000   Epoch: 33   Global Step: 58120   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:00:34,254-Speed 25061.11 samples/sec   Loss 1.3467   LearningRate 0.0000   Epoch: 33   Global Step: 58130   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:00:43,992-Speed 25241.95 samples/sec   Loss 1.3500   LearningRate 0.0000   Epoch: 33   Global Step: 58140   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:00:53,881-Speed 24854.20 samples/sec   Loss 1.3621   LearningRate 0.0000   Epoch: 33   Global Step: 58150   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:01:03,700-Speed 25040.00 samples/sec   Loss 1.3619   LearningRate 0.0000   Epoch: 33   Global Step: 58160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:01:13,439-Speed 25240.26 samples/sec   Loss 1.3535   LearningRate 0.0000   Epoch: 33   Global Step: 58170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:01:23,309-Speed 24904.57 samples/sec   Loss 1.3490   LearningRate 0.0000   Epoch: 33   Global Step: 58180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:01:33,167-Speed 24933.62 samples/sec   Loss 1.3411   LearningRate 0.0000   Epoch: 33   Global Step: 58190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:01:42,959-Speed 25104.99 samples/sec   Loss 1.3468   LearningRate 0.0000   Epoch: 33   Global Step: 58200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:01:52,703-Speed 25225.85 samples/sec   Loss 1.3527   LearningRate 0.0000   Epoch: 33   Global Step: 58210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:02:02,418-Speed 25305.38 samples/sec   Loss 1.3470   LearningRate 0.0000   Epoch: 33   Global Step: 58220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:02:12,151-Speed 25254.83 samples/sec   Loss 1.3560   LearningRate 0.0000   Epoch: 33   Global Step: 58230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:02:21,944-Speed 25097.34 samples/sec   Loss 1.3554   LearningRate 0.0000   Epoch: 33   Global Step: 58240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:02:31,630-Speed 25375.53 samples/sec   Loss 1.3462   LearningRate 0.0000   Epoch: 33   Global Step: 58250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:02:41,402-Speed 25152.77 samples/sec   Loss 1.3548   LearningRate 0.0000   Epoch: 33   Global Step: 58260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:02:51,136-Speed 25251.96 samples/sec   Loss 1.3495   LearningRate 0.0000   Epoch: 33   Global Step: 58270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:00,876-Speed 25235.32 samples/sec   Loss 1.3522   LearningRate 0.0000   Epoch: 33   Global Step: 58280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:10,653-Speed 25147.10 samples/sec   Loss 1.3536   LearningRate 0.0000   Epoch: 33   Global Step: 58290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:20,347-Speed 25357.14 samples/sec   Loss 1.3478   LearningRate 0.0000   Epoch: 33   Global Step: 58300   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:30,190-Speed 24972.26 samples/sec   Loss 1.3541   LearningRate 0.0000   Epoch: 33   Global Step: 58310   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:39,959-Speed 25160.46 samples/sec   Loss 1.3456   LearningRate 0.0000   Epoch: 33   Global Step: 58320   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:49,695-Speed 25246.15 samples/sec   Loss 1.3534   LearningRate 0.0000   Epoch: 33   Global Step: 58330   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:03:59,480-Speed 25120.11 samples/sec   Loss 1.3532   LearningRate 0.0000   Epoch: 33   Global Step: 58340   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:04:09,275-Speed 25094.41 samples/sec   Loss 1.3532   LearningRate 0.0000   Epoch: 33   Global Step: 58350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:04:19,110-Speed 24991.14 samples/sec   Loss 1.3451   LearningRate 0.0000   Epoch: 33   Global Step: 58360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:04:28,869-Speed 25186.49 samples/sec   Loss 1.3446   LearningRate 0.0000   Epoch: 33   Global Step: 58370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:04:38,547-Speed 25395.88 samples/sec   Loss 1.3545   LearningRate 0.0000   Epoch: 33   Global Step: 58380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:04:48,349-Speed 25076.39 samples/sec   Loss 1.3576   LearningRate 0.0000   Epoch: 33   Global Step: 58390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:04:58,116-Speed 25167.35 samples/sec   Loss 1.3491   LearningRate 0.0000   Epoch: 33   Global Step: 58400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:05:07,889-Speed 25151.98 samples/sec   Loss 1.3468   LearningRate 0.0000   Epoch: 33   Global Step: 58410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:05:17,784-Speed 24839.70 samples/sec   Loss 1.3478   LearningRate 0.0000   Epoch: 33   Global Step: 58420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:05:27,717-Speed 24745.84 samples/sec   Loss 1.3565   LearningRate 0.0000   Epoch: 33   Global Step: 58430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:05:37,771-Speed 24446.22 samples/sec   Loss 1.3533   LearningRate 0.0000   Epoch: 33   Global Step: 58440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:05:47,826-Speed 24451.57 samples/sec   Loss 1.3363   LearningRate 0.0000   Epoch: 33   Global Step: 58450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:05:57,848-Speed 24525.60 samples/sec   Loss 1.3449   LearningRate 0.0000   Epoch: 33   Global Step: 58460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:06:07,958-Speed 24313.99 samples/sec   Loss 1.3422   LearningRate 0.0000   Epoch: 33   Global Step: 58470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:06:17,929-Speed 24653.17 samples/sec   Loss 1.3436   LearningRate 0.0000   Epoch: 33   Global Step: 58480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:06:28,180-Speed 23975.74 samples/sec   Loss 1.3444   LearningRate 0.0000   Epoch: 33   Global Step: 58490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:06:38,213-Speed 24500.03 samples/sec   Loss 1.3543   LearningRate 0.0000   Epoch: 33   Global Step: 58500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:06:48,260-Speed 24464.18 samples/sec   Loss 1.3541   LearningRate 0.0000   Epoch: 33   Global Step: 58510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:06:58,119-Speed 24929.21 samples/sec   Loss 1.3493   LearningRate 0.0000   Epoch: 33   Global Step: 58520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:07:07,906-Speed 25115.76 samples/sec   Loss 1.3502   LearningRate 0.0000   Epoch: 33   Global Step: 58530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:07:17,700-Speed 25097.42 samples/sec   Loss 1.3425   LearningRate 0.0000   Epoch: 33   Global Step: 58540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:07:27,484-Speed 25126.64 samples/sec   Loss 1.3489   LearningRate 0.0000   Epoch: 33   Global Step: 58550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:07:37,284-Speed 25081.57 samples/sec   Loss 1.3445   LearningRate 0.0000   Epoch: 33   Global Step: 58560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:07:46,997-Speed 25307.28 samples/sec   Loss 1.3401   LearningRate 0.0000   Epoch: 33   Global Step: 58570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:07:56,682-Speed 25378.17 samples/sec   Loss 1.3419   LearningRate 0.0000   Epoch: 33   Global Step: 58580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:08:06,478-Speed 25093.73 samples/sec   Loss 1.3435   LearningRate 0.0000   Epoch: 33   Global Step: 58590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:08:16,368-Speed 24852.42 samples/sec   Loss 1.3426   LearningRate 0.0000   Epoch: 33   Global Step: 58600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:08:26,185-Speed 25037.75 samples/sec   Loss 1.3461   LearningRate 0.0000   Epoch: 33   Global Step: 58610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:08:35,967-Speed 25128.35 samples/sec   Loss 1.3407   LearningRate 0.0000   Epoch: 33   Global Step: 58620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:08:45,890-Speed 24775.83 samples/sec   Loss 1.3468   LearningRate 0.0000   Epoch: 33   Global Step: 58630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:08:55,714-Speed 25019.54 samples/sec   Loss 1.3596   LearningRate 0.0000   Epoch: 33   Global Step: 58640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:09:05,558-Speed 24968.04 samples/sec   Loss 1.3410   LearningRate 0.0000   Epoch: 33   Global Step: 58650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:09:15,292-Speed 25249.82 samples/sec   Loss 1.3444   LearningRate 0.0000   Epoch: 33   Global Step: 58660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:09:25,095-Speed 25074.21 samples/sec   Loss 1.3483   LearningRate 0.0000   Epoch: 33   Global Step: 58670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:09:34,984-Speed 24854.99 samples/sec   Loss 1.3380   LearningRate 0.0000   Epoch: 33   Global Step: 58680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:09:44,840-Speed 24940.13 samples/sec   Loss 1.3391   LearningRate 0.0000   Epoch: 33   Global Step: 58690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:09:54,724-Speed 24866.91 samples/sec   Loss 1.3559   LearningRate 0.0000   Epoch: 33   Global Step: 58700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:10:04,526-Speed 25077.08 samples/sec   Loss 1.3518   LearningRate 0.0000   Epoch: 33   Global Step: 58710   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:10:14,400-Speed 24892.06 samples/sec   Loss 1.3567   LearningRate 0.0000   Epoch: 33   Global Step: 58720   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:10:24,253-Speed 24947.39 samples/sec   Loss 1.3516   LearningRate 0.0000   Epoch: 33   Global Step: 58730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:10:34,024-Speed 25155.44 samples/sec   Loss 1.3497   LearningRate 0.0000   Epoch: 33   Global Step: 58740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:10:44,114-Speed 24360.69 samples/sec   Loss 1.3410   LearningRate 0.0000   Epoch: 33   Global Step: 58750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:10:54,051-Speed 24735.59 samples/sec   Loss 1.3487   LearningRate 0.0000   Epoch: 33   Global Step: 58760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:11:53,444-Speed 4138.01 samples/sec   Loss 1.3388   LearningRate 0.0000   Epoch: 34   Global Step: 58770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:12:03,199-Speed 25194.83 samples/sec   Loss 1.3407   LearningRate 0.0000   Epoch: 34   Global Step: 58780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:12:12,948-Speed 25213.56 samples/sec   Loss 1.3337   LearningRate 0.0000   Epoch: 34   Global Step: 58790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:12:22,679-Speed 25258.69 samples/sec   Loss 1.3392   LearningRate 0.0000   Epoch: 34   Global Step: 58800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:12:32,383-Speed 25330.04 samples/sec   Loss 1.3354   LearningRate 0.0000   Epoch: 34   Global Step: 58810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:12:42,286-Speed 24818.82 samples/sec   Loss 1.3438   LearningRate 0.0000   Epoch: 34   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:12:52,026-Speed 25235.77 samples/sec   Loss 1.3403   LearningRate 0.0000   Epoch: 34   Global Step: 58830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:13:01,861-Speed 24991.26 samples/sec   Loss 1.3418   LearningRate 0.0000   Epoch: 34   Global Step: 58840   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:13:11,790-Speed 24754.40 samples/sec   Loss 1.3427   LearningRate 0.0000   Epoch: 34   Global Step: 58850   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:13:21,747-Speed 24686.25 samples/sec   Loss 1.3410   LearningRate 0.0000   Epoch: 34   Global Step: 58860   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:13:31,709-Speed 24672.53 samples/sec   Loss 1.3364   LearningRate 0.0000   Epoch: 34   Global Step: 58870   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:13:41,607-Speed 24831.64 samples/sec   Loss 1.3369   LearningRate 0.0000   Epoch: 34   Global Step: 58880   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:13:51,531-Speed 24769.43 samples/sec   Loss 1.3305   LearningRate 0.0000   Epoch: 34   Global Step: 58890   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:14:01,432-Speed 24825.63 samples/sec   Loss 1.3413   LearningRate 0.0000   Epoch: 34   Global Step: 58900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:14:11,398-Speed 24662.66 samples/sec   Loss 1.3428   LearningRate 0.0000   Epoch: 34   Global Step: 58910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:14:21,361-Speed 24670.33 samples/sec   Loss 1.3438   LearningRate 0.0000   Epoch: 34   Global Step: 58920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:14:31,286-Speed 24765.71 samples/sec   Loss 1.3478   LearningRate 0.0000   Epoch: 34   Global Step: 58930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:14:41,221-Speed 24742.60 samples/sec   Loss 1.3371   LearningRate 0.0000   Epoch: 34   Global Step: 58940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:14:51,164-Speed 24720.43 samples/sec   Loss 1.3429   LearningRate 0.0000   Epoch: 34   Global Step: 58950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:15:01,088-Speed 24767.26 samples/sec   Loss 1.3336   LearningRate 0.0000   Epoch: 34   Global Step: 58960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:15:10,981-Speed 24845.98 samples/sec   Loss 1.3326   LearningRate 0.0000   Epoch: 34   Global Step: 58970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:15:20,931-Speed 24703.20 samples/sec   Loss 1.3309   LearningRate 0.0000   Epoch: 34   Global Step: 58980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:15:30,891-Speed 24678.01 samples/sec   Loss 1.3489   LearningRate 0.0000   Epoch: 34   Global Step: 58990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:15:40,880-Speed 24605.73 samples/sec   Loss 1.3304   LearningRate 0.0000   Epoch: 34   Global Step: 59000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:15:50,787-Speed 24809.64 samples/sec   Loss 1.3320   LearningRate 0.0000   Epoch: 34   Global Step: 59010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:16:00,714-Speed 24761.17 samples/sec   Loss 1.3345   LearningRate 0.0000   Epoch: 34   Global Step: 59020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:16:10,619-Speed 24812.96 samples/sec   Loss 1.3379   LearningRate 0.0000   Epoch: 34   Global Step: 59030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:16:20,545-Speed 24762.97 samples/sec   Loss 1.3183   LearningRate 0.0000   Epoch: 34   Global Step: 59040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:16:30,478-Speed 24744.84 samples/sec   Loss 1.3328   LearningRate 0.0000   Epoch: 34   Global Step: 59050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:16:40,355-Speed 24893.62 samples/sec   Loss 1.3431   LearningRate 0.0000   Epoch: 34   Global Step: 59060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:16:50,294-Speed 24732.00 samples/sec   Loss 1.3443   LearningRate 0.0000   Epoch: 34   Global Step: 59070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:17:00,155-Speed 24925.81 samples/sec   Loss 1.3442   LearningRate 0.0000   Epoch: 34   Global Step: 59080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:17:10,141-Speed 24614.27 samples/sec   Loss 1.3458   LearningRate 0.0000   Epoch: 34   Global Step: 59090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:17:20,130-Speed 24604.07 samples/sec   Loss 1.3387   LearningRate 0.0000   Epoch: 34   Global Step: 59100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:17:30,215-Speed 24372.40 samples/sec   Loss 1.3338   LearningRate 0.0000   Epoch: 34   Global Step: 59110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:17:40,156-Speed 24724.57 samples/sec   Loss 1.3453   LearningRate 0.0000   Epoch: 34   Global Step: 59120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:17:50,106-Speed 24702.94 samples/sec   Loss 1.3403   LearningRate 0.0000   Epoch: 34   Global Step: 59130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:00,002-Speed 24835.90 samples/sec   Loss 1.3358   LearningRate 0.0000   Epoch: 34   Global Step: 59140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:09,924-Speed 24775.39 samples/sec   Loss 1.3388   LearningRate 0.0000   Epoch: 34   Global Step: 59150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:19,785-Speed 24924.15 samples/sec   Loss 1.3320   LearningRate 0.0000   Epoch: 34   Global Step: 59160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:29,677-Speed 24847.23 samples/sec   Loss 1.3310   LearningRate 0.0000   Epoch: 34   Global Step: 59170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:39,588-Speed 24799.74 samples/sec   Loss 1.3373   LearningRate 0.0000   Epoch: 34   Global Step: 59180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:49,473-Speed 24865.78 samples/sec   Loss 1.3339   LearningRate 0.0000   Epoch: 34   Global Step: 59190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:18:59,373-Speed 24828.59 samples/sec   Loss 1.3384   LearningRate 0.0000   Epoch: 34   Global Step: 59200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:19:09,318-Speed 24715.37 samples/sec   Loss 1.3386   LearningRate 0.0000   Epoch: 34   Global Step: 59210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:19:19,326-Speed 24557.94 samples/sec   Loss 1.3327   LearningRate 0.0000   Epoch: 34   Global Step: 59220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:19:29,203-Speed 24887.75 samples/sec   Loss 1.3376   LearningRate 0.0000   Epoch: 34   Global Step: 59230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:19:39,127-Speed 24766.82 samples/sec   Loss 1.3296   LearningRate 0.0000   Epoch: 34   Global Step: 59240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:19:49,091-Speed 24666.10 samples/sec   Loss 1.3411   LearningRate 0.0000   Epoch: 34   Global Step: 59250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:19:58,936-Speed 24967.88 samples/sec   Loss 1.3424   LearningRate 0.0000   Epoch: 34   Global Step: 59260   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:20:08,804-Speed 24911.64 samples/sec   Loss 1.3457   LearningRate 0.0000   Epoch: 34   Global Step: 59270   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:20:18,585-Speed 25130.86 samples/sec   Loss 1.3325   LearningRate 0.0000   Epoch: 34   Global Step: 59280   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:20:28,309-Speed 25274.95 samples/sec   Loss 1.3402   LearningRate 0.0000   Epoch: 34   Global Step: 59290   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:20:38,277-Speed 24657.62 samples/sec   Loss 1.3360   LearningRate 0.0000   Epoch: 34   Global Step: 59300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:20:48,263-Speed 24613.43 samples/sec   Loss 1.3361   LearningRate 0.0000   Epoch: 34   Global Step: 59310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:20:58,243-Speed 24628.18 samples/sec   Loss 1.3412   LearningRate 0.0000   Epoch: 34   Global Step: 59320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:21:08,268-Speed 24521.46 samples/sec   Loss 1.3363   LearningRate 0.0000   Epoch: 34   Global Step: 59330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:21:18,281-Speed 24548.03 samples/sec   Loss 1.3263   LearningRate 0.0000   Epoch: 34   Global Step: 59340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:21:28,276-Speed 24590.41 samples/sec   Loss 1.3377   LearningRate 0.0000   Epoch: 34   Global Step: 59350   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:21:38,286-Speed 24557.38 samples/sec   Loss 1.3349   LearningRate 0.0000   Epoch: 34   Global Step: 59360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:21:48,285-Speed 24582.01 samples/sec   Loss 1.3294   LearningRate 0.0000   Epoch: 34   Global Step: 59370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:21:58,306-Speed 24531.71 samples/sec   Loss 1.3470   LearningRate 0.0000   Epoch: 34   Global Step: 59380   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:22:08,452-Speed 24225.44 samples/sec   Loss 1.3275   LearningRate 0.0000   Epoch: 34   Global Step: 59390   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:22:18,432-Speed 24630.12 samples/sec   Loss 1.3343   LearningRate 0.0000   Epoch: 34   Global Step: 59400   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:22:28,565-Speed 24256.67 samples/sec   Loss 1.3395   LearningRate 0.0000   Epoch: 34   Global Step: 59410   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:22:38,545-Speed 24633.42 samples/sec   Loss 1.3360   LearningRate 0.0000   Epoch: 34   Global Step: 59420   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:22:48,590-Speed 24470.23 samples/sec   Loss 1.3396   LearningRate 0.0000   Epoch: 34   Global Step: 59430   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:22:58,629-Speed 24490.60 samples/sec   Loss 1.3299   LearningRate 0.0000   Epoch: 34   Global Step: 59440   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:23:08,723-Speed 24350.91 samples/sec   Loss 1.3321   LearningRate 0.0000   Epoch: 34   Global Step: 59450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:23:18,839-Speed 24298.49 samples/sec   Loss 1.3287   LearningRate 0.0000   Epoch: 34   Global Step: 59460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:23:28,947-Speed 24322.79 samples/sec   Loss 1.3416   LearningRate 0.0000   Epoch: 34   Global Step: 59470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:23:39,021-Speed 24398.01 samples/sec   Loss 1.3515   LearningRate 0.0000   Epoch: 34   Global Step: 59480   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:23:49,140-Speed 24293.73 samples/sec   Loss 1.3362   LearningRate 0.0000   Epoch: 34   Global Step: 59490   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:23:59,182-Speed 24476.13 samples/sec   Loss 1.3258   LearningRate 0.0000   Epoch: 34   Global Step: 59500   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:24:09,465-Speed 23903.01 samples/sec   Loss 1.3223   LearningRate 0.0000   Epoch: 34   Global Step: 59510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:24:19,510-Speed 24468.74 samples/sec   Loss 1.3283   LearningRate 0.0000   Epoch: 34   Global Step: 59520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:24:29,546-Speed 24491.60 samples/sec   Loss 1.3273   LearningRate 0.0000   Epoch: 34   Global Step: 59530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:24:39,517-Speed 24652.56 samples/sec   Loss 1.3412   LearningRate 0.0000   Epoch: 34   Global Step: 59540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:24:49,566-Speed 24459.18 samples/sec   Loss 1.3297   LearningRate 0.0000   Epoch: 34   Global Step: 59550   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:24:59,777-Speed 24076.62 samples/sec   Loss 1.3317   LearningRate 0.0000   Epoch: 34   Global Step: 59560   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:25:09,780-Speed 24576.66 samples/sec   Loss 1.3322   LearningRate 0.0000   Epoch: 34   Global Step: 59570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:25:19,784-Speed 24568.52 samples/sec   Loss 1.3271   LearningRate 0.0000   Epoch: 34   Global Step: 59580   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:25:29,765-Speed 24628.30 samples/sec   Loss 1.3356   LearningRate 0.0000   Epoch: 34   Global Step: 59590   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:25:39,787-Speed 24525.45 samples/sec   Loss 1.3293   LearningRate 0.0000   Epoch: 34   Global Step: 59600   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:25:49,788-Speed 24576.44 samples/sec   Loss 1.3258   LearningRate 0.0000   Epoch: 34   Global Step: 59610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:25:59,886-Speed 24340.09 samples/sec   Loss 1.3247   LearningRate 0.0000   Epoch: 34   Global Step: 59620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:26:09,897-Speed 24552.37 samples/sec   Loss 1.3268   LearningRate 0.0000   Epoch: 34   Global Step: 59630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:26:19,960-Speed 24427.43 samples/sec   Loss 1.3327   LearningRate 0.0000   Epoch: 34   Global Step: 59640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:26:30,009-Speed 24458.19 samples/sec   Loss 1.3249   LearningRate 0.0000   Epoch: 34   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:26:40,033-Speed 24522.10 samples/sec   Loss 1.3268   LearningRate 0.0000   Epoch: 34   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:26:50,150-Speed 24293.74 samples/sec   Loss 1.3235   LearningRate 0.0000   Epoch: 34   Global Step: 59670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:27:00,193-Speed 24476.26 samples/sec   Loss 1.3184   LearningRate 0.0000   Epoch: 34   Global Step: 59680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:27:10,272-Speed 24387.05 samples/sec   Loss 1.3291   LearningRate 0.0000   Epoch: 34   Global Step: 59690   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:27:20,326-Speed 24445.97 samples/sec   Loss 1.3278   LearningRate 0.0000   Epoch: 34   Global Step: 59700   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:27:30,336-Speed 24556.68 samples/sec   Loss 1.3328   LearningRate 0.0000   Epoch: 34   Global Step: 59710   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:27:40,359-Speed 24521.06 samples/sec   Loss 1.3197   LearningRate 0.0000   Epoch: 34   Global Step: 59720   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:27:50,340-Speed 24626.89 samples/sec   Loss 1.3233   LearningRate 0.0000   Epoch: 34   Global Step: 59730   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:28:00,230-Speed 24852.18 samples/sec   Loss 1.3286   LearningRate 0.0000   Epoch: 34   Global Step: 59740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:28:10,007-Speed 25139.04 samples/sec   Loss 1.3176   LearningRate 0.0000   Epoch: 34   Global Step: 59750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:28:20,017-Speed 24555.39 samples/sec   Loss 1.3261   LearningRate 0.0000   Epoch: 34   Global Step: 59760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:28:29,867-Speed 24952.55 samples/sec   Loss 1.3302   LearningRate 0.0000   Epoch: 34   Global Step: 59770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:28:39,657-Speed 25106.71 samples/sec   Loss 1.3320   LearningRate 0.0000   Epoch: 34   Global Step: 59780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:28:49,547-Speed 24851.62 samples/sec   Loss 1.3260   LearningRate 0.0000   Epoch: 34   Global Step: 59790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:28:59,375-Speed 25009.22 samples/sec   Loss 1.3161   LearningRate 0.0000   Epoch: 34   Global Step: 59800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:29:09,339-Speed 24667.39 samples/sec   Loss 1.3294   LearningRate 0.0000   Epoch: 34   Global Step: 59810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:29:19,267-Speed 24757.08 samples/sec   Loss 1.3096   LearningRate 0.0000   Epoch: 34   Global Step: 59820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:29:29,157-Speed 24859.70 samples/sec   Loss 1.3254   LearningRate 0.0000   Epoch: 34   Global Step: 59830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:29:39,100-Speed 24720.65 samples/sec   Loss 1.3267   LearningRate 0.0000   Epoch: 34   Global Step: 59840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:29:48,999-Speed 24829.24 samples/sec   Loss 1.3287   LearningRate 0.0000   Epoch: 34   Global Step: 59850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:29:58,975-Speed 24639.14 samples/sec   Loss 1.3279   LearningRate 0.0000   Epoch: 34   Global Step: 59860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:30:08,938-Speed 24670.51 samples/sec   Loss 1.3195   LearningRate 0.0000   Epoch: 34   Global Step: 59870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:30:18,826-Speed 24856.95 samples/sec   Loss 1.3222   LearningRate 0.0000   Epoch: 34   Global Step: 59880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:30:28,750-Speed 24767.24 samples/sec   Loss 1.3266   LearningRate 0.0000   Epoch: 34   Global Step: 59890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:30:38,644-Speed 24843.52 samples/sec   Loss 1.3250   LearningRate 0.0000   Epoch: 34   Global Step: 59900   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:30:48,626-Speed 24623.32 samples/sec   Loss 1.3258   LearningRate 0.0000   Epoch: 34   Global Step: 59910   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:30:58,519-Speed 24846.45 samples/sec   Loss 1.3295   LearningRate 0.0000   Epoch: 34   Global Step: 59920   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:31:08,457-Speed 24732.42 samples/sec   Loss 1.3292   LearningRate 0.0000   Epoch: 34   Global Step: 59930   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:31:18,336-Speed 24879.54 samples/sec   Loss 1.3192   LearningRate 0.0000   Epoch: 34   Global Step: 59940   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:31:28,341-Speed 24568.87 samples/sec   Loss 1.3254   LearningRate 0.0000   Epoch: 34   Global Step: 59950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:31:38,283-Speed 24724.54 samples/sec   Loss 1.3184   LearningRate 0.0000   Epoch: 34   Global Step: 59960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:31:48,237-Speed 24698.26 samples/sec   Loss 1.3190   LearningRate 0.0000   Epoch: 34   Global Step: 59970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:31:58,112-Speed 24897.37 samples/sec   Loss 1.3209   LearningRate 0.0000   Epoch: 34   Global Step: 59980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:32:08,021-Speed 24804.94 samples/sec   Loss 1.3271   LearningRate 0.0000   Epoch: 34   Global Step: 59990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:32:17,965-Speed 24720.18 samples/sec   Loss 1.3254   LearningRate 0.0000   Epoch: 34   Global Step: 60000   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:32:27,892-Speed 24760.12 samples/sec   Loss 1.3250   LearningRate 0.0000   Epoch: 34   Global Step: 60010   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:32:37,887-Speed 24592.42 samples/sec   Loss 1.3219   LearningRate 0.0000   Epoch: 34   Global Step: 60020   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:32:47,824-Speed 24734.06 samples/sec   Loss 1.3177   LearningRate 0.0000   Epoch: 34   Global Step: 60030   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:32:57,738-Speed 24794.86 samples/sec   Loss 1.3291   LearningRate 0.0000   Epoch: 34   Global Step: 60040   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:33:07,821-Speed 24377.60 samples/sec   Loss 1.3262   LearningRate 0.0000   Epoch: 34   Global Step: 60050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:33:17,990-Speed 24170.40 samples/sec   Loss 1.3108   LearningRate 0.0000   Epoch: 34   Global Step: 60060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:33:28,167-Speed 24150.98 samples/sec   Loss 1.3239   LearningRate 0.0000   Epoch: 34   Global Step: 60070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:33:38,132-Speed 24667.31 samples/sec   Loss 1.3284   LearningRate 0.0000   Epoch: 34   Global Step: 60080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:33:48,024-Speed 24845.67 samples/sec   Loss 1.3227   LearningRate 0.0000   Epoch: 34   Global Step: 60090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:33:58,029-Speed 24568.25 samples/sec   Loss 1.3098   LearningRate 0.0000   Epoch: 34   Global Step: 60100   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-26 15:34:08,169-Speed 24238.56 samples/sec   Loss 1.3353   LearningRate 0.0000   Epoch: 34   Global Step: 60110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:34:18,204-Speed 24494.30 samples/sec   Loss 1.3174   LearningRate 0.0000   Epoch: 34   Global Step: 60120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:34:28,351-Speed 24225.14 samples/sec   Loss 1.3219   LearningRate 0.0000   Epoch: 34   Global Step: 60130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:34:38,412-Speed 24429.68 samples/sec   Loss 1.3144   LearningRate 0.0000   Epoch: 34   Global Step: 60140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:34:48,475-Speed 24426.38 samples/sec   Loss 1.3126   LearningRate 0.0000   Epoch: 34   Global Step: 60150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:34:58,558-Speed 24383.61 samples/sec   Loss 1.3205   LearningRate 0.0000   Epoch: 34   Global Step: 60160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:35:08,654-Speed 24345.10 samples/sec   Loss 1.3261   LearningRate 0.0000   Epoch: 34   Global Step: 60170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:35:18,826-Speed 24163.28 samples/sec   Loss 1.3134   LearningRate 0.0000   Epoch: 34   Global Step: 60180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-26 15:35:28,854-Speed 24512.12 samples/sec   Loss 1.3185   LearningRate 0.0000   Epoch: 34   Global Step: 60190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:35:38,785-Speed 24749.01 samples/sec   Loss 1.3269   LearningRate 0.0000   Epoch: 34   Global Step: 60200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:35:48,635-Speed 24955.21 samples/sec   Loss 1.3140   LearningRate 0.0000   Epoch: 34   Global Step: 60210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:35:58,555-Speed 24778.78 samples/sec   Loss 1.3320   LearningRate 0.0000   Epoch: 34   Global Step: 60220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:36:08,461-Speed 24813.14 samples/sec   Loss 1.3108   LearningRate 0.0000   Epoch: 34   Global Step: 60230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:36:18,363-Speed 24824.35 samples/sec   Loss 1.3244   LearningRate 0.0000   Epoch: 34   Global Step: 60240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:36:28,303-Speed 24726.75 samples/sec   Loss 1.3265   LearningRate 0.0000   Epoch: 34   Global Step: 60250   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-26 15:36:38,327-Speed 24521.80 samples/sec   Loss 1.3252   LearningRate 0.0000   Epoch: 34   Global Step: 60260   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:36:48,295-Speed 24667.76 samples/sec   Loss 1.3258   LearningRate 0.0000   Epoch: 34   Global Step: 60270   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:36:58,254-Speed 24679.03 samples/sec   Loss 1.3296   LearningRate 0.0000   Epoch: 34   Global Step: 60280   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:37:08,212-Speed 24683.05 samples/sec   Loss 1.3207   LearningRate 0.0000   Epoch: 34   Global Step: 60290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:37:18,133-Speed 24775.98 samples/sec   Loss 1.3221   LearningRate 0.0000   Epoch: 34   Global Step: 60300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:37:28,153-Speed 24530.73 samples/sec   Loss 1.3217   LearningRate 0.0000   Epoch: 34   Global Step: 60310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:37:38,185-Speed 24504.63 samples/sec   Loss 1.3259   LearningRate 0.0000   Epoch: 34   Global Step: 60320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:37:48,210-Speed 24517.77 samples/sec   Loss 1.3228   LearningRate 0.0000   Epoch: 34   Global Step: 60330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:37:58,147-Speed 24736.71 samples/sec   Loss 1.3101   LearningRate 0.0000   Epoch: 34   Global Step: 60340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:38:08,090-Speed 24724.56 samples/sec   Loss 1.3250   LearningRate 0.0000   Epoch: 34   Global Step: 60350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:38:18,011-Speed 24773.85 samples/sec   Loss 1.3140   LearningRate 0.0000   Epoch: 34   Global Step: 60360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:38:27,907-Speed 24837.70 samples/sec   Loss 1.3293   LearningRate 0.0000   Epoch: 34   Global Step: 60370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:38:37,821-Speed 24799.22 samples/sec   Loss 1.3170   LearningRate 0.0000   Epoch: 34   Global Step: 60380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:38:47,708-Speed 24859.56 samples/sec   Loss 1.3130   LearningRate 0.0000   Epoch: 34   Global Step: 60390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:38:57,627-Speed 24783.20 samples/sec   Loss 1.3203   LearningRate 0.0000   Epoch: 34   Global Step: 60400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:39:07,632-Speed 24567.61 samples/sec   Loss 1.3168   LearningRate 0.0000   Epoch: 34   Global Step: 60410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:39:17,539-Speed 24813.12 samples/sec   Loss 1.3224   LearningRate 0.0000   Epoch: 34   Global Step: 60420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:39:27,487-Speed 24708.26 samples/sec   Loss 1.3204   LearningRate 0.0000   Epoch: 34   Global Step: 60430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:39:37,415-Speed 24758.39 samples/sec   Loss 1.3154   LearningRate 0.0000   Epoch: 34   Global Step: 60440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:39:47,322-Speed 24811.76 samples/sec   Loss 1.3193   LearningRate 0.0000   Epoch: 34   Global Step: 60450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:39:57,234-Speed 24796.74 samples/sec   Loss 1.3241   LearningRate 0.0000   Epoch: 34   Global Step: 60460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:40:07,224-Speed 24605.38 samples/sec   Loss 1.3258   LearningRate 0.0000   Epoch: 34   Global Step: 60470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:40:17,153-Speed 24756.36 samples/sec   Loss 1.3321   LearningRate 0.0000   Epoch: 34   Global Step: 60480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:40:27,115-Speed 24672.88 samples/sec   Loss 1.3112   LearningRate 0.0000   Epoch: 34   Global Step: 60490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:41:26,120-Speed 4165.12 samples/sec   Loss 1.3190   LearningRate 0.0000   Epoch: 35   Global Step: 60500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:41:35,987-Speed 24910.85 samples/sec   Loss 1.3141   LearningRate 0.0000   Epoch: 35   Global Step: 60510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:41:45,899-Speed 24797.93 samples/sec   Loss 1.3173   LearningRate 0.0000   Epoch: 35   Global Step: 60520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:41:55,813-Speed 24791.89 samples/sec   Loss 1.3117   LearningRate 0.0000   Epoch: 35   Global Step: 60530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:42:05,800-Speed 24611.80 samples/sec   Loss 1.3116   LearningRate 0.0000   Epoch: 35   Global Step: 60540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:42:15,762-Speed 24673.52 samples/sec   Loss 1.3220   LearningRate 0.0000   Epoch: 35   Global Step: 60550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:42:25,720-Speed 24683.44 samples/sec   Loss 1.3140   LearningRate 0.0000   Epoch: 35   Global Step: 60560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:42:35,786-Speed 24418.64 samples/sec   Loss 1.3143   LearningRate 0.0000   Epoch: 35   Global Step: 60570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:42:45,743-Speed 24690.71 samples/sec   Loss 1.3115   LearningRate 0.0000   Epoch: 35   Global Step: 60580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:42:55,710-Speed 24660.69 samples/sec   Loss 1.3185   LearningRate 0.0000   Epoch: 35   Global Step: 60590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:43:05,633-Speed 24770.66 samples/sec   Loss 1.3183   LearningRate 0.0000   Epoch: 35   Global Step: 60600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:43:15,559-Speed 24763.32 samples/sec   Loss 1.3036   LearningRate 0.0000   Epoch: 35   Global Step: 60610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:43:25,521-Speed 24671.32 samples/sec   Loss 1.3283   LearningRate 0.0000   Epoch: 35   Global Step: 60620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:43:35,407-Speed 24863.14 samples/sec   Loss 1.3119   LearningRate 0.0000   Epoch: 35   Global Step: 60630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:43:45,380-Speed 24644.92 samples/sec   Loss 1.3199   LearningRate 0.0000   Epoch: 35   Global Step: 60640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:43:55,396-Speed 24539.15 samples/sec   Loss 1.3160   LearningRate 0.0000   Epoch: 35   Global Step: 60650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:44:05,305-Speed 24809.54 samples/sec   Loss 1.3184   LearningRate 0.0000   Epoch: 35   Global Step: 60660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:44:15,204-Speed 24831.44 samples/sec   Loss 1.3100   LearningRate 0.0000   Epoch: 35   Global Step: 60670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:44:25,080-Speed 24887.03 samples/sec   Loss 1.3168   LearningRate 0.0000   Epoch: 35   Global Step: 60680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:44:35,002-Speed 24771.85 samples/sec   Loss 1.3081   LearningRate 0.0000   Epoch: 35   Global Step: 60690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:44:44,919-Speed 24786.88 samples/sec   Loss 1.3102   LearningRate 0.0000   Epoch: 35   Global Step: 60700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:44:54,896-Speed 24634.16 samples/sec   Loss 1.3160   LearningRate 0.0000   Epoch: 35   Global Step: 60710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:45:04,842-Speed 24713.24 samples/sec   Loss 1.3079   LearningRate 0.0000   Epoch: 35   Global Step: 60720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:45:14,782-Speed 24728.16 samples/sec   Loss 1.3058   LearningRate 0.0000   Epoch: 35   Global Step: 60730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:45:24,731-Speed 24705.35 samples/sec   Loss 1.3133   LearningRate 0.0000   Epoch: 35   Global Step: 60740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:45:34,675-Speed 24717.31 samples/sec   Loss 1.3052   LearningRate 0.0000   Epoch: 35   Global Step: 60750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:45:44,665-Speed 24604.77 samples/sec   Loss 1.3123   LearningRate 0.0000   Epoch: 35   Global Step: 60760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:45:54,672-Speed 24563.14 samples/sec   Loss 1.3098   LearningRate 0.0000   Epoch: 35   Global Step: 60770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:46:04,606-Speed 24741.84 samples/sec   Loss 1.3142   LearningRate 0.0000   Epoch: 35   Global Step: 60780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:46:14,506-Speed 24827.89 samples/sec   Loss 1.3240   LearningRate 0.0000   Epoch: 35   Global Step: 60790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:46:24,428-Speed 24772.11 samples/sec   Loss 1.3140   LearningRate 0.0000   Epoch: 35   Global Step: 60800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:46:34,331-Speed 24820.94 samples/sec   Loss 1.3169   LearningRate 0.0000   Epoch: 35   Global Step: 60810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:46:44,472-Speed 24237.04 samples/sec   Loss 1.3167   LearningRate 0.0000   Epoch: 35   Global Step: 60820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:46:54,490-Speed 24537.49 samples/sec   Loss 1.3117   LearningRate 0.0000   Epoch: 35   Global Step: 60830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:47:04,471-Speed 24624.54 samples/sec   Loss 1.3118   LearningRate 0.0000   Epoch: 35   Global Step: 60840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:47:14,345-Speed 24899.68 samples/sec   Loss 1.3107   LearningRate 0.0000   Epoch: 35   Global Step: 60850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:47:24,283-Speed 24733.15 samples/sec   Loss 1.3089   LearningRate 0.0000   Epoch: 35   Global Step: 60860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:47:34,202-Speed 24778.91 samples/sec   Loss 1.3084   LearningRate 0.0000   Epoch: 35   Global Step: 60870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:47:44,164-Speed 24674.19 samples/sec   Loss 1.3121   LearningRate 0.0000   Epoch: 35   Global Step: 60880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:47:54,100-Speed 24739.12 samples/sec   Loss 1.3092   LearningRate 0.0000   Epoch: 35   Global Step: 60890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:48:04,036-Speed 24737.24 samples/sec   Loss 1.3071   LearningRate 0.0000   Epoch: 35   Global Step: 60900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:48:13,973-Speed 24735.23 samples/sec   Loss 1.3217   LearningRate 0.0000   Epoch: 35   Global Step: 60910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:48:23,907-Speed 24742.85 samples/sec   Loss 1.3133   LearningRate 0.0000   Epoch: 35   Global Step: 60920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:48:33,889-Speed 24624.95 samples/sec   Loss 1.3103   LearningRate 0.0000   Epoch: 35   Global Step: 60930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:48:43,870-Speed 24626.01 samples/sec   Loss 1.3168   LearningRate 0.0000   Epoch: 35   Global Step: 60940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:48:53,832-Speed 24672.79 samples/sec   Loss 1.3149   LearningRate 0.0000   Epoch: 35   Global Step: 60950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:49:03,797-Speed 24666.15 samples/sec   Loss 1.3186   LearningRate 0.0000   Epoch: 35   Global Step: 60960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:49:13,724-Speed 24760.69 samples/sec   Loss 1.3121   LearningRate 0.0000   Epoch: 35   Global Step: 60970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:49:23,655-Speed 24747.97 samples/sec   Loss 1.3095   LearningRate 0.0000   Epoch: 35   Global Step: 60980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:49:33,593-Speed 24733.09 samples/sec   Loss 1.3205   LearningRate 0.0000   Epoch: 35   Global Step: 60990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:49:43,497-Speed 24816.92 samples/sec   Loss 1.3163   LearningRate 0.0000   Epoch: 35   Global Step: 61000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:49:53,375-Speed 24881.81 samples/sec   Loss 1.3151   LearningRate 0.0000   Epoch: 35   Global Step: 61010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:50:03,401-Speed 24514.88 samples/sec   Loss 1.3141   LearningRate 0.0000   Epoch: 35   Global Step: 61020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:50:13,416-Speed 24545.23 samples/sec   Loss 1.3128   LearningRate 0.0000   Epoch: 35   Global Step: 61030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:50:23,195-Speed 25140.76 samples/sec   Loss 1.3105   LearningRate 0.0000   Epoch: 35   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:50:32,916-Speed 25284.58 samples/sec   Loss 1.3076   LearningRate 0.0000   Epoch: 35   Global Step: 61050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:50:42,717-Speed 25078.46 samples/sec   Loss 1.3087   LearningRate 0.0000   Epoch: 35   Global Step: 61060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:50:52,410-Speed 25358.56 samples/sec   Loss 1.3058   LearningRate 0.0000   Epoch: 35   Global Step: 61070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:51:02,150-Speed 25234.20 samples/sec   Loss 1.3125   LearningRate 0.0000   Epoch: 35   Global Step: 61080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:51:11,994-Speed 24970.29 samples/sec   Loss 1.3070   LearningRate 0.0000   Epoch: 35   Global Step: 61090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:51:21,781-Speed 25113.43 samples/sec   Loss 1.3138   LearningRate 0.0000   Epoch: 35   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:51:31,727-Speed 24711.37 samples/sec   Loss 1.3094   LearningRate 0.0000   Epoch: 35   Global Step: 61110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-26 15:51:41,435-Speed 25319.79 samples/sec   Loss 1.3042   LearningRate 0.0000   Epoch: 35   Global Step: 61120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:51:51,141-Speed 25325.72 samples/sec   Loss 1.3088   LearningRate 0.0000   Epoch: 35   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:52:00,894-Speed 25204.39 samples/sec   Loss 1.3041   LearningRate 0.0000   Epoch: 35   Global Step: 61140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:52:10,583-Speed 25369.36 samples/sec   Loss 1.3071   LearningRate 0.0000   Epoch: 35   Global Step: 61150   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:52:20,317-Speed 25250.87 samples/sec   Loss 1.3108   LearningRate 0.0000   Epoch: 35   Global Step: 61160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:52:30,075-Speed 25190.35 samples/sec   Loss 1.3104   LearningRate 0.0000   Epoch: 35   Global Step: 61170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:52:39,981-Speed 24812.59 samples/sec   Loss 1.3122   LearningRate 0.0000   Epoch: 35   Global Step: 61180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:52:49,827-Speed 24964.50 samples/sec   Loss 1.3071   LearningRate 0.0000   Epoch: 35   Global Step: 61190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:52:59,599-Speed 25152.52 samples/sec   Loss 1.3098   LearningRate 0.0000   Epoch: 35   Global Step: 61200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:53:09,357-Speed 25189.27 samples/sec   Loss 1.3097   LearningRate 0.0000   Epoch: 35   Global Step: 61210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:53:19,185-Speed 25009.16 samples/sec   Loss 1.3149   LearningRate 0.0000   Epoch: 35   Global Step: 61220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:53:28,972-Speed 25115.19 samples/sec   Loss 1.2970   LearningRate 0.0000   Epoch: 35   Global Step: 61230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:53:38,741-Speed 25160.57 samples/sec   Loss 1.3111   LearningRate 0.0000   Epoch: 35   Global Step: 61240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:53:48,668-Speed 24760.48 samples/sec   Loss 1.3220   LearningRate 0.0000   Epoch: 35   Global Step: 61250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:53:58,458-Speed 25107.18 samples/sec   Loss 1.3102   LearningRate 0.0000   Epoch: 35   Global Step: 61260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:54:08,315-Speed 24937.21 samples/sec   Loss 1.2977   LearningRate 0.0000   Epoch: 35   Global Step: 61270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:54:18,249-Speed 24742.41 samples/sec   Loss 1.3084   LearningRate 0.0000   Epoch: 35   Global Step: 61280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:54:28,094-Speed 24968.99 samples/sec   Loss 1.3074   LearningRate 0.0000   Epoch: 35   Global Step: 61290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:54:37,843-Speed 25211.69 samples/sec   Loss 1.2996   LearningRate 0.0000   Epoch: 35   Global Step: 61300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:54:47,605-Speed 25180.14 samples/sec   Loss 1.3091   LearningRate 0.0000   Epoch: 35   Global Step: 61310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:54:57,391-Speed 25117.12 samples/sec   Loss 1.3036   LearningRate 0.0000   Epoch: 35   Global Step: 61320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:55:07,228-Speed 24985.47 samples/sec   Loss 1.3023   LearningRate 0.0000   Epoch: 35   Global Step: 61330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:55:17,054-Speed 25016.41 samples/sec   Loss 1.3014   LearningRate 0.0000   Epoch: 35   Global Step: 61340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:55:26,814-Speed 25183.05 samples/sec   Loss 1.3021   LearningRate 0.0000   Epoch: 35   Global Step: 61350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:55:36,679-Speed 24914.76 samples/sec   Loss 1.3067   LearningRate 0.0000   Epoch: 35   Global Step: 61360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:55:46,374-Speed 25352.48 samples/sec   Loss 1.3113   LearningRate 0.0000   Epoch: 35   Global Step: 61370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:55:56,190-Speed 25040.13 samples/sec   Loss 1.3032   LearningRate 0.0000   Epoch: 35   Global Step: 61380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:56:06,071-Speed 24873.39 samples/sec   Loss 1.3022   LearningRate 0.0000   Epoch: 35   Global Step: 61390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:56:15,881-Speed 25055.68 samples/sec   Loss 1.3110   LearningRate 0.0000   Epoch: 35   Global Step: 61400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:56:25,674-Speed 25098.61 samples/sec   Loss 1.2952   LearningRate 0.0000   Epoch: 35   Global Step: 61410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:56:35,346-Speed 25412.32 samples/sec   Loss 1.3068   LearningRate 0.0000   Epoch: 35   Global Step: 61420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:56:45,102-Speed 25200.67 samples/sec   Loss 1.3032   LearningRate 0.0000   Epoch: 35   Global Step: 61430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:56:54,965-Speed 24920.69 samples/sec   Loss 1.3043   LearningRate 0.0000   Epoch: 35   Global Step: 61440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:57:04,807-Speed 24971.58 samples/sec   Loss 1.2975   LearningRate 0.0000   Epoch: 35   Global Step: 61450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:57:14,574-Speed 25165.36 samples/sec   Loss 1.2965   LearningRate 0.0000   Epoch: 35   Global Step: 61460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:57:24,380-Speed 25073.86 samples/sec   Loss 1.2974   LearningRate 0.0000   Epoch: 35   Global Step: 61470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:57:34,197-Speed 25039.07 samples/sec   Loss 1.2943   LearningRate 0.0000   Epoch: 35   Global Step: 61480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:57:43,964-Speed 25166.42 samples/sec   Loss 1.3026   LearningRate 0.0000   Epoch: 35   Global Step: 61490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:57:53,820-Speed 24939.26 samples/sec   Loss 1.3090   LearningRate 0.0000   Epoch: 35   Global Step: 61500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:58:03,586-Speed 25167.82 samples/sec   Loss 1.3069   LearningRate 0.0000   Epoch: 35   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 15:58:13,349-Speed 25176.93 samples/sec   Loss 1.3039   LearningRate 0.0000   Epoch: 35   Global Step: 61520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:58:23,189-Speed 24979.33 samples/sec   Loss 1.3050   LearningRate 0.0000   Epoch: 35   Global Step: 61530   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:58:33,121-Speed 24747.74 samples/sec   Loss 1.3037   LearningRate 0.0000   Epoch: 35   Global Step: 61540   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:58:42,946-Speed 25016.99 samples/sec   Loss 1.3102   LearningRate 0.0000   Epoch: 35   Global Step: 61550   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:58:52,808-Speed 24924.93 samples/sec   Loss 1.2987   LearningRate 0.0000   Epoch: 35   Global Step: 61560   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:59:02,574-Speed 25169.75 samples/sec   Loss 1.3024   LearningRate 0.0000   Epoch: 35   Global Step: 61570   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:59:12,370-Speed 25090.66 samples/sec   Loss 1.3105   LearningRate 0.0000   Epoch: 35   Global Step: 61580   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:59:22,123-Speed 25201.51 samples/sec   Loss 1.3051   LearningRate 0.0000   Epoch: 35   Global Step: 61590   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:59:31,925-Speed 25073.33 samples/sec   Loss 1.2978   LearningRate 0.0000   Epoch: 35   Global Step: 61600   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:59:41,716-Speed 25104.44 samples/sec   Loss 1.2987   LearningRate 0.0000   Epoch: 35   Global Step: 61610   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 15:59:51,522-Speed 25073.43 samples/sec   Loss 1.2997   LearningRate 0.0000   Epoch: 35   Global Step: 61620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:00:01,367-Speed 24972.34 samples/sec   Loss 1.3017   LearningRate 0.0000   Epoch: 35   Global Step: 61630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:00:11,298-Speed 24758.00 samples/sec   Loss 1.2963   LearningRate 0.0000   Epoch: 35   Global Step: 61640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:00:21,149-Speed 24952.70 samples/sec   Loss 1.3088   LearningRate 0.0000   Epoch: 35   Global Step: 61650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:00:31,169-Speed 24528.08 samples/sec   Loss 1.3039   LearningRate 0.0000   Epoch: 35   Global Step: 61660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:00:41,382-Speed 24067.72 samples/sec   Loss 1.2999   LearningRate 0.0000   Epoch: 35   Global Step: 61670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:00:51,556-Speed 24157.54 samples/sec   Loss 1.3053   LearningRate 0.0000   Epoch: 35   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:01:01,669-Speed 24312.92 samples/sec   Loss 1.2983   LearningRate 0.0000   Epoch: 35   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:01:11,665-Speed 24590.04 samples/sec   Loss 1.2970   LearningRate 0.0000   Epoch: 35   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:01:21,652-Speed 24611.33 samples/sec   Loss 1.2961   LearningRate 0.0000   Epoch: 35   Global Step: 61710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:01:31,669-Speed 24536.05 samples/sec   Loss 1.3084   LearningRate 0.0000   Epoch: 35   Global Step: 61720   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-26 16:01:41,676-Speed 24560.91 samples/sec   Loss 1.3048   LearningRate 0.0000   Epoch: 35   Global Step: 61730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:01:51,706-Speed 24506.66 samples/sec   Loss 1.2988   LearningRate 0.0000   Epoch: 35   Global Step: 61740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:02:01,757-Speed 24454.47 samples/sec   Loss 1.3035   LearningRate 0.0000   Epoch: 35   Global Step: 61750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:02:11,830-Speed 24399.99 samples/sec   Loss 1.3060   LearningRate 0.0000   Epoch: 35   Global Step: 61760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:02:22,026-Speed 24108.11 samples/sec   Loss 1.3043   LearningRate 0.0000   Epoch: 35   Global Step: 61770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:02:32,255-Speed 24029.70 samples/sec   Loss 1.2960   LearningRate 0.0000   Epoch: 35   Global Step: 61780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:02:42,345-Speed 24359.28 samples/sec   Loss 1.3026   LearningRate 0.0000   Epoch: 35   Global Step: 61790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:02:52,362-Speed 24537.35 samples/sec   Loss 1.2906   LearningRate 0.0000   Epoch: 35   Global Step: 61800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:03:02,341-Speed 24632.17 samples/sec   Loss 1.3028   LearningRate 0.0000   Epoch: 35   Global Step: 61810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:03:12,325-Speed 24618.51 samples/sec   Loss 1.2982   LearningRate 0.0000   Epoch: 35   Global Step: 61820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:03:22,383-Speed 24438.39 samples/sec   Loss 1.2976   LearningRate 0.0000   Epoch: 35   Global Step: 61830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:03:32,308-Speed 24764.29 samples/sec   Loss 1.2916   LearningRate 0.0000   Epoch: 35   Global Step: 61840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:03:42,256-Speed 24711.72 samples/sec   Loss 1.2973   LearningRate 0.0000   Epoch: 35   Global Step: 61850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:03:52,261-Speed 24566.08 samples/sec   Loss 1.2981   LearningRate 0.0000   Epoch: 35   Global Step: 61860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:04:02,356-Speed 24346.96 samples/sec   Loss 1.3052   LearningRate 0.0000   Epoch: 35   Global Step: 61870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:04:12,427-Speed 24412.59 samples/sec   Loss 1.2976   LearningRate 0.0000   Epoch: 35   Global Step: 61880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:04:22,303-Speed 24888.00 samples/sec   Loss 1.2915   LearningRate 0.0000   Epoch: 35   Global Step: 61890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:04:32,430-Speed 24269.63 samples/sec   Loss 1.2891   LearningRate 0.0000   Epoch: 35   Global Step: 61900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:04:42,535-Speed 24324.51 samples/sec   Loss 1.2982   LearningRate 0.0000   Epoch: 35   Global Step: 61910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:04:52,653-Speed 24292.67 samples/sec   Loss 1.2913   LearningRate 0.0000   Epoch: 35   Global Step: 61920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:05:02,520-Speed 24910.58 samples/sec   Loss 1.3071   LearningRate 0.0000   Epoch: 35   Global Step: 61930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:05:12,483-Speed 24672.65 samples/sec   Loss 1.2994   LearningRate 0.0000   Epoch: 35   Global Step: 61940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:05:22,433-Speed 24702.06 samples/sec   Loss 1.3090   LearningRate 0.0000   Epoch: 35   Global Step: 61950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:05:32,402-Speed 24656.07 samples/sec   Loss 1.2969   LearningRate 0.0000   Epoch: 35   Global Step: 61960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:05:42,423-Speed 24527.44 samples/sec   Loss 1.2967   LearningRate 0.0000   Epoch: 35   Global Step: 61970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:05:52,494-Speed 24406.75 samples/sec   Loss 1.3048   LearningRate 0.0000   Epoch: 35   Global Step: 61980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:06:02,337-Speed 24971.22 samples/sec   Loss 1.2980   LearningRate 0.0000   Epoch: 35   Global Step: 61990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:06:12,164-Speed 25015.06 samples/sec   Loss 1.2949   LearningRate 0.0000   Epoch: 35   Global Step: 62000   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:06:21,975-Speed 25054.07 samples/sec   Loss 1.3081   LearningRate 0.0000   Epoch: 35   Global Step: 62010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:06:31,844-Speed 24904.99 samples/sec   Loss 1.3000   LearningRate 0.0000   Epoch: 35   Global Step: 62020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:06:41,643-Speed 25093.03 samples/sec   Loss 1.2968   LearningRate 0.0000   Epoch: 35   Global Step: 62030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:06:51,494-Speed 24956.11 samples/sec   Loss 1.2868   LearningRate 0.0000   Epoch: 35   Global Step: 62040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:07:01,282-Speed 25110.11 samples/sec   Loss 1.2977   LearningRate 0.0000   Epoch: 35   Global Step: 62050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:07:11,004-Speed 25284.40 samples/sec   Loss 1.3011   LearningRate 0.0000   Epoch: 35   Global Step: 62060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:07:20,822-Speed 25044.18 samples/sec   Loss 1.2922   LearningRate 0.0000   Epoch: 35   Global Step: 62070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:07:30,696-Speed 24892.68 samples/sec   Loss 1.2991   LearningRate 0.0000   Epoch: 35   Global Step: 62080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:07:40,504-Speed 25059.65 samples/sec   Loss 1.2936   LearningRate 0.0000   Epoch: 35   Global Step: 62090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:07:50,396-Speed 24846.90 samples/sec   Loss 1.3032   LearningRate 0.0000   Epoch: 35   Global Step: 62100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:00,140-Speed 25226.78 samples/sec   Loss 1.2957   LearningRate 0.0000   Epoch: 35   Global Step: 62110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:09,945-Speed 25075.52 samples/sec   Loss 1.3014   LearningRate 0.0000   Epoch: 35   Global Step: 62120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:19,736-Speed 25102.82 samples/sec   Loss 1.3012   LearningRate 0.0000   Epoch: 35   Global Step: 62130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:29,790-Speed 24448.73 samples/sec   Loss 1.3011   LearningRate 0.0000   Epoch: 35   Global Step: 62140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:39,830-Speed 24480.68 samples/sec   Loss 1.3142   LearningRate 0.0000   Epoch: 35   Global Step: 62150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:49,840-Speed 24551.99 samples/sec   Loss 1.2976   LearningRate 0.0000   Epoch: 35   Global Step: 62160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:08:59,795-Speed 24691.81 samples/sec   Loss 1.2999   LearningRate 0.0000   Epoch: 35   Global Step: 62170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:09:09,782-Speed 24610.37 samples/sec   Loss 1.2984   LearningRate 0.0000   Epoch: 35   Global Step: 62180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:09:19,741-Speed 24680.76 samples/sec   Loss 1.2921   LearningRate 0.0000   Epoch: 35   Global Step: 62190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:09:29,640-Speed 24828.91 samples/sec   Loss 1.2937   LearningRate 0.0000   Epoch: 35   Global Step: 62200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:09:39,624-Speed 24617.56 samples/sec   Loss 1.3062   LearningRate 0.0000   Epoch: 35   Global Step: 62210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:10:38,999-Speed 4139.23 samples/sec   Loss 1.3031   LearningRate 0.0000   Epoch: 36   Global Step: 62220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:10:48,683-Speed 25381.41 samples/sec   Loss 1.2951   LearningRate 0.0000   Epoch: 36   Global Step: 62230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:10:58,455-Speed 25153.17 samples/sec   Loss 1.2955   LearningRate 0.0000   Epoch: 36   Global Step: 62240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:11:08,169-Speed 25303.91 samples/sec   Loss 1.2930   LearningRate 0.0000   Epoch: 36   Global Step: 62250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:11:17,973-Speed 25071.49 samples/sec   Loss 1.3002   LearningRate 0.0000   Epoch: 36   Global Step: 62260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:11:27,710-Speed 25242.24 samples/sec   Loss 1.2955   LearningRate 0.0000   Epoch: 36   Global Step: 62270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:11:37,409-Speed 25343.13 samples/sec   Loss 1.2915   LearningRate 0.0000   Epoch: 36   Global Step: 62280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:11:47,252-Speed 24970.63 samples/sec   Loss 1.2920   LearningRate 0.0000   Epoch: 36   Global Step: 62290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:11:56,917-Speed 25432.26 samples/sec   Loss 1.2873   LearningRate 0.0000   Epoch: 36   Global Step: 62300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:12:06,653-Speed 25246.78 samples/sec   Loss 1.2877   LearningRate 0.0000   Epoch: 36   Global Step: 62310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:12:16,395-Speed 25239.02 samples/sec   Loss 1.2892   LearningRate 0.0000   Epoch: 36   Global Step: 62320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:12:26,070-Speed 25405.64 samples/sec   Loss 1.2987   LearningRate 0.0000   Epoch: 36   Global Step: 62330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:12:35,779-Speed 25314.85 samples/sec   Loss 1.2876   LearningRate 0.0000   Epoch: 36   Global Step: 62340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:12:45,648-Speed 24905.98 samples/sec   Loss 1.2886   LearningRate 0.0000   Epoch: 36   Global Step: 62350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:12:55,472-Speed 25023.91 samples/sec   Loss 1.2860   LearningRate 0.0000   Epoch: 36   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:13:05,277-Speed 25069.24 samples/sec   Loss 1.2867   LearningRate 0.0000   Epoch: 36   Global Step: 62370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:13:15,048-Speed 25154.85 samples/sec   Loss 1.2959   LearningRate 0.0000   Epoch: 36   Global Step: 62380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:13:24,747-Speed 25343.96 samples/sec   Loss 1.2897   LearningRate 0.0000   Epoch: 36   Global Step: 62390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:13:34,464-Speed 25294.55 samples/sec   Loss 1.3055   LearningRate 0.0000   Epoch: 36   Global Step: 62400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:13:44,311-Speed 24961.39 samples/sec   Loss 1.2900   LearningRate 0.0000   Epoch: 36   Global Step: 62410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:13:54,092-Speed 25129.06 samples/sec   Loss 1.2973   LearningRate 0.0000   Epoch: 36   Global Step: 62420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:14:03,875-Speed 25125.13 samples/sec   Loss 1.2913   LearningRate 0.0000   Epoch: 36   Global Step: 62430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:14:13,593-Speed 25291.85 samples/sec   Loss 1.3022   LearningRate 0.0000   Epoch: 36   Global Step: 62440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:14:23,324-Speed 25259.92 samples/sec   Loss 1.2933   LearningRate 0.0000   Epoch: 36   Global Step: 62450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:14:33,048-Speed 25275.02 samples/sec   Loss 1.2955   LearningRate 0.0000   Epoch: 36   Global Step: 62460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:14:42,834-Speed 25117.63 samples/sec   Loss 1.2884   LearningRate 0.0000   Epoch: 36   Global Step: 62470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:14:52,689-Speed 24943.01 samples/sec   Loss 1.2902   LearningRate 0.0000   Epoch: 36   Global Step: 62480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:15:02,587-Speed 24831.83 samples/sec   Loss 1.2978   LearningRate 0.0000   Epoch: 36   Global Step: 62490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:15:12,310-Speed 25281.32 samples/sec   Loss 1.2874   LearningRate 0.0000   Epoch: 36   Global Step: 62500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:15:22,019-Speed 25314.31 samples/sec   Loss 1.2877   LearningRate 0.0000   Epoch: 36   Global Step: 62510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:15:31,843-Speed 25020.30 samples/sec   Loss 1.2897   LearningRate 0.0000   Epoch: 36   Global Step: 62520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:15:41,533-Speed 25368.11 samples/sec   Loss 1.2976   LearningRate 0.0000   Epoch: 36   Global Step: 62530   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-26 16:15:51,264-Speed 25258.72 samples/sec   Loss 1.2939   LearningRate 0.0000   Epoch: 36   Global Step: 62540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:01,004-Speed 25235.15 samples/sec   Loss 1.2862   LearningRate 0.0000   Epoch: 36   Global Step: 62550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:10,851-Speed 24969.49 samples/sec   Loss 1.3011   LearningRate 0.0000   Epoch: 36   Global Step: 62560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:20,617-Speed 25168.43 samples/sec   Loss 1.2986   LearningRate 0.0000   Epoch: 36   Global Step: 62570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:30,489-Speed 24898.34 samples/sec   Loss 1.2938   LearningRate 0.0000   Epoch: 36   Global Step: 62580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:40,322-Speed 24994.11 samples/sec   Loss 1.3002   LearningRate 0.0000   Epoch: 36   Global Step: 62590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:50,037-Speed 25301.47 samples/sec   Loss 1.3044   LearningRate 0.0000   Epoch: 36   Global Step: 62600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:16:59,882-Speed 24967.15 samples/sec   Loss 1.2883   LearningRate 0.0000   Epoch: 36   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:17:09,765-Speed 24870.32 samples/sec   Loss 1.2904   LearningRate 0.0000   Epoch: 36   Global Step: 62620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:17:19,747-Speed 24627.00 samples/sec   Loss 1.2970   LearningRate 0.0000   Epoch: 36   Global Step: 62630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:17:29,681-Speed 24742.95 samples/sec   Loss 1.2985   LearningRate 0.0000   Epoch: 36   Global Step: 62640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:17:39,401-Speed 25287.16 samples/sec   Loss 1.2954   LearningRate 0.0000   Epoch: 36   Global Step: 62650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:17:49,081-Speed 25390.99 samples/sec   Loss 1.2941   LearningRate 0.0000   Epoch: 36   Global Step: 62660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:17:58,912-Speed 25001.98 samples/sec   Loss 1.2918   LearningRate 0.0000   Epoch: 36   Global Step: 62670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:18:08,775-Speed 24920.98 samples/sec   Loss 1.2970   LearningRate 0.0000   Epoch: 36   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:18:18,566-Speed 25102.60 samples/sec   Loss 1.2870   LearningRate 0.0000   Epoch: 36   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:18:28,347-Speed 25132.05 samples/sec   Loss 1.2862   LearningRate 0.0000   Epoch: 36   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:18:38,082-Speed 25257.23 samples/sec   Loss 1.2897   LearningRate 0.0000   Epoch: 36   Global Step: 62710   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:18:47,757-Speed 25404.09 samples/sec   Loss 1.2929   LearningRate 0.0000   Epoch: 36   Global Step: 62720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:18:57,444-Speed 25377.02 samples/sec   Loss 1.2964   LearningRate 0.0000   Epoch: 36   Global Step: 62730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:19:07,136-Speed 25358.12 samples/sec   Loss 1.2937   LearningRate 0.0000   Epoch: 36   Global Step: 62740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:19:16,815-Speed 25397.36 samples/sec   Loss 1.2872   LearningRate 0.0000   Epoch: 36   Global Step: 62750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:19:26,602-Speed 25114.35 samples/sec   Loss 1.2936   LearningRate 0.0000   Epoch: 36   Global Step: 62760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:19:36,281-Speed 25394.51 samples/sec   Loss 1.2916   LearningRate 0.0000   Epoch: 36   Global Step: 62770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:19:46,117-Speed 24987.73 samples/sec   Loss 1.2891   LearningRate 0.0000   Epoch: 36   Global Step: 62780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:19:55,826-Speed 25319.06 samples/sec   Loss 1.2935   LearningRate 0.0000   Epoch: 36   Global Step: 62790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:20:05,587-Speed 25180.69 samples/sec   Loss 1.2970   LearningRate 0.0000   Epoch: 36   Global Step: 62800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:20:15,342-Speed 25194.62 samples/sec   Loss 1.3005   LearningRate 0.0000   Epoch: 36   Global Step: 62810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:20:25,149-Speed 25064.54 samples/sec   Loss 1.2828   LearningRate 0.0000   Epoch: 36   Global Step: 62820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:20:34,977-Speed 25010.14 samples/sec   Loss 1.2934   LearningRate 0.0000   Epoch: 36   Global Step: 62830   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:20:44,864-Speed 24859.10 samples/sec   Loss 1.2949   LearningRate 0.0000   Epoch: 36   Global Step: 62840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:20:54,678-Speed 25046.31 samples/sec   Loss 1.2930   LearningRate 0.0000   Epoch: 36   Global Step: 62850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:21:04,455-Speed 25139.43 samples/sec   Loss 1.2952   LearningRate 0.0000   Epoch: 36   Global Step: 62860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:21:14,265-Speed 25056.68 samples/sec   Loss 1.2923   LearningRate 0.0000   Epoch: 36   Global Step: 62870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:21:23,989-Speed 25284.99 samples/sec   Loss 1.2929   LearningRate 0.0000   Epoch: 36   Global Step: 62880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:21:33,787-Speed 25084.14 samples/sec   Loss 1.2882   LearningRate 0.0000   Epoch: 36   Global Step: 62890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:21:43,535-Speed 25214.85 samples/sec   Loss 1.2853   LearningRate 0.0000   Epoch: 36   Global Step: 62900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:21:53,279-Speed 25226.88 samples/sec   Loss 1.2972   LearningRate 0.0000   Epoch: 36   Global Step: 62910   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:22:03,020-Speed 25230.80 samples/sec   Loss 1.2932   LearningRate 0.0000   Epoch: 36   Global Step: 62920   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:22:12,788-Speed 25164.68 samples/sec   Loss 1.2937   LearningRate 0.0000   Epoch: 36   Global Step: 62930   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:22:22,510-Speed 25283.06 samples/sec   Loss 1.2819   LearningRate 0.0000   Epoch: 36   Global Step: 62940   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:22:32,259-Speed 25210.66 samples/sec   Loss 1.2938   LearningRate 0.0000   Epoch: 36   Global Step: 62950   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:22:42,034-Speed 25145.61 samples/sec   Loss 1.2847   LearningRate 0.0000   Epoch: 36   Global Step: 62960   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:22:51,944-Speed 24810.12 samples/sec   Loss 1.2849   LearningRate 0.0000   Epoch: 36   Global Step: 62970   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:23:01,799-Speed 24940.18 samples/sec   Loss 1.2989   LearningRate 0.0000   Epoch: 36   Global Step: 62980   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:23:11,601-Speed 25075.29 samples/sec   Loss 1.2901   LearningRate 0.0000   Epoch: 36   Global Step: 62990   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:23:21,397-Speed 25092.27 samples/sec   Loss 1.2870   LearningRate 0.0000   Epoch: 36   Global Step: 63000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:23:31,107-Speed 25312.79 samples/sec   Loss 1.2952   LearningRate 0.0000   Epoch: 36   Global Step: 63010   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:23:40,881-Speed 25147.24 samples/sec   Loss 1.2971   LearningRate 0.0000   Epoch: 36   Global Step: 63020   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:23:50,588-Speed 25322.31 samples/sec   Loss 1.2939   LearningRate 0.0000   Epoch: 36   Global Step: 63030   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:00,382-Speed 25102.59 samples/sec   Loss 1.2851   LearningRate 0.0000   Epoch: 36   Global Step: 63040   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:10,156-Speed 25146.74 samples/sec   Loss 1.2875   LearningRate 0.0000   Epoch: 36   Global Step: 63050   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:19,912-Speed 25195.90 samples/sec   Loss 1.2794   LearningRate 0.0000   Epoch: 36   Global Step: 63060   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:29,735-Speed 25023.44 samples/sec   Loss 1.2832   LearningRate 0.0000   Epoch: 36   Global Step: 63070   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:39,461-Speed 25276.11 samples/sec   Loss 1.2786   LearningRate 0.0000   Epoch: 36   Global Step: 63080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:49,381-Speed 24778.19 samples/sec   Loss 1.2789   LearningRate 0.0000   Epoch: 36   Global Step: 63090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:24:59,285-Speed 24818.92 samples/sec   Loss 1.2895   LearningRate 0.0000   Epoch: 36   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:25:09,192-Speed 24809.39 samples/sec   Loss 1.2875   LearningRate 0.0000   Epoch: 36   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:25:19,124-Speed 24748.36 samples/sec   Loss 1.2786   LearningRate 0.0000   Epoch: 36   Global Step: 63120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:25:29,114-Speed 24602.50 samples/sec   Loss 1.2889   LearningRate 0.0000   Epoch: 36   Global Step: 63130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:25:39,063-Speed 24705.92 samples/sec   Loss 1.2876   LearningRate 0.0000   Epoch: 36   Global Step: 63140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:25:49,023-Speed 24685.11 samples/sec   Loss 1.2875   LearningRate 0.0000   Epoch: 36   Global Step: 63150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:25:59,083-Speed 24430.17 samples/sec   Loss 1.2779   LearningRate 0.0000   Epoch: 36   Global Step: 63160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:26:09,000-Speed 24784.38 samples/sec   Loss 1.2817   LearningRate 0.0000   Epoch: 36   Global Step: 63170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:26:18,959-Speed 24685.20 samples/sec   Loss 1.2839   LearningRate 0.0000   Epoch: 36   Global Step: 63180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:26:28,853-Speed 24843.34 samples/sec   Loss 1.2883   LearningRate 0.0000   Epoch: 36   Global Step: 63190   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:26:38,789-Speed 24736.97 samples/sec   Loss 1.2804   LearningRate 0.0000   Epoch: 36   Global Step: 63200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:26:48,663-Speed 24893.63 samples/sec   Loss 1.2849   LearningRate 0.0000   Epoch: 36   Global Step: 63210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:26:58,567-Speed 24817.94 samples/sec   Loss 1.2949   LearningRate 0.0000   Epoch: 36   Global Step: 63220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:27:08,486-Speed 24785.40 samples/sec   Loss 1.2817   LearningRate 0.0000   Epoch: 36   Global Step: 63230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:27:18,411-Speed 24766.49 samples/sec   Loss 1.2701   LearningRate 0.0000   Epoch: 36   Global Step: 63240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:27:28,286-Speed 24890.46 samples/sec   Loss 1.2891   LearningRate 0.0000   Epoch: 36   Global Step: 63250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:27:38,251-Speed 24665.64 samples/sec   Loss 1.2933   LearningRate 0.0000   Epoch: 36   Global Step: 63260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:27:48,215-Speed 24666.93 samples/sec   Loss 1.2872   LearningRate 0.0000   Epoch: 36   Global Step: 63270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:27:58,014-Speed 25084.44 samples/sec   Loss 1.2856   LearningRate 0.0000   Epoch: 36   Global Step: 63280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:28:07,757-Speed 25227.75 samples/sec   Loss 1.2827   LearningRate 0.0000   Epoch: 36   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:28:17,561-Speed 25072.12 samples/sec   Loss 1.2837   LearningRate 0.0000   Epoch: 36   Global Step: 63300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:28:27,203-Speed 25492.27 samples/sec   Loss 1.2852   LearningRate 0.0000   Epoch: 36   Global Step: 63310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:28:37,010-Speed 25063.14 samples/sec   Loss 1.2870   LearningRate 0.0000   Epoch: 36   Global Step: 63320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:28:46,803-Speed 25098.45 samples/sec   Loss 1.2932   LearningRate 0.0000   Epoch: 36   Global Step: 63330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:28:56,541-Speed 25240.94 samples/sec   Loss 1.2898   LearningRate 0.0000   Epoch: 36   Global Step: 63340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:29:06,370-Speed 25007.06 samples/sec   Loss 1.2819   LearningRate 0.0000   Epoch: 36   Global Step: 63350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:29:16,144-Speed 25148.52 samples/sec   Loss 1.2853   LearningRate 0.0000   Epoch: 36   Global Step: 63360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:29:25,955-Speed 25054.12 samples/sec   Loss 1.2845   LearningRate 0.0000   Epoch: 36   Global Step: 63370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:29:35,798-Speed 24968.85 samples/sec   Loss 1.2809   LearningRate 0.0000   Epoch: 36   Global Step: 63380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:29:45,723-Speed 24764.93 samples/sec   Loss 1.2836   LearningRate 0.0000   Epoch: 36   Global Step: 63390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:29:55,642-Speed 24778.51 samples/sec   Loss 1.2884   LearningRate 0.0000   Epoch: 36   Global Step: 63400   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:30:05,589-Speed 24718.75 samples/sec   Loss 1.2859   LearningRate 0.0000   Epoch: 36   Global Step: 63410   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:30:15,472-Speed 24874.26 samples/sec   Loss 1.2800   LearningRate 0.0000   Epoch: 36   Global Step: 63420   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:30:25,439-Speed 24660.73 samples/sec   Loss 1.2842   LearningRate 0.0000   Epoch: 36   Global Step: 63430   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:30:35,237-Speed 25085.52 samples/sec   Loss 1.2852   LearningRate 0.0000   Epoch: 36   Global Step: 63440   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:30:45,080-Speed 24970.23 samples/sec   Loss 1.2830   LearningRate 0.0000   Epoch: 36   Global Step: 63450   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:30:54,807-Speed 25266.57 samples/sec   Loss 1.2904   LearningRate 0.0000   Epoch: 36   Global Step: 63460   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:31:04,507-Speed 25338.17 samples/sec   Loss 1.2784   LearningRate 0.0000   Epoch: 36   Global Step: 63470   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:31:14,219-Speed 25309.61 samples/sec   Loss 1.2761   LearningRate 0.0000   Epoch: 36   Global Step: 63480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:31:24,024-Speed 25067.41 samples/sec   Loss 1.2928   LearningRate 0.0000   Epoch: 36   Global Step: 63490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:31:33,756-Speed 25255.85 samples/sec   Loss 1.2720   LearningRate 0.0000   Epoch: 36   Global Step: 63500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:31:43,533-Speed 25139.36 samples/sec   Loss 1.2893   LearningRate 0.0000   Epoch: 36   Global Step: 63510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:31:53,373-Speed 24977.50 samples/sec   Loss 1.2876   LearningRate 0.0000   Epoch: 36   Global Step: 63520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:32:03,088-Speed 25299.32 samples/sec   Loss 1.2895   LearningRate 0.0000   Epoch: 36   Global Step: 63530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:32:12,841-Speed 25201.37 samples/sec   Loss 1.2741   LearningRate 0.0000   Epoch: 36   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:32:22,631-Speed 25104.86 samples/sec   Loss 1.2818   LearningRate 0.0000   Epoch: 36   Global Step: 63550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:32:32,461-Speed 25005.13 samples/sec   Loss 1.2781   LearningRate 0.0000   Epoch: 36   Global Step: 63560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:32:42,306-Speed 24963.61 samples/sec   Loss 1.2850   LearningRate 0.0000   Epoch: 36   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:32:51,995-Speed 25368.30 samples/sec   Loss 1.2859   LearningRate 0.0000   Epoch: 36   Global Step: 63580   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:33:01,672-Speed 25397.61 samples/sec   Loss 1.2889   LearningRate 0.0000   Epoch: 36   Global Step: 63590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:33:11,442-Speed 25158.05 samples/sec   Loss 1.2806   LearningRate 0.0000   Epoch: 36   Global Step: 63600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:33:21,142-Speed 25337.83 samples/sec   Loss 1.2806   LearningRate 0.0000   Epoch: 36   Global Step: 63610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:33:30,963-Speed 25027.01 samples/sec   Loss 1.2862   LearningRate 0.0000   Epoch: 36   Global Step: 63620   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:33:40,639-Speed 25402.20 samples/sec   Loss 1.2804   LearningRate 0.0000   Epoch: 36   Global Step: 63630   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:33:50,482-Speed 24971.37 samples/sec   Loss 1.2757   LearningRate 0.0000   Epoch: 36   Global Step: 63640   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:34:00,324-Speed 24972.05 samples/sec   Loss 1.2806   LearningRate 0.0000   Epoch: 36   Global Step: 63650   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:34:10,115-Speed 25111.36 samples/sec   Loss 1.2834   LearningRate 0.0000   Epoch: 36   Global Step: 63660   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:34:19,911-Speed 25089.36 samples/sec   Loss 1.2716   LearningRate 0.0000   Epoch: 36   Global Step: 63670   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:34:29,676-Speed 25170.44 samples/sec   Loss 1.2702   LearningRate 0.0000   Epoch: 36   Global Step: 63680   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:34:39,547-Speed 24898.75 samples/sec   Loss 1.2838   LearningRate 0.0000   Epoch: 36   Global Step: 63690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:34:49,416-Speed 24906.72 samples/sec   Loss 1.2723   LearningRate 0.0000   Epoch: 36   Global Step: 63700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:34:59,159-Speed 25225.67 samples/sec   Loss 1.2783   LearningRate 0.0000   Epoch: 36   Global Step: 63710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:35:08,881-Speed 25282.01 samples/sec   Loss 1.2773   LearningRate 0.0000   Epoch: 36   Global Step: 63720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:35:18,755-Speed 24892.36 samples/sec   Loss 1.2762   LearningRate 0.0000   Epoch: 36   Global Step: 63730   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:35:28,541-Speed 25115.84 samples/sec   Loss 1.2885   LearningRate 0.0000   Epoch: 36   Global Step: 63740   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:35:38,255-Speed 25303.22 samples/sec   Loss 1.2871   LearningRate 0.0000   Epoch: 36   Global Step: 63750   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:35:47,994-Speed 25236.71 samples/sec   Loss 1.2887   LearningRate 0.0000   Epoch: 36   Global Step: 63760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:35:57,769-Speed 25146.15 samples/sec   Loss 1.2893   LearningRate 0.0000   Epoch: 36   Global Step: 63770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:36:07,526-Speed 25191.49 samples/sec   Loss 1.2830   LearningRate 0.0000   Epoch: 36   Global Step: 63780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-26 16:36:17,388-Speed 24922.66 samples/sec   Loss 1.2821   LearningRate 0.0000   Epoch: 36   Global Step: 63790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:36:27,132-Speed 25226.41 samples/sec   Loss 1.2809   LearningRate 0.0000   Epoch: 36   Global Step: 63800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-26 16:36:36,807-Speed 25405.52 samples/sec   Loss 1.2813   LearningRate 0.0000   Epoch: 36   Global Step: 63810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:36:46,615-Speed 25058.27 samples/sec   Loss 1.2776   LearningRate 0.0000   Epoch: 36   Global Step: 63820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:36:56,386-Speed 25157.01 samples/sec   Loss 1.2873   LearningRate 0.0000   Epoch: 36   Global Step: 63830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:37:06,183-Speed 25087.81 samples/sec   Loss 1.2869   LearningRate 0.0000   Epoch: 36   Global Step: 63840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:37:15,982-Speed 25083.87 samples/sec   Loss 1.2819   LearningRate 0.0000   Epoch: 36   Global Step: 63850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:37:25,868-Speed 24863.80 samples/sec   Loss 1.2870   LearningRate 0.0000   Epoch: 36   Global Step: 63860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:37:35,796-Speed 24757.23 samples/sec   Loss 1.2766   LearningRate 0.0000   Epoch: 36   Global Step: 63870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:37:45,673-Speed 24887.09 samples/sec   Loss 1.2846   LearningRate 0.0000   Epoch: 36   Global Step: 63880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:37:55,500-Speed 25011.81 samples/sec   Loss 1.2841   LearningRate 0.0000   Epoch: 36   Global Step: 63890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:38:05,243-Speed 25227.76 samples/sec   Loss 1.2776   LearningRate 0.0000   Epoch: 36   Global Step: 63900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:38:15,072-Speed 25022.85 samples/sec   Loss 1.2773   LearningRate 0.0000   Epoch: 36   Global Step: 63910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:38:24,825-Speed 25201.10 samples/sec   Loss 1.2855   LearningRate 0.0000   Epoch: 36   Global Step: 63920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:38:34,641-Speed 25040.28 samples/sec   Loss 1.2819   LearningRate 0.0000   Epoch: 36   Global Step: 63930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:38:44,469-Speed 25010.86 samples/sec   Loss 1.2859   LearningRate 0.0000   Epoch: 36   Global Step: 63940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:39:44,163-Speed 4117.07 samples/sec   Loss 1.2865   LearningRate 0.0000   Epoch: 37   Global Step: 63950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:39:53,924-Speed 25180.39 samples/sec   Loss 1.2930   LearningRate 0.0000   Epoch: 37   Global Step: 63960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:40:03,714-Speed 25106.94 samples/sec   Loss 1.2803   LearningRate 0.0000   Epoch: 37   Global Step: 63970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:40:13,681-Speed 24661.78 samples/sec   Loss 1.2751   LearningRate 0.0000   Epoch: 37   Global Step: 63980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:40:23,470-Speed 25108.38 samples/sec   Loss 1.2831   LearningRate 0.0000   Epoch: 37   Global Step: 63990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:40:33,291-Speed 25033.36 samples/sec   Loss 1.2804   LearningRate 0.0000   Epoch: 37   Global Step: 64000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:40:43,198-Speed 24815.85 samples/sec   Loss 1.2725   LearningRate 0.0000   Epoch: 37   Global Step: 64010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:40:52,934-Speed 25245.15 samples/sec   Loss 1.2748   LearningRate 0.0000   Epoch: 37   Global Step: 64020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:41:02,629-Speed 25353.94 samples/sec   Loss 1.2855   LearningRate 0.0000   Epoch: 37   Global Step: 64030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:41:12,408-Speed 25142.46 samples/sec   Loss 1.2784   LearningRate 0.0000   Epoch: 37   Global Step: 64040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:41:22,148-Speed 25233.14 samples/sec   Loss 1.2737   LearningRate 0.0000   Epoch: 37   Global Step: 64050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:41:31,930-Speed 25136.79 samples/sec   Loss 1.2770   LearningRate 0.0000   Epoch: 37   Global Step: 64060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:41:41,768-Speed 24989.63 samples/sec   Loss 1.2751   LearningRate 0.0000   Epoch: 37   Global Step: 64070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:41:51,641-Speed 24898.06 samples/sec   Loss 1.2771   LearningRate 0.0000   Epoch: 37   Global Step: 64080   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:42:01,367-Speed 25271.00 samples/sec   Loss 1.2708   LearningRate 0.0000   Epoch: 37   Global Step: 64090   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:42:11,149-Speed 25125.88 samples/sec   Loss 1.2781   LearningRate 0.0000   Epoch: 37   Global Step: 64100   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:42:20,973-Speed 25021.27 samples/sec   Loss 1.2787   LearningRate 0.0000   Epoch: 37   Global Step: 64110   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:42:30,674-Speed 25335.34 samples/sec   Loss 1.2822   LearningRate 0.0000   Epoch: 37   Global Step: 64120   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:42:40,565-Speed 24848.38 samples/sec   Loss 1.2706   LearningRate 0.0000   Epoch: 37   Global Step: 64130   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:42:50,402-Speed 24985.71 samples/sec   Loss 1.2877   LearningRate 0.0000   Epoch: 37   Global Step: 64140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:00,232-Speed 25006.97 samples/sec   Loss 1.2794   LearningRate 0.0000   Epoch: 37   Global Step: 64150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:09,957-Speed 25273.03 samples/sec   Loss 1.2783   LearningRate 0.0000   Epoch: 37   Global Step: 64160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:19,798-Speed 24974.74 samples/sec   Loss 1.2813   LearningRate 0.0000   Epoch: 37   Global Step: 64170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:29,555-Speed 25191.61 samples/sec   Loss 1.2683   LearningRate 0.0000   Epoch: 37   Global Step: 64180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:39,262-Speed 25327.48 samples/sec   Loss 1.2743   LearningRate 0.0000   Epoch: 37   Global Step: 64190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:49,086-Speed 25032.84 samples/sec   Loss 1.2791   LearningRate 0.0000   Epoch: 37   Global Step: 64200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:43:58,873-Speed 25112.65 samples/sec   Loss 1.2822   LearningRate 0.0000   Epoch: 37   Global Step: 64210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:44:08,599-Speed 25279.62 samples/sec   Loss 1.2717   LearningRate 0.0000   Epoch: 37   Global Step: 64220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:44:18,375-Speed 25141.59 samples/sec   Loss 1.2782   LearningRate 0.0000   Epoch: 37   Global Step: 64230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:44:28,192-Speed 25046.48 samples/sec   Loss 1.2732   LearningRate 0.0000   Epoch: 37   Global Step: 64240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:44:38,235-Speed 24473.09 samples/sec   Loss 1.2796   LearningRate 0.0000   Epoch: 37   Global Step: 64250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:44:48,394-Speed 24192.86 samples/sec   Loss 1.2859   LearningRate 0.0000   Epoch: 37   Global Step: 64260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:44:58,427-Speed 24500.04 samples/sec   Loss 1.2754   LearningRate 0.0000   Epoch: 37   Global Step: 64270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:45:08,499-Speed 24404.46 samples/sec   Loss 1.2699   LearningRate 0.0000   Epoch: 37   Global Step: 64280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:45:18,603-Speed 24325.76 samples/sec   Loss 1.2789   LearningRate 0.0000   Epoch: 37   Global Step: 64290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:45:28,691-Speed 24364.58 samples/sec   Loss 1.2675   LearningRate 0.0000   Epoch: 37   Global Step: 64300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:45:38,749-Speed 24441.66 samples/sec   Loss 1.2842   LearningRate 0.0000   Epoch: 37   Global Step: 64310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:45:48,814-Speed 24420.20 samples/sec   Loss 1.2837   LearningRate 0.0000   Epoch: 37   Global Step: 64320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:45:59,076-Speed 23952.30 samples/sec   Loss 1.2874   LearningRate 0.0000   Epoch: 37   Global Step: 64330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:46:09,124-Speed 24460.87 samples/sec   Loss 1.2733   LearningRate 0.0000   Epoch: 37   Global Step: 64340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:46:19,367-Speed 23995.22 samples/sec   Loss 1.2762   LearningRate 0.0000   Epoch: 37   Global Step: 64350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:46:29,474-Speed 24321.04 samples/sec   Loss 1.2736   LearningRate 0.0000   Epoch: 37   Global Step: 64360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:46:39,577-Speed 24329.56 samples/sec   Loss 1.2762   LearningRate 0.0000   Epoch: 37   Global Step: 64370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:46:49,728-Speed 24213.61 samples/sec   Loss 1.2863   LearningRate 0.0000   Epoch: 37   Global Step: 64380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:46:59,836-Speed 24314.16 samples/sec   Loss 1.2768   LearningRate 0.0000   Epoch: 37   Global Step: 64390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:47:09,868-Speed 24501.14 samples/sec   Loss 1.2902   LearningRate 0.0000   Epoch: 37   Global Step: 64400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:47:19,963-Speed 24348.12 samples/sec   Loss 1.2788   LearningRate 0.0000   Epoch: 37   Global Step: 64410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:47:30,172-Speed 24076.18 samples/sec   Loss 1.2764   LearningRate 0.0000   Epoch: 37   Global Step: 64420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:47:40,305-Speed 24257.83 samples/sec   Loss 1.2757   LearningRate 0.0000   Epoch: 37   Global Step: 64430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:47:50,337-Speed 24500.73 samples/sec   Loss 1.2721   LearningRate 0.0000   Epoch: 37   Global Step: 64440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:48:00,404-Speed 24414.82 samples/sec   Loss 1.2706   LearningRate 0.0000   Epoch: 37   Global Step: 64450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:48:10,640-Speed 24011.68 samples/sec   Loss 1.2844   LearningRate 0.0000   Epoch: 37   Global Step: 64460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:48:20,764-Speed 24279.25 samples/sec   Loss 1.2767   LearningRate 0.0000   Epoch: 37   Global Step: 64470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:48:30,886-Speed 24283.45 samples/sec   Loss 1.2816   LearningRate 0.0000   Epoch: 37   Global Step: 64480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:48:40,983-Speed 24342.01 samples/sec   Loss 1.2716   LearningRate 0.0000   Epoch: 37   Global Step: 64490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:48:51,113-Speed 24265.47 samples/sec   Loss 1.2750   LearningRate 0.0000   Epoch: 37   Global Step: 64500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:49:01,237-Speed 24280.10 samples/sec   Loss 1.2764   LearningRate 0.0000   Epoch: 37   Global Step: 64510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:49:11,429-Speed 24116.16 samples/sec   Loss 1.2634   LearningRate 0.0000   Epoch: 37   Global Step: 64520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:49:21,693-Speed 23947.39 samples/sec   Loss 1.2801   LearningRate 0.0000   Epoch: 37   Global Step: 64530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:49:31,848-Speed 24202.95 samples/sec   Loss 1.2695   LearningRate 0.0000   Epoch: 37   Global Step: 64540   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-26 16:49:41,924-Speed 24392.89 samples/sec   Loss 1.2733   LearningRate 0.0000   Epoch: 37   Global Step: 64550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:49:52,024-Speed 24337.65 samples/sec   Loss 1.2724   LearningRate 0.0000   Epoch: 37   Global Step: 64560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:50:02,132-Speed 24318.91 samples/sec   Loss 1.2776   LearningRate 0.0000   Epoch: 37   Global Step: 64570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:50:12,304-Speed 24165.29 samples/sec   Loss 1.2735   LearningRate 0.0000   Epoch: 37   Global Step: 64580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:50:22,431-Speed 24274.73 samples/sec   Loss 1.2807   LearningRate 0.0000   Epoch: 37   Global Step: 64590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:50:32,614-Speed 24139.01 samples/sec   Loss 1.2744   LearningRate 0.0000   Epoch: 37   Global Step: 64600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:50:42,716-Speed 24330.26 samples/sec   Loss 1.2813   LearningRate 0.0000   Epoch: 37   Global Step: 64610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:50:52,851-Speed 24254.31 samples/sec   Loss 1.2792   LearningRate 0.0000   Epoch: 37   Global Step: 64620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:51:02,969-Speed 24290.85 samples/sec   Loss 1.2709   LearningRate 0.0000   Epoch: 37   Global Step: 64630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:51:13,121-Speed 24210.97 samples/sec   Loss 1.2751   LearningRate 0.0000   Epoch: 37   Global Step: 64640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:51:23,205-Speed 24373.22 samples/sec   Loss 1.2733   LearningRate 0.0000   Epoch: 37   Global Step: 64650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:51:33,354-Speed 24220.40 samples/sec   Loss 1.2758   LearningRate 0.0000   Epoch: 37   Global Step: 64660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:51:43,501-Speed 24224.84 samples/sec   Loss 1.2768   LearningRate 0.0000   Epoch: 37   Global Step: 64670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:51:53,593-Speed 24356.83 samples/sec   Loss 1.2852   LearningRate 0.0000   Epoch: 37   Global Step: 64680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:52:03,685-Speed 24357.30 samples/sec   Loss 1.2705   LearningRate 0.0000   Epoch: 37   Global Step: 64690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:52:13,801-Speed 24296.98 samples/sec   Loss 1.2850   LearningRate 0.0000   Epoch: 37   Global Step: 64700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:52:23,895-Speed 24349.04 samples/sec   Loss 1.2713   LearningRate 0.0000   Epoch: 37   Global Step: 64710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:52:34,164-Speed 23935.84 samples/sec   Loss 1.2833   LearningRate 0.0000   Epoch: 37   Global Step: 64720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:52:44,316-Speed 24212.27 samples/sec   Loss 1.2744   LearningRate 0.0000   Epoch: 37   Global Step: 64730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:52:54,438-Speed 24281.03 samples/sec   Loss 1.2769   LearningRate 0.0000   Epoch: 37   Global Step: 64740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:53:04,548-Speed 24313.37 samples/sec   Loss 1.2832   LearningRate 0.0000   Epoch: 37   Global Step: 64750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:53:14,683-Speed 24252.02 samples/sec   Loss 1.2746   LearningRate 0.0000   Epoch: 37   Global Step: 64760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:53:24,844-Speed 24190.54 samples/sec   Loss 1.2723   LearningRate 0.0000   Epoch: 37   Global Step: 64770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:53:35,106-Speed 23951.96 samples/sec   Loss 1.2694   LearningRate 0.0000   Epoch: 37   Global Step: 64780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:53:45,283-Speed 24152.27 samples/sec   Loss 1.2602   LearningRate 0.0000   Epoch: 37   Global Step: 64790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:53:55,536-Speed 23973.07 samples/sec   Loss 1.2606   LearningRate 0.0000   Epoch: 37   Global Step: 64800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:54:05,725-Speed 24124.73 samples/sec   Loss 1.2700   LearningRate 0.0000   Epoch: 37   Global Step: 64810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:54:15,979-Speed 23968.49 samples/sec   Loss 1.2711   LearningRate 0.0000   Epoch: 37   Global Step: 64820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:54:26,046-Speed 24417.28 samples/sec   Loss 1.2683   LearningRate 0.0000   Epoch: 37   Global Step: 64830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:54:36,164-Speed 24291.50 samples/sec   Loss 1.2729   LearningRate 0.0000   Epoch: 37   Global Step: 64840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:54:46,335-Speed 24165.83 samples/sec   Loss 1.2721   LearningRate 0.0000   Epoch: 37   Global Step: 64850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:54:56,499-Speed 24183.21 samples/sec   Loss 1.2679   LearningRate 0.0000   Epoch: 37   Global Step: 64860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:55:06,623-Speed 24277.94 samples/sec   Loss 1.2739   LearningRate 0.0000   Epoch: 37   Global Step: 64870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:55:16,803-Speed 24142.89 samples/sec   Loss 1.2618   LearningRate 0.0000   Epoch: 37   Global Step: 64880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:55:27,002-Speed 24101.09 samples/sec   Loss 1.2759   LearningRate 0.0000   Epoch: 37   Global Step: 64890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:55:37,088-Speed 24368.18 samples/sec   Loss 1.2687   LearningRate 0.0000   Epoch: 37   Global Step: 64900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:55:47,360-Speed 23927.02 samples/sec   Loss 1.2652   LearningRate 0.0000   Epoch: 37   Global Step: 64910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:55:57,465-Speed 24324.73 samples/sec   Loss 1.2707   LearningRate 0.0000   Epoch: 37   Global Step: 64920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:56:07,552-Speed 24365.89 samples/sec   Loss 1.2735   LearningRate 0.0000   Epoch: 37   Global Step: 64930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:56:17,803-Speed 23977.04 samples/sec   Loss 1.2817   LearningRate 0.0000   Epoch: 37   Global Step: 64940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:56:27,954-Speed 24212.13 samples/sec   Loss 1.2755   LearningRate 0.0000   Epoch: 37   Global Step: 64950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:56:38,098-Speed 24229.42 samples/sec   Loss 1.2721   LearningRate 0.0000   Epoch: 37   Global Step: 64960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:56:48,269-Speed 24165.36 samples/sec   Loss 1.2675   LearningRate 0.0000   Epoch: 37   Global Step: 64970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:56:58,397-Speed 24269.50 samples/sec   Loss 1.2741   LearningRate 0.0000   Epoch: 37   Global Step: 64980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:57:08,489-Speed 24354.83 samples/sec   Loss 1.2787   LearningRate 0.0000   Epoch: 37   Global Step: 64990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:57:18,588-Speed 24336.89 samples/sec   Loss 1.2799   LearningRate 0.0000   Epoch: 37   Global Step: 65000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:57:28,658-Speed 24408.07 samples/sec   Loss 1.2688   LearningRate 0.0000   Epoch: 37   Global Step: 65010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:57:38,819-Speed 24190.06 samples/sec   Loss 1.2655   LearningRate 0.0000   Epoch: 37   Global Step: 65020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:57:48,986-Speed 24176.56 samples/sec   Loss 1.2776   LearningRate 0.0000   Epoch: 37   Global Step: 65030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 16:57:59,210-Speed 24040.58 samples/sec   Loss 1.2708   LearningRate 0.0000   Epoch: 37   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:58:09,368-Speed 24196.21 samples/sec   Loss 1.2560   LearningRate 0.0000   Epoch: 37   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:58:19,554-Speed 24132.94 samples/sec   Loss 1.2692   LearningRate 0.0000   Epoch: 37   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:58:29,669-Speed 24303.22 samples/sec   Loss 1.2724   LearningRate 0.0000   Epoch: 37   Global Step: 65070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:58:39,779-Speed 24312.94 samples/sec   Loss 1.2769   LearningRate 0.0000   Epoch: 37   Global Step: 65080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:58:50,028-Speed 23987.87 samples/sec   Loss 1.2751   LearningRate 0.0000   Epoch: 37   Global Step: 65090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:59:00,189-Speed 24191.95 samples/sec   Loss 1.2710   LearningRate 0.0000   Epoch: 37   Global Step: 65100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:59:10,360-Speed 24164.36 samples/sec   Loss 1.2787   LearningRate 0.0000   Epoch: 37   Global Step: 65110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:59:20,535-Speed 24156.96 samples/sec   Loss 1.2710   LearningRate 0.0000   Epoch: 37   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:59:30,689-Speed 24205.09 samples/sec   Loss 1.2704   LearningRate 0.0000   Epoch: 37   Global Step: 65130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:59:40,765-Speed 24395.66 samples/sec   Loss 1.2706   LearningRate 0.0000   Epoch: 37   Global Step: 65140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 16:59:50,863-Speed 24341.31 samples/sec   Loss 1.2770   LearningRate 0.0000   Epoch: 37   Global Step: 65150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:00:00,944-Speed 24379.38 samples/sec   Loss 1.2681   LearningRate 0.0000   Epoch: 37   Global Step: 65160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:00:11,161-Speed 24057.69 samples/sec   Loss 1.2699   LearningRate 0.0000   Epoch: 37   Global Step: 65170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:00:21,356-Speed 24107.21 samples/sec   Loss 1.2735   LearningRate 0.0000   Epoch: 37   Global Step: 65180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:00:31,562-Speed 24083.13 samples/sec   Loss 1.2716   LearningRate 0.0000   Epoch: 37   Global Step: 65190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:00:41,663-Speed 24334.92 samples/sec   Loss 1.2731   LearningRate 0.0000   Epoch: 37   Global Step: 65200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:00:51,719-Speed 24439.96 samples/sec   Loss 1.2709   LearningRate 0.0000   Epoch: 37   Global Step: 65210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:01:01,822-Speed 24327.46 samples/sec   Loss 1.2701   LearningRate 0.0000   Epoch: 37   Global Step: 65220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:01:11,907-Speed 24371.27 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 37   Global Step: 65230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:01:22,044-Speed 24247.60 samples/sec   Loss 1.2683   LearningRate 0.0000   Epoch: 37   Global Step: 65240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:01:32,119-Speed 24394.27 samples/sec   Loss 1.2604   LearningRate 0.0000   Epoch: 37   Global Step: 65250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:01:42,279-Speed 24191.35 samples/sec   Loss 1.2683   LearningRate 0.0000   Epoch: 37   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:01:52,346-Speed 24419.68 samples/sec   Loss 1.2674   LearningRate 0.0000   Epoch: 37   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:02:02,424-Speed 24388.58 samples/sec   Loss 1.2779   LearningRate 0.0000   Epoch: 37   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:02:12,553-Speed 24264.32 samples/sec   Loss 1.2771   LearningRate 0.0000   Epoch: 37   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:02:22,793-Speed 24007.11 samples/sec   Loss 1.2792   LearningRate 0.0000   Epoch: 37   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:02:32,911-Speed 24299.52 samples/sec   Loss 1.2667   LearningRate 0.0000   Epoch: 37   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:02:43,144-Speed 24017.57 samples/sec   Loss 1.2771   LearningRate 0.0000   Epoch: 37   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:02:53,376-Speed 24023.03 samples/sec   Loss 1.2703   LearningRate 0.0000   Epoch: 37   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:03:03,478-Speed 24327.93 samples/sec   Loss 1.2733   LearningRate 0.0000   Epoch: 37   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:03:13,692-Speed 24064.63 samples/sec   Loss 1.2615   LearningRate 0.0000   Epoch: 37   Global Step: 65350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:03:23,892-Speed 24096.19 samples/sec   Loss 1.2602   LearningRate 0.0000   Epoch: 37   Global Step: 65360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:03:34,062-Speed 24170.79 samples/sec   Loss 1.2748   LearningRate 0.0000   Epoch: 37   Global Step: 65370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:03:44,140-Speed 24388.88 samples/sec   Loss 1.2721   LearningRate 0.0000   Epoch: 37   Global Step: 65380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:03:54,208-Speed 24412.07 samples/sec   Loss 1.2670   LearningRate 0.0000   Epoch: 37   Global Step: 65390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:04:04,300-Speed 24354.61 samples/sec   Loss 1.2563   LearningRate 0.0000   Epoch: 37   Global Step: 65400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:04:14,343-Speed 24472.73 samples/sec   Loss 1.2603   LearningRate 0.0000   Epoch: 37   Global Step: 65410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:04:24,489-Speed 24225.27 samples/sec   Loss 1.2797   LearningRate 0.0000   Epoch: 37   Global Step: 65420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:04:34,660-Speed 24172.40 samples/sec   Loss 1.2663   LearningRate 0.0000   Epoch: 37   Global Step: 65430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:04:44,807-Speed 24222.75 samples/sec   Loss 1.2737   LearningRate 0.0000   Epoch: 37   Global Step: 65440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:04:55,034-Speed 24032.77 samples/sec   Loss 1.2778   LearningRate 0.0000   Epoch: 37   Global Step: 65450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:05:05,136-Speed 24328.95 samples/sec   Loss 1.2753   LearningRate 0.0000   Epoch: 37   Global Step: 65460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:05:15,273-Speed 24253.42 samples/sec   Loss 1.2725   LearningRate 0.0000   Epoch: 37   Global Step: 65470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:05:25,452-Speed 24151.91 samples/sec   Loss 1.2756   LearningRate 0.0000   Epoch: 37   Global Step: 65480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:05:35,547-Speed 24353.91 samples/sec   Loss 1.2801   LearningRate 0.0000   Epoch: 37   Global Step: 65490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:05:45,811-Speed 23945.90 samples/sec   Loss 1.2773   LearningRate 0.0000   Epoch: 37   Global Step: 65500   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:05:55,937-Speed 24273.42 samples/sec   Loss 1.2525   LearningRate 0.0000   Epoch: 37   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:06:06,093-Speed 24200.37 samples/sec   Loss 1.2795   LearningRate 0.0000   Epoch: 37   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:06:16,263-Speed 24168.64 samples/sec   Loss 1.2663   LearningRate 0.0000   Epoch: 37   Global Step: 65530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:06:26,356-Speed 24352.04 samples/sec   Loss 1.2537   LearningRate 0.0000   Epoch: 37   Global Step: 65540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:06:36,531-Speed 24157.14 samples/sec   Loss 1.2602   LearningRate 0.0000   Epoch: 37   Global Step: 65550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:06:46,597-Speed 24415.76 samples/sec   Loss 1.2647   LearningRate 0.0000   Epoch: 37   Global Step: 65560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:06:56,786-Speed 24122.57 samples/sec   Loss 1.2728   LearningRate 0.0000   Epoch: 37   Global Step: 65570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:07:06,888-Speed 24331.53 samples/sec   Loss 1.2706   LearningRate 0.0000   Epoch: 37   Global Step: 65580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:07:16,969-Speed 24382.69 samples/sec   Loss 1.2735   LearningRate 0.0000   Epoch: 37   Global Step: 65590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:07:27,057-Speed 24364.52 samples/sec   Loss 1.2705   LearningRate 0.0000   Epoch: 37   Global Step: 65600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:07:37,234-Speed 24151.66 samples/sec   Loss 1.2679   LearningRate 0.0000   Epoch: 37   Global Step: 65610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:07:47,310-Speed 24403.15 samples/sec   Loss 1.2702   LearningRate 0.0000   Epoch: 37   Global Step: 65620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:07:57,410-Speed 24336.77 samples/sec   Loss 1.2824   LearningRate 0.0000   Epoch: 37   Global Step: 65630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:08:07,521-Speed 24310.08 samples/sec   Loss 1.2691   LearningRate 0.0000   Epoch: 37   Global Step: 65640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:08:17,615-Speed 24356.12 samples/sec   Loss 1.2828   LearningRate 0.0000   Epoch: 37   Global Step: 65650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:08:27,777-Speed 24187.60 samples/sec   Loss 1.2753   LearningRate 0.0000   Epoch: 37   Global Step: 65660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:08:37,929-Speed 24210.70 samples/sec   Loss 1.2759   LearningRate 0.0000   Epoch: 37   Global Step: 65670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:09:37,579-Speed 4120.19 samples/sec   Loss 1.2674   LearningRate 0.0000   Epoch: 38   Global Step: 65680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:09:47,433-Speed 24943.76 samples/sec   Loss 1.2749   LearningRate 0.0000   Epoch: 38   Global Step: 65690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:09:57,313-Speed 24875.96 samples/sec   Loss 1.2799   LearningRate 0.0000   Epoch: 38   Global Step: 65700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:10:07,216-Speed 24822.92 samples/sec   Loss 1.2665   LearningRate 0.0000   Epoch: 38   Global Step: 65710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:10:17,173-Speed 24690.95 samples/sec   Loss 1.2523   LearningRate 0.0000   Epoch: 38   Global Step: 65720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:10:27,151-Speed 24633.16 samples/sec   Loss 1.2672   LearningRate 0.0000   Epoch: 38   Global Step: 65730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:10:37,138-Speed 24618.66 samples/sec   Loss 1.2695   LearningRate 0.0000   Epoch: 38   Global Step: 65740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:10:47,099-Speed 24675.63 samples/sec   Loss 1.2646   LearningRate 0.0000   Epoch: 38   Global Step: 65750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:10:57,025-Speed 24761.63 samples/sec   Loss 1.2685   LearningRate 0.0000   Epoch: 38   Global Step: 65760   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:11:07,107-Speed 24378.18 samples/sec   Loss 1.2656   LearningRate 0.0000   Epoch: 38   Global Step: 65770   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:11:17,076-Speed 24655.75 samples/sec   Loss 1.2578   LearningRate 0.0000   Epoch: 38   Global Step: 65780   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:11:27,093-Speed 24539.45 samples/sec   Loss 1.2676   LearningRate 0.0000   Epoch: 38   Global Step: 65790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:11:37,004-Speed 24799.65 samples/sec   Loss 1.2678   LearningRate 0.0000   Epoch: 38   Global Step: 65800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:11:46,964-Speed 24676.63 samples/sec   Loss 1.2720   LearningRate 0.0000   Epoch: 38   Global Step: 65810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:11:56,934-Speed 24651.91 samples/sec   Loss 1.2568   LearningRate 0.0000   Epoch: 38   Global Step: 65820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:12:06,977-Speed 24481.39 samples/sec   Loss 1.2624   LearningRate 0.0000   Epoch: 38   Global Step: 65830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:12:16,998-Speed 24527.50 samples/sec   Loss 1.2709   LearningRate 0.0000   Epoch: 38   Global Step: 65840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:12:26,957-Speed 24679.62 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 38   Global Step: 65850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:12:36,980-Speed 24532.02 samples/sec   Loss 1.2710   LearningRate 0.0000   Epoch: 38   Global Step: 65860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:12:47,003-Speed 24522.93 samples/sec   Loss 1.2651   LearningRate 0.0000   Epoch: 38   Global Step: 65870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:12:57,009-Speed 24567.37 samples/sec   Loss 1.2695   LearningRate 0.0000   Epoch: 38   Global Step: 65880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:13:07,070-Speed 24429.51 samples/sec   Loss 1.2672   LearningRate 0.0000   Epoch: 38   Global Step: 65890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:13:17,107-Speed 24490.09 samples/sec   Loss 1.2654   LearningRate 0.0000   Epoch: 38   Global Step: 65900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:13:27,077-Speed 24652.85 samples/sec   Loss 1.2700   LearningRate 0.0000   Epoch: 38   Global Step: 65910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:13:37,217-Speed 24238.61 samples/sec   Loss 1.2734   LearningRate 0.0000   Epoch: 38   Global Step: 65920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:13:47,216-Speed 24581.92 samples/sec   Loss 1.2711   LearningRate 0.0000   Epoch: 38   Global Step: 65930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:13:57,268-Speed 24450.94 samples/sec   Loss 1.2677   LearningRate 0.0000   Epoch: 38   Global Step: 65940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:14:07,167-Speed 24829.59 samples/sec   Loss 1.2627   LearningRate 0.0000   Epoch: 38   Global Step: 65950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:14:17,137-Speed 24652.95 samples/sec   Loss 1.2609   LearningRate 0.0000   Epoch: 38   Global Step: 65960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:14:27,051-Speed 24790.67 samples/sec   Loss 1.2615   LearningRate 0.0000   Epoch: 38   Global Step: 65970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:14:37,014-Speed 24670.55 samples/sec   Loss 1.2712   LearningRate 0.0000   Epoch: 38   Global Step: 65980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:14:46,962-Speed 24709.75 samples/sec   Loss 1.2751   LearningRate 0.0000   Epoch: 38   Global Step: 65990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:14:57,055-Speed 24358.23 samples/sec   Loss 1.2638   LearningRate 0.0000   Epoch: 38   Global Step: 66000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:15:07,056-Speed 24576.14 samples/sec   Loss 1.2697   LearningRate 0.0000   Epoch: 38   Global Step: 66010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:15:17,025-Speed 24655.41 samples/sec   Loss 1.2618   LearningRate 0.0000   Epoch: 38   Global Step: 66020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:15:27,000-Speed 24640.50 samples/sec   Loss 1.2730   LearningRate 0.0000   Epoch: 38   Global Step: 66030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:15:36,991-Speed 24601.73 samples/sec   Loss 1.2727   LearningRate 0.0000   Epoch: 38   Global Step: 66040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:15:46,863-Speed 24899.90 samples/sec   Loss 1.2661   LearningRate 0.0000   Epoch: 38   Global Step: 66050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:15:56,857-Speed 24593.29 samples/sec   Loss 1.2724   LearningRate 0.0000   Epoch: 38   Global Step: 66060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:16:06,831-Speed 24644.57 samples/sec   Loss 1.2655   LearningRate 0.0000   Epoch: 38   Global Step: 66070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:16:16,747-Speed 24793.78 samples/sec   Loss 1.2638   LearningRate 0.0000   Epoch: 38   Global Step: 66080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:16:26,658-Speed 24799.37 samples/sec   Loss 1.2661   LearningRate 0.0000   Epoch: 38   Global Step: 66090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:16:36,649-Speed 24610.81 samples/sec   Loss 1.2712   LearningRate 0.0000   Epoch: 38   Global Step: 66100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:16:46,661-Speed 24551.73 samples/sec   Loss 1.2663   LearningRate 0.0000   Epoch: 38   Global Step: 66110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:16:56,640-Speed 24630.54 samples/sec   Loss 1.2605   LearningRate 0.0000   Epoch: 38   Global Step: 66120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:17:06,559-Speed 24780.19 samples/sec   Loss 1.2709   LearningRate 0.0000   Epoch: 38   Global Step: 66130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:17:16,483-Speed 24767.44 samples/sec   Loss 1.2687   LearningRate 0.0000   Epoch: 38   Global Step: 66140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:17:26,417-Speed 24740.91 samples/sec   Loss 1.2720   LearningRate 0.0000   Epoch: 38   Global Step: 66150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:17:36,103-Speed 25377.71 samples/sec   Loss 1.2738   LearningRate 0.0000   Epoch: 38   Global Step: 66160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:17:45,957-Speed 24942.34 samples/sec   Loss 1.2613   LearningRate 0.0000   Epoch: 38   Global Step: 66170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:17:55,841-Speed 24867.46 samples/sec   Loss 1.2692   LearningRate 0.0000   Epoch: 38   Global Step: 66180   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:18:05,760-Speed 24782.42 samples/sec   Loss 1.2674   LearningRate 0.0000   Epoch: 38   Global Step: 66190   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:18:15,648-Speed 24857.73 samples/sec   Loss 1.2645   LearningRate 0.0000   Epoch: 38   Global Step: 66200   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:18:25,591-Speed 24718.29 samples/sec   Loss 1.2632   LearningRate 0.0000   Epoch: 38   Global Step: 66210   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:18:35,489-Speed 24832.69 samples/sec   Loss 1.2681   LearningRate 0.0000   Epoch: 38   Global Step: 66220   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:18:45,408-Speed 24781.79 samples/sec   Loss 1.2647   LearningRate 0.0000   Epoch: 38   Global Step: 66230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:18:55,304-Speed 24838.96 samples/sec   Loss 1.2595   LearningRate 0.0000   Epoch: 38   Global Step: 66240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:19:05,205-Speed 24824.40 samples/sec   Loss 1.2701   LearningRate 0.0000   Epoch: 38   Global Step: 66250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-26 17:19:15,065-Speed 24929.50 samples/sec   Loss 1.2657   LearningRate 0.0000   Epoch: 38   Global Step: 66260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:19:24,906-Speed 24973.72 samples/sec   Loss 1.2802   LearningRate 0.0000   Epoch: 38   Global Step: 66270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:19:34,897-Speed 24601.97 samples/sec   Loss 1.2747   LearningRate 0.0000   Epoch: 38   Global Step: 66280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:19:44,702-Speed 25070.64 samples/sec   Loss 1.2665   LearningRate 0.0000   Epoch: 38   Global Step: 66290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:19:54,517-Speed 25040.78 samples/sec   Loss 1.2737   LearningRate 0.0000   Epoch: 38   Global Step: 66300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:20:04,478-Speed 24675.58 samples/sec   Loss 1.2680   LearningRate 0.0000   Epoch: 38   Global Step: 66310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:20:14,223-Speed 25222.25 samples/sec   Loss 1.2584   LearningRate 0.0000   Epoch: 38   Global Step: 66320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:20:24,106-Speed 24869.05 samples/sec   Loss 1.2617   LearningRate 0.0000   Epoch: 38   Global Step: 66330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:20:33,881-Speed 25146.69 samples/sec   Loss 1.2660   LearningRate 0.0000   Epoch: 38   Global Step: 66340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:20:43,713-Speed 24996.87 samples/sec   Loss 1.2692   LearningRate 0.0000   Epoch: 38   Global Step: 66350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:20:53,517-Speed 25070.32 samples/sec   Loss 1.2661   LearningRate 0.0000   Epoch: 38   Global Step: 66360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:21:03,312-Speed 25093.51 samples/sec   Loss 1.2614   LearningRate 0.0000   Epoch: 38   Global Step: 66370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:21:13,278-Speed 24661.35 samples/sec   Loss 1.2722   LearningRate 0.0000   Epoch: 38   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:21:23,247-Speed 24654.69 samples/sec   Loss 1.2670   LearningRate 0.0000   Epoch: 38   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:21:33,156-Speed 24804.07 samples/sec   Loss 1.2705   LearningRate 0.0000   Epoch: 38   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:21:43,093-Speed 24736.72 samples/sec   Loss 1.2682   LearningRate 0.0000   Epoch: 38   Global Step: 66410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:21:53,057-Speed 24665.28 samples/sec   Loss 1.2644   LearningRate 0.0000   Epoch: 38   Global Step: 66420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:22:02,864-Speed 25063.72 samples/sec   Loss 1.2631   LearningRate 0.0000   Epoch: 38   Global Step: 66430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:22:12,759-Speed 24839.57 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 38   Global Step: 66440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:22:22,645-Speed 24862.67 samples/sec   Loss 1.2672   LearningRate 0.0000   Epoch: 38   Global Step: 66450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:22:32,538-Speed 24844.03 samples/sec   Loss 1.2648   LearningRate 0.0000   Epoch: 38   Global Step: 66460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:22:42,592-Speed 24447.23 samples/sec   Loss 1.2701   LearningRate 0.0000   Epoch: 38   Global Step: 66470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:22:52,454-Speed 24921.96 samples/sec   Loss 1.2621   LearningRate 0.0000   Epoch: 38   Global Step: 66480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:23:02,269-Speed 25043.96 samples/sec   Loss 1.2640   LearningRate 0.0000   Epoch: 38   Global Step: 66490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:23:12,044-Speed 25144.20 samples/sec   Loss 1.2653   LearningRate 0.0000   Epoch: 38   Global Step: 66500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:23:22,001-Speed 24683.99 samples/sec   Loss 1.2601   LearningRate 0.0000   Epoch: 38   Global Step: 66510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:23:31,976-Speed 24641.97 samples/sec   Loss 1.2668   LearningRate 0.0000   Epoch: 38   Global Step: 66520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:23:41,817-Speed 24976.38 samples/sec   Loss 1.2530   LearningRate 0.0000   Epoch: 38   Global Step: 66530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:23:51,662-Speed 24965.36 samples/sec   Loss 1.2673   LearningRate 0.0000   Epoch: 38   Global Step: 66540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:24:01,468-Speed 25063.10 samples/sec   Loss 1.2514   LearningRate 0.0000   Epoch: 38   Global Step: 66550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:24:11,301-Speed 24995.66 samples/sec   Loss 1.2578   LearningRate 0.0000   Epoch: 38   Global Step: 66560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:24:21,144-Speed 24972.57 samples/sec   Loss 1.2707   LearningRate 0.0000   Epoch: 38   Global Step: 66570   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:24:31,262-Speed 24292.05 samples/sec   Loss 1.2585   LearningRate 0.0000   Epoch: 38   Global Step: 66580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:24:41,069-Speed 25060.43 samples/sec   Loss 1.2666   LearningRate 0.0000   Epoch: 38   Global Step: 66590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:24:50,854-Speed 25119.21 samples/sec   Loss 1.2605   LearningRate 0.0000   Epoch: 38   Global Step: 66600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:25:00,760-Speed 24819.25 samples/sec   Loss 1.2655   LearningRate 0.0000   Epoch: 38   Global Step: 66610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:25:10,639-Speed 24878.58 samples/sec   Loss 1.2592   LearningRate 0.0000   Epoch: 38   Global Step: 66620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:25:20,465-Speed 25013.61 samples/sec   Loss 1.2651   LearningRate 0.0000   Epoch: 38   Global Step: 66630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:25:30,283-Speed 25034.73 samples/sec   Loss 1.2664   LearningRate 0.0000   Epoch: 38   Global Step: 66640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:25:40,062-Speed 25134.98 samples/sec   Loss 1.2618   LearningRate 0.0000   Epoch: 38   Global Step: 66650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:25:49,943-Speed 24873.61 samples/sec   Loss 1.2560   LearningRate 0.0000   Epoch: 38   Global Step: 66660   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-26 17:25:59,735-Speed 25101.76 samples/sec   Loss 1.2637   LearningRate 0.0000   Epoch: 38   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:26:09,614-Speed 24880.33 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 38   Global Step: 66680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:26:19,406-Speed 25101.21 samples/sec   Loss 1.2646   LearningRate 0.0000   Epoch: 38   Global Step: 66690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:26:29,292-Speed 24863.29 samples/sec   Loss 1.2635   LearningRate 0.0000   Epoch: 38   Global Step: 66700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:26:39,135-Speed 24968.67 samples/sec   Loss 1.2578   LearningRate 0.0000   Epoch: 38   Global Step: 66710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:26:48,949-Speed 25044.69 samples/sec   Loss 1.2635   LearningRate 0.0000   Epoch: 38   Global Step: 66720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:26:58,751-Speed 25077.92 samples/sec   Loss 1.2592   LearningRate 0.0000   Epoch: 38   Global Step: 66730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:27:08,501-Speed 25207.75 samples/sec   Loss 1.2624   LearningRate 0.0000   Epoch: 38   Global Step: 66740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:27:18,294-Speed 25098.61 samples/sec   Loss 1.2651   LearningRate 0.0000   Epoch: 38   Global Step: 66750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:27:28,185-Speed 24851.93 samples/sec   Loss 1.2651   LearningRate 0.0000   Epoch: 38   Global Step: 66760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:27:38,250-Speed 24419.52 samples/sec   Loss 1.2660   LearningRate 0.0000   Epoch: 38   Global Step: 66770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:27:48,104-Speed 24944.38 samples/sec   Loss 1.2636   LearningRate 0.0000   Epoch: 38   Global Step: 66780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:27:57,990-Speed 24860.29 samples/sec   Loss 1.2533   LearningRate 0.0000   Epoch: 38   Global Step: 66790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:28:07,678-Speed 25370.67 samples/sec   Loss 1.2553   LearningRate 0.0000   Epoch: 38   Global Step: 66800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:28:17,391-Speed 25303.50 samples/sec   Loss 1.2689   LearningRate 0.0000   Epoch: 38   Global Step: 66810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:28:27,202-Speed 25053.99 samples/sec   Loss 1.2641   LearningRate 0.0000   Epoch: 38   Global Step: 66820   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:28:37,038-Speed 24989.24 samples/sec   Loss 1.2650   LearningRate 0.0000   Epoch: 38   Global Step: 66830   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:28:46,759-Speed 25283.97 samples/sec   Loss 1.2609   LearningRate 0.0000   Epoch: 38   Global Step: 66840   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:28:56,516-Speed 25191.27 samples/sec   Loss 1.2523   LearningRate 0.0000   Epoch: 38   Global Step: 66850   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:29:06,355-Speed 24981.42 samples/sec   Loss 1.2666   LearningRate 0.0000   Epoch: 38   Global Step: 66860   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:29:16,213-Speed 24935.28 samples/sec   Loss 1.2686   LearningRate 0.0000   Epoch: 38   Global Step: 66870   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:29:26,095-Speed 24890.64 samples/sec   Loss 1.2588   LearningRate 0.0000   Epoch: 38   Global Step: 66880   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:29:35,855-Speed 25185.01 samples/sec   Loss 1.2546   LearningRate 0.0000   Epoch: 38   Global Step: 66890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:29:45,639-Speed 25122.92 samples/sec   Loss 1.2608   LearningRate 0.0000   Epoch: 38   Global Step: 66900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:29:55,454-Speed 25042.64 samples/sec   Loss 1.2675   LearningRate 0.0000   Epoch: 38   Global Step: 66910   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:30:05,304-Speed 24960.36 samples/sec   Loss 1.2459   LearningRate 0.0000   Epoch: 38   Global Step: 66920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:30:15,077-Speed 25149.67 samples/sec   Loss 1.2709   LearningRate 0.0000   Epoch: 38   Global Step: 66930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:30:24,912-Speed 24999.51 samples/sec   Loss 1.2688   LearningRate 0.0000   Epoch: 38   Global Step: 66940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:30:34,778-Speed 24911.89 samples/sec   Loss 1.2690   LearningRate 0.0000   Epoch: 38   Global Step: 66950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:30:44,511-Speed 25255.24 samples/sec   Loss 1.2576   LearningRate 0.0000   Epoch: 38   Global Step: 66960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:30:54,346-Speed 24991.29 samples/sec   Loss 1.2684   LearningRate 0.0000   Epoch: 38   Global Step: 66970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:31:04,160-Speed 25045.45 samples/sec   Loss 1.2776   LearningRate 0.0000   Epoch: 38   Global Step: 66980   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:31:13,976-Speed 25038.54 samples/sec   Loss 1.2652   LearningRate 0.0000   Epoch: 38   Global Step: 66990   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:31:23,767-Speed 25105.28 samples/sec   Loss 1.2565   LearningRate 0.0000   Epoch: 38   Global Step: 67000   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:31:33,477-Speed 25313.67 samples/sec   Loss 1.2709   LearningRate 0.0000   Epoch: 38   Global Step: 67010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:31:43,251-Speed 25146.62 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 38   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:31:53,019-Speed 25163.19 samples/sec   Loss 1.2593   LearningRate 0.0000   Epoch: 38   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:32:02,766-Speed 25219.46 samples/sec   Loss 1.2583   LearningRate 0.0000   Epoch: 38   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:32:12,471-Speed 25331.74 samples/sec   Loss 1.2613   LearningRate 0.0000   Epoch: 38   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:32:22,300-Speed 25005.52 samples/sec   Loss 1.2596   LearningRate 0.0000   Epoch: 38   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:32:32,001-Speed 25338.16 samples/sec   Loss 1.2668   LearningRate 0.0000   Epoch: 38   Global Step: 67070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:32:41,791-Speed 25106.29 samples/sec   Loss 1.2613   LearningRate 0.0000   Epoch: 38   Global Step: 67080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:32:51,622-Speed 25003.47 samples/sec   Loss 1.2606   LearningRate 0.0000   Epoch: 38   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:01,335-Speed 25304.83 samples/sec   Loss 1.2540   LearningRate 0.0000   Epoch: 38   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:11,074-Speed 25238.75 samples/sec   Loss 1.2552   LearningRate 0.0000   Epoch: 38   Global Step: 67110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:20,869-Speed 25095.24 samples/sec   Loss 1.2628   LearningRate 0.0000   Epoch: 38   Global Step: 67120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:30,611-Speed 25229.15 samples/sec   Loss 1.2625   LearningRate 0.0000   Epoch: 38   Global Step: 67130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:40,322-Speed 25311.67 samples/sec   Loss 1.2674   LearningRate 0.0000   Epoch: 38   Global Step: 67140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:50,135-Speed 25048.28 samples/sec   Loss 1.2667   LearningRate 0.0000   Epoch: 38   Global Step: 67150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:33:59,898-Speed 25176.21 samples/sec   Loss 1.2555   LearningRate 0.0000   Epoch: 38   Global Step: 67160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:34:09,632-Speed 25249.19 samples/sec   Loss 1.2588   LearningRate 0.0000   Epoch: 38   Global Step: 67170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:34:19,456-Speed 25021.52 samples/sec   Loss 1.2609   LearningRate 0.0000   Epoch: 38   Global Step: 67180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:34:29,278-Speed 25023.44 samples/sec   Loss 1.2684   LearningRate 0.0000   Epoch: 38   Global Step: 67190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:34:39,135-Speed 24937.09 samples/sec   Loss 1.2688   LearningRate 0.0000   Epoch: 38   Global Step: 67200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:34:48,990-Speed 24941.17 samples/sec   Loss 1.2585   LearningRate 0.0000   Epoch: 38   Global Step: 67210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:34:58,892-Speed 24827.93 samples/sec   Loss 1.2594   LearningRate 0.0000   Epoch: 38   Global Step: 67220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:35:08,790-Speed 24832.53 samples/sec   Loss 1.2691   LearningRate 0.0000   Epoch: 38   Global Step: 67230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:35:18,611-Speed 25028.81 samples/sec   Loss 1.2560   LearningRate 0.0000   Epoch: 38   Global Step: 67240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:35:28,450-Speed 24980.93 samples/sec   Loss 1.2659   LearningRate 0.0000   Epoch: 38   Global Step: 67250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:35:38,203-Speed 25205.29 samples/sec   Loss 1.2528   LearningRate 0.0000   Epoch: 38   Global Step: 67260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:35:47,984-Speed 25134.13 samples/sec   Loss 1.2614   LearningRate 0.0000   Epoch: 38   Global Step: 67270   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:35:57,777-Speed 25097.44 samples/sec   Loss 1.2654   LearningRate 0.0000   Epoch: 38   Global Step: 67280   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:36:07,577-Speed 25082.78 samples/sec   Loss 1.2696   LearningRate 0.0000   Epoch: 38   Global Step: 67290   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:36:17,372-Speed 25093.79 samples/sec   Loss 1.2595   LearningRate 0.0000   Epoch: 38   Global Step: 67300   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:36:27,105-Speed 25254.63 samples/sec   Loss 1.2573   LearningRate 0.0000   Epoch: 38   Global Step: 67310   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:36:36,902-Speed 25088.84 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 38   Global Step: 67320   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:36:46,692-Speed 25108.94 samples/sec   Loss 1.2682   LearningRate 0.0000   Epoch: 38   Global Step: 67330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:36:56,594-Speed 24822.68 samples/sec   Loss 1.2579   LearningRate 0.0000   Epoch: 38   Global Step: 67340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-26 17:37:06,391-Speed 25090.15 samples/sec   Loss 1.2672   LearningRate 0.0000   Epoch: 38   Global Step: 67350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:37:16,299-Speed 24805.52 samples/sec   Loss 1.2602   LearningRate 0.0000   Epoch: 38   Global Step: 67360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:37:26,069-Speed 25158.87 samples/sec   Loss 1.2684   LearningRate 0.0000   Epoch: 38   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:37:35,887-Speed 25034.91 samples/sec   Loss 1.2662   LearningRate 0.0000   Epoch: 38   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:37:45,659-Speed 25162.35 samples/sec   Loss 1.2652   LearningRate 0.0000   Epoch: 38   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:37:55,456-Speed 25089.20 samples/sec   Loss 1.2706   LearningRate 0.0000   Epoch: 38   Global Step: 67400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:38:55,153-Speed 4116.84 samples/sec   Loss 1.2665   LearningRate 0.0000   Epoch: 39   Global Step: 67410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:39:04,993-Speed 24986.97 samples/sec   Loss 1.2640   LearningRate 0.0000   Epoch: 39   Global Step: 67420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:39:14,789-Speed 25092.19 samples/sec   Loss 1.2711   LearningRate 0.0000   Epoch: 39   Global Step: 67430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:39:24,539-Speed 25210.30 samples/sec   Loss 1.2616   LearningRate 0.0000   Epoch: 39   Global Step: 67440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:39:34,286-Speed 25218.18 samples/sec   Loss 1.2611   LearningRate 0.0000   Epoch: 39   Global Step: 67450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:39:44,102-Speed 25038.26 samples/sec   Loss 1.2645   LearningRate 0.0000   Epoch: 39   Global Step: 67460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:39:53,913-Speed 25052.23 samples/sec   Loss 1.2641   LearningRate 0.0000   Epoch: 39   Global Step: 67470   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-26 17:40:03,743-Speed 25003.92 samples/sec   Loss 1.2525   LearningRate 0.0000   Epoch: 39   Global Step: 67480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:40:13,575-Speed 25007.45 samples/sec   Loss 1.2592   LearningRate 0.0000   Epoch: 39   Global Step: 67490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:40:23,352-Speed 25140.08 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 39   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:40:33,181-Speed 25007.93 samples/sec   Loss 1.2647   LearningRate 0.0000   Epoch: 39   Global Step: 67510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:40:42,965-Speed 25119.76 samples/sec   Loss 1.2552   LearningRate 0.0000   Epoch: 39   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:40:52,813-Speed 24962.22 samples/sec   Loss 1.2559   LearningRate 0.0000   Epoch: 39   Global Step: 67530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:41:02,679-Speed 24913.79 samples/sec   Loss 1.2513   LearningRate 0.0000   Epoch: 39   Global Step: 67540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:41:12,549-Speed 24908.10 samples/sec   Loss 1.2560   LearningRate 0.0000   Epoch: 39   Global Step: 67550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:41:22,410-Speed 24926.07 samples/sec   Loss 1.2566   LearningRate 0.0000   Epoch: 39   Global Step: 67560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:41:32,298-Speed 24858.11 samples/sec   Loss 1.2532   LearningRate 0.0000   Epoch: 39   Global Step: 67570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:41:42,005-Speed 25321.86 samples/sec   Loss 1.2598   LearningRate 0.0000   Epoch: 39   Global Step: 67580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:41:51,722-Speed 25295.90 samples/sec   Loss 1.2586   LearningRate 0.0000   Epoch: 39   Global Step: 67590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:42:01,414-Speed 25361.20 samples/sec   Loss 1.2608   LearningRate 0.0000   Epoch: 39   Global Step: 67600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:42:11,151-Speed 25242.92 samples/sec   Loss 1.2615   LearningRate 0.0000   Epoch: 39   Global Step: 67610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:42:20,858-Speed 25320.34 samples/sec   Loss 1.2597   LearningRate 0.0000   Epoch: 39   Global Step: 67620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:42:30,770-Speed 24798.10 samples/sec   Loss 1.2612   LearningRate 0.0000   Epoch: 39   Global Step: 67630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:42:40,560-Speed 25107.85 samples/sec   Loss 1.2662   LearningRate 0.0000   Epoch: 39   Global Step: 67640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:42:50,359-Speed 25081.79 samples/sec   Loss 1.2561   LearningRate 0.0000   Epoch: 39   Global Step: 67650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:00,070-Speed 25312.14 samples/sec   Loss 1.2667   LearningRate 0.0000   Epoch: 39   Global Step: 67660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:09,932-Speed 24921.84 samples/sec   Loss 1.2671   LearningRate 0.0000   Epoch: 39   Global Step: 67670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:19,774-Speed 24974.41 samples/sec   Loss 1.2625   LearningRate 0.0000   Epoch: 39   Global Step: 67680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:29,482-Speed 25317.43 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 39   Global Step: 67690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:39,315-Speed 24995.80 samples/sec   Loss 1.2631   LearningRate 0.0000   Epoch: 39   Global Step: 67700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:49,109-Speed 25097.75 samples/sec   Loss 1.2684   LearningRate 0.0000   Epoch: 39   Global Step: 67710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:43:59,011-Speed 24831.08 samples/sec   Loss 1.2686   LearningRate 0.0000   Epoch: 39   Global Step: 67720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:44:08,729-Speed 25294.58 samples/sec   Loss 1.2530   LearningRate 0.0000   Epoch: 39   Global Step: 67730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:44:18,444-Speed 25302.64 samples/sec   Loss 1.2694   LearningRate 0.0000   Epoch: 39   Global Step: 67740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:44:28,252-Speed 25060.39 samples/sec   Loss 1.2578   LearningRate 0.0000   Epoch: 39   Global Step: 67750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:44:38,045-Speed 25098.63 samples/sec   Loss 1.2603   LearningRate 0.0000   Epoch: 39   Global Step: 67760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:44:47,903-Speed 24932.58 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 39   Global Step: 67770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:44:57,797-Speed 24843.21 samples/sec   Loss 1.2660   LearningRate 0.0000   Epoch: 39   Global Step: 67780   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:45:07,565-Speed 25163.05 samples/sec   Loss 1.2631   LearningRate 0.0000   Epoch: 39   Global Step: 67790   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:45:17,361-Speed 25090.48 samples/sec   Loss 1.2493   LearningRate 0.0000   Epoch: 39   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:45:27,205-Speed 24973.01 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 39   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:45:36,993-Speed 25111.76 samples/sec   Loss 1.2473   LearningRate 0.0000   Epoch: 39   Global Step: 67820   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:45:46,848-Speed 24941.35 samples/sec   Loss 1.2674   LearningRate 0.0000   Epoch: 39   Global Step: 67830   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:45:56,612-Speed 25170.93 samples/sec   Loss 1.2727   LearningRate 0.0000   Epoch: 39   Global Step: 67840   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:46:06,307-Speed 25352.85 samples/sec   Loss 1.2510   LearningRate 0.0000   Epoch: 39   Global Step: 67850   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:46:16,111-Speed 25071.71 samples/sec   Loss 1.2671   LearningRate 0.0000   Epoch: 39   Global Step: 67860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:46:25,953-Speed 24973.25 samples/sec   Loss 1.2593   LearningRate 0.0000   Epoch: 39   Global Step: 67870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:46:35,745-Speed 25109.92 samples/sec   Loss 1.2608   LearningRate 0.0000   Epoch: 39   Global Step: 67880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:46:45,546-Speed 25078.90 samples/sec   Loss 1.2746   LearningRate 0.0000   Epoch: 39   Global Step: 67890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:46:55,241-Speed 25351.16 samples/sec   Loss 1.2613   LearningRate 0.0000   Epoch: 39   Global Step: 67900   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:47:05,028-Speed 25114.68 samples/sec   Loss 1.2623   LearningRate 0.0000   Epoch: 39   Global Step: 67910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:47:15,143-Speed 24301.57 samples/sec   Loss 1.2562   LearningRate 0.0000   Epoch: 39   Global Step: 67920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:47:25,237-Speed 24350.01 samples/sec   Loss 1.2584   LearningRate 0.0000   Epoch: 39   Global Step: 67930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:47:35,359-Speed 24284.13 samples/sec   Loss 1.2644   LearningRate 0.0000   Epoch: 39   Global Step: 67940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:47:45,429-Speed 24409.34 samples/sec   Loss 1.2617   LearningRate 0.0000   Epoch: 39   Global Step: 67950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:47:55,479-Speed 24455.61 samples/sec   Loss 1.2568   LearningRate 0.0000   Epoch: 39   Global Step: 67960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:48:05,538-Speed 24434.91 samples/sec   Loss 1.2635   LearningRate 0.0000   Epoch: 39   Global Step: 67970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:48:15,623-Speed 24373.32 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 39   Global Step: 67980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:48:25,717-Speed 24349.37 samples/sec   Loss 1.2618   LearningRate 0.0000   Epoch: 39   Global Step: 67990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:48:35,778-Speed 24430.42 samples/sec   Loss 1.2611   LearningRate 0.0000   Epoch: 39   Global Step: 68000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:48:45,851-Speed 24402.19 samples/sec   Loss 1.2609   LearningRate 0.0000   Epoch: 39   Global Step: 68010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:48:55,912-Speed 24429.96 samples/sec   Loss 1.2608   LearningRate 0.0000   Epoch: 39   Global Step: 68020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:49:05,955-Speed 24474.28 samples/sec   Loss 1.2559   LearningRate 0.0000   Epoch: 39   Global Step: 68030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:49:16,043-Speed 24370.28 samples/sec   Loss 1.2550   LearningRate 0.0000   Epoch: 39   Global Step: 68040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:49:26,178-Speed 24252.18 samples/sec   Loss 1.2637   LearningRate 0.0000   Epoch: 39   Global Step: 68050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:49:36,251-Speed 24399.95 samples/sec   Loss 1.2626   LearningRate 0.0000   Epoch: 39   Global Step: 68060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:49:46,431-Speed 24145.25 samples/sec   Loss 1.2574   LearningRate 0.0000   Epoch: 39   Global Step: 68070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:49:56,493-Speed 24426.47 samples/sec   Loss 1.2574   LearningRate 0.0000   Epoch: 39   Global Step: 68080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:50:06,593-Speed 24336.47 samples/sec   Loss 1.2679   LearningRate 0.0000   Epoch: 39   Global Step: 68090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:50:16,684-Speed 24357.38 samples/sec   Loss 1.2598   LearningRate 0.0000   Epoch: 39   Global Step: 68100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:50:26,778-Speed 24357.96 samples/sec   Loss 1.2605   LearningRate 0.0000   Epoch: 39   Global Step: 68110   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:50:36,892-Speed 24307.90 samples/sec   Loss 1.2672   LearningRate 0.0000   Epoch: 39   Global Step: 68120   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:50:47,041-Speed 24217.74 samples/sec   Loss 1.2625   LearningRate 0.0000   Epoch: 39   Global Step: 68130   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:50:57,225-Speed 24136.81 samples/sec   Loss 1.2596   LearningRate 0.0000   Epoch: 39   Global Step: 68140   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:51:07,281-Speed 24443.33 samples/sec   Loss 1.2531   LearningRate 0.0000   Epoch: 39   Global Step: 68150   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:51:17,372-Speed 24359.11 samples/sec   Loss 1.2753   LearningRate 0.0000   Epoch: 39   Global Step: 68160   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:51:27,509-Speed 24247.91 samples/sec   Loss 1.2558   LearningRate 0.0000   Epoch: 39   Global Step: 68170   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:51:37,594-Speed 24371.88 samples/sec   Loss 1.2506   LearningRate 0.0000   Epoch: 39   Global Step: 68180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:51:47,748-Speed 24207.68 samples/sec   Loss 1.2659   LearningRate 0.0000   Epoch: 39   Global Step: 68190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:51:57,946-Speed 24102.46 samples/sec   Loss 1.2733   LearningRate 0.0000   Epoch: 39   Global Step: 68200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:52:08,074-Speed 24268.56 samples/sec   Loss 1.2551   LearningRate 0.0000   Epoch: 39   Global Step: 68210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:52:18,178-Speed 24326.44 samples/sec   Loss 1.2537   LearningRate 0.0000   Epoch: 39   Global Step: 68220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:52:28,319-Speed 24238.41 samples/sec   Loss 1.2656   LearningRate 0.0000   Epoch: 39   Global Step: 68230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:52:38,484-Speed 24179.98 samples/sec   Loss 1.2524   LearningRate 0.0000   Epoch: 39   Global Step: 68240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:52:48,592-Speed 24324.26 samples/sec   Loss 1.2617   LearningRate 0.0000   Epoch: 39   Global Step: 68250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:52:58,710-Speed 24290.14 samples/sec   Loss 1.2502   LearningRate 0.0000   Epoch: 39   Global Step: 68260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:53:08,765-Speed 24446.43 samples/sec   Loss 1.2542   LearningRate 0.0000   Epoch: 39   Global Step: 68270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:53:18,892-Speed 24278.85 samples/sec   Loss 1.2606   LearningRate 0.0000   Epoch: 39   Global Step: 68280   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:53:28,926-Speed 24495.75 samples/sec   Loss 1.2488   LearningRate 0.0000   Epoch: 39   Global Step: 68290   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:53:39,003-Speed 24392.08 samples/sec   Loss 1.2537   LearningRate 0.0000   Epoch: 39   Global Step: 68300   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:53:49,105-Speed 24330.59 samples/sec   Loss 1.2560   LearningRate 0.0000   Epoch: 39   Global Step: 68310   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:53:59,135-Speed 24505.50 samples/sec   Loss 1.2611   LearningRate 0.0000   Epoch: 39   Global Step: 68320   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:54:09,241-Speed 24320.04 samples/sec   Loss 1.2468   LearningRate 0.0000   Epoch: 39   Global Step: 68330   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:54:19,406-Speed 24181.56 samples/sec   Loss 1.2601   LearningRate 0.0000   Epoch: 39   Global Step: 68340   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:54:29,583-Speed 24148.80 samples/sec   Loss 1.2595   LearningRate 0.0000   Epoch: 39   Global Step: 68350   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:54:39,733-Speed 24218.21 samples/sec   Loss 1.2567   LearningRate 0.0000   Epoch: 39   Global Step: 68360   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:54:49,908-Speed 24157.19 samples/sec   Loss 1.2555   LearningRate 0.0000   Epoch: 39   Global Step: 68370   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:54:59,724-Speed 25040.05 samples/sec   Loss 1.2692   LearningRate 0.0000   Epoch: 39   Global Step: 68380   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 17:55:09,530-Speed 25067.75 samples/sec   Loss 1.2510   LearningRate 0.0000   Epoch: 39   Global Step: 68390   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:55:19,412-Speed 24875.30 samples/sec   Loss 1.2581   LearningRate 0.0000   Epoch: 39   Global Step: 68400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:55:29,176-Speed 25173.41 samples/sec   Loss 1.2603   LearningRate 0.0000   Epoch: 39   Global Step: 68410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:55:38,955-Speed 25134.69 samples/sec   Loss 1.2678   LearningRate 0.0000   Epoch: 39   Global Step: 68420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:55:48,788-Speed 24998.81 samples/sec   Loss 1.2622   LearningRate 0.0000   Epoch: 39   Global Step: 68430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:55:58,595-Speed 25063.46 samples/sec   Loss 1.2598   LearningRate 0.0000   Epoch: 39   Global Step: 68440   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:56:08,446-Speed 24953.19 samples/sec   Loss 1.2577   LearningRate 0.0000   Epoch: 39   Global Step: 68450   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:56:18,247-Speed 25078.83 samples/sec   Loss 1.2552   LearningRate 0.0000   Epoch: 39   Global Step: 68460   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:56:28,081-Speed 24996.67 samples/sec   Loss 1.2616   LearningRate 0.0000   Epoch: 39   Global Step: 68470   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:56:37,861-Speed 25132.88 samples/sec   Loss 1.2589   LearningRate 0.0000   Epoch: 39   Global Step: 68480   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:56:47,711-Speed 24951.09 samples/sec   Loss 1.2567   LearningRate 0.0000   Epoch: 39   Global Step: 68490   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:56:57,519-Speed 25059.28 samples/sec   Loss 1.2639   LearningRate 0.0000   Epoch: 39   Global Step: 68500   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:57:07,319-Speed 25082.45 samples/sec   Loss 1.2525   LearningRate 0.0000   Epoch: 39   Global Step: 68510   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:57:17,132-Speed 25047.20 samples/sec   Loss 1.2639   LearningRate 0.0000   Epoch: 39   Global Step: 68520   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:57:26,889-Speed 25192.34 samples/sec   Loss 1.2584   LearningRate 0.0000   Epoch: 39   Global Step: 68530   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:57:36,697-Speed 25059.92 samples/sec   Loss 1.2652   LearningRate 0.0000   Epoch: 39   Global Step: 68540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:57:46,480-Speed 25123.22 samples/sec   Loss 1.2649   LearningRate 0.0000   Epoch: 39   Global Step: 68550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:57:56,263-Speed 25124.18 samples/sec   Loss 1.2542   LearningRate 0.0000   Epoch: 39   Global Step: 68560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:58:05,976-Speed 25303.32 samples/sec   Loss 1.2598   LearningRate 0.0000   Epoch: 39   Global Step: 68570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:58:15,755-Speed 25143.39 samples/sec   Loss 1.2559   LearningRate 0.0000   Epoch: 39   Global Step: 68580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:58:25,528-Speed 25148.83 samples/sec   Loss 1.2643   LearningRate 0.0000   Epoch: 39   Global Step: 68590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:58:35,487-Speed 24680.11 samples/sec   Loss 1.2648   LearningRate 0.0000   Epoch: 39   Global Step: 68600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:58:45,206-Speed 25294.06 samples/sec   Loss 1.2595   LearningRate 0.0000   Epoch: 39   Global Step: 68610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:58:54,975-Speed 25160.11 samples/sec   Loss 1.2565   LearningRate 0.0000   Epoch: 39   Global Step: 68620   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:59:04,815-Speed 24978.83 samples/sec   Loss 1.2600   LearningRate 0.0000   Epoch: 39   Global Step: 68630   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:59:14,565-Speed 25208.44 samples/sec   Loss 1.2478   LearningRate 0.0000   Epoch: 39   Global Step: 68640   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:59:24,423-Speed 24934.02 samples/sec   Loss 1.2556   LearningRate 0.0000   Epoch: 39   Global Step: 68650   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:59:34,153-Speed 25262.66 samples/sec   Loss 1.2509   LearningRate 0.0000   Epoch: 39   Global Step: 68660   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:59:43,997-Speed 24970.57 samples/sec   Loss 1.2590   LearningRate 0.0000   Epoch: 39   Global Step: 68670   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 17:59:53,738-Speed 25231.60 samples/sec   Loss 1.2593   LearningRate 0.0000   Epoch: 39   Global Step: 68680   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:00:03,397-Speed 25452.72 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 39   Global Step: 68690   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:00:13,194-Speed 25087.11 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 39   Global Step: 68700   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:00:23,106-Speed 24799.32 samples/sec   Loss 1.2530   LearningRate 0.0000   Epoch: 39   Global Step: 68710   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:00:32,880-Speed 25146.10 samples/sec   Loss 1.2580   LearningRate 0.0000   Epoch: 39   Global Step: 68720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:00:42,630-Speed 25209.55 samples/sec   Loss 1.2659   LearningRate 0.0000   Epoch: 39   Global Step: 68730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:00:52,342-Speed 25308.97 samples/sec   Loss 1.2611   LearningRate 0.0000   Epoch: 39   Global Step: 68740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:01:02,083-Speed 25231.30 samples/sec   Loss 1.2593   LearningRate 0.0000   Epoch: 39   Global Step: 68750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:01:11,901-Speed 25037.00 samples/sec   Loss 1.2715   LearningRate 0.0000   Epoch: 39   Global Step: 68760   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:01:21,626-Speed 25273.34 samples/sec   Loss 1.2572   LearningRate 0.0000   Epoch: 39   Global Step: 68770   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:01:31,402-Speed 25141.34 samples/sec   Loss 1.2513   LearningRate 0.0000   Epoch: 39   Global Step: 68780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:01:41,187-Speed 25120.28 samples/sec   Loss 1.2566   LearningRate 0.0000   Epoch: 39   Global Step: 68790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:01:51,061-Speed 24893.45 samples/sec   Loss 1.2603   LearningRate 0.0000   Epoch: 39   Global Step: 68800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:00,911-Speed 24952.86 samples/sec   Loss 1.2563   LearningRate 0.0000   Epoch: 39   Global Step: 68810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:10,648-Speed 25254.02 samples/sec   Loss 1.2593   LearningRate 0.0000   Epoch: 39   Global Step: 68820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:20,532-Speed 24866.89 samples/sec   Loss 1.2508   LearningRate 0.0000   Epoch: 39   Global Step: 68830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:30,303-Speed 25157.68 samples/sec   Loss 1.2590   LearningRate 0.0000   Epoch: 39   Global Step: 68840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:40,070-Speed 25165.52 samples/sec   Loss 1.2480   LearningRate 0.0000   Epoch: 39   Global Step: 68850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:49,843-Speed 25149.69 samples/sec   Loss 1.2672   LearningRate 0.0000   Epoch: 39   Global Step: 68860   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:02:59,725-Speed 24872.56 samples/sec   Loss 1.2574   LearningRate 0.0000   Epoch: 39   Global Step: 68870   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-26 18:03:09,586-Speed 24926.23 samples/sec   Loss 1.2653   LearningRate 0.0000   Epoch: 39   Global Step: 68880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:03:19,411-Speed 25017.00 samples/sec   Loss 1.2594   LearningRate 0.0000   Epoch: 39   Global Step: 68890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:03:29,194-Speed 25123.22 samples/sec   Loss 1.2592   LearningRate 0.0000   Epoch: 39   Global Step: 68900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:03:38,983-Speed 25109.10 samples/sec   Loss 1.2541   LearningRate 0.0000   Epoch: 39   Global Step: 68910   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:03:48,696-Speed 25312.85 samples/sec   Loss 1.2617   LearningRate 0.0000   Epoch: 39   Global Step: 68920   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:03:58,544-Speed 24958.36 samples/sec   Loss 1.2657   LearningRate 0.0000   Epoch: 39   Global Step: 68930   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:04:08,288-Speed 25225.57 samples/sec   Loss 1.2557   LearningRate 0.0000   Epoch: 39   Global Step: 68940   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:04:18,018-Speed 25259.33 samples/sec   Loss 1.2675   LearningRate 0.0000   Epoch: 39   Global Step: 68950   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:04:27,735-Speed 25294.97 samples/sec   Loss 1.2526   LearningRate 0.0000   Epoch: 39   Global Step: 68960   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:04:37,530-Speed 25093.71 samples/sec   Loss 1.2488   LearningRate 0.0000   Epoch: 39   Global Step: 68970   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:04:47,395-Speed 24916.66 samples/sec   Loss 1.2570   LearningRate 0.0000   Epoch: 39   Global Step: 68980   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:04:57,076-Speed 25389.81 samples/sec   Loss 1.2533   LearningRate 0.0000   Epoch: 39   Global Step: 68990   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:05:06,902-Speed 25013.50 samples/sec   Loss 1.2644   LearningRate 0.0000   Epoch: 39   Global Step: 69000   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:05:16,652-Speed 25210.48 samples/sec   Loss 1.2592   LearningRate 0.0000   Epoch: 39   Global Step: 69010   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:05:26,372-Speed 25286.79 samples/sec   Loss 1.2598   LearningRate 0.0000   Epoch: 39   Global Step: 69020   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:05:36,177-Speed 25068.14 samples/sec   Loss 1.2561   LearningRate 0.0000   Epoch: 39   Global Step: 69030   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:05:45,869-Speed 25360.39 samples/sec   Loss 1.2640   LearningRate 0.0000   Epoch: 39   Global Step: 69040   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:05:55,572-Speed 25330.77 samples/sec   Loss 1.2679   LearningRate 0.0000   Epoch: 39   Global Step: 69050   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:06:05,336-Speed 25173.51 samples/sec   Loss 1.2620   LearningRate 0.0000   Epoch: 39   Global Step: 69060   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:06:15,136-Speed 25080.79 samples/sec   Loss 1.2681   LearningRate 0.0000   Epoch: 39   Global Step: 69070   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:06:24,901-Speed 25171.84 samples/sec   Loss 1.2553   LearningRate 0.0000   Epoch: 39   Global Step: 69080   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:06:34,611-Speed 25312.46 samples/sec   Loss 1.2565   LearningRate 0.0000   Epoch: 39   Global Step: 69090   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:06:44,373-Speed 25178.82 samples/sec   Loss 1.2629   LearningRate 0.0000   Epoch: 39   Global Step: 69100   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:06:54,163-Speed 25106.18 samples/sec   Loss 1.2647   LearningRate 0.0000   Epoch: 39   Global Step: 69110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-26 18:07:03,942-Speed 25135.09 samples/sec   Loss 1.2615   LearningRate 0.0000   Epoch: 39   Global Step: 69120   Fp16 Grad Scale: 32768   Required: -0 hours
Training: 2022-03-26 18:07:13,664-Speed 25280.24 samples/sec   Loss 1.2615   LearningRate 0.0000   Epoch: 39   Global Step: 69130   Fp16 Grad Scale: 32768   Required: -0 hours