Training: 2022-03-03 08:04:25,066-rank_id: 0
Training: 2022-03-03 08:05:28,575-Speed 13912.69 samples/sec   Loss 42.4892   LearningRate 0.0000   Epoch: 0   Global Step: 20   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-03 08:05:46,223-Speed 13926.31 samples/sec   Loss 42.4764   LearningRate 0.0000   Epoch: 0   Global Step: 30   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-03 08:06:04,049-Speed 13787.92 samples/sec   Loss 42.4636   LearningRate 0.0000   Epoch: 0   Global Step: 40   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-03 08:06:21,863-Speed 13797.58 samples/sec   Loss 42.4250   LearningRate 0.0000   Epoch: 0   Global Step: 50   Fp16 Grad Scale: 65536   Required: 35 hours
Training: 2022-03-03 08:06:39,678-Speed 13795.80 samples/sec   Loss 42.3780   LearningRate 0.0000   Epoch: 0   Global Step: 60   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:06:57,458-Speed 13824.54 samples/sec   Loss 42.2783   LearningRate 0.0000   Epoch: 0   Global Step: 70   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:07:15,280-Speed 13791.01 samples/sec   Loss 42.1334   LearningRate 0.0000   Epoch: 0   Global Step: 80   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:07:33,029-Speed 13847.08 samples/sec   Loss 41.9596   LearningRate 0.0000   Epoch: 0   Global Step: 90   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:07:50,813-Speed 13820.37 samples/sec   Loss 41.7365   LearningRate 0.0000   Epoch: 0   Global Step: 100   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:08:08,717-Speed 13727.80 samples/sec   Loss 41.4752   LearningRate 0.0000   Epoch: 0   Global Step: 110   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:08:26,526-Speed 13800.93 samples/sec   Loss 41.2082   LearningRate 0.0000   Epoch: 0   Global Step: 120   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:08:44,253-Speed 13863.98 samples/sec   Loss 40.9147   LearningRate 0.0000   Epoch: 0   Global Step: 130   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:09:02,054-Speed 13807.67 samples/sec   Loss 40.6283   LearningRate 0.0000   Epoch: 0   Global Step: 140   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:09:19,820-Speed 13834.18 samples/sec   Loss 40.3172   LearningRate 0.0000   Epoch: 0   Global Step: 150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:09:37,573-Speed 13844.00 samples/sec   Loss 40.0319   LearningRate 0.0000   Epoch: 0   Global Step: 160   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:09:55,367-Speed 13812.18 samples/sec   Loss 39.7825   LearningRate 0.0000   Epoch: 0   Global Step: 170   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:10:13,153-Speed 13818.63 samples/sec   Loss 39.5631   LearningRate 0.0000   Epoch: 0   Global Step: 180   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:10:30,955-Speed 13806.28 samples/sec   Loss 39.3847   LearningRate 0.0000   Epoch: 0   Global Step: 190   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:10:48,686-Speed 13860.77 samples/sec   Loss 39.2179   LearningRate 0.0000   Epoch: 0   Global Step: 200   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:11:06,539-Speed 13766.89 samples/sec   Loss 39.0972   LearningRate 0.0000   Epoch: 0   Global Step: 210   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:11:24,268-Speed 13862.74 samples/sec   Loss 39.0031   LearningRate 0.0000   Epoch: 0   Global Step: 220   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:11:41,981-Speed 13875.46 samples/sec   Loss 38.9293   LearningRate 0.0000   Epoch: 0   Global Step: 230   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:11:59,738-Speed 13841.07 samples/sec   Loss 38.8636   LearningRate 0.0000   Epoch: 0   Global Step: 240   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:12:17,509-Speed 13830.24 samples/sec   Loss 38.8420   LearningRate 0.0000   Epoch: 0   Global Step: 250   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:12:35,194-Speed 13897.54 samples/sec   Loss 38.8140   LearningRate 0.0000   Epoch: 0   Global Step: 260   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:12:52,882-Speed 13896.15 samples/sec   Loss 38.8162   LearningRate 0.0000   Epoch: 0   Global Step: 270   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:13:10,561-Speed 13902.39 samples/sec   Loss 38.8487   LearningRate 0.0000   Epoch: 0   Global Step: 280   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:13:28,305-Speed 13851.31 samples/sec   Loss 38.8465   LearningRate 0.0000   Epoch: 0   Global Step: 290   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:13:46,162-Speed 13763.55 samples/sec   Loss 38.8397   LearningRate 0.0000   Epoch: 0   Global Step: 300   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:14:03,883-Speed 13868.49 samples/sec   Loss 38.8301   LearningRate 0.0000   Epoch: 0   Global Step: 310   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:14:21,639-Speed 13842.04 samples/sec   Loss 38.8364   LearningRate 0.0000   Epoch: 0   Global Step: 320   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:14:39,389-Speed 13846.51 samples/sec   Loss 38.8474   LearningRate 0.0000   Epoch: 0   Global Step: 330   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:14:57,179-Speed 13815.29 samples/sec   Loss 38.8848   LearningRate 0.0000   Epoch: 0   Global Step: 340   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:15:14,898-Speed 13870.53 samples/sec   Loss 38.8806   LearningRate 0.0001   Epoch: 0   Global Step: 350   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:15:32,633-Speed 13858.63 samples/sec   Loss 38.8829   LearningRate 0.0001   Epoch: 0   Global Step: 360   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:15:50,422-Speed 13816.21 samples/sec   Loss 38.8849   LearningRate 0.0001   Epoch: 0   Global Step: 370   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:16:08,167-Speed 13850.07 samples/sec   Loss 38.8941   LearningRate 0.0001   Epoch: 0   Global Step: 380   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:16:25,951-Speed 13819.86 samples/sec   Loss 38.8790   LearningRate 0.0001   Epoch: 0   Global Step: 390   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:16:43,637-Speed 13896.74 samples/sec   Loss 38.8755   LearningRate 0.0001   Epoch: 0   Global Step: 400   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:17:01,346-Speed 13878.60 samples/sec   Loss 38.8982   LearningRate 0.0001   Epoch: 0   Global Step: 410   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:17:19,087-Speed 13853.36 samples/sec   Loss 38.9245   LearningRate 0.0001   Epoch: 0   Global Step: 420   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:17:36,834-Speed 13848.57 samples/sec   Loss 38.8834   LearningRate 0.0001   Epoch: 0   Global Step: 430   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:17:54,591-Speed 13842.23 samples/sec   Loss 38.8782   LearningRate 0.0001   Epoch: 0   Global Step: 440   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:18:12,424-Speed 13783.54 samples/sec   Loss 38.8602   LearningRate 0.0001   Epoch: 0   Global Step: 450   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:18:30,198-Speed 13827.63 samples/sec   Loss 38.8489   LearningRate 0.0001   Epoch: 0   Global Step: 460   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:18:47,945-Speed 13848.76 samples/sec   Loss 38.8465   LearningRate 0.0001   Epoch: 0   Global Step: 470   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:19:05,705-Speed 13839.12 samples/sec   Loss 38.8471   LearningRate 0.0001   Epoch: 0   Global Step: 480   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:19:23,500-Speed 13811.29 samples/sec   Loss 38.8400   LearningRate 0.0001   Epoch: 0   Global Step: 490   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:19:41,328-Speed 13785.90 samples/sec   Loss 38.8598   LearningRate 0.0001   Epoch: 0   Global Step: 500   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:19:59,100-Speed 13829.85 samples/sec   Loss 38.8535   LearningRate 0.0001   Epoch: 0   Global Step: 510   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:20:16,842-Speed 13852.16 samples/sec   Loss 38.8620   LearningRate 0.0001   Epoch: 0   Global Step: 520   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:20:34,592-Speed 13847.07 samples/sec   Loss 38.8628   LearningRate 0.0001   Epoch: 0   Global Step: 530   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:20:52,312-Speed 13869.23 samples/sec   Loss 38.8776   LearningRate 0.0001   Epoch: 0   Global Step: 540   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:21:10,073-Speed 13838.32 samples/sec   Loss 38.8709   LearningRate 0.0001   Epoch: 0   Global Step: 550   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:21:27,871-Speed 13809.66 samples/sec   Loss 38.8748   LearningRate 0.0001   Epoch: 0   Global Step: 560   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:21:45,689-Speed 13793.13 samples/sec   Loss 38.8965   LearningRate 0.0001   Epoch: 0   Global Step: 570   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:22:03,579-Speed 13738.15 samples/sec   Loss 38.9041   LearningRate 0.0001   Epoch: 0   Global Step: 580   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:22:21,328-Speed 13847.18 samples/sec   Loss 38.9106   LearningRate 0.0001   Epoch: 0   Global Step: 590   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:22:39,078-Speed 13846.92 samples/sec   Loss 38.9325   LearningRate 0.0001   Epoch: 0   Global Step: 600   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:22:56,853-Speed 13826.43 samples/sec   Loss 38.9278   LearningRate 0.0001   Epoch: 0   Global Step: 610   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:23:14,503-Speed 13925.14 samples/sec   Loss 38.9422   LearningRate 0.0001   Epoch: 0   Global Step: 620   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:23:32,260-Speed 13840.69 samples/sec   Loss 38.9742   LearningRate 0.0001   Epoch: 0   Global Step: 630   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:23:50,046-Speed 13818.72 samples/sec   Loss 38.9523   LearningRate 0.0001   Epoch: 0   Global Step: 640   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:24:07,818-Speed 13829.47 samples/sec   Loss 39.0715   LearningRate 0.0001   Epoch: 0   Global Step: 650   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:24:25,570-Speed 13844.92 samples/sec   Loss 39.0198   LearningRate 0.0001   Epoch: 0   Global Step: 660   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:24:43,276-Speed 13880.68 samples/sec   Loss 39.0395   LearningRate 0.0001   Epoch: 0   Global Step: 670   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:25:01,091-Speed 13795.41 samples/sec   Loss 39.0070   LearningRate 0.0001   Epoch: 0   Global Step: 680   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:25:18,859-Speed 13832.52 samples/sec   Loss 39.0112   LearningRate 0.0001   Epoch: 0   Global Step: 690   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:25:36,660-Speed 13807.04 samples/sec   Loss 39.0098   LearningRate 0.0001   Epoch: 0   Global Step: 700   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:25:54,396-Speed 13856.76 samples/sec   Loss 39.0003   LearningRate 0.0001   Epoch: 0   Global Step: 710   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:26:12,224-Speed 13786.34 samples/sec   Loss 39.0102   LearningRate 0.0001   Epoch: 0   Global Step: 720   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:26:29,977-Speed 13843.96 samples/sec   Loss 39.0031   LearningRate 0.0001   Epoch: 0   Global Step: 730   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:26:47,798-Speed 13791.24 samples/sec   Loss 39.0007   LearningRate 0.0001   Epoch: 0   Global Step: 740   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:27:05,589-Speed 13814.13 samples/sec   Loss 39.0052   LearningRate 0.0001   Epoch: 0   Global Step: 750   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:27:23,349-Speed 13839.08 samples/sec   Loss 38.9972   LearningRate 0.0001   Epoch: 0   Global Step: 760   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:27:41,054-Speed 13881.22 samples/sec   Loss 38.9906   LearningRate 0.0001   Epoch: 0   Global Step: 770   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:27:58,805-Speed 13846.07 samples/sec   Loss 38.9871   LearningRate 0.0001   Epoch: 0   Global Step: 780   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:28:16,523-Speed 13871.55 samples/sec   Loss 38.9822   LearningRate 0.0001   Epoch: 0   Global Step: 790   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:28:34,270-Speed 13848.87 samples/sec   Loss 38.9683   LearningRate 0.0001   Epoch: 0   Global Step: 800   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:28:52,018-Speed 13848.00 samples/sec   Loss 38.9722   LearningRate 0.0001   Epoch: 0   Global Step: 810   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:29:09,801-Speed 13820.80 samples/sec   Loss 38.9758   LearningRate 0.0001   Epoch: 0   Global Step: 820   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:29:27,525-Speed 13866.94 samples/sec   Loss 38.9755   LearningRate 0.0001   Epoch: 0   Global Step: 830   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:29:45,323-Speed 13808.78 samples/sec   Loss 38.9726   LearningRate 0.0001   Epoch: 0   Global Step: 840   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:30:03,178-Speed 13765.34 samples/sec   Loss 38.9720   LearningRate 0.0001   Epoch: 0   Global Step: 850   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:30:21,023-Speed 13772.40 samples/sec   Loss 38.9793   LearningRate 0.0001   Epoch: 0   Global Step: 860   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:30:38,793-Speed 13831.31 samples/sec   Loss 38.9842   LearningRate 0.0001   Epoch: 0   Global Step: 870   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:30:56,595-Speed 13806.20 samples/sec   Loss 38.9836   LearningRate 0.0001   Epoch: 0   Global Step: 880   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:31:14,319-Speed 13866.70 samples/sec   Loss 38.9835   LearningRate 0.0001   Epoch: 0   Global Step: 890   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:31:32,045-Speed 13865.02 samples/sec   Loss 38.9756   LearningRate 0.0001   Epoch: 0   Global Step: 900   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:31:49,880-Speed 13780.51 samples/sec   Loss 38.9804   LearningRate 0.0001   Epoch: 0   Global Step: 910   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:32:07,650-Speed 13831.58 samples/sec   Loss 38.9767   LearningRate 0.0001   Epoch: 0   Global Step: 920   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:32:25,385-Speed 13858.11 samples/sec   Loss 38.9758   LearningRate 0.0001   Epoch: 0   Global Step: 930   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:32:43,123-Speed 13855.88 samples/sec   Loss 38.9730   LearningRate 0.0001   Epoch: 0   Global Step: 940   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:33:00,864-Speed 13852.99 samples/sec   Loss 38.9514   LearningRate 0.0001   Epoch: 0   Global Step: 950   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:33:18,685-Speed 13791.73 samples/sec   Loss 38.9424   LearningRate 0.0001   Epoch: 0   Global Step: 960   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:33:36,387-Speed 13884.18 samples/sec   Loss 38.9416   LearningRate 0.0001   Epoch: 0   Global Step: 970   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:33:54,093-Speed 13880.81 samples/sec   Loss 38.9334   LearningRate 0.0001   Epoch: 0   Global Step: 980   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:34:11,916-Speed 13789.72 samples/sec   Loss 38.9140   LearningRate 0.0001   Epoch: 0   Global Step: 990   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:34:29,740-Speed 13788.90 samples/sec   Loss 38.8864   LearningRate 0.0001   Epoch: 0   Global Step: 1000   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:34:47,602-Speed 13759.92 samples/sec   Loss 38.8724   LearningRate 0.0001   Epoch: 0   Global Step: 1010   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:35:05,319-Speed 13871.98 samples/sec   Loss 38.8473   LearningRate 0.0001   Epoch: 0   Global Step: 1020   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:35:23,096-Speed 13824.77 samples/sec   Loss 38.8155   LearningRate 0.0001   Epoch: 0   Global Step: 1030   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:35:40,923-Speed 13787.13 samples/sec   Loss 38.7901   LearningRate 0.0002   Epoch: 0   Global Step: 1040   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:35:58,678-Speed 13842.70 samples/sec   Loss 38.7585   LearningRate 0.0002   Epoch: 0   Global Step: 1050   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:36:16,459-Speed 13822.10 samples/sec   Loss 38.7198   LearningRate 0.0002   Epoch: 0   Global Step: 1060   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:36:34,246-Speed 13817.67 samples/sec   Loss 38.6768   LearningRate 0.0002   Epoch: 0   Global Step: 1070   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:36:51,994-Speed 13847.91 samples/sec   Loss 38.6483   LearningRate 0.0002   Epoch: 0   Global Step: 1080   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:37:09,808-Speed 13798.74 samples/sec   Loss 38.6080   LearningRate 0.0002   Epoch: 0   Global Step: 1090   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:37:27,588-Speed 13823.00 samples/sec   Loss 38.5714   LearningRate 0.0002   Epoch: 0   Global Step: 1100   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:37:45,344-Speed 13841.63 samples/sec   Loss 38.5292   LearningRate 0.0002   Epoch: 0   Global Step: 1110   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:38:03,111-Speed 13833.50 samples/sec   Loss 38.5053   LearningRate 0.0002   Epoch: 0   Global Step: 1120   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:38:20,893-Speed 13821.86 samples/sec   Loss 38.4691   LearningRate 0.0002   Epoch: 0   Global Step: 1130   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:38:38,745-Speed 13767.48 samples/sec   Loss 38.4542   LearningRate 0.0002   Epoch: 0   Global Step: 1140   Fp16 Grad Scale: 65536   Required: 34 hours
Training: 2022-03-03 08:38:56,547-Speed 13805.91 samples/sec   Loss 38.4065   LearningRate 0.0002   Epoch: 0   Global Step: 1150   Fp16 Grad Scale: 131072   Required: 34 hours
Training: 2022-03-03 08:39:14,320-Speed 13828.61 samples/sec   Loss 38.3869   LearningRate 0.0002   Epoch: 0   Global Step: 1160   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:39:32,097-Speed 13825.75 samples/sec   Loss 38.3578   LearningRate 0.0002   Epoch: 0   Global Step: 1170   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:39:49,890-Speed 13812.76 samples/sec   Loss 38.3266   LearningRate 0.0002   Epoch: 0   Global Step: 1180   Fp16 Grad Scale: 32768   Required: 34 hours
Training: 2022-03-03 08:40:07,738-Speed 13770.69 samples/sec   Loss 38.3128   LearningRate 0.0002   Epoch: 0   Global Step: 1190   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:40:25,583-Speed 13772.86 samples/sec   Loss 38.2927   LearningRate 0.0002   Epoch: 0   Global Step: 1200   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:40:43,370-Speed 13817.65 samples/sec   Loss 38.3706   LearningRate 0.0002   Epoch: 0   Global Step: 1210   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:41:01,209-Speed 13776.92 samples/sec   Loss 38.3209   LearningRate 0.0002   Epoch: 0   Global Step: 1220   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:41:19,071-Speed 13765.50 samples/sec   Loss 38.2093   LearningRate 0.0002   Epoch: 0   Global Step: 1230   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:41:36,903-Speed 13782.51 samples/sec   Loss 38.1617   LearningRate 0.0002   Epoch: 0   Global Step: 1240   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:41:54,742-Speed 13777.37 samples/sec   Loss 38.1224   LearningRate 0.0002   Epoch: 0   Global Step: 1250   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:42:12,624-Speed 13744.14 samples/sec   Loss 38.0936   LearningRate 0.0002   Epoch: 0   Global Step: 1260   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:42:30,456-Speed 13783.69 samples/sec   Loss 38.0492   LearningRate 0.0002   Epoch: 0   Global Step: 1270   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:42:48,311-Speed 13764.58 samples/sec   Loss 38.0031   LearningRate 0.0002   Epoch: 0   Global Step: 1280   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:43:06,204-Speed 13736.24 samples/sec   Loss 37.9594   LearningRate 0.0002   Epoch: 0   Global Step: 1290   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:43:24,038-Speed 13781.29 samples/sec   Loss 37.9136   LearningRate 0.0002   Epoch: 0   Global Step: 1300   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 08:43:41,978-Speed 13700.12 samples/sec   Loss 37.8599   LearningRate 0.0002   Epoch: 0   Global Step: 1310   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:43:59,882-Speed 13727.38 samples/sec   Loss 37.8138   LearningRate 0.0002   Epoch: 0   Global Step: 1320   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 08:44:17,756-Speed 13750.10 samples/sec   Loss 37.7872   LearningRate 0.0002   Epoch: 0   Global Step: 1330   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:44:35,530-Speed 13827.79 samples/sec   Loss 37.8208   LearningRate 0.0002   Epoch: 0   Global Step: 1340   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:44:53,472-Speed 13698.74 samples/sec   Loss 37.7613   LearningRate 0.0002   Epoch: 0   Global Step: 1350   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:45:11,236-Speed 13834.91 samples/sec   Loss 37.6893   LearningRate 0.0002   Epoch: 0   Global Step: 1360   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:45:29,059-Speed 13789.80 samples/sec   Loss 37.5833   LearningRate 0.0002   Epoch: 0   Global Step: 1370   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:45:46,976-Speed 13717.71 samples/sec   Loss 37.5570   LearningRate 0.0002   Epoch: 0   Global Step: 1380   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:46:05,092-Speed 13567.19 samples/sec   Loss 37.4757   LearningRate 0.0002   Epoch: 0   Global Step: 1390   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:46:23,228-Speed 13551.73 samples/sec   Loss 37.3808   LearningRate 0.0002   Epoch: 0   Global Step: 1400   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:46:41,398-Speed 13526.43 samples/sec   Loss 37.3408   LearningRate 0.0002   Epoch: 0   Global Step: 1410   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:46:59,571-Speed 13524.15 samples/sec   Loss 37.3004   LearningRate 0.0002   Epoch: 0   Global Step: 1420   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:47:17,716-Speed 13544.78 samples/sec   Loss 37.2942   LearningRate 0.0002   Epoch: 0   Global Step: 1430   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:47:35,822-Speed 13574.75 samples/sec   Loss 37.2175   LearningRate 0.0002   Epoch: 0   Global Step: 1440   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:47:53,969-Speed 13543.38 samples/sec   Loss 37.1724   LearningRate 0.0002   Epoch: 0   Global Step: 1450   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:48:12,074-Speed 13575.27 samples/sec   Loss 37.0700   LearningRate 0.0002   Epoch: 0   Global Step: 1460   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:48:30,220-Speed 13543.76 samples/sec   Loss 37.0468   LearningRate 0.0002   Epoch: 0   Global Step: 1470   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:48:48,312-Speed 13584.54 samples/sec   Loss 37.0402   LearningRate 0.0002   Epoch: 0   Global Step: 1480   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:49:06,626-Speed 13422.38 samples/sec   Loss 36.9097   LearningRate 0.0002   Epoch: 0   Global Step: 1490   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:49:24,776-Speed 13541.34 samples/sec   Loss 36.8841   LearningRate 0.0002   Epoch: 0   Global Step: 1500   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 08:49:42,933-Speed 13535.28 samples/sec   Loss 36.8346   LearningRate 0.0002   Epoch: 0   Global Step: 1510   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:50:01,071-Speed 13550.17 samples/sec   Loss 36.7794   LearningRate 0.0002   Epoch: 0   Global Step: 1520   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:50:19,197-Speed 13559.59 samples/sec   Loss 36.6977   LearningRate 0.0002   Epoch: 0   Global Step: 1530   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:50:37,425-Speed 13483.22 samples/sec   Loss 36.6119   LearningRate 0.0002   Epoch: 0   Global Step: 1540   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:50:55,536-Speed 13571.42 samples/sec   Loss 36.5650   LearningRate 0.0002   Epoch: 0   Global Step: 1550   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 08:51:13,729-Speed 13509.15 samples/sec   Loss 36.6514   LearningRate 0.0002   Epoch: 0   Global Step: 1560   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:51:31,790-Speed 13608.64 samples/sec   Loss 36.4679   LearningRate 0.0002   Epoch: 0   Global Step: 1570   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:51:49,877-Speed 13588.24 samples/sec   Loss 36.4191   LearningRate 0.0002   Epoch: 0   Global Step: 1580   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:52:08,014-Speed 13550.97 samples/sec   Loss 36.3615   LearningRate 0.0002   Epoch: 0   Global Step: 1590   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:52:26,159-Speed 13545.35 samples/sec   Loss 36.3309   LearningRate 0.0002   Epoch: 0   Global Step: 1600   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:52:44,298-Speed 13549.19 samples/sec   Loss 36.3660   LearningRate 0.0002   Epoch: 0   Global Step: 1610   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:53:02,452-Speed 13538.40 samples/sec   Loss 36.2173   LearningRate 0.0002   Epoch: 0   Global Step: 1620   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:53:20,568-Speed 13566.57 samples/sec   Loss 36.1629   LearningRate 0.0002   Epoch: 0   Global Step: 1630   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:53:38,774-Speed 13500.01 samples/sec   Loss 36.0479   LearningRate 0.0002   Epoch: 0   Global Step: 1640   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:53:56,673-Speed 13731.63 samples/sec   Loss 35.9773   LearningRate 0.0002   Epoch: 0   Global Step: 1650   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:54:14,529-Speed 13763.75 samples/sec   Loss 35.8711   LearningRate 0.0002   Epoch: 0   Global Step: 1660   Fp16 Grad Scale: 1024   Required: 33 hours
Training: 2022-03-03 08:54:32,301-Speed 13830.88 samples/sec   Loss 35.8540   LearningRate 0.0002   Epoch: 0   Global Step: 1670   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:54:50,101-Speed 13807.64 samples/sec   Loss 35.7735   LearningRate 0.0002   Epoch: 0   Global Step: 1680   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:55:07,781-Speed 13901.30 samples/sec   Loss 35.7210   LearningRate 0.0002   Epoch: 0   Global Step: 1690   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:55:25,515-Speed 13858.76 samples/sec   Loss 35.6461   LearningRate 0.0002   Epoch: 0   Global Step: 1700   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:55:43,246-Speed 13861.38 samples/sec   Loss 35.5800   LearningRate 0.0002   Epoch: 0   Global Step: 1710   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:56:00,947-Speed 13884.52 samples/sec   Loss 35.5038   LearningRate 0.0002   Epoch: 0   Global Step: 1720   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 08:57:08,583-Speed 3633.66 samples/sec   Loss 35.5531   LearningRate 0.0003   Epoch: 1   Global Step: 1730   Fp16 Grad Scale: 2048   Required: 34 hours
Training: 2022-03-03 08:57:26,309-Speed 13864.73 samples/sec   Loss 35.5557   LearningRate 0.0003   Epoch: 1   Global Step: 1740   Fp16 Grad Scale: 2048   Required: 34 hours
Training: 2022-03-03 08:57:44,043-Speed 13859.43 samples/sec   Loss 35.4544   LearningRate 0.0003   Epoch: 1   Global Step: 1750   Fp16 Grad Scale: 2048   Required: 34 hours
Training: 2022-03-03 08:58:01,798-Speed 13842.65 samples/sec   Loss 35.4012   LearningRate 0.0003   Epoch: 1   Global Step: 1760   Fp16 Grad Scale: 2048   Required: 34 hours
Training: 2022-03-03 08:58:19,514-Speed 13873.03 samples/sec   Loss 35.3628   LearningRate 0.0003   Epoch: 1   Global Step: 1770   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:58:37,355-Speed 13775.54 samples/sec   Loss 35.3447   LearningRate 0.0003   Epoch: 1   Global Step: 1780   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:58:55,129-Speed 13828.19 samples/sec   Loss 35.2740   LearningRate 0.0003   Epoch: 1   Global Step: 1790   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:59:12,910-Speed 13822.55 samples/sec   Loss 35.1900   LearningRate 0.0003   Epoch: 1   Global Step: 1800   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:59:30,607-Speed 13888.20 samples/sec   Loss 35.1373   LearningRate 0.0003   Epoch: 1   Global Step: 1810   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 08:59:48,344-Speed 13856.45 samples/sec   Loss 35.0047   LearningRate 0.0003   Epoch: 1   Global Step: 1820   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 09:00:06,117-Speed 13828.50 samples/sec   Loss 34.9075   LearningRate 0.0003   Epoch: 1   Global Step: 1830   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 09:00:23,869-Speed 13844.93 samples/sec   Loss 34.8102   LearningRate 0.0003   Epoch: 1   Global Step: 1840   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 09:00:41,630-Speed 13838.42 samples/sec   Loss 34.7652   LearningRate 0.0003   Epoch: 1   Global Step: 1850   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 09:00:59,465-Speed 13780.76 samples/sec   Loss 34.6772   LearningRate 0.0003   Epoch: 1   Global Step: 1860   Fp16 Grad Scale: 4096   Required: 34 hours
Training: 2022-03-03 09:01:17,168-Speed 13882.71 samples/sec   Loss 34.5755   LearningRate 0.0003   Epoch: 1   Global Step: 1870   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:01:34,865-Speed 13887.89 samples/sec   Loss 34.5566   LearningRate 0.0003   Epoch: 1   Global Step: 1880   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:01:52,704-Speed 13777.61 samples/sec   Loss 34.5804   LearningRate 0.0003   Epoch: 1   Global Step: 1890   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:02:10,431-Speed 13864.35 samples/sec   Loss 34.4526   LearningRate 0.0003   Epoch: 1   Global Step: 1900   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:02:28,181-Speed 13846.54 samples/sec   Loss 34.3351   LearningRate 0.0003   Epoch: 1   Global Step: 1910   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:02:45,920-Speed 13854.59 samples/sec   Loss 34.2268   LearningRate 0.0003   Epoch: 1   Global Step: 1920   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:03:03,640-Speed 13870.02 samples/sec   Loss 34.1568   LearningRate 0.0003   Epoch: 1   Global Step: 1930   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:03:21,344-Speed 13882.62 samples/sec   Loss 34.1038   LearningRate 0.0003   Epoch: 1   Global Step: 1940   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:03:39,129-Speed 13819.35 samples/sec   Loss 34.0145   LearningRate 0.0003   Epoch: 1   Global Step: 1950   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:03:56,881-Speed 13844.66 samples/sec   Loss 33.9539   LearningRate 0.0003   Epoch: 1   Global Step: 1960   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:04:14,624-Speed 13852.10 samples/sec   Loss 33.8635   LearningRate 0.0003   Epoch: 1   Global Step: 1970   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:04:32,405-Speed 13822.33 samples/sec   Loss 33.8227   LearningRate 0.0003   Epoch: 1   Global Step: 1980   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:04:50,123-Speed 13871.28 samples/sec   Loss 33.7152   LearningRate 0.0003   Epoch: 1   Global Step: 1990   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:05:07,904-Speed 13822.59 samples/sec   Loss 33.6241   LearningRate 0.0003   Epoch: 1   Global Step: 2000   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:05:25,663-Speed 13838.95 samples/sec   Loss 33.6186   LearningRate 0.0003   Epoch: 1   Global Step: 2010   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:05:43,437-Speed 13828.59 samples/sec   Loss 33.5551   LearningRate 0.0003   Epoch: 1   Global Step: 2020   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:06:01,204-Speed 13832.62 samples/sec   Loss 33.4624   LearningRate 0.0003   Epoch: 1   Global Step: 2030   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:06:18,904-Speed 13886.19 samples/sec   Loss 33.4992   LearningRate 0.0003   Epoch: 1   Global Step: 2040   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:06:36,705-Speed 13806.62 samples/sec   Loss 33.3382   LearningRate 0.0003   Epoch: 1   Global Step: 2050   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:06:54,528-Speed 13789.80 samples/sec   Loss 33.2099   LearningRate 0.0003   Epoch: 1   Global Step: 2060   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:07:12,344-Speed 13795.44 samples/sec   Loss 33.1453   LearningRate 0.0003   Epoch: 1   Global Step: 2070   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:07:30,033-Speed 13894.05 samples/sec   Loss 33.0261   LearningRate 0.0003   Epoch: 1   Global Step: 2080   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:07:47,791-Speed 13840.48 samples/sec   Loss 32.9128   LearningRate 0.0003   Epoch: 1   Global Step: 2090   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:08:05,589-Speed 13810.22 samples/sec   Loss 32.7946   LearningRate 0.0003   Epoch: 1   Global Step: 2100   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:08:23,346-Speed 13840.72 samples/sec   Loss 32.7730   LearningRate 0.0003   Epoch: 1   Global Step: 2110   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:08:41,069-Speed 13867.57 samples/sec   Loss 32.6987   LearningRate 0.0003   Epoch: 1   Global Step: 2120   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:08:58,774-Speed 13881.55 samples/sec   Loss 32.5884   LearningRate 0.0003   Epoch: 1   Global Step: 2130   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:09:16,584-Speed 13800.10 samples/sec   Loss 32.4788   LearningRate 0.0003   Epoch: 1   Global Step: 2140   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:09:34,300-Speed 13873.17 samples/sec   Loss 32.4069   LearningRate 0.0003   Epoch: 1   Global Step: 2150   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:09:52,122-Speed 13790.50 samples/sec   Loss 32.3353   LearningRate 0.0003   Epoch: 1   Global Step: 2160   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:10:09,839-Speed 13871.86 samples/sec   Loss 32.2968   LearningRate 0.0003   Epoch: 1   Global Step: 2170   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:10:27,515-Speed 13904.67 samples/sec   Loss 32.1148   LearningRate 0.0003   Epoch: 1   Global Step: 2180   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:10:45,332-Speed 13794.39 samples/sec   Loss 32.0056   LearningRate 0.0003   Epoch: 1   Global Step: 2190   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:11:03,128-Speed 13810.83 samples/sec   Loss 31.9613   LearningRate 0.0003   Epoch: 1   Global Step: 2200   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:11:20,820-Speed 13892.19 samples/sec   Loss 31.9026   LearningRate 0.0003   Epoch: 1   Global Step: 2210   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:11:38,512-Speed 13891.54 samples/sec   Loss 31.7751   LearningRate 0.0003   Epoch: 1   Global Step: 2220   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:11:56,306-Speed 13812.78 samples/sec   Loss 31.6323   LearningRate 0.0003   Epoch: 1   Global Step: 2230   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:12:14,101-Speed 13811.13 samples/sec   Loss 31.5413   LearningRate 0.0003   Epoch: 1   Global Step: 2240   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:12:31,862-Speed 13837.78 samples/sec   Loss 31.4241   LearningRate 0.0003   Epoch: 1   Global Step: 2250   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:12:49,580-Speed 13871.74 samples/sec   Loss 31.3262   LearningRate 0.0003   Epoch: 1   Global Step: 2260   Fp16 Grad Scale: 16384   Required: 34 hours
Training: 2022-03-03 09:13:07,255-Speed 13904.82 samples/sec   Loss 31.1940   LearningRate 0.0003   Epoch: 1   Global Step: 2270   Fp16 Grad Scale: 8192   Required: 34 hours
Training: 2022-03-03 09:13:25,041-Speed 13818.61 samples/sec   Loss 31.1200   LearningRate 0.0003   Epoch: 1   Global Step: 2280   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:13:42,713-Speed 13907.76 samples/sec   Loss 31.0600   LearningRate 0.0003   Epoch: 1   Global Step: 2290   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:14:00,411-Speed 13888.44 samples/sec   Loss 31.0208   LearningRate 0.0003   Epoch: 1   Global Step: 2300   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:14:18,184-Speed 13828.20 samples/sec   Loss 30.8744   LearningRate 0.0003   Epoch: 1   Global Step: 2310   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:14:35,873-Speed 13893.84 samples/sec   Loss 30.7826   LearningRate 0.0003   Epoch: 1   Global Step: 2320   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:14:53,644-Speed 13830.94 samples/sec   Loss 30.6314   LearningRate 0.0003   Epoch: 1   Global Step: 2330   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:15:11,506-Speed 13759.68 samples/sec   Loss 30.5017   LearningRate 0.0003   Epoch: 1   Global Step: 2340   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:15:29,276-Speed 13830.44 samples/sec   Loss 30.3838   LearningRate 0.0003   Epoch: 1   Global Step: 2350   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:15:47,050-Speed 13828.03 samples/sec   Loss 30.2962   LearningRate 0.0003   Epoch: 1   Global Step: 2360   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:16:04,781-Speed 13861.73 samples/sec   Loss 30.2379   LearningRate 0.0003   Epoch: 1   Global Step: 2370   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:16:22,468-Speed 13895.60 samples/sec   Loss 30.1137   LearningRate 0.0003   Epoch: 1   Global Step: 2380   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:16:40,211-Speed 13851.96 samples/sec   Loss 29.9769   LearningRate 0.0003   Epoch: 1   Global Step: 2390   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:16:57,982-Speed 13830.26 samples/sec   Loss 29.8317   LearningRate 0.0003   Epoch: 1   Global Step: 2400   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:17:15,729-Speed 13849.03 samples/sec   Loss 29.7221   LearningRate 0.0003   Epoch: 1   Global Step: 2410   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:17:33,433-Speed 13881.91 samples/sec   Loss 29.7048   LearningRate 0.0004   Epoch: 1   Global Step: 2420   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:17:51,313-Speed 13745.93 samples/sec   Loss 29.5675   LearningRate 0.0004   Epoch: 1   Global Step: 2430   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:18:09,124-Speed 13799.36 samples/sec   Loss 29.3802   LearningRate 0.0004   Epoch: 1   Global Step: 2440   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:18:26,878-Speed 13843.54 samples/sec   Loss 29.3104   LearningRate 0.0004   Epoch: 1   Global Step: 2450   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:18:44,714-Speed 13779.11 samples/sec   Loss 29.2667   LearningRate 0.0004   Epoch: 1   Global Step: 2460   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:19:02,488-Speed 13828.27 samples/sec   Loss 29.0750   LearningRate 0.0004   Epoch: 1   Global Step: 2470   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:19:20,297-Speed 13800.69 samples/sec   Loss 28.9467   LearningRate 0.0004   Epoch: 1   Global Step: 2480   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:19:38,151-Speed 13766.06 samples/sec   Loss 28.8525   LearningRate 0.0004   Epoch: 1   Global Step: 2490   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:19:56,076-Speed 13711.87 samples/sec   Loss 28.7785   LearningRate 0.0004   Epoch: 1   Global Step: 2500   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:20:13,872-Speed 13810.35 samples/sec   Loss 28.6038   LearningRate 0.0004   Epoch: 1   Global Step: 2510   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:20:31,680-Speed 13801.40 samples/sec   Loss 28.5469   LearningRate 0.0004   Epoch: 1   Global Step: 2520   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:20:49,584-Speed 13728.51 samples/sec   Loss 28.5782   LearningRate 0.0004   Epoch: 1   Global Step: 2530   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:21:07,475-Speed 13737.44 samples/sec   Loss 28.4046   LearningRate 0.0004   Epoch: 1   Global Step: 2540   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:21:25,522-Speed 13618.41 samples/sec   Loss 28.2257   LearningRate 0.0004   Epoch: 1   Global Step: 2550   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:21:43,383-Speed 13760.41 samples/sec   Loss 28.1340   LearningRate 0.0004   Epoch: 1   Global Step: 2560   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:22:01,298-Speed 13719.13 samples/sec   Loss 27.9651   LearningRate 0.0004   Epoch: 1   Global Step: 2570   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:22:19,145-Speed 13771.33 samples/sec   Loss 27.8427   LearningRate 0.0004   Epoch: 1   Global Step: 2580   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:22:36,891-Speed 13848.94 samples/sec   Loss 27.7203   LearningRate 0.0004   Epoch: 1   Global Step: 2590   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:22:54,744-Speed 13767.28 samples/sec   Loss 27.5573   LearningRate 0.0004   Epoch: 1   Global Step: 2600   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:23:12,550-Speed 13803.12 samples/sec   Loss 27.4450   LearningRate 0.0004   Epoch: 1   Global Step: 2610   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:23:30,394-Speed 13773.40 samples/sec   Loss 27.3745   LearningRate 0.0004   Epoch: 1   Global Step: 2620   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:23:48,245-Speed 13769.09 samples/sec   Loss 27.2276   LearningRate 0.0004   Epoch: 1   Global Step: 2630   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:24:06,052-Speed 13802.99 samples/sec   Loss 27.1099   LearningRate 0.0004   Epoch: 1   Global Step: 2640   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:24:23,880-Speed 13785.54 samples/sec   Loss 27.0195   LearningRate 0.0004   Epoch: 1   Global Step: 2650   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:24:41,709-Speed 13785.15 samples/sec   Loss 26.8865   LearningRate 0.0004   Epoch: 1   Global Step: 2660   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:24:59,524-Speed 13796.53 samples/sec   Loss 26.7253   LearningRate 0.0004   Epoch: 1   Global Step: 2670   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:25:17,458-Speed 13704.15 samples/sec   Loss 26.6498   LearningRate 0.0004   Epoch: 1   Global Step: 2680   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:25:35,249-Speed 13814.37 samples/sec   Loss 26.5064   LearningRate 0.0004   Epoch: 1   Global Step: 2690   Fp16 Grad Scale: 2048   Required: 33 hours
Training: 2022-03-03 09:25:53,121-Speed 13752.32 samples/sec   Loss 26.3562   LearningRate 0.0004   Epoch: 1   Global Step: 2700   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:26:10,915-Speed 13812.55 samples/sec   Loss 26.2177   LearningRate 0.0004   Epoch: 1   Global Step: 2710   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:26:28,718-Speed 13804.55 samples/sec   Loss 26.0584   LearningRate 0.0004   Epoch: 1   Global Step: 2720   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:26:46,644-Speed 13710.86 samples/sec   Loss 25.9390   LearningRate 0.0004   Epoch: 1   Global Step: 2730   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:27:04,494-Speed 13769.02 samples/sec   Loss 25.8979   LearningRate 0.0004   Epoch: 1   Global Step: 2740   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:27:22,283-Speed 13816.17 samples/sec   Loss 25.7764   LearningRate 0.0004   Epoch: 1   Global Step: 2750   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:27:40,123-Speed 13776.87 samples/sec   Loss 25.5662   LearningRate 0.0004   Epoch: 1   Global Step: 2760   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:27:57,918-Speed 13815.01 samples/sec   Loss 25.4346   LearningRate 0.0004   Epoch: 1   Global Step: 2770   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:28:15,703-Speed 13818.92 samples/sec   Loss 25.3088   LearningRate 0.0004   Epoch: 1   Global Step: 2780   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:28:33,558-Speed 13766.15 samples/sec   Loss 25.2201   LearningRate 0.0004   Epoch: 1   Global Step: 2790   Fp16 Grad Scale: 4096   Required: 33 hours
Training: 2022-03-03 09:28:51,381-Speed 13789.38 samples/sec   Loss 25.0652   LearningRate 0.0004   Epoch: 1   Global Step: 2800   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:29:09,195-Speed 13796.51 samples/sec   Loss 24.9586   LearningRate 0.0004   Epoch: 1   Global Step: 2810   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:29:27,153-Speed 13686.54 samples/sec   Loss 24.8858   LearningRate 0.0004   Epoch: 1   Global Step: 2820   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:29:44,963-Speed 13799.90 samples/sec   Loss 24.7069   LearningRate 0.0004   Epoch: 1   Global Step: 2830   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:30:02,815-Speed 13767.66 samples/sec   Loss 24.5654   LearningRate 0.0004   Epoch: 1   Global Step: 2840   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:30:20,597-Speed 13821.01 samples/sec   Loss 24.4343   LearningRate 0.0004   Epoch: 1   Global Step: 2850   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:30:38,396-Speed 13808.99 samples/sec   Loss 24.3576   LearningRate 0.0004   Epoch: 1   Global Step: 2860   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:30:56,208-Speed 13798.14 samples/sec   Loss 24.1971   LearningRate 0.0004   Epoch: 1   Global Step: 2870   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:31:14,008-Speed 13807.32 samples/sec   Loss 24.1181   LearningRate 0.0004   Epoch: 1   Global Step: 2880   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:31:31,839-Speed 13783.74 samples/sec   Loss 23.9296   LearningRate 0.0004   Epoch: 1   Global Step: 2890   Fp16 Grad Scale: 8192   Required: 33 hours
Training: 2022-03-03 09:31:49,662-Speed 13789.73 samples/sec   Loss 23.8119   LearningRate 0.0004   Epoch: 1   Global Step: 2900   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:32:07,546-Speed 13744.14 samples/sec   Loss 23.6697   LearningRate 0.0004   Epoch: 1   Global Step: 2910   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:32:25,414-Speed 13754.51 samples/sec   Loss 23.4899   LearningRate 0.0004   Epoch: 1   Global Step: 2920   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:32:43,274-Speed 13761.40 samples/sec   Loss 23.3738   LearningRate 0.0004   Epoch: 1   Global Step: 2930   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:33:01,145-Speed 13752.93 samples/sec   Loss 23.2855   LearningRate 0.0004   Epoch: 1   Global Step: 2940   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:33:18,974-Speed 13785.14 samples/sec   Loss 23.1657   LearningRate 0.0004   Epoch: 1   Global Step: 2950   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:33:36,793-Speed 13792.80 samples/sec   Loss 22.9788   LearningRate 0.0004   Epoch: 1   Global Step: 2960   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:33:54,685-Speed 13737.15 samples/sec   Loss 22.9435   LearningRate 0.0004   Epoch: 1   Global Step: 2970   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:34:12,498-Speed 13798.32 samples/sec   Loss 22.7832   LearningRate 0.0004   Epoch: 1   Global Step: 2980   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:34:30,299-Speed 13806.38 samples/sec   Loss 22.6576   LearningRate 0.0004   Epoch: 1   Global Step: 2990   Fp16 Grad Scale: 16384   Required: 33 hours
Training: 2022-03-03 09:34:48,126-Speed 13787.02 samples/sec   Loss 22.5278   LearningRate 0.0004   Epoch: 1   Global Step: 3000   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:35:05,991-Speed 13756.91 samples/sec   Loss 22.3702   LearningRate 0.0004   Epoch: 1   Global Step: 3010   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:35:23,878-Speed 13740.51 samples/sec   Loss 22.3141   LearningRate 0.0004   Epoch: 1   Global Step: 3020   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:35:41,764-Speed 13741.11 samples/sec   Loss 22.1622   LearningRate 0.0004   Epoch: 1   Global Step: 3030   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:35:59,586-Speed 13790.78 samples/sec   Loss 21.9911   LearningRate 0.0004   Epoch: 1   Global Step: 3040   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:36:17,473-Speed 13740.26 samples/sec   Loss 21.9280   LearningRate 0.0004   Epoch: 1   Global Step: 3050   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:36:35,605-Speed 13554.32 samples/sec   Loss 21.7514   LearningRate 0.0004   Epoch: 1   Global Step: 3060   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:36:53,449-Speed 13773.99 samples/sec   Loss 21.6066   LearningRate 0.0004   Epoch: 1   Global Step: 3070   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:37:11,382-Speed 13705.38 samples/sec   Loss 21.4856   LearningRate 0.0004   Epoch: 1   Global Step: 3080   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:37:29,257-Speed 13749.76 samples/sec   Loss 21.3891   LearningRate 0.0004   Epoch: 1   Global Step: 3090   Fp16 Grad Scale: 32768   Required: 33 hours
Training: 2022-03-03 09:37:47,108-Speed 13768.06 samples/sec   Loss 21.2793   LearningRate 0.0004   Epoch: 1   Global Step: 3100   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:38:04,989-Speed 13745.07 samples/sec   Loss 21.1248   LearningRate 0.0004   Epoch: 1   Global Step: 3110   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:38:22,793-Speed 13804.79 samples/sec   Loss 20.9951   LearningRate 0.0005   Epoch: 1   Global Step: 3120   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:38:40,684-Speed 13737.66 samples/sec   Loss 20.8648   LearningRate 0.0005   Epoch: 1   Global Step: 3130   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:38:58,625-Speed 13698.84 samples/sec   Loss 20.7772   LearningRate 0.0005   Epoch: 1   Global Step: 3140   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:39:16,451-Speed 13787.01 samples/sec   Loss 20.7054   LearningRate 0.0005   Epoch: 1   Global Step: 3150   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:39:34,292-Speed 13776.42 samples/sec   Loss 20.5445   LearningRate 0.0005   Epoch: 1   Global Step: 3160   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:39:52,211-Speed 13715.63 samples/sec   Loss 20.5153   LearningRate 0.0005   Epoch: 1   Global Step: 3170   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:40:10,170-Speed 13685.23 samples/sec   Loss 20.3372   LearningRate 0.0005   Epoch: 1   Global Step: 3180   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:40:28,098-Speed 13708.98 samples/sec   Loss 20.2067   LearningRate 0.0005   Epoch: 1   Global Step: 3190   Fp16 Grad Scale: 65536   Required: 33 hours
Training: 2022-03-03 09:40:46,024-Speed 13710.96 samples/sec   Loss 20.1509   LearningRate 0.0005   Epoch: 1   Global Step: 3200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:41:03,892-Speed 13756.85 samples/sec   Loss 20.0018   LearningRate 0.0005   Epoch: 1   Global Step: 3210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:41:21,840-Speed 13693.20 samples/sec   Loss 19.8522   LearningRate 0.0005   Epoch: 1   Global Step: 3220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:41:39,711-Speed 13753.16 samples/sec   Loss 19.7650   LearningRate 0.0005   Epoch: 1   Global Step: 3230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:41:57,666-Speed 13688.29 samples/sec   Loss 19.6733   LearningRate 0.0005   Epoch: 1   Global Step: 3240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:42:15,512-Speed 13772.17 samples/sec   Loss 19.5675   LearningRate 0.0005   Epoch: 1   Global Step: 3250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:42:33,401-Speed 13738.68 samples/sec   Loss 19.4078   LearningRate 0.0005   Epoch: 1   Global Step: 3260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:42:51,381-Speed 13669.61 samples/sec   Loss 19.3313   LearningRate 0.0005   Epoch: 1   Global Step: 3270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:43:09,282-Speed 13730.03 samples/sec   Loss 19.2461   LearningRate 0.0005   Epoch: 1   Global Step: 3280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:43:27,157-Speed 13749.70 samples/sec   Loss 19.1304   LearningRate 0.0005   Epoch: 1   Global Step: 3290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:43:44,991-Speed 13780.59 samples/sec   Loss 19.0403   LearningRate 0.0005   Epoch: 1   Global Step: 3300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:44:03,008-Speed 13641.51 samples/sec   Loss 18.9700   LearningRate 0.0005   Epoch: 1   Global Step: 3310   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:44:20,807-Speed 13808.53 samples/sec   Loss 18.8407   LearningRate 0.0005   Epoch: 1   Global Step: 3320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:44:38,645-Speed 13778.82 samples/sec   Loss 18.6800   LearningRate 0.0005   Epoch: 1   Global Step: 3330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:44:56,498-Speed 13766.54 samples/sec   Loss 18.6049   LearningRate 0.0005   Epoch: 1   Global Step: 3340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:45:14,384-Speed 13741.48 samples/sec   Loss 18.5446   LearningRate 0.0005   Epoch: 1   Global Step: 3350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:45:32,263-Speed 13746.52 samples/sec   Loss 18.4337   LearningRate 0.0005   Epoch: 1   Global Step: 3360   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:45:50,157-Speed 13735.12 samples/sec   Loss 18.3089   LearningRate 0.0005   Epoch: 1   Global Step: 3370   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:46:08,065-Speed 13724.17 samples/sec   Loss 18.1975   LearningRate 0.0005   Epoch: 1   Global Step: 3380   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:46:25,961-Speed 13734.57 samples/sec   Loss 18.0874   LearningRate 0.0005   Epoch: 1   Global Step: 3390   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:46:43,887-Speed 13710.29 samples/sec   Loss 18.0297   LearningRate 0.0005   Epoch: 1   Global Step: 3400   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:47:01,709-Speed 13790.66 samples/sec   Loss 17.8903   LearningRate 0.0005   Epoch: 1   Global Step: 3410   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:47:19,682-Speed 13674.53 samples/sec   Loss 17.8396   LearningRate 0.0005   Epoch: 1   Global Step: 3420   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:47:37,540-Speed 13763.25 samples/sec   Loss 17.7496   LearningRate 0.0005   Epoch: 1   Global Step: 3430   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:47:55,399-Speed 13763.76 samples/sec   Loss 17.6247   LearningRate 0.0005   Epoch: 1   Global Step: 3440   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:48:13,254-Speed 13764.72 samples/sec   Loss 17.5910   LearningRate 0.0005   Epoch: 1   Global Step: 3450   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:49:21,084-Speed 3623.24 samples/sec   Loss 17.4380   LearningRate 0.0005   Epoch: 2   Global Step: 3460   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:49:38,953-Speed 13755.02 samples/sec   Loss 17.2721   LearningRate 0.0005   Epoch: 2   Global Step: 3470   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:49:56,762-Speed 13799.96 samples/sec   Loss 17.2015   LearningRate 0.0005   Epoch: 2   Global Step: 3480   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:50:14,694-Speed 13706.33 samples/sec   Loss 17.1504   LearningRate 0.0005   Epoch: 2   Global Step: 3490   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:50:32,556-Speed 13759.83 samples/sec   Loss 17.0511   LearningRate 0.0005   Epoch: 2   Global Step: 3500   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-03-03 09:50:50,423-Speed 13756.00 samples/sec   Loss 16.9772   LearningRate 0.0005   Epoch: 2   Global Step: 3510   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:51:08,431-Speed 13647.24 samples/sec   Loss 16.9112   LearningRate 0.0005   Epoch: 2   Global Step: 3520   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:51:26,306-Speed 13749.59 samples/sec   Loss 16.7676   LearningRate 0.0005   Epoch: 2   Global Step: 3530   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:51:44,155-Speed 13770.34 samples/sec   Loss 16.6862   LearningRate 0.0005   Epoch: 2   Global Step: 3540   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:52:02,010-Speed 13765.30 samples/sec   Loss 16.6608   LearningRate 0.0005   Epoch: 2   Global Step: 3550   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:52:19,897-Speed 13740.21 samples/sec   Loss 16.6435   LearningRate 0.0005   Epoch: 2   Global Step: 3560   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:52:38,023-Speed 13559.35 samples/sec   Loss 16.4541   LearningRate 0.0005   Epoch: 2   Global Step: 3570   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:52:55,992-Speed 13677.60 samples/sec   Loss 16.3415   LearningRate 0.0005   Epoch: 2   Global Step: 3580   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:53:13,997-Speed 13650.44 samples/sec   Loss 16.2446   LearningRate 0.0005   Epoch: 2   Global Step: 3590   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:53:31,963-Speed 13680.01 samples/sec   Loss 16.2461   LearningRate 0.0005   Epoch: 2   Global Step: 3600   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:53:49,926-Speed 13681.93 samples/sec   Loss 16.1302   LearningRate 0.0005   Epoch: 2   Global Step: 3610   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:54:07,846-Speed 13715.33 samples/sec   Loss 16.0572   LearningRate 0.0005   Epoch: 2   Global Step: 3620   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:54:25,725-Speed 13746.89 samples/sec   Loss 16.0477   LearningRate 0.0005   Epoch: 2   Global Step: 3630   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:54:43,820-Speed 13582.35 samples/sec   Loss 15.9614   LearningRate 0.0005   Epoch: 2   Global Step: 3640   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:55:01,745-Speed 13711.72 samples/sec   Loss 15.8111   LearningRate 0.0005   Epoch: 2   Global Step: 3650   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:55:19,583-Speed 13778.06 samples/sec   Loss 15.7780   LearningRate 0.0005   Epoch: 2   Global Step: 3660   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:55:37,532-Speed 13692.67 samples/sec   Loss 15.6485   LearningRate 0.0005   Epoch: 2   Global Step: 3670   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:55:55,412-Speed 13746.06 samples/sec   Loss 15.5662   LearningRate 0.0005   Epoch: 2   Global Step: 3680   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:56:13,349-Speed 13702.44 samples/sec   Loss 15.4928   LearningRate 0.0005   Epoch: 2   Global Step: 3690   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:56:31,241-Speed 13736.88 samples/sec   Loss 15.4393   LearningRate 0.0005   Epoch: 2   Global Step: 3700   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:56:49,032-Speed 13814.71 samples/sec   Loss 15.3704   LearningRate 0.0005   Epoch: 2   Global Step: 3710   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:57:06,972-Speed 13699.68 samples/sec   Loss 15.2785   LearningRate 0.0005   Epoch: 2   Global Step: 3720   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:57:24,871-Speed 13731.06 samples/sec   Loss 15.2103   LearningRate 0.0005   Epoch: 2   Global Step: 3730   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:57:42,727-Speed 13764.54 samples/sec   Loss 15.1511   LearningRate 0.0005   Epoch: 2   Global Step: 3740   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:58:00,559-Speed 13783.20 samples/sec   Loss 15.1083   LearningRate 0.0005   Epoch: 2   Global Step: 3750   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:58:18,462-Speed 13728.07 samples/sec   Loss 15.0011   LearningRate 0.0005   Epoch: 2   Global Step: 3760   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:58:36,350-Speed 13739.81 samples/sec   Loss 14.8975   LearningRate 0.0005   Epoch: 2   Global Step: 3770   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:58:54,281-Speed 13706.85 samples/sec   Loss 14.8750   LearningRate 0.0005   Epoch: 2   Global Step: 3780   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:59:12,119-Speed 13777.66 samples/sec   Loss 14.7159   LearningRate 0.0005   Epoch: 2   Global Step: 3790   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:59:29,971-Speed 13767.59 samples/sec   Loss 14.7004   LearningRate 0.0005   Epoch: 2   Global Step: 3800   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 09:59:47,843-Speed 13751.87 samples/sec   Loss 14.6226   LearningRate 0.0006   Epoch: 2   Global Step: 3810   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:00:05,870-Speed 13633.56 samples/sec   Loss 14.6076   LearningRate 0.0006   Epoch: 2   Global Step: 3820   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:00:23,695-Speed 13788.07 samples/sec   Loss 14.5568   LearningRate 0.0006   Epoch: 2   Global Step: 3830   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:00:41,509-Speed 13797.46 samples/sec   Loss 14.4441   LearningRate 0.0006   Epoch: 2   Global Step: 3840   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:00:59,450-Speed 13699.00 samples/sec   Loss 14.2926   LearningRate 0.0006   Epoch: 2   Global Step: 3850   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:01:17,300-Speed 13768.58 samples/sec   Loss 14.2915   LearningRate 0.0006   Epoch: 2   Global Step: 3860   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:01:35,159-Speed 13761.83 samples/sec   Loss 14.1843   LearningRate 0.0006   Epoch: 2   Global Step: 3870   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:01:53,071-Speed 13721.96 samples/sec   Loss 14.1817   LearningRate 0.0006   Epoch: 2   Global Step: 3880   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:02:11,030-Speed 13684.88 samples/sec   Loss 14.1055   LearningRate 0.0006   Epoch: 2   Global Step: 3890   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:02:28,894-Speed 13758.43 samples/sec   Loss 14.0630   LearningRate 0.0006   Epoch: 2   Global Step: 3900   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:02:46,721-Speed 13786.96 samples/sec   Loss 13.9432   LearningRate 0.0006   Epoch: 2   Global Step: 3910   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:03:04,650-Speed 13708.36 samples/sec   Loss 13.8881   LearningRate 0.0006   Epoch: 2   Global Step: 3920   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:03:22,647-Speed 13656.44 samples/sec   Loss 13.8539   LearningRate 0.0006   Epoch: 2   Global Step: 3930   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:03:40,598-Speed 13691.38 samples/sec   Loss 13.8137   LearningRate 0.0006   Epoch: 2   Global Step: 3940   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:03:58,481-Speed 13743.62 samples/sec   Loss 13.7871   LearningRate 0.0006   Epoch: 2   Global Step: 3950   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:04:16,374-Speed 13735.77 samples/sec   Loss 13.7344   LearningRate 0.0006   Epoch: 2   Global Step: 3960   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:04:34,214-Speed 13777.27 samples/sec   Loss 13.6199   LearningRate 0.0006   Epoch: 2   Global Step: 3970   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:04:52,060-Speed 13771.69 samples/sec   Loss 13.5125   LearningRate 0.0006   Epoch: 2   Global Step: 3980   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:05:09,903-Speed 13773.85 samples/sec   Loss 13.4485   LearningRate 0.0006   Epoch: 2   Global Step: 3990   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:05:27,674-Speed 13831.15 samples/sec   Loss 13.4629   LearningRate 0.0006   Epoch: 2   Global Step: 4000   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:05:45,446-Speed 13829.03 samples/sec   Loss 13.4473   LearningRate 0.0006   Epoch: 2   Global Step: 4010   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:06:03,271-Speed 13788.68 samples/sec   Loss 13.3518   LearningRate 0.0006   Epoch: 2   Global Step: 4020   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:06:21,054-Speed 13820.89 samples/sec   Loss 13.2399   LearningRate 0.0006   Epoch: 2   Global Step: 4030   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:06:38,857-Speed 13805.22 samples/sec   Loss 13.1824   LearningRate 0.0006   Epoch: 2   Global Step: 4040   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:06:56,572-Speed 13873.79 samples/sec   Loss 13.1503   LearningRate 0.0006   Epoch: 2   Global Step: 4050   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:07:14,320-Speed 13847.65 samples/sec   Loss 13.0836   LearningRate 0.0006   Epoch: 2   Global Step: 4060   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:07:32,087-Speed 13835.50 samples/sec   Loss 13.0263   LearningRate 0.0006   Epoch: 2   Global Step: 4070   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:07:49,898-Speed 13800.55 samples/sec   Loss 12.9696   LearningRate 0.0006   Epoch: 2   Global Step: 4080   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:08:07,700-Speed 13806.51 samples/sec   Loss 12.9287   LearningRate 0.0006   Epoch: 2   Global Step: 4090   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:08:25,562-Speed 13759.23 samples/sec   Loss 12.8667   LearningRate 0.0006   Epoch: 2   Global Step: 4100   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:08:43,296-Speed 13859.67 samples/sec   Loss 12.9377   LearningRate 0.0006   Epoch: 2   Global Step: 4110   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:09:01,128-Speed 13782.40 samples/sec   Loss 12.8275   LearningRate 0.0006   Epoch: 2   Global Step: 4120   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:09:18,946-Speed 13793.47 samples/sec   Loss 12.7534   LearningRate 0.0006   Epoch: 2   Global Step: 4130   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:09:36,737-Speed 13814.65 samples/sec   Loss 12.7555   LearningRate 0.0006   Epoch: 2   Global Step: 4140   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:09:54,487-Speed 13846.74 samples/sec   Loss 12.6587   LearningRate 0.0006   Epoch: 2   Global Step: 4150   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:10:12,348-Speed 13760.76 samples/sec   Loss 12.5846   LearningRate 0.0006   Epoch: 2   Global Step: 4160   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:10:30,123-Speed 13826.83 samples/sec   Loss 12.5126   LearningRate 0.0006   Epoch: 2   Global Step: 4170   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:10:47,900-Speed 13825.15 samples/sec   Loss 12.5199   LearningRate 0.0006   Epoch: 2   Global Step: 4180   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:11:05,698-Speed 13810.04 samples/sec   Loss 12.4278   LearningRate 0.0006   Epoch: 2   Global Step: 4190   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:11:23,508-Speed 13799.85 samples/sec   Loss 12.3860   LearningRate 0.0006   Epoch: 2   Global Step: 4200   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:11:41,249-Speed 13852.99 samples/sec   Loss 12.3051   LearningRate 0.0006   Epoch: 2   Global Step: 4210   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:11:58,946-Speed 13887.75 samples/sec   Loss 12.2179   LearningRate 0.0006   Epoch: 2   Global Step: 4220   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:12:16,757-Speed 13799.28 samples/sec   Loss 12.2156   LearningRate 0.0006   Epoch: 2   Global Step: 4230   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:12:34,501-Speed 13851.15 samples/sec   Loss 12.1773   LearningRate 0.0006   Epoch: 2   Global Step: 4240   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:12:52,294-Speed 13813.45 samples/sec   Loss 12.1781   LearningRate 0.0006   Epoch: 2   Global Step: 4250   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:13:10,089-Speed 13811.51 samples/sec   Loss 12.0810   LearningRate 0.0006   Epoch: 2   Global Step: 4260   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:13:27,924-Speed 13780.81 samples/sec   Loss 12.0239   LearningRate 0.0006   Epoch: 2   Global Step: 4270   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:13:45,681-Speed 13842.22 samples/sec   Loss 11.9613   LearningRate 0.0006   Epoch: 2   Global Step: 4280   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:14:03,500-Speed 13792.99 samples/sec   Loss 12.0319   LearningRate 0.0006   Epoch: 2   Global Step: 4290   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:14:21,364-Speed 13758.18 samples/sec   Loss 11.9673   LearningRate 0.0006   Epoch: 2   Global Step: 4300   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:14:39,116-Speed 13844.58 samples/sec   Loss 11.8649   LearningRate 0.0006   Epoch: 2   Global Step: 4310   Fp16 Grad Scale: 262144   Required: 33 hours
Training: 2022-03-03 10:14:56,855-Speed 13854.88 samples/sec   Loss 11.8540   LearningRate 0.0006   Epoch: 2   Global Step: 4320   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:15:14,629-Speed 13828.66 samples/sec   Loss 11.7987   LearningRate 0.0006   Epoch: 2   Global Step: 4330   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:15:32,425-Speed 13811.50 samples/sec   Loss 11.7578   LearningRate 0.0006   Epoch: 2   Global Step: 4340   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:15:50,186-Speed 13838.07 samples/sec   Loss 11.7257   LearningRate 0.0006   Epoch: 2   Global Step: 4350   Fp16 Grad Scale: 131072   Required: 33 hours
Training: 2022-03-03 10:16:07,968-Speed 13821.07 samples/sec   Loss 11.7093   LearningRate 0.0006   Epoch: 2   Global Step: 4360   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:16:25,842-Speed 13750.38 samples/sec   Loss 11.6251   LearningRate 0.0006   Epoch: 2   Global Step: 4370   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:16:43,577-Speed 13858.37 samples/sec   Loss 11.5432   LearningRate 0.0006   Epoch: 2   Global Step: 4380   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:17:01,306-Speed 13862.94 samples/sec   Loss 11.5216   LearningRate 0.0006   Epoch: 2   Global Step: 4390   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:17:19,039-Speed 13860.16 samples/sec   Loss 11.5026   LearningRate 0.0006   Epoch: 2   Global Step: 4400   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:17:36,830-Speed 13814.14 samples/sec   Loss 11.4639   LearningRate 0.0006   Epoch: 2   Global Step: 4410   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:17:54,591-Speed 13838.24 samples/sec   Loss 11.3888   LearningRate 0.0006   Epoch: 2   Global Step: 4420   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:18:12,326-Speed 13859.19 samples/sec   Loss 11.3967   LearningRate 0.0006   Epoch: 2   Global Step: 4430   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:18:30,147-Speed 13790.87 samples/sec   Loss 11.3638   LearningRate 0.0006   Epoch: 2   Global Step: 4440   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:18:48,099-Speed 13690.99 samples/sec   Loss 11.3500   LearningRate 0.0006   Epoch: 2   Global Step: 4450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:19:06,171-Speed 13600.34 samples/sec   Loss 11.3081   LearningRate 0.0006   Epoch: 2   Global Step: 4460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:19:24,163-Speed 13659.79 samples/sec   Loss 11.2517   LearningRate 0.0006   Epoch: 2   Global Step: 4470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:19:42,271-Speed 13572.77 samples/sec   Loss 11.2499   LearningRate 0.0006   Epoch: 2   Global Step: 4480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:20:00,336-Speed 13605.37 samples/sec   Loss 11.1613   LearningRate 0.0006   Epoch: 2   Global Step: 4490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:20:18,438-Speed 13577.30 samples/sec   Loss 11.1475   LearningRate 0.0007   Epoch: 2   Global Step: 4500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:20:36,495-Speed 13611.00 samples/sec   Loss 11.0924   LearningRate 0.0007   Epoch: 2   Global Step: 4510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:20:54,585-Speed 13586.46 samples/sec   Loss 11.0383   LearningRate 0.0007   Epoch: 2   Global Step: 4520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:21:12,678-Speed 13583.60 samples/sec   Loss 11.0638   LearningRate 0.0007   Epoch: 2   Global Step: 4530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:21:30,686-Speed 13648.36 samples/sec   Loss 11.0390   LearningRate 0.0007   Epoch: 2   Global Step: 4540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:21:48,732-Speed 13618.98 samples/sec   Loss 10.9646   LearningRate 0.0007   Epoch: 2   Global Step: 4550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:22:06,846-Speed 13568.94 samples/sec   Loss 10.8788   LearningRate 0.0007   Epoch: 2   Global Step: 4560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:22:24,887-Speed 13622.37 samples/sec   Loss 10.8641   LearningRate 0.0007   Epoch: 2   Global Step: 4570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:22:42,967-Speed 13593.93 samples/sec   Loss 10.8553   LearningRate 0.0007   Epoch: 2   Global Step: 4580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:23:01,092-Speed 13560.25 samples/sec   Loss 10.7868   LearningRate 0.0007   Epoch: 2   Global Step: 4590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:23:19,165-Speed 13599.02 samples/sec   Loss 10.7302   LearningRate 0.0007   Epoch: 2   Global Step: 4600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:23:37,191-Speed 13634.71 samples/sec   Loss 10.7951   LearningRate 0.0007   Epoch: 2   Global Step: 4610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:23:55,283-Speed 13583.72 samples/sec   Loss 10.7043   LearningRate 0.0007   Epoch: 2   Global Step: 4620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:24:13,374-Speed 13586.23 samples/sec   Loss 10.7075   LearningRate 0.0007   Epoch: 2   Global Step: 4630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:24:31,420-Speed 13619.50 samples/sec   Loss 10.6534   LearningRate 0.0007   Epoch: 2   Global Step: 4640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:24:49,534-Speed 13568.19 samples/sec   Loss 10.6392   LearningRate 0.0007   Epoch: 2   Global Step: 4650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:25:07,719-Speed 13515.18 samples/sec   Loss 10.6968   LearningRate 0.0007   Epoch: 2   Global Step: 4660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:25:25,783-Speed 13605.09 samples/sec   Loss 10.5744   LearningRate 0.0007   Epoch: 2   Global Step: 4670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:25:43,880-Speed 13581.20 samples/sec   Loss 10.4908   LearningRate 0.0007   Epoch: 2   Global Step: 4680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:26:02,144-Speed 13456.35 samples/sec   Loss 10.5948   LearningRate 0.0007   Epoch: 2   Global Step: 4690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:26:20,176-Speed 13630.09 samples/sec   Loss 10.4815   LearningRate 0.0007   Epoch: 2   Global Step: 4700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:26:38,207-Speed 13631.19 samples/sec   Loss 10.4757   LearningRate 0.0007   Epoch: 2   Global Step: 4710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:26:56,263-Speed 13612.01 samples/sec   Loss 10.3633   LearningRate 0.0007   Epoch: 2   Global Step: 4720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:27:14,344-Speed 13592.44 samples/sec   Loss 10.3554   LearningRate 0.0007   Epoch: 2   Global Step: 4730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:27:32,517-Speed 13524.58 samples/sec   Loss 10.3275   LearningRate 0.0007   Epoch: 2   Global Step: 4740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:27:50,583-Speed 13604.16 samples/sec   Loss 10.3043   LearningRate 0.0007   Epoch: 2   Global Step: 4750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:28:08,736-Speed 13539.26 samples/sec   Loss 10.2927   LearningRate 0.0007   Epoch: 2   Global Step: 4760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:28:26,513-Speed 13824.63 samples/sec   Loss 10.2059   LearningRate 0.0007   Epoch: 2   Global Step: 4770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:28:44,305-Speed 13814.55 samples/sec   Loss 10.2282   LearningRate 0.0007   Epoch: 2   Global Step: 4780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:29:02,042-Speed 13856.45 samples/sec   Loss 10.1813   LearningRate 0.0007   Epoch: 2   Global Step: 4790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:29:19,771-Speed 13862.55 samples/sec   Loss 10.2093   LearningRate 0.0007   Epoch: 2   Global Step: 4800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:29:37,557-Speed 13818.64 samples/sec   Loss 10.2023   LearningRate 0.0007   Epoch: 2   Global Step: 4810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:29:55,306-Speed 13847.63 samples/sec   Loss 10.0780   LearningRate 0.0007   Epoch: 2   Global Step: 4820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:30:13,092-Speed 13818.91 samples/sec   Loss 10.0414   LearningRate 0.0007   Epoch: 2   Global Step: 4830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:30:30,819-Speed 13863.95 samples/sec   Loss 10.0319   LearningRate 0.0007   Epoch: 2   Global Step: 4840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:30:48,529-Speed 13878.14 samples/sec   Loss 10.0415   LearningRate 0.0007   Epoch: 2   Global Step: 4850   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:31:06,278-Speed 13847.75 samples/sec   Loss 10.0393   LearningRate 0.0007   Epoch: 2   Global Step: 4860   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:31:24,096-Speed 13793.49 samples/sec   Loss 9.9965   LearningRate 0.0007   Epoch: 2   Global Step: 4870   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:31:41,922-Speed 13787.66 samples/sec   Loss 9.9941   LearningRate 0.0007   Epoch: 2   Global Step: 4880   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:31:59,816-Speed 13734.78 samples/sec   Loss 9.8827   LearningRate 0.0007   Epoch: 2   Global Step: 4890   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:32:17,606-Speed 13815.42 samples/sec   Loss 9.9247   LearningRate 0.0007   Epoch: 2   Global Step: 4900   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:32:35,360-Speed 13843.64 samples/sec   Loss 9.8317   LearningRate 0.0007   Epoch: 2   Global Step: 4910   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:32:53,113-Speed 13843.86 samples/sec   Loss 9.9766   LearningRate 0.0007   Epoch: 2   Global Step: 4920   Fp16 Grad Scale: 262144   Required: 32 hours
Training: 2022-03-03 10:33:10,824-Speed 13877.20 samples/sec   Loss 9.8587   LearningRate 0.0007   Epoch: 2   Global Step: 4930   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:33:28,637-Speed 13797.54 samples/sec   Loss 9.8553   LearningRate 0.0007   Epoch: 2   Global Step: 4940   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:33:46,365-Speed 13863.66 samples/sec   Loss 9.8242   LearningRate 0.0007   Epoch: 2   Global Step: 4950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:34:04,185-Speed 13792.55 samples/sec   Loss 9.7939   LearningRate 0.0007   Epoch: 2   Global Step: 4960   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:34:21,973-Speed 13816.51 samples/sec   Loss 9.7381   LearningRate 0.0007   Epoch: 2   Global Step: 4970   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:34:39,719-Speed 13850.18 samples/sec   Loss 9.7562   LearningRate 0.0007   Epoch: 2   Global Step: 4980   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:34:57,472-Speed 13843.53 samples/sec   Loss 9.6810   LearningRate 0.0007   Epoch: 2   Global Step: 4990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:35:15,270-Speed 13809.55 samples/sec   Loss 9.6715   LearningRate 0.0007   Epoch: 2   Global Step: 5000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:35:33,042-Speed 13829.19 samples/sec   Loss 9.6479   LearningRate 0.0007   Epoch: 2   Global Step: 5010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:35:50,802-Speed 13838.53 samples/sec   Loss 9.6420   LearningRate 0.0007   Epoch: 2   Global Step: 5020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:36:08,598-Speed 13811.13 samples/sec   Loss 9.6333   LearningRate 0.0007   Epoch: 2   Global Step: 5030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:36:26,334-Speed 13856.94 samples/sec   Loss 9.5596   LearningRate 0.0007   Epoch: 2   Global Step: 5040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:36:44,228-Speed 13735.48 samples/sec   Loss 9.5516   LearningRate 0.0007   Epoch: 2   Global Step: 5050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:37:02,061-Speed 13782.02 samples/sec   Loss 9.6145   LearningRate 0.0007   Epoch: 2   Global Step: 5060   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:37:19,812-Speed 13845.33 samples/sec   Loss 9.5411   LearningRate 0.0007   Epoch: 2   Global Step: 5070   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:37:37,642-Speed 13784.73 samples/sec   Loss 9.4940   LearningRate 0.0007   Epoch: 2   Global Step: 5080   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:37:55,386-Speed 13851.64 samples/sec   Loss 9.4702   LearningRate 0.0007   Epoch: 2   Global Step: 5090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:38:13,092-Speed 13880.24 samples/sec   Loss 9.4059   LearningRate 0.0007   Epoch: 2   Global Step: 5100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:38:30,782-Speed 13893.89 samples/sec   Loss 9.4620   LearningRate 0.0007   Epoch: 2   Global Step: 5110   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:38:48,573-Speed 13815.31 samples/sec   Loss 9.4602   LearningRate 0.0007   Epoch: 2   Global Step: 5120   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:39:06,326-Speed 13844.42 samples/sec   Loss 9.4277   LearningRate 0.0007   Epoch: 2   Global Step: 5130   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:39:24,036-Speed 13878.12 samples/sec   Loss 9.4240   LearningRate 0.0007   Epoch: 2   Global Step: 5140   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:39:41,761-Speed 13865.71 samples/sec   Loss 9.4218   LearningRate 0.0007   Epoch: 2   Global Step: 5150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:39:59,486-Speed 13866.20 samples/sec   Loss 9.3741   LearningRate 0.0007   Epoch: 2   Global Step: 5160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:40:17,201-Speed 13873.76 samples/sec   Loss 9.4015   LearningRate 0.0007   Epoch: 2   Global Step: 5170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:40:34,944-Speed 13853.98 samples/sec   Loss 9.3555   LearningRate 0.0007   Epoch: 2   Global Step: 5180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:41:42,586-Speed 3633.26 samples/sec   Loss 9.2058   LearningRate 0.0008   Epoch: 3   Global Step: 5190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:42:00,306-Speed 13869.76 samples/sec   Loss 9.1614   LearningRate 0.0008   Epoch: 3   Global Step: 5200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:42:18,054-Speed 13848.31 samples/sec   Loss 9.1329   LearningRate 0.0008   Epoch: 3   Global Step: 5210   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:42:35,771-Speed 13872.12 samples/sec   Loss 9.1619   LearningRate 0.0008   Epoch: 3   Global Step: 5220   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:42:53,495-Speed 13867.20 samples/sec   Loss 9.1391   LearningRate 0.0008   Epoch: 3   Global Step: 5230   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:43:11,295-Speed 13807.64 samples/sec   Loss 9.0847   LearningRate 0.0008   Epoch: 3   Global Step: 5240   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:43:28,986-Speed 13892.48 samples/sec   Loss 9.0987   LearningRate 0.0008   Epoch: 3   Global Step: 5250   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:43:46,771-Speed 13819.34 samples/sec   Loss 9.0874   LearningRate 0.0008   Epoch: 3   Global Step: 5260   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:44:04,587-Speed 13794.77 samples/sec   Loss 9.0138   LearningRate 0.0008   Epoch: 3   Global Step: 5270   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:44:22,350-Speed 13836.64 samples/sec   Loss 9.0256   LearningRate 0.0008   Epoch: 3   Global Step: 5280   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:44:40,137-Speed 13818.94 samples/sec   Loss 9.0360   LearningRate 0.0008   Epoch: 3   Global Step: 5290   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:44:57,867-Speed 13861.61 samples/sec   Loss 8.9814   LearningRate 0.0008   Epoch: 3   Global Step: 5300   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:45:15,662-Speed 13811.98 samples/sec   Loss 8.9942   LearningRate 0.0008   Epoch: 3   Global Step: 5310   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:45:33,501-Speed 13776.72 samples/sec   Loss 8.9716   LearningRate 0.0008   Epoch: 3   Global Step: 5320   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:45:51,301-Speed 13808.02 samples/sec   Loss 8.9904   LearningRate 0.0008   Epoch: 3   Global Step: 5330   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:46:09,233-Speed 13705.98 samples/sec   Loss 8.9571   LearningRate 0.0008   Epoch: 3   Global Step: 5340   Fp16 Grad Scale: 32768   Required: 32 hours
Training: 2022-03-03 10:46:27,115-Speed 13743.69 samples/sec   Loss 9.0349   LearningRate 0.0008   Epoch: 3   Global Step: 5350   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:46:44,836-Speed 13869.09 samples/sec   Loss 8.9203   LearningRate 0.0008   Epoch: 3   Global Step: 5360   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:47:02,708-Speed 13751.98 samples/sec   Loss 8.8564   LearningRate 0.0008   Epoch: 3   Global Step: 5370   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:47:20,470-Speed 13837.33 samples/sec   Loss 8.8409   LearningRate 0.0008   Epoch: 3   Global Step: 5380   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:47:38,202-Speed 13860.15 samples/sec   Loss 8.8298   LearningRate 0.0008   Epoch: 3   Global Step: 5390   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:47:55,977-Speed 13827.63 samples/sec   Loss 8.8980   LearningRate 0.0008   Epoch: 3   Global Step: 5400   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:48:13,753-Speed 13826.42 samples/sec   Loss 8.8237   LearningRate 0.0008   Epoch: 3   Global Step: 5410   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:48:31,518-Speed 13834.72 samples/sec   Loss 8.8194   LearningRate 0.0008   Epoch: 3   Global Step: 5420   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:48:49,266-Speed 13847.83 samples/sec   Loss 8.7002   LearningRate 0.0008   Epoch: 3   Global Step: 5430   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:49:07,064-Speed 13809.50 samples/sec   Loss 8.7778   LearningRate 0.0008   Epoch: 3   Global Step: 5440   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 10:49:24,774-Speed 13877.81 samples/sec   Loss 8.7554   LearningRate 0.0008   Epoch: 3   Global Step: 5450   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:49:42,547-Speed 13828.46 samples/sec   Loss 8.8205   LearningRate 0.0008   Epoch: 3   Global Step: 5460   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:50:00,371-Speed 13789.22 samples/sec   Loss 8.7396   LearningRate 0.0008   Epoch: 3   Global Step: 5470   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:50:18,143-Speed 13829.03 samples/sec   Loss 8.7064   LearningRate 0.0008   Epoch: 3   Global Step: 5480   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:50:35,886-Speed 13851.87 samples/sec   Loss 8.6784   LearningRate 0.0008   Epoch: 3   Global Step: 5490   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:50:53,655-Speed 13831.96 samples/sec   Loss 8.6061   LearningRate 0.0008   Epoch: 3   Global Step: 5500   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:51:11,455-Speed 13807.61 samples/sec   Loss 8.6355   LearningRate 0.0008   Epoch: 3   Global Step: 5510   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:51:29,274-Speed 13793.34 samples/sec   Loss 8.6202   LearningRate 0.0008   Epoch: 3   Global Step: 5520   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:51:47,084-Speed 13799.71 samples/sec   Loss 8.5954   LearningRate 0.0008   Epoch: 3   Global Step: 5530   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:52:04,876-Speed 13813.77 samples/sec   Loss 8.6206   LearningRate 0.0008   Epoch: 3   Global Step: 5540   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:52:22,618-Speed 13852.59 samples/sec   Loss 8.6838   LearningRate 0.0008   Epoch: 3   Global Step: 5550   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:52:40,423-Speed 13804.63 samples/sec   Loss 8.6419   LearningRate 0.0008   Epoch: 3   Global Step: 5560   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:52:58,206-Speed 13822.10 samples/sec   Loss 8.5643   LearningRate 0.0008   Epoch: 3   Global Step: 5570   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:53:15,951-Speed 13850.22 samples/sec   Loss 8.5148   LearningRate 0.0008   Epoch: 3   Global Step: 5580   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:53:33,676-Speed 13866.06 samples/sec   Loss 8.4501   LearningRate 0.0008   Epoch: 3   Global Step: 5590   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:53:51,502-Speed 13787.14 samples/sec   Loss 8.5865   LearningRate 0.0008   Epoch: 3   Global Step: 5600   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:54:09,223-Speed 13869.70 samples/sec   Loss 8.4834   LearningRate 0.0008   Epoch: 3   Global Step: 5610   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:54:26,993-Speed 13830.85 samples/sec   Loss 8.4266   LearningRate 0.0008   Epoch: 3   Global Step: 5620   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:54:44,760-Speed 13833.19 samples/sec   Loss 8.4901   LearningRate 0.0008   Epoch: 3   Global Step: 5630   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:55:02,598-Speed 13778.03 samples/sec   Loss 8.4355   LearningRate 0.0008   Epoch: 3   Global Step: 5640   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:55:20,383-Speed 13819.13 samples/sec   Loss 8.4356   LearningRate 0.0008   Epoch: 3   Global Step: 5650   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:55:38,164-Speed 13823.80 samples/sec   Loss 8.4333   LearningRate 0.0008   Epoch: 3   Global Step: 5660   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:55:55,983-Speed 13792.52 samples/sec   Loss 8.3614   LearningRate 0.0008   Epoch: 3   Global Step: 5670   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:56:13,738-Speed 13843.23 samples/sec   Loss 8.3327   LearningRate 0.0008   Epoch: 3   Global Step: 5680   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:56:31,577-Speed 13778.36 samples/sec   Loss 8.3549   LearningRate 0.0008   Epoch: 3   Global Step: 5690   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:56:49,321-Speed 13851.02 samples/sec   Loss 8.3495   LearningRate 0.0008   Epoch: 3   Global Step: 5700   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:57:07,114-Speed 13813.47 samples/sec   Loss 8.3570   LearningRate 0.0008   Epoch: 3   Global Step: 5710   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:57:25,028-Speed 13719.35 samples/sec   Loss 8.3023   LearningRate 0.0008   Epoch: 3   Global Step: 5720   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:57:42,848-Speed 13791.86 samples/sec   Loss 8.2486   LearningRate 0.0008   Epoch: 3   Global Step: 5730   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:58:00,690-Speed 13775.38 samples/sec   Loss 8.2982   LearningRate 0.0008   Epoch: 3   Global Step: 5740   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:58:18,545-Speed 13765.52 samples/sec   Loss 8.3405   LearningRate 0.0008   Epoch: 3   Global Step: 5750   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:58:36,266-Speed 13868.96 samples/sec   Loss 8.3031   LearningRate 0.0008   Epoch: 3   Global Step: 5760   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:58:53,981-Speed 13873.65 samples/sec   Loss 8.2758   LearningRate 0.0008   Epoch: 3   Global Step: 5770   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:59:11,719-Speed 13856.07 samples/sec   Loss 8.2624   LearningRate 0.0008   Epoch: 3   Global Step: 5780   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:59:29,499-Speed 13823.06 samples/sec   Loss 8.1798   LearningRate 0.0008   Epoch: 3   Global Step: 5790   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 10:59:47,297-Speed 13809.51 samples/sec   Loss 8.1933   LearningRate 0.0008   Epoch: 3   Global Step: 5800   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:00:05,084-Speed 13817.34 samples/sec   Loss 8.1857   LearningRate 0.0008   Epoch: 3   Global Step: 5810   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:00:22,841-Speed 13841.19 samples/sec   Loss 8.1749   LearningRate 0.0008   Epoch: 3   Global Step: 5820   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:00:40,663-Speed 13790.43 samples/sec   Loss 8.1440   LearningRate 0.0008   Epoch: 3   Global Step: 5830   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:00:58,370-Speed 13880.22 samples/sec   Loss 8.1235   LearningRate 0.0008   Epoch: 3   Global Step: 5840   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:01:16,049-Speed 13902.55 samples/sec   Loss 8.1656   LearningRate 0.0008   Epoch: 3   Global Step: 5850   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:01:33,819-Speed 13830.72 samples/sec   Loss 8.0975   LearningRate 0.0008   Epoch: 3   Global Step: 5860   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:01:51,564-Speed 13850.29 samples/sec   Loss 8.0329   LearningRate 0.0008   Epoch: 3   Global Step: 5870   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:02:09,329-Speed 13834.90 samples/sec   Loss 8.0611   LearningRate 0.0009   Epoch: 3   Global Step: 5880   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:02:27,071-Speed 13852.51 samples/sec   Loss 8.0166   LearningRate 0.0009   Epoch: 3   Global Step: 5890   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:02:44,859-Speed 13817.46 samples/sec   Loss 8.0842   LearningRate 0.0009   Epoch: 3   Global Step: 5900   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:03:02,536-Speed 13903.69 samples/sec   Loss 8.0597   LearningRate 0.0009   Epoch: 3   Global Step: 5910   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:03:20,412-Speed 13749.93 samples/sec   Loss 8.0502   LearningRate 0.0009   Epoch: 3   Global Step: 5920   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:03:38,096-Speed 13897.92 samples/sec   Loss 7.9858   LearningRate 0.0009   Epoch: 3   Global Step: 5930   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:03:55,838-Speed 13853.28 samples/sec   Loss 8.0511   LearningRate 0.0009   Epoch: 3   Global Step: 5940   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:04:13,603-Speed 13835.49 samples/sec   Loss 8.0138   LearningRate 0.0009   Epoch: 3   Global Step: 5950   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:04:31,272-Speed 13909.81 samples/sec   Loss 7.9892   LearningRate 0.0009   Epoch: 3   Global Step: 5960   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:04:49,003-Speed 13862.33 samples/sec   Loss 7.9692   LearningRate 0.0009   Epoch: 3   Global Step: 5970   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:05:06,808-Speed 13803.93 samples/sec   Loss 7.9789   LearningRate 0.0009   Epoch: 3   Global Step: 5980   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:05:24,577-Speed 13832.23 samples/sec   Loss 7.9013   LearningRate 0.0009   Epoch: 3   Global Step: 5990   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:05:42,260-Speed 13898.31 samples/sec   Loss 7.8475   LearningRate 0.0009   Epoch: 3   Global Step: 6000   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:06:00,007-Speed 13849.61 samples/sec   Loss 7.8437   LearningRate 0.0009   Epoch: 3   Global Step: 6010   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:06:17,756-Speed 13847.07 samples/sec   Loss 7.8577   LearningRate 0.0009   Epoch: 3   Global Step: 6020   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:06:35,461-Speed 13882.09 samples/sec   Loss 7.8226   LearningRate 0.0009   Epoch: 3   Global Step: 6030   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:06:53,173-Speed 13875.40 samples/sec   Loss 7.8788   LearningRate 0.0009   Epoch: 3   Global Step: 6040   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:07:10,850-Speed 13903.56 samples/sec   Loss 7.8260   LearningRate 0.0009   Epoch: 3   Global Step: 6050   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:07:28,626-Speed 13828.02 samples/sec   Loss 7.7779   LearningRate 0.0009   Epoch: 3   Global Step: 6060   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:07:46,431-Speed 13804.07 samples/sec   Loss 7.7919   LearningRate 0.0009   Epoch: 3   Global Step: 6070   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:08:04,187-Speed 13841.43 samples/sec   Loss 7.7967   LearningRate 0.0009   Epoch: 3   Global Step: 6080   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:08:21,866-Speed 13902.36 samples/sec   Loss 7.8695   LearningRate 0.0009   Epoch: 3   Global Step: 6090   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:08:39,560-Speed 13890.83 samples/sec   Loss 7.7519   LearningRate 0.0009   Epoch: 3   Global Step: 6100   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:08:57,344-Speed 13819.37 samples/sec   Loss 7.7390   LearningRate 0.0009   Epoch: 3   Global Step: 6110   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:09:15,094-Speed 13846.66 samples/sec   Loss 7.7998   LearningRate 0.0009   Epoch: 3   Global Step: 6120   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:09:32,894-Speed 13807.67 samples/sec   Loss 7.7781   LearningRate 0.0009   Epoch: 3   Global Step: 6130   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:09:50,692-Speed 13809.73 samples/sec   Loss 7.7308   LearningRate 0.0009   Epoch: 3   Global Step: 6140   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:10:08,493-Speed 13806.52 samples/sec   Loss 7.6352   LearningRate 0.0009   Epoch: 3   Global Step: 6150   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:10:26,213-Speed 13869.55 samples/sec   Loss 7.6544   LearningRate 0.0009   Epoch: 3   Global Step: 6160   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:10:43,931-Speed 13871.97 samples/sec   Loss 7.6337   LearningRate 0.0009   Epoch: 3   Global Step: 6170   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:11:01,667-Speed 13857.31 samples/sec   Loss 7.6323   LearningRate 0.0009   Epoch: 3   Global Step: 6180   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:11:19,356-Speed 13894.92 samples/sec   Loss 7.6765   LearningRate 0.0009   Epoch: 3   Global Step: 6190   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:11:37,136-Speed 13823.47 samples/sec   Loss 7.6130   LearningRate 0.0009   Epoch: 3   Global Step: 6200   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:11:55,041-Speed 13726.18 samples/sec   Loss 7.6326   LearningRate 0.0009   Epoch: 3   Global Step: 6210   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:12:12,915-Speed 13750.34 samples/sec   Loss 7.5820   LearningRate 0.0009   Epoch: 3   Global Step: 6220   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:12:30,637-Speed 13868.26 samples/sec   Loss 7.6288   LearningRate 0.0009   Epoch: 3   Global Step: 6230   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:12:48,381-Speed 13851.51 samples/sec   Loss 7.5696   LearningRate 0.0009   Epoch: 3   Global Step: 6240   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:13:06,146-Speed 13834.65 samples/sec   Loss 7.5637   LearningRate 0.0009   Epoch: 3   Global Step: 6250   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:13:23,838-Speed 13892.65 samples/sec   Loss 7.5502   LearningRate 0.0009   Epoch: 3   Global Step: 6260   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:13:41,546-Speed 13879.08 samples/sec   Loss 7.5417   LearningRate 0.0009   Epoch: 3   Global Step: 6270   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:13:59,307-Speed 13838.12 samples/sec   Loss 7.5335   LearningRate 0.0009   Epoch: 3   Global Step: 6280   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:14:17,071-Speed 13835.84 samples/sec   Loss 7.4937   LearningRate 0.0009   Epoch: 3   Global Step: 6290   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:14:34,795-Speed 13866.65 samples/sec   Loss 7.5509   LearningRate 0.0009   Epoch: 3   Global Step: 6300   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:14:52,491-Speed 13888.63 samples/sec   Loss 7.5001   LearningRate 0.0009   Epoch: 3   Global Step: 6310   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:15:10,254-Speed 13837.34 samples/sec   Loss 7.5129   LearningRate 0.0009   Epoch: 3   Global Step: 6320   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:15:28,005-Speed 13846.15 samples/sec   Loss 7.4951   LearningRate 0.0009   Epoch: 3   Global Step: 6330   Fp16 Grad Scale: 131072   Required: 32 hours
Training: 2022-03-03 11:15:45,845-Speed 13776.29 samples/sec   Loss 7.4669   LearningRate 0.0009   Epoch: 3   Global Step: 6340   Fp16 Grad Scale: 65536   Required: 32 hours
Training: 2022-03-03 11:16:03,561-Speed 13873.50 samples/sec   Loss 7.4764   LearningRate 0.0009   Epoch: 3   Global Step: 6350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:16:21,329-Speed 13832.35 samples/sec   Loss 7.5298   LearningRate 0.0009   Epoch: 3   Global Step: 6360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:16:39,123-Speed 13812.68 samples/sec   Loss 7.4059   LearningRate 0.0009   Epoch: 3   Global Step: 6370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:16:56,854-Speed 13860.84 samples/sec   Loss 7.4255   LearningRate 0.0009   Epoch: 3   Global Step: 6380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:17:14,687-Speed 13782.17 samples/sec   Loss 7.4025   LearningRate 0.0009   Epoch: 3   Global Step: 6390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:17:32,468-Speed 13822.66 samples/sec   Loss 7.3947   LearningRate 0.0009   Epoch: 3   Global Step: 6400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:17:50,343-Speed 13749.50 samples/sec   Loss 7.4148   LearningRate 0.0009   Epoch: 3   Global Step: 6410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:18:08,135-Speed 13813.63 samples/sec   Loss 7.3469   LearningRate 0.0009   Epoch: 3   Global Step: 6420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:18:25,846-Speed 13878.11 samples/sec   Loss 7.3420   LearningRate 0.0009   Epoch: 3   Global Step: 6430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:18:43,560-Speed 13874.38 samples/sec   Loss 7.3463   LearningRate 0.0009   Epoch: 3   Global Step: 6440   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:19:01,347-Speed 13818.31 samples/sec   Loss 7.3787   LearningRate 0.0009   Epoch: 3   Global Step: 6450   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:19:19,153-Speed 13802.24 samples/sec   Loss 7.3445   LearningRate 0.0009   Epoch: 3   Global Step: 6460   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:19:36,931-Speed 13825.47 samples/sec   Loss 7.3244   LearningRate 0.0009   Epoch: 3   Global Step: 6470   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:19:54,647-Speed 13872.97 samples/sec   Loss 7.3392   LearningRate 0.0009   Epoch: 3   Global Step: 6480   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:20:12,434-Speed 13817.25 samples/sec   Loss 7.3297   LearningRate 0.0009   Epoch: 3   Global Step: 6490   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:20:30,197-Speed 13836.53 samples/sec   Loss 7.2753   LearningRate 0.0009   Epoch: 3   Global Step: 6500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:20:47,910-Speed 13875.58 samples/sec   Loss 7.2408   LearningRate 0.0009   Epoch: 3   Global Step: 6510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:21:05,632-Speed 13868.55 samples/sec   Loss 7.3368   LearningRate 0.0009   Epoch: 3   Global Step: 6520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:21:23,332-Speed 13885.40 samples/sec   Loss 7.3690   LearningRate 0.0009   Epoch: 3   Global Step: 6530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:21:41,037-Speed 13881.24 samples/sec   Loss 7.3183   LearningRate 0.0009   Epoch: 3   Global Step: 6540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:21:58,813-Speed 13826.92 samples/sec   Loss 7.2687   LearningRate 0.0009   Epoch: 3   Global Step: 6550   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:22:16,671-Speed 13762.01 samples/sec   Loss 7.2648   LearningRate 0.0009   Epoch: 3   Global Step: 6560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:22:34,356-Speed 13897.52 samples/sec   Loss 7.2354   LearningRate 0.0010   Epoch: 3   Global Step: 6570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:22:52,054-Speed 13887.71 samples/sec   Loss 7.2721   LearningRate 0.0010   Epoch: 3   Global Step: 6580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:23:09,724-Speed 13909.08 samples/sec   Loss 7.1960   LearningRate 0.0010   Epoch: 3   Global Step: 6590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:23:27,420-Speed 13888.34 samples/sec   Loss 7.1189   LearningRate 0.0010   Epoch: 3   Global Step: 6600   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:23:45,147-Speed 13864.35 samples/sec   Loss 7.2032   LearningRate 0.0010   Epoch: 3   Global Step: 6610   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:24:02,896-Speed 13846.94 samples/sec   Loss 7.2223   LearningRate 0.0010   Epoch: 3   Global Step: 6620   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:24:20,641-Speed 13850.75 samples/sec   Loss 7.1755   LearningRate 0.0010   Epoch: 3   Global Step: 6630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:24:38,396-Speed 13842.39 samples/sec   Loss 7.1712   LearningRate 0.0010   Epoch: 3   Global Step: 6640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:24:56,185-Speed 13816.03 samples/sec   Loss 7.1562   LearningRate 0.0010   Epoch: 3   Global Step: 6650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:25:13,928-Speed 13851.30 samples/sec   Loss 7.1379   LearningRate 0.0010   Epoch: 3   Global Step: 6660   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:25:31,652-Speed 13867.12 samples/sec   Loss 7.0824   LearningRate 0.0010   Epoch: 3   Global Step: 6670   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:25:49,392-Speed 13853.76 samples/sec   Loss 7.1319   LearningRate 0.0010   Epoch: 3   Global Step: 6680   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:26:07,202-Speed 13800.03 samples/sec   Loss 7.1017   LearningRate 0.0010   Epoch: 3   Global Step: 6690   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:26:24,978-Speed 13826.55 samples/sec   Loss 7.1224   LearningRate 0.0010   Epoch: 3   Global Step: 6700   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:26:42,729-Speed 13846.10 samples/sec   Loss 7.0844   LearningRate 0.0010   Epoch: 3   Global Step: 6710   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:27:00,476-Speed 13848.16 samples/sec   Loss 7.1206   LearningRate 0.0010   Epoch: 3   Global Step: 6720   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:27:18,216-Speed 13854.57 samples/sec   Loss 7.0672   LearningRate 0.0010   Epoch: 3   Global Step: 6730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:27:35,896-Speed 13901.79 samples/sec   Loss 7.0799   LearningRate 0.0010   Epoch: 3   Global Step: 6740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:27:53,630-Speed 13858.58 samples/sec   Loss 7.0640   LearningRate 0.0010   Epoch: 3   Global Step: 6750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:28:11,315-Speed 13897.42 samples/sec   Loss 7.0414   LearningRate 0.0010   Epoch: 3   Global Step: 6760   Fp16 Grad Scale: 262144   Required: 31 hours
Training: 2022-03-03 11:28:29,010-Speed 13889.07 samples/sec   Loss 7.0256   LearningRate 0.0010   Epoch: 3   Global Step: 6770   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:28:46,668-Speed 13919.05 samples/sec   Loss 6.9943   LearningRate 0.0010   Epoch: 3   Global Step: 6780   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:29:04,473-Speed 13803.44 samples/sec   Loss 7.0218   LearningRate 0.0010   Epoch: 3   Global Step: 6790   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:29:22,288-Speed 13796.24 samples/sec   Loss 7.0620   LearningRate 0.0010   Epoch: 3   Global Step: 6800   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:29:39,969-Speed 13900.48 samples/sec   Loss 7.0222   LearningRate 0.0010   Epoch: 3   Global Step: 6810   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:29:57,692-Speed 13867.83 samples/sec   Loss 6.9909   LearningRate 0.0010   Epoch: 3   Global Step: 6820   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:30:15,463-Speed 13829.70 samples/sec   Loss 6.9650   LearningRate 0.0010   Epoch: 3   Global Step: 6830   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:30:33,296-Speed 13781.89 samples/sec   Loss 7.0010   LearningRate 0.0010   Epoch: 3   Global Step: 6840   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:30:51,039-Speed 13852.69 samples/sec   Loss 6.9439   LearningRate 0.0010   Epoch: 3   Global Step: 6850   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:31:08,740-Speed 13884.54 samples/sec   Loss 6.9994   LearningRate 0.0010   Epoch: 3   Global Step: 6860   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:31:26,476-Speed 13857.61 samples/sec   Loss 6.9402   LearningRate 0.0010   Epoch: 3   Global Step: 6870   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:31:44,314-Speed 13777.63 samples/sec   Loss 7.0064   LearningRate 0.0010   Epoch: 3   Global Step: 6880   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:32:02,015-Speed 13885.33 samples/sec   Loss 7.0281   LearningRate 0.0010   Epoch: 3   Global Step: 6890   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:32:19,824-Speed 13800.55 samples/sec   Loss 6.9503   LearningRate 0.0010   Epoch: 3   Global Step: 6900   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:32:37,579-Speed 13842.60 samples/sec   Loss 6.9493   LearningRate 0.0010   Epoch: 3   Global Step: 6910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:33:46,918-Speed 3544.37 samples/sec   Loss 6.8595   LearningRate 0.0010   Epoch: 4   Global Step: 6920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:34:04,771-Speed 13766.95 samples/sec   Loss 6.8145   LearningRate 0.0010   Epoch: 4   Global Step: 6930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:34:22,421-Speed 13925.61 samples/sec   Loss 6.7803   LearningRate 0.0010   Epoch: 4   Global Step: 6940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:34:40,110-Speed 13894.36 samples/sec   Loss 6.8227   LearningRate 0.0010   Epoch: 4   Global Step: 6950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:34:57,821-Speed 13877.44 samples/sec   Loss 6.8824   LearningRate 0.0010   Epoch: 4   Global Step: 6960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:35:15,536-Speed 13874.26 samples/sec   Loss 6.7643   LearningRate 0.0010   Epoch: 4   Global Step: 6970   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:35:33,211-Speed 13904.69 samples/sec   Loss 6.7486   LearningRate 0.0010   Epoch: 4   Global Step: 6980   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:35:50,997-Speed 13818.47 samples/sec   Loss 6.7839   LearningRate 0.0010   Epoch: 4   Global Step: 6990   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:36:08,707-Speed 13877.89 samples/sec   Loss 6.7819   LearningRate 0.0010   Epoch: 4   Global Step: 7000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:36:26,434-Speed 13864.20 samples/sec   Loss 6.7266   LearningRate 0.0010   Epoch: 4   Global Step: 7010   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:36:44,314-Speed 13745.55 samples/sec   Loss 6.6949   LearningRate 0.0010   Epoch: 4   Global Step: 7020   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:37:02,091-Speed 13825.74 samples/sec   Loss 6.7691   LearningRate 0.0010   Epoch: 4   Global Step: 7030   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:37:19,854-Speed 13836.39 samples/sec   Loss 6.7311   LearningRate 0.0010   Epoch: 4   Global Step: 7040   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:37:37,540-Speed 13896.52 samples/sec   Loss 6.6866   LearningRate 0.0010   Epoch: 4   Global Step: 7050   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:37:55,268-Speed 13863.83 samples/sec   Loss 6.7489   LearningRate 0.0010   Epoch: 4   Global Step: 7060   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:38:13,109-Speed 13775.42 samples/sec   Loss 6.6964   LearningRate 0.0010   Epoch: 4   Global Step: 7070   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:38:30,800-Speed 13892.93 samples/sec   Loss 6.6824   LearningRate 0.0010   Epoch: 4   Global Step: 7080   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:38:48,499-Speed 13886.93 samples/sec   Loss 6.6523   LearningRate 0.0010   Epoch: 4   Global Step: 7090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:39:06,170-Speed 13907.65 samples/sec   Loss 6.6479   LearningRate 0.0010   Epoch: 4   Global Step: 7100   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:39:23,933-Speed 13836.25 samples/sec   Loss 6.6578   LearningRate 0.0010   Epoch: 4   Global Step: 7110   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:39:41,615-Speed 13900.04 samples/sec   Loss 6.6562   LearningRate 0.0010   Epoch: 4   Global Step: 7120   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:39:59,340-Speed 13866.34 samples/sec   Loss 6.6721   LearningRate 0.0010   Epoch: 4   Global Step: 7130   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:40:17,062-Speed 13869.77 samples/sec   Loss 6.6484   LearningRate 0.0010   Epoch: 4   Global Step: 7140   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:40:34,779-Speed 13872.22 samples/sec   Loss 6.5812   LearningRate 0.0010   Epoch: 4   Global Step: 7150   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:40:52,542-Speed 13837.30 samples/sec   Loss 6.6120   LearningRate 0.0010   Epoch: 4   Global Step: 7160   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:41:10,253-Speed 13877.07 samples/sec   Loss 6.6608   LearningRate 0.0010   Epoch: 4   Global Step: 7170   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:41:27,939-Speed 13895.74 samples/sec   Loss 6.5617   LearningRate 0.0010   Epoch: 4   Global Step: 7180   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:41:45,782-Speed 13774.29 samples/sec   Loss 6.5695   LearningRate 0.0010   Epoch: 4   Global Step: 7190   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:42:03,475-Speed 13893.25 samples/sec   Loss 6.5599   LearningRate 0.0010   Epoch: 4   Global Step: 7200   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:42:21,212-Speed 13856.38 samples/sec   Loss 6.5351   LearningRate 0.0010   Epoch: 4   Global Step: 7210   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:42:38,957-Speed 13850.82 samples/sec   Loss 6.5420   LearningRate 0.0010   Epoch: 4   Global Step: 7220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:42:56,702-Speed 13850.59 samples/sec   Loss 6.5210   LearningRate 0.0010   Epoch: 4   Global Step: 7230   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:43:14,375-Speed 13906.35 samples/sec   Loss 6.5523   LearningRate 0.0010   Epoch: 4   Global Step: 7240   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:43:32,095-Speed 13870.57 samples/sec   Loss 6.4744   LearningRate 0.0010   Epoch: 4   Global Step: 7250   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:43:49,839-Speed 13850.86 samples/sec   Loss 6.4732   LearningRate 0.0010   Epoch: 4   Global Step: 7260   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:44:07,547-Speed 13880.18 samples/sec   Loss 6.4895   LearningRate 0.0010   Epoch: 4   Global Step: 7270   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:44:25,287-Speed 13853.88 samples/sec   Loss 6.5068   LearningRate 0.0010   Epoch: 4   Global Step: 7280   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:44:42,979-Speed 13892.36 samples/sec   Loss 6.4917   LearningRate 0.0010   Epoch: 4   Global Step: 7290   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:45:00,673-Speed 13889.66 samples/sec   Loss 6.4930   LearningRate 0.0010   Epoch: 4   Global Step: 7300   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:45:18,363-Speed 13894.34 samples/sec   Loss 6.4371   LearningRate 0.0010   Epoch: 4   Global Step: 7310   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:45:36,087-Speed 13866.97 samples/sec   Loss 6.4642   LearningRate 0.0010   Epoch: 4   Global Step: 7320   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:45:53,862-Speed 13826.86 samples/sec   Loss 6.4216   LearningRate 0.0010   Epoch: 4   Global Step: 7330   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:46:11,652-Speed 13815.28 samples/sec   Loss 6.4242   LearningRate 0.0010   Epoch: 4   Global Step: 7340   Fp16 Grad Scale: 32768   Required: 31 hours
Training: 2022-03-03 11:46:29,344-Speed 13892.26 samples/sec   Loss 6.4084   LearningRate 0.0010   Epoch: 4   Global Step: 7350   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:46:47,172-Speed 13785.64 samples/sec   Loss 6.4823   LearningRate 0.0010   Epoch: 4   Global Step: 7360   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:47:04,960-Speed 13816.52 samples/sec   Loss 6.4028   LearningRate 0.0010   Epoch: 4   Global Step: 7370   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:47:22,682-Speed 13868.85 samples/sec   Loss 6.3541   LearningRate 0.0010   Epoch: 4   Global Step: 7380   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:47:40,497-Speed 13796.21 samples/sec   Loss 6.3768   LearningRate 0.0010   Epoch: 4   Global Step: 7390   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:47:58,275-Speed 13824.61 samples/sec   Loss 6.3632   LearningRate 0.0010   Epoch: 4   Global Step: 7400   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:48:16,086-Speed 13799.07 samples/sec   Loss 6.3117   LearningRate 0.0010   Epoch: 4   Global Step: 7410   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:48:33,863-Speed 13825.45 samples/sec   Loss 6.3535   LearningRate 0.0010   Epoch: 4   Global Step: 7420   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:48:51,535-Speed 13907.46 samples/sec   Loss 6.3571   LearningRate 0.0010   Epoch: 4   Global Step: 7430   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:49:09,355-Speed 13795.24 samples/sec   Loss 6.2703   LearningRate 0.0010   Epoch: 4   Global Step: 7440   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:49:27,058-Speed 13883.11 samples/sec   Loss 6.2320   LearningRate 0.0010   Epoch: 4   Global Step: 7450   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:49:44,783-Speed 13866.31 samples/sec   Loss 6.2565   LearningRate 0.0010   Epoch: 4   Global Step: 7460   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:50:02,500-Speed 13872.11 samples/sec   Loss 6.2434   LearningRate 0.0010   Epoch: 4   Global Step: 7470   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:50:20,270-Speed 13832.41 samples/sec   Loss 6.3493   LearningRate 0.0010   Epoch: 4   Global Step: 7480   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:50:37,987-Speed 13873.30 samples/sec   Loss 6.2754   LearningRate 0.0010   Epoch: 4   Global Step: 7490   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:50:55,692-Speed 13881.93 samples/sec   Loss 6.2093   LearningRate 0.0010   Epoch: 4   Global Step: 7500   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:51:13,392-Speed 13885.43 samples/sec   Loss 6.2338   LearningRate 0.0010   Epoch: 4   Global Step: 7510   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:51:31,088-Speed 13889.34 samples/sec   Loss 6.2292   LearningRate 0.0010   Epoch: 4   Global Step: 7520   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:51:48,897-Speed 13800.00 samples/sec   Loss 6.2225   LearningRate 0.0010   Epoch: 4   Global Step: 7530   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:52:06,701-Speed 13805.82 samples/sec   Loss 6.1897   LearningRate 0.0010   Epoch: 4   Global Step: 7540   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:52:24,422-Speed 13868.88 samples/sec   Loss 6.1918   LearningRate 0.0010   Epoch: 4   Global Step: 7550   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:52:42,131-Speed 13879.06 samples/sec   Loss 6.1886   LearningRate 0.0010   Epoch: 4   Global Step: 7560   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:52:59,938-Speed 13802.92 samples/sec   Loss 6.1791   LearningRate 0.0010   Epoch: 4   Global Step: 7570   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:53:17,708-Speed 13830.77 samples/sec   Loss 6.1477   LearningRate 0.0010   Epoch: 4   Global Step: 7580   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:53:35,455-Speed 13848.56 samples/sec   Loss 6.2307   LearningRate 0.0010   Epoch: 4   Global Step: 7590   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:53:53,161-Speed 13880.95 samples/sec   Loss 6.1754   LearningRate 0.0010   Epoch: 4   Global Step: 7600   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:54:10,861-Speed 13886.16 samples/sec   Loss 6.1653   LearningRate 0.0010   Epoch: 4   Global Step: 7610   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:54:28,680-Speed 13792.79 samples/sec   Loss 6.1261   LearningRate 0.0010   Epoch: 4   Global Step: 7620   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:54:46,388-Speed 13879.52 samples/sec   Loss 6.1076   LearningRate 0.0010   Epoch: 4   Global Step: 7630   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:55:04,124-Speed 13856.96 samples/sec   Loss 6.0950   LearningRate 0.0010   Epoch: 4   Global Step: 7640   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:55:21,842-Speed 13872.18 samples/sec   Loss 6.1017   LearningRate 0.0010   Epoch: 4   Global Step: 7650   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:55:39,638-Speed 13810.30 samples/sec   Loss 6.1015   LearningRate 0.0010   Epoch: 4   Global Step: 7660   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:55:57,361-Speed 13867.82 samples/sec   Loss 6.1118   LearningRate 0.0010   Epoch: 4   Global Step: 7670   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:56:15,113-Speed 13845.02 samples/sec   Loss 6.0885   LearningRate 0.0010   Epoch: 4   Global Step: 7680   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:56:32,845-Speed 13860.20 samples/sec   Loss 6.0642   LearningRate 0.0010   Epoch: 4   Global Step: 7690   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:56:50,511-Speed 13912.27 samples/sec   Loss 6.1169   LearningRate 0.0010   Epoch: 4   Global Step: 7700   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:57:08,247-Speed 13857.08 samples/sec   Loss 6.0657   LearningRate 0.0010   Epoch: 4   Global Step: 7710   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:57:25,977-Speed 13862.42 samples/sec   Loss 5.9842   LearningRate 0.0010   Epoch: 4   Global Step: 7720   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:57:43,837-Speed 13761.03 samples/sec   Loss 5.9849   LearningRate 0.0010   Epoch: 4   Global Step: 7730   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:58:01,639-Speed 13806.57 samples/sec   Loss 6.0537   LearningRate 0.0010   Epoch: 4   Global Step: 7740   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:58:19,331-Speed 13891.32 samples/sec   Loss 5.9869   LearningRate 0.0010   Epoch: 4   Global Step: 7750   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:58:37,055-Speed 13866.84 samples/sec   Loss 5.9425   LearningRate 0.0010   Epoch: 4   Global Step: 7760   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 11:58:54,806-Speed 13846.30 samples/sec   Loss 6.0490   LearningRate 0.0010   Epoch: 4   Global Step: 7770   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:59:12,538-Speed 13860.08 samples/sec   Loss 5.9986   LearningRate 0.0010   Epoch: 4   Global Step: 7780   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:59:30,292-Speed 13845.11 samples/sec   Loss 5.9317   LearningRate 0.0010   Epoch: 4   Global Step: 7790   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 11:59:47,985-Speed 13890.87 samples/sec   Loss 5.9845   LearningRate 0.0010   Epoch: 4   Global Step: 7800   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:00:05,694-Speed 13878.31 samples/sec   Loss 5.9729   LearningRate 0.0010   Epoch: 4   Global Step: 7810   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:00:23,418-Speed 13866.87 samples/sec   Loss 5.9748   LearningRate 0.0010   Epoch: 4   Global Step: 7820   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:00:41,121-Speed 13883.66 samples/sec   Loss 5.9268   LearningRate 0.0010   Epoch: 4   Global Step: 7830   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:00:58,795-Speed 13906.85 samples/sec   Loss 5.8962   LearningRate 0.0010   Epoch: 4   Global Step: 7840   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:01:16,527-Speed 13861.18 samples/sec   Loss 5.9139   LearningRate 0.0010   Epoch: 4   Global Step: 7850   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:01:34,220-Speed 13890.47 samples/sec   Loss 5.8925   LearningRate 0.0010   Epoch: 4   Global Step: 7860   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:01:51,964-Speed 13851.69 samples/sec   Loss 5.8865   LearningRate 0.0010   Epoch: 4   Global Step: 7870   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:02:09,724-Speed 13838.63 samples/sec   Loss 5.8703   LearningRate 0.0010   Epoch: 4   Global Step: 7880   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:02:27,500-Speed 13827.19 samples/sec   Loss 5.8644   LearningRate 0.0010   Epoch: 4   Global Step: 7890   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:02:45,346-Speed 13771.86 samples/sec   Loss 5.9126   LearningRate 0.0010   Epoch: 4   Global Step: 7900   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:03:03,150-Speed 13805.09 samples/sec   Loss 5.8778   LearningRate 0.0010   Epoch: 4   Global Step: 7910   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:03:20,947-Speed 13810.64 samples/sec   Loss 5.8631   LearningRate 0.0010   Epoch: 4   Global Step: 7920   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:03:38,763-Speed 13795.00 samples/sec   Loss 5.8581   LearningRate 0.0010   Epoch: 4   Global Step: 7930   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:03:56,450-Speed 13895.15 samples/sec   Loss 5.8795   LearningRate 0.0010   Epoch: 4   Global Step: 7940   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:04:14,153-Speed 13883.69 samples/sec   Loss 5.8325   LearningRate 0.0010   Epoch: 4   Global Step: 7950   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:04:31,838-Speed 13897.41 samples/sec   Loss 5.7853   LearningRate 0.0010   Epoch: 4   Global Step: 7960   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:04:49,534-Speed 13888.95 samples/sec   Loss 5.7646   LearningRate 0.0010   Epoch: 4   Global Step: 7970   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:05:07,247-Speed 13874.97 samples/sec   Loss 5.8718   LearningRate 0.0010   Epoch: 4   Global Step: 7980   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:05:24,976-Speed 13862.76 samples/sec   Loss 5.9171   LearningRate 0.0010   Epoch: 4   Global Step: 7990   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:05:42,730-Speed 13844.14 samples/sec   Loss 5.8141   LearningRate 0.0010   Epoch: 4   Global Step: 8000   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:06:00,463-Speed 13860.00 samples/sec   Loss 5.7564   LearningRate 0.0010   Epoch: 4   Global Step: 8010   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:06:18,127-Speed 13913.85 samples/sec   Loss 5.8101   LearningRate 0.0010   Epoch: 4   Global Step: 8020   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:06:35,883-Speed 13841.35 samples/sec   Loss 5.7684   LearningRate 0.0010   Epoch: 4   Global Step: 8030   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:06:53,591-Speed 13879.44 samples/sec   Loss 5.7504   LearningRate 0.0010   Epoch: 4   Global Step: 8040   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:07:11,321-Speed 13862.13 samples/sec   Loss 5.7338   LearningRate 0.0010   Epoch: 4   Global Step: 8050   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:07:28,978-Speed 13920.77 samples/sec   Loss 5.7265   LearningRate 0.0010   Epoch: 4   Global Step: 8060   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:07:46,696-Speed 13870.78 samples/sec   Loss 5.7123   LearningRate 0.0010   Epoch: 4   Global Step: 8070   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:08:04,576-Speed 13747.40 samples/sec   Loss 5.7168   LearningRate 0.0010   Epoch: 4   Global Step: 8080   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:08:22,388-Speed 13797.97 samples/sec   Loss 5.7055   LearningRate 0.0010   Epoch: 4   Global Step: 8090   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:08:40,134-Speed 13850.21 samples/sec   Loss 5.6660   LearningRate 0.0010   Epoch: 4   Global Step: 8100   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:08:57,949-Speed 13796.66 samples/sec   Loss 5.6923   LearningRate 0.0010   Epoch: 4   Global Step: 8110   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:09:15,639-Speed 13893.58 samples/sec   Loss 5.6847   LearningRate 0.0010   Epoch: 4   Global Step: 8120   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:09:33,311-Speed 13907.45 samples/sec   Loss 5.6898   LearningRate 0.0010   Epoch: 4   Global Step: 8130   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:09:51,064-Speed 13843.81 samples/sec   Loss 5.6937   LearningRate 0.0010   Epoch: 4   Global Step: 8140   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:10:08,831-Speed 13833.49 samples/sec   Loss 5.6979   LearningRate 0.0010   Epoch: 4   Global Step: 8150   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:10:26,539-Speed 13879.53 samples/sec   Loss 5.6308   LearningRate 0.0010   Epoch: 4   Global Step: 8160   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:10:44,377-Speed 13778.22 samples/sec   Loss 5.5972   LearningRate 0.0010   Epoch: 4   Global Step: 8170   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:11:02,066-Speed 13894.01 samples/sec   Loss 5.6429   LearningRate 0.0010   Epoch: 4   Global Step: 8180   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:11:19,777-Speed 13877.39 samples/sec   Loss 5.6453   LearningRate 0.0010   Epoch: 4   Global Step: 8190   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:11:37,494-Speed 13871.77 samples/sec   Loss 5.6337   LearningRate 0.0010   Epoch: 4   Global Step: 8200   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:11:55,193-Speed 13886.31 samples/sec   Loss 5.6203   LearningRate 0.0010   Epoch: 4   Global Step: 8210   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:12:12,890-Speed 13888.15 samples/sec   Loss 5.5945   LearningRate 0.0010   Epoch: 4   Global Step: 8220   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:12:30,695-Speed 13803.68 samples/sec   Loss 5.6367   LearningRate 0.0010   Epoch: 4   Global Step: 8230   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:12:48,390-Speed 13890.21 samples/sec   Loss 5.5799   LearningRate 0.0010   Epoch: 4   Global Step: 8240   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:13:06,097-Speed 13879.53 samples/sec   Loss 5.6133   LearningRate 0.0010   Epoch: 4   Global Step: 8250   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:13:23,849-Speed 13845.03 samples/sec   Loss 5.5780   LearningRate 0.0010   Epoch: 4   Global Step: 8260   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:13:41,546-Speed 13887.97 samples/sec   Loss 5.5837   LearningRate 0.0010   Epoch: 4   Global Step: 8270   Fp16 Grad Scale: 131072   Required: 31 hours
Training: 2022-03-03 12:13:59,229-Speed 13899.63 samples/sec   Loss 5.5545   LearningRate 0.0010   Epoch: 4   Global Step: 8280   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:14:16,955-Speed 13865.17 samples/sec   Loss 5.5665   LearningRate 0.0010   Epoch: 4   Global Step: 8290   Fp16 Grad Scale: 65536   Required: 31 hours
Training: 2022-03-03 12:14:34,700-Speed 13849.92 samples/sec   Loss 5.6413   LearningRate 0.0010   Epoch: 4   Global Step: 8300   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:14:52,415-Speed 13873.82 samples/sec   Loss 5.5377   LearningRate 0.0010   Epoch: 4   Global Step: 8310   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:15:10,188-Speed 13829.26 samples/sec   Loss 5.5849   LearningRate 0.0010   Epoch: 4   Global Step: 8320   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:15:27,925-Speed 13856.60 samples/sec   Loss 5.5117   LearningRate 0.0010   Epoch: 4   Global Step: 8330   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:15:45,661-Speed 13857.06 samples/sec   Loss 5.5481   LearningRate 0.0010   Epoch: 4   Global Step: 8340   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:16:03,388-Speed 13865.14 samples/sec   Loss 5.5072   LearningRate 0.0010   Epoch: 4   Global Step: 8350   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:16:21,119-Speed 13861.23 samples/sec   Loss 5.5238   LearningRate 0.0010   Epoch: 4   Global Step: 8360   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:16:38,827-Speed 13878.78 samples/sec   Loss 5.4468   LearningRate 0.0010   Epoch: 4   Global Step: 8370   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:16:56,565-Speed 13856.45 samples/sec   Loss 5.4910   LearningRate 0.0010   Epoch: 4   Global Step: 8380   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:17:14,274-Speed 13878.64 samples/sec   Loss 5.4645   LearningRate 0.0010   Epoch: 4   Global Step: 8390   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:17:31,984-Speed 13877.81 samples/sec   Loss 5.4844   LearningRate 0.0010   Epoch: 4   Global Step: 8400   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:17:49,819-Speed 13780.38 samples/sec   Loss 5.5219   LearningRate 0.0010   Epoch: 4   Global Step: 8410   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:18:07,543-Speed 13866.89 samples/sec   Loss 5.4591   LearningRate 0.0010   Epoch: 4   Global Step: 8420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:18:25,287-Speed 13851.35 samples/sec   Loss 5.4413   LearningRate 0.0010   Epoch: 4   Global Step: 8430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:18:42,975-Speed 13894.46 samples/sec   Loss 5.4389   LearningRate 0.0010   Epoch: 4   Global Step: 8440   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:19:00,697-Speed 13868.04 samples/sec   Loss 5.4626   LearningRate 0.0010   Epoch: 4   Global Step: 8450   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:19:18,448-Speed 13845.77 samples/sec   Loss 5.4210   LearningRate 0.0010   Epoch: 4   Global Step: 8460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:19:36,154-Speed 13881.18 samples/sec   Loss 5.3834   LearningRate 0.0010   Epoch: 4   Global Step: 8470   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:19:53,854-Speed 13886.33 samples/sec   Loss 5.4207   LearningRate 0.0010   Epoch: 4   Global Step: 8480   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:20:11,590-Speed 13857.54 samples/sec   Loss 5.4703   LearningRate 0.0009   Epoch: 4   Global Step: 8490   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:20:29,253-Speed 13915.25 samples/sec   Loss 5.4452   LearningRate 0.0009   Epoch: 4   Global Step: 8500   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:20:46,947-Speed 13889.98 samples/sec   Loss 5.3985   LearningRate 0.0009   Epoch: 4   Global Step: 8510   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:21:04,702-Speed 13842.55 samples/sec   Loss 5.3866   LearningRate 0.0009   Epoch: 4   Global Step: 8520   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:21:22,417-Speed 13874.28 samples/sec   Loss 5.3425   LearningRate 0.0009   Epoch: 4   Global Step: 8530   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:21:40,172-Speed 13842.37 samples/sec   Loss 5.4026   LearningRate 0.0009   Epoch: 4   Global Step: 8540   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:21:57,908-Speed 13857.11 samples/sec   Loss 5.3651   LearningRate 0.0009   Epoch: 4   Global Step: 8550   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:22:15,612-Speed 13882.35 samples/sec   Loss 5.3694   LearningRate 0.0009   Epoch: 4   Global Step: 8560   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:22:33,260-Speed 13927.22 samples/sec   Loss 5.4168   LearningRate 0.0009   Epoch: 4   Global Step: 8570   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:22:50,968-Speed 13879.43 samples/sec   Loss 5.4318   LearningRate 0.0009   Epoch: 4   Global Step: 8580   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:23:08,708-Speed 13854.30 samples/sec   Loss 5.3632   LearningRate 0.0009   Epoch: 4   Global Step: 8590   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:23:26,359-Speed 13924.10 samples/sec   Loss 5.3415   LearningRate 0.0009   Epoch: 4   Global Step: 8600   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:23:44,095-Speed 13858.18 samples/sec   Loss 5.3200   LearningRate 0.0009   Epoch: 4   Global Step: 8610   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:24:01,860-Speed 13834.95 samples/sec   Loss 5.4301   LearningRate 0.0009   Epoch: 4   Global Step: 8620   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:24:19,598-Speed 13855.59 samples/sec   Loss 5.3968   LearningRate 0.0009   Epoch: 4   Global Step: 8630   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:24:37,319-Speed 13869.18 samples/sec   Loss 5.3822   LearningRate 0.0009   Epoch: 4   Global Step: 8640   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:25:45,088-Speed 3626.50 samples/sec   Loss 5.2538   LearningRate 0.0009   Epoch: 5   Global Step: 8650   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:26:02,780-Speed 13891.67 samples/sec   Loss 5.2590   LearningRate 0.0009   Epoch: 5   Global Step: 8660   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:26:20,467-Speed 13896.11 samples/sec   Loss 5.2486   LearningRate 0.0009   Epoch: 5   Global Step: 8670   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:26:38,101-Speed 13937.79 samples/sec   Loss 5.2361   LearningRate 0.0009   Epoch: 5   Global Step: 8680   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:26:55,888-Speed 13817.88 samples/sec   Loss 5.2126   LearningRate 0.0009   Epoch: 5   Global Step: 8690   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:27:13,611-Speed 13867.51 samples/sec   Loss 5.2032   LearningRate 0.0009   Epoch: 5   Global Step: 8700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:27:31,360-Speed 13847.29 samples/sec   Loss 5.2472   LearningRate 0.0009   Epoch: 5   Global Step: 8710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:27:49,063-Speed 13883.26 samples/sec   Loss 5.2589   LearningRate 0.0009   Epoch: 5   Global Step: 8720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:28:06,974-Speed 13721.49 samples/sec   Loss 5.2320   LearningRate 0.0009   Epoch: 5   Global Step: 8730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:28:24,773-Speed 13808.92 samples/sec   Loss 5.2187   LearningRate 0.0009   Epoch: 5   Global Step: 8740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:28:42,548-Speed 13827.95 samples/sec   Loss 5.2072   LearningRate 0.0009   Epoch: 5   Global Step: 8750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:29:00,289-Speed 13853.71 samples/sec   Loss 5.2125   LearningRate 0.0009   Epoch: 5   Global Step: 8760   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:29:17,988-Speed 13887.12 samples/sec   Loss 5.2141   LearningRate 0.0009   Epoch: 5   Global Step: 8770   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:29:35,665-Speed 13902.75 samples/sec   Loss 5.2140   LearningRate 0.0009   Epoch: 5   Global Step: 8780   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:29:53,423-Speed 13840.51 samples/sec   Loss 5.2261   LearningRate 0.0009   Epoch: 5   Global Step: 8790   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:30:11,141-Speed 13871.43 samples/sec   Loss 5.1935   LearningRate 0.0009   Epoch: 5   Global Step: 8800   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:30:28,844-Speed 13883.77 samples/sec   Loss 5.1517   LearningRate 0.0009   Epoch: 5   Global Step: 8810   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:30:46,722-Speed 13747.44 samples/sec   Loss 5.1879   LearningRate 0.0009   Epoch: 5   Global Step: 8820   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:31:04,429-Speed 13879.09 samples/sec   Loss 5.2668   LearningRate 0.0009   Epoch: 5   Global Step: 8830   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:31:22,152-Speed 13868.27 samples/sec   Loss 5.1722   LearningRate 0.0009   Epoch: 5   Global Step: 8840   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:31:39,871-Speed 13870.28 samples/sec   Loss 5.1607   LearningRate 0.0009   Epoch: 5   Global Step: 8850   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:31:57,537-Speed 13912.63 samples/sec   Loss 5.1710   LearningRate 0.0009   Epoch: 5   Global Step: 8860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:32:15,300-Speed 13836.68 samples/sec   Loss 5.1177   LearningRate 0.0009   Epoch: 5   Global Step: 8870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:32:33,028-Speed 13863.76 samples/sec   Loss 5.1800   LearningRate 0.0009   Epoch: 5   Global Step: 8880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:32:50,701-Speed 13906.51 samples/sec   Loss 5.1770   LearningRate 0.0009   Epoch: 5   Global Step: 8890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:33:08,403-Speed 13884.75 samples/sec   Loss 5.1484   LearningRate 0.0009   Epoch: 5   Global Step: 8900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:33:26,115-Speed 13875.51 samples/sec   Loss 5.1315   LearningRate 0.0009   Epoch: 5   Global Step: 8910   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:33:43,899-Speed 13820.61 samples/sec   Loss 5.1557   LearningRate 0.0009   Epoch: 5   Global Step: 8920   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:34:01,628-Speed 13862.82 samples/sec   Loss 5.1618   LearningRate 0.0009   Epoch: 5   Global Step: 8930   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:34:19,410-Speed 13821.76 samples/sec   Loss 5.2165   LearningRate 0.0009   Epoch: 5   Global Step: 8940   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:34:37,121-Speed 13877.10 samples/sec   Loss 5.1359   LearningRate 0.0009   Epoch: 5   Global Step: 8950   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:34:54,848-Speed 13864.31 samples/sec   Loss 5.1074   LearningRate 0.0009   Epoch: 5   Global Step: 8960   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:35:12,544-Speed 13889.32 samples/sec   Loss 5.1146   LearningRate 0.0009   Epoch: 5   Global Step: 8970   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:35:30,223-Speed 13902.00 samples/sec   Loss 5.1149   LearningRate 0.0009   Epoch: 5   Global Step: 8980   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:35:47,984-Speed 13838.70 samples/sec   Loss 5.0975   LearningRate 0.0009   Epoch: 5   Global Step: 8990   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:36:05,651-Speed 13912.73 samples/sec   Loss 5.1304   LearningRate 0.0009   Epoch: 5   Global Step: 9000   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:36:23,420-Speed 13831.59 samples/sec   Loss 5.1105   LearningRate 0.0009   Epoch: 5   Global Step: 9010   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:36:41,142-Speed 13868.46 samples/sec   Loss 5.0839   LearningRate 0.0009   Epoch: 5   Global Step: 9020   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:36:58,798-Speed 13920.48 samples/sec   Loss 5.0750   LearningRate 0.0009   Epoch: 5   Global Step: 9030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:37:16,578-Speed 13825.16 samples/sec   Loss 5.0688   LearningRate 0.0009   Epoch: 5   Global Step: 9040   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:37:34,282-Speed 13883.20 samples/sec   Loss 5.0592   LearningRate 0.0009   Epoch: 5   Global Step: 9050   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:37:51,931-Speed 13925.46 samples/sec   Loss 5.0309   LearningRate 0.0009   Epoch: 5   Global Step: 9060   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:38:09,566-Speed 13937.01 samples/sec   Loss 5.0588   LearningRate 0.0009   Epoch: 5   Global Step: 9070   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:38:27,277-Speed 13876.85 samples/sec   Loss 5.0716   LearningRate 0.0009   Epoch: 5   Global Step: 9080   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:38:45,155-Speed 13747.25 samples/sec   Loss 5.0358   LearningRate 0.0009   Epoch: 5   Global Step: 9090   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:39:02,883-Speed 13863.10 samples/sec   Loss 5.0187   LearningRate 0.0009   Epoch: 5   Global Step: 9100   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:39:20,660-Speed 13826.19 samples/sec   Loss 5.0667   LearningRate 0.0009   Epoch: 5   Global Step: 9110   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:39:38,342-Speed 13899.62 samples/sec   Loss 5.0335   LearningRate 0.0009   Epoch: 5   Global Step: 9120   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:39:56,061-Speed 13871.86 samples/sec   Loss 5.0512   LearningRate 0.0009   Epoch: 5   Global Step: 9130   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:40:13,741-Speed 13901.47 samples/sec   Loss 5.0174   LearningRate 0.0009   Epoch: 5   Global Step: 9140   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:40:31,534-Speed 13815.06 samples/sec   Loss 5.0282   LearningRate 0.0009   Epoch: 5   Global Step: 9150   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:40:49,260-Speed 13865.10 samples/sec   Loss 5.0220   LearningRate 0.0009   Epoch: 5   Global Step: 9160   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:41:06,940-Speed 13901.18 samples/sec   Loss 4.9853   LearningRate 0.0009   Epoch: 5   Global Step: 9170   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:41:24,618-Speed 13903.39 samples/sec   Loss 5.0486   LearningRate 0.0009   Epoch: 5   Global Step: 9180   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:41:42,305-Speed 13896.08 samples/sec   Loss 4.9868   LearningRate 0.0009   Epoch: 5   Global Step: 9190   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:42:00,050-Speed 13850.53 samples/sec   Loss 4.9471   LearningRate 0.0009   Epoch: 5   Global Step: 9200   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:42:17,794-Speed 13851.20 samples/sec   Loss 4.9594   LearningRate 0.0009   Epoch: 5   Global Step: 9210   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:42:35,542-Speed 13847.83 samples/sec   Loss 4.9822   LearningRate 0.0009   Epoch: 5   Global Step: 9220   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:42:53,342-Speed 13808.51 samples/sec   Loss 5.0119   LearningRate 0.0009   Epoch: 5   Global Step: 9230   Fp16 Grad Scale: 16384   Required: 30 hours
Training: 2022-03-03 12:43:11,071-Speed 13862.93 samples/sec   Loss 5.0280   LearningRate 0.0009   Epoch: 5   Global Step: 9240   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:43:28,825-Speed 13843.38 samples/sec   Loss 4.9590   LearningRate 0.0009   Epoch: 5   Global Step: 9250   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:43:46,520-Speed 13888.79 samples/sec   Loss 4.9672   LearningRate 0.0009   Epoch: 5   Global Step: 9260   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:44:04,244-Speed 13868.41 samples/sec   Loss 4.9393   LearningRate 0.0009   Epoch: 5   Global Step: 9270   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:44:21,976-Speed 13860.27 samples/sec   Loss 4.9227   LearningRate 0.0009   Epoch: 5   Global Step: 9280   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:44:39,723-Speed 13849.65 samples/sec   Loss 4.9632   LearningRate 0.0009   Epoch: 5   Global Step: 9290   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:44:57,524-Speed 13806.50 samples/sec   Loss 4.9243   LearningRate 0.0009   Epoch: 5   Global Step: 9300   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:45:15,214-Speed 13893.98 samples/sec   Loss 4.9456   LearningRate 0.0009   Epoch: 5   Global Step: 9310   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:45:32,925-Speed 13877.51 samples/sec   Loss 4.9745   LearningRate 0.0009   Epoch: 5   Global Step: 9320   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:45:50,764-Speed 13776.76 samples/sec   Loss 4.8887   LearningRate 0.0009   Epoch: 5   Global Step: 9330   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:46:08,507-Speed 13852.08 samples/sec   Loss 4.9672   LearningRate 0.0009   Epoch: 5   Global Step: 9340   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:46:26,221-Speed 13875.42 samples/sec   Loss 4.9097   LearningRate 0.0009   Epoch: 5   Global Step: 9350   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:46:43,958-Speed 13855.67 samples/sec   Loss 4.9202   LearningRate 0.0009   Epoch: 5   Global Step: 9360   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:47:01,597-Speed 13933.93 samples/sec   Loss 4.8767   LearningRate 0.0009   Epoch: 5   Global Step: 9370   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:47:19,288-Speed 13892.54 samples/sec   Loss 4.9070   LearningRate 0.0009   Epoch: 5   Global Step: 9380   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:47:37,004-Speed 13873.49 samples/sec   Loss 4.9208   LearningRate 0.0009   Epoch: 5   Global Step: 9390   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:47:54,827-Speed 13790.51 samples/sec   Loss 4.9760   LearningRate 0.0009   Epoch: 5   Global Step: 9400   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:48:12,616-Speed 13817.17 samples/sec   Loss 4.8860   LearningRate 0.0009   Epoch: 5   Global Step: 9410   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:48:30,345-Speed 13863.36 samples/sec   Loss 4.8421   LearningRate 0.0009   Epoch: 5   Global Step: 9420   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:48:48,041-Speed 13888.22 samples/sec   Loss 4.8891   LearningRate 0.0009   Epoch: 5   Global Step: 9430   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:49:05,778-Speed 13856.79 samples/sec   Loss 4.8751   LearningRate 0.0009   Epoch: 5   Global Step: 9440   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:49:23,554-Speed 13826.23 samples/sec   Loss 4.8713   LearningRate 0.0009   Epoch: 5   Global Step: 9450   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:49:41,369-Speed 13796.14 samples/sec   Loss 4.8659   LearningRate 0.0009   Epoch: 5   Global Step: 9460   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:49:59,117-Speed 13848.09 samples/sec   Loss 4.8399   LearningRate 0.0009   Epoch: 5   Global Step: 9470   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:50:16,877-Speed 13839.18 samples/sec   Loss 4.8295   LearningRate 0.0009   Epoch: 5   Global Step: 9480   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:50:34,695-Speed 13793.53 samples/sec   Loss 4.8555   LearningRate 0.0009   Epoch: 5   Global Step: 9490   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:50:52,452-Speed 13840.99 samples/sec   Loss 4.8607   LearningRate 0.0009   Epoch: 5   Global Step: 9500   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:51:10,296-Speed 13773.58 samples/sec   Loss 4.8234   LearningRate 0.0009   Epoch: 5   Global Step: 9510   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:51:28,047-Speed 13845.67 samples/sec   Loss 4.8291   LearningRate 0.0009   Epoch: 5   Global Step: 9520   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:51:45,772-Speed 13866.29 samples/sec   Loss 4.8258   LearningRate 0.0009   Epoch: 5   Global Step: 9530   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:52:03,483-Speed 13876.74 samples/sec   Loss 4.8458   LearningRate 0.0009   Epoch: 5   Global Step: 9540   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:52:21,260-Speed 13826.60 samples/sec   Loss 4.8755   LearningRate 0.0009   Epoch: 5   Global Step: 9550   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:52:39,008-Speed 13848.00 samples/sec   Loss 4.8168   LearningRate 0.0009   Epoch: 5   Global Step: 9560   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 12:52:56,790-Speed 13821.26 samples/sec   Loss 4.8153   LearningRate 0.0009   Epoch: 5   Global Step: 9570   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:53:14,585-Speed 13812.02 samples/sec   Loss 4.7891   LearningRate 0.0009   Epoch: 5   Global Step: 9580   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:53:32,366-Speed 13822.38 samples/sec   Loss 4.7897   LearningRate 0.0009   Epoch: 5   Global Step: 9590   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:53:50,124-Speed 13839.89 samples/sec   Loss 4.7952   LearningRate 0.0009   Epoch: 5   Global Step: 9600   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:54:08,067-Speed 13698.58 samples/sec   Loss 4.7872   LearningRate 0.0009   Epoch: 5   Global Step: 9610   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:54:25,824-Speed 13841.13 samples/sec   Loss 4.8123   LearningRate 0.0009   Epoch: 5   Global Step: 9620   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:54:43,562-Speed 13855.77 samples/sec   Loss 4.7709   LearningRate 0.0009   Epoch: 5   Global Step: 9630   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:55:01,316-Speed 13842.67 samples/sec   Loss 4.7686   LearningRate 0.0009   Epoch: 5   Global Step: 9640   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:55:19,043-Speed 13865.20 samples/sec   Loss 4.7669   LearningRate 0.0009   Epoch: 5   Global Step: 9650   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:55:36,851-Speed 13802.83 samples/sec   Loss 4.7883   LearningRate 0.0009   Epoch: 5   Global Step: 9660   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:55:54,621-Speed 13830.43 samples/sec   Loss 4.7549   LearningRate 0.0009   Epoch: 5   Global Step: 9670   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:56:12,381-Speed 13838.55 samples/sec   Loss 4.8214   LearningRate 0.0009   Epoch: 5   Global Step: 9680   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:56:30,149-Speed 13832.79 samples/sec   Loss 4.7804   LearningRate 0.0009   Epoch: 5   Global Step: 9690   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 12:56:47,846-Speed 13887.74 samples/sec   Loss 4.7429   LearningRate 0.0009   Epoch: 5   Global Step: 9700   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:57:05,599-Speed 13844.21 samples/sec   Loss 4.7618   LearningRate 0.0009   Epoch: 5   Global Step: 9710   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:57:23,297-Speed 13887.23 samples/sec   Loss 4.7299   LearningRate 0.0009   Epoch: 5   Global Step: 9720   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:57:41,008-Speed 13877.32 samples/sec   Loss 4.7755   LearningRate 0.0009   Epoch: 5   Global Step: 9730   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:57:58,698-Speed 13893.27 samples/sec   Loss 4.7079   LearningRate 0.0009   Epoch: 5   Global Step: 9740   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:58:16,386-Speed 13895.59 samples/sec   Loss 4.7475   LearningRate 0.0009   Epoch: 5   Global Step: 9750   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:58:34,076-Speed 13893.53 samples/sec   Loss 4.7513   LearningRate 0.0009   Epoch: 5   Global Step: 9760   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:58:51,847-Speed 13829.97 samples/sec   Loss 4.7016   LearningRate 0.0009   Epoch: 5   Global Step: 9770   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:59:09,643-Speed 13811.14 samples/sec   Loss 4.7397   LearningRate 0.0009   Epoch: 5   Global Step: 9780   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:59:27,348-Speed 13881.93 samples/sec   Loss 4.7068   LearningRate 0.0009   Epoch: 5   Global Step: 9790   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 12:59:45,178-Speed 13784.49 samples/sec   Loss 4.6934   LearningRate 0.0009   Epoch: 5   Global Step: 9800   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 13:00:02,932-Speed 13843.79 samples/sec   Loss 4.7009   LearningRate 0.0009   Epoch: 5   Global Step: 9810   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:00:20,720-Speed 13816.40 samples/sec   Loss 4.6957   LearningRate 0.0009   Epoch: 5   Global Step: 9820   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:00:38,438-Speed 13871.60 samples/sec   Loss 4.7186   LearningRate 0.0009   Epoch: 5   Global Step: 9830   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:00:56,124-Speed 13896.11 samples/sec   Loss 4.6883   LearningRate 0.0009   Epoch: 5   Global Step: 9840   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:01:13,853-Speed 13863.07 samples/sec   Loss 4.7098   LearningRate 0.0009   Epoch: 5   Global Step: 9850   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:01:31,560-Speed 13880.18 samples/sec   Loss 4.7058   LearningRate 0.0009   Epoch: 5   Global Step: 9860   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:01:49,370-Speed 13799.93 samples/sec   Loss 4.6503   LearningRate 0.0009   Epoch: 5   Global Step: 9870   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:02:07,172-Speed 13805.85 samples/sec   Loss 4.6414   LearningRate 0.0009   Epoch: 5   Global Step: 9880   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:02:24,871-Speed 13886.77 samples/sec   Loss 4.6894   LearningRate 0.0009   Epoch: 5   Global Step: 9890   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:02:42,680-Speed 13800.39 samples/sec   Loss 4.6821   LearningRate 0.0009   Epoch: 5   Global Step: 9900   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:03:00,458-Speed 13825.17 samples/sec   Loss 4.6539   LearningRate 0.0009   Epoch: 5   Global Step: 9910   Fp16 Grad Scale: 131072   Required: 30 hours
Training: 2022-03-03 13:03:18,114-Speed 13920.27 samples/sec   Loss 4.6407   LearningRate 0.0009   Epoch: 5   Global Step: 9920   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:03:35,835-Speed 13868.37 samples/sec   Loss 4.6177   LearningRate 0.0009   Epoch: 5   Global Step: 9930   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:03:53,688-Speed 13767.33 samples/sec   Loss 4.6223   LearningRate 0.0009   Epoch: 5   Global Step: 9940   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:04:11,442-Speed 13843.59 samples/sec   Loss 4.6491   LearningRate 0.0009   Epoch: 5   Global Step: 9950   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:04:29,159-Speed 13871.75 samples/sec   Loss 4.5989   LearningRate 0.0009   Epoch: 5   Global Step: 9960   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:04:46,923-Speed 13835.52 samples/sec   Loss 4.6926   LearningRate 0.0009   Epoch: 5   Global Step: 9970   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:05:04,658-Speed 13858.80 samples/sec   Loss 4.6977   LearningRate 0.0009   Epoch: 5   Global Step: 9980   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:05:22,379-Speed 13868.74 samples/sec   Loss 4.6733   LearningRate 0.0009   Epoch: 5   Global Step: 9990   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:05:40,181-Speed 13806.88 samples/sec   Loss 4.6660   LearningRate 0.0009   Epoch: 5   Global Step: 10000   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:05:57,943-Speed 13836.55 samples/sec   Loss 4.6398   LearningRate 0.0009   Epoch: 5   Global Step: 10010   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:06:15,708-Speed 13835.29 samples/sec   Loss 4.6150   LearningRate 0.0009   Epoch: 5   Global Step: 10020   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:06:33,396-Speed 13894.88 samples/sec   Loss 4.5863   LearningRate 0.0009   Epoch: 5   Global Step: 10030   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:06:51,083-Speed 13895.79 samples/sec   Loss 4.6099   LearningRate 0.0009   Epoch: 5   Global Step: 10040   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:07:08,809-Speed 13865.48 samples/sec   Loss 4.5835   LearningRate 0.0009   Epoch: 5   Global Step: 10050   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:07:26,543-Speed 13858.42 samples/sec   Loss 4.6457   LearningRate 0.0009   Epoch: 5   Global Step: 10060   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:07:44,295-Speed 13845.37 samples/sec   Loss 4.6141   LearningRate 0.0009   Epoch: 5   Global Step: 10070   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:08:01,980-Speed 13897.21 samples/sec   Loss 4.5827   LearningRate 0.0009   Epoch: 5   Global Step: 10080   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:08:19,761-Speed 13822.60 samples/sec   Loss 4.5661   LearningRate 0.0009   Epoch: 5   Global Step: 10090   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:08:37,498-Speed 13857.08 samples/sec   Loss 4.5692   LearningRate 0.0009   Epoch: 5   Global Step: 10100   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:08:55,222-Speed 13866.52 samples/sec   Loss 4.5770   LearningRate 0.0009   Epoch: 5   Global Step: 10110   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:09:12,900-Speed 13903.77 samples/sec   Loss 4.5989   LearningRate 0.0009   Epoch: 5   Global Step: 10120   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:09:30,633-Speed 13860.82 samples/sec   Loss 4.5757   LearningRate 0.0009   Epoch: 5   Global Step: 10130   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:09:48,426-Speed 13813.39 samples/sec   Loss 4.5630   LearningRate 0.0009   Epoch: 5   Global Step: 10140   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:10:06,241-Speed 13795.43 samples/sec   Loss 4.5321   LearningRate 0.0009   Epoch: 5   Global Step: 10150   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:10:24,023-Speed 13823.02 samples/sec   Loss 4.5641   LearningRate 0.0009   Epoch: 5   Global Step: 10160   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:10:41,824-Speed 13806.55 samples/sec   Loss 4.5852   LearningRate 0.0009   Epoch: 5   Global Step: 10170   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:10:59,571-Speed 13849.30 samples/sec   Loss 4.5667   LearningRate 0.0009   Epoch: 5   Global Step: 10180   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:11:17,229-Speed 13918.43 samples/sec   Loss 4.5368   LearningRate 0.0009   Epoch: 5   Global Step: 10190   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:11:34,941-Speed 13876.38 samples/sec   Loss 4.5643   LearningRate 0.0009   Epoch: 5   Global Step: 10200   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:11:52,688-Speed 13848.55 samples/sec   Loss 4.5362   LearningRate 0.0009   Epoch: 5   Global Step: 10210   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:12:10,428-Speed 13854.46 samples/sec   Loss 4.5372   LearningRate 0.0009   Epoch: 5   Global Step: 10220   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:12:28,179-Speed 13846.12 samples/sec   Loss 4.5440   LearningRate 0.0009   Epoch: 5   Global Step: 10230   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:12:45,939-Speed 13838.97 samples/sec   Loss 4.5281   LearningRate 0.0009   Epoch: 5   Global Step: 10240   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:13:03,653-Speed 13874.61 samples/sec   Loss 4.5034   LearningRate 0.0009   Epoch: 5   Global Step: 10250   Fp16 Grad Scale: 65536   Required: 30 hours
Training: 2022-03-03 13:13:21,491-Speed 13777.88 samples/sec   Loss 4.5442   LearningRate 0.0009   Epoch: 5   Global Step: 10260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:13:39,248-Speed 13841.27 samples/sec   Loss 4.5154   LearningRate 0.0009   Epoch: 5   Global Step: 10270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:13:57,002-Speed 13843.44 samples/sec   Loss 4.5502   LearningRate 0.0009   Epoch: 5   Global Step: 10280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:14:14,733-Speed 13861.11 samples/sec   Loss 4.5341   LearningRate 0.0009   Epoch: 5   Global Step: 10290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:14:32,414-Speed 13900.69 samples/sec   Loss 4.5079   LearningRate 0.0009   Epoch: 5   Global Step: 10300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:14:50,137-Speed 13866.98 samples/sec   Loss 4.5216   LearningRate 0.0009   Epoch: 5   Global Step: 10310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:15:07,842-Speed 13882.11 samples/sec   Loss 4.4972   LearningRate 0.0009   Epoch: 5   Global Step: 10320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:15:25,563-Speed 13868.95 samples/sec   Loss 4.4960   LearningRate 0.0009   Epoch: 5   Global Step: 10330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:15:43,361-Speed 13809.28 samples/sec   Loss 4.5869   LearningRate 0.0009   Epoch: 5   Global Step: 10340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:16:01,063-Speed 13883.24 samples/sec   Loss 4.5318   LearningRate 0.0009   Epoch: 5   Global Step: 10350   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:16:18,818-Speed 13843.23 samples/sec   Loss 4.5826   LearningRate 0.0009   Epoch: 5   Global Step: 10360   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:16:36,484-Speed 13912.48 samples/sec   Loss 4.5705   LearningRate 0.0009   Epoch: 5   Global Step: 10370   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:17:45,343-Speed 3569.10 samples/sec   Loss 4.4227   LearningRate 0.0009   Epoch: 6   Global Step: 10380   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:18:03,077-Speed 13858.56 samples/sec   Loss 4.3843   LearningRate 0.0009   Epoch: 6   Global Step: 10390   Fp16 Grad Scale: 32768   Required: 30 hours
Training: 2022-03-03 13:18:20,764-Speed 13897.35 samples/sec   Loss 4.4755   LearningRate 0.0009   Epoch: 6   Global Step: 10400   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:18:38,457-Speed 13890.67 samples/sec   Loss 4.4648   LearningRate 0.0009   Epoch: 6   Global Step: 10410   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:18:56,178-Speed 13869.18 samples/sec   Loss 4.4196   LearningRate 0.0009   Epoch: 6   Global Step: 10420   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:19:13,936-Speed 13839.86 samples/sec   Loss 4.4214   LearningRate 0.0009   Epoch: 6   Global Step: 10430   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:19:31,762-Speed 13787.78 samples/sec   Loss 4.4639   LearningRate 0.0009   Epoch: 6   Global Step: 10440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:19:49,490-Speed 13863.57 samples/sec   Loss 4.3857   LearningRate 0.0009   Epoch: 6   Global Step: 10450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:20:07,170-Speed 13901.58 samples/sec   Loss 4.3911   LearningRate 0.0009   Epoch: 6   Global Step: 10460   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:20:24,861-Speed 13893.22 samples/sec   Loss 4.4117   LearningRate 0.0009   Epoch: 6   Global Step: 10470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:20:42,584-Speed 13867.61 samples/sec   Loss 4.4262   LearningRate 0.0009   Epoch: 6   Global Step: 10480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:21:00,300-Speed 13873.06 samples/sec   Loss 4.4024   LearningRate 0.0009   Epoch: 6   Global Step: 10490   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:21:18,044-Speed 13850.53 samples/sec   Loss 4.4290   LearningRate 0.0009   Epoch: 6   Global Step: 10500   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:21:35,771-Speed 13865.08 samples/sec   Loss 4.4426   LearningRate 0.0009   Epoch: 6   Global Step: 10510   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:21:53,486-Speed 13874.05 samples/sec   Loss 4.4113   LearningRate 0.0009   Epoch: 6   Global Step: 10520   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:22:11,189-Speed 13882.77 samples/sec   Loss 4.4396   LearningRate 0.0009   Epoch: 6   Global Step: 10530   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:22:28,897-Speed 13880.03 samples/sec   Loss 4.4107   LearningRate 0.0009   Epoch: 6   Global Step: 10540   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:22:46,653-Speed 13841.76 samples/sec   Loss 4.4087   LearningRate 0.0009   Epoch: 6   Global Step: 10550   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:23:04,365-Speed 13876.40 samples/sec   Loss 4.3965   LearningRate 0.0009   Epoch: 6   Global Step: 10560   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:23:22,079-Speed 13874.62 samples/sec   Loss 4.4331   LearningRate 0.0009   Epoch: 6   Global Step: 10570   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:23:39,808-Speed 13863.12 samples/sec   Loss 4.4328   LearningRate 0.0009   Epoch: 6   Global Step: 10580   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:23:57,509-Speed 13884.80 samples/sec   Loss 4.3935   LearningRate 0.0009   Epoch: 6   Global Step: 10590   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:24:15,207-Speed 13887.02 samples/sec   Loss 4.3629   LearningRate 0.0009   Epoch: 6   Global Step: 10600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:24:33,004-Speed 13809.87 samples/sec   Loss 4.3856   LearningRate 0.0009   Epoch: 6   Global Step: 10610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:24:50,718-Speed 13874.95 samples/sec   Loss 4.4524   LearningRate 0.0009   Epoch: 6   Global Step: 10620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:25:08,409-Speed 13892.57 samples/sec   Loss 4.4106   LearningRate 0.0009   Epoch: 6   Global Step: 10630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:25:26,141-Speed 13861.02 samples/sec   Loss 4.3618   LearningRate 0.0009   Epoch: 6   Global Step: 10640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:25:43,870-Speed 13862.36 samples/sec   Loss 4.3721   LearningRate 0.0009   Epoch: 6   Global Step: 10650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:26:01,631-Speed 13838.05 samples/sec   Loss 4.3764   LearningRate 0.0009   Epoch: 6   Global Step: 10660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:26:19,435-Speed 13804.80 samples/sec   Loss 4.3913   LearningRate 0.0009   Epoch: 6   Global Step: 10670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:26:37,176-Speed 13853.21 samples/sec   Loss 4.3751   LearningRate 0.0009   Epoch: 6   Global Step: 10680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:26:55,019-Speed 13774.66 samples/sec   Loss 4.3620   LearningRate 0.0009   Epoch: 6   Global Step: 10690   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:27:12,729-Speed 13878.39 samples/sec   Loss 4.3311   LearningRate 0.0009   Epoch: 6   Global Step: 10700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:27:30,485-Speed 13842.13 samples/sec   Loss 4.3745   LearningRate 0.0009   Epoch: 6   Global Step: 10710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:27:48,240-Speed 13843.54 samples/sec   Loss 4.3587   LearningRate 0.0009   Epoch: 6   Global Step: 10720   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:28:05,983-Speed 13852.45 samples/sec   Loss 4.3436   LearningRate 0.0009   Epoch: 6   Global Step: 10730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:28:23,735-Speed 13844.59 samples/sec   Loss 4.3117   LearningRate 0.0009   Epoch: 6   Global Step: 10740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:28:41,515-Speed 13823.07 samples/sec   Loss 4.3916   LearningRate 0.0009   Epoch: 6   Global Step: 10750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:28:59,332-Speed 13794.28 samples/sec   Loss 4.3839   LearningRate 0.0009   Epoch: 6   Global Step: 10760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:29:17,088-Speed 13842.31 samples/sec   Loss 4.3289   LearningRate 0.0009   Epoch: 6   Global Step: 10770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:29:34,828-Speed 13854.64 samples/sec   Loss 4.3220   LearningRate 0.0009   Epoch: 6   Global Step: 10780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:29:52,537-Speed 13878.50 samples/sec   Loss 4.3435   LearningRate 0.0009   Epoch: 6   Global Step: 10790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:30:10,311-Speed 13827.33 samples/sec   Loss 4.4349   LearningRate 0.0009   Epoch: 6   Global Step: 10800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:30:28,062-Speed 13846.36 samples/sec   Loss 4.3427   LearningRate 0.0009   Epoch: 6   Global Step: 10810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:30:45,768-Speed 13880.90 samples/sec   Loss 4.3200   LearningRate 0.0009   Epoch: 6   Global Step: 10820   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:31:03,499-Speed 13861.35 samples/sec   Loss 4.3191   LearningRate 0.0009   Epoch: 6   Global Step: 10830   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:31:21,202-Speed 13883.01 samples/sec   Loss 4.3318   LearningRate 0.0009   Epoch: 6   Global Step: 10840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:31:38,916-Speed 13874.68 samples/sec   Loss 4.3203   LearningRate 0.0009   Epoch: 6   Global Step: 10850   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:31:56,651-Speed 13858.26 samples/sec   Loss 4.3102   LearningRate 0.0009   Epoch: 6   Global Step: 10860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:32:14,386-Speed 13858.32 samples/sec   Loss 4.2910   LearningRate 0.0009   Epoch: 6   Global Step: 10870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:32:31,997-Speed 13955.97 samples/sec   Loss 4.2909   LearningRate 0.0009   Epoch: 6   Global Step: 10880   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:32:49,746-Speed 13847.16 samples/sec   Loss 4.3356   LearningRate 0.0009   Epoch: 6   Global Step: 10890   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:33:07,504-Speed 13840.40 samples/sec   Loss 4.3302   LearningRate 0.0009   Epoch: 6   Global Step: 10900   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:33:25,170-Speed 13912.00 samples/sec   Loss 4.2901   LearningRate 0.0009   Epoch: 6   Global Step: 10910   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:33:42,936-Speed 13833.88 samples/sec   Loss 4.2812   LearningRate 0.0009   Epoch: 6   Global Step: 10920   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:34:00,669-Speed 13860.26 samples/sec   Loss 4.2549   LearningRate 0.0009   Epoch: 6   Global Step: 10930   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:34:18,428-Speed 13839.58 samples/sec   Loss 4.2731   LearningRate 0.0009   Epoch: 6   Global Step: 10940   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:34:36,188-Speed 13838.30 samples/sec   Loss 4.2904   LearningRate 0.0009   Epoch: 6   Global Step: 10950   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:34:53,972-Speed 13820.22 samples/sec   Loss 4.3379   LearningRate 0.0009   Epoch: 6   Global Step: 10960   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:35:11,678-Speed 13880.89 samples/sec   Loss 4.2952   LearningRate 0.0009   Epoch: 6   Global Step: 10970   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:35:29,432-Speed 13843.79 samples/sec   Loss 4.2316   LearningRate 0.0009   Epoch: 6   Global Step: 10980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:35:47,155-Speed 13866.83 samples/sec   Loss 4.2504   LearningRate 0.0009   Epoch: 6   Global Step: 10990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:36:05,061-Speed 13726.73 samples/sec   Loss 4.2834   LearningRate 0.0009   Epoch: 6   Global Step: 11000   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:36:22,757-Speed 13888.90 samples/sec   Loss 4.2595   LearningRate 0.0009   Epoch: 6   Global Step: 11010   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:36:40,545-Speed 13816.62 samples/sec   Loss 4.2533   LearningRate 0.0009   Epoch: 6   Global Step: 11020   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:36:58,239-Speed 13890.18 samples/sec   Loss 4.2350   LearningRate 0.0009   Epoch: 6   Global Step: 11030   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:37:15,917-Speed 13903.35 samples/sec   Loss 4.2259   LearningRate 0.0009   Epoch: 6   Global Step: 11040   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:37:33,588-Speed 13908.15 samples/sec   Loss 4.2807   LearningRate 0.0009   Epoch: 6   Global Step: 11050   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:37:51,311-Speed 13867.78 samples/sec   Loss 4.2715   LearningRate 0.0009   Epoch: 6   Global Step: 11060   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:38:09,042-Speed 13861.40 samples/sec   Loss 4.2694   LearningRate 0.0009   Epoch: 6   Global Step: 11070   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:38:26,825-Speed 13820.79 samples/sec   Loss 4.2325   LearningRate 0.0009   Epoch: 6   Global Step: 11080   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:38:44,560-Speed 13858.30 samples/sec   Loss 4.2862   LearningRate 0.0009   Epoch: 6   Global Step: 11090   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:39:02,383-Speed 13789.81 samples/sec   Loss 4.2568   LearningRate 0.0009   Epoch: 6   Global Step: 11100   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:39:20,153-Speed 13831.17 samples/sec   Loss 4.2010   LearningRate 0.0009   Epoch: 6   Global Step: 11110   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:39:37,888-Speed 13857.86 samples/sec   Loss 4.2337   LearningRate 0.0009   Epoch: 6   Global Step: 11120   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:39:55,619-Speed 13861.42 samples/sec   Loss 4.2153   LearningRate 0.0009   Epoch: 6   Global Step: 11130   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:40:13,430-Speed 13799.20 samples/sec   Loss 4.1995   LearningRate 0.0009   Epoch: 6   Global Step: 11140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:40:31,150-Speed 13869.70 samples/sec   Loss 4.2040   LearningRate 0.0009   Epoch: 6   Global Step: 11150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:40:48,905-Speed 13842.93 samples/sec   Loss 4.1988   LearningRate 0.0009   Epoch: 6   Global Step: 11160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:41:06,621-Speed 13873.12 samples/sec   Loss 4.2231   LearningRate 0.0009   Epoch: 6   Global Step: 11170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:41:24,340-Speed 13870.63 samples/sec   Loss 4.1961   LearningRate 0.0009   Epoch: 6   Global Step: 11180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:41:42,023-Speed 13898.68 samples/sec   Loss 4.2056   LearningRate 0.0009   Epoch: 6   Global Step: 11190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:41:59,782-Speed 13839.72 samples/sec   Loss 4.2104   LearningRate 0.0009   Epoch: 6   Global Step: 11200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:42:17,493-Speed 13876.98 samples/sec   Loss 4.1848   LearningRate 0.0009   Epoch: 6   Global Step: 11210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:42:35,191-Speed 13887.40 samples/sec   Loss 4.2488   LearningRate 0.0009   Epoch: 6   Global Step: 11220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:42:52,882-Speed 13892.87 samples/sec   Loss 4.1959   LearningRate 0.0009   Epoch: 6   Global Step: 11230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:43:10,644-Speed 13837.06 samples/sec   Loss 4.2263   LearningRate 0.0009   Epoch: 6   Global Step: 11240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:43:28,348-Speed 13882.43 samples/sec   Loss 4.1974   LearningRate 0.0009   Epoch: 6   Global Step: 11250   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 13:43:46,031-Speed 13899.54 samples/sec   Loss 4.2117   LearningRate 0.0009   Epoch: 6   Global Step: 11260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:44:03,752-Speed 13869.21 samples/sec   Loss 4.1336   LearningRate 0.0009   Epoch: 6   Global Step: 11270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:44:21,407-Speed 13920.74 samples/sec   Loss 4.1619   LearningRate 0.0009   Epoch: 6   Global Step: 11280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:44:39,151-Speed 13851.15 samples/sec   Loss 4.1779   LearningRate 0.0009   Epoch: 6   Global Step: 11290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:44:56,945-Speed 13812.21 samples/sec   Loss 4.1527   LearningRate 0.0009   Epoch: 6   Global Step: 11300   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:45:14,639-Speed 13890.82 samples/sec   Loss 4.1755   LearningRate 0.0009   Epoch: 6   Global Step: 11310   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:45:32,323-Speed 13897.81 samples/sec   Loss 4.1804   LearningRate 0.0009   Epoch: 6   Global Step: 11320   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:45:49,990-Speed 13911.32 samples/sec   Loss 4.1268   LearningRate 0.0009   Epoch: 6   Global Step: 11330   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:46:07,711-Speed 13869.68 samples/sec   Loss 4.1987   LearningRate 0.0009   Epoch: 6   Global Step: 11340   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:46:25,383-Speed 13907.69 samples/sec   Loss 4.1651   LearningRate 0.0009   Epoch: 6   Global Step: 11350   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:46:43,127-Speed 13851.17 samples/sec   Loss 4.1667   LearningRate 0.0009   Epoch: 6   Global Step: 11360   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:47:00,865-Speed 13855.45 samples/sec   Loss 4.1361   LearningRate 0.0009   Epoch: 6   Global Step: 11370   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:47:18,549-Speed 13898.63 samples/sec   Loss 4.1578   LearningRate 0.0009   Epoch: 6   Global Step: 11380   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:47:36,261-Speed 13875.74 samples/sec   Loss 4.1711   LearningRate 0.0009   Epoch: 6   Global Step: 11390   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:47:54,009-Speed 13847.99 samples/sec   Loss 4.1418   LearningRate 0.0009   Epoch: 6   Global Step: 11400   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:48:11,665-Speed 13920.21 samples/sec   Loss 4.1073   LearningRate 0.0009   Epoch: 6   Global Step: 11410   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:48:29,356-Speed 13893.50 samples/sec   Loss 4.1232   LearningRate 0.0009   Epoch: 6   Global Step: 11420   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:48:47,062-Speed 13881.26 samples/sec   Loss 4.0710   LearningRate 0.0009   Epoch: 6   Global Step: 11430   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:49:04,810-Speed 13848.40 samples/sec   Loss 4.1820   LearningRate 0.0009   Epoch: 6   Global Step: 11440   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:49:22,518-Speed 13880.17 samples/sec   Loss 4.1844   LearningRate 0.0009   Epoch: 6   Global Step: 11450   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:49:40,258-Speed 13853.96 samples/sec   Loss 4.1287   LearningRate 0.0009   Epoch: 6   Global Step: 11460   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:49:57,940-Speed 13899.58 samples/sec   Loss 4.1520   LearningRate 0.0009   Epoch: 6   Global Step: 11470   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:50:15,686-Speed 13849.89 samples/sec   Loss 4.0918   LearningRate 0.0009   Epoch: 6   Global Step: 11480   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:50:33,392-Speed 13881.28 samples/sec   Loss 4.0890   LearningRate 0.0009   Epoch: 6   Global Step: 11490   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:50:51,150-Speed 13840.43 samples/sec   Loss 4.1343   LearningRate 0.0009   Epoch: 6   Global Step: 11500   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:51:08,850-Speed 13885.21 samples/sec   Loss 4.1636   LearningRate 0.0009   Epoch: 6   Global Step: 11510   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:51:26,573-Speed 13867.95 samples/sec   Loss 4.0960   LearningRate 0.0009   Epoch: 6   Global Step: 11520   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:51:44,379-Speed 13803.08 samples/sec   Loss 4.1059   LearningRate 0.0009   Epoch: 6   Global Step: 11530   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:52:02,167-Speed 13817.21 samples/sec   Loss 4.0907   LearningRate 0.0009   Epoch: 6   Global Step: 11540   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:52:19,907-Speed 13853.74 samples/sec   Loss 4.0987   LearningRate 0.0009   Epoch: 6   Global Step: 11550   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:52:37,567-Speed 13917.12 samples/sec   Loss 4.1204   LearningRate 0.0009   Epoch: 6   Global Step: 11560   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:52:55,402-Speed 13781.08 samples/sec   Loss 4.0884   LearningRate 0.0009   Epoch: 6   Global Step: 11570   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:53:13,081-Speed 13902.29 samples/sec   Loss 4.1062   LearningRate 0.0009   Epoch: 6   Global Step: 11580   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:53:30,807-Speed 13864.86 samples/sec   Loss 4.0900   LearningRate 0.0009   Epoch: 6   Global Step: 11590   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:53:48,560-Speed 13843.63 samples/sec   Loss 4.1061   LearningRate 0.0009   Epoch: 6   Global Step: 11600   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:54:06,332-Speed 13830.29 samples/sec   Loss 4.0990   LearningRate 0.0009   Epoch: 6   Global Step: 11610   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:54:24,070-Speed 13855.40 samples/sec   Loss 4.0880   LearningRate 0.0009   Epoch: 6   Global Step: 11620   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:54:41,870-Speed 13807.81 samples/sec   Loss 4.1153   LearningRate 0.0009   Epoch: 6   Global Step: 11630   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:54:59,664-Speed 13812.34 samples/sec   Loss 4.0500   LearningRate 0.0009   Epoch: 6   Global Step: 11640   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:55:17,357-Speed 13891.34 samples/sec   Loss 4.0478   LearningRate 0.0009   Epoch: 6   Global Step: 11650   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:55:35,088-Speed 13861.18 samples/sec   Loss 4.0816   LearningRate 0.0009   Epoch: 6   Global Step: 11660   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:55:52,864-Speed 13826.04 samples/sec   Loss 4.0267   LearningRate 0.0009   Epoch: 6   Global Step: 11670   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:56:10,552-Speed 13894.98 samples/sec   Loss 4.0950   LearningRate 0.0009   Epoch: 6   Global Step: 11680   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:56:28,240-Speed 13895.28 samples/sec   Loss 4.0496   LearningRate 0.0009   Epoch: 6   Global Step: 11690   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:56:45,957-Speed 13872.07 samples/sec   Loss 4.0884   LearningRate 0.0009   Epoch: 6   Global Step: 11700   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:57:03,656-Speed 13886.38 samples/sec   Loss 4.0551   LearningRate 0.0009   Epoch: 6   Global Step: 11710   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:57:21,375-Speed 13870.73 samples/sec   Loss 4.0835   LearningRate 0.0009   Epoch: 6   Global Step: 11720   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 13:57:39,037-Speed 13915.51 samples/sec   Loss 4.0287   LearningRate 0.0009   Epoch: 6   Global Step: 11730   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:57:56,778-Speed 13853.86 samples/sec   Loss 4.0561   LearningRate 0.0009   Epoch: 6   Global Step: 11740   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:58:14,479-Speed 13884.45 samples/sec   Loss 4.0495   LearningRate 0.0009   Epoch: 6   Global Step: 11750   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:58:32,248-Speed 13831.74 samples/sec   Loss 4.0361   LearningRate 0.0009   Epoch: 6   Global Step: 11760   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:58:49,983-Speed 13858.39 samples/sec   Loss 4.0344   LearningRate 0.0008   Epoch: 6   Global Step: 11770   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:59:07,699-Speed 13873.36 samples/sec   Loss 4.0179   LearningRate 0.0008   Epoch: 6   Global Step: 11780   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:59:25,378-Speed 13902.01 samples/sec   Loss 4.0404   LearningRate 0.0008   Epoch: 6   Global Step: 11790   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 13:59:43,052-Speed 13906.33 samples/sec   Loss 4.0356   LearningRate 0.0008   Epoch: 6   Global Step: 11800   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:00:00,780-Speed 13863.40 samples/sec   Loss 4.0842   LearningRate 0.0008   Epoch: 6   Global Step: 11810   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:00:18,534-Speed 13843.78 samples/sec   Loss 4.0269   LearningRate 0.0008   Epoch: 6   Global Step: 11820   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:00:36,254-Speed 13869.76 samples/sec   Loss 4.0065   LearningRate 0.0008   Epoch: 6   Global Step: 11830   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 14:00:54,021-Speed 13835.41 samples/sec   Loss 4.0479   LearningRate 0.0008   Epoch: 6   Global Step: 11840   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 14:01:11,811-Speed 13815.40 samples/sec   Loss 4.0335   LearningRate 0.0008   Epoch: 6   Global Step: 11850   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:01:29,488-Speed 13903.13 samples/sec   Loss 4.0328   LearningRate 0.0008   Epoch: 6   Global Step: 11860   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:01:47,195-Speed 13880.61 samples/sec   Loss 4.0386   LearningRate 0.0008   Epoch: 6   Global Step: 11870   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:02:04,903-Speed 13879.75 samples/sec   Loss 4.0038   LearningRate 0.0008   Epoch: 6   Global Step: 11880   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:02:22,580-Speed 13903.04 samples/sec   Loss 3.9863   LearningRate 0.0008   Epoch: 6   Global Step: 11890   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:02:40,354-Speed 13828.47 samples/sec   Loss 3.9696   LearningRate 0.0008   Epoch: 6   Global Step: 11900   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:02:58,057-Speed 13883.07 samples/sec   Loss 4.0315   LearningRate 0.0008   Epoch: 6   Global Step: 11910   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:03:15,779-Speed 13868.20 samples/sec   Loss 4.0844   LearningRate 0.0008   Epoch: 6   Global Step: 11920   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:03:33,547-Speed 13834.20 samples/sec   Loss 4.0250   LearningRate 0.0008   Epoch: 6   Global Step: 11930   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:03:51,313-Speed 13833.89 samples/sec   Loss 4.0171   LearningRate 0.0008   Epoch: 6   Global Step: 11940   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:04:09,122-Speed 13800.25 samples/sec   Loss 3.9711   LearningRate 0.0008   Epoch: 6   Global Step: 11950   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 14:04:26,915-Speed 13813.36 samples/sec   Loss 3.9838   LearningRate 0.0008   Epoch: 6   Global Step: 11960   Fp16 Grad Scale: 131072   Required: 29 hours
Training: 2022-03-03 14:04:44,635-Speed 13869.70 samples/sec   Loss 3.9764   LearningRate 0.0008   Epoch: 6   Global Step: 11970   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:05:02,459-Speed 13788.72 samples/sec   Loss 3.9848   LearningRate 0.0008   Epoch: 6   Global Step: 11980   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:05:20,186-Speed 13867.65 samples/sec   Loss 4.0272   LearningRate 0.0008   Epoch: 6   Global Step: 11990   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:05:37,862-Speed 13904.08 samples/sec   Loss 3.9593   LearningRate 0.0008   Epoch: 6   Global Step: 12000   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:05:55,582-Speed 13870.17 samples/sec   Loss 3.9987   LearningRate 0.0008   Epoch: 6   Global Step: 12010   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:06:13,287-Speed 13881.96 samples/sec   Loss 4.0003   LearningRate 0.0008   Epoch: 6   Global Step: 12020   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:06:30,978-Speed 13892.54 samples/sec   Loss 3.9565   LearningRate 0.0008   Epoch: 6   Global Step: 12030   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:06:48,720-Speed 13852.95 samples/sec   Loss 3.9762   LearningRate 0.0008   Epoch: 6   Global Step: 12040   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:07:06,463-Speed 13851.82 samples/sec   Loss 3.9890   LearningRate 0.0008   Epoch: 6   Global Step: 12050   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:07:24,244-Speed 13822.31 samples/sec   Loss 3.9883   LearningRate 0.0008   Epoch: 6   Global Step: 12060   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:07:41,971-Speed 13864.67 samples/sec   Loss 3.9880   LearningRate 0.0008   Epoch: 6   Global Step: 12070   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:07:59,659-Speed 13894.79 samples/sec   Loss 4.0565   LearningRate 0.0008   Epoch: 6   Global Step: 12080   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:08:17,468-Speed 13800.97 samples/sec   Loss 4.0223   LearningRate 0.0008   Epoch: 6   Global Step: 12090   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:09:25,635-Speed 3605.35 samples/sec   Loss 3.9772   LearningRate 0.0008   Epoch: 7   Global Step: 12100   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:09:43,307-Speed 13907.18 samples/sec   Loss 3.9027   LearningRate 0.0008   Epoch: 7   Global Step: 12110   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:10:01,080-Speed 13828.37 samples/sec   Loss 3.9277   LearningRate 0.0008   Epoch: 7   Global Step: 12120   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:10:18,774-Speed 13890.37 samples/sec   Loss 3.9066   LearningRate 0.0008   Epoch: 7   Global Step: 12130   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:10:36,529-Speed 13842.71 samples/sec   Loss 3.9121   LearningRate 0.0008   Epoch: 7   Global Step: 12140   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:10:54,395-Speed 13756.73 samples/sec   Loss 3.9282   LearningRate 0.0008   Epoch: 7   Global Step: 12150   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:11:12,196-Speed 13806.98 samples/sec   Loss 3.9125   LearningRate 0.0008   Epoch: 7   Global Step: 12160   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:11:29,955-Speed 13840.28 samples/sec   Loss 3.9020   LearningRate 0.0008   Epoch: 7   Global Step: 12170   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:11:47,851-Speed 13733.60 samples/sec   Loss 3.9000   LearningRate 0.0008   Epoch: 7   Global Step: 12180   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:12:05,679-Speed 13785.94 samples/sec   Loss 3.8904   LearningRate 0.0008   Epoch: 7   Global Step: 12190   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:12:23,451-Speed 13829.34 samples/sec   Loss 3.9042   LearningRate 0.0008   Epoch: 7   Global Step: 12200   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:12:41,236-Speed 13819.39 samples/sec   Loss 3.9471   LearningRate 0.0008   Epoch: 7   Global Step: 12210   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:12:59,041-Speed 13803.84 samples/sec   Loss 3.8942   LearningRate 0.0008   Epoch: 7   Global Step: 12220   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:13:16,876-Speed 13780.56 samples/sec   Loss 3.9240   LearningRate 0.0008   Epoch: 7   Global Step: 12230   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:13:34,701-Speed 13788.25 samples/sec   Loss 3.9141   LearningRate 0.0008   Epoch: 7   Global Step: 12240   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:13:52,466-Speed 13834.45 samples/sec   Loss 3.9293   LearningRate 0.0008   Epoch: 7   Global Step: 12250   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:14:10,311-Speed 13773.57 samples/sec   Loss 3.8795   LearningRate 0.0008   Epoch: 7   Global Step: 12260   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:14:28,133-Speed 13790.32 samples/sec   Loss 3.9009   LearningRate 0.0008   Epoch: 7   Global Step: 12270   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:14:45,932-Speed 13808.50 samples/sec   Loss 3.8710   LearningRate 0.0008   Epoch: 7   Global Step: 12280   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:15:03,773-Speed 13775.68 samples/sec   Loss 3.8969   LearningRate 0.0008   Epoch: 7   Global Step: 12290   Fp16 Grad Scale: 65536   Required: 29 hours
Training: 2022-03-03 14:15:21,500-Speed 13864.72 samples/sec   Loss 3.9424   LearningRate 0.0008   Epoch: 7   Global Step: 12300   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:15:39,264-Speed 13835.35 samples/sec   Loss 3.8876   LearningRate 0.0008   Epoch: 7   Global Step: 12310   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:15:57,054-Speed 13815.14 samples/sec   Loss 3.9175   LearningRate 0.0008   Epoch: 7   Global Step: 12320   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:16:14,966-Speed 13721.91 samples/sec   Loss 3.8803   LearningRate 0.0008   Epoch: 7   Global Step: 12330   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:16:32,831-Speed 13757.19 samples/sec   Loss 3.8720   LearningRate 0.0008   Epoch: 7   Global Step: 12340   Fp16 Grad Scale: 32768   Required: 29 hours
Training: 2022-03-03 14:16:50,655-Speed 13789.14 samples/sec   Loss 3.8897   LearningRate 0.0008   Epoch: 7   Global Step: 12350   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:17:08,390-Speed 13858.19 samples/sec   Loss 3.8888   LearningRate 0.0008   Epoch: 7   Global Step: 12360   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:17:26,134-Speed 13851.58 samples/sec   Loss 3.9031   LearningRate 0.0008   Epoch: 7   Global Step: 12370   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:17:43,845-Speed 13877.47 samples/sec   Loss 3.8828   LearningRate 0.0008   Epoch: 7   Global Step: 12380   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:18:01,608-Speed 13836.07 samples/sec   Loss 3.8501   LearningRate 0.0008   Epoch: 7   Global Step: 12390   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:18:19,390-Speed 13821.88 samples/sec   Loss 3.8735   LearningRate 0.0008   Epoch: 7   Global Step: 12400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:18:37,113-Speed 13867.40 samples/sec   Loss 3.8932   LearningRate 0.0008   Epoch: 7   Global Step: 12410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:18:55,002-Speed 13739.00 samples/sec   Loss 3.9244   LearningRate 0.0008   Epoch: 7   Global Step: 12420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:19:12,714-Speed 13876.39 samples/sec   Loss 3.9157   LearningRate 0.0008   Epoch: 7   Global Step: 12430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:19:30,383-Speed 13909.77 samples/sec   Loss 3.8561   LearningRate 0.0008   Epoch: 7   Global Step: 12440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:19:48,033-Speed 13925.27 samples/sec   Loss 3.8797   LearningRate 0.0008   Epoch: 7   Global Step: 12450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:20:05,739-Speed 13880.78 samples/sec   Loss 3.8566   LearningRate 0.0008   Epoch: 7   Global Step: 12460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:20:23,399-Speed 13917.68 samples/sec   Loss 3.8480   LearningRate 0.0008   Epoch: 7   Global Step: 12470   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:20:41,134-Speed 13858.44 samples/sec   Loss 3.8749   LearningRate 0.0008   Epoch: 7   Global Step: 12480   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:20:58,796-Speed 13915.46 samples/sec   Loss 3.8819   LearningRate 0.0008   Epoch: 7   Global Step: 12490   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:21:16,470-Speed 13906.48 samples/sec   Loss 3.8782   LearningRate 0.0008   Epoch: 7   Global Step: 12500   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:21:34,185-Speed 13874.15 samples/sec   Loss 3.8330   LearningRate 0.0008   Epoch: 7   Global Step: 12510   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:21:51,922-Speed 13856.77 samples/sec   Loss 3.8247   LearningRate 0.0008   Epoch: 7   Global Step: 12520   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:22:09,818-Speed 13733.68 samples/sec   Loss 3.8916   LearningRate 0.0008   Epoch: 7   Global Step: 12530   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:22:27,529-Speed 13877.21 samples/sec   Loss 3.8774   LearningRate 0.0008   Epoch: 7   Global Step: 12540   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:22:45,239-Speed 13877.60 samples/sec   Loss 3.8327   LearningRate 0.0008   Epoch: 7   Global Step: 12550   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:23:03,001-Speed 13837.17 samples/sec   Loss 3.8245   LearningRate 0.0008   Epoch: 7   Global Step: 12560   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:23:20,728-Speed 13864.32 samples/sec   Loss 3.8647   LearningRate 0.0008   Epoch: 7   Global Step: 12570   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:23:38,390-Speed 13915.62 samples/sec   Loss 3.8235   LearningRate 0.0008   Epoch: 7   Global Step: 12580   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:23:56,126-Speed 13857.76 samples/sec   Loss 3.8272   LearningRate 0.0008   Epoch: 7   Global Step: 12590   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:24:13,828-Speed 13883.72 samples/sec   Loss 3.8405   LearningRate 0.0008   Epoch: 7   Global Step: 12600   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:24:31,543-Speed 13874.75 samples/sec   Loss 3.8157   LearningRate 0.0008   Epoch: 7   Global Step: 12610   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:24:49,315-Speed 13828.84 samples/sec   Loss 3.8131   LearningRate 0.0008   Epoch: 7   Global Step: 12620   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:25:07,137-Speed 13790.83 samples/sec   Loss 3.8518   LearningRate 0.0008   Epoch: 7   Global Step: 12630   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:25:24,819-Speed 13900.04 samples/sec   Loss 3.8565   LearningRate 0.0008   Epoch: 7   Global Step: 12640   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:25:42,553-Speed 13859.70 samples/sec   Loss 3.8163   LearningRate 0.0008   Epoch: 7   Global Step: 12650   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:26:00,262-Speed 13878.28 samples/sec   Loss 3.8265   LearningRate 0.0008   Epoch: 7   Global Step: 12660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:26:17,945-Speed 13898.95 samples/sec   Loss 3.8207   LearningRate 0.0008   Epoch: 7   Global Step: 12670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:26:35,653-Speed 13879.22 samples/sec   Loss 3.7937   LearningRate 0.0008   Epoch: 7   Global Step: 12680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:26:53,420-Speed 13833.43 samples/sec   Loss 3.7835   LearningRate 0.0008   Epoch: 7   Global Step: 12690   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:27:11,226-Speed 13803.36 samples/sec   Loss 3.7732   LearningRate 0.0008   Epoch: 7   Global Step: 12700   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:27:28,961-Speed 13857.58 samples/sec   Loss 3.8449   LearningRate 0.0008   Epoch: 7   Global Step: 12710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:27:46,731-Speed 13831.37 samples/sec   Loss 3.8107   LearningRate 0.0008   Epoch: 7   Global Step: 12720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:28:04,475-Speed 13851.55 samples/sec   Loss 3.7895   LearningRate 0.0008   Epoch: 7   Global Step: 12730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:28:22,165-Speed 13893.28 samples/sec   Loss 3.8021   LearningRate 0.0008   Epoch: 7   Global Step: 12740   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:28:39,918-Speed 13844.09 samples/sec   Loss 3.8359   LearningRate 0.0008   Epoch: 7   Global Step: 12750   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:28:57,608-Speed 13893.24 samples/sec   Loss 3.8157   LearningRate 0.0008   Epoch: 7   Global Step: 12760   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:29:15,284-Speed 13905.06 samples/sec   Loss 3.7779   LearningRate 0.0008   Epoch: 7   Global Step: 12770   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:29:33,000-Speed 13873.26 samples/sec   Loss 3.7810   LearningRate 0.0008   Epoch: 7   Global Step: 12780   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:29:50,831-Speed 13782.99 samples/sec   Loss 3.7648   LearningRate 0.0008   Epoch: 7   Global Step: 12790   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:30:08,524-Speed 13891.80 samples/sec   Loss 3.7958   LearningRate 0.0008   Epoch: 7   Global Step: 12800   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:30:26,389-Speed 13757.59 samples/sec   Loss 3.7963   LearningRate 0.0008   Epoch: 7   Global Step: 12810   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:30:44,106-Speed 13871.85 samples/sec   Loss 3.7834   LearningRate 0.0008   Epoch: 7   Global Step: 12820   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:31:01,769-Speed 13914.85 samples/sec   Loss 3.7670   LearningRate 0.0008   Epoch: 7   Global Step: 12830   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:31:19,519-Speed 13846.15 samples/sec   Loss 3.8476   LearningRate 0.0008   Epoch: 7   Global Step: 12840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:31:37,247-Speed 13863.97 samples/sec   Loss 3.8015   LearningRate 0.0008   Epoch: 7   Global Step: 12850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:31:54,962-Speed 13875.40 samples/sec   Loss 3.7568   LearningRate 0.0008   Epoch: 7   Global Step: 12860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:32:12,698-Speed 13857.12 samples/sec   Loss 3.7969   LearningRate 0.0008   Epoch: 7   Global Step: 12870   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:32:30,473-Speed 13827.23 samples/sec   Loss 3.7615   LearningRate 0.0008   Epoch: 7   Global Step: 12880   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:32:48,167-Speed 13892.36 samples/sec   Loss 3.7791   LearningRate 0.0008   Epoch: 7   Global Step: 12890   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:33:05,793-Speed 13943.91 samples/sec   Loss 3.7934   LearningRate 0.0008   Epoch: 7   Global Step: 12900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:33:23,576-Speed 13820.85 samples/sec   Loss 3.7554   LearningRate 0.0008   Epoch: 7   Global Step: 12910   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:33:41,271-Speed 13891.24 samples/sec   Loss 3.7254   LearningRate 0.0008   Epoch: 7   Global Step: 12920   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:33:59,077-Speed 13803.22 samples/sec   Loss 3.7645   LearningRate 0.0008   Epoch: 7   Global Step: 12930   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:34:16,752-Speed 13905.20 samples/sec   Loss 3.8233   LearningRate 0.0008   Epoch: 7   Global Step: 12940   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:34:34,451-Speed 13886.22 samples/sec   Loss 3.7483   LearningRate 0.0008   Epoch: 7   Global Step: 12950   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:34:52,208-Speed 13840.96 samples/sec   Loss 3.7414   LearningRate 0.0008   Epoch: 7   Global Step: 12960   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:35:09,909-Speed 13885.08 samples/sec   Loss 3.7384   LearningRate 0.0008   Epoch: 7   Global Step: 12970   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:35:27,653-Speed 13851.56 samples/sec   Loss 3.7579   LearningRate 0.0008   Epoch: 7   Global Step: 12980   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:35:45,365-Speed 13876.15 samples/sec   Loss 3.7539   LearningRate 0.0008   Epoch: 7   Global Step: 12990   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:36:03,093-Speed 13863.78 samples/sec   Loss 3.7229   LearningRate 0.0008   Epoch: 7   Global Step: 13000   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:36:20,782-Speed 13894.26 samples/sec   Loss 3.7070   LearningRate 0.0008   Epoch: 7   Global Step: 13010   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:36:38,523-Speed 13853.84 samples/sec   Loss 3.7459   LearningRate 0.0008   Epoch: 7   Global Step: 13020   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:36:56,281-Speed 13840.25 samples/sec   Loss 3.7247   LearningRate 0.0008   Epoch: 7   Global Step: 13030   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:37:14,020-Speed 13855.09 samples/sec   Loss 3.7448   LearningRate 0.0008   Epoch: 7   Global Step: 13040   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:37:31,701-Speed 13900.45 samples/sec   Loss 3.7002   LearningRate 0.0008   Epoch: 7   Global Step: 13050   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:37:49,382-Speed 13900.50 samples/sec   Loss 3.6871   LearningRate 0.0008   Epoch: 7   Global Step: 13060   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:38:07,144-Speed 13838.28 samples/sec   Loss 3.7151   LearningRate 0.0008   Epoch: 7   Global Step: 13070   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:38:24,821-Speed 13903.88 samples/sec   Loss 3.7259   LearningRate 0.0008   Epoch: 7   Global Step: 13080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:38:42,556-Speed 13858.71 samples/sec   Loss 3.7103   LearningRate 0.0008   Epoch: 7   Global Step: 13090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:39:00,278-Speed 13868.32 samples/sec   Loss 3.7389   LearningRate 0.0008   Epoch: 7   Global Step: 13100   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:39:18,075-Speed 13809.88 samples/sec   Loss 3.7108   LearningRate 0.0008   Epoch: 7   Global Step: 13110   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:39:35,772-Speed 13888.25 samples/sec   Loss 3.7044   LearningRate 0.0008   Epoch: 7   Global Step: 13120   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:39:53,522-Speed 13846.24 samples/sec   Loss 3.7048   LearningRate 0.0008   Epoch: 7   Global Step: 13130   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:40:11,291-Speed 13831.99 samples/sec   Loss 3.7257   LearningRate 0.0008   Epoch: 7   Global Step: 13140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:40:28,991-Speed 13885.84 samples/sec   Loss 3.6882   LearningRate 0.0008   Epoch: 7   Global Step: 13150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:40:46,756-Speed 13834.39 samples/sec   Loss 3.7231   LearningRate 0.0008   Epoch: 7   Global Step: 13160   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:41:04,603-Speed 13772.37 samples/sec   Loss 3.7099   LearningRate 0.0008   Epoch: 7   Global Step: 13170   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:41:22,348-Speed 13850.36 samples/sec   Loss 3.7211   LearningRate 0.0008   Epoch: 7   Global Step: 13180   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:41:40,154-Speed 13812.29 samples/sec   Loss 3.7005   LearningRate 0.0008   Epoch: 7   Global Step: 13190   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:41:57,932-Speed 13824.25 samples/sec   Loss 3.6590   LearningRate 0.0008   Epoch: 7   Global Step: 13200   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:42:15,734-Speed 13811.69 samples/sec   Loss 3.7195   LearningRate 0.0008   Epoch: 7   Global Step: 13210   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:42:33,543-Speed 13800.39 samples/sec   Loss 3.7045   LearningRate 0.0008   Epoch: 7   Global Step: 13220   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:42:51,290-Speed 13858.37 samples/sec   Loss 3.6679   LearningRate 0.0008   Epoch: 7   Global Step: 13230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:43:09,068-Speed 13824.36 samples/sec   Loss 3.6915   LearningRate 0.0008   Epoch: 7   Global Step: 13240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:43:26,837-Speed 13836.41 samples/sec   Loss 3.6813   LearningRate 0.0008   Epoch: 7   Global Step: 13250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:43:44,628-Speed 13815.07 samples/sec   Loss 3.6863   LearningRate 0.0008   Epoch: 7   Global Step: 13260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:44:02,346-Speed 13871.26 samples/sec   Loss 3.6693   LearningRate 0.0008   Epoch: 7   Global Step: 13270   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:44:20,146-Speed 13807.59 samples/sec   Loss 3.6932   LearningRate 0.0008   Epoch: 7   Global Step: 13280   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:44:38,040-Speed 13734.99 samples/sec   Loss 3.6738   LearningRate 0.0008   Epoch: 7   Global Step: 13290   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:44:55,810-Speed 13831.48 samples/sec   Loss 3.6880   LearningRate 0.0008   Epoch: 7   Global Step: 13300   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:45:13,580-Speed 13830.83 samples/sec   Loss 3.6591   LearningRate 0.0008   Epoch: 7   Global Step: 13310   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:45:31,401-Speed 13791.70 samples/sec   Loss 3.6767   LearningRate 0.0008   Epoch: 7   Global Step: 13320   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:45:49,173-Speed 13829.15 samples/sec   Loss 3.6701   LearningRate 0.0008   Epoch: 7   Global Step: 13330   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:46:07,018-Speed 13772.87 samples/sec   Loss 3.6827   LearningRate 0.0008   Epoch: 7   Global Step: 13340   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:46:24,816-Speed 13809.06 samples/sec   Loss 3.6917   LearningRate 0.0008   Epoch: 7   Global Step: 13350   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:46:42,770-Speed 13689.02 samples/sec   Loss 3.7077   LearningRate 0.0008   Epoch: 7   Global Step: 13360   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:47:00,543-Speed 13829.32 samples/sec   Loss 3.6856   LearningRate 0.0008   Epoch: 7   Global Step: 13370   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:47:18,427-Speed 13750.95 samples/sec   Loss 3.6326   LearningRate 0.0008   Epoch: 7   Global Step: 13380   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:47:36,262-Speed 13780.77 samples/sec   Loss 3.6324   LearningRate 0.0008   Epoch: 7   Global Step: 13390   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:47:54,081-Speed 13792.58 samples/sec   Loss 3.6912   LearningRate 0.0008   Epoch: 7   Global Step: 13400   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:48:11,929-Speed 13770.82 samples/sec   Loss 3.6319   LearningRate 0.0008   Epoch: 7   Global Step: 13410   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:48:29,846-Speed 13717.49 samples/sec   Loss 3.6321   LearningRate 0.0008   Epoch: 7   Global Step: 13420   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:48:47,658-Speed 13813.40 samples/sec   Loss 3.6442   LearningRate 0.0008   Epoch: 7   Global Step: 13430   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:49:05,429-Speed 13830.38 samples/sec   Loss 3.6566   LearningRate 0.0008   Epoch: 7   Global Step: 13440   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:49:23,253-Speed 13795.51 samples/sec   Loss 3.6858   LearningRate 0.0008   Epoch: 7   Global Step: 13450   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:49:41,029-Speed 13826.01 samples/sec   Loss 3.6729   LearningRate 0.0008   Epoch: 7   Global Step: 13460   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:49:58,808-Speed 13830.62 samples/sec   Loss 3.6417   LearningRate 0.0008   Epoch: 7   Global Step: 13470   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:50:16,720-Speed 13721.39 samples/sec   Loss 3.6439   LearningRate 0.0008   Epoch: 7   Global Step: 13480   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:50:34,496-Speed 13831.30 samples/sec   Loss 3.6549   LearningRate 0.0008   Epoch: 7   Global Step: 13490   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:50:52,358-Speed 13759.65 samples/sec   Loss 3.6402   LearningRate 0.0008   Epoch: 7   Global Step: 13500   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:51:10,141-Speed 13820.68 samples/sec   Loss 3.6555   LearningRate 0.0008   Epoch: 7   Global Step: 13510   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:51:27,893-Speed 13845.50 samples/sec   Loss 3.6451   LearningRate 0.0008   Epoch: 7   Global Step: 13520   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:51:45,706-Speed 13797.24 samples/sec   Loss 3.6364   LearningRate 0.0008   Epoch: 7   Global Step: 13530   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:52:03,540-Speed 13781.74 samples/sec   Loss 3.6599   LearningRate 0.0008   Epoch: 7   Global Step: 13540   Fp16 Grad Scale: 131072   Required: 28 hours
Training: 2022-03-03 14:52:21,419-Speed 13746.25 samples/sec   Loss 3.6063   LearningRate 0.0008   Epoch: 7   Global Step: 13550   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:52:39,185-Speed 13834.51 samples/sec   Loss 3.6113   LearningRate 0.0008   Epoch: 7   Global Step: 13560   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:52:56,978-Speed 13812.86 samples/sec   Loss 3.6393   LearningRate 0.0008   Epoch: 7   Global Step: 13570   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:53:14,864-Speed 13740.99 samples/sec   Loss 3.6483   LearningRate 0.0008   Epoch: 7   Global Step: 13580   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:53:32,644-Speed 13823.63 samples/sec   Loss 3.6159   LearningRate 0.0008   Epoch: 7   Global Step: 13590   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:53:50,464-Speed 13791.67 samples/sec   Loss 3.6294   LearningRate 0.0008   Epoch: 7   Global Step: 13600   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:54:08,546-Speed 13592.32 samples/sec   Loss 3.5961   LearningRate 0.0008   Epoch: 7   Global Step: 13610   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:54:26,669-Speed 13561.58 samples/sec   Loss 3.6226   LearningRate 0.0008   Epoch: 7   Global Step: 13620   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:54:44,782-Speed 13569.28 samples/sec   Loss 3.6438   LearningRate 0.0008   Epoch: 7   Global Step: 13630   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:55:02,901-Speed 13564.61 samples/sec   Loss 3.6401   LearningRate 0.0008   Epoch: 7   Global Step: 13640   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:55:21,038-Speed 13550.90 samples/sec   Loss 3.5787   LearningRate 0.0008   Epoch: 7   Global Step: 13650   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:55:39,131-Speed 13583.59 samples/sec   Loss 3.6069   LearningRate 0.0008   Epoch: 7   Global Step: 13660   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:55:57,200-Speed 13602.46 samples/sec   Loss 3.5681   LearningRate 0.0008   Epoch: 7   Global Step: 13670   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:56:15,373-Speed 13524.07 samples/sec   Loss 3.5806   LearningRate 0.0008   Epoch: 7   Global Step: 13680   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:56:33,507-Speed 13553.49 samples/sec   Loss 3.6188   LearningRate 0.0008   Epoch: 7   Global Step: 13690   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:56:51,604-Speed 13590.00 samples/sec   Loss 3.6253   LearningRate 0.0008   Epoch: 7   Global Step: 13700   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:57:09,660-Speed 13612.37 samples/sec   Loss 3.6945   LearningRate 0.0008   Epoch: 7   Global Step: 13710   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:57:27,769-Speed 13571.62 samples/sec   Loss 3.6078   LearningRate 0.0008   Epoch: 7   Global Step: 13720   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:57:45,840-Speed 13601.11 samples/sec   Loss 3.5777   LearningRate 0.0008   Epoch: 7   Global Step: 13730   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 14:58:03,947-Speed 13572.85 samples/sec   Loss 3.5815   LearningRate 0.0008   Epoch: 7   Global Step: 13740   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:58:22,073-Speed 13566.69 samples/sec   Loss 3.5639   LearningRate 0.0008   Epoch: 7   Global Step: 13750   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:58:40,197-Speed 13561.09 samples/sec   Loss 3.5572   LearningRate 0.0008   Epoch: 7   Global Step: 13760   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:58:58,272-Speed 13603.95 samples/sec   Loss 3.6025   LearningRate 0.0008   Epoch: 7   Global Step: 13770   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:59:16,398-Speed 13559.16 samples/sec   Loss 3.6677   LearningRate 0.0008   Epoch: 7   Global Step: 13780   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:59:34,462-Speed 13606.11 samples/sec   Loss 3.6275   LearningRate 0.0008   Epoch: 7   Global Step: 13790   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 14:59:52,622-Speed 13534.16 samples/sec   Loss 3.6130   LearningRate 0.0008   Epoch: 7   Global Step: 13800   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:00:10,706-Speed 13590.89 samples/sec   Loss 3.6209   LearningRate 0.0008   Epoch: 7   Global Step: 13810   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:00:28,828-Speed 13562.13 samples/sec   Loss 3.6315   LearningRate 0.0008   Epoch: 7   Global Step: 13820   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:01:44,005-Speed 3269.19 samples/sec   Loss 3.5854   LearningRate 0.0008   Epoch: 8   Global Step: 13830   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:02:02,063-Speed 13610.65 samples/sec   Loss 3.5567   LearningRate 0.0008   Epoch: 8   Global Step: 13840   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:02:20,108-Speed 13628.84 samples/sec   Loss 3.5499   LearningRate 0.0008   Epoch: 8   Global Step: 13850   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:02:37,804-Speed 13888.92 samples/sec   Loss 3.5261   LearningRate 0.0008   Epoch: 8   Global Step: 13860   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:02:55,490-Speed 13896.92 samples/sec   Loss 3.5480   LearningRate 0.0008   Epoch: 8   Global Step: 13870   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:03:13,137-Speed 13927.90 samples/sec   Loss 3.5540   LearningRate 0.0008   Epoch: 8   Global Step: 13880   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:03:30,800-Speed 13915.31 samples/sec   Loss 3.5402   LearningRate 0.0008   Epoch: 8   Global Step: 13890   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:03:48,598-Speed 13809.38 samples/sec   Loss 3.5572   LearningRate 0.0008   Epoch: 8   Global Step: 13900   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:04:06,239-Speed 13931.96 samples/sec   Loss 3.5301   LearningRate 0.0008   Epoch: 8   Global Step: 13910   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:04:23,955-Speed 13873.24 samples/sec   Loss 3.5193   LearningRate 0.0008   Epoch: 8   Global Step: 13920   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:04:41,711-Speed 13841.47 samples/sec   Loss 3.5545   LearningRate 0.0008   Epoch: 8   Global Step: 13930   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:04:59,420-Speed 13883.68 samples/sec   Loss 3.5202   LearningRate 0.0008   Epoch: 8   Global Step: 13940   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:05:17,190-Speed 13830.85 samples/sec   Loss 3.5485   LearningRate 0.0008   Epoch: 8   Global Step: 13950   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:05:34,953-Speed 13840.50 samples/sec   Loss 3.5635   LearningRate 0.0008   Epoch: 8   Global Step: 13960   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:05:52,682-Speed 13863.17 samples/sec   Loss 3.5471   LearningRate 0.0008   Epoch: 8   Global Step: 13970   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:06:10,410-Speed 13868.79 samples/sec   Loss 3.5230   LearningRate 0.0008   Epoch: 8   Global Step: 13980   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:06:28,063-Speed 13922.98 samples/sec   Loss 3.4903   LearningRate 0.0008   Epoch: 8   Global Step: 13990   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:06:45,838-Speed 13836.92 samples/sec   Loss 3.5558   LearningRate 0.0008   Epoch: 8   Global Step: 14000   Fp16 Grad Scale: 16384   Required: 28 hours
Training: 2022-03-03 15:07:03,507-Speed 13910.25 samples/sec   Loss 3.5234   LearningRate 0.0008   Epoch: 8   Global Step: 14010   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:07:21,164-Speed 13925.93 samples/sec   Loss 3.5264   LearningRate 0.0008   Epoch: 8   Global Step: 14020   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:07:38,957-Speed 13812.98 samples/sec   Loss 3.5457   LearningRate 0.0008   Epoch: 8   Global Step: 14030   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:07:56,630-Speed 13913.32 samples/sec   Loss 3.6096   LearningRate 0.0008   Epoch: 8   Global Step: 14040   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:08:14,284-Speed 13922.15 samples/sec   Loss 3.5387   LearningRate 0.0008   Epoch: 8   Global Step: 14050   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:08:31,910-Speed 13952.55 samples/sec   Loss 3.5106   LearningRate 0.0008   Epoch: 8   Global Step: 14060   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:08:49,599-Speed 13893.95 samples/sec   Loss 3.5271   LearningRate 0.0008   Epoch: 8   Global Step: 14070   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:09:07,272-Speed 13907.08 samples/sec   Loss 3.5500   LearningRate 0.0008   Epoch: 8   Global Step: 14080   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:09:24,983-Speed 13877.00 samples/sec   Loss 3.5394   LearningRate 0.0008   Epoch: 8   Global Step: 14090   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:09:42,730-Speed 13849.54 samples/sec   Loss 3.4978   LearningRate 0.0008   Epoch: 8   Global Step: 14100   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:10:00,434-Speed 13882.26 samples/sec   Loss 3.5349   LearningRate 0.0008   Epoch: 8   Global Step: 14110   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:10:18,151-Speed 13872.38 samples/sec   Loss 3.5252   LearningRate 0.0008   Epoch: 8   Global Step: 14120   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:10:35,927-Speed 13825.97 samples/sec   Loss 3.5336   LearningRate 0.0008   Epoch: 8   Global Step: 14130   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:10:53,582-Speed 13921.36 samples/sec   Loss 3.4882   LearningRate 0.0008   Epoch: 8   Global Step: 14140   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:11:11,394-Speed 13797.72 samples/sec   Loss 3.5173   LearningRate 0.0008   Epoch: 8   Global Step: 14150   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:11:29,081-Speed 13896.18 samples/sec   Loss 3.5220   LearningRate 0.0008   Epoch: 8   Global Step: 14160   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:11:46,763-Speed 13899.66 samples/sec   Loss 3.5101   LearningRate 0.0008   Epoch: 8   Global Step: 14170   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:12:04,456-Speed 13891.63 samples/sec   Loss 3.5305   LearningRate 0.0008   Epoch: 8   Global Step: 14180   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:12:22,144-Speed 13894.69 samples/sec   Loss 3.5191   LearningRate 0.0008   Epoch: 8   Global Step: 14190   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:12:39,819-Speed 13905.63 samples/sec   Loss 3.4985   LearningRate 0.0008   Epoch: 8   Global Step: 14200   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:12:57,653-Speed 13781.72 samples/sec   Loss 3.5623   LearningRate 0.0008   Epoch: 8   Global Step: 14210   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:13:15,368-Speed 13874.31 samples/sec   Loss 3.5118   LearningRate 0.0008   Epoch: 8   Global Step: 14220   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:13:33,235-Speed 13755.29 samples/sec   Loss 3.4631   LearningRate 0.0008   Epoch: 8   Global Step: 14230   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:13:50,971-Speed 13857.97 samples/sec   Loss 3.4838   LearningRate 0.0008   Epoch: 8   Global Step: 14240   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:14:08,786-Speed 13795.91 samples/sec   Loss 3.5072   LearningRate 0.0008   Epoch: 8   Global Step: 14250   Fp16 Grad Scale: 32768   Required: 28 hours
Training: 2022-03-03 15:14:26,608-Speed 13790.71 samples/sec   Loss 3.5193   LearningRate 0.0008   Epoch: 8   Global Step: 14260   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:14:44,282-Speed 13905.39 samples/sec   Loss 3.4902   LearningRate 0.0008   Epoch: 8   Global Step: 14270   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:15:02,023-Speed 13853.74 samples/sec   Loss 3.4905   LearningRate 0.0008   Epoch: 8   Global Step: 14280   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:15:19,718-Speed 13890.78 samples/sec   Loss 3.4574   LearningRate 0.0008   Epoch: 8   Global Step: 14290   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:15:37,480-Speed 13838.02 samples/sec   Loss 3.4934   LearningRate 0.0008   Epoch: 8   Global Step: 14300   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:15:55,265-Speed 13819.69 samples/sec   Loss 3.5238   LearningRate 0.0008   Epoch: 8   Global Step: 14310   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:16:13,009-Speed 13850.71 samples/sec   Loss 3.4889   LearningRate 0.0008   Epoch: 8   Global Step: 14320   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:16:30,740-Speed 13861.47 samples/sec   Loss 3.5030   LearningRate 0.0008   Epoch: 8   Global Step: 14330   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:16:48,477-Speed 13857.03 samples/sec   Loss 3.4825   LearningRate 0.0008   Epoch: 8   Global Step: 14340   Fp16 Grad Scale: 65536   Required: 28 hours
Training: 2022-03-03 15:17:06,275-Speed 13814.10 samples/sec   Loss 3.4653   LearningRate 0.0008   Epoch: 8   Global Step: 14350   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:17:24,185-Speed 13723.27 samples/sec   Loss 3.4770   LearningRate 0.0008   Epoch: 8   Global Step: 14360   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:17:42,176-Speed 13660.91 samples/sec   Loss 3.4940   LearningRate 0.0008   Epoch: 8   Global Step: 14370   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:18:00,044-Speed 13755.02 samples/sec   Loss 3.4973   LearningRate 0.0008   Epoch: 8   Global Step: 14380   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:18:17,808-Speed 13835.46 samples/sec   Loss 3.4404   LearningRate 0.0008   Epoch: 8   Global Step: 14390   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:18:35,626-Speed 13800.77 samples/sec   Loss 3.4436   LearningRate 0.0008   Epoch: 8   Global Step: 14400   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:18:53,495-Speed 13754.66 samples/sec   Loss 3.4905   LearningRate 0.0008   Epoch: 8   Global Step: 14410   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:19:11,331-Speed 13780.58 samples/sec   Loss 3.4436   LearningRate 0.0008   Epoch: 8   Global Step: 14420   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:19:29,159-Speed 13786.14 samples/sec   Loss 3.4757   LearningRate 0.0008   Epoch: 8   Global Step: 14430   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:19:47,007-Speed 13770.84 samples/sec   Loss 3.4542   LearningRate 0.0008   Epoch: 8   Global Step: 14440   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:20:04,874-Speed 13755.24 samples/sec   Loss 3.4539   LearningRate 0.0008   Epoch: 8   Global Step: 14450   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:20:22,761-Speed 13741.17 samples/sec   Loss 3.4476   LearningRate 0.0008   Epoch: 8   Global Step: 14460   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:20:40,651-Speed 13738.04 samples/sec   Loss 3.4779   LearningRate 0.0008   Epoch: 8   Global Step: 14470   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:20:58,557-Speed 13725.81 samples/sec   Loss 3.4863   LearningRate 0.0008   Epoch: 8   Global Step: 14480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:21:16,531-Speed 13673.32 samples/sec   Loss 3.4450   LearningRate 0.0008   Epoch: 8   Global Step: 14490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:21:34,404-Speed 13751.85 samples/sec   Loss 3.4394   LearningRate 0.0008   Epoch: 8   Global Step: 14500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:21:52,176-Speed 13829.44 samples/sec   Loss 3.4774   LearningRate 0.0008   Epoch: 8   Global Step: 14510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:22:10,051-Speed 13749.52 samples/sec   Loss 3.4703   LearningRate 0.0008   Epoch: 8   Global Step: 14520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:22:28,102-Speed 13615.27 samples/sec   Loss 3.4700   LearningRate 0.0008   Epoch: 8   Global Step: 14530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:22:46,026-Speed 13713.54 samples/sec   Loss 3.4774   LearningRate 0.0008   Epoch: 8   Global Step: 14540   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:23:03,783-Speed 13841.42 samples/sec   Loss 3.4615   LearningRate 0.0008   Epoch: 8   Global Step: 14550   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:23:21,652-Speed 13754.06 samples/sec   Loss 3.4376   LearningRate 0.0008   Epoch: 8   Global Step: 14560   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:23:39,555-Speed 13728.61 samples/sec   Loss 3.4401   LearningRate 0.0008   Epoch: 8   Global Step: 14570   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:23:57,395-Speed 13776.43 samples/sec   Loss 3.4647   LearningRate 0.0008   Epoch: 8   Global Step: 14580   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:24:15,400-Speed 13650.63 samples/sec   Loss 3.4548   LearningRate 0.0008   Epoch: 8   Global Step: 14590   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:24:33,341-Speed 13699.27 samples/sec   Loss 3.4043   LearningRate 0.0008   Epoch: 8   Global Step: 14600   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:24:51,205-Speed 13758.07 samples/sec   Loss 3.4338   LearningRate 0.0008   Epoch: 8   Global Step: 14610   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:25:08,995-Speed 13819.78 samples/sec   Loss 3.4537   LearningRate 0.0008   Epoch: 8   Global Step: 14620   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:25:26,900-Speed 13726.65 samples/sec   Loss 3.4437   LearningRate 0.0008   Epoch: 8   Global Step: 14630   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:25:44,773-Speed 13756.34 samples/sec   Loss 3.4478   LearningRate 0.0008   Epoch: 8   Global Step: 14640   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:26:02,703-Speed 13707.30 samples/sec   Loss 3.4436   LearningRate 0.0008   Epoch: 8   Global Step: 14650   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:26:20,533-Speed 13791.14 samples/sec   Loss 3.4297   LearningRate 0.0008   Epoch: 8   Global Step: 14660   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:26:38,379-Speed 13771.77 samples/sec   Loss 3.4299   LearningRate 0.0008   Epoch: 8   Global Step: 14670   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:26:56,212-Speed 13782.08 samples/sec   Loss 3.4322   LearningRate 0.0008   Epoch: 8   Global Step: 14680   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:27:14,019-Speed 13801.94 samples/sec   Loss 3.4328   LearningRate 0.0008   Epoch: 8   Global Step: 14690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:27:32,015-Speed 13657.96 samples/sec   Loss 3.4263   LearningRate 0.0008   Epoch: 8   Global Step: 14700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:27:49,814-Speed 13815.03 samples/sec   Loss 3.4026   LearningRate 0.0008   Epoch: 8   Global Step: 14710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:28:07,677-Speed 13758.83 samples/sec   Loss 3.3964   LearningRate 0.0008   Epoch: 8   Global Step: 14720   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:28:25,522-Speed 13773.36 samples/sec   Loss 3.4145   LearningRate 0.0008   Epoch: 8   Global Step: 14730   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:28:43,393-Speed 13753.46 samples/sec   Loss 3.4371   LearningRate 0.0008   Epoch: 8   Global Step: 14740   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:29:01,244-Speed 13767.94 samples/sec   Loss 3.4237   LearningRate 0.0008   Epoch: 8   Global Step: 14750   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:29:19,124-Speed 13778.74 samples/sec   Loss 3.3897   LearningRate 0.0008   Epoch: 8   Global Step: 14760   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:29:37,037-Speed 13720.71 samples/sec   Loss 3.4138   LearningRate 0.0008   Epoch: 8   Global Step: 14770   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:29:55,055-Speed 13640.98 samples/sec   Loss 3.4203   LearningRate 0.0008   Epoch: 8   Global Step: 14780   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:30:12,879-Speed 13788.67 samples/sec   Loss 3.3840   LearningRate 0.0008   Epoch: 8   Global Step: 14790   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:30:30,857-Speed 13671.12 samples/sec   Loss 3.3737   LearningRate 0.0008   Epoch: 8   Global Step: 14800   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:30:48,719-Speed 13759.68 samples/sec   Loss 3.4029   LearningRate 0.0008   Epoch: 8   Global Step: 14810   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:31:06,708-Speed 13662.88 samples/sec   Loss 3.3790   LearningRate 0.0008   Epoch: 8   Global Step: 14820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:31:24,669-Speed 13683.46 samples/sec   Loss 3.3935   LearningRate 0.0008   Epoch: 8   Global Step: 14830   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:31:42,663-Speed 13659.02 samples/sec   Loss 3.4130   LearningRate 0.0008   Epoch: 8   Global Step: 14840   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:32:00,559-Speed 13733.48 samples/sec   Loss 3.3957   LearningRate 0.0008   Epoch: 8   Global Step: 14850   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:32:18,466-Speed 13725.48 samples/sec   Loss 3.4082   LearningRate 0.0008   Epoch: 8   Global Step: 14860   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:32:36,387-Speed 13714.18 samples/sec   Loss 3.4045   LearningRate 0.0008   Epoch: 8   Global Step: 14870   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:32:54,226-Speed 13778.01 samples/sec   Loss 3.3693   LearningRate 0.0008   Epoch: 8   Global Step: 14880   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:33:12,005-Speed 13823.97 samples/sec   Loss 3.3586   LearningRate 0.0008   Epoch: 8   Global Step: 14890   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:33:29,841-Speed 13780.09 samples/sec   Loss 3.3792   LearningRate 0.0008   Epoch: 8   Global Step: 14900   Fp16 Grad Scale: 131072   Required: 27 hours
Training: 2022-03-03 15:33:47,780-Speed 13700.68 samples/sec   Loss 3.4166   LearningRate 0.0008   Epoch: 8   Global Step: 14910   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:34:05,789-Speed 13647.17 samples/sec   Loss 3.4056   LearningRate 0.0008   Epoch: 8   Global Step: 14920   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:34:23,666-Speed 13755.20 samples/sec   Loss 3.3570   LearningRate 0.0008   Epoch: 8   Global Step: 14930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:34:41,507-Speed 13775.90 samples/sec   Loss 3.3726   LearningRate 0.0008   Epoch: 8   Global Step: 14940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:34:59,417-Speed 13722.35 samples/sec   Loss 3.3605   LearningRate 0.0008   Epoch: 8   Global Step: 14950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:35:17,322-Speed 13726.87 samples/sec   Loss 3.3711   LearningRate 0.0008   Epoch: 8   Global Step: 14960   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:35:35,120-Speed 13810.70 samples/sec   Loss 3.3741   LearningRate 0.0008   Epoch: 8   Global Step: 14970   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:35:52,985-Speed 13760.24 samples/sec   Loss 3.3826   LearningRate 0.0008   Epoch: 8   Global Step: 14980   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:36:10,801-Speed 13794.64 samples/sec   Loss 3.3706   LearningRate 0.0008   Epoch: 8   Global Step: 14990   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:36:28,697-Speed 13743.24 samples/sec   Loss 3.3658   LearningRate 0.0008   Epoch: 8   Global Step: 15000   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:36:46,537-Speed 13776.78 samples/sec   Loss 3.3515   LearningRate 0.0008   Epoch: 8   Global Step: 15010   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:37:04,382-Speed 13772.71 samples/sec   Loss 3.3527   LearningRate 0.0008   Epoch: 8   Global Step: 15020   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:37:22,246-Speed 13758.36 samples/sec   Loss 3.3922   LearningRate 0.0008   Epoch: 8   Global Step: 15030   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:37:40,076-Speed 13784.29 samples/sec   Loss 3.3635   LearningRate 0.0008   Epoch: 8   Global Step: 15040   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:37:57,936-Speed 13761.57 samples/sec   Loss 3.3377   LearningRate 0.0008   Epoch: 8   Global Step: 15050   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:38:15,760-Speed 13788.52 samples/sec   Loss 3.3545   LearningRate 0.0008   Epoch: 8   Global Step: 15060   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:38:33,690-Speed 13707.67 samples/sec   Loss 3.3822   LearningRate 0.0008   Epoch: 8   Global Step: 15070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:38:51,493-Speed 13805.42 samples/sec   Loss 3.3722   LearningRate 0.0008   Epoch: 8   Global Step: 15080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:39:09,431-Speed 13701.28 samples/sec   Loss 3.4035   LearningRate 0.0008   Epoch: 8   Global Step: 15090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:39:27,512-Speed 13593.25 samples/sec   Loss 3.3426   LearningRate 0.0008   Epoch: 8   Global Step: 15100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:39:45,328-Speed 13795.15 samples/sec   Loss 3.3538   LearningRate 0.0008   Epoch: 8   Global Step: 15110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:40:03,190-Speed 13759.99 samples/sec   Loss 3.3515   LearningRate 0.0008   Epoch: 8   Global Step: 15120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:40:21,146-Speed 13688.05 samples/sec   Loss 3.3589   LearningRate 0.0008   Epoch: 8   Global Step: 15130   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:40:39,128-Speed 13667.90 samples/sec   Loss 3.3207   LearningRate 0.0008   Epoch: 8   Global Step: 15140   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:40:57,062-Speed 13704.53 samples/sec   Loss 3.3454   LearningRate 0.0008   Epoch: 8   Global Step: 15150   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:41:15,048-Speed 13664.69 samples/sec   Loss 3.3698   LearningRate 0.0008   Epoch: 8   Global Step: 15160   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:41:32,865-Speed 13796.01 samples/sec   Loss 3.3531   LearningRate 0.0008   Epoch: 8   Global Step: 15170   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:41:50,733-Speed 13755.42 samples/sec   Loss 3.3068   LearningRate 0.0008   Epoch: 8   Global Step: 15180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:42:08,693-Speed 13684.51 samples/sec   Loss 3.3934   LearningRate 0.0008   Epoch: 8   Global Step: 15190   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:42:26,584-Speed 13737.42 samples/sec   Loss 3.3796   LearningRate 0.0008   Epoch: 8   Global Step: 15200   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:42:44,541-Speed 13687.03 samples/sec   Loss 3.3262   LearningRate 0.0008   Epoch: 8   Global Step: 15210   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:43:02,458-Speed 13717.67 samples/sec   Loss 3.3332   LearningRate 0.0008   Epoch: 8   Global Step: 15220   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:43:20,291-Speed 13782.07 samples/sec   Loss 3.3275   LearningRate 0.0008   Epoch: 8   Global Step: 15230   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:43:38,096-Speed 13803.62 samples/sec   Loss 3.3219   LearningRate 0.0008   Epoch: 8   Global Step: 15240   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:43:55,942-Speed 13772.02 samples/sec   Loss 3.2921   LearningRate 0.0007   Epoch: 8   Global Step: 15250   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:44:13,816-Speed 13750.74 samples/sec   Loss 3.2830   LearningRate 0.0007   Epoch: 8   Global Step: 15260   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:44:31,689-Speed 13751.77 samples/sec   Loss 3.3247   LearningRate 0.0007   Epoch: 8   Global Step: 15270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:44:49,536-Speed 13771.02 samples/sec   Loss 3.3417   LearningRate 0.0007   Epoch: 8   Global Step: 15280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:45:07,281-Speed 13850.29 samples/sec   Loss 3.3739   LearningRate 0.0007   Epoch: 8   Global Step: 15290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:45:24,995-Speed 13874.91 samples/sec   Loss 3.2961   LearningRate 0.0007   Epoch: 8   Global Step: 15300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:45:42,782-Speed 13818.18 samples/sec   Loss 3.3283   LearningRate 0.0007   Epoch: 8   Global Step: 15310   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:46:00,514-Speed 13860.76 samples/sec   Loss 3.3611   LearningRate 0.0007   Epoch: 8   Global Step: 15320   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:46:18,250-Speed 13857.13 samples/sec   Loss 3.3314   LearningRate 0.0007   Epoch: 8   Global Step: 15330   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:46:36,021-Speed 13830.01 samples/sec   Loss 3.3219   LearningRate 0.0007   Epoch: 8   Global Step: 15340   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:46:53,717-Speed 13889.16 samples/sec   Loss 3.3183   LearningRate 0.0007   Epoch: 8   Global Step: 15350   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:47:11,508-Speed 13815.22 samples/sec   Loss 3.3346   LearningRate 0.0007   Epoch: 8   Global Step: 15360   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:47:29,295-Speed 13817.58 samples/sec   Loss 3.3418   LearningRate 0.0007   Epoch: 8   Global Step: 15370   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:47:46,963-Speed 13910.30 samples/sec   Loss 3.3028   LearningRate 0.0007   Epoch: 8   Global Step: 15380   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:48:04,794-Speed 13783.85 samples/sec   Loss 3.3103   LearningRate 0.0007   Epoch: 8   Global Step: 15390   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:48:22,544-Speed 13846.81 samples/sec   Loss 3.3187   LearningRate 0.0007   Epoch: 8   Global Step: 15400   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:48:40,246-Speed 13883.80 samples/sec   Loss 3.2837   LearningRate 0.0007   Epoch: 8   Global Step: 15410   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:48:57,927-Speed 13901.08 samples/sec   Loss 3.2792   LearningRate 0.0007   Epoch: 8   Global Step: 15420   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:49:15,644-Speed 13872.55 samples/sec   Loss 3.3819   LearningRate 0.0007   Epoch: 8   Global Step: 15430   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:49:33,362-Speed 13871.98 samples/sec   Loss 3.3756   LearningRate 0.0007   Epoch: 8   Global Step: 15440   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:49:51,244-Speed 13743.68 samples/sec   Loss 3.3068   LearningRate 0.0007   Epoch: 8   Global Step: 15450   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:50:09,323-Speed 13595.24 samples/sec   Loss 3.2934   LearningRate 0.0007   Epoch: 8   Global Step: 15460   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:50:27,149-Speed 13787.27 samples/sec   Loss 3.3275   LearningRate 0.0007   Epoch: 8   Global Step: 15470   Fp16 Grad Scale: 16384   Required: 27 hours
Training: 2022-03-03 15:50:45,051-Speed 13730.19 samples/sec   Loss 3.2823   LearningRate 0.0007   Epoch: 8   Global Step: 15480   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:51:03,199-Speed 13543.41 samples/sec   Loss 3.2807   LearningRate 0.0007   Epoch: 8   Global Step: 15490   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:51:20,943-Speed 13850.93 samples/sec   Loss 3.3478   LearningRate 0.0007   Epoch: 8   Global Step: 15500   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:51:38,691-Speed 13848.39 samples/sec   Loss 3.3629   LearningRate 0.0007   Epoch: 8   Global Step: 15510   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:51:56,404-Speed 13875.32 samples/sec   Loss 3.3480   LearningRate 0.0007   Epoch: 8   Global Step: 15520   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:52:14,151-Speed 13848.79 samples/sec   Loss 3.3022   LearningRate 0.0007   Epoch: 8   Global Step: 15530   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:52:31,896-Speed 13850.80 samples/sec   Loss 3.3004   LearningRate 0.0007   Epoch: 8   Global Step: 15540   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:52:49,820-Speed 13711.97 samples/sec   Loss 3.3370   LearningRate 0.0007   Epoch: 8   Global Step: 15550   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:53:57,159-Speed 3649.71 samples/sec   Loss 3.3412   LearningRate 0.0007   Epoch: 9   Global Step: 15560   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:54:14,905-Speed 13849.35 samples/sec   Loss 3.2789   LearningRate 0.0007   Epoch: 9   Global Step: 15570   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:54:32,670-Speed 13834.97 samples/sec   Loss 3.2407   LearningRate 0.0007   Epoch: 9   Global Step: 15580   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:54:50,487-Speed 13794.58 samples/sec   Loss 3.2447   LearningRate 0.0007   Epoch: 9   Global Step: 15590   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:55:08,263-Speed 13826.05 samples/sec   Loss 3.2676   LearningRate 0.0007   Epoch: 9   Global Step: 15600   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:55:26,126-Speed 13759.18 samples/sec   Loss 3.2364   LearningRate 0.0007   Epoch: 9   Global Step: 15610   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:55:43,938-Speed 13798.08 samples/sec   Loss 3.2731   LearningRate 0.0007   Epoch: 9   Global Step: 15620   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:56:01,742-Speed 13804.31 samples/sec   Loss 3.2320   LearningRate 0.0007   Epoch: 9   Global Step: 15630   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:56:20,292-Speed 13249.85 samples/sec   Loss 3.2507   LearningRate 0.0007   Epoch: 9   Global Step: 15640   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:56:38,264-Speed 13781.30 samples/sec   Loss 3.2568   LearningRate 0.0007   Epoch: 9   Global Step: 15650   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:56:56,082-Speed 13793.35 samples/sec   Loss 3.2503   LearningRate 0.0007   Epoch: 9   Global Step: 15660   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:57:13,919-Speed 13778.89 samples/sec   Loss 3.2615   LearningRate 0.0007   Epoch: 9   Global Step: 15670   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:57:31,820-Speed 13730.27 samples/sec   Loss 3.2505   LearningRate 0.0007   Epoch: 9   Global Step: 15680   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:57:49,637-Speed 13794.30 samples/sec   Loss 3.2864   LearningRate 0.0007   Epoch: 9   Global Step: 15690   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:58:07,513-Speed 13749.65 samples/sec   Loss 3.2901   LearningRate 0.0007   Epoch: 9   Global Step: 15700   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:58:25,370-Speed 13763.58 samples/sec   Loss 3.2468   LearningRate 0.0007   Epoch: 9   Global Step: 15710   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 15:58:43,147-Speed 13825.86 samples/sec   Loss 3.2305   LearningRate 0.0007   Epoch: 9   Global Step: 15720   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:59:00,866-Speed 13870.80 samples/sec   Loss 3.2313   LearningRate 0.0007   Epoch: 9   Global Step: 15730   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:59:18,668-Speed 13805.52 samples/sec   Loss 3.2588   LearningRate 0.0007   Epoch: 9   Global Step: 15740   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:59:36,451-Speed 13820.77 samples/sec   Loss 3.2390   LearningRate 0.0007   Epoch: 9   Global Step: 15750   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 15:59:54,308-Speed 13763.87 samples/sec   Loss 3.2621   LearningRate 0.0007   Epoch: 9   Global Step: 15760   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:00:12,117-Speed 13800.71 samples/sec   Loss 3.2603   LearningRate 0.0007   Epoch: 9   Global Step: 15770   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:00:29,983-Speed 13756.19 samples/sec   Loss 3.2797   LearningRate 0.0007   Epoch: 9   Global Step: 15780   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:00:47,817-Speed 13782.22 samples/sec   Loss 3.2756   LearningRate 0.0007   Epoch: 9   Global Step: 15790   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:01:05,614-Speed 13809.90 samples/sec   Loss 3.2439   LearningRate 0.0007   Epoch: 9   Global Step: 15800   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:01:23,385-Speed 13830.27 samples/sec   Loss 3.2306   LearningRate 0.0007   Epoch: 9   Global Step: 15810   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:01:41,207-Speed 13790.55 samples/sec   Loss 3.2170   LearningRate 0.0007   Epoch: 9   Global Step: 15820   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:01:59,007-Speed 13807.99 samples/sec   Loss 3.2555   LearningRate 0.0007   Epoch: 9   Global Step: 15830   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:02:16,834-Speed 13786.23 samples/sec   Loss 3.2404   LearningRate 0.0007   Epoch: 9   Global Step: 15840   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:02:34,657-Speed 13790.39 samples/sec   Loss 3.2139   LearningRate 0.0007   Epoch: 9   Global Step: 15850   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:02:52,493-Speed 13779.88 samples/sec   Loss 3.2298   LearningRate 0.0007   Epoch: 9   Global Step: 15860   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:03:10,346-Speed 13765.99 samples/sec   Loss 3.2868   LearningRate 0.0007   Epoch: 9   Global Step: 15870   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:03:28,151-Speed 13803.96 samples/sec   Loss 3.3042   LearningRate 0.0007   Epoch: 9   Global Step: 15880   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:03:46,062-Speed 13722.50 samples/sec   Loss 3.2579   LearningRate 0.0007   Epoch: 9   Global Step: 15890   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:04:04,693-Speed 13192.18 samples/sec   Loss 3.2048   LearningRate 0.0007   Epoch: 9   Global Step: 15900   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:04:22,545-Speed 13768.04 samples/sec   Loss 3.2214   LearningRate 0.0007   Epoch: 9   Global Step: 15910   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:04:40,403-Speed 13762.65 samples/sec   Loss 3.2281   LearningRate 0.0007   Epoch: 9   Global Step: 15920   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:04:58,228-Speed 13787.70 samples/sec   Loss 3.2288   LearningRate 0.0007   Epoch: 9   Global Step: 15930   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:05:16,983-Speed 13106.69 samples/sec   Loss 3.2659   LearningRate 0.0007   Epoch: 9   Global Step: 15940   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:05:34,823-Speed 13777.26 samples/sec   Loss 3.2140   LearningRate 0.0007   Epoch: 9   Global Step: 15950   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:05:52,564-Speed 13853.50 samples/sec   Loss 3.2238   LearningRate 0.0007   Epoch: 9   Global Step: 15960   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:06:10,449-Speed 13742.22 samples/sec   Loss 3.2151   LearningRate 0.0007   Epoch: 9   Global Step: 15970   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:06:28,328-Speed 13747.39 samples/sec   Loss 3.2421   LearningRate 0.0007   Epoch: 9   Global Step: 15980   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:06:46,974-Speed 13181.08 samples/sec   Loss 3.2421   LearningRate 0.0007   Epoch: 9   Global Step: 15990   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:07:04,813-Speed 13777.07 samples/sec   Loss 3.2190   LearningRate 0.0007   Epoch: 9   Global Step: 16000   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:07:22,625-Speed 13798.97 samples/sec   Loss 3.2367   LearningRate 0.0007   Epoch: 9   Global Step: 16010   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:07:40,424-Speed 13808.46 samples/sec   Loss 3.2015   LearningRate 0.0007   Epoch: 9   Global Step: 16020   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:07:58,249-Speed 13788.90 samples/sec   Loss 3.1949   LearningRate 0.0007   Epoch: 9   Global Step: 16030   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:08:16,088-Speed 13777.12 samples/sec   Loss 3.2128   LearningRate 0.0007   Epoch: 9   Global Step: 16040   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:08:33,908-Speed 13791.87 samples/sec   Loss 3.2284   LearningRate 0.0007   Epoch: 9   Global Step: 16050   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:08:51,753-Speed 13772.98 samples/sec   Loss 3.2627   LearningRate 0.0007   Epoch: 9   Global Step: 16060   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:09:09,580-Speed 13787.01 samples/sec   Loss 3.2203   LearningRate 0.0007   Epoch: 9   Global Step: 16070   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:09:27,383-Speed 13804.84 samples/sec   Loss 3.1926   LearningRate 0.0007   Epoch: 9   Global Step: 16080   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:09:45,200-Speed 13794.24 samples/sec   Loss 3.2052   LearningRate 0.0007   Epoch: 9   Global Step: 16090   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:10:03,070-Speed 13754.38 samples/sec   Loss 3.2229   LearningRate 0.0007   Epoch: 9   Global Step: 16100   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:10:20,872-Speed 13805.90 samples/sec   Loss 3.1862   LearningRate 0.0007   Epoch: 9   Global Step: 16110   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:10:38,725-Speed 13766.62 samples/sec   Loss 3.1820   LearningRate 0.0007   Epoch: 9   Global Step: 16120   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:10:56,525-Speed 13807.84 samples/sec   Loss 3.1894   LearningRate 0.0007   Epoch: 9   Global Step: 16130   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:11:14,737-Speed 13495.39 samples/sec   Loss 3.2053   LearningRate 0.0007   Epoch: 9   Global Step: 16140   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:11:33,119-Speed 13370.46 samples/sec   Loss 3.1987   LearningRate 0.0007   Epoch: 9   Global Step: 16150   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:11:50,887-Speed 13832.76 samples/sec   Loss 3.2051   LearningRate 0.0007   Epoch: 9   Global Step: 16160   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:12:08,660-Speed 13829.01 samples/sec   Loss 3.2048   LearningRate 0.0007   Epoch: 9   Global Step: 16170   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:12:26,467-Speed 13802.66 samples/sec   Loss 3.1724   LearningRate 0.0007   Epoch: 9   Global Step: 16180   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:12:44,264-Speed 13809.79 samples/sec   Loss 3.1794   LearningRate 0.0007   Epoch: 9   Global Step: 16190   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:13:02,147-Speed 13743.51 samples/sec   Loss 3.1966   LearningRate 0.0007   Epoch: 9   Global Step: 16200   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:13:19,988-Speed 13775.89 samples/sec   Loss 3.2001   LearningRate 0.0007   Epoch: 9   Global Step: 16210   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:13:37,790-Speed 13805.38 samples/sec   Loss 3.2012   LearningRate 0.0007   Epoch: 9   Global Step: 16220   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:13:55,643-Speed 13767.22 samples/sec   Loss 3.2076   LearningRate 0.0007   Epoch: 9   Global Step: 16230   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:14:13,509-Speed 13757.25 samples/sec   Loss 3.1818   LearningRate 0.0007   Epoch: 9   Global Step: 16240   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:14:31,374-Speed 13758.71 samples/sec   Loss 3.1791   LearningRate 0.0007   Epoch: 9   Global Step: 16250   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:14:49,188-Speed 13796.47 samples/sec   Loss 3.1768   LearningRate 0.0007   Epoch: 9   Global Step: 16260   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:15:07,057-Speed 13754.72 samples/sec   Loss 3.1929   LearningRate 0.0007   Epoch: 9   Global Step: 16270   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:15:24,888-Speed 13783.70 samples/sec   Loss 3.2151   LearningRate 0.0007   Epoch: 9   Global Step: 16280   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:15:42,816-Speed 13709.03 samples/sec   Loss 3.2011   LearningRate 0.0007   Epoch: 9   Global Step: 16290   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:16:00,729-Speed 13720.02 samples/sec   Loss 3.1923   LearningRate 0.0007   Epoch: 9   Global Step: 16300   Fp16 Grad Scale: 65536   Required: 27 hours
Training: 2022-03-03 16:16:18,651-Speed 13714.40 samples/sec   Loss 3.1823   LearningRate 0.0007   Epoch: 9   Global Step: 16310   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:16:36,440-Speed 13815.69 samples/sec   Loss 3.1757   LearningRate 0.0007   Epoch: 9   Global Step: 16320   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:16:54,173-Speed 13860.11 samples/sec   Loss 3.1677   LearningRate 0.0007   Epoch: 9   Global Step: 16330   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:17:11,973-Speed 13807.63 samples/sec   Loss 3.1443   LearningRate 0.0007   Epoch: 9   Global Step: 16340   Fp16 Grad Scale: 32768   Required: 27 hours
Training: 2022-03-03 16:17:29,730-Speed 13842.30 samples/sec   Loss 3.1642   LearningRate 0.0007   Epoch: 9   Global Step: 16350   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:17:47,559-Speed 13785.07 samples/sec   Loss 3.1795   LearningRate 0.0007   Epoch: 9   Global Step: 16360   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:18:05,356-Speed 13810.65 samples/sec   Loss 3.1739   LearningRate 0.0007   Epoch: 9   Global Step: 16370   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:18:23,130-Speed 13827.67 samples/sec   Loss 3.1813   LearningRate 0.0007   Epoch: 9   Global Step: 16380   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:18:40,874-Speed 13851.94 samples/sec   Loss 3.1811   LearningRate 0.0007   Epoch: 9   Global Step: 16390   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:18:58,639-Speed 13835.13 samples/sec   Loss 3.1849   LearningRate 0.0007   Epoch: 9   Global Step: 16400   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:19:16,442-Speed 13805.35 samples/sec   Loss 3.1993   LearningRate 0.0007   Epoch: 9   Global Step: 16410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:19:34,272-Speed 13784.53 samples/sec   Loss 3.1661   LearningRate 0.0007   Epoch: 9   Global Step: 16420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:19:52,035-Speed 13836.02 samples/sec   Loss 3.1639   LearningRate 0.0007   Epoch: 9   Global Step: 16430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:20:09,836-Speed 13807.46 samples/sec   Loss 3.1617   LearningRate 0.0007   Epoch: 9   Global Step: 16440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:20:27,652-Speed 13794.64 samples/sec   Loss 3.1473   LearningRate 0.0007   Epoch: 9   Global Step: 16450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:20:45,523-Speed 13752.68 samples/sec   Loss 3.1601   LearningRate 0.0007   Epoch: 9   Global Step: 16460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:21:03,342-Speed 13793.25 samples/sec   Loss 3.1759   LearningRate 0.0007   Epoch: 9   Global Step: 16470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:21:21,157-Speed 13796.16 samples/sec   Loss 3.1603   LearningRate 0.0007   Epoch: 9   Global Step: 16480   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:21:39,033-Speed 13749.50 samples/sec   Loss 3.1657   LearningRate 0.0007   Epoch: 9   Global Step: 16490   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:21:56,893-Speed 13760.82 samples/sec   Loss 3.1260   LearningRate 0.0007   Epoch: 9   Global Step: 16500   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:22:14,700-Speed 13802.24 samples/sec   Loss 3.1409   LearningRate 0.0007   Epoch: 9   Global Step: 16510   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:22:32,544-Speed 13773.71 samples/sec   Loss 3.1620   LearningRate 0.0007   Epoch: 9   Global Step: 16520   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:22:50,320-Speed 13826.20 samples/sec   Loss 3.1476   LearningRate 0.0007   Epoch: 9   Global Step: 16530   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:23:08,228-Speed 13724.53 samples/sec   Loss 3.1216   LearningRate 0.0007   Epoch: 9   Global Step: 16540   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:23:26,071-Speed 13774.39 samples/sec   Loss 3.1560   LearningRate 0.0007   Epoch: 9   Global Step: 16550   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:23:43,905-Speed 13781.03 samples/sec   Loss 3.1552   LearningRate 0.0007   Epoch: 9   Global Step: 16560   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:24:01,740-Speed 13780.23 samples/sec   Loss 3.1346   LearningRate 0.0007   Epoch: 9   Global Step: 16570   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:24:19,554-Speed 13797.48 samples/sec   Loss 3.1510   LearningRate 0.0007   Epoch: 9   Global Step: 16580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:24:37,366-Speed 13798.73 samples/sec   Loss 3.1111   LearningRate 0.0007   Epoch: 9   Global Step: 16590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:24:55,119-Speed 13843.91 samples/sec   Loss 3.1470   LearningRate 0.0007   Epoch: 9   Global Step: 16600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:25:12,952-Speed 13781.98 samples/sec   Loss 3.1418   LearningRate 0.0007   Epoch: 9   Global Step: 16610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:25:30,741-Speed 13817.93 samples/sec   Loss 3.1411   LearningRate 0.0007   Epoch: 9   Global Step: 16620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:25:48,529-Speed 13816.51 samples/sec   Loss 3.1514   LearningRate 0.0007   Epoch: 9   Global Step: 16630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:26:06,398-Speed 13755.12 samples/sec   Loss 3.1374   LearningRate 0.0007   Epoch: 9   Global Step: 16640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:26:24,230-Speed 13782.76 samples/sec   Loss 3.1183   LearningRate 0.0007   Epoch: 9   Global Step: 16650   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:26:42,189-Speed 13686.80 samples/sec   Loss 3.1361   LearningRate 0.0007   Epoch: 9   Global Step: 16660   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:27:00,003-Speed 13796.64 samples/sec   Loss 3.1299   LearningRate 0.0007   Epoch: 9   Global Step: 16670   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:27:17,836-Speed 13782.35 samples/sec   Loss 3.1287   LearningRate 0.0007   Epoch: 9   Global Step: 16680   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:27:35,668-Speed 13782.73 samples/sec   Loss 3.1115   LearningRate 0.0007   Epoch: 9   Global Step: 16690   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:27:53,475-Speed 13802.36 samples/sec   Loss 3.1490   LearningRate 0.0007   Epoch: 9   Global Step: 16700   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:28:11,345-Speed 13753.36 samples/sec   Loss 3.1454   LearningRate 0.0007   Epoch: 9   Global Step: 16710   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:28:29,111-Speed 13834.61 samples/sec   Loss 3.0783   LearningRate 0.0007   Epoch: 9   Global Step: 16720   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:28:46,975-Speed 13757.92 samples/sec   Loss 3.1167   LearningRate 0.0007   Epoch: 9   Global Step: 16730   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:29:04,823-Speed 13770.22 samples/sec   Loss 3.1386   LearningRate 0.0007   Epoch: 9   Global Step: 16740   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:29:22,625-Speed 13805.72 samples/sec   Loss 3.1336   LearningRate 0.0007   Epoch: 9   Global Step: 16750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:29:40,403-Speed 13825.32 samples/sec   Loss 3.1310   LearningRate 0.0007   Epoch: 9   Global Step: 16760   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:29:58,242-Speed 13777.41 samples/sec   Loss 3.1093   LearningRate 0.0007   Epoch: 9   Global Step: 16770   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:30:16,122-Speed 13747.13 samples/sec   Loss 3.1437   LearningRate 0.0007   Epoch: 9   Global Step: 16780   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:30:33,880-Speed 13840.04 samples/sec   Loss 3.1104   LearningRate 0.0007   Epoch: 9   Global Step: 16790   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:30:51,649-Speed 13831.73 samples/sec   Loss 3.1096   LearningRate 0.0007   Epoch: 9   Global Step: 16800   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:31:09,441-Speed 13813.59 samples/sec   Loss 3.1075   LearningRate 0.0007   Epoch: 9   Global Step: 16810   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:31:27,205-Speed 13836.10 samples/sec   Loss 3.1124   LearningRate 0.0007   Epoch: 9   Global Step: 16820   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:31:45,042-Speed 13779.28 samples/sec   Loss 3.0991   LearningRate 0.0007   Epoch: 9   Global Step: 16830   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:32:02,911-Speed 13754.52 samples/sec   Loss 3.1266   LearningRate 0.0007   Epoch: 9   Global Step: 16840   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:32:20,881-Speed 13676.66 samples/sec   Loss 3.0908   LearningRate 0.0007   Epoch: 9   Global Step: 16850   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:32:39,380-Speed 13286.44 samples/sec   Loss 3.0938   LearningRate 0.0007   Epoch: 9   Global Step: 16860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:32:57,180-Speed 13807.14 samples/sec   Loss 3.0888   LearningRate 0.0007   Epoch: 9   Global Step: 16870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:33:15,108-Speed 13709.09 samples/sec   Loss 3.1038   LearningRate 0.0007   Epoch: 9   Global Step: 16880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:33:32,886-Speed 13824.44 samples/sec   Loss 3.1016   LearningRate 0.0007   Epoch: 9   Global Step: 16890   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:33:51,541-Speed 13174.82 samples/sec   Loss 3.1180   LearningRate 0.0007   Epoch: 9   Global Step: 16900   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:34:09,536-Speed 13658.32 samples/sec   Loss 3.0866   LearningRate 0.0007   Epoch: 9   Global Step: 16910   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:34:27,318-Speed 13821.58 samples/sec   Loss 3.1033   LearningRate 0.0007   Epoch: 9   Global Step: 16920   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:34:45,098-Speed 13823.28 samples/sec   Loss 3.1165   LearningRate 0.0007   Epoch: 9   Global Step: 16930   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:35:02,930-Speed 13783.18 samples/sec   Loss 3.0808   LearningRate 0.0007   Epoch: 9   Global Step: 16940   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:35:20,966-Speed 13626.99 samples/sec   Loss 3.1100   LearningRate 0.0007   Epoch: 9   Global Step: 16950   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:35:39,433-Speed 13308.73 samples/sec   Loss 3.1031   LearningRate 0.0007   Epoch: 9   Global Step: 16960   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:35:57,370-Speed 13701.74 samples/sec   Loss 3.0752   LearningRate 0.0007   Epoch: 9   Global Step: 16970   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:36:15,155-Speed 13819.86 samples/sec   Loss 3.0848   LearningRate 0.0007   Epoch: 9   Global Step: 16980   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:36:32,939-Speed 13819.92 samples/sec   Loss 3.0668   LearningRate 0.0007   Epoch: 9   Global Step: 16990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:36:50,812-Speed 13751.70 samples/sec   Loss 3.0936   LearningRate 0.0007   Epoch: 9   Global Step: 17000   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:37:09,518-Speed 13138.76 samples/sec   Loss 3.1252   LearningRate 0.0007   Epoch: 9   Global Step: 17010   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:37:27,209-Speed 13892.83 samples/sec   Loss 3.0890   LearningRate 0.0007   Epoch: 9   Global Step: 17020   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:37:44,935-Speed 13865.46 samples/sec   Loss 3.0701   LearningRate 0.0007   Epoch: 9   Global Step: 17030   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:38:02,807-Speed 13751.84 samples/sec   Loss 3.0901   LearningRate 0.0007   Epoch: 9   Global Step: 17040   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:38:20,470-Speed 13915.75 samples/sec   Loss 3.0938   LearningRate 0.0007   Epoch: 9   Global Step: 17050   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:38:38,166-Speed 13889.03 samples/sec   Loss 3.0937   LearningRate 0.0007   Epoch: 9   Global Step: 17060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:38:55,925-Speed 13839.39 samples/sec   Loss 3.0882   LearningRate 0.0007   Epoch: 9   Global Step: 17070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:39:13,705-Speed 13823.18 samples/sec   Loss 3.0668   LearningRate 0.0007   Epoch: 9   Global Step: 17080   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:39:31,375-Speed 13908.92 samples/sec   Loss 3.0707   LearningRate 0.0007   Epoch: 9   Global Step: 17090   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:39:49,125-Speed 13846.68 samples/sec   Loss 3.0786   LearningRate 0.0007   Epoch: 9   Global Step: 17100   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:40:06,858-Speed 13860.36 samples/sec   Loss 3.0730   LearningRate 0.0007   Epoch: 9   Global Step: 17110   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:40:24,661-Speed 13804.97 samples/sec   Loss 3.0589   LearningRate 0.0007   Epoch: 9   Global Step: 17120   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:40:42,417-Speed 13841.90 samples/sec   Loss 3.0828   LearningRate 0.0007   Epoch: 9   Global Step: 17130   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:41:00,232-Speed 13796.94 samples/sec   Loss 3.0949   LearningRate 0.0007   Epoch: 9   Global Step: 17140   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:41:18,003-Speed 13830.36 samples/sec   Loss 3.0837   LearningRate 0.0007   Epoch: 9   Global Step: 17150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:41:35,744-Speed 13854.03 samples/sec   Loss 3.0616   LearningRate 0.0007   Epoch: 9   Global Step: 17160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:41:53,500-Speed 13841.76 samples/sec   Loss 3.0630   LearningRate 0.0007   Epoch: 9   Global Step: 17170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:42:11,228-Speed 13863.87 samples/sec   Loss 3.0577   LearningRate 0.0007   Epoch: 9   Global Step: 17180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:42:28,955-Speed 13864.50 samples/sec   Loss 3.0842   LearningRate 0.0007   Epoch: 9   Global Step: 17190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:42:46,687-Speed 13860.96 samples/sec   Loss 3.0783   LearningRate 0.0007   Epoch: 9   Global Step: 17200   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:43:04,404-Speed 13871.93 samples/sec   Loss 3.0728   LearningRate 0.0007   Epoch: 9   Global Step: 17210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:43:22,174-Speed 13830.74 samples/sec   Loss 3.0773   LearningRate 0.0007   Epoch: 9   Global Step: 17220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:43:39,915-Speed 13854.42 samples/sec   Loss 3.0651   LearningRate 0.0007   Epoch: 9   Global Step: 17230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:43:58,566-Speed 13179.16 samples/sec   Loss 3.0650   LearningRate 0.0007   Epoch: 9   Global Step: 17240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:44:16,346-Speed 13822.85 samples/sec   Loss 3.0849   LearningRate 0.0007   Epoch: 9   Global Step: 17250   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:44:34,082-Speed 13857.37 samples/sec   Loss 3.0949   LearningRate 0.0007   Epoch: 9   Global Step: 17260   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:44:51,862-Speed 13823.56 samples/sec   Loss 3.0897   LearningRate 0.0007   Epoch: 9   Global Step: 17270   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:45:09,565-Speed 13883.38 samples/sec   Loss 3.0961   LearningRate 0.0007   Epoch: 9   Global Step: 17280   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:46:17,217-Speed 3632.76 samples/sec   Loss 3.0635   LearningRate 0.0007   Epoch: 10   Global Step: 17290   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:46:35,602-Speed 13368.07 samples/sec   Loss 3.0191   LearningRate 0.0007   Epoch: 10   Global Step: 17300   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:46:53,262-Speed 13917.36 samples/sec   Loss 3.0543   LearningRate 0.0007   Epoch: 10   Global Step: 17310   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:47:11,022-Speed 13838.74 samples/sec   Loss 3.0267   LearningRate 0.0007   Epoch: 10   Global Step: 17320   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:47:28,785-Speed 13836.44 samples/sec   Loss 3.0164   LearningRate 0.0007   Epoch: 10   Global Step: 17330   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:47:47,344-Speed 13242.83 samples/sec   Loss 3.0198   LearningRate 0.0007   Epoch: 10   Global Step: 17340   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:48:05,100-Speed 13842.12 samples/sec   Loss 3.0245   LearningRate 0.0007   Epoch: 10   Global Step: 17350   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:48:22,935-Speed 13780.44 samples/sec   Loss 3.0090   LearningRate 0.0007   Epoch: 10   Global Step: 17360   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:48:41,382-Speed 13323.73 samples/sec   Loss 3.0315   LearningRate 0.0007   Epoch: 10   Global Step: 17370   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:48:59,029-Speed 13927.72 samples/sec   Loss 3.0137   LearningRate 0.0007   Epoch: 10   Global Step: 17380   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:49:16,730-Speed 13884.60 samples/sec   Loss 3.0364   LearningRate 0.0007   Epoch: 10   Global Step: 17390   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 16:49:34,407-Speed 13904.37 samples/sec   Loss 3.0371   LearningRate 0.0007   Epoch: 10   Global Step: 17400   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:49:52,061-Speed 13921.38 samples/sec   Loss 3.0704   LearningRate 0.0007   Epoch: 10   Global Step: 17410   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:50:09,722-Speed 13916.32 samples/sec   Loss 3.0465   LearningRate 0.0007   Epoch: 10   Global Step: 17420   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:50:27,424-Speed 13884.01 samples/sec   Loss 3.0587   LearningRate 0.0007   Epoch: 10   Global Step: 17430   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:50:45,117-Speed 13891.13 samples/sec   Loss 3.0242   LearningRate 0.0007   Epoch: 10   Global Step: 17440   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:51:02,896-Speed 13824.16 samples/sec   Loss 3.0273   LearningRate 0.0007   Epoch: 10   Global Step: 17450   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:51:20,603-Speed 13880.57 samples/sec   Loss 3.0059   LearningRate 0.0007   Epoch: 10   Global Step: 17460   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:51:38,276-Speed 13907.17 samples/sec   Loss 3.0357   LearningRate 0.0007   Epoch: 10   Global Step: 17470   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:51:56,018-Speed 13852.12 samples/sec   Loss 3.0206   LearningRate 0.0007   Epoch: 10   Global Step: 17480   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:52:13,912-Speed 13735.08 samples/sec   Loss 3.0234   LearningRate 0.0007   Epoch: 10   Global Step: 17490   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:52:31,677-Speed 13835.01 samples/sec   Loss 3.0192   LearningRate 0.0007   Epoch: 10   Global Step: 17500   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:52:49,365-Speed 13894.53 samples/sec   Loss 3.0117   LearningRate 0.0007   Epoch: 10   Global Step: 17510   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:53:07,032-Speed 13912.60 samples/sec   Loss 3.0132   LearningRate 0.0007   Epoch: 10   Global Step: 17520   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:53:24,736-Speed 13882.04 samples/sec   Loss 3.0296   LearningRate 0.0007   Epoch: 10   Global Step: 17530   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:53:42,436-Speed 13885.27 samples/sec   Loss 3.0579   LearningRate 0.0007   Epoch: 10   Global Step: 17540   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:54:00,144-Speed 13879.73 samples/sec   Loss 3.0235   LearningRate 0.0007   Epoch: 10   Global Step: 17550   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:54:17,909-Speed 13835.41 samples/sec   Loss 3.0125   LearningRate 0.0007   Epoch: 10   Global Step: 17560   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:54:35,617-Speed 13881.27 samples/sec   Loss 3.0139   LearningRate 0.0007   Epoch: 10   Global Step: 17570   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:54:53,439-Speed 13790.06 samples/sec   Loss 3.0487   LearningRate 0.0007   Epoch: 10   Global Step: 17580   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:55:11,082-Speed 13930.66 samples/sec   Loss 3.0187   LearningRate 0.0007   Epoch: 10   Global Step: 17590   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:55:28,757-Speed 13905.68 samples/sec   Loss 3.0383   LearningRate 0.0007   Epoch: 10   Global Step: 17600   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:55:46,517-Speed 13838.97 samples/sec   Loss 3.0396   LearningRate 0.0007   Epoch: 10   Global Step: 17610   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:56:04,241-Speed 13866.04 samples/sec   Loss 2.9991   LearningRate 0.0007   Epoch: 10   Global Step: 17620   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:56:21,969-Speed 13864.07 samples/sec   Loss 3.0005   LearningRate 0.0007   Epoch: 10   Global Step: 17630   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:56:39,699-Speed 13861.82 samples/sec   Loss 3.0231   LearningRate 0.0007   Epoch: 10   Global Step: 17640   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 16:56:57,356-Speed 13920.14 samples/sec   Loss 3.0210   LearningRate 0.0007   Epoch: 10   Global Step: 17650   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:57:15,067-Speed 13876.66 samples/sec   Loss 3.0017   LearningRate 0.0007   Epoch: 10   Global Step: 17660   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:57:32,775-Speed 13879.06 samples/sec   Loss 3.0169   LearningRate 0.0007   Epoch: 10   Global Step: 17670   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:57:50,511-Speed 13857.53 samples/sec   Loss 3.0267   LearningRate 0.0007   Epoch: 10   Global Step: 17680   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:58:08,284-Speed 13829.39 samples/sec   Loss 3.0081   LearningRate 0.0007   Epoch: 10   Global Step: 17690   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:58:25,997-Speed 13875.14 samples/sec   Loss 3.0088   LearningRate 0.0007   Epoch: 10   Global Step: 17700   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:58:43,692-Speed 13889.76 samples/sec   Loss 2.9937   LearningRate 0.0007   Epoch: 10   Global Step: 17710   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:59:01,394-Speed 13883.61 samples/sec   Loss 2.9903   LearningRate 0.0007   Epoch: 10   Global Step: 17720   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:59:19,107-Speed 13875.86 samples/sec   Loss 3.0010   LearningRate 0.0007   Epoch: 10   Global Step: 17730   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:59:36,838-Speed 13861.22 samples/sec   Loss 3.0091   LearningRate 0.0007   Epoch: 10   Global Step: 17740   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 16:59:54,524-Speed 13896.63 samples/sec   Loss 2.9919   LearningRate 0.0007   Epoch: 10   Global Step: 17750   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:00:12,278-Speed 13843.15 samples/sec   Loss 2.9950   LearningRate 0.0007   Epoch: 10   Global Step: 17760   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:00:30,072-Speed 13812.35 samples/sec   Loss 3.0102   LearningRate 0.0007   Epoch: 10   Global Step: 17770   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:00:47,820-Speed 13848.98 samples/sec   Loss 2.9773   LearningRate 0.0007   Epoch: 10   Global Step: 17780   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:01:05,555-Speed 13857.94 samples/sec   Loss 3.0015   LearningRate 0.0007   Epoch: 10   Global Step: 17790   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:01:23,306-Speed 13845.30 samples/sec   Loss 3.0144   LearningRate 0.0007   Epoch: 10   Global Step: 17800   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:01:41,052-Speed 13849.50 samples/sec   Loss 3.0253   LearningRate 0.0007   Epoch: 10   Global Step: 17810   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:01:58,749-Speed 13888.84 samples/sec   Loss 3.0051   LearningRate 0.0007   Epoch: 10   Global Step: 17820   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:02:16,435-Speed 13896.17 samples/sec   Loss 2.9761   LearningRate 0.0007   Epoch: 10   Global Step: 17830   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:02:34,082-Speed 13927.39 samples/sec   Loss 2.9828   LearningRate 0.0007   Epoch: 10   Global Step: 17840   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:02:51,799-Speed 13872.43 samples/sec   Loss 2.9829   LearningRate 0.0007   Epoch: 10   Global Step: 17850   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:03:09,493-Speed 13890.56 samples/sec   Loss 2.9735   LearningRate 0.0007   Epoch: 10   Global Step: 17860   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:03:27,224-Speed 13861.54 samples/sec   Loss 2.9830   LearningRate 0.0007   Epoch: 10   Global Step: 17870   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:03:44,983-Speed 13839.13 samples/sec   Loss 3.0161   LearningRate 0.0007   Epoch: 10   Global Step: 17880   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:04:02,739-Speed 13841.46 samples/sec   Loss 2.9918   LearningRate 0.0007   Epoch: 10   Global Step: 17890   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:04:20,408-Speed 13910.70 samples/sec   Loss 2.9606   LearningRate 0.0007   Epoch: 10   Global Step: 17900   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:04:38,144-Speed 13857.71 samples/sec   Loss 2.9632   LearningRate 0.0007   Epoch: 10   Global Step: 17910   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:04:55,828-Speed 13898.36 samples/sec   Loss 2.9838   LearningRate 0.0007   Epoch: 10   Global Step: 17920   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:05:13,569-Speed 13853.78 samples/sec   Loss 2.9891   LearningRate 0.0007   Epoch: 10   Global Step: 17930   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:05:31,297-Speed 13863.45 samples/sec   Loss 2.9870   LearningRate 0.0007   Epoch: 10   Global Step: 17940   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:05:49,033-Speed 13857.85 samples/sec   Loss 2.9921   LearningRate 0.0007   Epoch: 10   Global Step: 17950   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:06:06,739-Speed 13881.34 samples/sec   Loss 2.9800   LearningRate 0.0007   Epoch: 10   Global Step: 17960   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 17:06:24,419-Speed 13900.64 samples/sec   Loss 2.9428   LearningRate 0.0007   Epoch: 10   Global Step: 17970   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:06:42,131-Speed 13876.65 samples/sec   Loss 2.9436   LearningRate 0.0007   Epoch: 10   Global Step: 17980   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:06:59,866-Speed 13858.48 samples/sec   Loss 2.9832   LearningRate 0.0007   Epoch: 10   Global Step: 17990   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:07:17,590-Speed 13866.91 samples/sec   Loss 2.9707   LearningRate 0.0007   Epoch: 10   Global Step: 18000   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:07:35,313-Speed 13867.65 samples/sec   Loss 2.9794   LearningRate 0.0007   Epoch: 10   Global Step: 18010   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:07:53,013-Speed 13885.03 samples/sec   Loss 2.9613   LearningRate 0.0007   Epoch: 10   Global Step: 18020   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:08:10,750-Speed 13857.42 samples/sec   Loss 2.9603   LearningRate 0.0007   Epoch: 10   Global Step: 18030   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:08:28,489-Speed 13854.78 samples/sec   Loss 2.9694   LearningRate 0.0007   Epoch: 10   Global Step: 18040   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:08:46,267-Speed 13824.98 samples/sec   Loss 2.9843   LearningRate 0.0007   Epoch: 10   Global Step: 18050   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:09:03,998-Speed 13861.25 samples/sec   Loss 3.0008   LearningRate 0.0007   Epoch: 10   Global Step: 18060   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:09:21,734-Speed 13857.49 samples/sec   Loss 2.9594   LearningRate 0.0007   Epoch: 10   Global Step: 18070   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:09:39,420-Speed 13897.13 samples/sec   Loss 2.9590   LearningRate 0.0007   Epoch: 10   Global Step: 18080   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:09:57,195-Speed 13826.98 samples/sec   Loss 2.9538   LearningRate 0.0007   Epoch: 10   Global Step: 18090   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:10:14,944-Speed 13847.33 samples/sec   Loss 2.9430   LearningRate 0.0007   Epoch: 10   Global Step: 18100   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:10:32,681-Speed 13856.31 samples/sec   Loss 2.9225   LearningRate 0.0007   Epoch: 10   Global Step: 18110   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:10:50,414-Speed 13860.20 samples/sec   Loss 2.9596   LearningRate 0.0007   Epoch: 10   Global Step: 18120   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:11:08,099-Speed 13897.61 samples/sec   Loss 2.9412   LearningRate 0.0007   Epoch: 10   Global Step: 18130   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:11:25,825-Speed 13865.10 samples/sec   Loss 2.9424   LearningRate 0.0007   Epoch: 10   Global Step: 18140   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:11:43,544-Speed 13870.71 samples/sec   Loss 2.9851   LearningRate 0.0007   Epoch: 10   Global Step: 18150   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:12:01,207-Speed 13914.64 samples/sec   Loss 2.9633   LearningRate 0.0007   Epoch: 10   Global Step: 18160   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:12:18,881-Speed 13906.19 samples/sec   Loss 2.9412   LearningRate 0.0007   Epoch: 10   Global Step: 18170   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:12:36,664-Speed 13821.50 samples/sec   Loss 2.9291   LearningRate 0.0007   Epoch: 10   Global Step: 18180   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:12:54,390-Speed 13864.87 samples/sec   Loss 2.9377   LearningRate 0.0007   Epoch: 10   Global Step: 18190   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:13:12,094-Speed 13882.42 samples/sec   Loss 2.9382   LearningRate 0.0007   Epoch: 10   Global Step: 18200   Fp16 Grad Scale: 131072   Required: 26 hours
Training: 2022-03-03 17:13:29,811-Speed 13872.68 samples/sec   Loss 2.9433   LearningRate 0.0007   Epoch: 10   Global Step: 18210   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:13:47,592-Speed 13824.31 samples/sec   Loss 2.9568   LearningRate 0.0007   Epoch: 10   Global Step: 18220   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:14:05,400-Speed 13801.88 samples/sec   Loss 2.9364   LearningRate 0.0007   Epoch: 10   Global Step: 18230   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:14:23,100-Speed 13886.03 samples/sec   Loss 2.9350   LearningRate 0.0007   Epoch: 10   Global Step: 18240   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:14:40,860-Speed 13838.72 samples/sec   Loss 2.9435   LearningRate 0.0007   Epoch: 10   Global Step: 18250   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:14:58,569-Speed 13878.00 samples/sec   Loss 2.9455   LearningRate 0.0007   Epoch: 10   Global Step: 18260   Fp16 Grad Scale: 65536   Required: 26 hours
Training: 2022-03-03 17:15:16,349-Speed 13823.25 samples/sec   Loss 2.9215   LearningRate 0.0007   Epoch: 10   Global Step: 18270   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:15:34,042-Speed 13891.59 samples/sec   Loss 2.9200   LearningRate 0.0007   Epoch: 10   Global Step: 18280   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:15:51,775-Speed 13859.60 samples/sec   Loss 2.9062   LearningRate 0.0007   Epoch: 10   Global Step: 18290   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:16:09,595-Speed 13791.85 samples/sec   Loss 2.9442   LearningRate 0.0007   Epoch: 10   Global Step: 18300   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:16:27,317-Speed 13868.61 samples/sec   Loss 2.9439   LearningRate 0.0007   Epoch: 10   Global Step: 18310   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:16:44,978-Speed 13916.52 samples/sec   Loss 2.9262   LearningRate 0.0007   Epoch: 10   Global Step: 18320   Fp16 Grad Scale: 32768   Required: 26 hours
Training: 2022-03-03 17:17:02,762-Speed 13819.30 samples/sec   Loss 2.9511   LearningRate 0.0007   Epoch: 10   Global Step: 18330   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:17:20,435-Speed 13906.81 samples/sec   Loss 2.9241   LearningRate 0.0007   Epoch: 10   Global Step: 18340   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:17:38,228-Speed 13813.52 samples/sec   Loss 2.9336   LearningRate 0.0007   Epoch: 10   Global Step: 18350   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:17:55,898-Speed 13909.74 samples/sec   Loss 2.9367   LearningRate 0.0007   Epoch: 10   Global Step: 18360   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:18:13,642-Speed 13852.27 samples/sec   Loss 2.9256   LearningRate 0.0007   Epoch: 10   Global Step: 18370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:18:31,408-Speed 13834.37 samples/sec   Loss 2.9560   LearningRate 0.0007   Epoch: 10   Global Step: 18380   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:18:49,219-Speed 13798.62 samples/sec   Loss 2.9589   LearningRate 0.0007   Epoch: 10   Global Step: 18390   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:19:06,871-Speed 13923.31 samples/sec   Loss 2.9056   LearningRate 0.0007   Epoch: 10   Global Step: 18400   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:19:24,621-Speed 13846.83 samples/sec   Loss 2.9164   LearningRate 0.0007   Epoch: 10   Global Step: 18410   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:19:42,339-Speed 13872.13 samples/sec   Loss 2.9338   LearningRate 0.0007   Epoch: 10   Global Step: 18420   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:20:00,052-Speed 13875.60 samples/sec   Loss 2.9063   LearningRate 0.0007   Epoch: 10   Global Step: 18430   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:20:17,861-Speed 13799.94 samples/sec   Loss 2.9225   LearningRate 0.0007   Epoch: 10   Global Step: 18440   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:20:35,594-Speed 13860.33 samples/sec   Loss 2.9392   LearningRate 0.0007   Epoch: 10   Global Step: 18450   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:20:53,331-Speed 13856.54 samples/sec   Loss 2.9149   LearningRate 0.0007   Epoch: 10   Global Step: 18460   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:21:11,044-Speed 13875.62 samples/sec   Loss 2.9201   LearningRate 0.0007   Epoch: 10   Global Step: 18470   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:21:28,788-Speed 13851.55 samples/sec   Loss 2.9120   LearningRate 0.0007   Epoch: 10   Global Step: 18480   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:21:46,523-Speed 13858.43 samples/sec   Loss 2.8831   LearningRate 0.0007   Epoch: 10   Global Step: 18490   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:22:04,241-Speed 13872.20 samples/sec   Loss 2.9140   LearningRate 0.0007   Epoch: 10   Global Step: 18500   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:22:21,907-Speed 13911.79 samples/sec   Loss 2.9709   LearningRate 0.0007   Epoch: 10   Global Step: 18510   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:22:39,642-Speed 13858.79 samples/sec   Loss 2.9022   LearningRate 0.0007   Epoch: 10   Global Step: 18520   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:22:57,320-Speed 13902.70 samples/sec   Loss 2.8977   LearningRate 0.0007   Epoch: 10   Global Step: 18530   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:23:15,042-Speed 13868.81 samples/sec   Loss 2.9414   LearningRate 0.0007   Epoch: 10   Global Step: 18540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:23:32,701-Speed 13917.83 samples/sec   Loss 2.9015   LearningRate 0.0007   Epoch: 10   Global Step: 18550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:23:50,456-Speed 13842.68 samples/sec   Loss 2.9083   LearningRate 0.0007   Epoch: 10   Global Step: 18560   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:24:08,123-Speed 13911.81 samples/sec   Loss 2.8600   LearningRate 0.0007   Epoch: 10   Global Step: 18570   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:24:25,813-Speed 13893.18 samples/sec   Loss 2.9127   LearningRate 0.0007   Epoch: 10   Global Step: 18580   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:24:43,539-Speed 13865.72 samples/sec   Loss 2.9270   LearningRate 0.0007   Epoch: 10   Global Step: 18590   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:25:01,285-Speed 13849.16 samples/sec   Loss 2.9019   LearningRate 0.0007   Epoch: 10   Global Step: 18600   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:25:19,023-Speed 13856.00 samples/sec   Loss 2.8862   LearningRate 0.0007   Epoch: 10   Global Step: 18610   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:25:36,754-Speed 13861.83 samples/sec   Loss 2.9019   LearningRate 0.0007   Epoch: 10   Global Step: 18620   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:25:54,534-Speed 13822.76 samples/sec   Loss 2.9174   LearningRate 0.0007   Epoch: 10   Global Step: 18630   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:26:12,244-Speed 13878.69 samples/sec   Loss 2.8947   LearningRate 0.0007   Epoch: 10   Global Step: 18640   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:26:30,022-Speed 13825.03 samples/sec   Loss 2.9127   LearningRate 0.0007   Epoch: 10   Global Step: 18650   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:26:47,703-Speed 13900.80 samples/sec   Loss 2.8929   LearningRate 0.0007   Epoch: 10   Global Step: 18660   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:27:05,482-Speed 13823.25 samples/sec   Loss 2.8855   LearningRate 0.0007   Epoch: 10   Global Step: 18670   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:27:23,378-Speed 13733.68 samples/sec   Loss 2.9057   LearningRate 0.0007   Epoch: 10   Global Step: 18680   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:27:41,131-Speed 13844.19 samples/sec   Loss 2.9152   LearningRate 0.0007   Epoch: 10   Global Step: 18690   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:27:59,056-Speed 13711.73 samples/sec   Loss 2.9067   LearningRate 0.0007   Epoch: 10   Global Step: 18700   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:28:16,771-Speed 13873.89 samples/sec   Loss 2.8696   LearningRate 0.0007   Epoch: 10   Global Step: 18710   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:28:34,514-Speed 13851.69 samples/sec   Loss 2.8995   LearningRate 0.0007   Epoch: 10   Global Step: 18720   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:28:52,257-Speed 13852.25 samples/sec   Loss 2.8950   LearningRate 0.0007   Epoch: 10   Global Step: 18730   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:29:09,984-Speed 13864.71 samples/sec   Loss 2.8833   LearningRate 0.0007   Epoch: 10   Global Step: 18740   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:29:27,809-Speed 13788.67 samples/sec   Loss 2.8588   LearningRate 0.0007   Epoch: 10   Global Step: 18750   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:29:45,554-Speed 13850.31 samples/sec   Loss 2.8986   LearningRate 0.0007   Epoch: 10   Global Step: 18760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:30:03,318-Speed 13835.50 samples/sec   Loss 2.8909   LearningRate 0.0007   Epoch: 10   Global Step: 18770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:30:21,019-Speed 13885.26 samples/sec   Loss 2.8910   LearningRate 0.0007   Epoch: 10   Global Step: 18780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:30:38,762-Speed 13851.64 samples/sec   Loss 2.8805   LearningRate 0.0007   Epoch: 10   Global Step: 18790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:30:56,583-Speed 13791.33 samples/sec   Loss 2.8653   LearningRate 0.0007   Epoch: 10   Global Step: 18800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:31:14,320-Speed 13856.98 samples/sec   Loss 2.8673   LearningRate 0.0007   Epoch: 10   Global Step: 18810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:31:32,032-Speed 13876.20 samples/sec   Loss 2.8941   LearningRate 0.0007   Epoch: 10   Global Step: 18820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:31:49,735-Speed 13883.60 samples/sec   Loss 2.8790   LearningRate 0.0007   Epoch: 10   Global Step: 18830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:32:07,499-Speed 13835.32 samples/sec   Loss 2.8857   LearningRate 0.0007   Epoch: 10   Global Step: 18840   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:32:25,187-Speed 13895.39 samples/sec   Loss 2.8660   LearningRate 0.0007   Epoch: 10   Global Step: 18850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:32:42,900-Speed 13875.46 samples/sec   Loss 2.8685   LearningRate 0.0007   Epoch: 10   Global Step: 18860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:33:00,703-Speed 13804.93 samples/sec   Loss 2.8745   LearningRate 0.0007   Epoch: 10   Global Step: 18870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:33:18,434-Speed 13861.45 samples/sec   Loss 2.9025   LearningRate 0.0007   Epoch: 10   Global Step: 18880   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:33:36,115-Speed 13900.95 samples/sec   Loss 2.8978   LearningRate 0.0007   Epoch: 10   Global Step: 18890   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:33:53,835-Speed 13869.65 samples/sec   Loss 2.8795   LearningRate 0.0007   Epoch: 10   Global Step: 18900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:34:11,553-Speed 13871.95 samples/sec   Loss 2.8629   LearningRate 0.0007   Epoch: 10   Global Step: 18910   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:34:29,311-Speed 13840.21 samples/sec   Loss 2.8570   LearningRate 0.0007   Epoch: 10   Global Step: 18920   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:34:47,069-Speed 13840.70 samples/sec   Loss 2.8448   LearningRate 0.0007   Epoch: 10   Global Step: 18930   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:35:04,895-Speed 13787.37 samples/sec   Loss 2.8665   LearningRate 0.0007   Epoch: 10   Global Step: 18940   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:35:22,659-Speed 13835.85 samples/sec   Loss 2.9062   LearningRate 0.0007   Epoch: 10   Global Step: 18950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:35:40,396-Speed 13856.65 samples/sec   Loss 2.8729   LearningRate 0.0007   Epoch: 10   Global Step: 18960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:35:58,092-Speed 13888.42 samples/sec   Loss 2.8947   LearningRate 0.0006   Epoch: 10   Global Step: 18970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:36:15,869-Speed 13824.92 samples/sec   Loss 2.8805   LearningRate 0.0006   Epoch: 10   Global Step: 18980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:36:33,681-Speed 13801.11 samples/sec   Loss 2.8711   LearningRate 0.0006   Epoch: 10   Global Step: 18990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:36:51,381-Speed 13885.97 samples/sec   Loss 2.8687   LearningRate 0.0006   Epoch: 10   Global Step: 19000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:37:09,151-Speed 13830.30 samples/sec   Loss 2.9261   LearningRate 0.0006   Epoch: 10   Global Step: 19010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:38:16,686-Speed 3639.08 samples/sec   Loss 2.8415   LearningRate 0.0006   Epoch: 11   Global Step: 19020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:38:34,410-Speed 13866.44 samples/sec   Loss 2.8228   LearningRate 0.0006   Epoch: 11   Global Step: 19030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:38:52,075-Speed 13913.90 samples/sec   Loss 2.8350   LearningRate 0.0006   Epoch: 11   Global Step: 19040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:39:09,808-Speed 13859.10 samples/sec   Loss 2.8362   LearningRate 0.0006   Epoch: 11   Global Step: 19050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:39:27,557-Speed 13847.29 samples/sec   Loss 2.8287   LearningRate 0.0006   Epoch: 11   Global Step: 19060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:39:45,285-Speed 13863.93 samples/sec   Loss 2.8569   LearningRate 0.0006   Epoch: 11   Global Step: 19070   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:40:03,003-Speed 13871.76 samples/sec   Loss 2.8640   LearningRate 0.0006   Epoch: 11   Global Step: 19080   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:40:20,715-Speed 13876.32 samples/sec   Loss 2.8705   LearningRate 0.0006   Epoch: 11   Global Step: 19090   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:40:38,449-Speed 13858.42 samples/sec   Loss 2.8469   LearningRate 0.0006   Epoch: 11   Global Step: 19100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:40:56,120-Speed 13908.93 samples/sec   Loss 2.8204   LearningRate 0.0006   Epoch: 11   Global Step: 19110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:41:13,886-Speed 13833.85 samples/sec   Loss 2.8335   LearningRate 0.0006   Epoch: 11   Global Step: 19120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:41:31,673-Speed 13818.04 samples/sec   Loss 2.8485   LearningRate 0.0006   Epoch: 11   Global Step: 19130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:41:49,405-Speed 13859.93 samples/sec   Loss 2.8360   LearningRate 0.0006   Epoch: 11   Global Step: 19140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:42:07,145-Speed 13854.97 samples/sec   Loss 2.8165   LearningRate 0.0006   Epoch: 11   Global Step: 19150   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:42:24,902-Speed 13840.64 samples/sec   Loss 2.8370   LearningRate 0.0006   Epoch: 11   Global Step: 19160   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:42:42,594-Speed 13892.58 samples/sec   Loss 2.8456   LearningRate 0.0006   Epoch: 11   Global Step: 19170   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:43:00,388-Speed 13811.59 samples/sec   Loss 2.8441   LearningRate 0.0006   Epoch: 11   Global Step: 19180   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:43:18,176-Speed 13817.91 samples/sec   Loss 2.8324   LearningRate 0.0006   Epoch: 11   Global Step: 19190   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:43:35,964-Speed 13816.79 samples/sec   Loss 2.8490   LearningRate 0.0006   Epoch: 11   Global Step: 19200   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:43:53,739-Speed 13826.64 samples/sec   Loss 2.8389   LearningRate 0.0006   Epoch: 11   Global Step: 19210   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:44:11,507-Speed 13833.08 samples/sec   Loss 2.8238   LearningRate 0.0006   Epoch: 11   Global Step: 19220   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:44:29,217-Speed 13877.75 samples/sec   Loss 2.8301   LearningRate 0.0006   Epoch: 11   Global Step: 19230   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:44:46,958-Speed 13853.90 samples/sec   Loss 2.8158   LearningRate 0.0006   Epoch: 11   Global Step: 19240   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:45:04,664-Speed 13880.84 samples/sec   Loss 2.8434   LearningRate 0.0006   Epoch: 11   Global Step: 19250   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:45:22,409-Speed 13850.88 samples/sec   Loss 2.8381   LearningRate 0.0006   Epoch: 11   Global Step: 19260   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:45:40,166-Speed 13840.66 samples/sec   Loss 2.8388   LearningRate 0.0006   Epoch: 11   Global Step: 19270   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:45:57,937-Speed 13830.57 samples/sec   Loss 2.8211   LearningRate 0.0006   Epoch: 11   Global Step: 19280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:46:15,702-Speed 13834.99 samples/sec   Loss 2.8174   LearningRate 0.0006   Epoch: 11   Global Step: 19290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:46:33,444-Speed 13852.18 samples/sec   Loss 2.8027   LearningRate 0.0006   Epoch: 11   Global Step: 19300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:46:51,134-Speed 13893.61 samples/sec   Loss 2.8403   LearningRate 0.0006   Epoch: 11   Global Step: 19310   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:47:08,844-Speed 13878.15 samples/sec   Loss 2.8605   LearningRate 0.0006   Epoch: 11   Global Step: 19320   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:47:26,596-Speed 13844.59 samples/sec   Loss 2.8397   LearningRate 0.0006   Epoch: 11   Global Step: 19330   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:47:44,314-Speed 13871.75 samples/sec   Loss 2.8112   LearningRate 0.0006   Epoch: 11   Global Step: 19340   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:48:02,076-Speed 13836.87 samples/sec   Loss 2.8186   LearningRate 0.0006   Epoch: 11   Global Step: 19350   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:48:19,884-Speed 13801.94 samples/sec   Loss 2.8019   LearningRate 0.0006   Epoch: 11   Global Step: 19360   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:48:37,603-Speed 13871.35 samples/sec   Loss 2.8288   LearningRate 0.0006   Epoch: 11   Global Step: 19370   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:48:55,410-Speed 13801.59 samples/sec   Loss 2.8139   LearningRate 0.0006   Epoch: 11   Global Step: 19380   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:49:13,107-Speed 13888.30 samples/sec   Loss 2.8661   LearningRate 0.0006   Epoch: 11   Global Step: 19390   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:49:30,870-Speed 13836.57 samples/sec   Loss 2.8414   LearningRate 0.0006   Epoch: 11   Global Step: 19400   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:49:48,828-Speed 13686.39 samples/sec   Loss 2.8084   LearningRate 0.0006   Epoch: 11   Global Step: 19410   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:50:06,633-Speed 13803.22 samples/sec   Loss 2.8131   LearningRate 0.0006   Epoch: 11   Global Step: 19420   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:50:24,389-Speed 13841.56 samples/sec   Loss 2.7864   LearningRate 0.0006   Epoch: 11   Global Step: 19430   Fp16 Grad Scale: 131072   Required: 25 hours
Training: 2022-03-03 17:50:42,115-Speed 13865.56 samples/sec   Loss 2.8030   LearningRate 0.0006   Epoch: 11   Global Step: 19440   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:50:59,963-Speed 13770.85 samples/sec   Loss 2.7975   LearningRate 0.0006   Epoch: 11   Global Step: 19450   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:51:17,773-Speed 13800.31 samples/sec   Loss 2.8566   LearningRate 0.0006   Epoch: 11   Global Step: 19460   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:51:35,579-Speed 13802.43 samples/sec   Loss 2.8270   LearningRate 0.0006   Epoch: 11   Global Step: 19470   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:51:53,446-Speed 13755.76 samples/sec   Loss 2.8331   LearningRate 0.0006   Epoch: 11   Global Step: 19480   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:52:11,240-Speed 13812.66 samples/sec   Loss 2.8059   LearningRate 0.0006   Epoch: 11   Global Step: 19490   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:52:29,023-Speed 13820.96 samples/sec   Loss 2.7922   LearningRate 0.0006   Epoch: 11   Global Step: 19500   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:52:46,714-Speed 13892.08 samples/sec   Loss 2.8314   LearningRate 0.0006   Epoch: 11   Global Step: 19510   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:53:04,493-Speed 13824.07 samples/sec   Loss 2.8080   LearningRate 0.0006   Epoch: 11   Global Step: 19520   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:53:22,219-Speed 13865.28 samples/sec   Loss 2.8116   LearningRate 0.0006   Epoch: 11   Global Step: 19530   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:53:39,967-Speed 13848.60 samples/sec   Loss 2.8082   LearningRate 0.0006   Epoch: 11   Global Step: 19540   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 17:53:57,757-Speed 13815.01 samples/sec   Loss 2.7787   LearningRate 0.0006   Epoch: 11   Global Step: 19550   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:54:15,568-Speed 13798.94 samples/sec   Loss 2.8672   LearningRate 0.0006   Epoch: 11   Global Step: 19560   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:54:33,288-Speed 13870.87 samples/sec   Loss 2.8075   LearningRate 0.0006   Epoch: 11   Global Step: 19570   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:54:51,112-Speed 13789.11 samples/sec   Loss 2.8123   LearningRate 0.0006   Epoch: 11   Global Step: 19580   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:55:08,817-Speed 13881.81 samples/sec   Loss 2.7881   LearningRate 0.0006   Epoch: 11   Global Step: 19590   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:55:26,491-Speed 13905.68 samples/sec   Loss 2.7797   LearningRate 0.0006   Epoch: 11   Global Step: 19600   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:55:44,261-Speed 13831.25 samples/sec   Loss 2.7838   LearningRate 0.0006   Epoch: 11   Global Step: 19610   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:56:02,057-Speed 13810.92 samples/sec   Loss 2.7871   LearningRate 0.0006   Epoch: 11   Global Step: 19620   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:56:19,841-Speed 13819.95 samples/sec   Loss 2.7865   LearningRate 0.0006   Epoch: 11   Global Step: 19630   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:56:37,548-Speed 13880.11 samples/sec   Loss 2.7960   LearningRate 0.0006   Epoch: 11   Global Step: 19640   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:56:55,357-Speed 13800.78 samples/sec   Loss 2.8076   LearningRate 0.0006   Epoch: 11   Global Step: 19650   Fp16 Grad Scale: 16384   Required: 25 hours
Training: 2022-03-03 17:57:13,105-Speed 13848.03 samples/sec   Loss 2.7937   LearningRate 0.0006   Epoch: 11   Global Step: 19660   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:57:30,854-Speed 13847.88 samples/sec   Loss 2.8104   LearningRate 0.0006   Epoch: 11   Global Step: 19670   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:57:48,651-Speed 13809.23 samples/sec   Loss 2.7975   LearningRate 0.0006   Epoch: 11   Global Step: 19680   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:58:06,376-Speed 13866.48 samples/sec   Loss 2.7990   LearningRate 0.0006   Epoch: 11   Global Step: 19690   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:58:24,126-Speed 13846.46 samples/sec   Loss 2.7975   LearningRate 0.0006   Epoch: 11   Global Step: 19700   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:58:41,845-Speed 13871.27 samples/sec   Loss 2.7868   LearningRate 0.0006   Epoch: 11   Global Step: 19710   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:58:59,603-Speed 13839.86 samples/sec   Loss 2.7995   LearningRate 0.0006   Epoch: 11   Global Step: 19720   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:59:17,317-Speed 13875.44 samples/sec   Loss 2.7800   LearningRate 0.0006   Epoch: 11   Global Step: 19730   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:59:35,055-Speed 13857.10 samples/sec   Loss 2.7885   LearningRate 0.0006   Epoch: 11   Global Step: 19740   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 17:59:52,853-Speed 13809.47 samples/sec   Loss 2.7934   LearningRate 0.0006   Epoch: 11   Global Step: 19750   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:00:10,581-Speed 13863.21 samples/sec   Loss 2.8118   LearningRate 0.0006   Epoch: 11   Global Step: 19760   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:00:28,393-Speed 13799.62 samples/sec   Loss 2.7937   LearningRate 0.0006   Epoch: 11   Global Step: 19770   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:00:46,133-Speed 13853.99 samples/sec   Loss 2.8024   LearningRate 0.0006   Epoch: 11   Global Step: 19780   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:01:03,920-Speed 13817.76 samples/sec   Loss 2.7590   LearningRate 0.0006   Epoch: 11   Global Step: 19790   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:01:21,650-Speed 13862.07 samples/sec   Loss 2.7655   LearningRate 0.0006   Epoch: 11   Global Step: 19800   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:01:39,439-Speed 13816.53 samples/sec   Loss 2.7730   LearningRate 0.0006   Epoch: 11   Global Step: 19810   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:01:57,250-Speed 13798.73 samples/sec   Loss 2.7767   LearningRate 0.0006   Epoch: 11   Global Step: 19820   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:02:15,015-Speed 13835.26 samples/sec   Loss 2.7687   LearningRate 0.0006   Epoch: 11   Global Step: 19830   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:02:32,731-Speed 13873.68 samples/sec   Loss 2.8150   LearningRate 0.0006   Epoch: 11   Global Step: 19840   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:02:50,542-Speed 13798.70 samples/sec   Loss 2.7821   LearningRate 0.0006   Epoch: 11   Global Step: 19850   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:03:08,242-Speed 13886.06 samples/sec   Loss 2.7742   LearningRate 0.0006   Epoch: 11   Global Step: 19860   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:03:26,038-Speed 13810.69 samples/sec   Loss 2.7539   LearningRate 0.0006   Epoch: 11   Global Step: 19870   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:03:43,798-Speed 13838.17 samples/sec   Loss 2.7704   LearningRate 0.0006   Epoch: 11   Global Step: 19880   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:04:01,578-Speed 13823.59 samples/sec   Loss 2.7360   LearningRate 0.0006   Epoch: 11   Global Step: 19890   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:04:19,372-Speed 13813.31 samples/sec   Loss 2.7793   LearningRate 0.0006   Epoch: 11   Global Step: 19900   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:04:37,105-Speed 13860.64 samples/sec   Loss 2.7522   LearningRate 0.0006   Epoch: 11   Global Step: 19910   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:04:54,842-Speed 13856.98 samples/sec   Loss 2.7434   LearningRate 0.0006   Epoch: 11   Global Step: 19920   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:05:12,600-Speed 13839.76 samples/sec   Loss 2.7476   LearningRate 0.0006   Epoch: 11   Global Step: 19930   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:05:30,410-Speed 13799.90 samples/sec   Loss 2.7787   LearningRate 0.0006   Epoch: 11   Global Step: 19940   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:05:48,190-Speed 13823.14 samples/sec   Loss 2.7691   LearningRate 0.0006   Epoch: 11   Global Step: 19950   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:06:05,988-Speed 13809.28 samples/sec   Loss 2.7506   LearningRate 0.0006   Epoch: 11   Global Step: 19960   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:06:23,744-Speed 13842.74 samples/sec   Loss 2.7435   LearningRate 0.0006   Epoch: 11   Global Step: 19970   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:06:41,511-Speed 13833.17 samples/sec   Loss 2.7602   LearningRate 0.0006   Epoch: 11   Global Step: 19980   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:06:59,273-Speed 13837.47 samples/sec   Loss 2.7717   LearningRate 0.0006   Epoch: 11   Global Step: 19990   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:07:16,983-Speed 13877.52 samples/sec   Loss 2.7690   LearningRate 0.0006   Epoch: 11   Global Step: 20000   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:07:34,740-Speed 13841.28 samples/sec   Loss 2.7511   LearningRate 0.0006   Epoch: 11   Global Step: 20010   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:07:52,465-Speed 13865.61 samples/sec   Loss 2.7601   LearningRate 0.0006   Epoch: 11   Global Step: 20020   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:08:10,207-Speed 13853.60 samples/sec   Loss 2.7730   LearningRate 0.0006   Epoch: 11   Global Step: 20030   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:08:28,020-Speed 13796.87 samples/sec   Loss 2.7746   LearningRate 0.0006   Epoch: 11   Global Step: 20040   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:08:45,713-Speed 13891.22 samples/sec   Loss 2.7688   LearningRate 0.0006   Epoch: 11   Global Step: 20050   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:09:03,410-Speed 13888.45 samples/sec   Loss 2.7719   LearningRate 0.0006   Epoch: 11   Global Step: 20060   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:09:21,183-Speed 13828.31 samples/sec   Loss 2.7577   LearningRate 0.0006   Epoch: 11   Global Step: 20070   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:09:38,926-Speed 13852.24 samples/sec   Loss 2.7271   LearningRate 0.0006   Epoch: 11   Global Step: 20080   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:09:56,646-Speed 13869.58 samples/sec   Loss 2.7474   LearningRate 0.0006   Epoch: 11   Global Step: 20090   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:10:14,381-Speed 13858.08 samples/sec   Loss 2.7858   LearningRate 0.0006   Epoch: 11   Global Step: 20100   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:10:32,127-Speed 13849.60 samples/sec   Loss 2.7708   LearningRate 0.0006   Epoch: 11   Global Step: 20110   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:10:49,924-Speed 13810.93 samples/sec   Loss 2.7497   LearningRate 0.0006   Epoch: 11   Global Step: 20120   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:11:07,748-Speed 13789.48 samples/sec   Loss 2.7335   LearningRate 0.0006   Epoch: 11   Global Step: 20130   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:11:25,539-Speed 13813.83 samples/sec   Loss 2.7349   LearningRate 0.0006   Epoch: 11   Global Step: 20140   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:11:43,277-Speed 13856.37 samples/sec   Loss 2.7596   LearningRate 0.0006   Epoch: 11   Global Step: 20150   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:12:01,035-Speed 13840.87 samples/sec   Loss 2.7401   LearningRate 0.0006   Epoch: 11   Global Step: 20160   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:12:18,757-Speed 13868.99 samples/sec   Loss 2.7690   LearningRate 0.0006   Epoch: 11   Global Step: 20170   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:12:36,513-Speed 13841.41 samples/sec   Loss 2.7537   LearningRate 0.0006   Epoch: 11   Global Step: 20180   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:12:54,244-Speed 13861.10 samples/sec   Loss 2.7221   LearningRate 0.0006   Epoch: 11   Global Step: 20190   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:13:11,967-Speed 13867.60 samples/sec   Loss 2.7213   LearningRate 0.0006   Epoch: 11   Global Step: 20200   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:13:29,694-Speed 13864.92 samples/sec   Loss 2.7434   LearningRate 0.0006   Epoch: 11   Global Step: 20210   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:13:47,466-Speed 13829.25 samples/sec   Loss 2.7482   LearningRate 0.0006   Epoch: 11   Global Step: 20220   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:14:05,271-Speed 13804.47 samples/sec   Loss 2.7194   LearningRate 0.0006   Epoch: 11   Global Step: 20230   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:14:23,010-Speed 13855.02 samples/sec   Loss 2.7111   LearningRate 0.0006   Epoch: 11   Global Step: 20240   Fp16 Grad Scale: 32768   Required: 25 hours
Training: 2022-03-03 18:14:40,805-Speed 13811.90 samples/sec   Loss 2.7411   LearningRate 0.0006   Epoch: 11   Global Step: 20250   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:14:58,563-Speed 13840.77 samples/sec   Loss 2.7304   LearningRate 0.0006   Epoch: 11   Global Step: 20260   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:15:16,279-Speed 13873.14 samples/sec   Loss 2.7558   LearningRate 0.0006   Epoch: 11   Global Step: 20270   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:15:33,997-Speed 13870.87 samples/sec   Loss 2.7253   LearningRate 0.0006   Epoch: 11   Global Step: 20280   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:15:51,796-Speed 13809.29 samples/sec   Loss 2.7045   LearningRate 0.0006   Epoch: 11   Global Step: 20290   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:16:09,574-Speed 13824.74 samples/sec   Loss 2.7534   LearningRate 0.0006   Epoch: 11   Global Step: 20300   Fp16 Grad Scale: 65536   Required: 25 hours
Training: 2022-03-03 18:16:27,290-Speed 13872.77 samples/sec   Loss 2.7500   LearningRate 0.0006   Epoch: 11   Global Step: 20310   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:16:45,000-Speed 13877.74 samples/sec   Loss 2.7371   LearningRate 0.0006   Epoch: 11   Global Step: 20320   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:17:02,802-Speed 13806.47 samples/sec   Loss 2.7284   LearningRate 0.0006   Epoch: 11   Global Step: 20330   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:17:20,491-Speed 13893.94 samples/sec   Loss 2.7494   LearningRate 0.0006   Epoch: 11   Global Step: 20340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:17:38,416-Speed 13711.01 samples/sec   Loss 2.7420   LearningRate 0.0006   Epoch: 11   Global Step: 20350   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:17:56,147-Speed 13861.68 samples/sec   Loss 2.7166   LearningRate 0.0006   Epoch: 11   Global Step: 20360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:18:13,921-Speed 13828.05 samples/sec   Loss 2.7194   LearningRate 0.0006   Epoch: 11   Global Step: 20370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:18:31,613-Speed 13891.76 samples/sec   Loss 2.7078   LearningRate 0.0006   Epoch: 11   Global Step: 20380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:18:49,376-Speed 13836.79 samples/sec   Loss 2.7135   LearningRate 0.0006   Epoch: 11   Global Step: 20390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:19:07,096-Speed 13869.43 samples/sec   Loss 2.7356   LearningRate 0.0006   Epoch: 11   Global Step: 20400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:19:24,741-Speed 13929.09 samples/sec   Loss 2.7000   LearningRate 0.0006   Epoch: 11   Global Step: 20410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:19:42,441-Speed 13886.29 samples/sec   Loss 2.7289   LearningRate 0.0006   Epoch: 11   Global Step: 20420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:20:00,178-Speed 13856.14 samples/sec   Loss 2.7252   LearningRate 0.0006   Epoch: 11   Global Step: 20430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:20:17,940-Speed 13837.43 samples/sec   Loss 2.7251   LearningRate 0.0006   Epoch: 11   Global Step: 20440   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:20:35,615-Speed 13904.98 samples/sec   Loss 2.6987   LearningRate 0.0006   Epoch: 11   Global Step: 20450   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:20:53,336-Speed 13869.33 samples/sec   Loss 2.7036   LearningRate 0.0006   Epoch: 11   Global Step: 20460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:21:10,980-Speed 13930.01 samples/sec   Loss 2.7054   LearningRate 0.0006   Epoch: 11   Global Step: 20470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:21:28,720-Speed 13853.80 samples/sec   Loss 2.7879   LearningRate 0.0006   Epoch: 11   Global Step: 20480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:21:46,419-Speed 13886.88 samples/sec   Loss 2.7170   LearningRate 0.0006   Epoch: 11   Global Step: 20490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:22:04,125-Speed 13880.75 samples/sec   Loss 2.7283   LearningRate 0.0006   Epoch: 11   Global Step: 20500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:22:21,832-Speed 13880.19 samples/sec   Loss 2.6926   LearningRate 0.0006   Epoch: 11   Global Step: 20510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:22:39,606-Speed 13828.01 samples/sec   Loss 2.7066   LearningRate 0.0006   Epoch: 11   Global Step: 20520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:22:57,240-Speed 13936.86 samples/sec   Loss 2.7441   LearningRate 0.0006   Epoch: 11   Global Step: 20530   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:23:15,023-Speed 13821.39 samples/sec   Loss 2.7496   LearningRate 0.0006   Epoch: 11   Global Step: 20540   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:23:32,790-Speed 13833.45 samples/sec   Loss 2.7403   LearningRate 0.0006   Epoch: 11   Global Step: 20550   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:23:50,462-Speed 13907.66 samples/sec   Loss 2.7216   LearningRate 0.0006   Epoch: 11   Global Step: 20560   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:24:08,171-Speed 13878.26 samples/sec   Loss 2.6997   LearningRate 0.0006   Epoch: 11   Global Step: 20570   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:24:25,865-Speed 13890.93 samples/sec   Loss 2.7082   LearningRate 0.0006   Epoch: 11   Global Step: 20580   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:24:43,564-Speed 13886.67 samples/sec   Loss 2.6991   LearningRate 0.0006   Epoch: 11   Global Step: 20590   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:25:01,325-Speed 13837.56 samples/sec   Loss 2.7135   LearningRate 0.0006   Epoch: 11   Global Step: 20600   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:25:19,086-Speed 13837.78 samples/sec   Loss 2.7103   LearningRate 0.0006   Epoch: 11   Global Step: 20610   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:25:36,867-Speed 13823.06 samples/sec   Loss 2.6963   LearningRate 0.0006   Epoch: 11   Global Step: 20620   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:25:54,528-Speed 13916.03 samples/sec   Loss 2.6825   LearningRate 0.0006   Epoch: 11   Global Step: 20630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:26:12,273-Speed 13850.66 samples/sec   Loss 2.7118   LearningRate 0.0006   Epoch: 11   Global Step: 20640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:26:30,061-Speed 13816.43 samples/sec   Loss 2.7187   LearningRate 0.0006   Epoch: 11   Global Step: 20650   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:26:47,797-Speed 13857.40 samples/sec   Loss 2.7164   LearningRate 0.0006   Epoch: 11   Global Step: 20660   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:27:05,566-Speed 13832.07 samples/sec   Loss 2.7413   LearningRate 0.0006   Epoch: 11   Global Step: 20670   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:27:23,330-Speed 13836.12 samples/sec   Loss 2.7369   LearningRate 0.0006   Epoch: 11   Global Step: 20680   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:27:41,098-Speed 13832.39 samples/sec   Loss 2.6919   LearningRate 0.0006   Epoch: 11   Global Step: 20690   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:27:58,845-Speed 13848.82 samples/sec   Loss 2.6821   LearningRate 0.0006   Epoch: 11   Global Step: 20700   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:28:16,507-Speed 13916.78 samples/sec   Loss 2.7268   LearningRate 0.0006   Epoch: 11   Global Step: 20710   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:28:34,251-Speed 13851.02 samples/sec   Loss 2.7293   LearningRate 0.0006   Epoch: 11   Global Step: 20720   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:28:51,926-Speed 13905.45 samples/sec   Loss 2.7159   LearningRate 0.0006   Epoch: 11   Global Step: 20730   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:29:09,635-Speed 13878.55 samples/sec   Loss 2.7063   LearningRate 0.0006   Epoch: 11   Global Step: 20740   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:30:18,552-Speed 3566.08 samples/sec   Loss 2.6495   LearningRate 0.0006   Epoch: 12   Global Step: 20750   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:30:36,189-Speed 13935.70 samples/sec   Loss 2.6696   LearningRate 0.0006   Epoch: 12   Global Step: 20760   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:30:53,892-Speed 13882.93 samples/sec   Loss 2.6570   LearningRate 0.0006   Epoch: 12   Global Step: 20770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:31:11,622-Speed 13863.12 samples/sec   Loss 2.6313   LearningRate 0.0006   Epoch: 12   Global Step: 20780   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:31:29,345-Speed 13867.00 samples/sec   Loss 2.6914   LearningRate 0.0006   Epoch: 12   Global Step: 20790   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:31:47,098-Speed 13844.73 samples/sec   Loss 2.6799   LearningRate 0.0006   Epoch: 12   Global Step: 20800   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:32:04,950-Speed 13767.69 samples/sec   Loss 2.6562   LearningRate 0.0006   Epoch: 12   Global Step: 20810   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:32:22,811-Speed 13760.22 samples/sec   Loss 2.6710   LearningRate 0.0006   Epoch: 12   Global Step: 20820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:32:40,693-Speed 13744.08 samples/sec   Loss 2.6578   LearningRate 0.0006   Epoch: 12   Global Step: 20830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:32:58,525-Speed 13783.43 samples/sec   Loss 2.6547   LearningRate 0.0006   Epoch: 12   Global Step: 20840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:33:16,323-Speed 13808.71 samples/sec   Loss 2.6596   LearningRate 0.0006   Epoch: 12   Global Step: 20850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:33:34,128-Speed 13803.49 samples/sec   Loss 2.6572   LearningRate 0.0006   Epoch: 12   Global Step: 20860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:33:51,865-Speed 13857.03 samples/sec   Loss 2.7075   LearningRate 0.0006   Epoch: 12   Global Step: 20870   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:34:09,621-Speed 13841.95 samples/sec   Loss 2.6665   LearningRate 0.0006   Epoch: 12   Global Step: 20880   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:34:27,516-Speed 13734.75 samples/sec   Loss 2.6556   LearningRate 0.0006   Epoch: 12   Global Step: 20890   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:34:45,623-Speed 13572.85 samples/sec   Loss 2.6553   LearningRate 0.0006   Epoch: 12   Global Step: 20900   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:35:03,616-Speed 13660.10 samples/sec   Loss 2.6789   LearningRate 0.0006   Epoch: 12   Global Step: 20910   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:35:21,281-Speed 13913.06 samples/sec   Loss 2.6688   LearningRate 0.0006   Epoch: 12   Global Step: 20920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:35:39,040-Speed 13839.43 samples/sec   Loss 2.6735   LearningRate 0.0006   Epoch: 12   Global Step: 20930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:35:56,811-Speed 13831.19 samples/sec   Loss 2.6669   LearningRate 0.0006   Epoch: 12   Global Step: 20940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:36:14,523-Speed 13876.46 samples/sec   Loss 2.7000   LearningRate 0.0006   Epoch: 12   Global Step: 20950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:36:32,245-Speed 13868.65 samples/sec   Loss 2.6788   LearningRate 0.0006   Epoch: 12   Global Step: 20960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:36:50,079-Speed 13781.11 samples/sec   Loss 2.6642   LearningRate 0.0006   Epoch: 12   Global Step: 20970   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:37:07,777-Speed 13887.41 samples/sec   Loss 2.6388   LearningRate 0.0006   Epoch: 12   Global Step: 20980   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:37:25,561-Speed 13819.49 samples/sec   Loss 2.6581   LearningRate 0.0006   Epoch: 12   Global Step: 20990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:37:43,373-Speed 13798.94 samples/sec   Loss 2.7295   LearningRate 0.0006   Epoch: 12   Global Step: 21000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:38:01,296-Speed 13713.08 samples/sec   Loss 2.6646   LearningRate 0.0006   Epoch: 12   Global Step: 21010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:38:19,273-Speed 13671.94 samples/sec   Loss 2.6514   LearningRate 0.0006   Epoch: 12   Global Step: 21020   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:38:37,214-Speed 13698.73 samples/sec   Loss 2.6796   LearningRate 0.0006   Epoch: 12   Global Step: 21030   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:38:55,029-Speed 13796.74 samples/sec   Loss 2.6711   LearningRate 0.0006   Epoch: 12   Global Step: 21040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:39:12,879-Speed 13769.06 samples/sec   Loss 2.6644   LearningRate 0.0006   Epoch: 12   Global Step: 21050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:39:30,696-Speed 13794.30 samples/sec   Loss 2.6436   LearningRate 0.0006   Epoch: 12   Global Step: 21060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:39:48,573-Speed 13748.35 samples/sec   Loss 2.6808   LearningRate 0.0006   Epoch: 12   Global Step: 21070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:40:06,499-Speed 13710.63 samples/sec   Loss 2.6878   LearningRate 0.0006   Epoch: 12   Global Step: 21080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:40:24,307-Speed 13800.98 samples/sec   Loss 2.6557   LearningRate 0.0006   Epoch: 12   Global Step: 21090   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:40:42,150-Speed 13774.54 samples/sec   Loss 2.6498   LearningRate 0.0006   Epoch: 12   Global Step: 21100   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:40:59,932-Speed 13822.09 samples/sec   Loss 2.6466   LearningRate 0.0006   Epoch: 12   Global Step: 21110   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:41:17,633-Speed 13884.42 samples/sec   Loss 2.6515   LearningRate 0.0006   Epoch: 12   Global Step: 21120   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:41:35,308-Speed 13905.77 samples/sec   Loss 2.6477   LearningRate 0.0006   Epoch: 12   Global Step: 21130   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:41:52,966-Speed 13918.15 samples/sec   Loss 2.6613   LearningRate 0.0006   Epoch: 12   Global Step: 21140   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:42:10,766-Speed 13807.66 samples/sec   Loss 2.6623   LearningRate 0.0006   Epoch: 12   Global Step: 21150   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:42:28,588-Speed 13790.17 samples/sec   Loss 2.6835   LearningRate 0.0006   Epoch: 12   Global Step: 21160   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:42:46,336-Speed 13848.04 samples/sec   Loss 2.6747   LearningRate 0.0006   Epoch: 12   Global Step: 21170   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:43:04,191-Speed 13765.21 samples/sec   Loss 2.6688   LearningRate 0.0006   Epoch: 12   Global Step: 21180   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:43:21,992-Speed 13807.00 samples/sec   Loss 2.6702   LearningRate 0.0006   Epoch: 12   Global Step: 21190   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:43:39,835-Speed 13774.80 samples/sec   Loss 2.6529   LearningRate 0.0006   Epoch: 12   Global Step: 21200   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:43:57,640-Speed 13803.60 samples/sec   Loss 2.6659   LearningRate 0.0006   Epoch: 12   Global Step: 21210   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:44:15,524-Speed 13742.51 samples/sec   Loss 2.6458   LearningRate 0.0006   Epoch: 12   Global Step: 21220   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:44:33,342-Speed 13793.86 samples/sec   Loss 2.6731   LearningRate 0.0006   Epoch: 12   Global Step: 21230   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:44:51,169-Speed 13786.70 samples/sec   Loss 2.6539   LearningRate 0.0006   Epoch: 12   Global Step: 21240   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:45:08,988-Speed 13792.51 samples/sec   Loss 2.6473   LearningRate 0.0006   Epoch: 12   Global Step: 21250   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:45:26,815-Speed 13787.43 samples/sec   Loss 2.6557   LearningRate 0.0006   Epoch: 12   Global Step: 21260   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:45:44,664-Speed 13769.74 samples/sec   Loss 2.6462   LearningRate 0.0006   Epoch: 12   Global Step: 21270   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:46:02,489-Speed 13787.72 samples/sec   Loss 2.6384   LearningRate 0.0006   Epoch: 12   Global Step: 21280   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:46:20,283-Speed 13812.68 samples/sec   Loss 2.6526   LearningRate 0.0006   Epoch: 12   Global Step: 21290   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:46:38,035-Speed 13845.13 samples/sec   Loss 2.6486   LearningRate 0.0006   Epoch: 12   Global Step: 21300   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:46:55,806-Speed 13830.14 samples/sec   Loss 2.6318   LearningRate 0.0006   Epoch: 12   Global Step: 21310   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:47:13,604-Speed 13809.09 samples/sec   Loss 2.6240   LearningRate 0.0006   Epoch: 12   Global Step: 21320   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:47:31,423-Speed 13792.89 samples/sec   Loss 2.6334   LearningRate 0.0006   Epoch: 12   Global Step: 21330   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:47:49,225-Speed 13806.18 samples/sec   Loss 2.6181   LearningRate 0.0006   Epoch: 12   Global Step: 21340   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:48:07,074-Speed 13769.74 samples/sec   Loss 2.6371   LearningRate 0.0006   Epoch: 12   Global Step: 21350   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:48:24,861-Speed 13818.01 samples/sec   Loss 2.6302   LearningRate 0.0006   Epoch: 12   Global Step: 21360   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:48:42,674-Speed 13797.38 samples/sec   Loss 2.6431   LearningRate 0.0006   Epoch: 12   Global Step: 21370   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:49:00,557-Speed 13742.97 samples/sec   Loss 2.6383   LearningRate 0.0006   Epoch: 12   Global Step: 21380   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:49:18,336-Speed 13824.35 samples/sec   Loss 2.6585   LearningRate 0.0006   Epoch: 12   Global Step: 21390   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:49:36,125-Speed 13816.54 samples/sec   Loss 2.6311   LearningRate 0.0006   Epoch: 12   Global Step: 21400   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:49:53,975-Speed 13768.81 samples/sec   Loss 2.6375   LearningRate 0.0006   Epoch: 12   Global Step: 21410   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:50:11,794-Speed 13792.90 samples/sec   Loss 2.6348   LearningRate 0.0006   Epoch: 12   Global Step: 21420   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:50:29,575-Speed 13822.49 samples/sec   Loss 2.6278   LearningRate 0.0006   Epoch: 12   Global Step: 21430   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:50:47,399-Speed 13788.34 samples/sec   Loss 2.6169   LearningRate 0.0006   Epoch: 12   Global Step: 21440   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:51:05,207-Speed 13801.22 samples/sec   Loss 2.6125   LearningRate 0.0006   Epoch: 12   Global Step: 21450   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:51:23,027-Speed 13792.59 samples/sec   Loss 2.6214   LearningRate 0.0006   Epoch: 12   Global Step: 21460   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:51:40,791-Speed 13835.25 samples/sec   Loss 2.6326   LearningRate 0.0006   Epoch: 12   Global Step: 21470   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:51:58,666-Speed 13752.41 samples/sec   Loss 2.6163   LearningRate 0.0006   Epoch: 12   Global Step: 21480   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:52:16,577-Speed 13722.07 samples/sec   Loss 2.6320   LearningRate 0.0006   Epoch: 12   Global Step: 21490   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:52:34,342-Speed 13834.78 samples/sec   Loss 2.6380   LearningRate 0.0006   Epoch: 12   Global Step: 21500   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:52:52,149-Speed 13802.26 samples/sec   Loss 2.6078   LearningRate 0.0006   Epoch: 12   Global Step: 21510   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:53:09,910-Speed 13837.58 samples/sec   Loss 2.6173   LearningRate 0.0006   Epoch: 12   Global Step: 21520   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:53:27,652-Speed 13852.72 samples/sec   Loss 2.6450   LearningRate 0.0006   Epoch: 12   Global Step: 21530   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:53:45,409-Speed 13842.31 samples/sec   Loss 2.6319   LearningRate 0.0006   Epoch: 12   Global Step: 21540   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:54:03,207-Speed 13809.61 samples/sec   Loss 2.6345   LearningRate 0.0006   Epoch: 12   Global Step: 21550   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:54:20,940-Speed 13859.72 samples/sec   Loss 2.5856   LearningRate 0.0006   Epoch: 12   Global Step: 21560   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:54:38,697-Speed 13841.28 samples/sec   Loss 2.5925   LearningRate 0.0006   Epoch: 12   Global Step: 21570   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:54:56,359-Speed 13914.91 samples/sec   Loss 2.5998   LearningRate 0.0006   Epoch: 12   Global Step: 21580   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:55:14,342-Speed 13667.60 samples/sec   Loss 2.5924   LearningRate 0.0006   Epoch: 12   Global Step: 21590   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:55:32,323-Speed 13668.61 samples/sec   Loss 2.6173   LearningRate 0.0006   Epoch: 12   Global Step: 21600   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 18:55:50,232-Speed 13723.86 samples/sec   Loss 2.6472   LearningRate 0.0006   Epoch: 12   Global Step: 21610   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:56:08,199-Speed 13680.34 samples/sec   Loss 2.5933   LearningRate 0.0006   Epoch: 12   Global Step: 21620   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:56:26,193-Speed 13658.57 samples/sec   Loss 2.6000   LearningRate 0.0006   Epoch: 12   Global Step: 21630   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:56:44,263-Speed 13601.52 samples/sec   Loss 2.6014   LearningRate 0.0006   Epoch: 12   Global Step: 21640   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 18:57:02,209-Speed 13695.56 samples/sec   Loss 2.5909   LearningRate 0.0006   Epoch: 12   Global Step: 21650   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:57:20,205-Speed 13656.45 samples/sec   Loss 2.6189   LearningRate 0.0006   Epoch: 12   Global Step: 21660   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 18:57:38,162-Speed 13686.86 samples/sec   Loss 2.6281   LearningRate 0.0006   Epoch: 12   Global Step: 21670   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:57:56,150-Speed 13663.53 samples/sec   Loss 2.6144   LearningRate 0.0006   Epoch: 12   Global Step: 21680   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:58:14,254-Speed 13576.76 samples/sec   Loss 2.5969   LearningRate 0.0006   Epoch: 12   Global Step: 21690   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:58:32,211-Speed 13686.77 samples/sec   Loss 2.5607   LearningRate 0.0006   Epoch: 12   Global Step: 21700   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:58:50,169-Speed 13686.11 samples/sec   Loss 2.5963   LearningRate 0.0006   Epoch: 12   Global Step: 21710   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:59:08,102-Speed 13704.70 samples/sec   Loss 2.6054   LearningRate 0.0006   Epoch: 12   Global Step: 21720   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:59:26,069-Speed 13679.87 samples/sec   Loss 2.6090   LearningRate 0.0006   Epoch: 12   Global Step: 21730   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 18:59:44,210-Speed 13547.69 samples/sec   Loss 2.6096   LearningRate 0.0006   Epoch: 12   Global Step: 21740   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 19:00:02,294-Speed 13590.89 samples/sec   Loss 2.6221   LearningRate 0.0006   Epoch: 12   Global Step: 21750   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 19:00:20,383-Speed 13588.25 samples/sec   Loss 2.6192   LearningRate 0.0006   Epoch: 12   Global Step: 21760   Fp16 Grad Scale: 16384   Required: 24 hours
Training: 2022-03-03 19:00:38,339-Speed 13687.84 samples/sec   Loss 2.6180   LearningRate 0.0006   Epoch: 12   Global Step: 21770   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:00:56,343-Speed 13651.37 samples/sec   Loss 2.5955   LearningRate 0.0006   Epoch: 12   Global Step: 21780   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:01:14,337-Speed 13658.17 samples/sec   Loss 2.5978   LearningRate 0.0006   Epoch: 12   Global Step: 21790   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:01:32,309-Speed 13675.33 samples/sec   Loss 2.5975   LearningRate 0.0006   Epoch: 12   Global Step: 21800   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:01:50,272-Speed 13683.79 samples/sec   Loss 2.6248   LearningRate 0.0006   Epoch: 12   Global Step: 21810   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:02:08,266-Speed 13658.85 samples/sec   Loss 2.6116   LearningRate 0.0006   Epoch: 12   Global Step: 21820   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:02:26,274-Speed 13647.87 samples/sec   Loss 2.5926   LearningRate 0.0006   Epoch: 12   Global Step: 21830   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:02:44,312-Speed 13625.39 samples/sec   Loss 2.5945   LearningRate 0.0006   Epoch: 12   Global Step: 21840   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:03:02,291-Speed 13670.50 samples/sec   Loss 2.5905   LearningRate 0.0006   Epoch: 12   Global Step: 21850   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:03:20,237-Speed 13695.69 samples/sec   Loss 2.5593   LearningRate 0.0006   Epoch: 12   Global Step: 21860   Fp16 Grad Scale: 32768   Required: 24 hours
Training: 2022-03-03 19:03:38,260-Speed 13638.31 samples/sec   Loss 2.5534   LearningRate 0.0006   Epoch: 12   Global Step: 21870   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:03:56,247-Speed 13664.29 samples/sec   Loss 2.5955   LearningRate 0.0006   Epoch: 12   Global Step: 21880   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:04:14,263-Speed 13642.43 samples/sec   Loss 2.5839   LearningRate 0.0006   Epoch: 12   Global Step: 21890   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:04:32,265-Speed 13652.64 samples/sec   Loss 2.5813   LearningRate 0.0006   Epoch: 12   Global Step: 21900   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:04:50,266-Speed 13653.36 samples/sec   Loss 2.5886   LearningRate 0.0006   Epoch: 12   Global Step: 21910   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:05:08,305-Speed 13624.62 samples/sec   Loss 2.6273   LearningRate 0.0006   Epoch: 12   Global Step: 21920   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:05:26,304-Speed 13655.02 samples/sec   Loss 2.5808   LearningRate 0.0006   Epoch: 12   Global Step: 21930   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:05:44,356-Speed 13614.50 samples/sec   Loss 2.5820   LearningRate 0.0006   Epoch: 12   Global Step: 21940   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:06:02,298-Speed 13698.38 samples/sec   Loss 2.5831   LearningRate 0.0006   Epoch: 12   Global Step: 21950   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:06:20,275-Speed 13671.68 samples/sec   Loss 2.5768   LearningRate 0.0006   Epoch: 12   Global Step: 21960   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:06:38,295-Speed 13639.33 samples/sec   Loss 2.5755   LearningRate 0.0006   Epoch: 12   Global Step: 21970   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 19:06:56,291-Speed 13656.86 samples/sec   Loss 2.5642   LearningRate 0.0006   Epoch: 12   Global Step: 21980   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 19:07:14,293-Speed 13652.89 samples/sec   Loss 2.5932   LearningRate 0.0006   Epoch: 12   Global Step: 21990   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:07:32,326-Speed 13628.81 samples/sec   Loss 2.5765   LearningRate 0.0006   Epoch: 12   Global Step: 22000   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:07:50,446-Speed 13564.31 samples/sec   Loss 2.5645   LearningRate 0.0006   Epoch: 12   Global Step: 22010   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:08:08,531-Speed 13590.28 samples/sec   Loss 2.5656   LearningRate 0.0006   Epoch: 12   Global Step: 22020   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:08:26,605-Speed 13597.90 samples/sec   Loss 2.6068   LearningRate 0.0006   Epoch: 12   Global Step: 22030   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:08:44,581-Speed 13672.65 samples/sec   Loss 2.5732   LearningRate 0.0006   Epoch: 12   Global Step: 22040   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:09:02,454-Speed 13750.73 samples/sec   Loss 2.5597   LearningRate 0.0006   Epoch: 12   Global Step: 22050   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:09:20,212-Speed 13841.83 samples/sec   Loss 2.5641   LearningRate 0.0006   Epoch: 12   Global Step: 22060   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:09:37,949-Speed 13856.69 samples/sec   Loss 2.5885   LearningRate 0.0006   Epoch: 12   Global Step: 22070   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:09:55,622-Speed 13906.69 samples/sec   Loss 2.5656   LearningRate 0.0006   Epoch: 12   Global Step: 22080   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:10:13,386-Speed 13835.19 samples/sec   Loss 2.5721   LearningRate 0.0006   Epoch: 12   Global Step: 22090   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 19:10:31,099-Speed 13875.97 samples/sec   Loss 2.5955   LearningRate 0.0006   Epoch: 12   Global Step: 22100   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 19:10:49,014-Speed 13718.44 samples/sec   Loss 2.5662   LearningRate 0.0006   Epoch: 12   Global Step: 22110   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:11:06,819-Speed 13803.61 samples/sec   Loss 2.5729   LearningRate 0.0006   Epoch: 12   Global Step: 22120   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:11:24,673-Speed 13766.21 samples/sec   Loss 2.5781   LearningRate 0.0006   Epoch: 12   Global Step: 22130   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:11:42,408-Speed 13858.41 samples/sec   Loss 2.5667   LearningRate 0.0006   Epoch: 12   Global Step: 22140   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:12:00,160-Speed 13844.50 samples/sec   Loss 2.5757   LearningRate 0.0006   Epoch: 12   Global Step: 22150   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:12:17,910-Speed 13846.76 samples/sec   Loss 2.5598   LearningRate 0.0006   Epoch: 12   Global Step: 22160   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:12:35,593-Speed 13898.92 samples/sec   Loss 2.5538   LearningRate 0.0006   Epoch: 12   Global Step: 22170   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:12:53,298-Speed 13881.62 samples/sec   Loss 2.5659   LearningRate 0.0006   Epoch: 12   Global Step: 22180   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:13:11,034-Speed 13857.56 samples/sec   Loss 2.5794   LearningRate 0.0006   Epoch: 12   Global Step: 22190   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:13:28,809-Speed 13826.84 samples/sec   Loss 2.5727   LearningRate 0.0006   Epoch: 12   Global Step: 22200   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:13:46,534-Speed 13866.24 samples/sec   Loss 2.5817   LearningRate 0.0006   Epoch: 12   Global Step: 22210   Fp16 Grad Scale: 131072   Required: 24 hours
Training: 2022-03-03 19:14:04,335-Speed 13806.83 samples/sec   Loss 2.5818   LearningRate 0.0006   Epoch: 12   Global Step: 22220   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:14:22,036-Speed 13884.80 samples/sec   Loss 2.5604   LearningRate 0.0006   Epoch: 12   Global Step: 22230   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:14:39,741-Speed 13881.90 samples/sec   Loss 2.5616   LearningRate 0.0006   Epoch: 12   Global Step: 22240   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:14:57,488-Speed 13848.93 samples/sec   Loss 2.5580   LearningRate 0.0006   Epoch: 12   Global Step: 22250   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:15:15,340-Speed 13767.49 samples/sec   Loss 2.5686   LearningRate 0.0006   Epoch: 12   Global Step: 22260   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:15:33,072-Speed 13860.40 samples/sec   Loss 2.5648   LearningRate 0.0006   Epoch: 12   Global Step: 22270   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:15:50,830-Speed 13840.71 samples/sec   Loss 2.5804   LearningRate 0.0006   Epoch: 12   Global Step: 22280   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:16:08,635-Speed 13803.56 samples/sec   Loss 2.5522   LearningRate 0.0006   Epoch: 12   Global Step: 22290   Fp16 Grad Scale: 65536   Required: 24 hours
Training: 2022-03-03 19:16:26,354-Speed 13870.12 samples/sec   Loss 2.5470   LearningRate 0.0006   Epoch: 12   Global Step: 22300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:16:44,119-Speed 13835.83 samples/sec   Loss 2.5469   LearningRate 0.0006   Epoch: 12   Global Step: 22310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:17:01,894-Speed 13826.93 samples/sec   Loss 2.5456   LearningRate 0.0006   Epoch: 12   Global Step: 22320   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:17:19,576-Speed 13900.36 samples/sec   Loss 2.6125   LearningRate 0.0006   Epoch: 12   Global Step: 22330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:17:37,301-Speed 13865.60 samples/sec   Loss 2.5772   LearningRate 0.0006   Epoch: 12   Global Step: 22340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:17:55,034-Speed 13859.45 samples/sec   Loss 2.5663   LearningRate 0.0006   Epoch: 12   Global Step: 22350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:18:12,724-Speed 13894.16 samples/sec   Loss 2.5564   LearningRate 0.0006   Epoch: 12   Global Step: 22360   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:18:30,511-Speed 13817.53 samples/sec   Loss 2.5486   LearningRate 0.0006   Epoch: 12   Global Step: 22370   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:18:48,192-Speed 13900.66 samples/sec   Loss 2.5555   LearningRate 0.0006   Epoch: 12   Global Step: 22380   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:19:05,903-Speed 13877.21 samples/sec   Loss 2.5445   LearningRate 0.0006   Epoch: 12   Global Step: 22390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:19:23,609-Speed 13881.10 samples/sec   Loss 2.5759   LearningRate 0.0006   Epoch: 12   Global Step: 22400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:19:41,332-Speed 13867.85 samples/sec   Loss 2.5652   LearningRate 0.0006   Epoch: 12   Global Step: 22410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:19:59,121-Speed 13815.64 samples/sec   Loss 2.5674   LearningRate 0.0006   Epoch: 12   Global Step: 22420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:20:16,909-Speed 13817.48 samples/sec   Loss 2.5510   LearningRate 0.0006   Epoch: 12   Global Step: 22430   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:20:34,761-Speed 13769.07 samples/sec   Loss 2.5684   LearningRate 0.0006   Epoch: 12   Global Step: 22440   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:20:52,448-Speed 13895.52 samples/sec   Loss 2.5781   LearningRate 0.0006   Epoch: 12   Global Step: 22450   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:21:10,215-Speed 13833.28 samples/sec   Loss 2.6140   LearningRate 0.0006   Epoch: 12   Global Step: 22460   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:22:19,288-Speed 3558.04 samples/sec   Loss 2.6045   LearningRate 0.0006   Epoch: 13   Global Step: 22470   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:22:36,968-Speed 13901.88 samples/sec   Loss 2.5316   LearningRate 0.0006   Epoch: 13   Global Step: 22480   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:22:54,734-Speed 13834.16 samples/sec   Loss 2.5144   LearningRate 0.0006   Epoch: 13   Global Step: 22490   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:23:12,475-Speed 13853.69 samples/sec   Loss 2.5129   LearningRate 0.0006   Epoch: 13   Global Step: 22500   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:23:30,233-Speed 13840.47 samples/sec   Loss 2.5207   LearningRate 0.0006   Epoch: 13   Global Step: 22510   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:23:48,057-Speed 13788.98 samples/sec   Loss 2.5124   LearningRate 0.0006   Epoch: 13   Global Step: 22520   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:24:05,897-Speed 13776.58 samples/sec   Loss 2.5157   LearningRate 0.0006   Epoch: 13   Global Step: 22530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:24:23,722-Speed 13788.17 samples/sec   Loss 2.5399   LearningRate 0.0006   Epoch: 13   Global Step: 22540   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:24:41,523-Speed 13806.64 samples/sec   Loss 2.5273   LearningRate 0.0006   Epoch: 13   Global Step: 22550   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:24:59,278-Speed 13843.31 samples/sec   Loss 2.5125   LearningRate 0.0006   Epoch: 13   Global Step: 22560   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:25:17,180-Speed 13728.59 samples/sec   Loss 2.5148   LearningRate 0.0006   Epoch: 13   Global Step: 22570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:25:35,749-Speed 13236.07 samples/sec   Loss 2.5133   LearningRate 0.0006   Epoch: 13   Global Step: 22580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:25:53,567-Speed 13793.43 samples/sec   Loss 2.5188   LearningRate 0.0006   Epoch: 13   Global Step: 22590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:26:11,338-Speed 13830.62 samples/sec   Loss 2.5106   LearningRate 0.0006   Epoch: 13   Global Step: 22600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:26:29,958-Speed 13199.53 samples/sec   Loss 2.5166   LearningRate 0.0006   Epoch: 13   Global Step: 22610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:26:47,691-Speed 13860.01 samples/sec   Loss 2.5211   LearningRate 0.0006   Epoch: 13   Global Step: 22620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:27:05,407-Speed 13872.55 samples/sec   Loss 2.5518   LearningRate 0.0006   Epoch: 13   Global Step: 22630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:27:23,216-Speed 13800.47 samples/sec   Loss 2.5236   LearningRate 0.0006   Epoch: 13   Global Step: 22640   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:27:40,965-Speed 13848.58 samples/sec   Loss 2.5261   LearningRate 0.0006   Epoch: 13   Global Step: 22650   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:27:58,699-Speed 13859.04 samples/sec   Loss 2.5028   LearningRate 0.0006   Epoch: 13   Global Step: 22660   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:28:16,572-Speed 13751.31 samples/sec   Loss 2.5268   LearningRate 0.0006   Epoch: 13   Global Step: 22670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:28:34,319-Speed 13848.25 samples/sec   Loss 2.5197   LearningRate 0.0006   Epoch: 13   Global Step: 22680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:28:52,078-Speed 13840.52 samples/sec   Loss 2.5138   LearningRate 0.0006   Epoch: 13   Global Step: 22690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:29:09,802-Speed 13866.79 samples/sec   Loss 2.5178   LearningRate 0.0006   Epoch: 13   Global Step: 22700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:29:27,538-Speed 13857.24 samples/sec   Loss 2.5132   LearningRate 0.0006   Epoch: 13   Global Step: 22710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:29:45,298-Speed 13839.23 samples/sec   Loss 2.5087   LearningRate 0.0006   Epoch: 13   Global Step: 22720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:30:03,065-Speed 13833.28 samples/sec   Loss 2.5278   LearningRate 0.0006   Epoch: 13   Global Step: 22730   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:30:20,753-Speed 13895.53 samples/sec   Loss 2.5171   LearningRate 0.0006   Epoch: 13   Global Step: 22740   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:30:38,551-Speed 13808.92 samples/sec   Loss 2.5380   LearningRate 0.0006   Epoch: 13   Global Step: 22750   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:30:56,316-Speed 13834.89 samples/sec   Loss 2.4935   LearningRate 0.0006   Epoch: 13   Global Step: 22760   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:31:14,100-Speed 13819.98 samples/sec   Loss 2.5236   LearningRate 0.0006   Epoch: 13   Global Step: 22770   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:31:31,847-Speed 13848.92 samples/sec   Loss 2.5172   LearningRate 0.0006   Epoch: 13   Global Step: 22780   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:31:49,581-Speed 13859.54 samples/sec   Loss 2.5237   LearningRate 0.0006   Epoch: 13   Global Step: 22790   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:32:07,361-Speed 13823.04 samples/sec   Loss 2.5129   LearningRate 0.0006   Epoch: 13   Global Step: 22800   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:32:25,122-Speed 13838.02 samples/sec   Loss 2.5340   LearningRate 0.0006   Epoch: 13   Global Step: 22810   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:32:42,801-Speed 13901.95 samples/sec   Loss 2.5524   LearningRate 0.0006   Epoch: 13   Global Step: 22820   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:33:00,526-Speed 13866.52 samples/sec   Loss 2.5636   LearningRate 0.0006   Epoch: 13   Global Step: 22830   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:33:18,286-Speed 13838.31 samples/sec   Loss 2.5204   LearningRate 0.0006   Epoch: 13   Global Step: 22840   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:33:36,009-Speed 13867.91 samples/sec   Loss 2.5263   LearningRate 0.0006   Epoch: 13   Global Step: 22850   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:33:53,828-Speed 13792.59 samples/sec   Loss 2.5292   LearningRate 0.0006   Epoch: 13   Global Step: 22860   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:34:11,619-Speed 13814.65 samples/sec   Loss 2.5126   LearningRate 0.0006   Epoch: 13   Global Step: 22870   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:34:29,428-Speed 13800.51 samples/sec   Loss 2.5009   LearningRate 0.0006   Epoch: 13   Global Step: 22880   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:34:47,150-Speed 13868.18 samples/sec   Loss 2.5026   LearningRate 0.0006   Epoch: 13   Global Step: 22890   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:35:04,878-Speed 13864.35 samples/sec   Loss 2.4947   LearningRate 0.0006   Epoch: 13   Global Step: 22900   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:35:22,684-Speed 13803.04 samples/sec   Loss 2.5201   LearningRate 0.0006   Epoch: 13   Global Step: 22910   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:35:40,463-Speed 13823.97 samples/sec   Loss 2.5217   LearningRate 0.0006   Epoch: 13   Global Step: 22920   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:35:58,237-Speed 13827.59 samples/sec   Loss 2.4995   LearningRate 0.0006   Epoch: 13   Global Step: 22930   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:36:16,055-Speed 13793.98 samples/sec   Loss 2.4964   LearningRate 0.0006   Epoch: 13   Global Step: 22940   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:36:33,835-Speed 13822.98 samples/sec   Loss 2.5200   LearningRate 0.0006   Epoch: 13   Global Step: 22950   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:36:51,684-Speed 13769.51 samples/sec   Loss 2.4961   LearningRate 0.0006   Epoch: 13   Global Step: 22960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:37:10,021-Speed 13403.68 samples/sec   Loss 2.5032   LearningRate 0.0006   Epoch: 13   Global Step: 22970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:37:28,315-Speed 13434.38 samples/sec   Loss 2.5000   LearningRate 0.0006   Epoch: 13   Global Step: 22980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:37:46,611-Speed 13433.19 samples/sec   Loss 2.5093   LearningRate 0.0005   Epoch: 13   Global Step: 22990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:38:04,417-Speed 13803.17 samples/sec   Loss 2.4973   LearningRate 0.0005   Epoch: 13   Global Step: 23000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:38:22,171-Speed 13843.37 samples/sec   Loss 2.4785   LearningRate 0.0005   Epoch: 13   Global Step: 23010   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:38:40,626-Speed 13317.52 samples/sec   Loss 2.4824   LearningRate 0.0005   Epoch: 13   Global Step: 23020   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:38:58,304-Speed 13902.92 samples/sec   Loss 2.4892   LearningRate 0.0005   Epoch: 13   Global Step: 23030   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:39:16,213-Speed 13723.89 samples/sec   Loss 2.5176   LearningRate 0.0005   Epoch: 13   Global Step: 23040   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:39:34,740-Speed 13265.66 samples/sec   Loss 2.5042   LearningRate 0.0005   Epoch: 13   Global Step: 23050   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:39:53,169-Speed 13336.28 samples/sec   Loss 2.4820   LearningRate 0.0005   Epoch: 13   Global Step: 23060   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:40:10,935-Speed 13834.00 samples/sec   Loss 2.5132   LearningRate 0.0005   Epoch: 13   Global Step: 23070   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:40:28,692-Speed 13841.62 samples/sec   Loss 2.4798   LearningRate 0.0005   Epoch: 13   Global Step: 23080   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:40:46,428-Speed 13857.35 samples/sec   Loss 2.4762   LearningRate 0.0005   Epoch: 13   Global Step: 23090   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:41:05,053-Speed 13195.82 samples/sec   Loss 2.4896   LearningRate 0.0005   Epoch: 13   Global Step: 23100   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:41:22,777-Speed 13866.66 samples/sec   Loss 2.4720   LearningRate 0.0005   Epoch: 13   Global Step: 23110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:41:41,434-Speed 13173.26 samples/sec   Loss 2.4718   LearningRate 0.0005   Epoch: 13   Global Step: 23120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:41:59,146-Speed 13876.16 samples/sec   Loss 2.5072   LearningRate 0.0005   Epoch: 13   Global Step: 23130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:42:17,038-Speed 13736.49 samples/sec   Loss 2.5062   LearningRate 0.0005   Epoch: 13   Global Step: 23140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:42:35,668-Speed 13192.79 samples/sec   Loss 2.4836   LearningRate 0.0005   Epoch: 13   Global Step: 23150   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:42:53,413-Speed 13850.16 samples/sec   Loss 2.5049   LearningRate 0.0005   Epoch: 13   Global Step: 23160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:43:11,863-Speed 13321.73 samples/sec   Loss 2.5114   LearningRate 0.0005   Epoch: 13   Global Step: 23170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:43:29,927-Speed 13605.57 samples/sec   Loss 2.5079   LearningRate 0.0005   Epoch: 13   Global Step: 23180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:43:48,303-Speed 13375.15 samples/sec   Loss 2.4810   LearningRate 0.0005   Epoch: 13   Global Step: 23190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:44:06,172-Speed 13754.46 samples/sec   Loss 2.4919   LearningRate 0.0005   Epoch: 13   Global Step: 23200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:44:24,783-Speed 13205.41 samples/sec   Loss 2.4931   LearningRate 0.0005   Epoch: 13   Global Step: 23210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:44:42,681-Speed 13731.90 samples/sec   Loss 2.4700   LearningRate 0.0005   Epoch: 13   Global Step: 23220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:45:00,470-Speed 13816.44 samples/sec   Loss 2.4858   LearningRate 0.0005   Epoch: 13   Global Step: 23230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:45:18,534-Speed 13606.05 samples/sec   Loss 2.4870   LearningRate 0.0005   Epoch: 13   Global Step: 23240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:45:36,868-Speed 13405.58 samples/sec   Loss 2.4931   LearningRate 0.0005   Epoch: 13   Global Step: 23250   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:45:54,579-Speed 13876.88 samples/sec   Loss 2.4728   LearningRate 0.0005   Epoch: 13   Global Step: 23260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:46:13,186-Speed 13208.91 samples/sec   Loss 2.4921   LearningRate 0.0005   Epoch: 13   Global Step: 23270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:46:30,923-Speed 13856.67 samples/sec   Loss 2.4865   LearningRate 0.0005   Epoch: 13   Global Step: 23280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:46:48,688-Speed 13835.28 samples/sec   Loss 2.4850   LearningRate 0.0005   Epoch: 13   Global Step: 23290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:47:06,464-Speed 13826.06 samples/sec   Loss 2.4518   LearningRate 0.0005   Epoch: 13   Global Step: 23300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:47:24,238-Speed 13827.85 samples/sec   Loss 2.4522   LearningRate 0.0005   Epoch: 13   Global Step: 23310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:47:41,975-Speed 13856.77 samples/sec   Loss 2.4790   LearningRate 0.0005   Epoch: 13   Global Step: 23320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:47:59,750-Speed 13827.52 samples/sec   Loss 2.4816   LearningRate 0.0005   Epoch: 13   Global Step: 23330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:48:17,523-Speed 13827.88 samples/sec   Loss 2.4698   LearningRate 0.0005   Epoch: 13   Global Step: 23340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:48:35,246-Speed 13868.13 samples/sec   Loss 2.4719   LearningRate 0.0005   Epoch: 13   Global Step: 23350   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:48:53,034-Speed 13817.00 samples/sec   Loss 2.4802   LearningRate 0.0005   Epoch: 13   Global Step: 23360   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:49:10,830-Speed 13810.65 samples/sec   Loss 2.4645   LearningRate 0.0005   Epoch: 13   Global Step: 23370   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:49:28,620-Speed 13815.86 samples/sec   Loss 2.4612   LearningRate 0.0005   Epoch: 13   Global Step: 23380   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 19:49:46,609-Speed 13662.84 samples/sec   Loss 2.4616   LearningRate 0.0005   Epoch: 13   Global Step: 23390   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:50:05,194-Speed 13224.52 samples/sec   Loss 2.4622   LearningRate 0.0005   Epoch: 13   Global Step: 23400   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:50:22,877-Speed 13898.61 samples/sec   Loss 2.4810   LearningRate 0.0005   Epoch: 13   Global Step: 23410   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:50:41,029-Speed 13540.27 samples/sec   Loss 2.4536   LearningRate 0.0005   Epoch: 13   Global Step: 23420   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:50:59,201-Speed 13525.05 samples/sec   Loss 2.4492   LearningRate 0.0005   Epoch: 13   Global Step: 23430   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:51:17,798-Speed 13215.94 samples/sec   Loss 2.4812   LearningRate 0.0005   Epoch: 13   Global Step: 23440   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:51:35,587-Speed 13817.05 samples/sec   Loss 2.4649   LearningRate 0.0005   Epoch: 13   Global Step: 23450   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:51:53,408-Speed 13791.11 samples/sec   Loss 2.4652   LearningRate 0.0005   Epoch: 13   Global Step: 23460   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:52:12,024-Speed 13201.95 samples/sec   Loss 2.4517   LearningRate 0.0005   Epoch: 13   Global Step: 23470   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:52:30,006-Speed 13668.63 samples/sec   Loss 2.4647   LearningRate 0.0005   Epoch: 13   Global Step: 23480   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:52:48,624-Speed 13202.22 samples/sec   Loss 2.5379   LearningRate 0.0005   Epoch: 13   Global Step: 23490   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:53:06,975-Speed 13392.74 samples/sec   Loss 2.4993   LearningRate 0.0005   Epoch: 13   Global Step: 23500   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:53:25,057-Speed 13593.46 samples/sec   Loss 2.4635   LearningRate 0.0005   Epoch: 13   Global Step: 23510   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:53:43,426-Speed 13379.58 samples/sec   Loss 2.4513   LearningRate 0.0005   Epoch: 13   Global Step: 23520   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:54:01,228-Speed 13805.97 samples/sec   Loss 2.4368   LearningRate 0.0005   Epoch: 13   Global Step: 23530   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:54:18,973-Speed 13850.35 samples/sec   Loss 2.4524   LearningRate 0.0005   Epoch: 13   Global Step: 23540   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:54:36,747-Speed 13828.35 samples/sec   Loss 2.4618   LearningRate 0.0005   Epoch: 13   Global Step: 23550   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:54:54,517-Speed 13830.54 samples/sec   Loss 2.4454   LearningRate 0.0005   Epoch: 13   Global Step: 23560   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:55:12,320-Speed 13805.73 samples/sec   Loss 2.4449   LearningRate 0.0005   Epoch: 13   Global Step: 23570   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:55:29,982-Speed 13915.66 samples/sec   Loss 2.4401   LearningRate 0.0005   Epoch: 13   Global Step: 23580   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:55:47,699-Speed 13872.92 samples/sec   Loss 2.4551   LearningRate 0.0005   Epoch: 13   Global Step: 23590   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:56:05,388-Speed 13894.87 samples/sec   Loss 2.4660   LearningRate 0.0005   Epoch: 13   Global Step: 23600   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:56:23,120-Speed 13860.22 samples/sec   Loss 2.4220   LearningRate 0.0005   Epoch: 13   Global Step: 23610   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:56:40,894-Speed 13827.99 samples/sec   Loss 2.4404   LearningRate 0.0005   Epoch: 13   Global Step: 23620   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:56:58,624-Speed 13865.23 samples/sec   Loss 2.4472   LearningRate 0.0005   Epoch: 13   Global Step: 23630   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:57:16,354-Speed 13861.95 samples/sec   Loss 2.4720   LearningRate 0.0005   Epoch: 13   Global Step: 23640   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:57:34,032-Speed 13902.57 samples/sec   Loss 2.5009   LearningRate 0.0005   Epoch: 13   Global Step: 23650   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:57:51,760-Speed 13864.07 samples/sec   Loss 2.4859   LearningRate 0.0005   Epoch: 13   Global Step: 23660   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 19:58:09,606-Speed 13771.96 samples/sec   Loss 2.4429   LearningRate 0.0005   Epoch: 13   Global Step: 23670   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:58:27,321-Speed 13874.32 samples/sec   Loss 2.4349   LearningRate 0.0005   Epoch: 13   Global Step: 23680   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:58:45,046-Speed 13865.84 samples/sec   Loss 2.4555   LearningRate 0.0005   Epoch: 13   Global Step: 23690   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:59:02,812-Speed 13834.87 samples/sec   Loss 2.4574   LearningRate 0.0005   Epoch: 13   Global Step: 23700   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:59:20,580-Speed 13832.81 samples/sec   Loss 2.4276   LearningRate 0.0005   Epoch: 13   Global Step: 23710   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:59:38,251-Speed 13908.08 samples/sec   Loss 2.4139   LearningRate 0.0005   Epoch: 13   Global Step: 23720   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 19:59:56,024-Speed 13829.05 samples/sec   Loss 2.4426   LearningRate 0.0005   Epoch: 13   Global Step: 23730   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:00:13,856-Speed 13782.89 samples/sec   Loss 2.4283   LearningRate 0.0005   Epoch: 13   Global Step: 23740   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:00:31,658-Speed 13806.39 samples/sec   Loss 2.4294   LearningRate 0.0005   Epoch: 13   Global Step: 23750   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:00:49,386-Speed 13863.82 samples/sec   Loss 2.4369   LearningRate 0.0005   Epoch: 13   Global Step: 23760   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:01:07,162-Speed 13825.97 samples/sec   Loss 2.4327   LearningRate 0.0005   Epoch: 13   Global Step: 23770   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:01:24,989-Speed 13786.50 samples/sec   Loss 2.4477   LearningRate 0.0005   Epoch: 13   Global Step: 23780   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:01:42,729-Speed 13854.32 samples/sec   Loss 2.4559   LearningRate 0.0005   Epoch: 13   Global Step: 23790   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:02:00,690-Speed 13684.09 samples/sec   Loss 2.4451   LearningRate 0.0005   Epoch: 13   Global Step: 23800   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:02:18,636-Speed 13695.65 samples/sec   Loss 2.4284   LearningRate 0.0005   Epoch: 13   Global Step: 23810   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:02:36,461-Speed 13787.69 samples/sec   Loss 2.4447   LearningRate 0.0005   Epoch: 13   Global Step: 23820   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:02:54,326-Speed 13757.91 samples/sec   Loss 2.4397   LearningRate 0.0005   Epoch: 13   Global Step: 23830   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:03:12,100-Speed 13827.36 samples/sec   Loss 2.4429   LearningRate 0.0005   Epoch: 13   Global Step: 23840   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:03:29,885-Speed 13819.30 samples/sec   Loss 2.4357   LearningRate 0.0005   Epoch: 13   Global Step: 23850   Fp16 Grad Scale: 16384   Required: 23 hours
Training: 2022-03-03 20:03:47,650-Speed 13834.73 samples/sec   Loss 2.4352   LearningRate 0.0005   Epoch: 13   Global Step: 23860   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:04:05,499-Speed 13770.05 samples/sec   Loss 2.4270   LearningRate 0.0005   Epoch: 13   Global Step: 23870   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:04:23,285-Speed 13818.68 samples/sec   Loss 2.4310   LearningRate 0.0005   Epoch: 13   Global Step: 23880   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:04:41,213-Speed 13709.04 samples/sec   Loss 2.4262   LearningRate 0.0005   Epoch: 13   Global Step: 23890   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:04:59,323-Speed 13571.28 samples/sec   Loss 2.4209   LearningRate 0.0005   Epoch: 13   Global Step: 23900   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:05:17,422-Speed 13580.97 samples/sec   Loss 2.4411   LearningRate 0.0005   Epoch: 13   Global Step: 23910   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:05:35,162-Speed 13854.18 samples/sec   Loss 2.4505   LearningRate 0.0005   Epoch: 13   Global Step: 23920   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:05:52,861-Speed 13886.40 samples/sec   Loss 2.4550   LearningRate 0.0005   Epoch: 13   Global Step: 23930   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:06:10,581-Speed 13869.73 samples/sec   Loss 2.4443   LearningRate 0.0005   Epoch: 13   Global Step: 23940   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:06:28,310-Speed 13863.38 samples/sec   Loss 2.4231   LearningRate 0.0005   Epoch: 13   Global Step: 23950   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:06:46,027-Speed 13872.27 samples/sec   Loss 2.4425   LearningRate 0.0005   Epoch: 13   Global Step: 23960   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:07:03,762-Speed 13858.36 samples/sec   Loss 2.4301   LearningRate 0.0005   Epoch: 13   Global Step: 23970   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:07:21,496-Speed 13858.95 samples/sec   Loss 2.4235   LearningRate 0.0005   Epoch: 13   Global Step: 23980   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:07:39,431-Speed 13703.21 samples/sec   Loss 2.4381   LearningRate 0.0005   Epoch: 13   Global Step: 23990   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:07:57,342-Speed 13723.24 samples/sec   Loss 2.4081   LearningRate 0.0005   Epoch: 13   Global Step: 24000   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:08:15,273-Speed 13707.35 samples/sec   Loss 2.4364   LearningRate 0.0005   Epoch: 13   Global Step: 24010   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:08:33,170-Speed 13732.52 samples/sec   Loss 2.4567   LearningRate 0.0005   Epoch: 13   Global Step: 24020   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:08:50,925-Speed 13843.05 samples/sec   Loss 2.4263   LearningRate 0.0005   Epoch: 13   Global Step: 24030   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:09:08,621-Speed 13888.66 samples/sec   Loss 2.4149   LearningRate 0.0005   Epoch: 13   Global Step: 24040   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:09:26,369-Speed 13848.22 samples/sec   Loss 2.4307   LearningRate 0.0005   Epoch: 13   Global Step: 24050   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:09:44,018-Speed 13926.74 samples/sec   Loss 2.4382   LearningRate 0.0005   Epoch: 13   Global Step: 24060   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 20:10:01,788-Speed 13831.02 samples/sec   Loss 2.4296   LearningRate 0.0005   Epoch: 13   Global Step: 24070   Fp16 Grad Scale: 131072   Required: 23 hours
Training: 2022-03-03 20:10:19,481-Speed 13890.92 samples/sec   Loss 2.4162   LearningRate 0.0005   Epoch: 13   Global Step: 24080   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:10:37,224-Speed 13852.56 samples/sec   Loss 2.4076   LearningRate 0.0005   Epoch: 13   Global Step: 24090   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:10:55,129-Speed 13732.54 samples/sec   Loss 2.4296   LearningRate 0.0005   Epoch: 13   Global Step: 24100   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:11:12,849-Speed 13869.52 samples/sec   Loss 2.4365   LearningRate 0.0005   Epoch: 13   Global Step: 24110   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:11:30,516-Speed 13912.21 samples/sec   Loss 2.4245   LearningRate 0.0005   Epoch: 13   Global Step: 24120   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:11:48,267-Speed 13846.21 samples/sec   Loss 2.4339   LearningRate 0.0005   Epoch: 13   Global Step: 24130   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:12:06,034-Speed 13833.21 samples/sec   Loss 2.4293   LearningRate 0.0005   Epoch: 13   Global Step: 24140   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:12:23,683-Speed 13925.47 samples/sec   Loss 2.4280   LearningRate 0.0005   Epoch: 13   Global Step: 24150   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:12:41,336-Speed 13922.83 samples/sec   Loss 2.4418   LearningRate 0.0005   Epoch: 13   Global Step: 24160   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:12:59,061-Speed 13866.45 samples/sec   Loss 2.4211   LearningRate 0.0005   Epoch: 13   Global Step: 24170   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:13:16,792-Speed 13861.37 samples/sec   Loss 2.4336   LearningRate 0.0005   Epoch: 13   Global Step: 24180   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:13:34,451-Speed 13917.97 samples/sec   Loss 2.4433   LearningRate 0.0005   Epoch: 13   Global Step: 24190   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:14:41,689-Speed 3655.10 samples/sec   Loss 2.4223   LearningRate 0.0005   Epoch: 14   Global Step: 24200   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:14:59,317-Speed 13942.34 samples/sec   Loss 2.4217   LearningRate 0.0005   Epoch: 14   Global Step: 24210   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:15:16,969-Speed 13923.39 samples/sec   Loss 2.3981   LearningRate 0.0005   Epoch: 14   Global Step: 24220   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:15:34,618-Speed 13925.64 samples/sec   Loss 2.3797   LearningRate 0.0005   Epoch: 14   Global Step: 24230   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:15:52,339-Speed 13869.76 samples/sec   Loss 2.3759   LearningRate 0.0005   Epoch: 14   Global Step: 24240   Fp16 Grad Scale: 32768   Required: 23 hours
Training: 2022-03-03 20:16:10,147-Speed 13801.14 samples/sec   Loss 2.3915   LearningRate 0.0005   Epoch: 14   Global Step: 24250   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:16:27,830-Speed 13898.82 samples/sec   Loss 2.3860   LearningRate 0.0005   Epoch: 14   Global Step: 24260   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:16:45,499-Speed 13910.31 samples/sec   Loss 2.3788   LearningRate 0.0005   Epoch: 14   Global Step: 24270   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:17:03,189-Speed 13893.18 samples/sec   Loss 2.3903   LearningRate 0.0005   Epoch: 14   Global Step: 24280   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:17:20,834-Speed 13928.89 samples/sec   Loss 2.3700   LearningRate 0.0005   Epoch: 14   Global Step: 24290   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:17:38,497-Speed 13915.27 samples/sec   Loss 2.4038   LearningRate 0.0005   Epoch: 14   Global Step: 24300   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:17:56,214-Speed 13872.16 samples/sec   Loss 2.4018   LearningRate 0.0005   Epoch: 14   Global Step: 24310   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:18:13,869-Speed 13920.54 samples/sec   Loss 2.3734   LearningRate 0.0005   Epoch: 14   Global Step: 24320   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:18:31,690-Speed 13791.70 samples/sec   Loss 2.3785   LearningRate 0.0005   Epoch: 14   Global Step: 24330   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:18:49,372-Speed 13899.89 samples/sec   Loss 2.3905   LearningRate 0.0005   Epoch: 14   Global Step: 24340   Fp16 Grad Scale: 65536   Required: 23 hours
Training: 2022-03-03 20:19:07,163-Speed 13814.22 samples/sec   Loss 2.4163   LearningRate 0.0005   Epoch: 14   Global Step: 24350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:19:24,993-Speed 13784.63 samples/sec   Loss 2.4044   LearningRate 0.0005   Epoch: 14   Global Step: 24360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:19:42,673-Speed 13901.53 samples/sec   Loss 2.4152   LearningRate 0.0005   Epoch: 14   Global Step: 24370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:20:00,309-Speed 13935.73 samples/sec   Loss 2.3948   LearningRate 0.0005   Epoch: 14   Global Step: 24380   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:20:18,026-Speed 13872.29 samples/sec   Loss 2.3809   LearningRate 0.0005   Epoch: 14   Global Step: 24390   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:20:35,734-Speed 13879.05 samples/sec   Loss 2.3778   LearningRate 0.0005   Epoch: 14   Global Step: 24400   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:20:53,494-Speed 13839.27 samples/sec   Loss 2.3854   LearningRate 0.0005   Epoch: 14   Global Step: 24410   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:21:11,243-Speed 13849.48 samples/sec   Loss 2.3823   LearningRate 0.0005   Epoch: 14   Global Step: 24420   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:21:29,003-Speed 13838.96 samples/sec   Loss 2.3838   LearningRate 0.0005   Epoch: 14   Global Step: 24430   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:21:46,915-Speed 13721.22 samples/sec   Loss 2.3882   LearningRate 0.0005   Epoch: 14   Global Step: 24440   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:22:04,800-Speed 13742.15 samples/sec   Loss 2.4157   LearningRate 0.0005   Epoch: 14   Global Step: 24450   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:22:22,591-Speed 13815.11 samples/sec   Loss 2.4214   LearningRate 0.0005   Epoch: 14   Global Step: 24460   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:22:40,232-Speed 13931.85 samples/sec   Loss 2.3881   LearningRate 0.0005   Epoch: 14   Global Step: 24470   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:22:57,925-Speed 13891.06 samples/sec   Loss 2.3909   LearningRate 0.0005   Epoch: 14   Global Step: 24480   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:23:15,670-Speed 13850.34 samples/sec   Loss 2.3705   LearningRate 0.0005   Epoch: 14   Global Step: 24490   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:23:33,369-Speed 13886.95 samples/sec   Loss 2.3747   LearningRate 0.0005   Epoch: 14   Global Step: 24500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:23:51,087-Speed 13871.47 samples/sec   Loss 2.3706   LearningRate 0.0005   Epoch: 14   Global Step: 24510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:24:08,766-Speed 13902.50 samples/sec   Loss 2.3949   LearningRate 0.0005   Epoch: 14   Global Step: 24520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:24:26,486-Speed 13869.98 samples/sec   Loss 2.3848   LearningRate 0.0005   Epoch: 14   Global Step: 24530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:24:44,158-Speed 13907.89 samples/sec   Loss 2.3737   LearningRate 0.0005   Epoch: 14   Global Step: 24540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:25:01,861-Speed 13883.61 samples/sec   Loss 2.3980   LearningRate 0.0005   Epoch: 14   Global Step: 24550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:25:19,617-Speed 13842.00 samples/sec   Loss 2.3808   LearningRate 0.0005   Epoch: 14   Global Step: 24560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:25:37,393-Speed 13826.81 samples/sec   Loss 2.3698   LearningRate 0.0005   Epoch: 14   Global Step: 24570   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:25:55,099-Speed 13880.68 samples/sec   Loss 2.3952   LearningRate 0.0005   Epoch: 14   Global Step: 24580   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:26:12,912-Speed 13798.12 samples/sec   Loss 2.3936   LearningRate 0.0005   Epoch: 14   Global Step: 24590   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:26:30,716-Speed 13803.92 samples/sec   Loss 2.3914   LearningRate 0.0005   Epoch: 14   Global Step: 24600   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:26:48,447-Speed 13861.37 samples/sec   Loss 2.3820   LearningRate 0.0005   Epoch: 14   Global Step: 24610   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:27:06,253-Speed 13803.10 samples/sec   Loss 2.3618   LearningRate 0.0005   Epoch: 14   Global Step: 24620   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:27:24,050-Speed 13810.07 samples/sec   Loss 2.3736   LearningRate 0.0005   Epoch: 14   Global Step: 24630   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:27:41,952-Speed 13729.18 samples/sec   Loss 2.3895   LearningRate 0.0005   Epoch: 14   Global Step: 24640   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:27:59,733-Speed 13822.27 samples/sec   Loss 2.3841   LearningRate 0.0005   Epoch: 14   Global Step: 24650   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:28:17,558-Speed 13787.56 samples/sec   Loss 2.3896   LearningRate 0.0005   Epoch: 14   Global Step: 24660   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:28:35,415-Speed 13764.42 samples/sec   Loss 2.3671   LearningRate 0.0005   Epoch: 14   Global Step: 24670   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:28:53,214-Speed 13807.82 samples/sec   Loss 2.3994   LearningRate 0.0005   Epoch: 14   Global Step: 24680   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:29:11,166-Speed 13690.97 samples/sec   Loss 2.3875   LearningRate 0.0005   Epoch: 14   Global Step: 24690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:29:29,321-Speed 13537.38 samples/sec   Loss 2.4167   LearningRate 0.0005   Epoch: 14   Global Step: 24700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:29:47,153-Speed 13782.94 samples/sec   Loss 2.3693   LearningRate 0.0005   Epoch: 14   Global Step: 24710   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:30:04,875-Speed 13868.47 samples/sec   Loss 2.3583   LearningRate 0.0005   Epoch: 14   Global Step: 24720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:30:22,588-Speed 13875.68 samples/sec   Loss 2.3705   LearningRate 0.0005   Epoch: 14   Global Step: 24730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:30:40,420-Speed 13782.42 samples/sec   Loss 2.3639   LearningRate 0.0005   Epoch: 14   Global Step: 24740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:30:58,197-Speed 13826.15 samples/sec   Loss 2.3834   LearningRate 0.0005   Epoch: 14   Global Step: 24750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:31:15,889-Speed 13891.65 samples/sec   Loss 2.3699   LearningRate 0.0005   Epoch: 14   Global Step: 24760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:31:33,617-Speed 13863.47 samples/sec   Loss 2.3485   LearningRate 0.0005   Epoch: 14   Global Step: 24770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:31:51,366-Speed 13847.25 samples/sec   Loss 2.3570   LearningRate 0.0005   Epoch: 14   Global Step: 24780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:32:09,106-Speed 13854.53 samples/sec   Loss 2.3749   LearningRate 0.0005   Epoch: 14   Global Step: 24790   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:32:26,797-Speed 13892.95 samples/sec   Loss 2.3734   LearningRate 0.0005   Epoch: 14   Global Step: 24800   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:32:44,531-Speed 13858.92 samples/sec   Loss 2.3585   LearningRate 0.0005   Epoch: 14   Global Step: 24810   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:33:02,338-Speed 13801.76 samples/sec   Loss 2.3449   LearningRate 0.0005   Epoch: 14   Global Step: 24820   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:33:20,074-Speed 13857.61 samples/sec   Loss 2.4066   LearningRate 0.0005   Epoch: 14   Global Step: 24830   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:33:37,858-Speed 13820.75 samples/sec   Loss 2.3839   LearningRate 0.0005   Epoch: 14   Global Step: 24840   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:33:55,609-Speed 13845.52 samples/sec   Loss 2.3610   LearningRate 0.0005   Epoch: 14   Global Step: 24850   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:34:13,475-Speed 13756.47 samples/sec   Loss 2.3649   LearningRate 0.0005   Epoch: 14   Global Step: 24860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:34:31,192-Speed 13872.78 samples/sec   Loss 2.3656   LearningRate 0.0005   Epoch: 14   Global Step: 24870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:34:48,924-Speed 13860.49 samples/sec   Loss 2.3581   LearningRate 0.0005   Epoch: 14   Global Step: 24880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:35:06,785-Speed 13760.61 samples/sec   Loss 2.3602   LearningRate 0.0005   Epoch: 14   Global Step: 24890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:35:24,658-Speed 13751.15 samples/sec   Loss 2.3660   LearningRate 0.0005   Epoch: 14   Global Step: 24900   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:35:42,505-Speed 13771.20 samples/sec   Loss 2.3379   LearningRate 0.0005   Epoch: 14   Global Step: 24910   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:36:00,345-Speed 13776.43 samples/sec   Loss 2.3521   LearningRate 0.0005   Epoch: 14   Global Step: 24920   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:36:18,231-Speed 13741.37 samples/sec   Loss 2.3773   LearningRate 0.0005   Epoch: 14   Global Step: 24930   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:36:35,960-Speed 13862.80 samples/sec   Loss 2.3468   LearningRate 0.0005   Epoch: 14   Global Step: 24940   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:36:53,708-Speed 13848.40 samples/sec   Loss 2.3649   LearningRate 0.0005   Epoch: 14   Global Step: 24950   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:37:11,518-Speed 13799.95 samples/sec   Loss 2.3478   LearningRate 0.0005   Epoch: 14   Global Step: 24960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:37:29,198-Speed 13900.94 samples/sec   Loss 2.3582   LearningRate 0.0005   Epoch: 14   Global Step: 24970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:37:46,917-Speed 13870.79 samples/sec   Loss 2.3573   LearningRate 0.0005   Epoch: 14   Global Step: 24980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:38:04,738-Speed 13791.56 samples/sec   Loss 2.3425   LearningRate 0.0005   Epoch: 14   Global Step: 24990   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:38:22,594-Speed 13764.45 samples/sec   Loss 2.3465   LearningRate 0.0005   Epoch: 14   Global Step: 25000   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:38:40,406-Speed 13797.83 samples/sec   Loss 2.3885   LearningRate 0.0005   Epoch: 14   Global Step: 25010   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:38:58,272-Speed 13756.82 samples/sec   Loss 2.3834   LearningRate 0.0005   Epoch: 14   Global Step: 25020   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:39:16,142-Speed 13753.57 samples/sec   Loss 2.3605   LearningRate 0.0005   Epoch: 14   Global Step: 25030   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:39:33,906-Speed 13835.67 samples/sec   Loss 2.3626   LearningRate 0.0005   Epoch: 14   Global Step: 25040   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:39:51,750-Speed 13774.05 samples/sec   Loss 2.3340   LearningRate 0.0005   Epoch: 14   Global Step: 25050   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:40:09,509-Speed 13839.45 samples/sec   Loss 2.3253   LearningRate 0.0005   Epoch: 14   Global Step: 25060   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:40:27,319-Speed 13799.69 samples/sec   Loss 2.3528   LearningRate 0.0005   Epoch: 14   Global Step: 25070   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:40:45,066-Speed 13848.56 samples/sec   Loss 2.3433   LearningRate 0.0005   Epoch: 14   Global Step: 25080   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:41:02,818-Speed 13845.39 samples/sec   Loss 2.3309   LearningRate 0.0005   Epoch: 14   Global Step: 25090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:41:20,522-Speed 13882.50 samples/sec   Loss 2.3348   LearningRate 0.0005   Epoch: 14   Global Step: 25100   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:41:38,270-Speed 13847.60 samples/sec   Loss 2.3475   LearningRate 0.0005   Epoch: 14   Global Step: 25110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:41:56,102-Speed 13782.79 samples/sec   Loss 2.3586   LearningRate 0.0005   Epoch: 14   Global Step: 25120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:42:13,978-Speed 13749.66 samples/sec   Loss 2.3429   LearningRate 0.0005   Epoch: 14   Global Step: 25130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:42:31,757-Speed 13823.54 samples/sec   Loss 2.3396   LearningRate 0.0005   Epoch: 14   Global Step: 25140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:42:49,632-Speed 13749.43 samples/sec   Loss 2.3360   LearningRate 0.0005   Epoch: 14   Global Step: 25150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:43:07,475-Speed 13774.91 samples/sec   Loss 2.3462   LearningRate 0.0005   Epoch: 14   Global Step: 25160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:43:25,229-Speed 13843.28 samples/sec   Loss 2.3353   LearningRate 0.0005   Epoch: 14   Global Step: 25170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:43:43,007-Speed 13825.13 samples/sec   Loss 2.3391   LearningRate 0.0005   Epoch: 14   Global Step: 25180   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:44:00,841-Speed 13783.55 samples/sec   Loss 2.3231   LearningRate 0.0005   Epoch: 14   Global Step: 25190   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:44:18,733-Speed 13736.47 samples/sec   Loss 2.3233   LearningRate 0.0005   Epoch: 14   Global Step: 25200   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:44:36,617-Speed 13742.70 samples/sec   Loss 2.3431   LearningRate 0.0005   Epoch: 14   Global Step: 25210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:44:54,461-Speed 13773.74 samples/sec   Loss 2.3402   LearningRate 0.0005   Epoch: 14   Global Step: 25220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:45:12,353-Speed 13736.48 samples/sec   Loss 2.3339   LearningRate 0.0005   Epoch: 14   Global Step: 25230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:45:30,197-Speed 13773.43 samples/sec   Loss 2.3246   LearningRate 0.0005   Epoch: 14   Global Step: 25240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:45:48,002-Speed 13803.50 samples/sec   Loss 2.3318   LearningRate 0.0005   Epoch: 14   Global Step: 25250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:46:05,924-Speed 13714.08 samples/sec   Loss 2.3276   LearningRate 0.0005   Epoch: 14   Global Step: 25260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:46:23,747-Speed 13789.73 samples/sec   Loss 2.3361   LearningRate 0.0005   Epoch: 14   Global Step: 25270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:46:41,544-Speed 13809.93 samples/sec   Loss 2.3402   LearningRate 0.0005   Epoch: 14   Global Step: 25280   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:46:59,437-Speed 13735.97 samples/sec   Loss 2.3404   LearningRate 0.0005   Epoch: 14   Global Step: 25290   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:47:17,268-Speed 13783.68 samples/sec   Loss 2.3404   LearningRate 0.0005   Epoch: 14   Global Step: 25300   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:47:35,133-Speed 13757.96 samples/sec   Loss 2.3192   LearningRate 0.0005   Epoch: 14   Global Step: 25310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:47:52,949-Speed 13795.05 samples/sec   Loss 2.3248   LearningRate 0.0005   Epoch: 14   Global Step: 25320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:48:10,924-Speed 13673.07 samples/sec   Loss 2.3343   LearningRate 0.0005   Epoch: 14   Global Step: 25330   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:48:28,737-Speed 13797.24 samples/sec   Loss 2.3226   LearningRate 0.0005   Epoch: 14   Global Step: 25340   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:48:46,543-Speed 13803.32 samples/sec   Loss 2.3526   LearningRate 0.0005   Epoch: 14   Global Step: 25350   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:49:04,382-Speed 13777.94 samples/sec   Loss 2.3300   LearningRate 0.0005   Epoch: 14   Global Step: 25360   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:49:22,223-Speed 13775.75 samples/sec   Loss 2.3444   LearningRate 0.0005   Epoch: 14   Global Step: 25370   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:49:40,107-Speed 13742.60 samples/sec   Loss 2.3156   LearningRate 0.0005   Epoch: 14   Global Step: 25380   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:49:57,909-Speed 13806.14 samples/sec   Loss 2.3140   LearningRate 0.0005   Epoch: 14   Global Step: 25390   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 20:50:15,655-Speed 13849.32 samples/sec   Loss 2.3322   LearningRate 0.0005   Epoch: 14   Global Step: 25400   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:50:33,514-Speed 13762.54 samples/sec   Loss 2.2914   LearningRate 0.0005   Epoch: 14   Global Step: 25410   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:50:51,413-Speed 13730.92 samples/sec   Loss 2.3252   LearningRate 0.0005   Epoch: 14   Global Step: 25420   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:51:09,170-Speed 13841.68 samples/sec   Loss 2.3060   LearningRate 0.0005   Epoch: 14   Global Step: 25430   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:51:27,016-Speed 13772.13 samples/sec   Loss 2.3267   LearningRate 0.0005   Epoch: 14   Global Step: 25440   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:51:44,783-Speed 13833.37 samples/sec   Loss 2.3266   LearningRate 0.0005   Epoch: 14   Global Step: 25450   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:52:02,658-Speed 13749.17 samples/sec   Loss 2.3253   LearningRate 0.0005   Epoch: 14   Global Step: 25460   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:52:20,426-Speed 13833.18 samples/sec   Loss 2.3143   LearningRate 0.0005   Epoch: 14   Global Step: 25470   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:52:38,222-Speed 13810.46 samples/sec   Loss 2.3133   LearningRate 0.0005   Epoch: 14   Global Step: 25480   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:52:56,081-Speed 13762.20 samples/sec   Loss 2.3138   LearningRate 0.0005   Epoch: 14   Global Step: 25490   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:53:13,874-Speed 13812.99 samples/sec   Loss 2.3064   LearningRate 0.0005   Epoch: 14   Global Step: 25500   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:53:31,650-Speed 13826.04 samples/sec   Loss 2.3310   LearningRate 0.0005   Epoch: 14   Global Step: 25510   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:53:49,413-Speed 13837.27 samples/sec   Loss 2.3232   LearningRate 0.0005   Epoch: 14   Global Step: 25520   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:54:07,304-Speed 13737.22 samples/sec   Loss 2.3149   LearningRate 0.0005   Epoch: 14   Global Step: 25530   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:54:25,151-Speed 13771.38 samples/sec   Loss 2.2989   LearningRate 0.0005   Epoch: 14   Global Step: 25540   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:54:42,962-Speed 13798.57 samples/sec   Loss 2.3161   LearningRate 0.0005   Epoch: 14   Global Step: 25550   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:55:00,734-Speed 13829.75 samples/sec   Loss 2.3078   LearningRate 0.0005   Epoch: 14   Global Step: 25560   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:55:18,538-Speed 13804.09 samples/sec   Loss 2.3226   LearningRate 0.0005   Epoch: 14   Global Step: 25570   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:55:36,297-Speed 13839.36 samples/sec   Loss 2.3119   LearningRate 0.0005   Epoch: 14   Global Step: 25580   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:55:54,026-Speed 13863.28 samples/sec   Loss 2.3164   LearningRate 0.0005   Epoch: 14   Global Step: 25590   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:56:11,853-Speed 13786.71 samples/sec   Loss 2.3144   LearningRate 0.0005   Epoch: 14   Global Step: 25600   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:56:29,665-Speed 13798.15 samples/sec   Loss 2.2879   LearningRate 0.0005   Epoch: 14   Global Step: 25610   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:56:47,411-Speed 13849.83 samples/sec   Loss 2.3267   LearningRate 0.0005   Epoch: 14   Global Step: 25620   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:57:05,199-Speed 13816.96 samples/sec   Loss 2.3140   LearningRate 0.0005   Epoch: 14   Global Step: 25630   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:57:22,932-Speed 13860.18 samples/sec   Loss 2.3114   LearningRate 0.0005   Epoch: 14   Global Step: 25640   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:57:40,679-Speed 13848.53 samples/sec   Loss 2.2961   LearningRate 0.0005   Epoch: 14   Global Step: 25650   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:57:58,518-Speed 13777.35 samples/sec   Loss 2.2979   LearningRate 0.0005   Epoch: 14   Global Step: 25660   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:58:16,206-Speed 13894.75 samples/sec   Loss 2.3021   LearningRate 0.0005   Epoch: 14   Global Step: 25670   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:58:34,000-Speed 13812.19 samples/sec   Loss 2.2887   LearningRate 0.0005   Epoch: 14   Global Step: 25680   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 20:58:51,741-Speed 13854.37 samples/sec   Loss 2.3128   LearningRate 0.0005   Epoch: 14   Global Step: 25690   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:59:09,428-Speed 13895.34 samples/sec   Loss 2.3155   LearningRate 0.0005   Epoch: 14   Global Step: 25700   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:59:27,175-Speed 13849.03 samples/sec   Loss 2.3152   LearningRate 0.0005   Epoch: 14   Global Step: 25710   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 20:59:44,893-Speed 13872.26 samples/sec   Loss 2.3471   LearningRate 0.0005   Epoch: 14   Global Step: 25720   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:00:02,765-Speed 13752.28 samples/sec   Loss 2.3217   LearningRate 0.0005   Epoch: 14   Global Step: 25730   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:00:20,493-Speed 13863.76 samples/sec   Loss 2.3181   LearningRate 0.0005   Epoch: 14   Global Step: 25740   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:00:38,189-Speed 13888.45 samples/sec   Loss 2.3263   LearningRate 0.0005   Epoch: 14   Global Step: 25750   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:00:55,968-Speed 13823.85 samples/sec   Loss 2.3223   LearningRate 0.0005   Epoch: 14   Global Step: 25760   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:01:13,790-Speed 13790.95 samples/sec   Loss 2.3137   LearningRate 0.0005   Epoch: 14   Global Step: 25770   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:01:31,467-Speed 13903.74 samples/sec   Loss 2.3021   LearningRate 0.0005   Epoch: 14   Global Step: 25780   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:01:49,215-Speed 13847.96 samples/sec   Loss 2.2963   LearningRate 0.0005   Epoch: 14   Global Step: 25790   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 21:02:06,921-Speed 13882.30 samples/sec   Loss 2.2967   LearningRate 0.0005   Epoch: 14   Global Step: 25800   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 21:02:24,633-Speed 13875.96 samples/sec   Loss 2.3086   LearningRate 0.0005   Epoch: 14   Global Step: 25810   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 21:02:42,308-Speed 13905.83 samples/sec   Loss 2.3071   LearningRate 0.0005   Epoch: 14   Global Step: 25820   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:02:59,970-Speed 13915.21 samples/sec   Loss 2.3177   LearningRate 0.0005   Epoch: 14   Global Step: 25830   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:03:17,721-Speed 13845.13 samples/sec   Loss 2.2981   LearningRate 0.0005   Epoch: 14   Global Step: 25840   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:03:35,425-Speed 13883.38 samples/sec   Loss 2.2881   LearningRate 0.0005   Epoch: 14   Global Step: 25850   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:03:53,115-Speed 13893.62 samples/sec   Loss 2.3031   LearningRate 0.0005   Epoch: 14   Global Step: 25860   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:04:10,854-Speed 13855.33 samples/sec   Loss 2.3108   LearningRate 0.0005   Epoch: 14   Global Step: 25870   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:04:28,671-Speed 13794.87 samples/sec   Loss 2.2811   LearningRate 0.0005   Epoch: 14   Global Step: 25880   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:04:46,350-Speed 13901.86 samples/sec   Loss 2.3186   LearningRate 0.0005   Epoch: 14   Global Step: 25890   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:05:04,193-Speed 13774.90 samples/sec   Loss 2.3087   LearningRate 0.0005   Epoch: 14   Global Step: 25900   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:05:21,924-Speed 13861.32 samples/sec   Loss 2.3380   LearningRate 0.0005   Epoch: 14   Global Step: 25910   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:05:39,669-Speed 13850.09 samples/sec   Loss 2.3149   LearningRate 0.0005   Epoch: 14   Global Step: 25920   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:06:48,913-Speed 3549.24 samples/sec   Loss 2.2814   LearningRate 0.0005   Epoch: 15   Global Step: 25930   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:07:06,520-Speed 13959.10 samples/sec   Loss 2.2586   LearningRate 0.0005   Epoch: 15   Global Step: 25940   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:07:24,150-Speed 13941.48 samples/sec   Loss 2.2808   LearningRate 0.0005   Epoch: 15   Global Step: 25950   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:07:41,816-Speed 13912.41 samples/sec   Loss 2.2682   LearningRate 0.0005   Epoch: 15   Global Step: 25960   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:07:59,600-Speed 13820.03 samples/sec   Loss 2.2685   LearningRate 0.0005   Epoch: 15   Global Step: 25970   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:08:17,317-Speed 13872.63 samples/sec   Loss 2.2817   LearningRate 0.0005   Epoch: 15   Global Step: 25980   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:08:35,007-Speed 13893.67 samples/sec   Loss 2.2724   LearningRate 0.0005   Epoch: 15   Global Step: 25990   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:08:52,772-Speed 13834.64 samples/sec   Loss 2.2608   LearningRate 0.0005   Epoch: 15   Global Step: 26000   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:09:10,565-Speed 13812.75 samples/sec   Loss 2.2709   LearningRate 0.0005   Epoch: 15   Global Step: 26010   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:09:28,260-Speed 13889.71 samples/sec   Loss 2.2577   LearningRate 0.0005   Epoch: 15   Global Step: 26020   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:09:45,945-Speed 13897.50 samples/sec   Loss 2.2932   LearningRate 0.0005   Epoch: 15   Global Step: 26030   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:10:03,622-Speed 13905.19 samples/sec   Loss 2.2642   LearningRate 0.0005   Epoch: 15   Global Step: 26040   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:10:21,332-Speed 13877.67 samples/sec   Loss 2.2686   LearningRate 0.0005   Epoch: 15   Global Step: 26050   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:10:39,136-Speed 13804.55 samples/sec   Loss 2.2575   LearningRate 0.0005   Epoch: 15   Global Step: 26060   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:10:56,941-Speed 13803.69 samples/sec   Loss 2.2618   LearningRate 0.0005   Epoch: 15   Global Step: 26070   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:11:14,681-Speed 13855.06 samples/sec   Loss 2.2695   LearningRate 0.0005   Epoch: 15   Global Step: 26080   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:11:32,407-Speed 13864.92 samples/sec   Loss 2.2732   LearningRate 0.0005   Epoch: 15   Global Step: 26090   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:11:50,096-Speed 13894.29 samples/sec   Loss 2.2832   LearningRate 0.0005   Epoch: 15   Global Step: 26100   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:12:07,801-Speed 13882.11 samples/sec   Loss 2.2637   LearningRate 0.0005   Epoch: 15   Global Step: 26110   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:12:25,598-Speed 13810.28 samples/sec   Loss 2.2688   LearningRate 0.0005   Epoch: 15   Global Step: 26120   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:12:43,308-Speed 13878.15 samples/sec   Loss 2.2835   LearningRate 0.0005   Epoch: 15   Global Step: 26130   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:13:01,132-Speed 13788.62 samples/sec   Loss 2.2808   LearningRate 0.0005   Epoch: 15   Global Step: 26140   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:13:18,924-Speed 13814.17 samples/sec   Loss 2.2439   LearningRate 0.0005   Epoch: 15   Global Step: 26150   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:13:36,667-Speed 13852.32 samples/sec   Loss 2.2708   LearningRate 0.0005   Epoch: 15   Global Step: 26160   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:13:54,435-Speed 13832.35 samples/sec   Loss 2.2790   LearningRate 0.0005   Epoch: 15   Global Step: 26170   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:14:12,127-Speed 13891.21 samples/sec   Loss 2.2611   LearningRate 0.0005   Epoch: 15   Global Step: 26180   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:14:29,819-Speed 13892.66 samples/sec   Loss 2.2826   LearningRate 0.0005   Epoch: 15   Global Step: 26190   Fp16 Grad Scale: 131072   Required: 22 hours
Training: 2022-03-03 21:14:47,494-Speed 13905.00 samples/sec   Loss 2.2700   LearningRate 0.0005   Epoch: 15   Global Step: 26200   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:15:05,247-Speed 13844.92 samples/sec   Loss 2.2573   LearningRate 0.0005   Epoch: 15   Global Step: 26210   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:15:23,092-Speed 13772.71 samples/sec   Loss 2.2860   LearningRate 0.0005   Epoch: 15   Global Step: 26220   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:15:40,866-Speed 13827.51 samples/sec   Loss 2.2854   LearningRate 0.0005   Epoch: 15   Global Step: 26230   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:15:58,547-Speed 13901.03 samples/sec   Loss 2.2564   LearningRate 0.0005   Epoch: 15   Global Step: 26240   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:16:16,301-Speed 13843.72 samples/sec   Loss 2.2540   LearningRate 0.0005   Epoch: 15   Global Step: 26250   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:16:34,012-Speed 13877.06 samples/sec   Loss 2.2624   LearningRate 0.0005   Epoch: 15   Global Step: 26260   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:16:51,812-Speed 13807.29 samples/sec   Loss 2.2720   LearningRate 0.0005   Epoch: 15   Global Step: 26270   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:17:09,579-Speed 13833.39 samples/sec   Loss 2.2646   LearningRate 0.0005   Epoch: 15   Global Step: 26280   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:17:27,321-Speed 13852.83 samples/sec   Loss 2.2487   LearningRate 0.0005   Epoch: 15   Global Step: 26290   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:17:45,078-Speed 13841.70 samples/sec   Loss 2.2542   LearningRate 0.0005   Epoch: 15   Global Step: 26300   Fp16 Grad Scale: 32768   Required: 22 hours
Training: 2022-03-03 21:18:02,799-Speed 13868.83 samples/sec   Loss 2.2872   LearningRate 0.0005   Epoch: 15   Global Step: 26310   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:18:20,569-Speed 13830.99 samples/sec   Loss 2.2630   LearningRate 0.0005   Epoch: 15   Global Step: 26320   Fp16 Grad Scale: 65536   Required: 22 hours
Training: 2022-03-03 21:18:38,371-Speed 13806.61 samples/sec   Loss 2.2659   LearningRate 0.0005   Epoch: 15   Global Step: 26330   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:18:56,342-Speed 13676.79 samples/sec   Loss 2.2576   LearningRate 0.0005   Epoch: 15   Global Step: 26340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:19:14,363-Speed 13638.19 samples/sec   Loss 2.2755   LearningRate 0.0005   Epoch: 15   Global Step: 26350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:19:32,362-Speed 13656.62 samples/sec   Loss 2.2615   LearningRate 0.0005   Epoch: 15   Global Step: 26360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:19:50,051-Speed 13893.90 samples/sec   Loss 2.2614   LearningRate 0.0005   Epoch: 15   Global Step: 26370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:20:07,852-Speed 13806.89 samples/sec   Loss 2.2738   LearningRate 0.0005   Epoch: 15   Global Step: 26380   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:20:25,594-Speed 13852.66 samples/sec   Loss 2.2594   LearningRate 0.0005   Epoch: 15   Global Step: 26390   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:20:43,290-Speed 13888.49 samples/sec   Loss 2.2330   LearningRate 0.0005   Epoch: 15   Global Step: 26400   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:21:00,984-Speed 13890.63 samples/sec   Loss 2.2782   LearningRate 0.0005   Epoch: 15   Global Step: 26410   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 21:21:18,720-Speed 13857.43 samples/sec   Loss 2.2600   LearningRate 0.0005   Epoch: 15   Global Step: 26420   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 21:21:36,414-Speed 13890.83 samples/sec   Loss 2.2550   LearningRate 0.0005   Epoch: 15   Global Step: 26430   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:21:54,107-Speed 13890.52 samples/sec   Loss 2.2499   LearningRate 0.0005   Epoch: 15   Global Step: 26440   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:22:11,844-Speed 13856.99 samples/sec   Loss 2.2556   LearningRate 0.0005   Epoch: 15   Global Step: 26450   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:22:29,560-Speed 13872.90 samples/sec   Loss 2.2443   LearningRate 0.0005   Epoch: 15   Global Step: 26460   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:22:47,368-Speed 13801.46 samples/sec   Loss 2.2542   LearningRate 0.0005   Epoch: 15   Global Step: 26470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:23:05,200-Speed 13782.84 samples/sec   Loss 2.2732   LearningRate 0.0005   Epoch: 15   Global Step: 26480   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:23:22,957-Speed 13840.88 samples/sec   Loss 2.2480   LearningRate 0.0005   Epoch: 15   Global Step: 26490   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:23:40,652-Speed 13890.20 samples/sec   Loss 2.2529   LearningRate 0.0005   Epoch: 15   Global Step: 26500   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:23:58,351-Speed 13886.06 samples/sec   Loss 2.2377   LearningRate 0.0005   Epoch: 15   Global Step: 26510   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:24:16,071-Speed 13869.48 samples/sec   Loss 2.2417   LearningRate 0.0005   Epoch: 15   Global Step: 26520   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:24:33,861-Speed 13816.14 samples/sec   Loss 2.2598   LearningRate 0.0005   Epoch: 15   Global Step: 26530   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:24:51,626-Speed 13834.31 samples/sec   Loss 2.2586   LearningRate 0.0005   Epoch: 15   Global Step: 26540   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:25:09,339-Speed 13875.76 samples/sec   Loss 2.2510   LearningRate 0.0005   Epoch: 15   Global Step: 26550   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:25:27,083-Speed 13851.22 samples/sec   Loss 2.2541   LearningRate 0.0005   Epoch: 15   Global Step: 26560   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:25:44,855-Speed 13829.62 samples/sec   Loss 2.2194   LearningRate 0.0005   Epoch: 15   Global Step: 26570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:26:02,606-Speed 13846.41 samples/sec   Loss 2.2452   LearningRate 0.0005   Epoch: 15   Global Step: 26580   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:26:20,408-Speed 13805.32 samples/sec   Loss 2.2334   LearningRate 0.0005   Epoch: 15   Global Step: 26590   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:26:38,121-Speed 13875.49 samples/sec   Loss 2.2547   LearningRate 0.0005   Epoch: 15   Global Step: 26600   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:26:55,864-Speed 13852.08 samples/sec   Loss 2.2536   LearningRate 0.0005   Epoch: 15   Global Step: 26610   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:27:13,678-Speed 13796.96 samples/sec   Loss 2.2387   LearningRate 0.0005   Epoch: 15   Global Step: 26620   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:27:31,406-Speed 13863.42 samples/sec   Loss 2.2336   LearningRate 0.0005   Epoch: 15   Global Step: 26630   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:27:49,282-Speed 13749.32 samples/sec   Loss 2.2466   LearningRate 0.0005   Epoch: 15   Global Step: 26640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:28:07,092-Speed 13799.37 samples/sec   Loss 2.2697   LearningRate 0.0005   Epoch: 15   Global Step: 26650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:28:24,890-Speed 13809.53 samples/sec   Loss 2.2486   LearningRate 0.0005   Epoch: 15   Global Step: 26660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:28:42,684-Speed 13812.53 samples/sec   Loss 2.2335   LearningRate 0.0005   Epoch: 15   Global Step: 26670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:29:00,447-Speed 13836.08 samples/sec   Loss 2.2383   LearningRate 0.0005   Epoch: 15   Global Step: 26680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:29:18,202-Speed 13843.91 samples/sec   Loss 2.2530   LearningRate 0.0005   Epoch: 15   Global Step: 26690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:29:35,929-Speed 13864.21 samples/sec   Loss 2.2257   LearningRate 0.0005   Epoch: 15   Global Step: 26700   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:29:53,673-Speed 13851.29 samples/sec   Loss 2.2326   LearningRate 0.0005   Epoch: 15   Global Step: 26710   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:30:11,524-Speed 13768.34 samples/sec   Loss 2.2278   LearningRate 0.0005   Epoch: 15   Global Step: 26720   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:30:29,363-Speed 13776.97 samples/sec   Loss 2.2307   LearningRate 0.0005   Epoch: 15   Global Step: 26730   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:30:47,053-Speed 13893.62 samples/sec   Loss 2.2251   LearningRate 0.0005   Epoch: 15   Global Step: 26740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:31:04,795-Speed 13853.09 samples/sec   Loss 2.2476   LearningRate 0.0005   Epoch: 15   Global Step: 26750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:31:22,512-Speed 13872.38 samples/sec   Loss 2.2663   LearningRate 0.0005   Epoch: 15   Global Step: 26760   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:31:40,287-Speed 13826.91 samples/sec   Loss 2.2332   LearningRate 0.0005   Epoch: 15   Global Step: 26770   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:31:57,971-Speed 13898.35 samples/sec   Loss 2.2301   LearningRate 0.0005   Epoch: 15   Global Step: 26780   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:32:15,734-Speed 13836.13 samples/sec   Loss 2.2337   LearningRate 0.0005   Epoch: 15   Global Step: 26790   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:32:33,487-Speed 13844.70 samples/sec   Loss 2.2548   LearningRate 0.0005   Epoch: 15   Global Step: 26800   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:32:51,227-Speed 13854.63 samples/sec   Loss 2.2525   LearningRate 0.0005   Epoch: 15   Global Step: 26810   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:33:09,048-Speed 13791.44 samples/sec   Loss 2.2277   LearningRate 0.0005   Epoch: 15   Global Step: 26820   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:33:26,791-Speed 13852.15 samples/sec   Loss 2.2164   LearningRate 0.0005   Epoch: 15   Global Step: 26830   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:33:44,604-Speed 13798.81 samples/sec   Loss 2.2144   LearningRate 0.0005   Epoch: 15   Global Step: 26840   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:34:02,371-Speed 13833.49 samples/sec   Loss 2.2457   LearningRate 0.0005   Epoch: 15   Global Step: 26850   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:34:20,106-Speed 13857.60 samples/sec   Loss 2.2269   LearningRate 0.0005   Epoch: 15   Global Step: 26860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:34:37,946-Speed 13777.01 samples/sec   Loss 2.2047   LearningRate 0.0005   Epoch: 15   Global Step: 26870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:34:55,784-Speed 13778.54 samples/sec   Loss 2.2236   LearningRate 0.0005   Epoch: 15   Global Step: 26880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:35:13,514-Speed 13862.23 samples/sec   Loss 2.2455   LearningRate 0.0005   Epoch: 15   Global Step: 26890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:35:31,340-Speed 13787.49 samples/sec   Loss 2.2452   LearningRate 0.0005   Epoch: 15   Global Step: 26900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:35:49,125-Speed 13818.85 samples/sec   Loss 2.2176   LearningRate 0.0005   Epoch: 15   Global Step: 26910   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:36:06,913-Speed 13817.11 samples/sec   Loss 2.2377   LearningRate 0.0005   Epoch: 15   Global Step: 26920   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:36:24,707-Speed 13812.17 samples/sec   Loss 2.2427   LearningRate 0.0005   Epoch: 15   Global Step: 26930   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:36:42,603-Speed 13733.90 samples/sec   Loss 2.2212   LearningRate 0.0005   Epoch: 15   Global Step: 26940   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:37:00,314-Speed 13877.65 samples/sec   Loss 2.2178   LearningRate 0.0005   Epoch: 15   Global Step: 26950   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:37:18,077-Speed 13837.32 samples/sec   Loss 2.2157   LearningRate 0.0005   Epoch: 15   Global Step: 26960   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:37:35,866-Speed 13815.71 samples/sec   Loss 2.2103   LearningRate 0.0005   Epoch: 15   Global Step: 26970   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:37:53,616-Speed 13847.37 samples/sec   Loss 2.2060   LearningRate 0.0005   Epoch: 15   Global Step: 26980   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:38:11,327-Speed 13876.83 samples/sec   Loss 2.2122   LearningRate 0.0005   Epoch: 15   Global Step: 26990   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:38:29,055-Speed 13864.21 samples/sec   Loss 2.2086   LearningRate 0.0005   Epoch: 15   Global Step: 27000   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:38:46,800-Speed 13850.60 samples/sec   Loss 2.2083   LearningRate 0.0005   Epoch: 15   Global Step: 27010   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 21:39:04,576-Speed 13825.69 samples/sec   Loss 2.1937   LearningRate 0.0005   Epoch: 15   Global Step: 27020   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:39:22,359-Speed 13820.92 samples/sec   Loss 2.2043   LearningRate 0.0005   Epoch: 15   Global Step: 27030   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:39:40,011-Speed 13923.45 samples/sec   Loss 2.2229   LearningRate 0.0005   Epoch: 15   Global Step: 27040   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:39:57,802-Speed 13814.87 samples/sec   Loss 2.2096   LearningRate 0.0005   Epoch: 15   Global Step: 27050   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:40:15,553-Speed 13845.78 samples/sec   Loss 2.2217   LearningRate 0.0005   Epoch: 15   Global Step: 27060   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:40:33,275-Speed 13868.42 samples/sec   Loss 2.2208   LearningRate 0.0005   Epoch: 15   Global Step: 27070   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:40:51,052-Speed 13825.64 samples/sec   Loss 2.2057   LearningRate 0.0005   Epoch: 15   Global Step: 27080   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:41:08,803-Speed 13848.84 samples/sec   Loss 2.2295   LearningRate 0.0005   Epoch: 15   Global Step: 27090   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:41:26,566-Speed 13836.27 samples/sec   Loss 2.1998   LearningRate 0.0005   Epoch: 15   Global Step: 27100   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:41:44,390-Speed 13788.79 samples/sec   Loss 2.1963   LearningRate 0.0005   Epoch: 15   Global Step: 27110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:42:02,116-Speed 13865.60 samples/sec   Loss 2.2146   LearningRate 0.0005   Epoch: 15   Global Step: 27120   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:42:19,852-Speed 13857.04 samples/sec   Loss 2.2266   LearningRate 0.0005   Epoch: 15   Global Step: 27130   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:42:37,723-Speed 13752.88 samples/sec   Loss 2.2069   LearningRate 0.0005   Epoch: 15   Global Step: 27140   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:42:55,461-Speed 13855.74 samples/sec   Loss 2.2148   LearningRate 0.0005   Epoch: 15   Global Step: 27150   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:43:13,205-Speed 13852.23 samples/sec   Loss 2.2094   LearningRate 0.0005   Epoch: 15   Global Step: 27160   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:43:30,916-Speed 13877.13 samples/sec   Loss 2.2054   LearningRate 0.0005   Epoch: 15   Global Step: 27170   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:43:48,649-Speed 13859.83 samples/sec   Loss 2.2057   LearningRate 0.0005   Epoch: 15   Global Step: 27180   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:44:06,471-Speed 13790.54 samples/sec   Loss 2.2259   LearningRate 0.0005   Epoch: 15   Global Step: 27190   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:44:24,240-Speed 13831.71 samples/sec   Loss 2.2169   LearningRate 0.0005   Epoch: 15   Global Step: 27200   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:44:42,009-Speed 13831.55 samples/sec   Loss 2.2037   LearningRate 0.0005   Epoch: 15   Global Step: 27210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:44:59,735-Speed 13865.40 samples/sec   Loss 2.2098   LearningRate 0.0005   Epoch: 15   Global Step: 27220   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 21:45:17,524-Speed 13816.17 samples/sec   Loss 2.2193   LearningRate 0.0005   Epoch: 15   Global Step: 27230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:45:35,264-Speed 13854.39 samples/sec   Loss 2.2031   LearningRate 0.0005   Epoch: 15   Global Step: 27240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:45:52,982-Speed 13871.36 samples/sec   Loss 2.1951   LearningRate 0.0005   Epoch: 15   Global Step: 27250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:46:10,736-Speed 13843.54 samples/sec   Loss 2.1931   LearningRate 0.0005   Epoch: 15   Global Step: 27260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:46:28,492-Speed 13841.63 samples/sec   Loss 2.1752   LearningRate 0.0005   Epoch: 15   Global Step: 27270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:46:46,275-Speed 13821.84 samples/sec   Loss 2.1712   LearningRate 0.0005   Epoch: 15   Global Step: 27280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:47:04,076-Speed 13807.04 samples/sec   Loss 2.2022   LearningRate 0.0005   Epoch: 15   Global Step: 27290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:47:21,820-Speed 13851.01 samples/sec   Loss 2.2140   LearningRate 0.0005   Epoch: 15   Global Step: 27300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:47:39,551-Speed 13860.85 samples/sec   Loss 2.2049   LearningRate 0.0005   Epoch: 15   Global Step: 27310   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:47:57,334-Speed 13821.35 samples/sec   Loss 2.1902   LearningRate 0.0005   Epoch: 15   Global Step: 27320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:48:15,334-Speed 13653.70 samples/sec   Loss 2.2091   LearningRate 0.0005   Epoch: 15   Global Step: 27330   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 21:48:33,083-Speed 13847.53 samples/sec   Loss 2.1959   LearningRate 0.0005   Epoch: 15   Global Step: 27340   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:48:50,896-Speed 13797.42 samples/sec   Loss 2.1867   LearningRate 0.0005   Epoch: 15   Global Step: 27350   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:49:08,658-Speed 13837.04 samples/sec   Loss 2.1984   LearningRate 0.0005   Epoch: 15   Global Step: 27360   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:49:26,392-Speed 13859.38 samples/sec   Loss 2.1926   LearningRate 0.0005   Epoch: 15   Global Step: 27370   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:49:44,113-Speed 13869.27 samples/sec   Loss 2.1798   LearningRate 0.0005   Epoch: 15   Global Step: 27380   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:50:01,808-Speed 13889.27 samples/sec   Loss 2.1876   LearningRate 0.0004   Epoch: 15   Global Step: 27390   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:50:19,639-Speed 13783.98 samples/sec   Loss 2.2054   LearningRate 0.0004   Epoch: 15   Global Step: 27400   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:50:37,386-Speed 13848.41 samples/sec   Loss 2.2057   LearningRate 0.0004   Epoch: 15   Global Step: 27410   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:50:55,144-Speed 13840.84 samples/sec   Loss 2.2118   LearningRate 0.0004   Epoch: 15   Global Step: 27420   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:51:12,904-Speed 13838.31 samples/sec   Loss 2.2048   LearningRate 0.0004   Epoch: 15   Global Step: 27430   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:51:30,619-Speed 13874.54 samples/sec   Loss 2.2027   LearningRate 0.0004   Epoch: 15   Global Step: 27440   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:51:48,311-Speed 13892.00 samples/sec   Loss 2.1861   LearningRate 0.0004   Epoch: 15   Global Step: 27450   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:52:06,072-Speed 13838.20 samples/sec   Loss 2.1889   LearningRate 0.0004   Epoch: 15   Global Step: 27460   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:52:23,866-Speed 13812.07 samples/sec   Loss 2.1855   LearningRate 0.0004   Epoch: 15   Global Step: 27470   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:52:41,757-Speed 13737.16 samples/sec   Loss 2.1925   LearningRate 0.0004   Epoch: 15   Global Step: 27480   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:52:59,557-Speed 13808.00 samples/sec   Loss 2.1676   LearningRate 0.0004   Epoch: 15   Global Step: 27490   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:53:17,299-Speed 13852.71 samples/sec   Loss 2.2039   LearningRate 0.0004   Epoch: 15   Global Step: 27500   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:53:35,002-Speed 13883.20 samples/sec   Loss 2.1829   LearningRate 0.0004   Epoch: 15   Global Step: 27510   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:53:52,766-Speed 13835.78 samples/sec   Loss 2.1985   LearningRate 0.0004   Epoch: 15   Global Step: 27520   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:54:10,506-Speed 13854.18 samples/sec   Loss 2.2028   LearningRate 0.0004   Epoch: 15   Global Step: 27530   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:54:28,222-Speed 13873.43 samples/sec   Loss 2.1772   LearningRate 0.0004   Epoch: 15   Global Step: 27540   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:54:45,917-Speed 13889.30 samples/sec   Loss 2.1949   LearningRate 0.0004   Epoch: 15   Global Step: 27550   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:55:03,646-Speed 13862.77 samples/sec   Loss 2.1868   LearningRate 0.0004   Epoch: 15   Global Step: 27560   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:55:21,309-Speed 13914.96 samples/sec   Loss 2.1939   LearningRate 0.0004   Epoch: 15   Global Step: 27570   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:55:39,046-Speed 13856.29 samples/sec   Loss 2.2007   LearningRate 0.0004   Epoch: 15   Global Step: 27580   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 21:55:56,854-Speed 13801.93 samples/sec   Loss 2.2042   LearningRate 0.0004   Epoch: 15   Global Step: 27590   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 21:56:14,512-Speed 13918.29 samples/sec   Loss 2.1997   LearningRate 0.0004   Epoch: 15   Global Step: 27600   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:56:32,242-Speed 13862.25 samples/sec   Loss 2.2056   LearningRate 0.0004   Epoch: 15   Global Step: 27610   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:56:49,916-Speed 13906.40 samples/sec   Loss 2.2008   LearningRate 0.0004   Epoch: 15   Global Step: 27620   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:57:07,626-Speed 13877.93 samples/sec   Loss 2.1994   LearningRate 0.0004   Epoch: 15   Global Step: 27630   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 21:57:25,327-Speed 13885.33 samples/sec   Loss 2.2024   LearningRate 0.0004   Epoch: 15   Global Step: 27640   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:57:43,012-Speed 13896.96 samples/sec   Loss 2.2014   LearningRate 0.0004   Epoch: 15   Global Step: 27650   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:58:52,213-Speed 3551.46 samples/sec   Loss 2.1591   LearningRate 0.0004   Epoch: 16   Global Step: 27660   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:59:09,912-Speed 13886.78 samples/sec   Loss 2.1601   LearningRate 0.0004   Epoch: 16   Global Step: 27670   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:59:27,637-Speed 13865.47 samples/sec   Loss 2.1613   LearningRate 0.0004   Epoch: 16   Global Step: 27680   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 21:59:45,359-Speed 13868.98 samples/sec   Loss 2.1445   LearningRate 0.0004   Epoch: 16   Global Step: 27690   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:00:03,152-Speed 13812.64 samples/sec   Loss 2.1483   LearningRate 0.0004   Epoch: 16   Global Step: 27700   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:00:20,983-Speed 13783.74 samples/sec   Loss 2.1789   LearningRate 0.0004   Epoch: 16   Global Step: 27710   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:00:38,797-Speed 13796.62 samples/sec   Loss 2.1616   LearningRate 0.0004   Epoch: 16   Global Step: 27720   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:00:56,496-Speed 13886.29 samples/sec   Loss 2.1540   LearningRate 0.0004   Epoch: 16   Global Step: 27730   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:01:14,245-Speed 13847.68 samples/sec   Loss 2.1600   LearningRate 0.0004   Epoch: 16   Global Step: 27740   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:01:31,981-Speed 13859.56 samples/sec   Loss 2.1760   LearningRate 0.0004   Epoch: 16   Global Step: 27750   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:01:49,681-Speed 13885.91 samples/sec   Loss 2.1615   LearningRate 0.0004   Epoch: 16   Global Step: 27760   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:02:07,526-Speed 13772.41 samples/sec   Loss 2.1704   LearningRate 0.0004   Epoch: 16   Global Step: 27770   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:02:25,333-Speed 13802.62 samples/sec   Loss 2.1620   LearningRate 0.0004   Epoch: 16   Global Step: 27780   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:02:43,129-Speed 13810.19 samples/sec   Loss 2.1487   LearningRate 0.0004   Epoch: 16   Global Step: 27790   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:03:00,878-Speed 13847.18 samples/sec   Loss 2.1607   LearningRate 0.0004   Epoch: 16   Global Step: 27800   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:03:18,688-Speed 13800.33 samples/sec   Loss 2.1693   LearningRate 0.0004   Epoch: 16   Global Step: 27810   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:03:36,604-Speed 13718.48 samples/sec   Loss 2.1549   LearningRate 0.0004   Epoch: 16   Global Step: 27820   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:03:54,494-Speed 13737.69 samples/sec   Loss 2.1573   LearningRate 0.0004   Epoch: 16   Global Step: 27830   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:04:12,270-Speed 13826.59 samples/sec   Loss 2.1837   LearningRate 0.0004   Epoch: 16   Global Step: 27840   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:04:30,330-Speed 13608.50 samples/sec   Loss 2.1758   LearningRate 0.0004   Epoch: 16   Global Step: 27850   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:04:48,547-Speed 13491.36 samples/sec   Loss 2.1577   LearningRate 0.0004   Epoch: 16   Global Step: 27860   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:05:06,671-Speed 13561.05 samples/sec   Loss 2.1531   LearningRate 0.0004   Epoch: 16   Global Step: 27870   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:05:24,977-Speed 13426.23 samples/sec   Loss 2.1605   LearningRate 0.0004   Epoch: 16   Global Step: 27880   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:05:43,118-Speed 13547.64 samples/sec   Loss 2.1713   LearningRate 0.0004   Epoch: 16   Global Step: 27890   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:06:01,231-Speed 13569.46 samples/sec   Loss 2.1927   LearningRate 0.0004   Epoch: 16   Global Step: 27900   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:06:19,337-Speed 13575.16 samples/sec   Loss 2.1579   LearningRate 0.0004   Epoch: 16   Global Step: 27910   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:06:37,478-Speed 13549.44 samples/sec   Loss 2.1437   LearningRate 0.0004   Epoch: 16   Global Step: 27920   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:06:55,647-Speed 13526.97 samples/sec   Loss 2.1700   LearningRate 0.0004   Epoch: 16   Global Step: 27930   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:07:13,669-Speed 13637.88 samples/sec   Loss 2.2111   LearningRate 0.0004   Epoch: 16   Global Step: 27940   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:07:31,816-Speed 13543.82 samples/sec   Loss 2.1564   LearningRate 0.0004   Epoch: 16   Global Step: 27950   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:07:49,943-Speed 13558.64 samples/sec   Loss 2.1498   LearningRate 0.0004   Epoch: 16   Global Step: 27960   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:08:08,032-Speed 13587.41 samples/sec   Loss 2.1496   LearningRate 0.0004   Epoch: 16   Global Step: 27970   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:08:26,162-Speed 13556.67 samples/sec   Loss 2.1456   LearningRate 0.0004   Epoch: 16   Global Step: 27980   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:08:44,266-Speed 13576.53 samples/sec   Loss 2.1577   LearningRate 0.0004   Epoch: 16   Global Step: 27990   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:09:02,444-Speed 13520.37 samples/sec   Loss 2.1723   LearningRate 0.0004   Epoch: 16   Global Step: 28000   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:09:20,587-Speed 13546.90 samples/sec   Loss 2.1730   LearningRate 0.0004   Epoch: 16   Global Step: 28010   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:09:38,726-Speed 13549.19 samples/sec   Loss 2.1675   LearningRate 0.0004   Epoch: 16   Global Step: 28020   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:09:56,808-Speed 13592.18 samples/sec   Loss 2.1485   LearningRate 0.0004   Epoch: 16   Global Step: 28030   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:10:14,956-Speed 13543.08 samples/sec   Loss 2.1378   LearningRate 0.0004   Epoch: 16   Global Step: 28040   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:10:33,065-Speed 13571.87 samples/sec   Loss 2.1487   LearningRate 0.0004   Epoch: 16   Global Step: 28050   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:10:51,285-Speed 13489.41 samples/sec   Loss 2.1592   LearningRate 0.0004   Epoch: 16   Global Step: 28060   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:11:09,432-Speed 13543.58 samples/sec   Loss 2.1425   LearningRate 0.0004   Epoch: 16   Global Step: 28070   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:11:27,526-Speed 13583.49 samples/sec   Loss 2.1262   LearningRate 0.0004   Epoch: 16   Global Step: 28080   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:11:45,699-Speed 13524.12 samples/sec   Loss 2.1511   LearningRate 0.0004   Epoch: 16   Global Step: 28090   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:12:03,862-Speed 13531.76 samples/sec   Loss 2.1548   LearningRate 0.0004   Epoch: 16   Global Step: 28100   Fp16 Grad Scale: 16384   Required: 21 hours
Training: 2022-03-03 22:12:21,952-Speed 13585.92 samples/sec   Loss 2.1307   LearningRate 0.0004   Epoch: 16   Global Step: 28110   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:12:40,049-Speed 13581.02 samples/sec   Loss 2.1525   LearningRate 0.0004   Epoch: 16   Global Step: 28120   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:12:58,145-Speed 13581.68 samples/sec   Loss 2.1537   LearningRate 0.0004   Epoch: 16   Global Step: 28130   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:13:16,312-Speed 13529.09 samples/sec   Loss 2.1466   LearningRate 0.0004   Epoch: 16   Global Step: 28140   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:13:34,437-Speed 13559.91 samples/sec   Loss 2.1738   LearningRate 0.0004   Epoch: 16   Global Step: 28150   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:13:52,541-Speed 13575.79 samples/sec   Loss 2.1670   LearningRate 0.0004   Epoch: 16   Global Step: 28160   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:14:10,663-Speed 13562.52 samples/sec   Loss 2.1318   LearningRate 0.0004   Epoch: 16   Global Step: 28170   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:14:28,791-Speed 13557.44 samples/sec   Loss 2.1464   LearningRate 0.0004   Epoch: 16   Global Step: 28180   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:14:46,874-Speed 13591.18 samples/sec   Loss 2.1585   LearningRate 0.0004   Epoch: 16   Global Step: 28190   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:15:05,049-Speed 13523.42 samples/sec   Loss 2.1399   LearningRate 0.0004   Epoch: 16   Global Step: 28200   Fp16 Grad Scale: 32768   Required: 21 hours
Training: 2022-03-03 22:15:23,145-Speed 13581.22 samples/sec   Loss 2.1471   LearningRate 0.0004   Epoch: 16   Global Step: 28210   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:15:41,254-Speed 13572.06 samples/sec   Loss 2.1474   LearningRate 0.0004   Epoch: 16   Global Step: 28220   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:15:59,362-Speed 13573.01 samples/sec   Loss 2.1499   LearningRate 0.0004   Epoch: 16   Global Step: 28230   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:16:17,429-Speed 13604.00 samples/sec   Loss 2.1186   LearningRate 0.0004   Epoch: 16   Global Step: 28240   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:16:35,540-Speed 13570.51 samples/sec   Loss 2.1396   LearningRate 0.0004   Epoch: 16   Global Step: 28250   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:16:53,644-Speed 13575.59 samples/sec   Loss 2.1428   LearningRate 0.0004   Epoch: 16   Global Step: 28260   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:17:11,744-Speed 13579.50 samples/sec   Loss 2.1509   LearningRate 0.0004   Epoch: 16   Global Step: 28270   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:17:29,983-Speed 13474.68 samples/sec   Loss 2.1284   LearningRate 0.0004   Epoch: 16   Global Step: 28280   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:17:48,140-Speed 13536.09 samples/sec   Loss 2.1428   LearningRate 0.0004   Epoch: 16   Global Step: 28290   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:18:06,279-Speed 13549.91 samples/sec   Loss 2.1371   LearningRate 0.0004   Epoch: 16   Global Step: 28300   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:18:24,380-Speed 13577.93 samples/sec   Loss 2.1412   LearningRate 0.0004   Epoch: 16   Global Step: 28310   Fp16 Grad Scale: 131072   Required: 21 hours
Training: 2022-03-03 22:18:42,397-Speed 13643.73 samples/sec   Loss 2.1202   LearningRate 0.0004   Epoch: 16   Global Step: 28320   Fp16 Grad Scale: 65536   Required: 21 hours
Training: 2022-03-03 22:19:00,506-Speed 13571.56 samples/sec   Loss 2.1348   LearningRate 0.0004   Epoch: 16   Global Step: 28330   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:19:18,589-Speed 13591.57 samples/sec   Loss 2.1664   LearningRate 0.0004   Epoch: 16   Global Step: 28340   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:19:36,772-Speed 13516.98 samples/sec   Loss 2.1343   LearningRate 0.0004   Epoch: 16   Global Step: 28350   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:19:54,928-Speed 13536.46 samples/sec   Loss 2.1306   LearningRate 0.0004   Epoch: 16   Global Step: 28360   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:20:13,014-Speed 13589.79 samples/sec   Loss 2.1267   LearningRate 0.0004   Epoch: 16   Global Step: 28370   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:20:31,073-Speed 13609.66 samples/sec   Loss 2.1528   LearningRate 0.0004   Epoch: 16   Global Step: 28380   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:20:49,165-Speed 13584.33 samples/sec   Loss 2.1192   LearningRate 0.0004   Epoch: 16   Global Step: 28390   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:21:07,338-Speed 13524.92 samples/sec   Loss 2.1254   LearningRate 0.0004   Epoch: 16   Global Step: 28400   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:21:25,406-Speed 13602.33 samples/sec   Loss 2.1498   LearningRate 0.0004   Epoch: 16   Global Step: 28410   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:21:43,504-Speed 13580.13 samples/sec   Loss 2.1244   LearningRate 0.0004   Epoch: 16   Global Step: 28420   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:22:01,650-Speed 13544.48 samples/sec   Loss 2.1258   LearningRate 0.0004   Epoch: 16   Global Step: 28430   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:22:19,744-Speed 13583.51 samples/sec   Loss 2.1410   LearningRate 0.0004   Epoch: 16   Global Step: 28440   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:22:37,818-Speed 13598.29 samples/sec   Loss 2.1441   LearningRate 0.0004   Epoch: 16   Global Step: 28450   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:22:55,855-Speed 13627.62 samples/sec   Loss 2.1259   LearningRate 0.0004   Epoch: 16   Global Step: 28460   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:23:13,954-Speed 13579.30 samples/sec   Loss 2.1111   LearningRate 0.0004   Epoch: 16   Global Step: 28470   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:23:32,086-Speed 13555.15 samples/sec   Loss 2.1286   LearningRate 0.0004   Epoch: 16   Global Step: 28480   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:23:50,131-Speed 13620.52 samples/sec   Loss 2.1287   LearningRate 0.0004   Epoch: 16   Global Step: 28490   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:24:08,224-Speed 13583.45 samples/sec   Loss 2.1261   LearningRate 0.0004   Epoch: 16   Global Step: 28500   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:24:26,303-Speed 13595.02 samples/sec   Loss 2.1159   LearningRate 0.0004   Epoch: 16   Global Step: 28510   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:24:44,028-Speed 13865.96 samples/sec   Loss 2.1300   LearningRate 0.0004   Epoch: 16   Global Step: 28520   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:25:01,792-Speed 13835.75 samples/sec   Loss 2.1236   LearningRate 0.0004   Epoch: 16   Global Step: 28530   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:25:19,485-Speed 13891.32 samples/sec   Loss 2.1095   LearningRate 0.0004   Epoch: 16   Global Step: 28540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:25:37,199-Speed 13873.89 samples/sec   Loss 2.1463   LearningRate 0.0004   Epoch: 16   Global Step: 28550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:25:54,971-Speed 13830.16 samples/sec   Loss 2.1169   LearningRate 0.0004   Epoch: 16   Global Step: 28560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:26:12,796-Speed 13787.69 samples/sec   Loss 2.1080   LearningRate 0.0004   Epoch: 16   Global Step: 28570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:26:30,618-Speed 13790.56 samples/sec   Loss 2.1132   LearningRate 0.0004   Epoch: 16   Global Step: 28580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:26:48,390-Speed 13829.24 samples/sec   Loss 2.1210   LearningRate 0.0004   Epoch: 16   Global Step: 28590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:27:06,170-Speed 13823.13 samples/sec   Loss 2.1141   LearningRate 0.0004   Epoch: 16   Global Step: 28600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:27:24,007-Speed 13779.64 samples/sec   Loss 2.1291   LearningRate 0.0004   Epoch: 16   Global Step: 28610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:27:41,937-Speed 13707.37 samples/sec   Loss 2.1151   LearningRate 0.0004   Epoch: 16   Global Step: 28620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:27:59,744-Speed 13802.38 samples/sec   Loss 2.1169   LearningRate 0.0004   Epoch: 16   Global Step: 28630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:28:17,609-Speed 13757.94 samples/sec   Loss 2.1280   LearningRate 0.0004   Epoch: 16   Global Step: 28640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:28:35,469-Speed 13761.21 samples/sec   Loss 2.1262   LearningRate 0.0004   Epoch: 16   Global Step: 28650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:28:53,275-Speed 13803.11 samples/sec   Loss 2.1108   LearningRate 0.0004   Epoch: 16   Global Step: 28660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:29:11,143-Speed 13754.94 samples/sec   Loss 2.1209   LearningRate 0.0004   Epoch: 16   Global Step: 28670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:29:28,975-Speed 13782.66 samples/sec   Loss 2.1316   LearningRate 0.0004   Epoch: 16   Global Step: 28680   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:29:46,860-Speed 13742.67 samples/sec   Loss 2.1196   LearningRate 0.0004   Epoch: 16   Global Step: 28690   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:30:04,669-Speed 13800.90 samples/sec   Loss 2.1178   LearningRate 0.0004   Epoch: 16   Global Step: 28700   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:30:22,345-Speed 13905.52 samples/sec   Loss 2.1331   LearningRate 0.0004   Epoch: 16   Global Step: 28710   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:30:40,089-Speed 13851.15 samples/sec   Loss 2.1076   LearningRate 0.0004   Epoch: 16   Global Step: 28720   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:30:57,899-Speed 13801.70 samples/sec   Loss 2.1102   LearningRate 0.0004   Epoch: 16   Global Step: 28730   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:31:15,692-Speed 13813.33 samples/sec   Loss 2.1314   LearningRate 0.0004   Epoch: 16   Global Step: 28740   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:31:33,434-Speed 13852.68 samples/sec   Loss 2.1073   LearningRate 0.0004   Epoch: 16   Global Step: 28750   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:31:51,246-Speed 13798.03 samples/sec   Loss 2.1177   LearningRate 0.0004   Epoch: 16   Global Step: 28760   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:32:08,950-Speed 13882.71 samples/sec   Loss 2.1063   LearningRate 0.0004   Epoch: 16   Global Step: 28770   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:32:26,695-Speed 13850.63 samples/sec   Loss 2.1142   LearningRate 0.0004   Epoch: 16   Global Step: 28780   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:32:44,345-Speed 13924.53 samples/sec   Loss 2.1076   LearningRate 0.0004   Epoch: 16   Global Step: 28790   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:33:02,214-Speed 13754.67 samples/sec   Loss 2.1214   LearningRate 0.0004   Epoch: 16   Global Step: 28800   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:33:19,942-Speed 13863.50 samples/sec   Loss 2.0970   LearningRate 0.0004   Epoch: 16   Global Step: 28810   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:33:37,785-Speed 13775.23 samples/sec   Loss 2.1030   LearningRate 0.0004   Epoch: 16   Global Step: 28820   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:33:55,608-Speed 13789.54 samples/sec   Loss 2.1224   LearningRate 0.0004   Epoch: 16   Global Step: 28830   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:34:13,475-Speed 13756.20 samples/sec   Loss 2.1185   LearningRate 0.0004   Epoch: 16   Global Step: 28840   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:34:31,289-Speed 13796.57 samples/sec   Loss 2.1069   LearningRate 0.0004   Epoch: 16   Global Step: 28850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:34:48,966-Speed 13903.79 samples/sec   Loss 2.0847   LearningRate 0.0004   Epoch: 16   Global Step: 28860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:35:06,725-Speed 13840.00 samples/sec   Loss 2.1037   LearningRate 0.0004   Epoch: 16   Global Step: 28870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:35:24,497-Speed 13828.82 samples/sec   Loss 2.1164   LearningRate 0.0004   Epoch: 16   Global Step: 28880   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:35:42,191-Speed 13892.64 samples/sec   Loss 2.1006   LearningRate 0.0004   Epoch: 16   Global Step: 28890   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:35:59,915-Speed 13867.03 samples/sec   Loss 2.0932   LearningRate 0.0004   Epoch: 16   Global Step: 28900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:36:17,634-Speed 13870.64 samples/sec   Loss 2.0946   LearningRate 0.0004   Epoch: 16   Global Step: 28910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:36:35,325-Speed 13893.02 samples/sec   Loss 2.0867   LearningRate 0.0004   Epoch: 16   Global Step: 28920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:36:53,101-Speed 13825.63 samples/sec   Loss 2.0976   LearningRate 0.0004   Epoch: 16   Global Step: 28930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:37:10,886-Speed 13819.12 samples/sec   Loss 2.1070   LearningRate 0.0004   Epoch: 16   Global Step: 28940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:37:28,708-Speed 13791.48 samples/sec   Loss 2.1063   LearningRate 0.0004   Epoch: 16   Global Step: 28950   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:37:46,393-Speed 13897.36 samples/sec   Loss 2.0944   LearningRate 0.0004   Epoch: 16   Global Step: 28960   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:38:04,097-Speed 13881.82 samples/sec   Loss 2.1050   LearningRate 0.0004   Epoch: 16   Global Step: 28970   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:38:21,792-Speed 13889.94 samples/sec   Loss 2.0873   LearningRate 0.0004   Epoch: 16   Global Step: 28980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:38:39,559-Speed 13833.06 samples/sec   Loss 2.1015   LearningRate 0.0004   Epoch: 16   Global Step: 28990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:38:57,508-Speed 13692.90 samples/sec   Loss 2.0792   LearningRate 0.0004   Epoch: 16   Global Step: 29000   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:39:15,195-Speed 13895.63 samples/sec   Loss 2.0813   LearningRate 0.0004   Epoch: 16   Global Step: 29010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:39:32,900-Speed 13881.91 samples/sec   Loss 2.0974   LearningRate 0.0004   Epoch: 16   Global Step: 29020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:39:50,636-Speed 13858.07 samples/sec   Loss 2.0802   LearningRate 0.0004   Epoch: 16   Global Step: 29030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:40:08,389-Speed 13843.73 samples/sec   Loss 2.1026   LearningRate 0.0004   Epoch: 16   Global Step: 29040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:40:26,097-Speed 13879.41 samples/sec   Loss 2.0833   LearningRate 0.0004   Epoch: 16   Global Step: 29050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:40:43,822-Speed 13866.55 samples/sec   Loss 2.0799   LearningRate 0.0004   Epoch: 16   Global Step: 29060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:41:01,632-Speed 13800.18 samples/sec   Loss 2.0915   LearningRate 0.0004   Epoch: 16   Global Step: 29070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:41:19,370-Speed 13855.89 samples/sec   Loss 2.0927   LearningRate 0.0004   Epoch: 16   Global Step: 29080   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:41:37,054-Speed 13897.52 samples/sec   Loss 2.0878   LearningRate 0.0004   Epoch: 16   Global Step: 29090   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:41:54,748-Speed 13891.01 samples/sec   Loss 2.1090   LearningRate 0.0004   Epoch: 16   Global Step: 29100   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:42:12,662-Speed 13721.01 samples/sec   Loss 2.0941   LearningRate 0.0004   Epoch: 16   Global Step: 29110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:42:30,416-Speed 13843.21 samples/sec   Loss 2.0762   LearningRate 0.0004   Epoch: 16   Global Step: 29120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:42:48,139-Speed 13867.36 samples/sec   Loss 2.0876   LearningRate 0.0004   Epoch: 16   Global Step: 29130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:43:05,888-Speed 13847.54 samples/sec   Loss 2.0993   LearningRate 0.0004   Epoch: 16   Global Step: 29140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:43:23,649-Speed 13838.94 samples/sec   Loss 2.1095   LearningRate 0.0004   Epoch: 16   Global Step: 29150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:43:41,388-Speed 13854.99 samples/sec   Loss 2.0997   LearningRate 0.0004   Epoch: 16   Global Step: 29160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:43:59,091-Speed 13883.57 samples/sec   Loss 2.0964   LearningRate 0.0004   Epoch: 16   Global Step: 29170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:44:16,923-Speed 13782.28 samples/sec   Loss 2.0831   LearningRate 0.0004   Epoch: 16   Global Step: 29180   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-03 22:44:34,625-Speed 13884.88 samples/sec   Loss 2.0959   LearningRate 0.0004   Epoch: 16   Global Step: 29190   Fp16 Grad Scale: 131072   Required: 20 hours
Training: 2022-03-03 22:44:52,355-Speed 13861.70 samples/sec   Loss 2.0808   LearningRate 0.0004   Epoch: 16   Global Step: 29200   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:45:10,133-Speed 13826.85 samples/sec   Loss 2.0807   LearningRate 0.0004   Epoch: 16   Global Step: 29210   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:45:27,902-Speed 13831.62 samples/sec   Loss 2.0795   LearningRate 0.0004   Epoch: 16   Global Step: 29220   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:45:45,642-Speed 13854.29 samples/sec   Loss 2.0801   LearningRate 0.0004   Epoch: 16   Global Step: 29230   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:46:03,426-Speed 13819.96 samples/sec   Loss 2.0807   LearningRate 0.0004   Epoch: 16   Global Step: 29240   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:46:21,174-Speed 13848.36 samples/sec   Loss 2.1162   LearningRate 0.0004   Epoch: 16   Global Step: 29250   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:46:38,873-Speed 13886.87 samples/sec   Loss 2.0870   LearningRate 0.0004   Epoch: 16   Global Step: 29260   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:46:56,607-Speed 13858.75 samples/sec   Loss 2.0854   LearningRate 0.0004   Epoch: 16   Global Step: 29270   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:47:14,473-Speed 13757.02 samples/sec   Loss 2.0787   LearningRate 0.0004   Epoch: 16   Global Step: 29280   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:47:32,249-Speed 13826.02 samples/sec   Loss 2.1017   LearningRate 0.0004   Epoch: 16   Global Step: 29290   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:47:50,042-Speed 13813.55 samples/sec   Loss 2.1121   LearningRate 0.0004   Epoch: 16   Global Step: 29300   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:48:07,822-Speed 13822.97 samples/sec   Loss 2.0864   LearningRate 0.0004   Epoch: 16   Global Step: 29310   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:48:25,510-Speed 13894.88 samples/sec   Loss 2.0718   LearningRate 0.0004   Epoch: 16   Global Step: 29320   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:48:43,272-Speed 13837.37 samples/sec   Loss 2.1064   LearningRate 0.0004   Epoch: 16   Global Step: 29330   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:49:00,974-Speed 13884.44 samples/sec   Loss 2.1116   LearningRate 0.0004   Epoch: 16   Global Step: 29340   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:49:18,736-Speed 13836.71 samples/sec   Loss 2.0969   LearningRate 0.0004   Epoch: 16   Global Step: 29350   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:49:36,424-Speed 13895.21 samples/sec   Loss 2.0892   LearningRate 0.0004   Epoch: 16   Global Step: 29360   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:49:54,296-Speed 13752.15 samples/sec   Loss 2.0945   LearningRate 0.0004   Epoch: 16   Global Step: 29370   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:50:12,068-Speed 13830.38 samples/sec   Loss 2.1086   LearningRate 0.0004   Epoch: 16   Global Step: 29380   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:51:20,994-Speed 3565.67 samples/sec   Loss 2.0606   LearningRate 0.0004   Epoch: 17   Global Step: 29390   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:51:38,609-Speed 13952.29 samples/sec   Loss 2.0539   LearningRate 0.0004   Epoch: 17   Global Step: 29400   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:51:56,277-Speed 13911.06 samples/sec   Loss 2.0486   LearningRate 0.0004   Epoch: 17   Global Step: 29410   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:52:14,098-Speed 13791.53 samples/sec   Loss 2.0524   LearningRate 0.0004   Epoch: 17   Global Step: 29420   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:52:31,915-Speed 13794.64 samples/sec   Loss 2.0353   LearningRate 0.0004   Epoch: 17   Global Step: 29430   Fp16 Grad Scale: 8192   Required: 20 hours
Training: 2022-03-03 22:52:49,671-Speed 13841.26 samples/sec   Loss 2.0704   LearningRate 0.0004   Epoch: 17   Global Step: 29440   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:53:07,447-Speed 13827.02 samples/sec   Loss 2.0468   LearningRate 0.0004   Epoch: 17   Global Step: 29450   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:53:25,209-Speed 13837.54 samples/sec   Loss 2.0715   LearningRate 0.0004   Epoch: 17   Global Step: 29460   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:53:42,937-Speed 13863.69 samples/sec   Loss 2.0571   LearningRate 0.0004   Epoch: 17   Global Step: 29470   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:54:00,575-Speed 13933.93 samples/sec   Loss 2.0687   LearningRate 0.0004   Epoch: 17   Global Step: 29480   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:54:18,274-Speed 13886.93 samples/sec   Loss 2.0511   LearningRate 0.0004   Epoch: 17   Global Step: 29490   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:54:36,047-Speed 13828.31 samples/sec   Loss 2.0501   LearningRate 0.0004   Epoch: 17   Global Step: 29500   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:54:53,754-Speed 13880.10 samples/sec   Loss 2.0489   LearningRate 0.0004   Epoch: 17   Global Step: 29510   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:55:11,457-Speed 13883.38 samples/sec   Loss 2.0505   LearningRate 0.0004   Epoch: 17   Global Step: 29520   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:55:29,149-Speed 13892.37 samples/sec   Loss 2.0659   LearningRate 0.0004   Epoch: 17   Global Step: 29530   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 22:55:46,867-Speed 13871.34 samples/sec   Loss 2.0536   LearningRate 0.0004   Epoch: 17   Global Step: 29540   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:56:04,599-Speed 13860.69 samples/sec   Loss 2.0658   LearningRate 0.0004   Epoch: 17   Global Step: 29550   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:56:22,309-Speed 13877.39 samples/sec   Loss 2.0633   LearningRate 0.0004   Epoch: 17   Global Step: 29560   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:56:39,980-Speed 13909.27 samples/sec   Loss 2.0605   LearningRate 0.0004   Epoch: 17   Global Step: 29570   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:56:57,727-Speed 13848.97 samples/sec   Loss 2.0815   LearningRate 0.0004   Epoch: 17   Global Step: 29580   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:57:15,393-Speed 13912.05 samples/sec   Loss 2.0721   LearningRate 0.0004   Epoch: 17   Global Step: 29590   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:57:33,091-Speed 13887.12 samples/sec   Loss 2.0691   LearningRate 0.0004   Epoch: 17   Global Step: 29600   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:57:50,778-Speed 13896.54 samples/sec   Loss 2.0497   LearningRate 0.0004   Epoch: 17   Global Step: 29610   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:58:08,494-Speed 13873.13 samples/sec   Loss 2.0406   LearningRate 0.0004   Epoch: 17   Global Step: 29620   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:58:26,455-Speed 13683.57 samples/sec   Loss 2.0635   LearningRate 0.0004   Epoch: 17   Global Step: 29630   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 22:58:44,164-Speed 13878.28 samples/sec   Loss 2.0603   LearningRate 0.0004   Epoch: 17   Global Step: 29640   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:59:02,025-Speed 13760.83 samples/sec   Loss 2.0702   LearningRate 0.0004   Epoch: 17   Global Step: 29650   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:59:19,778-Speed 13844.00 samples/sec   Loss 2.0569   LearningRate 0.0004   Epoch: 17   Global Step: 29660   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:59:37,526-Speed 13848.19 samples/sec   Loss 2.0683   LearningRate 0.0004   Epoch: 17   Global Step: 29670   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 22:59:55,250-Speed 13866.91 samples/sec   Loss 2.0609   LearningRate 0.0004   Epoch: 17   Global Step: 29680   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:00:13,038-Speed 13816.97 samples/sec   Loss 2.0740   LearningRate 0.0004   Epoch: 17   Global Step: 29690   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:00:30,697-Speed 13918.42 samples/sec   Loss 2.0839   LearningRate 0.0004   Epoch: 17   Global Step: 29700   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:00:48,454-Speed 13840.67 samples/sec   Loss 2.0532   LearningRate 0.0004   Epoch: 17   Global Step: 29710   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:01:06,303-Speed 13769.38 samples/sec   Loss 2.0464   LearningRate 0.0004   Epoch: 17   Global Step: 29720   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:01:24,001-Speed 13887.47 samples/sec   Loss 2.0466   LearningRate 0.0004   Epoch: 17   Global Step: 29730   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:01:41,776-Speed 13828.05 samples/sec   Loss 2.0560   LearningRate 0.0004   Epoch: 17   Global Step: 29740   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:01:59,464-Speed 13894.92 samples/sec   Loss 2.0625   LearningRate 0.0004   Epoch: 17   Global Step: 29750   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:02:17,263-Speed 13807.84 samples/sec   Loss 2.0582   LearningRate 0.0004   Epoch: 17   Global Step: 29760   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:02:34,995-Speed 13860.51 samples/sec   Loss 2.0623   LearningRate 0.0004   Epoch: 17   Global Step: 29770   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:02:52,716-Speed 13869.36 samples/sec   Loss 2.0516   LearningRate 0.0004   Epoch: 17   Global Step: 29780   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:03:10,410-Speed 13890.85 samples/sec   Loss 2.0534   LearningRate 0.0004   Epoch: 17   Global Step: 29790   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:03:28,176-Speed 13833.72 samples/sec   Loss 2.0521   LearningRate 0.0004   Epoch: 17   Global Step: 29800   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:03:45,903-Speed 13864.69 samples/sec   Loss 2.0596   LearningRate 0.0004   Epoch: 17   Global Step: 29810   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:04:03,740-Speed 13779.02 samples/sec   Loss 2.0569   LearningRate 0.0004   Epoch: 17   Global Step: 29820   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:04:21,499-Speed 13839.97 samples/sec   Loss 2.0601   LearningRate 0.0004   Epoch: 17   Global Step: 29830   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:04:39,265-Speed 13833.51 samples/sec   Loss 2.0439   LearningRate 0.0004   Epoch: 17   Global Step: 29840   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:04:57,031-Speed 13833.88 samples/sec   Loss 2.0510   LearningRate 0.0004   Epoch: 17   Global Step: 29850   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:05:14,774-Speed 13852.76 samples/sec   Loss 2.0392   LearningRate 0.0004   Epoch: 17   Global Step: 29860   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:05:32,468-Speed 13890.07 samples/sec   Loss 2.0439   LearningRate 0.0004   Epoch: 17   Global Step: 29870   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:05:50,211-Speed 13852.22 samples/sec   Loss 2.0483   LearningRate 0.0004   Epoch: 17   Global Step: 29880   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:06:07,991-Speed 13823.52 samples/sec   Loss 2.0418   LearningRate 0.0004   Epoch: 17   Global Step: 29890   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:06:25,721-Speed 13862.55 samples/sec   Loss 2.0440   LearningRate 0.0004   Epoch: 17   Global Step: 29900   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:06:43,489-Speed 13832.44 samples/sec   Loss 2.0451   LearningRate 0.0004   Epoch: 17   Global Step: 29910   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:07:01,178-Speed 13894.29 samples/sec   Loss 2.0534   LearningRate 0.0004   Epoch: 17   Global Step: 29920   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:07:18,897-Speed 13870.45 samples/sec   Loss 2.0326   LearningRate 0.0004   Epoch: 17   Global Step: 29930   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:07:36,596-Speed 13886.62 samples/sec   Loss 2.0448   LearningRate 0.0004   Epoch: 17   Global Step: 29940   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:07:54,368-Speed 13829.82 samples/sec   Loss 2.0482   LearningRate 0.0004   Epoch: 17   Global Step: 29950   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:08:12,145-Speed 13827.24 samples/sec   Loss 2.0377   LearningRate 0.0004   Epoch: 17   Global Step: 29960   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:08:29,935-Speed 13815.22 samples/sec   Loss 2.0367   LearningRate 0.0004   Epoch: 17   Global Step: 29970   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:08:47,724-Speed 13816.44 samples/sec   Loss 2.0453   LearningRate 0.0004   Epoch: 17   Global Step: 29980   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:09:05,471-Speed 13848.85 samples/sec   Loss 2.0343   LearningRate 0.0004   Epoch: 17   Global Step: 29990   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:09:23,226-Speed 13843.14 samples/sec   Loss 2.0421   LearningRate 0.0004   Epoch: 17   Global Step: 30000   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:09:40,904-Speed 13903.22 samples/sec   Loss 2.0325   LearningRate 0.0004   Epoch: 17   Global Step: 30010   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:09:58,623-Speed 13870.83 samples/sec   Loss 2.0410   LearningRate 0.0004   Epoch: 17   Global Step: 30020   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:10:16,374-Speed 13845.89 samples/sec   Loss 2.0267   LearningRate 0.0004   Epoch: 17   Global Step: 30030   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:10:34,101-Speed 13865.96 samples/sec   Loss 2.0308   LearningRate 0.0004   Epoch: 17   Global Step: 30040   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:10:51,936-Speed 13780.78 samples/sec   Loss 2.0434   LearningRate 0.0004   Epoch: 17   Global Step: 30050   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:11:09,659-Speed 13867.10 samples/sec   Loss 2.0340   LearningRate 0.0004   Epoch: 17   Global Step: 30060   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:11:27,363-Speed 13882.57 samples/sec   Loss 2.0414   LearningRate 0.0004   Epoch: 17   Global Step: 30070   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:11:45,139-Speed 13826.14 samples/sec   Loss 2.0265   LearningRate 0.0004   Epoch: 17   Global Step: 30080   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:12:03,011-Speed 13752.10 samples/sec   Loss 2.0475   LearningRate 0.0004   Epoch: 17   Global Step: 30090   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:12:20,762-Speed 13845.82 samples/sec   Loss 2.0414   LearningRate 0.0004   Epoch: 17   Global Step: 30100   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:12:38,549-Speed 13817.59 samples/sec   Loss 2.0334   LearningRate 0.0004   Epoch: 17   Global Step: 30110   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:12:56,290-Speed 13854.13 samples/sec   Loss 2.0462   LearningRate 0.0004   Epoch: 17   Global Step: 30120   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:13:14,028-Speed 13855.23 samples/sec   Loss 2.0299   LearningRate 0.0004   Epoch: 17   Global Step: 30130   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:13:31,846-Speed 13795.20 samples/sec   Loss 2.0345   LearningRate 0.0004   Epoch: 17   Global Step: 30140   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:13:49,652-Speed 13803.06 samples/sec   Loss 2.0368   LearningRate 0.0004   Epoch: 17   Global Step: 30150   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:14:07,444-Speed 13814.37 samples/sec   Loss 2.0275   LearningRate 0.0004   Epoch: 17   Global Step: 30160   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:14:25,244-Speed 13807.58 samples/sec   Loss 2.0399   LearningRate 0.0004   Epoch: 17   Global Step: 30170   Fp16 Grad Scale: 65536   Required: 20 hours
Training: 2022-03-03 23:14:43,009-Speed 13835.71 samples/sec   Loss 2.0259   LearningRate 0.0004   Epoch: 17   Global Step: 30180   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:15:00,859-Speed 13768.75 samples/sec   Loss 2.0458   LearningRate 0.0004   Epoch: 17   Global Step: 30190   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:15:18,641-Speed 13821.88 samples/sec   Loss 2.0400   LearningRate 0.0004   Epoch: 17   Global Step: 30200   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:15:36,427-Speed 13818.12 samples/sec   Loss 2.0223   LearningRate 0.0004   Epoch: 17   Global Step: 30210   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:15:54,231-Speed 13804.60 samples/sec   Loss 2.0327   LearningRate 0.0004   Epoch: 17   Global Step: 30220   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:16:11,970-Speed 13855.27 samples/sec   Loss 2.0344   LearningRate 0.0004   Epoch: 17   Global Step: 30230   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:16:29,720-Speed 13846.77 samples/sec   Loss 2.0346   LearningRate 0.0004   Epoch: 17   Global Step: 30240   Fp16 Grad Scale: 32768   Required: 20 hours
Training: 2022-03-03 23:16:47,403-Speed 13898.60 samples/sec   Loss 2.0312   LearningRate 0.0004   Epoch: 17   Global Step: 30250   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 23:17:05,181-Speed 13824.73 samples/sec   Loss 2.0373   LearningRate 0.0004   Epoch: 17   Global Step: 30260   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 23:17:22,887-Speed 13882.46 samples/sec   Loss 2.0358   LearningRate 0.0004   Epoch: 17   Global Step: 30270   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 23:17:40,640-Speed 13845.51 samples/sec   Loss 2.0256   LearningRate 0.0004   Epoch: 17   Global Step: 30280   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 23:17:58,398-Speed 13840.06 samples/sec   Loss 2.0223   LearningRate 0.0004   Epoch: 17   Global Step: 30290   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 23:18:16,182-Speed 13821.25 samples/sec   Loss 2.0166   LearningRate 0.0004   Epoch: 17   Global Step: 30300   Fp16 Grad Scale: 16384   Required: 20 hours
Training: 2022-03-03 23:18:33,944-Speed 13837.19 samples/sec   Loss 2.0080   LearningRate 0.0004   Epoch: 17   Global Step: 30310   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:18:51,713-Speed 13832.23 samples/sec   Loss 2.0127   LearningRate 0.0004   Epoch: 17   Global Step: 30320   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:19:09,447-Speed 13858.96 samples/sec   Loss 2.0081   LearningRate 0.0004   Epoch: 17   Global Step: 30330   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:19:27,194-Speed 13848.85 samples/sec   Loss 2.0158   LearningRate 0.0004   Epoch: 17   Global Step: 30340   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:19:44,856-Speed 13915.81 samples/sec   Loss 2.0140   LearningRate 0.0004   Epoch: 17   Global Step: 30350   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:20:02,635-Speed 13823.98 samples/sec   Loss 2.0230   LearningRate 0.0004   Epoch: 17   Global Step: 30360   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:20:20,470-Speed 13783.82 samples/sec   Loss 2.0269   LearningRate 0.0004   Epoch: 17   Global Step: 30370   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:20:38,331-Speed 13760.36 samples/sec   Loss 2.0288   LearningRate 0.0004   Epoch: 17   Global Step: 30380   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:20:56,364-Speed 13628.61 samples/sec   Loss 2.0257   LearningRate 0.0004   Epoch: 17   Global Step: 30390   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:21:14,492-Speed 13557.96 samples/sec   Loss 2.0243   LearningRate 0.0004   Epoch: 17   Global Step: 30400   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:21:32,610-Speed 13566.50 samples/sec   Loss 1.9984   LearningRate 0.0004   Epoch: 17   Global Step: 30410   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:21:50,725-Speed 13567.39 samples/sec   Loss 2.0038   LearningRate 0.0004   Epoch: 17   Global Step: 30420   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:22:08,902-Speed 13520.87 samples/sec   Loss 2.0148   LearningRate 0.0004   Epoch: 17   Global Step: 30430   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:22:27,018-Speed 13567.11 samples/sec   Loss 2.0129   LearningRate 0.0004   Epoch: 17   Global Step: 30440   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:22:45,143-Speed 13560.38 samples/sec   Loss 2.0300   LearningRate 0.0004   Epoch: 17   Global Step: 30450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:23:03,282-Speed 13549.37 samples/sec   Loss 2.0112   LearningRate 0.0004   Epoch: 17   Global Step: 30460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:23:21,351-Speed 13601.67 samples/sec   Loss 2.0019   LearningRate 0.0004   Epoch: 17   Global Step: 30470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:23:39,449-Speed 13580.44 samples/sec   Loss 2.0199   LearningRate 0.0004   Epoch: 17   Global Step: 30480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:23:57,595-Speed 13544.53 samples/sec   Loss 2.0253   LearningRate 0.0004   Epoch: 17   Global Step: 30490   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:24:15,947-Speed 13392.68 samples/sec   Loss 1.9933   LearningRate 0.0004   Epoch: 17   Global Step: 30500   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:24:34,059-Speed 13569.55 samples/sec   Loss 2.0125   LearningRate 0.0004   Epoch: 17   Global Step: 30510   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:24:52,168-Speed 13571.62 samples/sec   Loss 2.0132   LearningRate 0.0004   Epoch: 17   Global Step: 30520   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:25:10,303-Speed 13553.71 samples/sec   Loss 2.0029   LearningRate 0.0004   Epoch: 17   Global Step: 30530   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:25:28,465-Speed 13532.29 samples/sec   Loss 2.0182   LearningRate 0.0004   Epoch: 17   Global Step: 30540   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:25:46,295-Speed 13784.30 samples/sec   Loss 2.0113   LearningRate 0.0004   Epoch: 17   Global Step: 30550   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:26:04,109-Speed 13796.42 samples/sec   Loss 2.0113   LearningRate 0.0004   Epoch: 17   Global Step: 30560   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:26:21,889-Speed 13823.58 samples/sec   Loss 2.0058   LearningRate 0.0004   Epoch: 17   Global Step: 30570   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:26:39,575-Speed 13896.27 samples/sec   Loss 2.0280   LearningRate 0.0004   Epoch: 17   Global Step: 30580   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:26:57,347-Speed 13829.98 samples/sec   Loss 2.0114   LearningRate 0.0004   Epoch: 17   Global Step: 30590   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:27:15,143-Speed 13810.19 samples/sec   Loss 1.9884   LearningRate 0.0004   Epoch: 17   Global Step: 30600   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:27:32,826-Speed 13899.07 samples/sec   Loss 1.9905   LearningRate 0.0004   Epoch: 17   Global Step: 30610   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:27:50,547-Speed 13868.93 samples/sec   Loss 2.0188   LearningRate 0.0004   Epoch: 17   Global Step: 30620   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:28:08,346-Speed 13808.51 samples/sec   Loss 2.0040   LearningRate 0.0004   Epoch: 17   Global Step: 30630   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:28:26,062-Speed 13872.97 samples/sec   Loss 1.9999   LearningRate 0.0004   Epoch: 17   Global Step: 30640   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:28:43,887-Speed 13788.59 samples/sec   Loss 2.0021   LearningRate 0.0004   Epoch: 17   Global Step: 30650   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:29:01,587-Speed 13886.00 samples/sec   Loss 2.0202   LearningRate 0.0004   Epoch: 17   Global Step: 30660   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:29:19,301-Speed 13874.02 samples/sec   Loss 2.0090   LearningRate 0.0004   Epoch: 17   Global Step: 30670   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:29:37,007-Speed 13880.68 samples/sec   Loss 2.0080   LearningRate 0.0004   Epoch: 17   Global Step: 30680   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:29:54,716-Speed 13879.16 samples/sec   Loss 1.9930   LearningRate 0.0004   Epoch: 17   Global Step: 30690   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:30:12,628-Speed 13721.43 samples/sec   Loss 2.0007   LearningRate 0.0004   Epoch: 17   Global Step: 30700   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:30:30,411-Speed 13820.65 samples/sec   Loss 1.9941   LearningRate 0.0004   Epoch: 17   Global Step: 30710   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:30:48,131-Speed 13869.57 samples/sec   Loss 2.0135   LearningRate 0.0004   Epoch: 17   Global Step: 30720   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:31:05,863-Speed 13861.02 samples/sec   Loss 1.9868   LearningRate 0.0004   Epoch: 17   Global Step: 30730   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:31:23,615-Speed 13844.82 samples/sec   Loss 2.0109   LearningRate 0.0004   Epoch: 17   Global Step: 30740   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:31:41,335-Speed 13869.88 samples/sec   Loss 1.9903   LearningRate 0.0004   Epoch: 17   Global Step: 30750   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:31:59,009-Speed 13906.64 samples/sec   Loss 1.9806   LearningRate 0.0004   Epoch: 17   Global Step: 30760   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:32:16,785-Speed 13826.36 samples/sec   Loss 1.9876   LearningRate 0.0004   Epoch: 17   Global Step: 30770   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:32:34,528-Speed 13851.55 samples/sec   Loss 1.9848   LearningRate 0.0004   Epoch: 17   Global Step: 30780   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:32:52,337-Speed 13801.12 samples/sec   Loss 1.9767   LearningRate 0.0004   Epoch: 17   Global Step: 30790   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:33:10,047-Speed 13877.04 samples/sec   Loss 2.0047   LearningRate 0.0004   Epoch: 17   Global Step: 30800   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:33:27,791-Speed 13851.40 samples/sec   Loss 2.0122   LearningRate 0.0004   Epoch: 17   Global Step: 30810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:33:45,547-Speed 13843.18 samples/sec   Loss 1.9812   LearningRate 0.0004   Epoch: 17   Global Step: 30820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:34:03,352-Speed 13804.37 samples/sec   Loss 2.0058   LearningRate 0.0004   Epoch: 17   Global Step: 30830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:34:21,092-Speed 13854.07 samples/sec   Loss 1.9895   LearningRate 0.0004   Epoch: 17   Global Step: 30840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:34:38,865-Speed 13828.66 samples/sec   Loss 1.9968   LearningRate 0.0004   Epoch: 17   Global Step: 30850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:34:56,652-Speed 13818.12 samples/sec   Loss 1.9871   LearningRate 0.0004   Epoch: 17   Global Step: 30860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:35:14,443-Speed 13814.42 samples/sec   Loss 1.9935   LearningRate 0.0004   Epoch: 17   Global Step: 30870   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:35:32,149-Speed 13880.56 samples/sec   Loss 1.9924   LearningRate 0.0004   Epoch: 17   Global Step: 30880   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:35:49,911-Speed 13836.91 samples/sec   Loss 1.9972   LearningRate 0.0004   Epoch: 17   Global Step: 30890   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:36:07,720-Speed 13801.06 samples/sec   Loss 2.0041   LearningRate 0.0004   Epoch: 17   Global Step: 30900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:36:25,491-Speed 13830.32 samples/sec   Loss 1.9902   LearningRate 0.0004   Epoch: 17   Global Step: 30910   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:36:43,317-Speed 13787.45 samples/sec   Loss 1.9946   LearningRate 0.0004   Epoch: 17   Global Step: 30920   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:37:01,015-Speed 13886.72 samples/sec   Loss 1.9964   LearningRate 0.0004   Epoch: 17   Global Step: 30930   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:37:18,746-Speed 13861.58 samples/sec   Loss 2.0104   LearningRate 0.0004   Epoch: 17   Global Step: 30940   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:37:36,644-Speed 13732.03 samples/sec   Loss 2.0088   LearningRate 0.0004   Epoch: 17   Global Step: 30950   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:37:54,414-Speed 13831.44 samples/sec   Loss 1.9947   LearningRate 0.0004   Epoch: 17   Global Step: 30960   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:38:12,264-Speed 13768.68 samples/sec   Loss 1.9749   LearningRate 0.0004   Epoch: 17   Global Step: 30970   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:38:29,937-Speed 13906.88 samples/sec   Loss 1.9866   LearningRate 0.0004   Epoch: 17   Global Step: 30980   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:38:47,649-Speed 13877.90 samples/sec   Loss 1.9926   LearningRate 0.0004   Epoch: 17   Global Step: 30990   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:39:05,419-Speed 13830.63 samples/sec   Loss 2.0100   LearningRate 0.0004   Epoch: 17   Global Step: 31000   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:39:23,143-Speed 13867.18 samples/sec   Loss 1.9980   LearningRate 0.0004   Epoch: 17   Global Step: 31010   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:39:40,893-Speed 13846.67 samples/sec   Loss 2.0016   LearningRate 0.0004   Epoch: 17   Global Step: 31020   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:39:58,599-Speed 13880.95 samples/sec   Loss 1.9983   LearningRate 0.0004   Epoch: 17   Global Step: 31030   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:40:16,331-Speed 13860.76 samples/sec   Loss 1.9952   LearningRate 0.0004   Epoch: 17   Global Step: 31040   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:40:33,997-Speed 13912.29 samples/sec   Loss 2.0023   LearningRate 0.0004   Epoch: 17   Global Step: 31050   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:40:51,719-Speed 13868.54 samples/sec   Loss 2.0021   LearningRate 0.0004   Epoch: 17   Global Step: 31060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:41:09,578-Speed 13762.03 samples/sec   Loss 2.0028   LearningRate 0.0004   Epoch: 17   Global Step: 31070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:41:27,316-Speed 13856.11 samples/sec   Loss 2.0093   LearningRate 0.0004   Epoch: 17   Global Step: 31080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:41:45,061-Speed 13850.13 samples/sec   Loss 2.0092   LearningRate 0.0004   Epoch: 17   Global Step: 31090   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:42:02,780-Speed 13870.52 samples/sec   Loss 2.0100   LearningRate 0.0004   Epoch: 17   Global Step: 31100   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:42:20,634-Speed 13766.48 samples/sec   Loss 2.0264   LearningRate 0.0004   Epoch: 17   Global Step: 31110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:43:29,381-Speed 3577.08 samples/sec   Loss 1.9687   LearningRate 0.0004   Epoch: 18   Global Step: 31120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:43:47,017-Speed 13935.98 samples/sec   Loss 1.9601   LearningRate 0.0004   Epoch: 18   Global Step: 31130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:44:04,673-Speed 13920.06 samples/sec   Loss 1.9373   LearningRate 0.0004   Epoch: 18   Global Step: 31140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:44:22,340-Speed 13911.91 samples/sec   Loss 1.9642   LearningRate 0.0004   Epoch: 18   Global Step: 31150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:44:40,044-Speed 13882.02 samples/sec   Loss 1.9680   LearningRate 0.0004   Epoch: 18   Global Step: 31160   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:44:57,694-Speed 13925.16 samples/sec   Loss 1.9562   LearningRate 0.0004   Epoch: 18   Global Step: 31170   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:45:15,538-Speed 13773.29 samples/sec   Loss 1.9605   LearningRate 0.0004   Epoch: 18   Global Step: 31180   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:45:33,430-Speed 13737.17 samples/sec   Loss 1.9577   LearningRate 0.0004   Epoch: 18   Global Step: 31190   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:45:51,291-Speed 13759.94 samples/sec   Loss 1.9645   LearningRate 0.0004   Epoch: 18   Global Step: 31200   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:46:09,095-Speed 13804.39 samples/sec   Loss 1.9483   LearningRate 0.0004   Epoch: 18   Global Step: 31210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:46:26,902-Speed 13802.57 samples/sec   Loss 1.9643   LearningRate 0.0004   Epoch: 18   Global Step: 31220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:46:44,686-Speed 13819.78 samples/sec   Loss 1.9568   LearningRate 0.0004   Epoch: 18   Global Step: 31230   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:47:02,516-Speed 13785.37 samples/sec   Loss 1.9703   LearningRate 0.0004   Epoch: 18   Global Step: 31240   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:47:20,345-Speed 13785.08 samples/sec   Loss 1.9666   LearningRate 0.0004   Epoch: 18   Global Step: 31250   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:47:38,113-Speed 13832.57 samples/sec   Loss 1.9553   LearningRate 0.0004   Epoch: 18   Global Step: 31260   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:47:55,932-Speed 13793.34 samples/sec   Loss 1.9547   LearningRate 0.0004   Epoch: 18   Global Step: 31270   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:48:13,730-Speed 13808.69 samples/sec   Loss 1.9721   LearningRate 0.0004   Epoch: 18   Global Step: 31280   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:48:31,485-Speed 13842.91 samples/sec   Loss 1.9587   LearningRate 0.0004   Epoch: 18   Global Step: 31290   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:48:49,184-Speed 13885.94 samples/sec   Loss 1.9625   LearningRate 0.0004   Epoch: 18   Global Step: 31300   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:49:06,932-Speed 13848.51 samples/sec   Loss 1.9597   LearningRate 0.0004   Epoch: 18   Global Step: 31310   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:49:24,643-Speed 13876.77 samples/sec   Loss 1.9433   LearningRate 0.0004   Epoch: 18   Global Step: 31320   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:49:42,369-Speed 13865.19 samples/sec   Loss 1.9524   LearningRate 0.0004   Epoch: 18   Global Step: 31330   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:50:00,084-Speed 13874.16 samples/sec   Loss 1.9665   LearningRate 0.0004   Epoch: 18   Global Step: 31340   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:50:17,742-Speed 13918.65 samples/sec   Loss 1.9646   LearningRate 0.0004   Epoch: 18   Global Step: 31350   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:50:35,466-Speed 13867.15 samples/sec   Loss 1.9604   LearningRate 0.0004   Epoch: 18   Global Step: 31360   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-03 23:50:53,208-Speed 13852.56 samples/sec   Loss 1.9569   LearningRate 0.0004   Epoch: 18   Global Step: 31370   Fp16 Grad Scale: 131072   Required: 19 hours
Training: 2022-03-03 23:51:10,927-Speed 13870.31 samples/sec   Loss 1.9651   LearningRate 0.0004   Epoch: 18   Global Step: 31380   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:51:28,645-Speed 13873.92 samples/sec   Loss 1.9664   LearningRate 0.0004   Epoch: 18   Global Step: 31390   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:51:46,317-Speed 13907.28 samples/sec   Loss 1.9735   LearningRate 0.0004   Epoch: 18   Global Step: 31400   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:52:04,097-Speed 13823.30 samples/sec   Loss 1.9559   LearningRate 0.0004   Epoch: 18   Global Step: 31410   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:52:21,844-Speed 13848.96 samples/sec   Loss 1.9589   LearningRate 0.0004   Epoch: 18   Global Step: 31420   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:52:39,610-Speed 13834.24 samples/sec   Loss 1.9856   LearningRate 0.0004   Epoch: 18   Global Step: 31430   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:52:57,368-Speed 13839.89 samples/sec   Loss 1.9694   LearningRate 0.0004   Epoch: 18   Global Step: 31440   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:53:15,074-Speed 13881.48 samples/sec   Loss 1.9591   LearningRate 0.0004   Epoch: 18   Global Step: 31450   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:53:32,751-Speed 13903.55 samples/sec   Loss 1.9773   LearningRate 0.0004   Epoch: 18   Global Step: 31460   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:53:50,465-Speed 13874.66 samples/sec   Loss 1.9677   LearningRate 0.0004   Epoch: 18   Global Step: 31470   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:54:08,227-Speed 13837.44 samples/sec   Loss 1.9686   LearningRate 0.0004   Epoch: 18   Global Step: 31480   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-03 23:54:25,929-Speed 13883.53 samples/sec   Loss 1.9558   LearningRate 0.0004   Epoch: 18   Global Step: 31490   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:54:43,640-Speed 13877.09 samples/sec   Loss 1.9443   LearningRate 0.0004   Epoch: 18   Global Step: 31500   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:55:01,384-Speed 13852.37 samples/sec   Loss 1.9804   LearningRate 0.0004   Epoch: 18   Global Step: 31510   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:55:19,076-Speed 13892.37 samples/sec   Loss 1.9732   LearningRate 0.0004   Epoch: 18   Global Step: 31520   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:55:36,824-Speed 13847.69 samples/sec   Loss 1.9557   LearningRate 0.0004   Epoch: 18   Global Step: 31530   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:55:54,567-Speed 13852.20 samples/sec   Loss 1.9536   LearningRate 0.0004   Epoch: 18   Global Step: 31540   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:56:12,335-Speed 13832.15 samples/sec   Loss 1.9578   LearningRate 0.0004   Epoch: 18   Global Step: 31550   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:56:30,160-Speed 13788.03 samples/sec   Loss 1.9585   LearningRate 0.0004   Epoch: 18   Global Step: 31560   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-03 23:56:47,882-Speed 13868.51 samples/sec   Loss 1.9501   LearningRate 0.0004   Epoch: 18   Global Step: 31570   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:57:05,689-Speed 13802.66 samples/sec   Loss 1.9470   LearningRate 0.0004   Epoch: 18   Global Step: 31580   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:57:23,473-Speed 13819.62 samples/sec   Loss 1.9495   LearningRate 0.0004   Epoch: 18   Global Step: 31590   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:57:41,283-Speed 13800.23 samples/sec   Loss 1.9584   LearningRate 0.0004   Epoch: 18   Global Step: 31600   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:57:59,090-Speed 13802.10 samples/sec   Loss 1.9483   LearningRate 0.0004   Epoch: 18   Global Step: 31610   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:58:17,046-Speed 13687.75 samples/sec   Loss 1.9482   LearningRate 0.0004   Epoch: 18   Global Step: 31620   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:58:34,864-Speed 13793.18 samples/sec   Loss 1.9546   LearningRate 0.0004   Epoch: 18   Global Step: 31630   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:58:52,708-Speed 13773.63 samples/sec   Loss 1.9496   LearningRate 0.0004   Epoch: 18   Global Step: 31640   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:59:10,544-Speed 13780.35 samples/sec   Loss 1.9556   LearningRate 0.0004   Epoch: 18   Global Step: 31650   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:59:28,530-Speed 13664.78 samples/sec   Loss 1.9537   LearningRate 0.0004   Epoch: 18   Global Step: 31660   Fp16 Grad Scale: 16384   Required: 19 hours
Training: 2022-03-03 23:59:46,408-Speed 13747.20 samples/sec   Loss 1.9577   LearningRate 0.0004   Epoch: 18   Global Step: 31670   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:00:04,246-Speed 13778.19 samples/sec   Loss 1.9414   LearningRate 0.0004   Epoch: 18   Global Step: 31680   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:00:22,149-Speed 13730.38 samples/sec   Loss 1.9565   LearningRate 0.0004   Epoch: 18   Global Step: 31690   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:00:39,982-Speed 13782.15 samples/sec   Loss 1.9424   LearningRate 0.0004   Epoch: 18   Global Step: 31700   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:00:57,924-Speed 13698.14 samples/sec   Loss 1.9558   LearningRate 0.0004   Epoch: 18   Global Step: 31710   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:01:15,802-Speed 13747.76 samples/sec   Loss 1.9477   LearningRate 0.0004   Epoch: 18   Global Step: 31720   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:01:33,714-Speed 13721.09 samples/sec   Loss 1.9353   LearningRate 0.0004   Epoch: 18   Global Step: 31730   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:01:51,595-Speed 13745.27 samples/sec   Loss 1.9442   LearningRate 0.0004   Epoch: 18   Global Step: 31740   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:02:09,419-Speed 13788.71 samples/sec   Loss 1.9473   LearningRate 0.0004   Epoch: 18   Global Step: 31750   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:02:27,396-Speed 13671.36 samples/sec   Loss 1.9444   LearningRate 0.0004   Epoch: 18   Global Step: 31760   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:02:45,457-Speed 13608.15 samples/sec   Loss 1.9517   LearningRate 0.0004   Epoch: 18   Global Step: 31770   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:03:03,609-Speed 13539.94 samples/sec   Loss 1.9403   LearningRate 0.0004   Epoch: 18   Global Step: 31780   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:03:21,795-Speed 13514.66 samples/sec   Loss 1.9339   LearningRate 0.0004   Epoch: 18   Global Step: 31790   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:03:40,104-Speed 13423.24 samples/sec   Loss 1.9320   LearningRate 0.0004   Epoch: 18   Global Step: 31800   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:03:58,268-Speed 13531.04 samples/sec   Loss 1.9463   LearningRate 0.0004   Epoch: 18   Global Step: 31810   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:04:16,395-Speed 13558.72 samples/sec   Loss 1.9542   LearningRate 0.0004   Epoch: 18   Global Step: 31820   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:04:34,528-Speed 13554.01 samples/sec   Loss 1.9418   LearningRate 0.0004   Epoch: 18   Global Step: 31830   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:04:52,648-Speed 13563.58 samples/sec   Loss 1.9516   LearningRate 0.0004   Epoch: 18   Global Step: 31840   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:05:10,666-Speed 13641.27 samples/sec   Loss 1.9485   LearningRate 0.0004   Epoch: 18   Global Step: 31850   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:05:28,445-Speed 13823.54 samples/sec   Loss 1.9465   LearningRate 0.0004   Epoch: 18   Global Step: 31860   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:05:46,176-Speed 13861.71 samples/sec   Loss 1.9190   LearningRate 0.0004   Epoch: 18   Global Step: 31870   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:06:04,150-Speed 13673.89 samples/sec   Loss 1.9428   LearningRate 0.0004   Epoch: 18   Global Step: 31880   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:06:21,989-Speed 13777.74 samples/sec   Loss 1.9442   LearningRate 0.0004   Epoch: 18   Global Step: 31890   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:06:39,942-Speed 13689.98 samples/sec   Loss 1.9343   LearningRate 0.0004   Epoch: 18   Global Step: 31900   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:06:57,825-Speed 13743.02 samples/sec   Loss 1.9553   LearningRate 0.0004   Epoch: 18   Global Step: 31910   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:07:15,680-Speed 13765.01 samples/sec   Loss 1.9402   LearningRate 0.0004   Epoch: 18   Global Step: 31920   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:07:33,575-Speed 13734.69 samples/sec   Loss 1.9238   LearningRate 0.0004   Epoch: 18   Global Step: 31930   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:07:51,313-Speed 13855.93 samples/sec   Loss 1.9219   LearningRate 0.0004   Epoch: 18   Global Step: 31940   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:08:09,205-Speed 13736.02 samples/sec   Loss 1.9317   LearningRate 0.0004   Epoch: 18   Global Step: 31950   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:08:26,951-Speed 13849.91 samples/sec   Loss 1.9461   LearningRate 0.0004   Epoch: 18   Global Step: 31960   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:08:44,736-Speed 13818.86 samples/sec   Loss 1.9226   LearningRate 0.0004   Epoch: 18   Global Step: 31970   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:09:02,507-Speed 13830.87 samples/sec   Loss 1.9297   LearningRate 0.0004   Epoch: 18   Global Step: 31980   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:09:20,259-Speed 13844.86 samples/sec   Loss 1.9380   LearningRate 0.0004   Epoch: 18   Global Step: 31990   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:09:37,976-Speed 13872.22 samples/sec   Loss 1.9439   LearningRate 0.0004   Epoch: 18   Global Step: 32000   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:09:55,674-Speed 13886.79 samples/sec   Loss 1.9278   LearningRate 0.0004   Epoch: 18   Global Step: 32010   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:10:13,419-Speed 13850.89 samples/sec   Loss 1.9397   LearningRate 0.0004   Epoch: 18   Global Step: 32020   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:10:31,182-Speed 13835.94 samples/sec   Loss 1.9331   LearningRate 0.0004   Epoch: 18   Global Step: 32030   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:10:49,030-Speed 13770.48 samples/sec   Loss 1.9182   LearningRate 0.0004   Epoch: 18   Global Step: 32040   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:11:07,063-Speed 13629.32 samples/sec   Loss 1.9167   LearningRate 0.0004   Epoch: 18   Global Step: 32050   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:11:25,099-Speed 13627.49 samples/sec   Loss 1.9182   LearningRate 0.0004   Epoch: 18   Global Step: 32060   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:11:43,146-Speed 13618.99 samples/sec   Loss 1.9210   LearningRate 0.0004   Epoch: 18   Global Step: 32070   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:12:01,149-Speed 13651.41 samples/sec   Loss 1.9181   LearningRate 0.0004   Epoch: 18   Global Step: 32080   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:12:19,167-Speed 13640.47 samples/sec   Loss 1.9500   LearningRate 0.0004   Epoch: 18   Global Step: 32090   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:12:37,183-Speed 13642.07 samples/sec   Loss 1.9210   LearningRate 0.0004   Epoch: 18   Global Step: 32100   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:12:54,954-Speed 13830.38 samples/sec   Loss 1.8951   LearningRate 0.0004   Epoch: 18   Global Step: 32110   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:13:12,594-Speed 13932.51 samples/sec   Loss 1.9119   LearningRate 0.0004   Epoch: 18   Global Step: 32120   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:13:30,345-Speed 13846.05 samples/sec   Loss 1.9215   LearningRate 0.0004   Epoch: 18   Global Step: 32130   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:13:48,078-Speed 13859.60 samples/sec   Loss 1.9362   LearningRate 0.0004   Epoch: 18   Global Step: 32140   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:14:05,808-Speed 13862.30 samples/sec   Loss 1.9264   LearningRate 0.0004   Epoch: 18   Global Step: 32150   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:14:23,507-Speed 13887.52 samples/sec   Loss 1.9182   LearningRate 0.0004   Epoch: 18   Global Step: 32160   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:14:41,190-Speed 13898.67 samples/sec   Loss 1.9260   LearningRate 0.0004   Epoch: 18   Global Step: 32170   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:14:58,972-Speed 13821.37 samples/sec   Loss 1.9182   LearningRate 0.0004   Epoch: 18   Global Step: 32180   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:15:16,676-Speed 13882.65 samples/sec   Loss 1.9178   LearningRate 0.0004   Epoch: 18   Global Step: 32190   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:15:34,432-Speed 13841.56 samples/sec   Loss 1.9236   LearningRate 0.0004   Epoch: 18   Global Step: 32200   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:15:52,102-Speed 13909.22 samples/sec   Loss 1.9180   LearningRate 0.0004   Epoch: 18   Global Step: 32210   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:16:09,866-Speed 13836.71 samples/sec   Loss 1.9243   LearningRate 0.0004   Epoch: 18   Global Step: 32220   Fp16 Grad Scale: 65536   Required: 19 hours
Training: 2022-03-04 00:16:27,616-Speed 13846.24 samples/sec   Loss 1.9303   LearningRate 0.0004   Epoch: 18   Global Step: 32230   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:16:45,405-Speed 13816.62 samples/sec   Loss 1.9220   LearningRate 0.0004   Epoch: 18   Global Step: 32240   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:17:03,152-Speed 13849.02 samples/sec   Loss 1.9183   LearningRate 0.0004   Epoch: 18   Global Step: 32250   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:17:20,890-Speed 13855.76 samples/sec   Loss 1.9220   LearningRate 0.0004   Epoch: 18   Global Step: 32260   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:17:38,694-Speed 13804.10 samples/sec   Loss 1.9168   LearningRate 0.0004   Epoch: 18   Global Step: 32270   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:17:56,407-Speed 13875.94 samples/sec   Loss 1.9153   LearningRate 0.0004   Epoch: 18   Global Step: 32280   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:18:14,143-Speed 13857.09 samples/sec   Loss 1.9215   LearningRate 0.0004   Epoch: 18   Global Step: 32290   Fp16 Grad Scale: 32768   Required: 19 hours
Training: 2022-03-04 00:18:31,918-Speed 13827.34 samples/sec   Loss 1.9122   LearningRate 0.0004   Epoch: 18   Global Step: 32300   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:18:49,722-Speed 13803.87 samples/sec   Loss 1.9258   LearningRate 0.0004   Epoch: 18   Global Step: 32310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:19:07,491-Speed 13832.00 samples/sec   Loss 1.9308   LearningRate 0.0003   Epoch: 18   Global Step: 32320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:19:25,326-Speed 13780.53 samples/sec   Loss 1.9211   LearningRate 0.0003   Epoch: 18   Global Step: 32330   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:19:43,109-Speed 13821.60 samples/sec   Loss 1.9047   LearningRate 0.0003   Epoch: 18   Global Step: 32340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:20:00,907-Speed 13808.52 samples/sec   Loss 1.8900   LearningRate 0.0003   Epoch: 18   Global Step: 32350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:20:18,702-Speed 13811.93 samples/sec   Loss 1.9139   LearningRate 0.0003   Epoch: 18   Global Step: 32360   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:20:36,436-Speed 13858.86 samples/sec   Loss 1.9088   LearningRate 0.0003   Epoch: 18   Global Step: 32370   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:20:54,146-Speed 13878.27 samples/sec   Loss 1.9213   LearningRate 0.0003   Epoch: 18   Global Step: 32380   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:21:11,848-Speed 13884.61 samples/sec   Loss 1.9034   LearningRate 0.0003   Epoch: 18   Global Step: 32390   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:21:29,698-Speed 13769.13 samples/sec   Loss 1.9106   LearningRate 0.0003   Epoch: 18   Global Step: 32400   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:21:47,401-Speed 13883.23 samples/sec   Loss 1.9075   LearningRate 0.0003   Epoch: 18   Global Step: 32410   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:22:05,185-Speed 13819.59 samples/sec   Loss 1.9049   LearningRate 0.0003   Epoch: 18   Global Step: 32420   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:22:22,955-Speed 13831.48 samples/sec   Loss 1.9131   LearningRate 0.0003   Epoch: 18   Global Step: 32430   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:22:40,727-Speed 13828.71 samples/sec   Loss 1.9173   LearningRate 0.0003   Epoch: 18   Global Step: 32440   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:22:58,524-Speed 13810.28 samples/sec   Loss 1.9067   LearningRate 0.0003   Epoch: 18   Global Step: 32450   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:23:16,452-Speed 13709.17 samples/sec   Loss 1.8960   LearningRate 0.0003   Epoch: 18   Global Step: 32460   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:23:34,326-Speed 13750.63 samples/sec   Loss 1.8994   LearningRate 0.0003   Epoch: 18   Global Step: 32470   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:23:52,133-Speed 13802.20 samples/sec   Loss 1.8976   LearningRate 0.0003   Epoch: 18   Global Step: 32480   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:24:09,958-Speed 13788.08 samples/sec   Loss 1.9102   LearningRate 0.0003   Epoch: 18   Global Step: 32490   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:24:27,794-Speed 13779.90 samples/sec   Loss 1.9057   LearningRate 0.0003   Epoch: 18   Global Step: 32500   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:24:45,508-Speed 13876.07 samples/sec   Loss 1.8975   LearningRate 0.0003   Epoch: 18   Global Step: 32510   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:25:03,268-Speed 13839.09 samples/sec   Loss 1.8977   LearningRate 0.0003   Epoch: 18   Global Step: 32520   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:25:21,026-Speed 13840.55 samples/sec   Loss 1.9142   LearningRate 0.0003   Epoch: 18   Global Step: 32530   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:25:38,718-Speed 13891.64 samples/sec   Loss 1.9079   LearningRate 0.0003   Epoch: 18   Global Step: 32540   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:25:56,494-Speed 13827.29 samples/sec   Loss 1.9042   LearningRate 0.0003   Epoch: 18   Global Step: 32550   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:26:14,311-Speed 13794.74 samples/sec   Loss 1.9000   LearningRate 0.0003   Epoch: 18   Global Step: 32560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:26:32,054-Speed 13851.88 samples/sec   Loss 1.9050   LearningRate 0.0003   Epoch: 18   Global Step: 32570   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:26:49,773-Speed 13871.31 samples/sec   Loss 1.9061   LearningRate 0.0003   Epoch: 18   Global Step: 32580   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:27:07,515-Speed 13854.26 samples/sec   Loss 1.9088   LearningRate 0.0003   Epoch: 18   Global Step: 32590   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:27:25,238-Speed 13867.54 samples/sec   Loss 1.8933   LearningRate 0.0003   Epoch: 18   Global Step: 32600   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:27:43,131-Speed 13736.18 samples/sec   Loss 1.9038   LearningRate 0.0003   Epoch: 18   Global Step: 32610   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:28:00,838-Speed 13880.48 samples/sec   Loss 1.8924   LearningRate 0.0003   Epoch: 18   Global Step: 32620   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:28:18,586-Speed 13847.91 samples/sec   Loss 1.9032   LearningRate 0.0003   Epoch: 18   Global Step: 32630   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:28:36,347-Speed 13837.47 samples/sec   Loss 1.9109   LearningRate 0.0003   Epoch: 18   Global Step: 32640   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:28:54,135-Speed 13817.23 samples/sec   Loss 1.9036   LearningRate 0.0003   Epoch: 18   Global Step: 32650   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:29:12,048-Speed 13721.05 samples/sec   Loss 1.9039   LearningRate 0.0003   Epoch: 18   Global Step: 32660   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:29:29,810-Speed 13836.60 samples/sec   Loss 1.8921   LearningRate 0.0003   Epoch: 18   Global Step: 32670   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:29:47,520-Speed 13877.63 samples/sec   Loss 1.8916   LearningRate 0.0003   Epoch: 18   Global Step: 32680   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:30:05,219-Speed 13886.80 samples/sec   Loss 1.9057   LearningRate 0.0003   Epoch: 18   Global Step: 32690   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:30:22,918-Speed 13886.34 samples/sec   Loss 1.9200   LearningRate 0.0003   Epoch: 18   Global Step: 32700   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:30:40,678-Speed 13839.02 samples/sec   Loss 1.9109   LearningRate 0.0003   Epoch: 18   Global Step: 32710   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:30:58,463-Speed 13819.13 samples/sec   Loss 1.9005   LearningRate 0.0003   Epoch: 18   Global Step: 32720   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:31:16,252-Speed 13815.80 samples/sec   Loss 1.8987   LearningRate 0.0003   Epoch: 18   Global Step: 32730   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:31:34,132-Speed 13746.06 samples/sec   Loss 1.8992   LearningRate 0.0003   Epoch: 18   Global Step: 32740   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:31:51,929-Speed 13809.95 samples/sec   Loss 1.9002   LearningRate 0.0003   Epoch: 18   Global Step: 32750   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:32:09,725-Speed 13810.52 samples/sec   Loss 1.9205   LearningRate 0.0003   Epoch: 18   Global Step: 32760   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:32:27,511-Speed 13818.16 samples/sec   Loss 1.9131   LearningRate 0.0003   Epoch: 18   Global Step: 32770   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:32:45,186-Speed 13905.45 samples/sec   Loss 1.8965   LearningRate 0.0003   Epoch: 18   Global Step: 32780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:33:02,949-Speed 13837.36 samples/sec   Loss 1.9216   LearningRate 0.0003   Epoch: 18   Global Step: 32790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:33:20,707-Speed 13840.02 samples/sec   Loss 1.9139   LearningRate 0.0003   Epoch: 18   Global Step: 32800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:33:38,427-Speed 13869.52 samples/sec   Loss 1.9124   LearningRate 0.0003   Epoch: 18   Global Step: 32810   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:33:56,152-Speed 13865.91 samples/sec   Loss 1.9301   LearningRate 0.0003   Epoch: 18   Global Step: 32820   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:34:13,898-Speed 13850.39 samples/sec   Loss 1.9267   LearningRate 0.0003   Epoch: 18   Global Step: 32830   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:35:22,359-Speed 3589.83 samples/sec   Loss 1.9103   LearningRate 0.0003   Epoch: 19   Global Step: 32840   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:35:40,028-Speed 13910.50 samples/sec   Loss 1.8679   LearningRate 0.0003   Epoch: 19   Global Step: 32850   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:35:57,687-Speed 13918.26 samples/sec   Loss 1.8714   LearningRate 0.0003   Epoch: 19   Global Step: 32860   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:36:15,476-Speed 13816.00 samples/sec   Loss 1.8766   LearningRate 0.0003   Epoch: 19   Global Step: 32870   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:36:33,179-Speed 13883.36 samples/sec   Loss 1.8609   LearningRate 0.0003   Epoch: 19   Global Step: 32880   Fp16 Grad Scale: 131072   Required: 18 hours
Training: 2022-03-04 00:36:50,860-Speed 13901.25 samples/sec   Loss 1.8614   LearningRate 0.0003   Epoch: 19   Global Step: 32890   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:37:08,583-Speed 13866.87 samples/sec   Loss 1.8628   LearningRate 0.0003   Epoch: 19   Global Step: 32900   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:37:26,356-Speed 13828.85 samples/sec   Loss 1.8663   LearningRate 0.0003   Epoch: 19   Global Step: 32910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:37:44,095-Speed 13855.76 samples/sec   Loss 1.8833   LearningRate 0.0003   Epoch: 19   Global Step: 32920   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:38:01,882-Speed 13817.49 samples/sec   Loss 1.8796   LearningRate 0.0003   Epoch: 19   Global Step: 32930   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:38:19,702-Speed 13792.06 samples/sec   Loss 1.8676   LearningRate 0.0003   Epoch: 19   Global Step: 32940   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:38:37,592-Speed 13738.36 samples/sec   Loss 1.8721   LearningRate 0.0003   Epoch: 19   Global Step: 32950   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:38:55,603-Speed 13645.56 samples/sec   Loss 1.8663   LearningRate 0.0003   Epoch: 19   Global Step: 32960   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:39:13,737-Speed 13553.33 samples/sec   Loss 1.8649   LearningRate 0.0003   Epoch: 19   Global Step: 32970   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:39:31,853-Speed 13569.45 samples/sec   Loss 1.8557   LearningRate 0.0003   Epoch: 19   Global Step: 32980   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:39:49,975-Speed 13562.63 samples/sec   Loss 1.8794   LearningRate 0.0003   Epoch: 19   Global Step: 32990   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:40:07,907-Speed 13705.95 samples/sec   Loss 1.8693   LearningRate 0.0003   Epoch: 19   Global Step: 33000   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:40:25,758-Speed 13768.23 samples/sec   Loss 1.8653   LearningRate 0.0003   Epoch: 19   Global Step: 33010   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:40:43,544-Speed 13818.45 samples/sec   Loss 1.8679   LearningRate 0.0003   Epoch: 19   Global Step: 33020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:41:01,258-Speed 13874.67 samples/sec   Loss 1.8772   LearningRate 0.0003   Epoch: 19   Global Step: 33030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:41:19,041-Speed 13821.34 samples/sec   Loss 1.8608   LearningRate 0.0003   Epoch: 19   Global Step: 33040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:41:36,867-Speed 13788.02 samples/sec   Loss 1.8768   LearningRate 0.0003   Epoch: 19   Global Step: 33050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:41:54,584-Speed 13872.89 samples/sec   Loss 1.8674   LearningRate 0.0003   Epoch: 19   Global Step: 33060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:42:12,340-Speed 13842.45 samples/sec   Loss 1.8892   LearningRate 0.0003   Epoch: 19   Global Step: 33070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:42:30,065-Speed 13866.09 samples/sec   Loss 1.8822   LearningRate 0.0003   Epoch: 19   Global Step: 33080   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:42:47,790-Speed 13866.27 samples/sec   Loss 1.8844   LearningRate 0.0003   Epoch: 19   Global Step: 33090   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:43:05,526-Speed 13856.88 samples/sec   Loss 1.8819   LearningRate 0.0003   Epoch: 19   Global Step: 33100   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:43:23,250-Speed 13867.00 samples/sec   Loss 1.8815   LearningRate 0.0003   Epoch: 19   Global Step: 33110   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:43:41,051-Speed 13807.42 samples/sec   Loss 1.8792   LearningRate 0.0003   Epoch: 19   Global Step: 33120   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:43:58,783-Speed 13860.06 samples/sec   Loss 1.8828   LearningRate 0.0003   Epoch: 19   Global Step: 33130   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:44:16,569-Speed 13818.76 samples/sec   Loss 1.8694   LearningRate 0.0003   Epoch: 19   Global Step: 33140   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:44:34,291-Speed 13868.54 samples/sec   Loss 1.8714   LearningRate 0.0003   Epoch: 19   Global Step: 33150   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:44:52,039-Speed 13847.67 samples/sec   Loss 1.8816   LearningRate 0.0003   Epoch: 19   Global Step: 33160   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:45:09,932-Speed 13736.03 samples/sec   Loss 1.8874   LearningRate 0.0003   Epoch: 19   Global Step: 33170   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:45:27,718-Speed 13818.68 samples/sec   Loss 1.8724   LearningRate 0.0003   Epoch: 19   Global Step: 33180   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:45:45,398-Speed 13901.80 samples/sec   Loss 1.8836   LearningRate 0.0003   Epoch: 19   Global Step: 33190   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:46:03,205-Speed 13802.11 samples/sec   Loss 1.8795   LearningRate 0.0003   Epoch: 19   Global Step: 33200   Fp16 Grad Scale: 8192   Required: 18 hours
Training: 2022-03-04 00:46:21,094-Speed 13738.99 samples/sec   Loss 1.8786   LearningRate 0.0003   Epoch: 19   Global Step: 33210   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:46:38,920-Speed 13787.45 samples/sec   Loss 1.8728   LearningRate 0.0003   Epoch: 19   Global Step: 33220   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:46:56,693-Speed 13829.27 samples/sec   Loss 1.8581   LearningRate 0.0003   Epoch: 19   Global Step: 33230   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:47:14,412-Speed 13870.85 samples/sec   Loss 1.8772   LearningRate 0.0003   Epoch: 19   Global Step: 33240   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:47:32,146-Speed 13858.80 samples/sec   Loss 1.8647   LearningRate 0.0003   Epoch: 19   Global Step: 33250   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:47:49,945-Speed 13808.63 samples/sec   Loss 1.8634   LearningRate 0.0003   Epoch: 19   Global Step: 33260   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:48:07,733-Speed 13816.80 samples/sec   Loss 1.8680   LearningRate 0.0003   Epoch: 19   Global Step: 33270   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:48:25,478-Speed 13851.12 samples/sec   Loss 1.8785   LearningRate 0.0003   Epoch: 19   Global Step: 33280   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:48:43,243-Speed 13834.47 samples/sec   Loss 1.8785   LearningRate 0.0003   Epoch: 19   Global Step: 33290   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:49:01,044-Speed 13806.36 samples/sec   Loss 1.8572   LearningRate 0.0003   Epoch: 19   Global Step: 33300   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:49:18,767-Speed 13867.58 samples/sec   Loss 1.8595   LearningRate 0.0003   Epoch: 19   Global Step: 33310   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:49:36,528-Speed 13838.94 samples/sec   Loss 1.8644   LearningRate 0.0003   Epoch: 19   Global Step: 33320   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:49:54,272-Speed 13851.02 samples/sec   Loss 1.8708   LearningRate 0.0003   Epoch: 19   Global Step: 33330   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:50:12,081-Speed 13800.82 samples/sec   Loss 1.8608   LearningRate 0.0003   Epoch: 19   Global Step: 33340   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:50:29,828-Speed 13848.37 samples/sec   Loss 1.8659   LearningRate 0.0003   Epoch: 19   Global Step: 33350   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:50:47,511-Speed 13899.34 samples/sec   Loss 1.8600   LearningRate 0.0003   Epoch: 19   Global Step: 33360   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:51:05,266-Speed 13843.15 samples/sec   Loss 1.8557   LearningRate 0.0003   Epoch: 19   Global Step: 33370   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:51:23,100-Speed 13780.61 samples/sec   Loss 1.8733   LearningRate 0.0003   Epoch: 19   Global Step: 33380   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:51:40,862-Speed 13837.20 samples/sec   Loss 1.8518   LearningRate 0.0003   Epoch: 19   Global Step: 33390   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:51:58,592-Speed 13862.29 samples/sec   Loss 1.8563   LearningRate 0.0003   Epoch: 19   Global Step: 33400   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:52:16,302-Speed 13878.22 samples/sec   Loss 1.8500   LearningRate 0.0003   Epoch: 19   Global Step: 33410   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:52:34,160-Speed 13762.34 samples/sec   Loss 1.8565   LearningRate 0.0003   Epoch: 19   Global Step: 33420   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:52:51,850-Speed 13893.05 samples/sec   Loss 1.8602   LearningRate 0.0003   Epoch: 19   Global Step: 33430   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:53:09,520-Speed 13909.74 samples/sec   Loss 1.8621   LearningRate 0.0003   Epoch: 19   Global Step: 33440   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:53:27,339-Speed 13792.76 samples/sec   Loss 1.8662   LearningRate 0.0003   Epoch: 19   Global Step: 33450   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:53:45,036-Speed 13887.78 samples/sec   Loss 1.8468   LearningRate 0.0003   Epoch: 19   Global Step: 33460   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:54:02,801-Speed 13834.56 samples/sec   Loss 1.8533   LearningRate 0.0003   Epoch: 19   Global Step: 33470   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:54:20,645-Speed 13774.82 samples/sec   Loss 1.8639   LearningRate 0.0003   Epoch: 19   Global Step: 33480   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:54:38,454-Speed 13800.67 samples/sec   Loss 1.8710   LearningRate 0.0003   Epoch: 19   Global Step: 33490   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:54:56,435-Speed 13668.23 samples/sec   Loss 1.8715   LearningRate 0.0003   Epoch: 19   Global Step: 33500   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:55:14,607-Speed 13526.08 samples/sec   Loss 1.8593   LearningRate 0.0003   Epoch: 19   Global Step: 33510   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:55:32,682-Speed 13597.54 samples/sec   Loss 1.8634   LearningRate 0.0003   Epoch: 19   Global Step: 33520   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:55:50,760-Speed 13595.32 samples/sec   Loss 1.8593   LearningRate 0.0003   Epoch: 19   Global Step: 33530   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:56:08,745-Speed 13666.16 samples/sec   Loss 1.8618   LearningRate 0.0003   Epoch: 19   Global Step: 33540   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:56:26,819-Speed 13597.87 samples/sec   Loss 1.8529   LearningRate 0.0003   Epoch: 19   Global Step: 33550   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:56:44,848-Speed 13632.38 samples/sec   Loss 1.8600   LearningRate 0.0003   Epoch: 19   Global Step: 33560   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 00:57:02,830-Speed 13668.09 samples/sec   Loss 1.8505   LearningRate 0.0003   Epoch: 19   Global Step: 33570   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 00:57:20,786-Speed 13687.46 samples/sec   Loss 1.8395   LearningRate 0.0003   Epoch: 19   Global Step: 33580   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:57:38,778-Speed 13660.58 samples/sec   Loss 1.8390   LearningRate 0.0003   Epoch: 19   Global Step: 33590   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:57:56,731-Speed 13689.32 samples/sec   Loss 1.8415   LearningRate 0.0003   Epoch: 19   Global Step: 33600   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:58:14,712-Speed 13668.65 samples/sec   Loss 1.8470   LearningRate 0.0003   Epoch: 19   Global Step: 33610   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:58:32,546-Speed 13781.41 samples/sec   Loss 1.8430   LearningRate 0.0003   Epoch: 19   Global Step: 33620   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:58:50,261-Speed 13874.20 samples/sec   Loss 1.8503   LearningRate 0.0003   Epoch: 19   Global Step: 33630   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:59:07,967-Speed 13881.17 samples/sec   Loss 1.8587   LearningRate 0.0003   Epoch: 19   Global Step: 33640   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:59:25,709-Speed 13852.65 samples/sec   Loss 1.8375   LearningRate 0.0003   Epoch: 19   Global Step: 33650   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 00:59:43,415-Speed 13881.23 samples/sec   Loss 1.8322   LearningRate 0.0003   Epoch: 19   Global Step: 33660   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 01:00:01,168-Speed 13843.62 samples/sec   Loss 1.8397   LearningRate 0.0003   Epoch: 19   Global Step: 33670   Fp16 Grad Scale: 16384   Required: 18 hours
Training: 2022-03-04 01:00:18,877-Speed 13878.70 samples/sec   Loss 1.8447   LearningRate 0.0003   Epoch: 19   Global Step: 33680   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:00:36,626-Speed 13847.61 samples/sec   Loss 1.8494   LearningRate 0.0003   Epoch: 19   Global Step: 33690   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:00:54,311-Speed 13897.03 samples/sec   Loss 1.8606   LearningRate 0.0003   Epoch: 19   Global Step: 33700   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:01:12,043-Speed 13860.85 samples/sec   Loss 1.8495   LearningRate 0.0003   Epoch: 19   Global Step: 33710   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:01:29,751-Speed 13879.19 samples/sec   Loss 1.8422   LearningRate 0.0003   Epoch: 19   Global Step: 33720   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:01:47,489-Speed 13856.04 samples/sec   Loss 1.8496   LearningRate 0.0003   Epoch: 19   Global Step: 33730   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:02:05,182-Speed 13891.69 samples/sec   Loss 1.8543   LearningRate 0.0003   Epoch: 19   Global Step: 33740   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:02:22,933-Speed 13846.41 samples/sec   Loss 1.8445   LearningRate 0.0003   Epoch: 19   Global Step: 33750   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:02:40,712-Speed 13823.62 samples/sec   Loss 1.8320   LearningRate 0.0003   Epoch: 19   Global Step: 33760   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:02:58,409-Speed 13888.13 samples/sec   Loss 1.8364   LearningRate 0.0003   Epoch: 19   Global Step: 33770   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:03:16,133-Speed 13867.40 samples/sec   Loss 1.8498   LearningRate 0.0003   Epoch: 19   Global Step: 33780   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:03:33,878-Speed 13851.33 samples/sec   Loss 1.8413   LearningRate 0.0003   Epoch: 19   Global Step: 33790   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:03:51,682-Speed 13804.68 samples/sec   Loss 1.8411   LearningRate 0.0003   Epoch: 19   Global Step: 33800   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:04:09,369-Speed 13896.11 samples/sec   Loss 1.8264   LearningRate 0.0003   Epoch: 19   Global Step: 33810   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:04:27,225-Speed 13764.51 samples/sec   Loss 1.8278   LearningRate 0.0003   Epoch: 19   Global Step: 33820   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:04:44,898-Speed 13906.76 samples/sec   Loss 1.8316   LearningRate 0.0003   Epoch: 19   Global Step: 33830   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:05:02,673-Speed 13827.32 samples/sec   Loss 1.8385   LearningRate 0.0003   Epoch: 19   Global Step: 33840   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:05:20,400-Speed 13863.93 samples/sec   Loss 1.8449   LearningRate 0.0003   Epoch: 19   Global Step: 33850   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:05:38,152-Speed 13846.67 samples/sec   Loss 1.8397   LearningRate 0.0003   Epoch: 19   Global Step: 33860   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:05:55,927-Speed 13827.27 samples/sec   Loss 1.8419   LearningRate 0.0003   Epoch: 19   Global Step: 33870   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:06:13,653-Speed 13864.96 samples/sec   Loss 1.8272   LearningRate 0.0003   Epoch: 19   Global Step: 33880   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:06:31,364-Speed 13877.26 samples/sec   Loss 1.8474   LearningRate 0.0003   Epoch: 19   Global Step: 33890   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:06:49,133-Speed 13832.15 samples/sec   Loss 1.8324   LearningRate 0.0003   Epoch: 19   Global Step: 33900   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:07:06,868-Speed 13858.67 samples/sec   Loss 1.8309   LearningRate 0.0003   Epoch: 19   Global Step: 33910   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:07:24,569-Speed 13884.73 samples/sec   Loss 1.8332   LearningRate 0.0003   Epoch: 19   Global Step: 33920   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:07:42,323-Speed 13843.05 samples/sec   Loss 1.8273   LearningRate 0.0003   Epoch: 19   Global Step: 33930   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:08:00,077-Speed 13843.77 samples/sec   Loss 1.8406   LearningRate 0.0003   Epoch: 19   Global Step: 33940   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:08:17,816-Speed 13855.04 samples/sec   Loss 1.8517   LearningRate 0.0003   Epoch: 19   Global Step: 33950   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:08:35,569-Speed 13844.51 samples/sec   Loss 1.8422   LearningRate 0.0003   Epoch: 19   Global Step: 33960   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:08:53,300-Speed 13860.86 samples/sec   Loss 1.8457   LearningRate 0.0003   Epoch: 19   Global Step: 33970   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:09:10,980-Speed 13901.65 samples/sec   Loss 1.8255   LearningRate 0.0003   Epoch: 19   Global Step: 33980   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:09:28,711-Speed 13861.82 samples/sec   Loss 1.8367   LearningRate 0.0003   Epoch: 19   Global Step: 33990   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:09:46,594-Speed 13743.07 samples/sec   Loss 1.8455   LearningRate 0.0003   Epoch: 19   Global Step: 34000   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:10:04,317-Speed 13868.64 samples/sec   Loss 1.8321   LearningRate 0.0003   Epoch: 19   Global Step: 34010   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:10:22,031-Speed 13874.09 samples/sec   Loss 1.8261   LearningRate 0.0003   Epoch: 19   Global Step: 34020   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:10:39,792-Speed 13838.36 samples/sec   Loss 1.8333   LearningRate 0.0003   Epoch: 19   Global Step: 34030   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:10:57,502-Speed 13878.14 samples/sec   Loss 1.8402   LearningRate 0.0003   Epoch: 19   Global Step: 34040   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:11:15,231-Speed 13862.32 samples/sec   Loss 1.8427   LearningRate 0.0003   Epoch: 19   Global Step: 34050   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:11:33,016-Speed 13819.60 samples/sec   Loss 1.8304   LearningRate 0.0003   Epoch: 19   Global Step: 34060   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:11:50,743-Speed 13864.40 samples/sec   Loss 1.8237   LearningRate 0.0003   Epoch: 19   Global Step: 34070   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:12:08,667-Speed 13712.21 samples/sec   Loss 1.8196   LearningRate 0.0003   Epoch: 19   Global Step: 34080   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:12:26,423-Speed 13842.24 samples/sec   Loss 1.8163   LearningRate 0.0003   Epoch: 19   Global Step: 34090   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:12:44,127-Speed 13882.36 samples/sec   Loss 1.8153   LearningRate 0.0003   Epoch: 19   Global Step: 34100   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:13:01,810-Speed 13898.70 samples/sec   Loss 1.8095   LearningRate 0.0003   Epoch: 19   Global Step: 34110   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:13:19,535-Speed 13866.20 samples/sec   Loss 1.8089   LearningRate 0.0003   Epoch: 19   Global Step: 34120   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:13:37,229-Speed 13890.40 samples/sec   Loss 1.8232   LearningRate 0.0003   Epoch: 19   Global Step: 34130   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:13:54,928-Speed 13886.58 samples/sec   Loss 1.8264   LearningRate 0.0003   Epoch: 19   Global Step: 34140   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:14:12,651-Speed 13866.98 samples/sec   Loss 1.8119   LearningRate 0.0003   Epoch: 19   Global Step: 34150   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:14:30,501-Speed 13769.47 samples/sec   Loss 1.8282   LearningRate 0.0003   Epoch: 19   Global Step: 34160   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:14:48,219-Speed 13871.76 samples/sec   Loss 1.8171   LearningRate 0.0003   Epoch: 19   Global Step: 34170   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:15:05,944-Speed 13865.80 samples/sec   Loss 1.8158   LearningRate 0.0003   Epoch: 19   Global Step: 34180   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:15:23,700-Speed 13841.87 samples/sec   Loss 1.8143   LearningRate 0.0003   Epoch: 19   Global Step: 34190   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:15:41,436-Speed 13857.65 samples/sec   Loss 1.8182   LearningRate 0.0003   Epoch: 19   Global Step: 34200   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:15:59,169-Speed 13860.01 samples/sec   Loss 1.8182   LearningRate 0.0003   Epoch: 19   Global Step: 34210   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:16:16,911-Speed 13852.50 samples/sec   Loss 1.8254   LearningRate 0.0003   Epoch: 19   Global Step: 34220   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:16:34,670-Speed 13839.15 samples/sec   Loss 1.8117   LearningRate 0.0003   Epoch: 19   Global Step: 34230   Fp16 Grad Scale: 65536   Required: 18 hours
Training: 2022-03-04 01:16:52,466-Speed 13810.99 samples/sec   Loss 1.8128   LearningRate 0.0003   Epoch: 19   Global Step: 34240   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:17:10,550-Speed 13591.26 samples/sec   Loss 1.8210   LearningRate 0.0003   Epoch: 19   Global Step: 34250   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:17:28,370-Speed 13791.44 samples/sec   Loss 1.8255   LearningRate 0.0003   Epoch: 19   Global Step: 34260   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:17:46,196-Speed 13787.49 samples/sec   Loss 1.8204   LearningRate 0.0003   Epoch: 19   Global Step: 34270   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:18:03,932-Speed 13857.90 samples/sec   Loss 1.8126   LearningRate 0.0003   Epoch: 19   Global Step: 34280   Fp16 Grad Scale: 32768   Required: 18 hours
Training: 2022-03-04 01:18:21,656-Speed 13868.33 samples/sec   Loss 1.8151   LearningRate 0.0003   Epoch: 19   Global Step: 34290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:18:39,452-Speed 13810.92 samples/sec   Loss 1.8177   LearningRate 0.0003   Epoch: 19   Global Step: 34300   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:18:57,204-Speed 13844.45 samples/sec   Loss 1.8134   LearningRate 0.0003   Epoch: 19   Global Step: 34310   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:19:15,083-Speed 13747.07 samples/sec   Loss 1.8269   LearningRate 0.0003   Epoch: 19   Global Step: 34320   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:19:32,935-Speed 13767.70 samples/sec   Loss 1.8289   LearningRate 0.0003   Epoch: 19   Global Step: 34330   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:19:50,697-Speed 13836.82 samples/sec   Loss 1.8152   LearningRate 0.0003   Epoch: 19   Global Step: 34340   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:20:08,414-Speed 13871.74 samples/sec   Loss 1.8172   LearningRate 0.0003   Epoch: 19   Global Step: 34350   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:20:26,266-Speed 13767.58 samples/sec   Loss 1.8182   LearningRate 0.0003   Epoch: 19   Global Step: 34360   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:20:44,004-Speed 13856.53 samples/sec   Loss 1.8106   LearningRate 0.0003   Epoch: 19   Global Step: 34370   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:21:01,776-Speed 13829.43 samples/sec   Loss 1.8258   LearningRate 0.0003   Epoch: 19   Global Step: 34380   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:21:19,548-Speed 13829.09 samples/sec   Loss 1.8227   LearningRate 0.0003   Epoch: 19   Global Step: 34390   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:21:37,286-Speed 13855.64 samples/sec   Loss 1.8133   LearningRate 0.0003   Epoch: 19   Global Step: 34400   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:21:55,013-Speed 13864.54 samples/sec   Loss 1.8205   LearningRate 0.0003   Epoch: 19   Global Step: 34410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:22:12,806-Speed 13813.11 samples/sec   Loss 1.8126   LearningRate 0.0003   Epoch: 19   Global Step: 34420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:22:30,563-Speed 13841.46 samples/sec   Loss 1.8190   LearningRate 0.0003   Epoch: 19   Global Step: 34430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:22:48,247-Speed 13897.99 samples/sec   Loss 1.8083   LearningRate 0.0003   Epoch: 19   Global Step: 34440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:23:06,008-Speed 13837.69 samples/sec   Loss 1.8099   LearningRate 0.0003   Epoch: 19   Global Step: 34450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:23:23,769-Speed 13838.71 samples/sec   Loss 1.8057   LearningRate 0.0003   Epoch: 19   Global Step: 34460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:23:41,537-Speed 13832.50 samples/sec   Loss 1.8101   LearningRate 0.0003   Epoch: 19   Global Step: 34470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:23:59,274-Speed 13856.19 samples/sec   Loss 1.8173   LearningRate 0.0003   Epoch: 19   Global Step: 34480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:24:17,192-Speed 13716.60 samples/sec   Loss 1.8276   LearningRate 0.0003   Epoch: 19   Global Step: 34490   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:24:34,951-Speed 13840.14 samples/sec   Loss 1.8391   LearningRate 0.0003   Epoch: 19   Global Step: 34500   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:24:52,788-Speed 13778.83 samples/sec   Loss 1.8232   LearningRate 0.0003   Epoch: 19   Global Step: 34510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:25:10,561-Speed 13828.52 samples/sec   Loss 1.8260   LearningRate 0.0003   Epoch: 19   Global Step: 34520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:25:28,275-Speed 13874.55 samples/sec   Loss 1.8242   LearningRate 0.0003   Epoch: 19   Global Step: 34530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:25:46,006-Speed 13861.47 samples/sec   Loss 1.8146   LearningRate 0.0003   Epoch: 19   Global Step: 34540   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:26:03,784-Speed 13824.32 samples/sec   Loss 1.8153   LearningRate 0.0003   Epoch: 19   Global Step: 34550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:26:21,612-Speed 13785.99 samples/sec   Loss 1.8345   LearningRate 0.0003   Epoch: 19   Global Step: 34560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:27:30,391-Speed 3573.26 samples/sec   Loss 1.8166   LearningRate 0.0003   Epoch: 20   Global Step: 34570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:27:47,994-Speed 13962.37 samples/sec   Loss 1.7775   LearningRate 0.0003   Epoch: 20   Global Step: 34580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:28:05,795-Speed 13806.66 samples/sec   Loss 1.7887   LearningRate 0.0003   Epoch: 20   Global Step: 34590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:28:23,452-Speed 13919.51 samples/sec   Loss 1.7873   LearningRate 0.0003   Epoch: 20   Global Step: 34600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:28:41,206-Speed 13843.58 samples/sec   Loss 1.7902   LearningRate 0.0003   Epoch: 20   Global Step: 34610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:28:58,942-Speed 13857.14 samples/sec   Loss 1.7915   LearningRate 0.0003   Epoch: 20   Global Step: 34620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:29:16,721-Speed 13824.03 samples/sec   Loss 1.7772   LearningRate 0.0003   Epoch: 20   Global Step: 34630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:29:34,439-Speed 13875.40 samples/sec   Loss 1.7957   LearningRate 0.0003   Epoch: 20   Global Step: 34640   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:29:52,145-Speed 13880.32 samples/sec   Loss 1.7866   LearningRate 0.0003   Epoch: 20   Global Step: 34650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:30:09,912-Speed 13835.21 samples/sec   Loss 1.8010   LearningRate 0.0003   Epoch: 20   Global Step: 34660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:30:27,697-Speed 13819.11 samples/sec   Loss 1.7847   LearningRate 0.0003   Epoch: 20   Global Step: 34670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:30:45,434-Speed 13856.07 samples/sec   Loss 1.7985   LearningRate 0.0003   Epoch: 20   Global Step: 34680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:31:03,125-Speed 13892.72 samples/sec   Loss 1.7909   LearningRate 0.0003   Epoch: 20   Global Step: 34690   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:31:20,870-Speed 13851.05 samples/sec   Loss 1.7902   LearningRate 0.0003   Epoch: 20   Global Step: 34700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:31:38,652-Speed 13821.10 samples/sec   Loss 1.7903   LearningRate 0.0003   Epoch: 20   Global Step: 34710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:31:56,388-Speed 13857.94 samples/sec   Loss 1.8135   LearningRate 0.0003   Epoch: 20   Global Step: 34720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:32:14,122-Speed 13858.55 samples/sec   Loss 1.7949   LearningRate 0.0003   Epoch: 20   Global Step: 34730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:32:31,887-Speed 13834.73 samples/sec   Loss 1.7840   LearningRate 0.0003   Epoch: 20   Global Step: 34740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:32:49,583-Speed 13889.35 samples/sec   Loss 1.7822   LearningRate 0.0003   Epoch: 20   Global Step: 34750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:33:07,358-Speed 13827.12 samples/sec   Loss 1.7909   LearningRate 0.0003   Epoch: 20   Global Step: 34760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:33:25,137-Speed 13823.84 samples/sec   Loss 1.7931   LearningRate 0.0003   Epoch: 20   Global Step: 34770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:33:42,863-Speed 13865.06 samples/sec   Loss 1.8030   LearningRate 0.0003   Epoch: 20   Global Step: 34780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:34:00,574-Speed 13877.38 samples/sec   Loss 1.7957   LearningRate 0.0003   Epoch: 20   Global Step: 34790   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:34:18,336-Speed 13836.88 samples/sec   Loss 1.7932   LearningRate 0.0003   Epoch: 20   Global Step: 34800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:34:36,055-Speed 13870.77 samples/sec   Loss 1.8052   LearningRate 0.0003   Epoch: 20   Global Step: 34810   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:34:53,857-Speed 13806.48 samples/sec   Loss 1.7986   LearningRate 0.0003   Epoch: 20   Global Step: 34820   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:35:11,603-Speed 13849.18 samples/sec   Loss 1.7933   LearningRate 0.0003   Epoch: 20   Global Step: 34830   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:35:29,344-Speed 13854.02 samples/sec   Loss 1.7862   LearningRate 0.0003   Epoch: 20   Global Step: 34840   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:35:47,109-Speed 13834.27 samples/sec   Loss 1.8023   LearningRate 0.0003   Epoch: 20   Global Step: 34850   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:36:04,841-Speed 13860.98 samples/sec   Loss 1.7928   LearningRate 0.0003   Epoch: 20   Global Step: 34860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:36:22,596-Speed 13843.48 samples/sec   Loss 1.7910   LearningRate 0.0003   Epoch: 20   Global Step: 34870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:36:40,308-Speed 13876.09 samples/sec   Loss 1.7828   LearningRate 0.0003   Epoch: 20   Global Step: 34880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:36:58,027-Speed 13870.49 samples/sec   Loss 1.7952   LearningRate 0.0003   Epoch: 20   Global Step: 34890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:37:15,741-Speed 13875.34 samples/sec   Loss 1.7752   LearningRate 0.0003   Epoch: 20   Global Step: 34900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:37:33,517-Speed 13826.36 samples/sec   Loss 1.7885   LearningRate 0.0003   Epoch: 20   Global Step: 34910   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:37:51,280-Speed 13836.25 samples/sec   Loss 1.7820   LearningRate 0.0003   Epoch: 20   Global Step: 34920   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:38:09,020-Speed 13854.44 samples/sec   Loss 1.7880   LearningRate 0.0003   Epoch: 20   Global Step: 34930   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:38:26,788-Speed 13832.12 samples/sec   Loss 1.7965   LearningRate 0.0003   Epoch: 20   Global Step: 34940   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:38:44,519-Speed 13861.41 samples/sec   Loss 1.7799   LearningRate 0.0003   Epoch: 20   Global Step: 34950   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:39:02,269-Speed 13846.99 samples/sec   Loss 1.7939   LearningRate 0.0003   Epoch: 20   Global Step: 34960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:39:20,040-Speed 13829.99 samples/sec   Loss 1.7906   LearningRate 0.0003   Epoch: 20   Global Step: 34970   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:39:37,793-Speed 13843.58 samples/sec   Loss 1.7842   LearningRate 0.0003   Epoch: 20   Global Step: 34980   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:39:55,504-Speed 13877.36 samples/sec   Loss 1.7861   LearningRate 0.0003   Epoch: 20   Global Step: 34990   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:40:13,204-Speed 13886.03 samples/sec   Loss 1.8052   LearningRate 0.0003   Epoch: 20   Global Step: 35000   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:40:30,970-Speed 13834.04 samples/sec   Loss 1.7824   LearningRate 0.0003   Epoch: 20   Global Step: 35010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:40:48,740-Speed 13830.18 samples/sec   Loss 1.7872   LearningRate 0.0003   Epoch: 20   Global Step: 35020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:41:06,435-Speed 13889.38 samples/sec   Loss 1.7803   LearningRate 0.0003   Epoch: 20   Global Step: 35030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:41:24,152-Speed 13873.05 samples/sec   Loss 1.7810   LearningRate 0.0003   Epoch: 20   Global Step: 35040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:41:41,841-Speed 13894.48 samples/sec   Loss 1.7726   LearningRate 0.0003   Epoch: 20   Global Step: 35050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:41:59,630-Speed 13815.48 samples/sec   Loss 1.7864   LearningRate 0.0003   Epoch: 20   Global Step: 35060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:42:17,366-Speed 13857.27 samples/sec   Loss 1.7760   LearningRate 0.0003   Epoch: 20   Global Step: 35070   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:42:35,162-Speed 13811.21 samples/sec   Loss 1.7737   LearningRate 0.0003   Epoch: 20   Global Step: 35080   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:42:52,857-Speed 13889.51 samples/sec   Loss 1.7865   LearningRate 0.0003   Epoch: 20   Global Step: 35090   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:43:10,618-Speed 13838.36 samples/sec   Loss 1.7772   LearningRate 0.0003   Epoch: 20   Global Step: 35100   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:43:28,306-Speed 13895.22 samples/sec   Loss 1.7795   LearningRate 0.0003   Epoch: 20   Global Step: 35110   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:43:46,002-Speed 13888.68 samples/sec   Loss 1.7664   LearningRate 0.0003   Epoch: 20   Global Step: 35120   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:44:03,781-Speed 13823.70 samples/sec   Loss 1.7792   LearningRate 0.0003   Epoch: 20   Global Step: 35130   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:44:21,535-Speed 13843.66 samples/sec   Loss 1.7625   LearningRate 0.0003   Epoch: 20   Global Step: 35140   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:44:39,279-Speed 13851.35 samples/sec   Loss 1.7658   LearningRate 0.0003   Epoch: 20   Global Step: 35150   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:44:57,130-Speed 13768.44 samples/sec   Loss 1.7836   LearningRate 0.0003   Epoch: 20   Global Step: 35160   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:45:14,985-Speed 13765.09 samples/sec   Loss 1.7730   LearningRate 0.0003   Epoch: 20   Global Step: 35170   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:45:32,785-Speed 13807.74 samples/sec   Loss 1.7653   LearningRate 0.0003   Epoch: 20   Global Step: 35180   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:45:50,628-Speed 13773.73 samples/sec   Loss 1.7648   LearningRate 0.0003   Epoch: 20   Global Step: 35190   Fp16 Grad Scale: 16384   Required: 17 hours
Training: 2022-03-04 01:46:08,657-Speed 13632.78 samples/sec   Loss 1.7692   LearningRate 0.0003   Epoch: 20   Global Step: 35200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:46:26,784-Speed 13558.68 samples/sec   Loss 1.7763   LearningRate 0.0003   Epoch: 20   Global Step: 35210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:46:44,859-Speed 13597.80 samples/sec   Loss 1.7788   LearningRate 0.0003   Epoch: 20   Global Step: 35220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:47:03,007-Speed 13543.03 samples/sec   Loss 1.7850   LearningRate 0.0003   Epoch: 20   Global Step: 35230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:47:21,063-Speed 13611.82 samples/sec   Loss 1.7803   LearningRate 0.0003   Epoch: 20   Global Step: 35240   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:47:38,757-Speed 13890.76 samples/sec   Loss 1.7781   LearningRate 0.0003   Epoch: 20   Global Step: 35250   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:47:56,489-Speed 13861.33 samples/sec   Loss 1.7654   LearningRate 0.0003   Epoch: 20   Global Step: 35260   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:48:14,318-Speed 13784.65 samples/sec   Loss 1.7605   LearningRate 0.0003   Epoch: 20   Global Step: 35270   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:48:32,071-Speed 13844.23 samples/sec   Loss 1.7742   LearningRate 0.0003   Epoch: 20   Global Step: 35280   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:48:49,812-Speed 13854.26 samples/sec   Loss 1.7772   LearningRate 0.0003   Epoch: 20   Global Step: 35290   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:49:07,568-Speed 13841.84 samples/sec   Loss 1.7747   LearningRate 0.0003   Epoch: 20   Global Step: 35300   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:49:25,443-Speed 13750.00 samples/sec   Loss 1.7675   LearningRate 0.0003   Epoch: 20   Global Step: 35310   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:49:43,369-Speed 13709.85 samples/sec   Loss 1.7754   LearningRate 0.0003   Epoch: 20   Global Step: 35320   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:50:01,127-Speed 13841.06 samples/sec   Loss 1.7785   LearningRate 0.0003   Epoch: 20   Global Step: 35330   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:50:18,881-Speed 13842.96 samples/sec   Loss 1.7633   LearningRate 0.0003   Epoch: 20   Global Step: 35340   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:50:36,582-Speed 13885.23 samples/sec   Loss 1.7661   LearningRate 0.0003   Epoch: 20   Global Step: 35350   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:50:54,374-Speed 13813.95 samples/sec   Loss 1.7644   LearningRate 0.0003   Epoch: 20   Global Step: 35360   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:51:12,146-Speed 13829.40 samples/sec   Loss 1.7732   LearningRate 0.0003   Epoch: 20   Global Step: 35370   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:51:29,988-Speed 13775.25 samples/sec   Loss 1.7778   LearningRate 0.0003   Epoch: 20   Global Step: 35380   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:51:47,675-Speed 13895.57 samples/sec   Loss 1.7586   LearningRate 0.0003   Epoch: 20   Global Step: 35390   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:52:05,399-Speed 13866.92 samples/sec   Loss 1.7707   LearningRate 0.0003   Epoch: 20   Global Step: 35400   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:52:23,067-Speed 13911.74 samples/sec   Loss 1.7649   LearningRate 0.0003   Epoch: 20   Global Step: 35410   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:52:40,914-Speed 13771.85 samples/sec   Loss 1.7625   LearningRate 0.0003   Epoch: 20   Global Step: 35420   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:52:58,711-Speed 13810.84 samples/sec   Loss 1.7629   LearningRate 0.0003   Epoch: 20   Global Step: 35430   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:53:16,479-Speed 13832.38 samples/sec   Loss 1.7689   LearningRate 0.0003   Epoch: 20   Global Step: 35440   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:53:34,319-Speed 13777.01 samples/sec   Loss 1.7665   LearningRate 0.0003   Epoch: 20   Global Step: 35450   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:53:52,144-Speed 13788.23 samples/sec   Loss 1.7525   LearningRate 0.0003   Epoch: 20   Global Step: 35460   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:54:09,980-Speed 13779.82 samples/sec   Loss 1.7663   LearningRate 0.0003   Epoch: 20   Global Step: 35470   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:54:28,075-Speed 13583.52 samples/sec   Loss 1.7616   LearningRate 0.0003   Epoch: 20   Global Step: 35480   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:54:46,091-Speed 13641.55 samples/sec   Loss 1.7629   LearningRate 0.0003   Epoch: 20   Global Step: 35490   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:55:04,226-Speed 13552.70 samples/sec   Loss 1.7665   LearningRate 0.0003   Epoch: 20   Global Step: 35500   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:55:22,262-Speed 13628.22 samples/sec   Loss 1.7583   LearningRate 0.0003   Epoch: 20   Global Step: 35510   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:55:40,022-Speed 13838.62 samples/sec   Loss 1.7477   LearningRate 0.0003   Epoch: 20   Global Step: 35520   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:55:57,944-Speed 13713.76 samples/sec   Loss 1.7426   LearningRate 0.0003   Epoch: 20   Global Step: 35530   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:56:15,752-Speed 13801.93 samples/sec   Loss 1.7585   LearningRate 0.0003   Epoch: 20   Global Step: 35540   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:56:33,656-Speed 13727.66 samples/sec   Loss 1.7638   LearningRate 0.0003   Epoch: 20   Global Step: 35550   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:56:51,512-Speed 13763.84 samples/sec   Loss 1.7562   LearningRate 0.0003   Epoch: 20   Global Step: 35560   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:57:09,332-Speed 13792.19 samples/sec   Loss 1.7701   LearningRate 0.0003   Epoch: 20   Global Step: 35570   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:57:27,229-Speed 13733.32 samples/sec   Loss 1.7611   LearningRate 0.0003   Epoch: 20   Global Step: 35580   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:57:45,021-Speed 13813.87 samples/sec   Loss 1.7525   LearningRate 0.0003   Epoch: 20   Global Step: 35590   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:58:02,769-Speed 13848.37 samples/sec   Loss 1.7653   LearningRate 0.0003   Epoch: 20   Global Step: 35600   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:58:20,469-Speed 13885.71 samples/sec   Loss 1.7310   LearningRate 0.0003   Epoch: 20   Global Step: 35610   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:58:38,231-Speed 13837.24 samples/sec   Loss 1.7537   LearningRate 0.0003   Epoch: 20   Global Step: 35620   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:58:55,969-Speed 13855.69 samples/sec   Loss 1.7505   LearningRate 0.0003   Epoch: 20   Global Step: 35630   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:59:13,735-Speed 13833.81 samples/sec   Loss 1.7547   LearningRate 0.0003   Epoch: 20   Global Step: 35640   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 01:59:31,552-Speed 13794.39 samples/sec   Loss 1.7559   LearningRate 0.0003   Epoch: 20   Global Step: 35650   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 01:59:49,311-Speed 13839.98 samples/sec   Loss 1.7581   LearningRate 0.0003   Epoch: 20   Global Step: 35660   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:00:07,010-Speed 13886.34 samples/sec   Loss 1.7490   LearningRate 0.0003   Epoch: 20   Global Step: 35670   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:00:24,690-Speed 13901.49 samples/sec   Loss 1.7372   LearningRate 0.0003   Epoch: 20   Global Step: 35680   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:00:42,466-Speed 13825.59 samples/sec   Loss 1.7554   LearningRate 0.0003   Epoch: 20   Global Step: 35690   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:01:00,154-Speed 13895.18 samples/sec   Loss 1.7456   LearningRate 0.0003   Epoch: 20   Global Step: 35700   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:01:17,903-Speed 13848.95 samples/sec   Loss 1.7437   LearningRate 0.0003   Epoch: 20   Global Step: 35710   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:01:35,612-Speed 13878.45 samples/sec   Loss 1.7521   LearningRate 0.0003   Epoch: 20   Global Step: 35720   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:01:53,304-Speed 13892.73 samples/sec   Loss 1.7471   LearningRate 0.0003   Epoch: 20   Global Step: 35730   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:02:11,042-Speed 13856.23 samples/sec   Loss 1.7623   LearningRate 0.0003   Epoch: 20   Global Step: 35740   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:02:28,832-Speed 13815.52 samples/sec   Loss 1.7460   LearningRate 0.0003   Epoch: 20   Global Step: 35750   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:02:46,654-Speed 13792.93 samples/sec   Loss 1.7382   LearningRate 0.0003   Epoch: 20   Global Step: 35760   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:03:04,413-Speed 13841.62 samples/sec   Loss 1.7462   LearningRate 0.0003   Epoch: 20   Global Step: 35770   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:03:22,095-Speed 13899.72 samples/sec   Loss 1.7468   LearningRate 0.0003   Epoch: 20   Global Step: 35780   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:03:39,863-Speed 13832.24 samples/sec   Loss 1.7432   LearningRate 0.0003   Epoch: 20   Global Step: 35790   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:03:57,579-Speed 13873.77 samples/sec   Loss 1.7480   LearningRate 0.0003   Epoch: 20   Global Step: 35800   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:04:15,366-Speed 13817.15 samples/sec   Loss 1.7410   LearningRate 0.0003   Epoch: 20   Global Step: 35810   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:04:33,087-Speed 13869.03 samples/sec   Loss 1.7351   LearningRate 0.0003   Epoch: 20   Global Step: 35820   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:04:50,873-Speed 13818.45 samples/sec   Loss 1.7502   LearningRate 0.0003   Epoch: 20   Global Step: 35830   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:05:08,644-Speed 13831.76 samples/sec   Loss 1.7428   LearningRate 0.0003   Epoch: 20   Global Step: 35840   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:05:26,424-Speed 13824.01 samples/sec   Loss 1.7469   LearningRate 0.0003   Epoch: 20   Global Step: 35850   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:05:44,197-Speed 13828.91 samples/sec   Loss 1.7361   LearningRate 0.0003   Epoch: 20   Global Step: 35860   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:06:01,986-Speed 13816.93 samples/sec   Loss 1.7565   LearningRate 0.0003   Epoch: 20   Global Step: 35870   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:06:19,744-Speed 13840.70 samples/sec   Loss 1.7395   LearningRate 0.0003   Epoch: 20   Global Step: 35880   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:06:37,539-Speed 13811.49 samples/sec   Loss 1.7500   LearningRate 0.0003   Epoch: 20   Global Step: 35890   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:06:55,288-Speed 13847.40 samples/sec   Loss 1.7495   LearningRate 0.0003   Epoch: 20   Global Step: 35900   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:07:13,019-Speed 13860.97 samples/sec   Loss 1.7348   LearningRate 0.0003   Epoch: 20   Global Step: 35910   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:07:30,775-Speed 13842.34 samples/sec   Loss 1.7395   LearningRate 0.0003   Epoch: 20   Global Step: 35920   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:07:48,541-Speed 13834.24 samples/sec   Loss 1.7458   LearningRate 0.0003   Epoch: 20   Global Step: 35930   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:08:06,370-Speed 13785.39 samples/sec   Loss 1.7329   LearningRate 0.0003   Epoch: 20   Global Step: 35940   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:08:24,155-Speed 13818.98 samples/sec   Loss 1.7286   LearningRate 0.0003   Epoch: 20   Global Step: 35950   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:08:41,884-Speed 13863.16 samples/sec   Loss 1.7276   LearningRate 0.0003   Epoch: 20   Global Step: 35960   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:08:59,740-Speed 13764.56 samples/sec   Loss 1.7224   LearningRate 0.0003   Epoch: 20   Global Step: 35970   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:09:17,600-Speed 13762.11 samples/sec   Loss 1.7410   LearningRate 0.0003   Epoch: 20   Global Step: 35980   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:09:35,369-Speed 13831.47 samples/sec   Loss 1.7457   LearningRate 0.0003   Epoch: 20   Global Step: 35990   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:09:53,144-Speed 13827.55 samples/sec   Loss 1.7499   LearningRate 0.0003   Epoch: 20   Global Step: 36000   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:10:10,861-Speed 13872.42 samples/sec   Loss 1.7311   LearningRate 0.0003   Epoch: 20   Global Step: 36010   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:10:28,668-Speed 13802.51 samples/sec   Loss 1.7480   LearningRate 0.0003   Epoch: 20   Global Step: 36020   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:10:46,416-Speed 13847.58 samples/sec   Loss 1.7419   LearningRate 0.0003   Epoch: 20   Global Step: 36030   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:11:04,266-Speed 13768.59 samples/sec   Loss 1.7251   LearningRate 0.0003   Epoch: 20   Global Step: 36040   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:11:22,008-Speed 13853.48 samples/sec   Loss 1.7266   LearningRate 0.0003   Epoch: 20   Global Step: 36050   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:11:39,772-Speed 13835.32 samples/sec   Loss 1.7348   LearningRate 0.0003   Epoch: 20   Global Step: 36060   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:11:57,469-Speed 13887.80 samples/sec   Loss 1.7469   LearningRate 0.0003   Epoch: 20   Global Step: 36070   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:12:15,228-Speed 13839.45 samples/sec   Loss 1.7315   LearningRate 0.0003   Epoch: 20   Global Step: 36080   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:12:33,000-Speed 13830.01 samples/sec   Loss 1.7331   LearningRate 0.0003   Epoch: 20   Global Step: 36090   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:12:50,859-Speed 13762.32 samples/sec   Loss 1.7348   LearningRate 0.0003   Epoch: 20   Global Step: 36100   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:13:08,669-Speed 13799.81 samples/sec   Loss 1.7327   LearningRate 0.0003   Epoch: 20   Global Step: 36110   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:13:26,393-Speed 13866.54 samples/sec   Loss 1.7403   LearningRate 0.0003   Epoch: 20   Global Step: 36120   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:13:44,108-Speed 13874.25 samples/sec   Loss 1.7312   LearningRate 0.0003   Epoch: 20   Global Step: 36130   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:14:01,830-Speed 13868.12 samples/sec   Loss 1.7200   LearningRate 0.0003   Epoch: 20   Global Step: 36140   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:14:19,697-Speed 13756.15 samples/sec   Loss 1.7195   LearningRate 0.0003   Epoch: 20   Global Step: 36150   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:14:37,503-Speed 13802.72 samples/sec   Loss 1.7189   LearningRate 0.0003   Epoch: 20   Global Step: 36160   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:14:55,336-Speed 13782.36 samples/sec   Loss 1.7342   LearningRate 0.0003   Epoch: 20   Global Step: 36170   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:15:13,146-Speed 13800.26 samples/sec   Loss 1.7266   LearningRate 0.0003   Epoch: 20   Global Step: 36180   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:15:30,987-Speed 13776.19 samples/sec   Loss 1.7440   LearningRate 0.0003   Epoch: 20   Global Step: 36190   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:15:48,746-Speed 13839.40 samples/sec   Loss 1.7323   LearningRate 0.0003   Epoch: 20   Global Step: 36200   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:16:06,536-Speed 13816.55 samples/sec   Loss 1.7320   LearningRate 0.0003   Epoch: 20   Global Step: 36210   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:16:24,392-Speed 13764.24 samples/sec   Loss 1.7440   LearningRate 0.0003   Epoch: 20   Global Step: 36220   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:16:42,355-Speed 13682.02 samples/sec   Loss 1.7350   LearningRate 0.0003   Epoch: 20   Global Step: 36230   Fp16 Grad Scale: 32768   Required: 17 hours
Training: 2022-03-04 02:17:00,165-Speed 13800.84 samples/sec   Loss 1.7408   LearningRate 0.0003   Epoch: 20   Global Step: 36240   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:17:17,999-Speed 13781.15 samples/sec   Loss 1.7416   LearningRate 0.0003   Epoch: 20   Global Step: 36250   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:17:35,886-Speed 13740.28 samples/sec   Loss 1.7473   LearningRate 0.0003   Epoch: 20   Global Step: 36260   Fp16 Grad Scale: 65536   Required: 17 hours
Training: 2022-03-04 02:17:53,768-Speed 13744.63 samples/sec   Loss 1.7442   LearningRate 0.0003   Epoch: 20   Global Step: 36270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:18:11,499-Speed 13861.47 samples/sec   Loss 1.7421   LearningRate 0.0003   Epoch: 20   Global Step: 36280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:18:29,309-Speed 13801.37 samples/sec   Loss 1.7503   LearningRate 0.0003   Epoch: 20   Global Step: 36290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:19:36,965-Speed 3632.54 samples/sec   Loss 1.7399   LearningRate 0.0003   Epoch: 21   Global Step: 36300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:19:54,700-Speed 13858.53 samples/sec   Loss 1.6956   LearningRate 0.0003   Epoch: 21   Global Step: 36310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:20:12,513-Speed 13797.22 samples/sec   Loss 1.7091   LearningRate 0.0003   Epoch: 21   Global Step: 36320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:20:30,245-Speed 13860.89 samples/sec   Loss 1.7294   LearningRate 0.0003   Epoch: 21   Global Step: 36330   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:20:48,077-Speed 13782.61 samples/sec   Loss 1.7139   LearningRate 0.0003   Epoch: 21   Global Step: 36340   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:21:05,921-Speed 13774.00 samples/sec   Loss 1.7020   LearningRate 0.0003   Epoch: 21   Global Step: 36350   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:21:23,802-Speed 13744.28 samples/sec   Loss 1.7091   LearningRate 0.0003   Epoch: 21   Global Step: 36360   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:21:41,713-Speed 13722.42 samples/sec   Loss 1.7156   LearningRate 0.0003   Epoch: 21   Global Step: 36370   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:21:59,647-Speed 13704.57 samples/sec   Loss 1.7085   LearningRate 0.0003   Epoch: 21   Global Step: 36380   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:22:17,532-Speed 13742.03 samples/sec   Loss 1.7040   LearningRate 0.0003   Epoch: 21   Global Step: 36390   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:22:35,514-Speed 13667.77 samples/sec   Loss 1.7139   LearningRate 0.0003   Epoch: 21   Global Step: 36400   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:22:53,414-Speed 13730.37 samples/sec   Loss 1.7030   LearningRate 0.0003   Epoch: 21   Global Step: 36410   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:23:11,316-Speed 13729.67 samples/sec   Loss 1.7165   LearningRate 0.0003   Epoch: 21   Global Step: 36420   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:23:29,493-Speed 13521.24 samples/sec   Loss 1.7177   LearningRate 0.0003   Epoch: 21   Global Step: 36430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:23:47,669-Speed 13523.20 samples/sec   Loss 1.7120   LearningRate 0.0003   Epoch: 21   Global Step: 36440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:24:05,877-Speed 13498.15 samples/sec   Loss 1.7035   LearningRate 0.0003   Epoch: 21   Global Step: 36450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:24:23,823-Speed 13695.38 samples/sec   Loss 1.7102   LearningRate 0.0003   Epoch: 21   Global Step: 36460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:24:41,585-Speed 13837.15 samples/sec   Loss 1.7104   LearningRate 0.0003   Epoch: 21   Global Step: 36470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:24:59,328-Speed 13851.68 samples/sec   Loss 1.7222   LearningRate 0.0003   Epoch: 21   Global Step: 36480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:25:17,077-Speed 13847.25 samples/sec   Loss 1.7201   LearningRate 0.0003   Epoch: 21   Global Step: 36490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:25:34,994-Speed 13718.27 samples/sec   Loss 1.7233   LearningRate 0.0003   Epoch: 21   Global Step: 36500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:25:52,762-Speed 13832.16 samples/sec   Loss 1.7071   LearningRate 0.0003   Epoch: 21   Global Step: 36510   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:26:10,515-Speed 13843.70 samples/sec   Loss 1.7288   LearningRate 0.0003   Epoch: 21   Global Step: 36520   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:26:28,491-Speed 13672.91 samples/sec   Loss 1.7147   LearningRate 0.0003   Epoch: 21   Global Step: 36530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:26:46,289-Speed 13809.50 samples/sec   Loss 1.7206   LearningRate 0.0003   Epoch: 21   Global Step: 36540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:27:04,132-Speed 13774.02 samples/sec   Loss 1.7211   LearningRate 0.0003   Epoch: 21   Global Step: 36550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:27:21,882-Speed 13846.95 samples/sec   Loss 1.7150   LearningRate 0.0003   Epoch: 21   Global Step: 36560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:27:39,744-Speed 13759.35 samples/sec   Loss 1.6997   LearningRate 0.0003   Epoch: 21   Global Step: 36570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:27:57,527-Speed 13821.92 samples/sec   Loss 1.6962   LearningRate 0.0003   Epoch: 21   Global Step: 36580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:28:15,549-Speed 13637.77 samples/sec   Loss 1.7053   LearningRate 0.0003   Epoch: 21   Global Step: 36590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:28:33,606-Speed 13611.11 samples/sec   Loss 1.6995   LearningRate 0.0003   Epoch: 21   Global Step: 36600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:28:51,419-Speed 13797.46 samples/sec   Loss 1.7121   LearningRate 0.0003   Epoch: 21   Global Step: 36610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:29:09,300-Speed 13744.89 samples/sec   Loss 1.7205   LearningRate 0.0003   Epoch: 21   Global Step: 36620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:29:27,144-Speed 13773.27 samples/sec   Loss 1.7334   LearningRate 0.0003   Epoch: 21   Global Step: 36630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:29:44,944-Speed 13808.20 samples/sec   Loss 1.7118   LearningRate 0.0003   Epoch: 21   Global Step: 36640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:30:02,757-Speed 13797.07 samples/sec   Loss 1.7077   LearningRate 0.0003   Epoch: 21   Global Step: 36650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:30:20,562-Speed 13804.15 samples/sec   Loss 1.7160   LearningRate 0.0003   Epoch: 21   Global Step: 36660   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:30:38,399-Speed 13778.56 samples/sec   Loss 1.7088   LearningRate 0.0003   Epoch: 21   Global Step: 36670   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:30:56,201-Speed 13806.34 samples/sec   Loss 1.7109   LearningRate 0.0003   Epoch: 21   Global Step: 36680   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:31:14,104-Speed 13728.18 samples/sec   Loss 1.7070   LearningRate 0.0003   Epoch: 21   Global Step: 36690   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:31:31,980-Speed 13748.92 samples/sec   Loss 1.7137   LearningRate 0.0003   Epoch: 21   Global Step: 36700   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:31:49,873-Speed 13735.77 samples/sec   Loss 1.7200   LearningRate 0.0003   Epoch: 21   Global Step: 36710   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:32:07,708-Speed 13780.49 samples/sec   Loss 1.7049   LearningRate 0.0003   Epoch: 21   Global Step: 36720   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:32:25,663-Speed 13688.12 samples/sec   Loss 1.7169   LearningRate 0.0003   Epoch: 21   Global Step: 36730   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:32:43,672-Speed 13647.53 samples/sec   Loss 1.7137   LearningRate 0.0003   Epoch: 21   Global Step: 36740   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:33:01,728-Speed 13612.41 samples/sec   Loss 1.7123   LearningRate 0.0003   Epoch: 21   Global Step: 36750   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:33:19,586-Speed 13762.36 samples/sec   Loss 1.6982   LearningRate 0.0003   Epoch: 21   Global Step: 36760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:33:37,485-Speed 13731.56 samples/sec   Loss 1.7069   LearningRate 0.0003   Epoch: 21   Global Step: 36770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:33:55,356-Speed 13753.82 samples/sec   Loss 1.7034   LearningRate 0.0003   Epoch: 21   Global Step: 36780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:34:13,349-Speed 13659.93 samples/sec   Loss 1.6935   LearningRate 0.0003   Epoch: 21   Global Step: 36790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:34:31,215-Speed 13756.35 samples/sec   Loss 1.6919   LearningRate 0.0003   Epoch: 21   Global Step: 36800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:34:49,155-Speed 13699.99 samples/sec   Loss 1.7008   LearningRate 0.0003   Epoch: 21   Global Step: 36810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:35:07,078-Speed 13712.31 samples/sec   Loss 1.6982   LearningRate 0.0003   Epoch: 21   Global Step: 36820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:35:25,025-Speed 13695.19 samples/sec   Loss 1.7060   LearningRate 0.0003   Epoch: 21   Global Step: 36830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:35:42,936-Speed 13721.63 samples/sec   Loss 1.6949   LearningRate 0.0003   Epoch: 21   Global Step: 36840   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:36:00,670-Speed 13859.77 samples/sec   Loss 1.6960   LearningRate 0.0003   Epoch: 21   Global Step: 36850   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:36:18,633-Speed 13682.62 samples/sec   Loss 1.7094   LearningRate 0.0003   Epoch: 21   Global Step: 36860   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:36:36,413-Speed 13823.27 samples/sec   Loss 1.7138   LearningRate 0.0003   Epoch: 21   Global Step: 36870   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:36:54,304-Speed 13737.15 samples/sec   Loss 1.7022   LearningRate 0.0003   Epoch: 21   Global Step: 36880   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:37:12,259-Speed 13688.34 samples/sec   Loss 1.6955   LearningRate 0.0003   Epoch: 21   Global Step: 36890   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:37:30,234-Speed 13673.25 samples/sec   Loss 1.6924   LearningRate 0.0003   Epoch: 21   Global Step: 36900   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:37:48,052-Speed 13793.75 samples/sec   Loss 1.6870   LearningRate 0.0003   Epoch: 21   Global Step: 36910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:38:05,898-Speed 13773.45 samples/sec   Loss 1.6870   LearningRate 0.0003   Epoch: 21   Global Step: 36920   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:38:23,702-Speed 13805.33 samples/sec   Loss 1.7066   LearningRate 0.0003   Epoch: 21   Global Step: 36930   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:38:41,453-Speed 13846.64 samples/sec   Loss 1.6956   LearningRate 0.0003   Epoch: 21   Global Step: 36940   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:38:59,245-Speed 13814.19 samples/sec   Loss 1.7070   LearningRate 0.0003   Epoch: 21   Global Step: 36950   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:39:17,088-Speed 13774.37 samples/sec   Loss 1.6944   LearningRate 0.0003   Epoch: 21   Global Step: 36960   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:39:34,995-Speed 13725.52 samples/sec   Loss 1.6975   LearningRate 0.0003   Epoch: 21   Global Step: 36970   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:39:52,819-Speed 13788.76 samples/sec   Loss 1.6944   LearningRate 0.0003   Epoch: 21   Global Step: 36980   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:40:10,658-Speed 13777.09 samples/sec   Loss 1.6973   LearningRate 0.0003   Epoch: 21   Global Step: 36990   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:40:28,423-Speed 13835.50 samples/sec   Loss 1.6893   LearningRate 0.0003   Epoch: 21   Global Step: 37000   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:40:46,315-Speed 13736.42 samples/sec   Loss 1.6927   LearningRate 0.0003   Epoch: 21   Global Step: 37010   Fp16 Grad Scale: 16384   Required: 16 hours
Training: 2022-03-04 02:41:04,226-Speed 13721.69 samples/sec   Loss 1.7018   LearningRate 0.0003   Epoch: 21   Global Step: 37020   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:41:22,038-Speed 13798.96 samples/sec   Loss 1.6925   LearningRate 0.0003   Epoch: 21   Global Step: 37030   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:41:39,923-Speed 13741.87 samples/sec   Loss 1.6808   LearningRate 0.0003   Epoch: 21   Global Step: 37040   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:41:57,689-Speed 13833.97 samples/sec   Loss 1.7022   LearningRate 0.0003   Epoch: 21   Global Step: 37050   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:42:15,458-Speed 13831.97 samples/sec   Loss 1.6808   LearningRate 0.0003   Epoch: 21   Global Step: 37060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:42:33,221-Speed 13836.20 samples/sec   Loss 1.6931   LearningRate 0.0003   Epoch: 21   Global Step: 37070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:42:51,139-Speed 13716.78 samples/sec   Loss 1.6901   LearningRate 0.0003   Epoch: 21   Global Step: 37080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:43:09,093-Speed 13689.69 samples/sec   Loss 1.6843   LearningRate 0.0003   Epoch: 21   Global Step: 37090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:43:26,788-Speed 13891.54 samples/sec   Loss 1.6922   LearningRate 0.0003   Epoch: 21   Global Step: 37100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:43:44,518-Speed 13861.90 samples/sec   Loss 1.6869   LearningRate 0.0003   Epoch: 21   Global Step: 37110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:44:02,361-Speed 13774.24 samples/sec   Loss 1.6739   LearningRate 0.0003   Epoch: 21   Global Step: 37120   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:44:20,186-Speed 13788.60 samples/sec   Loss 1.6824   LearningRate 0.0003   Epoch: 21   Global Step: 37130   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:44:38,020-Speed 13781.08 samples/sec   Loss 1.6820   LearningRate 0.0003   Epoch: 21   Global Step: 37140   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:44:55,730-Speed 13878.82 samples/sec   Loss 1.6725   LearningRate 0.0003   Epoch: 21   Global Step: 37150   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:45:13,588-Speed 13761.92 samples/sec   Loss 1.6972   LearningRate 0.0003   Epoch: 21   Global Step: 37160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:45:31,465-Speed 13748.42 samples/sec   Loss 1.6883   LearningRate 0.0003   Epoch: 21   Global Step: 37170   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:45:49,312-Speed 13771.56 samples/sec   Loss 1.6794   LearningRate 0.0003   Epoch: 21   Global Step: 37180   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:46:07,150-Speed 13778.21 samples/sec   Loss 1.6792   LearningRate 0.0003   Epoch: 21   Global Step: 37190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:46:24,983-Speed 13782.09 samples/sec   Loss 1.6949   LearningRate 0.0003   Epoch: 21   Global Step: 37200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:46:42,823-Speed 13776.15 samples/sec   Loss 1.6847   LearningRate 0.0003   Epoch: 21   Global Step: 37210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:47:00,714-Speed 13737.78 samples/sec   Loss 1.6899   LearningRate 0.0003   Epoch: 21   Global Step: 37220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:47:18,566-Speed 13766.94 samples/sec   Loss 1.6825   LearningRate 0.0003   Epoch: 21   Global Step: 37230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:47:36,576-Speed 13647.11 samples/sec   Loss 1.6700   LearningRate 0.0003   Epoch: 21   Global Step: 37240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:47:54,359-Speed 13820.86 samples/sec   Loss 1.6821   LearningRate 0.0003   Epoch: 21   Global Step: 37250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:48:12,163-Speed 13804.87 samples/sec   Loss 1.6784   LearningRate 0.0003   Epoch: 21   Global Step: 37260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:48:29,940-Speed 13825.17 samples/sec   Loss 1.6655   LearningRate 0.0003   Epoch: 21   Global Step: 37270   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:48:47,764-Speed 13788.70 samples/sec   Loss 1.6739   LearningRate 0.0003   Epoch: 21   Global Step: 37280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:49:05,550-Speed 13819.65 samples/sec   Loss 1.6802   LearningRate 0.0003   Epoch: 21   Global Step: 37290   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:49:23,400-Speed 13769.16 samples/sec   Loss 1.6790   LearningRate 0.0003   Epoch: 21   Global Step: 37300   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:49:41,214-Speed 13796.58 samples/sec   Loss 1.6724   LearningRate 0.0003   Epoch: 21   Global Step: 37310   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:49:58,965-Speed 13845.62 samples/sec   Loss 1.6726   LearningRate 0.0003   Epoch: 21   Global Step: 37320   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:50:16,869-Speed 13727.11 samples/sec   Loss 1.6857   LearningRate 0.0003   Epoch: 21   Global Step: 37330   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:50:34,603-Speed 13859.37 samples/sec   Loss 1.6730   LearningRate 0.0003   Epoch: 21   Global Step: 37340   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:50:52,408-Speed 13803.90 samples/sec   Loss 1.6762   LearningRate 0.0003   Epoch: 21   Global Step: 37350   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:51:10,220-Speed 13797.56 samples/sec   Loss 1.6729   LearningRate 0.0003   Epoch: 21   Global Step: 37360   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:51:28,040-Speed 13792.58 samples/sec   Loss 1.6733   LearningRate 0.0003   Epoch: 21   Global Step: 37370   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:51:45,808-Speed 13832.83 samples/sec   Loss 1.6709   LearningRate 0.0003   Epoch: 21   Global Step: 37380   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:52:03,600-Speed 13813.36 samples/sec   Loss 1.6688   LearningRate 0.0003   Epoch: 21   Global Step: 37390   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:52:21,474-Speed 13750.08 samples/sec   Loss 1.6768   LearningRate 0.0003   Epoch: 21   Global Step: 37400   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:52:39,475-Speed 13653.41 samples/sec   Loss 1.6731   LearningRate 0.0003   Epoch: 21   Global Step: 37410   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:52:57,335-Speed 13761.83 samples/sec   Loss 1.6822   LearningRate 0.0003   Epoch: 21   Global Step: 37420   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:53:15,066-Speed 13860.87 samples/sec   Loss 1.6736   LearningRate 0.0003   Epoch: 21   Global Step: 37430   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:53:32,892-Speed 13787.52 samples/sec   Loss 1.6584   LearningRate 0.0003   Epoch: 21   Global Step: 37440   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:53:50,804-Speed 13721.89 samples/sec   Loss 1.6690   LearningRate 0.0003   Epoch: 21   Global Step: 37450   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:54:08,657-Speed 13766.70 samples/sec   Loss 1.6818   LearningRate 0.0003   Epoch: 21   Global Step: 37460   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:54:26,438-Speed 13822.13 samples/sec   Loss 1.6651   LearningRate 0.0003   Epoch: 21   Global Step: 37470   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:54:44,194-Speed 13841.90 samples/sec   Loss 1.6613   LearningRate 0.0003   Epoch: 21   Global Step: 37480   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:55:02,125-Speed 13706.47 samples/sec   Loss 1.6850   LearningRate 0.0003   Epoch: 21   Global Step: 37490   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:55:20,225-Speed 13578.78 samples/sec   Loss 1.6634   LearningRate 0.0003   Epoch: 21   Global Step: 37500   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:55:38,230-Speed 13650.55 samples/sec   Loss 1.6629   LearningRate 0.0003   Epoch: 21   Global Step: 37510   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:55:56,388-Speed 13535.28 samples/sec   Loss 1.6723   LearningRate 0.0003   Epoch: 21   Global Step: 37520   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:56:14,499-Speed 13570.24 samples/sec   Loss 1.6614   LearningRate 0.0003   Epoch: 21   Global Step: 37530   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:56:32,577-Speed 13596.05 samples/sec   Loss 1.6623   LearningRate 0.0003   Epoch: 21   Global Step: 37540   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:56:50,614-Speed 13625.78 samples/sec   Loss 1.6700   LearningRate 0.0003   Epoch: 21   Global Step: 37550   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:57:08,767-Speed 13539.18 samples/sec   Loss 1.6696   LearningRate 0.0003   Epoch: 21   Global Step: 37560   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 02:57:26,800-Speed 13628.91 samples/sec   Loss 1.6712   LearningRate 0.0003   Epoch: 21   Global Step: 37570   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:57:44,932-Speed 13554.91 samples/sec   Loss 1.6728   LearningRate 0.0003   Epoch: 21   Global Step: 37580   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:58:03,015-Speed 13592.36 samples/sec   Loss 1.6724   LearningRate 0.0003   Epoch: 21   Global Step: 37590   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:58:21,029-Speed 13643.52 samples/sec   Loss 1.6705   LearningRate 0.0003   Epoch: 21   Global Step: 37600   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:58:39,145-Speed 13566.48 samples/sec   Loss 1.6665   LearningRate 0.0003   Epoch: 21   Global Step: 37610   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:58:57,240-Speed 13582.75 samples/sec   Loss 1.6559   LearningRate 0.0003   Epoch: 21   Global Step: 37620   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:59:15,045-Speed 13804.09 samples/sec   Loss 1.6709   LearningRate 0.0003   Epoch: 21   Global Step: 37630   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:59:32,825-Speed 13822.71 samples/sec   Loss 1.6650   LearningRate 0.0003   Epoch: 21   Global Step: 37640   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 02:59:50,689-Speed 13758.39 samples/sec   Loss 1.6647   LearningRate 0.0003   Epoch: 21   Global Step: 37650   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:00:08,529-Speed 13776.38 samples/sec   Loss 1.6611   LearningRate 0.0003   Epoch: 21   Global Step: 37660   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:00:26,365-Speed 13779.92 samples/sec   Loss 1.6701   LearningRate 0.0003   Epoch: 21   Global Step: 37670   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:00:44,149-Speed 13820.01 samples/sec   Loss 1.6694   LearningRate 0.0003   Epoch: 21   Global Step: 37680   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:01:01,957-Speed 13801.69 samples/sec   Loss 1.6663   LearningRate 0.0003   Epoch: 21   Global Step: 37690   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:01:19,737-Speed 13822.53 samples/sec   Loss 1.6698   LearningRate 0.0003   Epoch: 21   Global Step: 37700   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:01:37,503-Speed 13835.07 samples/sec   Loss 1.6674   LearningRate 0.0003   Epoch: 21   Global Step: 37710   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:01:55,319-Speed 13795.25 samples/sec   Loss 1.6586   LearningRate 0.0003   Epoch: 21   Global Step: 37720   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:02:13,170-Speed 13769.07 samples/sec   Loss 1.6624   LearningRate 0.0003   Epoch: 21   Global Step: 37730   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:02:31,018-Speed 13770.10 samples/sec   Loss 1.6497   LearningRate 0.0003   Epoch: 21   Global Step: 37740   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:02:48,921-Speed 13728.67 samples/sec   Loss 1.6587   LearningRate 0.0003   Epoch: 21   Global Step: 37750   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:03:06,739-Speed 13793.23 samples/sec   Loss 1.6669   LearningRate 0.0003   Epoch: 21   Global Step: 37760   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:03:24,531-Speed 13814.86 samples/sec   Loss 1.6556   LearningRate 0.0003   Epoch: 21   Global Step: 37770   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:03:42,423-Speed 13737.23 samples/sec   Loss 1.6627   LearningRate 0.0003   Epoch: 21   Global Step: 37780   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:04:00,214-Speed 13815.70 samples/sec   Loss 1.6544   LearningRate 0.0003   Epoch: 21   Global Step: 37790   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:04:18,074-Speed 13761.48 samples/sec   Loss 1.6646   LearningRate 0.0003   Epoch: 21   Global Step: 37800   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:04:35,972-Speed 13731.46 samples/sec   Loss 1.6525   LearningRate 0.0003   Epoch: 21   Global Step: 37810   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:04:53,797-Speed 13787.98 samples/sec   Loss 1.6662   LearningRate 0.0003   Epoch: 21   Global Step: 37820   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:05:11,587-Speed 13815.76 samples/sec   Loss 1.6718   LearningRate 0.0003   Epoch: 21   Global Step: 37830   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:05:29,410-Speed 13790.83 samples/sec   Loss 1.6634   LearningRate 0.0003   Epoch: 21   Global Step: 37840   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:05:47,218-Speed 13801.48 samples/sec   Loss 1.6626   LearningRate 0.0003   Epoch: 21   Global Step: 37850   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:06:05,025-Speed 13802.08 samples/sec   Loss 1.6633   LearningRate 0.0003   Epoch: 21   Global Step: 37860   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:06:22,846-Speed 13791.42 samples/sec   Loss 1.6484   LearningRate 0.0003   Epoch: 21   Global Step: 37870   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:06:40,680-Speed 13781.70 samples/sec   Loss 1.6497   LearningRate 0.0003   Epoch: 21   Global Step: 37880   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:06:58,507-Speed 13786.91 samples/sec   Loss 1.6471   LearningRate 0.0003   Epoch: 21   Global Step: 37890   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:07:16,359-Speed 13766.98 samples/sec   Loss 1.6505   LearningRate 0.0003   Epoch: 21   Global Step: 37900   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:07:34,216-Speed 13763.34 samples/sec   Loss 1.6716   LearningRate 0.0003   Epoch: 21   Global Step: 37910   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:07:51,947-Speed 13861.86 samples/sec   Loss 1.6679   LearningRate 0.0003   Epoch: 21   Global Step: 37920   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:08:09,773-Speed 13787.86 samples/sec   Loss 1.6613   LearningRate 0.0003   Epoch: 21   Global Step: 37930   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:08:27,641-Speed 13754.84 samples/sec   Loss 1.6584   LearningRate 0.0003   Epoch: 21   Global Step: 37940   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:08:45,426-Speed 13819.60 samples/sec   Loss 1.6587   LearningRate 0.0003   Epoch: 21   Global Step: 37950   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:09:03,223-Speed 13809.65 samples/sec   Loss 1.6601   LearningRate 0.0003   Epoch: 21   Global Step: 37960   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:09:20,973-Speed 13846.67 samples/sec   Loss 1.6658   LearningRate 0.0003   Epoch: 21   Global Step: 37970   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:09:38,740-Speed 13833.24 samples/sec   Loss 1.6583   LearningRate 0.0003   Epoch: 21   Global Step: 37980   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:09:56,578-Speed 13778.33 samples/sec   Loss 1.6738   LearningRate 0.0003   Epoch: 21   Global Step: 37990   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:10:14,422-Speed 13773.62 samples/sec   Loss 1.6716   LearningRate 0.0003   Epoch: 21   Global Step: 38000   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:10:32,277-Speed 13765.12 samples/sec   Loss 1.6641   LearningRate 0.0003   Epoch: 21   Global Step: 38010   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:10:50,058-Speed 13822.77 samples/sec   Loss 1.6754   LearningRate 0.0002   Epoch: 21   Global Step: 38020   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:11:57,981-Speed 3618.23 samples/sec   Loss 1.6381   LearningRate 0.0002   Epoch: 22   Global Step: 38030   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:12:15,706-Speed 13866.69 samples/sec   Loss 1.6399   LearningRate 0.0002   Epoch: 22   Global Step: 38040   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:12:33,435-Speed 13862.55 samples/sec   Loss 1.6378   LearningRate 0.0002   Epoch: 22   Global Step: 38050   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:12:51,127-Speed 13891.93 samples/sec   Loss 1.6354   LearningRate 0.0002   Epoch: 22   Global Step: 38060   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:13:08,897-Speed 13830.59 samples/sec   Loss 1.6259   LearningRate 0.0002   Epoch: 22   Global Step: 38070   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:13:26,736-Speed 13778.27 samples/sec   Loss 1.6365   LearningRate 0.0002   Epoch: 22   Global Step: 38080   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:13:44,557-Speed 13791.20 samples/sec   Loss 1.6336   LearningRate 0.0002   Epoch: 22   Global Step: 38090   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:14:02,284-Speed 13865.58 samples/sec   Loss 1.6430   LearningRate 0.0002   Epoch: 22   Global Step: 38100   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:14:20,149-Speed 13756.89 samples/sec   Loss 1.6376   LearningRate 0.0002   Epoch: 22   Global Step: 38110   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:14:37,992-Speed 13774.68 samples/sec   Loss 1.6374   LearningRate 0.0002   Epoch: 22   Global Step: 38120   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:14:55,738-Speed 13849.73 samples/sec   Loss 1.6397   LearningRate 0.0002   Epoch: 22   Global Step: 38130   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:15:13,624-Speed 13741.47 samples/sec   Loss 1.6188   LearningRate 0.0002   Epoch: 22   Global Step: 38140   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:15:31,497-Speed 13750.91 samples/sec   Loss 1.6306   LearningRate 0.0002   Epoch: 22   Global Step: 38150   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:15:49,301-Speed 13805.07 samples/sec   Loss 1.6351   LearningRate 0.0002   Epoch: 22   Global Step: 38160   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:16:07,240-Speed 13700.44 samples/sec   Loss 1.6356   LearningRate 0.0002   Epoch: 22   Global Step: 38170   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:16:25,133-Speed 13735.79 samples/sec   Loss 1.6375   LearningRate 0.0002   Epoch: 22   Global Step: 38180   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:16:42,937-Speed 13804.58 samples/sec   Loss 1.6500   LearningRate 0.0002   Epoch: 22   Global Step: 38190   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:17:00,766-Speed 13784.98 samples/sec   Loss 1.6464   LearningRate 0.0002   Epoch: 22   Global Step: 38200   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:17:18,741-Speed 13673.36 samples/sec   Loss 1.6413   LearningRate 0.0002   Epoch: 22   Global Step: 38210   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:17:36,489-Speed 13848.16 samples/sec   Loss 1.6294   LearningRate 0.0002   Epoch: 22   Global Step: 38220   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:17:54,318-Speed 13785.25 samples/sec   Loss 1.6447   LearningRate 0.0002   Epoch: 22   Global Step: 38230   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:18:12,108-Speed 13815.20 samples/sec   Loss 1.6479   LearningRate 0.0002   Epoch: 22   Global Step: 38240   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:18:30,009-Speed 13730.13 samples/sec   Loss 1.6364   LearningRate 0.0002   Epoch: 22   Global Step: 38250   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:18:47,857-Speed 13769.98 samples/sec   Loss 1.6495   LearningRate 0.0002   Epoch: 22   Global Step: 38260   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:19:05,775-Speed 13716.73 samples/sec   Loss 1.6284   LearningRate 0.0002   Epoch: 22   Global Step: 38270   Fp16 Grad Scale: 65536   Required: 16 hours
Training: 2022-03-04 03:19:23,584-Speed 13800.55 samples/sec   Loss 1.6393   LearningRate 0.0002   Epoch: 22   Global Step: 38280   Fp16 Grad Scale: 32768   Required: 16 hours
Training: 2022-03-04 03:19:41,447-Speed 13759.69 samples/sec   Loss 1.6240   LearningRate 0.0002   Epoch: 22   Global Step: 38290   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:19:59,322-Speed 13751.03 samples/sec   Loss 1.6475   LearningRate 0.0002   Epoch: 22   Global Step: 38300   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:20:17,222-Speed 13731.04 samples/sec   Loss 1.6347   LearningRate 0.0002   Epoch: 22   Global Step: 38310   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:20:34,959-Speed 13856.53 samples/sec   Loss 1.6378   LearningRate 0.0002   Epoch: 22   Global Step: 38320   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:20:52,779-Speed 13792.84 samples/sec   Loss 1.6419   LearningRate 0.0002   Epoch: 22   Global Step: 38330   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:21:10,615-Speed 13779.33 samples/sec   Loss 1.6456   LearningRate 0.0002   Epoch: 22   Global Step: 38340   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:21:28,509-Speed 13735.69 samples/sec   Loss 1.6372   LearningRate 0.0002   Epoch: 22   Global Step: 38350   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:21:46,317-Speed 13800.74 samples/sec   Loss 1.6308   LearningRate 0.0002   Epoch: 22   Global Step: 38360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:22:04,222-Speed 13727.99 samples/sec   Loss 1.6373   LearningRate 0.0002   Epoch: 22   Global Step: 38370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:22:22,113-Speed 13737.09 samples/sec   Loss 1.6426   LearningRate 0.0002   Epoch: 22   Global Step: 38380   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:22:40,053-Speed 13699.80 samples/sec   Loss 1.6388   LearningRate 0.0002   Epoch: 22   Global Step: 38390   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:22:57,939-Speed 13741.24 samples/sec   Loss 1.6255   LearningRate 0.0002   Epoch: 22   Global Step: 38400   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:23:16,159-Speed 13489.55 samples/sec   Loss 1.6369   LearningRate 0.0002   Epoch: 22   Global Step: 38410   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:23:34,289-Speed 13556.60 samples/sec   Loss 1.6275   LearningRate 0.0002   Epoch: 22   Global Step: 38420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:23:52,450-Speed 13532.64 samples/sec   Loss 1.6452   LearningRate 0.0002   Epoch: 22   Global Step: 38430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:24:10,556-Speed 13575.86 samples/sec   Loss 1.6288   LearningRate 0.0002   Epoch: 22   Global Step: 38440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:24:28,811-Speed 13463.56 samples/sec   Loss 1.6340   LearningRate 0.0002   Epoch: 22   Global Step: 38450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:24:46,988-Speed 13521.77 samples/sec   Loss 1.6421   LearningRate 0.0002   Epoch: 22   Global Step: 38460   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:25:05,098-Speed 13570.72 samples/sec   Loss 1.6294   LearningRate 0.0002   Epoch: 22   Global Step: 38470   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:25:23,217-Speed 13564.71 samples/sec   Loss 1.6470   LearningRate 0.0002   Epoch: 22   Global Step: 38480   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:25:41,480-Speed 13458.98 samples/sec   Loss 1.6314   LearningRate 0.0002   Epoch: 22   Global Step: 38490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:25:59,724-Speed 13471.78 samples/sec   Loss 1.6284   LearningRate 0.0002   Epoch: 22   Global Step: 38500   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:26:18,047-Speed 13413.67 samples/sec   Loss 1.6164   LearningRate 0.0002   Epoch: 22   Global Step: 38510   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:26:36,241-Speed 13507.99 samples/sec   Loss 1.6269   LearningRate 0.0002   Epoch: 22   Global Step: 38520   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:26:54,404-Speed 13531.44 samples/sec   Loss 1.6238   LearningRate 0.0002   Epoch: 22   Global Step: 38530   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:27:12,542-Speed 13551.05 samples/sec   Loss 1.6301   LearningRate 0.0002   Epoch: 22   Global Step: 38540   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:27:30,717-Speed 13522.95 samples/sec   Loss 1.6241   LearningRate 0.0002   Epoch: 22   Global Step: 38550   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:27:48,873-Speed 13536.75 samples/sec   Loss 1.6246   LearningRate 0.0002   Epoch: 22   Global Step: 38560   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:28:06,988-Speed 13567.11 samples/sec   Loss 1.6287   LearningRate 0.0002   Epoch: 22   Global Step: 38570   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:28:25,185-Speed 13506.10 samples/sec   Loss 1.6301   LearningRate 0.0002   Epoch: 22   Global Step: 38580   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:28:43,305-Speed 13564.00 samples/sec   Loss 1.6227   LearningRate 0.0002   Epoch: 22   Global Step: 38590   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:29:01,468-Speed 13531.85 samples/sec   Loss 1.6267   LearningRate 0.0002   Epoch: 22   Global Step: 38600   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:29:19,567-Speed 13579.04 samples/sec   Loss 1.6286   LearningRate 0.0002   Epoch: 22   Global Step: 38610   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:29:37,710-Speed 13546.92 samples/sec   Loss 1.6187   LearningRate 0.0002   Epoch: 22   Global Step: 38620   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:29:55,939-Speed 13483.26 samples/sec   Loss 1.6364   LearningRate 0.0002   Epoch: 22   Global Step: 38630   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:30:14,113-Speed 13523.31 samples/sec   Loss 1.6249   LearningRate 0.0002   Epoch: 22   Global Step: 38640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:30:32,241-Speed 13557.52 samples/sec   Loss 1.6284   LearningRate 0.0002   Epoch: 22   Global Step: 38650   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:30:50,385-Speed 13545.45 samples/sec   Loss 1.6160   LearningRate 0.0002   Epoch: 22   Global Step: 38660   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:31:08,212-Speed 13786.84 samples/sec   Loss 1.6256   LearningRate 0.0002   Epoch: 22   Global Step: 38670   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:31:26,100-Speed 13740.02 samples/sec   Loss 1.6223   LearningRate 0.0002   Epoch: 22   Global Step: 38680   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:31:43,951-Speed 13768.09 samples/sec   Loss 1.6247   LearningRate 0.0002   Epoch: 22   Global Step: 38690   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:32:01,790-Speed 13776.96 samples/sec   Loss 1.6303   LearningRate 0.0002   Epoch: 22   Global Step: 38700   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:32:19,526-Speed 13857.77 samples/sec   Loss 1.6315   LearningRate 0.0002   Epoch: 22   Global Step: 38710   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:32:37,293-Speed 13833.51 samples/sec   Loss 1.6122   LearningRate 0.0002   Epoch: 22   Global Step: 38720   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:32:55,073-Speed 13823.36 samples/sec   Loss 1.6260   LearningRate 0.0002   Epoch: 22   Global Step: 38730   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:33:12,826-Speed 13844.06 samples/sec   Loss 1.6183   LearningRate 0.0002   Epoch: 22   Global Step: 38740   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:33:30,619-Speed 13813.19 samples/sec   Loss 1.6142   LearningRate 0.0002   Epoch: 22   Global Step: 38750   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:33:48,484-Speed 13757.22 samples/sec   Loss 1.6221   LearningRate 0.0002   Epoch: 22   Global Step: 38760   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:34:06,204-Speed 13870.31 samples/sec   Loss 1.6284   LearningRate 0.0002   Epoch: 22   Global Step: 38770   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:34:23,971-Speed 13832.48 samples/sec   Loss 1.6087   LearningRate 0.0002   Epoch: 22   Global Step: 38780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:34:41,750-Speed 13824.38 samples/sec   Loss 1.6190   LearningRate 0.0002   Epoch: 22   Global Step: 38790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:34:59,521-Speed 13830.23 samples/sec   Loss 1.6108   LearningRate 0.0002   Epoch: 22   Global Step: 38800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:35:17,216-Speed 13889.32 samples/sec   Loss 1.6099   LearningRate 0.0002   Epoch: 22   Global Step: 38810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:35:34,947-Speed 13862.58 samples/sec   Loss 1.6136   LearningRate 0.0002   Epoch: 22   Global Step: 38820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:35:52,765-Speed 13793.73 samples/sec   Loss 1.6173   LearningRate 0.0002   Epoch: 22   Global Step: 38830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:36:10,647-Speed 13743.84 samples/sec   Loss 1.6042   LearningRate 0.0002   Epoch: 22   Global Step: 38840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:36:28,352-Speed 13881.64 samples/sec   Loss 1.6117   LearningRate 0.0002   Epoch: 22   Global Step: 38850   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:36:46,112-Speed 13840.20 samples/sec   Loss 1.6066   LearningRate 0.0002   Epoch: 22   Global Step: 38860   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:37:03,879-Speed 13832.94 samples/sec   Loss 1.6095   LearningRate 0.0002   Epoch: 22   Global Step: 38870   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:37:21,606-Speed 13864.56 samples/sec   Loss 1.6121   LearningRate 0.0002   Epoch: 22   Global Step: 38880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:37:39,392-Speed 13818.80 samples/sec   Loss 1.6129   LearningRate 0.0002   Epoch: 22   Global Step: 38890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:37:57,195-Speed 13804.82 samples/sec   Loss 1.6069   LearningRate 0.0002   Epoch: 22   Global Step: 38900   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:38:15,092-Speed 13732.71 samples/sec   Loss 1.6078   LearningRate 0.0002   Epoch: 22   Global Step: 38910   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:38:32,925-Speed 13782.60 samples/sec   Loss 1.6005   LearningRate 0.0002   Epoch: 22   Global Step: 38920   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:38:50,714-Speed 13815.56 samples/sec   Loss 1.6137   LearningRate 0.0002   Epoch: 22   Global Step: 38930   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:39:08,556-Speed 13775.67 samples/sec   Loss 1.5957   LearningRate 0.0002   Epoch: 22   Global Step: 38940   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:39:26,329-Speed 13828.22 samples/sec   Loss 1.6149   LearningRate 0.0002   Epoch: 22   Global Step: 38950   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:39:44,132-Speed 13805.36 samples/sec   Loss 1.6030   LearningRate 0.0002   Epoch: 22   Global Step: 38960   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:40:01,970-Speed 13778.84 samples/sec   Loss 1.6048   LearningRate 0.0002   Epoch: 22   Global Step: 38970   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:40:19,881-Speed 13722.23 samples/sec   Loss 1.5952   LearningRate 0.0002   Epoch: 22   Global Step: 38980   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:40:37,677-Speed 13810.73 samples/sec   Loss 1.6073   LearningRate 0.0002   Epoch: 22   Global Step: 38990   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:40:55,460-Speed 13820.49 samples/sec   Loss 1.6169   LearningRate 0.0002   Epoch: 22   Global Step: 39000   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:41:13,240-Speed 13823.35 samples/sec   Loss 1.6232   LearningRate 0.0002   Epoch: 22   Global Step: 39010   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:41:30,944-Speed 13882.70 samples/sec   Loss 1.6059   LearningRate 0.0002   Epoch: 22   Global Step: 39020   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:41:48,721-Speed 13825.79 samples/sec   Loss 1.6050   LearningRate 0.0002   Epoch: 22   Global Step: 39030   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:42:06,456-Speed 13857.72 samples/sec   Loss 1.6095   LearningRate 0.0002   Epoch: 22   Global Step: 39040   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:42:24,305-Speed 13770.78 samples/sec   Loss 1.6078   LearningRate 0.0002   Epoch: 22   Global Step: 39050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:42:42,085-Speed 13822.75 samples/sec   Loss 1.6113   LearningRate 0.0002   Epoch: 22   Global Step: 39060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:42:59,840-Speed 13843.30 samples/sec   Loss 1.6160   LearningRate 0.0002   Epoch: 22   Global Step: 39070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:43:17,634-Speed 13811.93 samples/sec   Loss 1.5957   LearningRate 0.0002   Epoch: 22   Global Step: 39080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:43:35,367-Speed 13859.57 samples/sec   Loss 1.6166   LearningRate 0.0002   Epoch: 22   Global Step: 39090   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:43:53,151-Speed 13820.53 samples/sec   Loss 1.5948   LearningRate 0.0002   Epoch: 22   Global Step: 39100   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:44:10,986-Speed 13779.93 samples/sec   Loss 1.6037   LearningRate 0.0002   Epoch: 22   Global Step: 39110   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:44:28,791-Speed 13804.08 samples/sec   Loss 1.6209   LearningRate 0.0002   Epoch: 22   Global Step: 39120   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:44:46,514-Speed 13867.61 samples/sec   Loss 1.6061   LearningRate 0.0002   Epoch: 22   Global Step: 39130   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:45:04,343-Speed 13785.06 samples/sec   Loss 1.6027   LearningRate 0.0002   Epoch: 22   Global Step: 39140   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:45:22,059-Speed 13873.79 samples/sec   Loss 1.5986   LearningRate 0.0002   Epoch: 22   Global Step: 39150   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:45:39,824-Speed 13835.24 samples/sec   Loss 1.6040   LearningRate 0.0002   Epoch: 22   Global Step: 39160   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:45:57,621-Speed 13809.94 samples/sec   Loss 1.5959   LearningRate 0.0002   Epoch: 22   Global Step: 39170   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:46:15,417-Speed 13810.78 samples/sec   Loss 1.5944   LearningRate 0.0002   Epoch: 22   Global Step: 39180   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:46:33,129-Speed 13876.48 samples/sec   Loss 1.6083   LearningRate 0.0002   Epoch: 22   Global Step: 39190   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:46:50,877-Speed 13847.74 samples/sec   Loss 1.6001   LearningRate 0.0002   Epoch: 22   Global Step: 39200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:47:08,727-Speed 13768.89 samples/sec   Loss 1.5979   LearningRate 0.0002   Epoch: 22   Global Step: 39210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:47:26,472-Speed 13851.03 samples/sec   Loss 1.5995   LearningRate 0.0002   Epoch: 22   Global Step: 39220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:47:44,207-Speed 13858.34 samples/sec   Loss 1.6020   LearningRate 0.0002   Epoch: 22   Global Step: 39230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:48:02,236-Speed 13631.77 samples/sec   Loss 1.6072   LearningRate 0.0002   Epoch: 22   Global Step: 39240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:48:19,995-Speed 13838.95 samples/sec   Loss 1.5925   LearningRate 0.0002   Epoch: 22   Global Step: 39250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:48:37,729-Speed 13859.75 samples/sec   Loss 1.5910   LearningRate 0.0002   Epoch: 22   Global Step: 39260   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:48:55,567-Speed 13777.97 samples/sec   Loss 1.5929   LearningRate 0.0002   Epoch: 22   Global Step: 39270   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:49:13,322-Speed 13842.92 samples/sec   Loss 1.5968   LearningRate 0.0002   Epoch: 22   Global Step: 39280   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:49:31,143-Speed 13790.98 samples/sec   Loss 1.5951   LearningRate 0.0002   Epoch: 22   Global Step: 39290   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:49:48,900-Speed 13841.42 samples/sec   Loss 1.5898   LearningRate 0.0002   Epoch: 22   Global Step: 39300   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:50:06,742-Speed 13775.00 samples/sec   Loss 1.5930   LearningRate 0.0002   Epoch: 22   Global Step: 39310   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:50:24,521-Speed 13824.06 samples/sec   Loss 1.5906   LearningRate 0.0002   Epoch: 22   Global Step: 39320   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:50:42,376-Speed 13764.78 samples/sec   Loss 1.5893   LearningRate 0.0002   Epoch: 22   Global Step: 39330   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:51:00,098-Speed 13868.67 samples/sec   Loss 1.6045   LearningRate 0.0002   Epoch: 22   Global Step: 39340   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:51:17,870-Speed 13829.40 samples/sec   Loss 1.6059   LearningRate 0.0002   Epoch: 22   Global Step: 39350   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 03:51:35,705-Speed 13780.43 samples/sec   Loss 1.6111   LearningRate 0.0002   Epoch: 22   Global Step: 39360   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:51:53,417-Speed 13876.65 samples/sec   Loss 1.6001   LearningRate 0.0002   Epoch: 22   Global Step: 39370   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:52:11,145-Speed 13863.23 samples/sec   Loss 1.5856   LearningRate 0.0002   Epoch: 22   Global Step: 39380   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:52:28,857-Speed 13876.29 samples/sec   Loss 1.5804   LearningRate 0.0002   Epoch: 22   Global Step: 39390   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:52:46,649-Speed 13813.67 samples/sec   Loss 1.5844   LearningRate 0.0002   Epoch: 22   Global Step: 39400   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:53:04,466-Speed 13795.34 samples/sec   Loss 1.5879   LearningRate 0.0002   Epoch: 22   Global Step: 39410   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:53:22,270-Speed 13804.08 samples/sec   Loss 1.5954   LearningRate 0.0002   Epoch: 22   Global Step: 39420   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:53:40,055-Speed 13819.40 samples/sec   Loss 1.5932   LearningRate 0.0002   Epoch: 22   Global Step: 39430   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:53:57,861-Speed 13802.76 samples/sec   Loss 1.5997   LearningRate 0.0002   Epoch: 22   Global Step: 39440   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:54:15,749-Speed 13740.19 samples/sec   Loss 1.5893   LearningRate 0.0002   Epoch: 22   Global Step: 39450   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:54:33,649-Speed 13730.09 samples/sec   Loss 1.5906   LearningRate 0.0002   Epoch: 22   Global Step: 39460   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:54:51,544-Speed 13734.01 samples/sec   Loss 1.5745   LearningRate 0.0002   Epoch: 22   Global Step: 39470   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:55:09,609-Speed 13605.59 samples/sec   Loss 1.5849   LearningRate 0.0002   Epoch: 22   Global Step: 39480   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:55:27,491-Speed 13744.76 samples/sec   Loss 1.5819   LearningRate 0.0002   Epoch: 22   Global Step: 39490   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:55:45,182-Speed 13892.04 samples/sec   Loss 1.5921   LearningRate 0.0002   Epoch: 22   Global Step: 39500   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:56:02,971-Speed 13815.87 samples/sec   Loss 1.5871   LearningRate 0.0002   Epoch: 22   Global Step: 39510   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:56:20,669-Speed 13887.91 samples/sec   Loss 1.5843   LearningRate 0.0002   Epoch: 22   Global Step: 39520   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:56:38,480-Speed 13799.04 samples/sec   Loss 1.5897   LearningRate 0.0002   Epoch: 22   Global Step: 39530   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:56:56,397-Speed 13717.10 samples/sec   Loss 1.5837   LearningRate 0.0002   Epoch: 22   Global Step: 39540   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:57:14,133-Speed 13857.26 samples/sec   Loss 1.5835   LearningRate 0.0002   Epoch: 22   Global Step: 39550   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:57:31,912-Speed 13824.73 samples/sec   Loss 1.5969   LearningRate 0.0002   Epoch: 22   Global Step: 39560   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:57:49,668-Speed 13841.56 samples/sec   Loss 1.5855   LearningRate 0.0002   Epoch: 22   Global Step: 39570   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:58:07,437-Speed 13831.57 samples/sec   Loss 1.6005   LearningRate 0.0002   Epoch: 22   Global Step: 39580   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 03:58:25,201-Speed 13835.96 samples/sec   Loss 1.5914   LearningRate 0.0002   Epoch: 22   Global Step: 39590   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:58:43,034-Speed 13781.73 samples/sec   Loss 1.5790   LearningRate 0.0002   Epoch: 22   Global Step: 39600   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:59:00,858-Speed 13788.97 samples/sec   Loss 1.6013   LearningRate 0.0002   Epoch: 22   Global Step: 39610   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:59:18,675-Speed 13794.67 samples/sec   Loss 1.5945   LearningRate 0.0002   Epoch: 22   Global Step: 39620   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:59:36,481-Speed 13802.67 samples/sec   Loss 1.5815   LearningRate 0.0002   Epoch: 22   Global Step: 39630   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 03:59:54,244-Speed 13836.72 samples/sec   Loss 1.5786   LearningRate 0.0002   Epoch: 22   Global Step: 39640   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:00:12,098-Speed 13766.04 samples/sec   Loss 1.5694   LearningRate 0.0002   Epoch: 22   Global Step: 39650   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:00:29,942-Speed 13773.60 samples/sec   Loss 1.5781   LearningRate 0.0002   Epoch: 22   Global Step: 39660   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:00:47,745-Speed 13804.83 samples/sec   Loss 1.5880   LearningRate 0.0002   Epoch: 22   Global Step: 39670   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:01:05,564-Speed 13795.68 samples/sec   Loss 1.5897   LearningRate 0.0002   Epoch: 22   Global Step: 39680   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:01:23,378-Speed 13796.96 samples/sec   Loss 1.5889   LearningRate 0.0002   Epoch: 22   Global Step: 39690   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:01:41,191-Speed 13797.13 samples/sec   Loss 1.5983   LearningRate 0.0002   Epoch: 22   Global Step: 39700   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:01:59,023-Speed 13782.42 samples/sec   Loss 1.5868   LearningRate 0.0002   Epoch: 22   Global Step: 39710   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:02:16,800-Speed 13825.74 samples/sec   Loss 1.5963   LearningRate 0.0002   Epoch: 22   Global Step: 39720   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:02:34,588-Speed 13817.32 samples/sec   Loss 1.5983   LearningRate 0.0002   Epoch: 22   Global Step: 39730   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:02:52,301-Speed 13875.16 samples/sec   Loss 1.5874   LearningRate 0.0002   Epoch: 22   Global Step: 39740   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 04:03:10,126-Speed 13787.95 samples/sec   Loss 1.5920   LearningRate 0.0002   Epoch: 22   Global Step: 39750   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 04:04:18,626-Speed 3587.85 samples/sec   Loss 1.5876   LearningRate 0.0002   Epoch: 23   Global Step: 39760   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 04:04:36,309-Speed 13899.76 samples/sec   Loss 1.5723   LearningRate 0.0002   Epoch: 23   Global Step: 39770   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 04:04:53,993-Speed 13897.54 samples/sec   Loss 1.5769   LearningRate 0.0002   Epoch: 23   Global Step: 39780   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:05:11,899-Speed 13725.98 samples/sec   Loss 1.5585   LearningRate 0.0002   Epoch: 23   Global Step: 39790   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:05:29,687-Speed 13816.99 samples/sec   Loss 1.5692   LearningRate 0.0002   Epoch: 23   Global Step: 39800   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:05:47,559-Speed 13752.62 samples/sec   Loss 1.5561   LearningRate 0.0002   Epoch: 23   Global Step: 39810   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:06:05,328-Speed 13831.53 samples/sec   Loss 1.5682   LearningRate 0.0002   Epoch: 23   Global Step: 39820   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:06:23,152-Speed 13788.66 samples/sec   Loss 1.5607   LearningRate 0.0002   Epoch: 23   Global Step: 39830   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:06:40,982-Speed 13784.61 samples/sec   Loss 1.5656   LearningRate 0.0002   Epoch: 23   Global Step: 39840   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:06:58,762-Speed 13823.55 samples/sec   Loss 1.5648   LearningRate 0.0002   Epoch: 23   Global Step: 39850   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:07:16,660-Speed 13731.80 samples/sec   Loss 1.5700   LearningRate 0.0002   Epoch: 23   Global Step: 39860   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:07:34,394-Speed 13858.77 samples/sec   Loss 1.5666   LearningRate 0.0002   Epoch: 23   Global Step: 39870   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:07:52,174-Speed 13823.60 samples/sec   Loss 1.5642   LearningRate 0.0002   Epoch: 23   Global Step: 39880   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 04:08:10,005-Speed 13784.85 samples/sec   Loss 1.5652   LearningRate 0.0002   Epoch: 23   Global Step: 39890   Fp16 Grad Scale: 65536   Required: 15 hours
Training: 2022-03-04 04:08:27,734-Speed 13862.97 samples/sec   Loss 1.5793   LearningRate 0.0002   Epoch: 23   Global Step: 39900   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:08:45,598-Speed 13758.88 samples/sec   Loss 1.5747   LearningRate 0.0002   Epoch: 23   Global Step: 39910   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:09:03,460-Speed 13760.11 samples/sec   Loss 1.5659   LearningRate 0.0002   Epoch: 23   Global Step: 39920   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:09:21,301-Speed 13776.11 samples/sec   Loss 1.5636   LearningRate 0.0002   Epoch: 23   Global Step: 39930   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:09:39,061-Speed 13838.70 samples/sec   Loss 1.5773   LearningRate 0.0002   Epoch: 23   Global Step: 39940   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:09:56,836-Speed 13827.09 samples/sec   Loss 1.5752   LearningRate 0.0002   Epoch: 23   Global Step: 39950   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:10:14,706-Speed 13753.01 samples/sec   Loss 1.5644   LearningRate 0.0002   Epoch: 23   Global Step: 39960   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:10:32,466-Speed 13838.56 samples/sec   Loss 1.5583   LearningRate 0.0002   Epoch: 23   Global Step: 39970   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:10:50,349-Speed 13744.59 samples/sec   Loss 1.5758   LearningRate 0.0002   Epoch: 23   Global Step: 39980   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:11:08,103-Speed 13842.99 samples/sec   Loss 1.5723   LearningRate 0.0002   Epoch: 23   Global Step: 39990   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:11:25,863-Speed 13839.04 samples/sec   Loss 1.5682   LearningRate 0.0002   Epoch: 23   Global Step: 40000   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:11:43,633-Speed 13830.49 samples/sec   Loss 1.5758   LearningRate 0.0002   Epoch: 23   Global Step: 40010   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:12:01,439-Speed 13803.06 samples/sec   Loss 1.5654   LearningRate 0.0002   Epoch: 23   Global Step: 40020   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:12:19,286-Speed 13773.90 samples/sec   Loss 1.5601   LearningRate 0.0002   Epoch: 23   Global Step: 40030   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:12:37,140-Speed 13766.03 samples/sec   Loss 1.5613   LearningRate 0.0002   Epoch: 23   Global Step: 40040   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:12:54,983-Speed 13774.17 samples/sec   Loss 1.5605   LearningRate 0.0002   Epoch: 23   Global Step: 40050   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:13:12,817-Speed 13781.21 samples/sec   Loss 1.5577   LearningRate 0.0002   Epoch: 23   Global Step: 40060   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:13:30,600-Speed 13821.86 samples/sec   Loss 1.5690   LearningRate 0.0002   Epoch: 23   Global Step: 40070   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:13:48,338-Speed 13856.16 samples/sec   Loss 1.5690   LearningRate 0.0002   Epoch: 23   Global Step: 40080   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:14:06,159-Speed 13790.91 samples/sec   Loss 1.5727   LearningRate 0.0002   Epoch: 23   Global Step: 40090   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:14:24,168-Speed 13647.48 samples/sec   Loss 1.5587   LearningRate 0.0002   Epoch: 23   Global Step: 40100   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:14:41,974-Speed 13803.11 samples/sec   Loss 1.5627   LearningRate 0.0002   Epoch: 23   Global Step: 40110   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:14:59,728-Speed 13843.55 samples/sec   Loss 1.5659   LearningRate 0.0002   Epoch: 23   Global Step: 40120   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:15:17,405-Speed 13903.53 samples/sec   Loss 1.5683   LearningRate 0.0002   Epoch: 23   Global Step: 40130   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:15:35,168-Speed 13836.32 samples/sec   Loss 1.5706   LearningRate 0.0002   Epoch: 23   Global Step: 40140   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:15:53,002-Speed 13781.32 samples/sec   Loss 1.5798   LearningRate 0.0002   Epoch: 23   Global Step: 40150   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:16:10,865-Speed 13760.03 samples/sec   Loss 1.5724   LearningRate 0.0002   Epoch: 23   Global Step: 40160   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:16:28,603-Speed 13855.70 samples/sec   Loss 1.5661   LearningRate 0.0002   Epoch: 23   Global Step: 40170   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:16:46,360-Speed 13841.09 samples/sec   Loss 1.5612   LearningRate 0.0002   Epoch: 23   Global Step: 40180   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:17:04,146-Speed 13818.16 samples/sec   Loss 1.5550   LearningRate 0.0002   Epoch: 23   Global Step: 40190   Fp16 Grad Scale: 16384   Required: 15 hours
Training: 2022-03-04 04:17:21,932-Speed 13818.55 samples/sec   Loss 1.5624   LearningRate 0.0002   Epoch: 23   Global Step: 40200   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:17:39,681-Speed 13847.47 samples/sec   Loss 1.5580   LearningRate 0.0002   Epoch: 23   Global Step: 40210   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:17:57,449-Speed 13832.43 samples/sec   Loss 1.5598   LearningRate 0.0002   Epoch: 23   Global Step: 40220   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:18:15,387-Speed 13702.28 samples/sec   Loss 1.5557   LearningRate 0.0002   Epoch: 23   Global Step: 40230   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:18:33,204-Speed 13793.90 samples/sec   Loss 1.5577   LearningRate 0.0002   Epoch: 23   Global Step: 40240   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:18:50,979-Speed 13827.26 samples/sec   Loss 1.5547   LearningRate 0.0002   Epoch: 23   Global Step: 40250   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:19:08,814-Speed 13781.22 samples/sec   Loss 1.5580   LearningRate 0.0002   Epoch: 23   Global Step: 40260   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:19:26,608-Speed 13812.19 samples/sec   Loss 1.5664   LearningRate 0.0002   Epoch: 23   Global Step: 40270   Fp16 Grad Scale: 32768   Required: 15 hours
Training: 2022-03-04 04:19:44,354-Speed 13850.08 samples/sec   Loss 1.5470   LearningRate 0.0002   Epoch: 23   Global Step: 40280   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:20:02,224-Speed 13753.80 samples/sec   Loss 1.5591   LearningRate 0.0002   Epoch: 23   Global Step: 40290   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:20:20,014-Speed 13815.30 samples/sec   Loss 1.5646   LearningRate 0.0002   Epoch: 23   Global Step: 40300   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:20:37,789-Speed 13826.57 samples/sec   Loss 1.5539   LearningRate 0.0002   Epoch: 23   Global Step: 40310   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:20:55,556-Speed 13833.79 samples/sec   Loss 1.5533   LearningRate 0.0002   Epoch: 23   Global Step: 40320   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:21:13,294-Speed 13856.70 samples/sec   Loss 1.5585   LearningRate 0.0002   Epoch: 23   Global Step: 40330   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:21:31,058-Speed 13835.61 samples/sec   Loss 1.5403   LearningRate 0.0002   Epoch: 23   Global Step: 40340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:21:48,885-Speed 13786.23 samples/sec   Loss 1.5554   LearningRate 0.0002   Epoch: 23   Global Step: 40350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:22:06,681-Speed 13811.63 samples/sec   Loss 1.5568   LearningRate 0.0002   Epoch: 23   Global Step: 40360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:22:24,547-Speed 13755.95 samples/sec   Loss 1.5429   LearningRate 0.0002   Epoch: 23   Global Step: 40370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:22:42,527-Speed 13670.69 samples/sec   Loss 1.5478   LearningRate 0.0002   Epoch: 23   Global Step: 40380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:23:00,451-Speed 13712.39 samples/sec   Loss 1.5633   LearningRate 0.0002   Epoch: 23   Global Step: 40390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:23:18,310-Speed 13762.14 samples/sec   Loss 1.5442   LearningRate 0.0002   Epoch: 23   Global Step: 40400   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:23:36,151-Speed 13776.01 samples/sec   Loss 1.5472   LearningRate 0.0002   Epoch: 23   Global Step: 40410   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:23:53,989-Speed 13778.40 samples/sec   Loss 1.5458   LearningRate 0.0002   Epoch: 23   Global Step: 40420   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:24:11,872-Speed 13743.39 samples/sec   Loss 1.5493   LearningRate 0.0002   Epoch: 23   Global Step: 40430   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:24:29,673-Speed 13806.86 samples/sec   Loss 1.5451   LearningRate 0.0002   Epoch: 23   Global Step: 40440   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:24:47,433-Speed 13838.70 samples/sec   Loss 1.5473   LearningRate 0.0002   Epoch: 23   Global Step: 40450   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:25:05,208-Speed 13827.08 samples/sec   Loss 1.5544   LearningRate 0.0002   Epoch: 23   Global Step: 40460   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:25:23,035-Speed 13786.52 samples/sec   Loss 1.5403   LearningRate 0.0002   Epoch: 23   Global Step: 40470   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:25:40,819-Speed 13820.47 samples/sec   Loss 1.5531   LearningRate 0.0002   Epoch: 23   Global Step: 40480   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:25:58,669-Speed 13768.34 samples/sec   Loss 1.5511   LearningRate 0.0002   Epoch: 23   Global Step: 40490   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:26:16,511-Speed 13775.04 samples/sec   Loss 1.5456   LearningRate 0.0002   Epoch: 23   Global Step: 40500   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:26:34,259-Speed 13848.42 samples/sec   Loss 1.5581   LearningRate 0.0002   Epoch: 23   Global Step: 40510   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:26:51,997-Speed 13856.50 samples/sec   Loss 1.5509   LearningRate 0.0002   Epoch: 23   Global Step: 40520   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:27:09,759-Speed 13836.58 samples/sec   Loss 1.5393   LearningRate 0.0002   Epoch: 23   Global Step: 40530   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:27:27,453-Speed 13890.72 samples/sec   Loss 1.5531   LearningRate 0.0002   Epoch: 23   Global Step: 40540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:27:45,231-Speed 13824.33 samples/sec   Loss 1.5352   LearningRate 0.0002   Epoch: 23   Global Step: 40550   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:28:03,022-Speed 13814.85 samples/sec   Loss 1.5380   LearningRate 0.0002   Epoch: 23   Global Step: 40560   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:28:20,782-Speed 13838.78 samples/sec   Loss 1.5352   LearningRate 0.0002   Epoch: 23   Global Step: 40570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:28:38,502-Speed 13869.68 samples/sec   Loss 1.5480   LearningRate 0.0002   Epoch: 23   Global Step: 40580   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:28:56,342-Speed 13776.50 samples/sec   Loss 1.5492   LearningRate 0.0002   Epoch: 23   Global Step: 40590   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:29:14,069-Speed 13865.03 samples/sec   Loss 1.5550   LearningRate 0.0002   Epoch: 23   Global Step: 40600   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:29:31,894-Speed 13788.33 samples/sec   Loss 1.5528   LearningRate 0.0002   Epoch: 23   Global Step: 40610   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:29:49,611-Speed 13872.29 samples/sec   Loss 1.5418   LearningRate 0.0002   Epoch: 23   Global Step: 40620   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:30:07,389-Speed 13824.45 samples/sec   Loss 1.5449   LearningRate 0.0002   Epoch: 23   Global Step: 40630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:30:25,176-Speed 13817.90 samples/sec   Loss 1.5429   LearningRate 0.0002   Epoch: 23   Global Step: 40640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:30:43,111-Speed 13703.95 samples/sec   Loss 1.5331   LearningRate 0.0002   Epoch: 23   Global Step: 40650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:31:00,935-Speed 13789.03 samples/sec   Loss 1.5392   LearningRate 0.0002   Epoch: 23   Global Step: 40660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:31:18,696-Speed 13837.52 samples/sec   Loss 1.5412   LearningRate 0.0002   Epoch: 23   Global Step: 40670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:31:36,405-Speed 13878.81 samples/sec   Loss 1.5354   LearningRate 0.0002   Epoch: 23   Global Step: 40680   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:31:54,144-Speed 13855.06 samples/sec   Loss 1.5287   LearningRate 0.0002   Epoch: 23   Global Step: 40690   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:32:11,900-Speed 13842.32 samples/sec   Loss 1.5490   LearningRate 0.0002   Epoch: 23   Global Step: 40700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:32:29,777-Speed 13748.87 samples/sec   Loss 1.5413   LearningRate 0.0002   Epoch: 23   Global Step: 40710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:32:47,561-Speed 13819.97 samples/sec   Loss 1.5417   LearningRate 0.0002   Epoch: 23   Global Step: 40720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:33:05,403-Speed 13775.29 samples/sec   Loss 1.5448   LearningRate 0.0002   Epoch: 23   Global Step: 40730   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:33:23,271-Speed 13754.71 samples/sec   Loss 1.5347   LearningRate 0.0002   Epoch: 23   Global Step: 40740   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:33:41,128-Speed 13763.49 samples/sec   Loss 1.5339   LearningRate 0.0002   Epoch: 23   Global Step: 40750   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:33:58,953-Speed 13788.09 samples/sec   Loss 1.5470   LearningRate 0.0002   Epoch: 23   Global Step: 40760   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:34:16,742-Speed 13816.72 samples/sec   Loss 1.5378   LearningRate 0.0002   Epoch: 23   Global Step: 40770   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:34:34,556-Speed 13796.95 samples/sec   Loss 1.5292   LearningRate 0.0002   Epoch: 23   Global Step: 40780   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:34:52,347-Speed 13814.26 samples/sec   Loss 1.5434   LearningRate 0.0002   Epoch: 23   Global Step: 40790   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:35:10,146-Speed 13808.25 samples/sec   Loss 1.5421   LearningRate 0.0002   Epoch: 23   Global Step: 40800   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:35:27,974-Speed 13785.77 samples/sec   Loss 1.5437   LearningRate 0.0002   Epoch: 23   Global Step: 40810   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:35:45,844-Speed 13754.00 samples/sec   Loss 1.5345   LearningRate 0.0002   Epoch: 23   Global Step: 40820   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:36:03,688-Speed 13773.65 samples/sec   Loss 1.5199   LearningRate 0.0002   Epoch: 23   Global Step: 40830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:36:21,545-Speed 13763.67 samples/sec   Loss 1.5312   LearningRate 0.0002   Epoch: 23   Global Step: 40840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:36:39,282-Speed 13856.56 samples/sec   Loss 1.5398   LearningRate 0.0002   Epoch: 23   Global Step: 40850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:36:57,033-Speed 13846.02 samples/sec   Loss 1.5311   LearningRate 0.0002   Epoch: 23   Global Step: 40860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:37:14,838-Speed 13803.80 samples/sec   Loss 1.5382   LearningRate 0.0002   Epoch: 23   Global Step: 40870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:37:32,634-Speed 13810.75 samples/sec   Loss 1.5301   LearningRate 0.0002   Epoch: 23   Global Step: 40880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:37:50,415-Speed 13822.25 samples/sec   Loss 1.5346   LearningRate 0.0002   Epoch: 23   Global Step: 40890   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:38:08,209-Speed 13812.54 samples/sec   Loss 1.5293   LearningRate 0.0002   Epoch: 23   Global Step: 40900   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:38:26,031-Speed 13790.35 samples/sec   Loss 1.5461   LearningRate 0.0002   Epoch: 23   Global Step: 40910   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:38:43,779-Speed 13848.19 samples/sec   Loss 1.5399   LearningRate 0.0002   Epoch: 23   Global Step: 40920   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:39:01,586-Speed 13801.98 samples/sec   Loss 1.5393   LearningRate 0.0002   Epoch: 23   Global Step: 40930   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:39:19,387-Speed 13807.39 samples/sec   Loss 1.5313   LearningRate 0.0002   Epoch: 23   Global Step: 40940   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:39:37,234-Speed 13771.15 samples/sec   Loss 1.5172   LearningRate 0.0002   Epoch: 23   Global Step: 40950   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:39:55,002-Speed 13833.54 samples/sec   Loss 1.5258   LearningRate 0.0002   Epoch: 23   Global Step: 40960   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:40:12,757-Speed 13842.42 samples/sec   Loss 1.5264   LearningRate 0.0002   Epoch: 23   Global Step: 40970   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:40:30,618-Speed 13760.21 samples/sec   Loss 1.5233   LearningRate 0.0002   Epoch: 23   Global Step: 40980   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:40:48,369-Speed 13846.01 samples/sec   Loss 1.5247   LearningRate 0.0002   Epoch: 23   Global Step: 40990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:41:06,164-Speed 13812.01 samples/sec   Loss 1.5332   LearningRate 0.0002   Epoch: 23   Global Step: 41000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:41:23,938-Speed 13827.19 samples/sec   Loss 1.5305   LearningRate 0.0002   Epoch: 23   Global Step: 41010   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:41:41,739-Speed 13806.62 samples/sec   Loss 1.5245   LearningRate 0.0002   Epoch: 23   Global Step: 41020   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:41:59,428-Speed 13895.03 samples/sec   Loss 1.5246   LearningRate 0.0002   Epoch: 23   Global Step: 41030   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:42:17,199-Speed 13829.75 samples/sec   Loss 1.5173   LearningRate 0.0002   Epoch: 23   Global Step: 41040   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:42:34,938-Speed 13855.71 samples/sec   Loss 1.5282   LearningRate 0.0002   Epoch: 23   Global Step: 41050   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:42:52,697-Speed 13839.14 samples/sec   Loss 1.5232   LearningRate 0.0002   Epoch: 23   Global Step: 41060   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:43:10,435-Speed 13856.02 samples/sec   Loss 1.5209   LearningRate 0.0002   Epoch: 23   Global Step: 41070   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:43:28,172-Speed 13857.46 samples/sec   Loss 1.5230   LearningRate 0.0002   Epoch: 23   Global Step: 41080   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:43:45,897-Speed 13865.49 samples/sec   Loss 1.5465   LearningRate 0.0002   Epoch: 23   Global Step: 41090   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:44:03,667-Speed 13831.09 samples/sec   Loss 1.5172   LearningRate 0.0002   Epoch: 23   Global Step: 41100   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:44:21,383-Speed 13873.14 samples/sec   Loss 1.5293   LearningRate 0.0002   Epoch: 23   Global Step: 41110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:44:39,221-Speed 13777.84 samples/sec   Loss 1.5179   LearningRate 0.0002   Epoch: 23   Global Step: 41120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:44:56,970-Speed 13847.41 samples/sec   Loss 1.5268   LearningRate 0.0002   Epoch: 23   Global Step: 41130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:45:14,794-Speed 13788.96 samples/sec   Loss 1.5222   LearningRate 0.0002   Epoch: 23   Global Step: 41140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:45:32,631-Speed 13779.14 samples/sec   Loss 1.5223   LearningRate 0.0002   Epoch: 23   Global Step: 41150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:45:50,403-Speed 13829.68 samples/sec   Loss 1.5258   LearningRate 0.0002   Epoch: 23   Global Step: 41160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:46:08,082-Speed 13902.35 samples/sec   Loss 1.5268   LearningRate 0.0002   Epoch: 23   Global Step: 41170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:46:25,843-Speed 13837.91 samples/sec   Loss 1.5099   LearningRate 0.0002   Epoch: 23   Global Step: 41180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:46:43,651-Speed 13801.03 samples/sec   Loss 1.5382   LearningRate 0.0002   Epoch: 23   Global Step: 41190   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:47:01,429-Speed 13824.73 samples/sec   Loss 1.5216   LearningRate 0.0002   Epoch: 23   Global Step: 41200   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:47:19,210-Speed 13822.39 samples/sec   Loss 1.5084   LearningRate 0.0002   Epoch: 23   Global Step: 41210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:47:36,988-Speed 13825.09 samples/sec   Loss 1.5241   LearningRate 0.0002   Epoch: 23   Global Step: 41220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:47:54,738-Speed 13846.49 samples/sec   Loss 1.5201   LearningRate 0.0002   Epoch: 23   Global Step: 41230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:48:12,608-Speed 13753.51 samples/sec   Loss 1.5234   LearningRate 0.0002   Epoch: 23   Global Step: 41240   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:48:30,410-Speed 13805.92 samples/sec   Loss 1.5208   LearningRate 0.0002   Epoch: 23   Global Step: 41250   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:48:48,209-Speed 13808.17 samples/sec   Loss 1.5212   LearningRate 0.0002   Epoch: 23   Global Step: 41260   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:49:05,947-Speed 13856.22 samples/sec   Loss 1.5263   LearningRate 0.0002   Epoch: 23   Global Step: 41270   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:49:23,704-Speed 13840.56 samples/sec   Loss 1.5216   LearningRate 0.0002   Epoch: 23   Global Step: 41280   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:49:41,452-Speed 13848.57 samples/sec   Loss 1.5097   LearningRate 0.0002   Epoch: 23   Global Step: 41290   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:49:59,242-Speed 13815.48 samples/sec   Loss 1.5154   LearningRate 0.0002   Epoch: 23   Global Step: 41300   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:50:17,053-Speed 13799.20 samples/sec   Loss 1.5194   LearningRate 0.0002   Epoch: 23   Global Step: 41310   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:50:34,904-Speed 13768.03 samples/sec   Loss 1.5218   LearningRate 0.0002   Epoch: 23   Global Step: 41320   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:50:52,648-Speed 13851.11 samples/sec   Loss 1.5199   LearningRate 0.0002   Epoch: 23   Global Step: 41330   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:51:10,394-Speed 13849.80 samples/sec   Loss 1.5178   LearningRate 0.0002   Epoch: 23   Global Step: 41340   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:51:28,195-Speed 13806.75 samples/sec   Loss 1.5280   LearningRate 0.0002   Epoch: 23   Global Step: 41350   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:51:45,912-Speed 13871.79 samples/sec   Loss 1.5206   LearningRate 0.0002   Epoch: 23   Global Step: 41360   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:52:03,636-Speed 13866.59 samples/sec   Loss 1.5294   LearningRate 0.0002   Epoch: 23   Global Step: 41370   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:52:21,539-Speed 13728.45 samples/sec   Loss 1.5120   LearningRate 0.0002   Epoch: 23   Global Step: 41380   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:52:39,333-Speed 13812.92 samples/sec   Loss 1.5307   LearningRate 0.0002   Epoch: 23   Global Step: 41390   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:52:57,086-Speed 13843.49 samples/sec   Loss 1.5096   LearningRate 0.0002   Epoch: 23   Global Step: 41400   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:53:14,937-Speed 13767.95 samples/sec   Loss 1.5184   LearningRate 0.0002   Epoch: 23   Global Step: 41410   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:53:32,731-Speed 13812.62 samples/sec   Loss 1.5185   LearningRate 0.0002   Epoch: 23   Global Step: 41420   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:53:50,521-Speed 13815.82 samples/sec   Loss 1.5287   LearningRate 0.0002   Epoch: 23   Global Step: 41430   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:54:08,176-Speed 13920.28 samples/sec   Loss 1.5214   LearningRate 0.0002   Epoch: 23   Global Step: 41440   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:54:25,930-Speed 13843.99 samples/sec   Loss 1.5171   LearningRate 0.0002   Epoch: 23   Global Step: 41450   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:54:43,743-Speed 13797.32 samples/sec   Loss 1.5238   LearningRate 0.0002   Epoch: 23   Global Step: 41460   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:55:01,730-Speed 13663.83 samples/sec   Loss 1.5203   LearningRate 0.0002   Epoch: 23   Global Step: 41470   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:55:19,660-Speed 13708.87 samples/sec   Loss 1.5264   LearningRate 0.0002   Epoch: 23   Global Step: 41480   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:56:28,753-Speed 3556.99 samples/sec   Loss 1.4998   LearningRate 0.0002   Epoch: 24   Global Step: 41490   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:56:46,599-Speed 13772.14 samples/sec   Loss 1.5048   LearningRate 0.0002   Epoch: 24   Global Step: 41500   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:57:04,464-Speed 13757.45 samples/sec   Loss 1.5084   LearningRate 0.0002   Epoch: 24   Global Step: 41510   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:57:22,256-Speed 13814.03 samples/sec   Loss 1.4967   LearningRate 0.0002   Epoch: 24   Global Step: 41520   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:57:40,049-Speed 13812.33 samples/sec   Loss 1.5002   LearningRate 0.0002   Epoch: 24   Global Step: 41530   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:57:57,890-Speed 13775.74 samples/sec   Loss 1.5011   LearningRate 0.0002   Epoch: 24   Global Step: 41540   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 04:58:15,894-Speed 13651.99 samples/sec   Loss 1.5059   LearningRate 0.0002   Epoch: 24   Global Step: 41550   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:58:33,755-Speed 13760.03 samples/sec   Loss 1.5079   LearningRate 0.0002   Epoch: 24   Global Step: 41560   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:58:51,555-Speed 13808.01 samples/sec   Loss 1.4882   LearningRate 0.0002   Epoch: 24   Global Step: 41570   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 04:59:09,401-Speed 13771.98 samples/sec   Loss 1.5062   LearningRate 0.0002   Epoch: 24   Global Step: 41580   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:59:27,228-Speed 13786.85 samples/sec   Loss 1.4961   LearningRate 0.0002   Epoch: 24   Global Step: 41590   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 04:59:45,102-Speed 13749.91 samples/sec   Loss 1.5092   LearningRate 0.0002   Epoch: 24   Global Step: 41600   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:00:02,938-Speed 13780.37 samples/sec   Loss 1.4950   LearningRate 0.0002   Epoch: 24   Global Step: 41610   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:00:20,777-Speed 13777.78 samples/sec   Loss 1.5115   LearningRate 0.0002   Epoch: 24   Global Step: 41620   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:00:38,647-Speed 13753.78 samples/sec   Loss 1.5061   LearningRate 0.0002   Epoch: 24   Global Step: 41630   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:00:56,446-Speed 13809.10 samples/sec   Loss 1.4994   LearningRate 0.0002   Epoch: 24   Global Step: 41640   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:01:14,311-Speed 13757.16 samples/sec   Loss 1.4975   LearningRate 0.0002   Epoch: 24   Global Step: 41650   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:01:32,205-Speed 13735.05 samples/sec   Loss 1.5040   LearningRate 0.0002   Epoch: 24   Global Step: 41660   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:01:50,017-Speed 13798.32 samples/sec   Loss 1.5026   LearningRate 0.0002   Epoch: 24   Global Step: 41670   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:02:07,931-Speed 13719.45 samples/sec   Loss 1.4947   LearningRate 0.0002   Epoch: 24   Global Step: 41680   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:02:25,767-Speed 13779.98 samples/sec   Loss 1.5012   LearningRate 0.0002   Epoch: 24   Global Step: 41690   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:02:43,663-Speed 13733.36 samples/sec   Loss 1.5078   LearningRate 0.0002   Epoch: 24   Global Step: 41700   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:03:01,602-Speed 13700.86 samples/sec   Loss 1.4975   LearningRate 0.0002   Epoch: 24   Global Step: 41710   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:03:19,397-Speed 13811.56 samples/sec   Loss 1.4996   LearningRate 0.0002   Epoch: 24   Global Step: 41720   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:03:37,102-Speed 13881.93 samples/sec   Loss 1.4978   LearningRate 0.0002   Epoch: 24   Global Step: 41730   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:03:54,961-Speed 13761.87 samples/sec   Loss 1.5059   LearningRate 0.0002   Epoch: 24   Global Step: 41740   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:04:12,750-Speed 13816.28 samples/sec   Loss 1.5078   LearningRate 0.0002   Epoch: 24   Global Step: 41750   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:04:30,691-Speed 13699.47 samples/sec   Loss 1.4910   LearningRate 0.0002   Epoch: 24   Global Step: 41760   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:04:48,546-Speed 13765.12 samples/sec   Loss 1.5004   LearningRate 0.0002   Epoch: 24   Global Step: 41770   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:05:06,450-Speed 13728.01 samples/sec   Loss 1.5065   LearningRate 0.0002   Epoch: 24   Global Step: 41780   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:05:24,278-Speed 13786.03 samples/sec   Loss 1.5004   LearningRate 0.0002   Epoch: 24   Global Step: 41790   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:05:42,159-Speed 13745.90 samples/sec   Loss 1.5038   LearningRate 0.0002   Epoch: 24   Global Step: 41800   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:05:59,979-Speed 13792.03 samples/sec   Loss 1.5008   LearningRate 0.0002   Epoch: 24   Global Step: 41810   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:06:17,825-Speed 13772.41 samples/sec   Loss 1.5029   LearningRate 0.0002   Epoch: 24   Global Step: 41820   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:06:35,681-Speed 13763.90 samples/sec   Loss 1.5142   LearningRate 0.0002   Epoch: 24   Global Step: 41830   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:06:53,610-Speed 13708.48 samples/sec   Loss 1.5197   LearningRate 0.0002   Epoch: 24   Global Step: 41840   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:07:11,466-Speed 13764.37 samples/sec   Loss 1.5058   LearningRate 0.0002   Epoch: 24   Global Step: 41850   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:07:29,266-Speed 13807.53 samples/sec   Loss 1.5060   LearningRate 0.0002   Epoch: 24   Global Step: 41860   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:07:46,986-Speed 13870.08 samples/sec   Loss 1.5065   LearningRate 0.0002   Epoch: 24   Global Step: 41870   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:08:04,852-Speed 13757.14 samples/sec   Loss 1.4984   LearningRate 0.0002   Epoch: 24   Global Step: 41880   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:08:22,629-Speed 13827.31 samples/sec   Loss 1.5004   LearningRate 0.0002   Epoch: 24   Global Step: 41890   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:08:40,449-Speed 13792.04 samples/sec   Loss 1.4971   LearningRate 0.0002   Epoch: 24   Global Step: 41900   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:08:58,231-Speed 13821.14 samples/sec   Loss 1.4960   LearningRate 0.0002   Epoch: 24   Global Step: 41910   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:09:16,075-Speed 13774.26 samples/sec   Loss 1.4966   LearningRate 0.0002   Epoch: 24   Global Step: 41920   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:09:33,906-Speed 13783.48 samples/sec   Loss 1.4807   LearningRate 0.0002   Epoch: 24   Global Step: 41930   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:09:51,811-Speed 13726.48 samples/sec   Loss 1.4893   LearningRate 0.0002   Epoch: 24   Global Step: 41940   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:10:09,756-Speed 13697.38 samples/sec   Loss 1.4895   LearningRate 0.0002   Epoch: 24   Global Step: 41950   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:10:27,675-Speed 13715.60 samples/sec   Loss 1.5034   LearningRate 0.0002   Epoch: 24   Global Step: 41960   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:10:45,448-Speed 13828.58 samples/sec   Loss 1.5029   LearningRate 0.0002   Epoch: 24   Global Step: 41970   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:11:03,229-Speed 13822.96 samples/sec   Loss 1.4854   LearningRate 0.0002   Epoch: 24   Global Step: 41980   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:11:21,058-Speed 13784.72 samples/sec   Loss 1.4922   LearningRate 0.0002   Epoch: 24   Global Step: 41990   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:11:38,885-Speed 13786.82 samples/sec   Loss 1.4946   LearningRate 0.0002   Epoch: 24   Global Step: 42000   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:11:56,707-Speed 13790.46 samples/sec   Loss 1.4827   LearningRate 0.0002   Epoch: 24   Global Step: 42010   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:12:14,675-Speed 13678.48 samples/sec   Loss 1.4959   LearningRate 0.0002   Epoch: 24   Global Step: 42020   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:12:32,468-Speed 13813.32 samples/sec   Loss 1.4798   LearningRate 0.0002   Epoch: 24   Global Step: 42030   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:12:50,290-Speed 13790.16 samples/sec   Loss 1.4810   LearningRate 0.0002   Epoch: 24   Global Step: 42040   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:13:08,228-Speed 13701.90 samples/sec   Loss 1.4869   LearningRate 0.0002   Epoch: 24   Global Step: 42050   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:13:26,058-Speed 13785.09 samples/sec   Loss 1.4846   LearningRate 0.0002   Epoch: 24   Global Step: 42060   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:13:43,897-Speed 13777.81 samples/sec   Loss 1.4882   LearningRate 0.0002   Epoch: 24   Global Step: 42070   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:14:01,696-Speed 13808.39 samples/sec   Loss 1.4940   LearningRate 0.0002   Epoch: 24   Global Step: 42080   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:14:19,511-Speed 13797.37 samples/sec   Loss 1.4832   LearningRate 0.0002   Epoch: 24   Global Step: 42090   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:14:37,269-Speed 13840.82 samples/sec   Loss 1.4764   LearningRate 0.0002   Epoch: 24   Global Step: 42100   Fp16 Grad Scale: 16384   Required: 14 hours
Training: 2022-03-04 05:14:55,170-Speed 13729.34 samples/sec   Loss 1.4887   LearningRate 0.0002   Epoch: 24   Global Step: 42110   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:15:12,972-Speed 13806.13 samples/sec   Loss 1.4848   LearningRate 0.0002   Epoch: 24   Global Step: 42120   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:15:30,894-Speed 13713.82 samples/sec   Loss 1.4864   LearningRate 0.0002   Epoch: 24   Global Step: 42130   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:15:48,646-Speed 13845.07 samples/sec   Loss 1.4913   LearningRate 0.0002   Epoch: 24   Global Step: 42140   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:16:06,563-Speed 13716.85 samples/sec   Loss 1.4876   LearningRate 0.0002   Epoch: 24   Global Step: 42150   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:16:24,316-Speed 13844.63 samples/sec   Loss 1.4805   LearningRate 0.0002   Epoch: 24   Global Step: 42160   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:16:42,224-Speed 13724.41 samples/sec   Loss 1.4910   LearningRate 0.0002   Epoch: 24   Global Step: 42170   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:17:00,053-Speed 13784.81 samples/sec   Loss 1.4859   LearningRate 0.0002   Epoch: 24   Global Step: 42180   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:17:17,983-Speed 13707.93 samples/sec   Loss 1.4733   LearningRate 0.0002   Epoch: 24   Global Step: 42190   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:17:35,914-Speed 13706.34 samples/sec   Loss 1.4818   LearningRate 0.0002   Epoch: 24   Global Step: 42200   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:17:53,846-Speed 13706.09 samples/sec   Loss 1.4788   LearningRate 0.0002   Epoch: 24   Global Step: 42210   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:18:11,664-Speed 13796.28 samples/sec   Loss 1.4746   LearningRate 0.0002   Epoch: 24   Global Step: 42220   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:18:29,624-Speed 13686.04 samples/sec   Loss 1.4753   LearningRate 0.0002   Epoch: 24   Global Step: 42230   Fp16 Grad Scale: 65536   Required: 14 hours
Training: 2022-03-04 05:18:47,471-Speed 13770.57 samples/sec   Loss 1.4797   LearningRate 0.0002   Epoch: 24   Global Step: 42240   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:19:05,348-Speed 13748.37 samples/sec   Loss 1.4834   LearningRate 0.0002   Epoch: 24   Global Step: 42250   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:19:23,158-Speed 13800.26 samples/sec   Loss 1.4780   LearningRate 0.0002   Epoch: 24   Global Step: 42260   Fp16 Grad Scale: 32768   Required: 14 hours
Training: 2022-03-04 05:19:41,057-Speed 13730.97 samples/sec   Loss 1.4840   LearningRate 0.0002   Epoch: 24   Global Step: 42270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:19:58,999-Speed 13698.44 samples/sec   Loss 1.4845   LearningRate 0.0002   Epoch: 24   Global Step: 42280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:20:16,787-Speed 13816.85 samples/sec   Loss 1.4795   LearningRate 0.0002   Epoch: 24   Global Step: 42290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:20:34,567-Speed 13823.28 samples/sec   Loss 1.4785   LearningRate 0.0002   Epoch: 24   Global Step: 42300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:20:52,248-Speed 13901.05 samples/sec   Loss 1.4809   LearningRate 0.0002   Epoch: 24   Global Step: 42310   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:21:10,139-Speed 13739.11 samples/sec   Loss 1.4849   LearningRate 0.0002   Epoch: 24   Global Step: 42320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:21:27,983-Speed 13773.57 samples/sec   Loss 1.4868   LearningRate 0.0002   Epoch: 24   Global Step: 42330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:21:46,069-Speed 13589.40 samples/sec   Loss 1.4762   LearningRate 0.0002   Epoch: 24   Global Step: 42340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:22:04,190-Speed 13563.13 samples/sec   Loss 1.4848   LearningRate 0.0002   Epoch: 24   Global Step: 42350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:22:22,302-Speed 13570.95 samples/sec   Loss 1.4720   LearningRate 0.0002   Epoch: 24   Global Step: 42360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:22:40,394-Speed 13584.57 samples/sec   Loss 1.4681   LearningRate 0.0002   Epoch: 24   Global Step: 42370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:22:58,488-Speed 13583.71 samples/sec   Loss 1.4837   LearningRate 0.0002   Epoch: 24   Global Step: 42380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:23:16,505-Speed 13641.15 samples/sec   Loss 1.4659   LearningRate 0.0002   Epoch: 24   Global Step: 42390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:23:34,588-Speed 13591.38 samples/sec   Loss 1.4788   LearningRate 0.0002   Epoch: 24   Global Step: 42400   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:23:52,614-Speed 13634.56 samples/sec   Loss 1.4777   LearningRate 0.0002   Epoch: 24   Global Step: 42410   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:24:10,735-Speed 13562.93 samples/sec   Loss 1.4777   LearningRate 0.0002   Epoch: 24   Global Step: 42420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:24:28,612-Speed 13748.90 samples/sec   Loss 1.4817   LearningRate 0.0002   Epoch: 24   Global Step: 42430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:24:46,592-Speed 13668.61 samples/sec   Loss 1.4830   LearningRate 0.0002   Epoch: 24   Global Step: 42440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:25:04,566-Speed 13673.84 samples/sec   Loss 1.4796   LearningRate 0.0002   Epoch: 24   Global Step: 42450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:25:22,450-Speed 13744.94 samples/sec   Loss 1.4652   LearningRate 0.0002   Epoch: 24   Global Step: 42460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:25:40,282-Speed 13783.48 samples/sec   Loss 1.4772   LearningRate 0.0002   Epoch: 24   Global Step: 42470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:25:58,187-Speed 13726.28 samples/sec   Loss 1.4765   LearningRate 0.0002   Epoch: 24   Global Step: 42480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:26:16,079-Speed 13736.62 samples/sec   Loss 1.4816   LearningRate 0.0002   Epoch: 24   Global Step: 42490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:26:33,970-Speed 13737.46 samples/sec   Loss 1.4719   LearningRate 0.0002   Epoch: 24   Global Step: 42500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:26:51,821-Speed 13767.75 samples/sec   Loss 1.4666   LearningRate 0.0002   Epoch: 24   Global Step: 42510   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:27:09,619-Speed 13809.97 samples/sec   Loss 1.4765   LearningRate 0.0002   Epoch: 24   Global Step: 42520   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:27:27,415-Speed 13810.08 samples/sec   Loss 1.4752   LearningRate 0.0002   Epoch: 24   Global Step: 42530   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:27:45,166-Speed 13846.20 samples/sec   Loss 1.4647   LearningRate 0.0002   Epoch: 24   Global Step: 42540   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:28:02,960-Speed 13812.17 samples/sec   Loss 1.4731   LearningRate 0.0002   Epoch: 24   Global Step: 42550   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:28:20,744-Speed 13819.84 samples/sec   Loss 1.4606   LearningRate 0.0002   Epoch: 24   Global Step: 42560   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:28:38,473-Speed 13862.94 samples/sec   Loss 1.4691   LearningRate 0.0002   Epoch: 24   Global Step: 42570   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:28:56,196-Speed 13867.55 samples/sec   Loss 1.4760   LearningRate 0.0002   Epoch: 24   Global Step: 42580   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:29:14,050-Speed 13766.86 samples/sec   Loss 1.4713   LearningRate 0.0002   Epoch: 24   Global Step: 42590   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:29:31,873-Speed 13789.72 samples/sec   Loss 1.4678   LearningRate 0.0002   Epoch: 24   Global Step: 42600   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:29:49,709-Speed 13780.03 samples/sec   Loss 1.4731   LearningRate 0.0002   Epoch: 24   Global Step: 42610   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:30:07,493-Speed 13819.75 samples/sec   Loss 1.4654   LearningRate 0.0002   Epoch: 24   Global Step: 42620   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:30:25,234-Speed 13854.16 samples/sec   Loss 1.4714   LearningRate 0.0002   Epoch: 24   Global Step: 42630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:30:43,036-Speed 13806.48 samples/sec   Loss 1.4694   LearningRate 0.0002   Epoch: 24   Global Step: 42640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:31:00,864-Speed 13785.49 samples/sec   Loss 1.4683   LearningRate 0.0002   Epoch: 24   Global Step: 42650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:31:18,666-Speed 13806.14 samples/sec   Loss 1.4624   LearningRate 0.0002   Epoch: 24   Global Step: 42660   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:31:36,623-Speed 13687.29 samples/sec   Loss 1.4696   LearningRate 0.0002   Epoch: 24   Global Step: 42670   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:31:54,431-Speed 13801.29 samples/sec   Loss 1.4610   LearningRate 0.0002   Epoch: 24   Global Step: 42680   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:32:12,342-Speed 13722.16 samples/sec   Loss 1.4563   LearningRate 0.0002   Epoch: 24   Global Step: 42690   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:32:30,038-Speed 13889.02 samples/sec   Loss 1.4649   LearningRate 0.0002   Epoch: 24   Global Step: 42700   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:32:47,853-Speed 13795.95 samples/sec   Loss 1.4633   LearningRate 0.0002   Epoch: 24   Global Step: 42710   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:33:05,888-Speed 13627.67 samples/sec   Loss 1.4720   LearningRate 0.0002   Epoch: 24   Global Step: 42720   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:33:23,783-Speed 13734.45 samples/sec   Loss 1.4551   LearningRate 0.0002   Epoch: 24   Global Step: 42730   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:33:41,641-Speed 13762.75 samples/sec   Loss 1.4623   LearningRate 0.0002   Epoch: 24   Global Step: 42740   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:33:59,554-Speed 13720.31 samples/sec   Loss 1.4662   LearningRate 0.0002   Epoch: 24   Global Step: 42750   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:34:17,379-Speed 13788.68 samples/sec   Loss 1.4536   LearningRate 0.0002   Epoch: 24   Global Step: 42760   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:34:35,292-Speed 13720.44 samples/sec   Loss 1.4728   LearningRate 0.0002   Epoch: 24   Global Step: 42770   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:34:53,191-Speed 13731.52 samples/sec   Loss 1.4636   LearningRate 0.0002   Epoch: 24   Global Step: 42780   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:35:11,034-Speed 13775.26 samples/sec   Loss 1.4606   LearningRate 0.0002   Epoch: 24   Global Step: 42790   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:35:28,949-Speed 13719.04 samples/sec   Loss 1.4609   LearningRate 0.0002   Epoch: 24   Global Step: 42800   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:35:46,762-Speed 13797.64 samples/sec   Loss 1.4664   LearningRate 0.0002   Epoch: 24   Global Step: 42810   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:36:04,644-Speed 13744.27 samples/sec   Loss 1.4699   LearningRate 0.0002   Epoch: 24   Global Step: 42820   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:36:22,424-Speed 13822.98 samples/sec   Loss 1.4623   LearningRate 0.0002   Epoch: 24   Global Step: 42830   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:36:40,226-Speed 13805.85 samples/sec   Loss 1.4569   LearningRate 0.0002   Epoch: 24   Global Step: 42840   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:36:58,070-Speed 13774.11 samples/sec   Loss 1.4561   LearningRate 0.0002   Epoch: 24   Global Step: 42850   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:37:15,884-Speed 13796.19 samples/sec   Loss 1.4515   LearningRate 0.0002   Epoch: 24   Global Step: 42860   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:37:33,702-Speed 13794.19 samples/sec   Loss 1.4585   LearningRate 0.0002   Epoch: 24   Global Step: 42870   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:37:51,563-Speed 13760.06 samples/sec   Loss 1.4527   LearningRate 0.0002   Epoch: 24   Global Step: 42880   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:38:09,435-Speed 13752.01 samples/sec   Loss 1.4476   LearningRate 0.0002   Epoch: 24   Global Step: 42890   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:38:27,217-Speed 13821.39 samples/sec   Loss 1.4493   LearningRate 0.0002   Epoch: 24   Global Step: 42900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:38:45,041-Speed 13788.96 samples/sec   Loss 1.4621   LearningRate 0.0002   Epoch: 24   Global Step: 42910   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:39:02,851-Speed 13800.24 samples/sec   Loss 1.4589   LearningRate 0.0002   Epoch: 24   Global Step: 42920   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:39:20,634-Speed 13821.08 samples/sec   Loss 1.4520   LearningRate 0.0002   Epoch: 24   Global Step: 42930   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:39:38,472-Speed 13778.26 samples/sec   Loss 1.4525   LearningRate 0.0002   Epoch: 24   Global Step: 42940   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:39:56,298-Speed 13787.01 samples/sec   Loss 1.4522   LearningRate 0.0002   Epoch: 24   Global Step: 42950   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:40:14,225-Speed 13709.29 samples/sec   Loss 1.4538   LearningRate 0.0002   Epoch: 24   Global Step: 42960   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:40:32,005-Speed 13824.77 samples/sec   Loss 1.4537   LearningRate 0.0002   Epoch: 24   Global Step: 42970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:40:49,883-Speed 13747.54 samples/sec   Loss 1.4513   LearningRate 0.0002   Epoch: 24   Global Step: 42980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:41:07,726-Speed 13774.20 samples/sec   Loss 1.4603   LearningRate 0.0002   Epoch: 24   Global Step: 42990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:41:25,506-Speed 13823.38 samples/sec   Loss 1.4577   LearningRate 0.0002   Epoch: 24   Global Step: 43000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:41:43,290-Speed 13819.80 samples/sec   Loss 1.4630   LearningRate 0.0002   Epoch: 24   Global Step: 43010   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:42:01,432-Speed 13547.90 samples/sec   Loss 1.4459   LearningRate 0.0002   Epoch: 24   Global Step: 43020   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:42:19,439-Speed 13648.69 samples/sec   Loss 1.4486   LearningRate 0.0002   Epoch: 24   Global Step: 43030   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:42:37,292-Speed 13768.87 samples/sec   Loss 1.4592   LearningRate 0.0002   Epoch: 24   Global Step: 43040   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:42:55,201-Speed 13723.31 samples/sec   Loss 1.4557   LearningRate 0.0002   Epoch: 24   Global Step: 43050   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:43:13,145-Speed 13696.90 samples/sec   Loss 1.4540   LearningRate 0.0002   Epoch: 24   Global Step: 43060   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:43:30,938-Speed 13813.89 samples/sec   Loss 1.4594   LearningRate 0.0002   Epoch: 24   Global Step: 43070   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:43:48,829-Speed 13736.85 samples/sec   Loss 1.4505   LearningRate 0.0002   Epoch: 24   Global Step: 43080   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:44:06,710-Speed 13745.34 samples/sec   Loss 1.4479   LearningRate 0.0002   Epoch: 24   Global Step: 43090   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:44:24,707-Speed 13656.17 samples/sec   Loss 1.4468   LearningRate 0.0002   Epoch: 24   Global Step: 43100   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 05:44:42,492-Speed 13819.27 samples/sec   Loss 1.4550   LearningRate 0.0002   Epoch: 24   Global Step: 43110   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:45:00,253-Speed 13839.65 samples/sec   Loss 1.4474   LearningRate 0.0002   Epoch: 24   Global Step: 43120   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:45:18,116-Speed 13758.58 samples/sec   Loss 1.4614   LearningRate 0.0002   Epoch: 24   Global Step: 43130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:45:35,899-Speed 13821.30 samples/sec   Loss 1.4527   LearningRate 0.0002   Epoch: 24   Global Step: 43140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:45:53,722-Speed 13790.84 samples/sec   Loss 1.4651   LearningRate 0.0002   Epoch: 24   Global Step: 43150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:46:11,587-Speed 13757.96 samples/sec   Loss 1.4628   LearningRate 0.0002   Epoch: 24   Global Step: 43160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:46:29,397-Speed 13799.68 samples/sec   Loss 1.4595   LearningRate 0.0002   Epoch: 24   Global Step: 43170   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:46:47,320-Speed 13712.55 samples/sec   Loss 1.4679   LearningRate 0.0002   Epoch: 24   Global Step: 43180   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:47:05,149-Speed 13785.25 samples/sec   Loss 1.4639   LearningRate 0.0002   Epoch: 24   Global Step: 43190   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:47:22,890-Speed 13853.54 samples/sec   Loss 1.4693   LearningRate 0.0002   Epoch: 24   Global Step: 43200   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:48:31,312-Speed 3591.88 samples/sec   Loss 1.4579   LearningRate 0.0002   Epoch: 25   Global Step: 43210   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:48:49,260-Speed 13694.50 samples/sec   Loss 1.4346   LearningRate 0.0002   Epoch: 25   Global Step: 43220   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:49:07,046-Speed 13817.98 samples/sec   Loss 1.4480   LearningRate 0.0002   Epoch: 25   Global Step: 43230   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:49:24,850-Speed 13804.58 samples/sec   Loss 1.4392   LearningRate 0.0002   Epoch: 25   Global Step: 43240   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:49:42,700-Speed 13769.23 samples/sec   Loss 1.4436   LearningRate 0.0002   Epoch: 25   Global Step: 43250   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:50:00,473-Speed 13828.84 samples/sec   Loss 1.4433   LearningRate 0.0002   Epoch: 25   Global Step: 43260   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:50:18,336-Speed 13759.64 samples/sec   Loss 1.4391   LearningRate 0.0002   Epoch: 25   Global Step: 43270   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:50:36,205-Speed 13755.59 samples/sec   Loss 1.4390   LearningRate 0.0002   Epoch: 25   Global Step: 43280   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:50:54,124-Speed 13715.65 samples/sec   Loss 1.4372   LearningRate 0.0002   Epoch: 25   Global Step: 43290   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:51:11,953-Speed 13785.82 samples/sec   Loss 1.4405   LearningRate 0.0002   Epoch: 25   Global Step: 43300   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:51:29,822-Speed 13753.83 samples/sec   Loss 1.4396   LearningRate 0.0002   Epoch: 25   Global Step: 43310   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:51:47,732-Speed 13723.20 samples/sec   Loss 1.4420   LearningRate 0.0002   Epoch: 25   Global Step: 43320   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:52:05,527-Speed 13811.02 samples/sec   Loss 1.4359   LearningRate 0.0002   Epoch: 25   Global Step: 43330   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:52:23,323-Speed 13811.14 samples/sec   Loss 1.4318   LearningRate 0.0002   Epoch: 25   Global Step: 43340   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:52:41,166-Speed 13774.51 samples/sec   Loss 1.4402   LearningRate 0.0002   Epoch: 25   Global Step: 43350   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:52:59,009-Speed 13774.41 samples/sec   Loss 1.4484   LearningRate 0.0002   Epoch: 25   Global Step: 43360   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:53:16,816-Speed 13802.01 samples/sec   Loss 1.4400   LearningRate 0.0002   Epoch: 25   Global Step: 43370   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:53:34,682-Speed 13755.95 samples/sec   Loss 1.4375   LearningRate 0.0002   Epoch: 25   Global Step: 43380   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:53:52,611-Speed 13709.29 samples/sec   Loss 1.4389   LearningRate 0.0002   Epoch: 25   Global Step: 43390   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:54:10,479-Speed 13754.51 samples/sec   Loss 1.4353   LearningRate 0.0002   Epoch: 25   Global Step: 43400   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:54:28,327-Speed 13770.34 samples/sec   Loss 1.4368   LearningRate 0.0002   Epoch: 25   Global Step: 43410   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 05:54:46,083-Speed 13842.01 samples/sec   Loss 1.4421   LearningRate 0.0002   Epoch: 25   Global Step: 43420   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:55:03,916-Speed 13782.20 samples/sec   Loss 1.4407   LearningRate 0.0002   Epoch: 25   Global Step: 43430   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:55:21,845-Speed 13708.49 samples/sec   Loss 1.4383   LearningRate 0.0002   Epoch: 25   Global Step: 43440   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:55:39,795-Speed 13692.35 samples/sec   Loss 1.4350   LearningRate 0.0002   Epoch: 25   Global Step: 43450   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:55:57,557-Speed 13836.94 samples/sec   Loss 1.4470   LearningRate 0.0002   Epoch: 25   Global Step: 43460   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:56:15,381-Speed 13789.08 samples/sec   Loss 1.4516   LearningRate 0.0002   Epoch: 25   Global Step: 43470   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:56:33,163-Speed 13821.82 samples/sec   Loss 1.4325   LearningRate 0.0002   Epoch: 25   Global Step: 43480   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:56:50,975-Speed 13798.34 samples/sec   Loss 1.4365   LearningRate 0.0002   Epoch: 25   Global Step: 43490   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:57:08,805-Speed 13784.25 samples/sec   Loss 1.4387   LearningRate 0.0002   Epoch: 25   Global Step: 43500   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:57:26,664-Speed 13762.07 samples/sec   Loss 1.4474   LearningRate 0.0002   Epoch: 25   Global Step: 43510   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:57:44,570-Speed 13725.78 samples/sec   Loss 1.4392   LearningRate 0.0002   Epoch: 25   Global Step: 43520   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:58:02,426-Speed 13764.12 samples/sec   Loss 1.4288   LearningRate 0.0002   Epoch: 25   Global Step: 43530   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:58:20,164-Speed 13856.13 samples/sec   Loss 1.4285   LearningRate 0.0002   Epoch: 25   Global Step: 43540   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:58:38,101-Speed 13703.42 samples/sec   Loss 1.4461   LearningRate 0.0002   Epoch: 25   Global Step: 43550   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 05:58:55,923-Speed 13790.43 samples/sec   Loss 1.4426   LearningRate 0.0002   Epoch: 25   Global Step: 43560   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:59:13,766-Speed 13773.90 samples/sec   Loss 1.4300   LearningRate 0.0002   Epoch: 25   Global Step: 43570   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:59:31,695-Speed 13708.78 samples/sec   Loss 1.4279   LearningRate 0.0002   Epoch: 25   Global Step: 43580   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 05:59:49,463-Speed 13832.29 samples/sec   Loss 1.4433   LearningRate 0.0002   Epoch: 25   Global Step: 43590   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:00:07,229-Speed 13834.04 samples/sec   Loss 1.4327   LearningRate 0.0002   Epoch: 25   Global Step: 43600   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:00:25,100-Speed 13752.86 samples/sec   Loss 1.4323   LearningRate 0.0002   Epoch: 25   Global Step: 43610   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:00:42,965-Speed 13757.00 samples/sec   Loss 1.4398   LearningRate 0.0002   Epoch: 25   Global Step: 43620   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:01:00,759-Speed 13812.55 samples/sec   Loss 1.4275   LearningRate 0.0002   Epoch: 25   Global Step: 43630   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:01:18,606-Speed 13771.66 samples/sec   Loss 1.4311   LearningRate 0.0002   Epoch: 25   Global Step: 43640   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:01:36,570-Speed 13681.14 samples/sec   Loss 1.4214   LearningRate 0.0002   Epoch: 25   Global Step: 43650   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:01:54,486-Speed 13718.71 samples/sec   Loss 1.4353   LearningRate 0.0002   Epoch: 25   Global Step: 43660   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:02:12,436-Speed 13692.85 samples/sec   Loss 1.4378   LearningRate 0.0002   Epoch: 25   Global Step: 43670   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:02:30,332-Speed 13733.47 samples/sec   Loss 1.4416   LearningRate 0.0002   Epoch: 25   Global Step: 43680   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:02:48,294-Speed 13682.74 samples/sec   Loss 1.4195   LearningRate 0.0002   Epoch: 25   Global Step: 43690   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:03:06,146-Speed 13769.64 samples/sec   Loss 1.4267   LearningRate 0.0002   Epoch: 25   Global Step: 43700   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:03:23,958-Speed 13798.77 samples/sec   Loss 1.4163   LearningRate 0.0002   Epoch: 25   Global Step: 43710   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:03:41,792-Speed 13781.40 samples/sec   Loss 1.4373   LearningRate 0.0002   Epoch: 25   Global Step: 43720   Fp16 Grad Scale: 65536   Required: 13 hours
Training: 2022-03-04 06:03:59,726-Speed 13705.09 samples/sec   Loss 1.4376   LearningRate 0.0002   Epoch: 25   Global Step: 43730   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:04:17,667-Speed 13700.06 samples/sec   Loss 1.4266   LearningRate 0.0002   Epoch: 25   Global Step: 43740   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:04:35,636-Speed 13679.13 samples/sec   Loss 1.4324   LearningRate 0.0002   Epoch: 25   Global Step: 43750   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:04:53,479-Speed 13773.87 samples/sec   Loss 1.4198   LearningRate 0.0002   Epoch: 25   Global Step: 43760   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:05:11,341-Speed 13759.85 samples/sec   Loss 1.4340   LearningRate 0.0002   Epoch: 25   Global Step: 43770   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:05:29,153-Speed 13798.47 samples/sec   Loss 1.4162   LearningRate 0.0002   Epoch: 25   Global Step: 43780   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:05:47,227-Speed 13598.13 samples/sec   Loss 1.4348   LearningRate 0.0002   Epoch: 25   Global Step: 43790   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:06:04,933-Speed 13880.79 samples/sec   Loss 1.4208   LearningRate 0.0002   Epoch: 25   Global Step: 43800   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:06:22,700-Speed 13833.60 samples/sec   Loss 1.4251   LearningRate 0.0002   Epoch: 25   Global Step: 43810   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:06:40,538-Speed 13778.45 samples/sec   Loss 1.4136   LearningRate 0.0002   Epoch: 25   Global Step: 43820   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:06:58,281-Speed 13852.33 samples/sec   Loss 1.4162   LearningRate 0.0002   Epoch: 25   Global Step: 43830   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:07:16,216-Speed 13703.68 samples/sec   Loss 1.4329   LearningRate 0.0002   Epoch: 25   Global Step: 43840   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:07:33,959-Speed 13852.15 samples/sec   Loss 1.4290   LearningRate 0.0002   Epoch: 25   Global Step: 43850   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:07:51,662-Speed 13882.83 samples/sec   Loss 1.4323   LearningRate 0.0002   Epoch: 25   Global Step: 43860   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:08:09,471-Speed 13800.65 samples/sec   Loss 1.4281   LearningRate 0.0002   Epoch: 25   Global Step: 43870   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:08:27,394-Speed 13713.04 samples/sec   Loss 1.4271   LearningRate 0.0002   Epoch: 25   Global Step: 43880   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:08:45,196-Speed 13806.07 samples/sec   Loss 1.4301   LearningRate 0.0002   Epoch: 25   Global Step: 43890   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:09:03,054-Speed 13763.69 samples/sec   Loss 1.4240   LearningRate 0.0002   Epoch: 25   Global Step: 43900   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:09:20,825-Speed 13830.86 samples/sec   Loss 1.4186   LearningRate 0.0002   Epoch: 25   Global Step: 43910   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:09:38,662-Speed 13779.58 samples/sec   Loss 1.4263   LearningRate 0.0002   Epoch: 25   Global Step: 43920   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:09:56,382-Speed 13870.36 samples/sec   Loss 1.4286   LearningRate 0.0002   Epoch: 25   Global Step: 43930   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:10:14,176-Speed 13812.66 samples/sec   Loss 1.4220   LearningRate 0.0002   Epoch: 25   Global Step: 43940   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:10:31,930-Speed 13842.94 samples/sec   Loss 1.4165   LearningRate 0.0002   Epoch: 25   Global Step: 43950   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:10:49,698-Speed 13832.95 samples/sec   Loss 1.4136   LearningRate 0.0002   Epoch: 25   Global Step: 43960   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:11:07,468-Speed 13832.10 samples/sec   Loss 1.4291   LearningRate 0.0002   Epoch: 25   Global Step: 43970   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:11:25,241-Speed 13828.62 samples/sec   Loss 1.4140   LearningRate 0.0002   Epoch: 25   Global Step: 43980   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:11:43,040-Speed 13808.31 samples/sec   Loss 1.4265   LearningRate 0.0002   Epoch: 25   Global Step: 43990   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:12:00,839-Speed 13808.51 samples/sec   Loss 1.4241   LearningRate 0.0002   Epoch: 25   Global Step: 44000   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:12:18,627-Speed 13817.10 samples/sec   Loss 1.4266   LearningRate 0.0002   Epoch: 25   Global Step: 44010   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:12:36,424-Speed 13809.54 samples/sec   Loss 1.4081   LearningRate 0.0002   Epoch: 25   Global Step: 44020   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:12:54,193-Speed 13832.60 samples/sec   Loss 1.4193   LearningRate 0.0002   Epoch: 25   Global Step: 44030   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:13:11,915-Speed 13868.89 samples/sec   Loss 1.4072   LearningRate 0.0002   Epoch: 25   Global Step: 44040   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:13:29,631-Speed 13872.53 samples/sec   Loss 1.4094   LearningRate 0.0002   Epoch: 25   Global Step: 44050   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:13:47,323-Speed 13892.31 samples/sec   Loss 1.4129   LearningRate 0.0002   Epoch: 25   Global Step: 44060   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:14:05,070-Speed 13848.59 samples/sec   Loss 1.4193   LearningRate 0.0002   Epoch: 25   Global Step: 44070   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:14:22,817-Speed 13848.69 samples/sec   Loss 1.4132   LearningRate 0.0002   Epoch: 25   Global Step: 44080   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:14:40,614-Speed 13810.24 samples/sec   Loss 1.4161   LearningRate 0.0002   Epoch: 25   Global Step: 44090   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:14:58,338-Speed 13866.85 samples/sec   Loss 1.4178   LearningRate 0.0002   Epoch: 25   Global Step: 44100   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:15:16,039-Speed 13884.92 samples/sec   Loss 1.4151   LearningRate 0.0002   Epoch: 25   Global Step: 44110   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:15:33,818-Speed 13824.16 samples/sec   Loss 1.4122   LearningRate 0.0002   Epoch: 25   Global Step: 44120   Fp16 Grad Scale: 32768   Required: 13 hours
Training: 2022-03-04 06:15:51,568-Speed 13845.94 samples/sec   Loss 1.4127   LearningRate 0.0002   Epoch: 25   Global Step: 44130   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:16:09,335-Speed 13833.68 samples/sec   Loss 1.4074   LearningRate 0.0002   Epoch: 25   Global Step: 44140   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:16:27,199-Speed 13758.07 samples/sec   Loss 1.4150   LearningRate 0.0002   Epoch: 25   Global Step: 44150   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:16:44,890-Speed 13892.74 samples/sec   Loss 1.4083   LearningRate 0.0002   Epoch: 25   Global Step: 44160   Fp16 Grad Scale: 16384   Required: 13 hours
Training: 2022-03-04 06:17:02,603-Speed 13876.06 samples/sec   Loss 1.4102   LearningRate 0.0002   Epoch: 25   Global Step: 44170   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:17:20,447-Speed 13773.16 samples/sec   Loss 1.4088   LearningRate 0.0002   Epoch: 25   Global Step: 44180   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:17:38,404-Speed 13687.04 samples/sec   Loss 1.4098   LearningRate 0.0002   Epoch: 25   Global Step: 44190   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:17:56,500-Speed 13581.92 samples/sec   Loss 1.4149   LearningRate 0.0002   Epoch: 25   Global Step: 44200   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:18:14,485-Speed 13666.40 samples/sec   Loss 1.4215   LearningRate 0.0002   Epoch: 25   Global Step: 44210   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:18:32,502-Speed 13641.34 samples/sec   Loss 1.4228   LearningRate 0.0002   Epoch: 25   Global Step: 44220   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:18:50,536-Speed 13628.31 samples/sec   Loss 1.4149   LearningRate 0.0002   Epoch: 25   Global Step: 44230   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:19:08,552-Speed 13641.73 samples/sec   Loss 1.4049   LearningRate 0.0002   Epoch: 25   Global Step: 44240   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:19:26,655-Speed 13577.29 samples/sec   Loss 1.4172   LearningRate 0.0002   Epoch: 25   Global Step: 44250   Fp16 Grad Scale: 8192   Required: 13 hours
Training: 2022-03-04 06:19:44,781-Speed 13559.17 samples/sec   Loss 1.4106   LearningRate 0.0002   Epoch: 25   Global Step: 44260   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:20:02,758-Speed 13671.34 samples/sec   Loss 1.4067   LearningRate 0.0002   Epoch: 25   Global Step: 44270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:20:20,749-Speed 13660.90 samples/sec   Loss 1.4118   LearningRate 0.0002   Epoch: 25   Global Step: 44280   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:20:38,798-Speed 13617.40 samples/sec   Loss 1.4118   LearningRate 0.0002   Epoch: 25   Global Step: 44290   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:20:56,875-Speed 13596.52 samples/sec   Loss 1.4098   LearningRate 0.0002   Epoch: 25   Global Step: 44300   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:21:14,878-Speed 13651.95 samples/sec   Loss 1.4155   LearningRate 0.0002   Epoch: 25   Global Step: 44310   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:21:32,928-Speed 13616.22 samples/sec   Loss 1.4130   LearningRate 0.0002   Epoch: 25   Global Step: 44320   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:21:50,930-Speed 13652.69 samples/sec   Loss 1.4178   LearningRate 0.0002   Epoch: 25   Global Step: 44330   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:22:08,970-Speed 13623.78 samples/sec   Loss 1.4095   LearningRate 0.0002   Epoch: 25   Global Step: 44340   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:22:27,043-Speed 13599.09 samples/sec   Loss 1.4181   LearningRate 0.0002   Epoch: 25   Global Step: 44350   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:22:45,050-Speed 13648.93 samples/sec   Loss 1.4109   LearningRate 0.0002   Epoch: 25   Global Step: 44360   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:23:03,161-Speed 13570.24 samples/sec   Loss 1.4110   LearningRate 0.0002   Epoch: 25   Global Step: 44370   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:23:21,079-Speed 13716.44 samples/sec   Loss 1.4022   LearningRate 0.0002   Epoch: 25   Global Step: 44380   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:23:38,910-Speed 13783.92 samples/sec   Loss 1.4056   LearningRate 0.0002   Epoch: 25   Global Step: 44390   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:23:56,640-Speed 13862.51 samples/sec   Loss 1.4013   LearningRate 0.0002   Epoch: 25   Global Step: 44400   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:24:14,449-Speed 13799.92 samples/sec   Loss 1.4036   LearningRate 0.0002   Epoch: 25   Global Step: 44410   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:24:32,245-Speed 13810.63 samples/sec   Loss 1.3942   LearningRate 0.0002   Epoch: 25   Global Step: 44420   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:24:50,065-Speed 13793.12 samples/sec   Loss 1.3973   LearningRate 0.0002   Epoch: 25   Global Step: 44430   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:25:07,839-Speed 13827.43 samples/sec   Loss 1.4127   LearningRate 0.0002   Epoch: 25   Global Step: 44440   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:25:25,676-Speed 13778.80 samples/sec   Loss 1.3970   LearningRate 0.0002   Epoch: 25   Global Step: 44450   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:25:43,530-Speed 13766.14 samples/sec   Loss 1.3992   LearningRate 0.0002   Epoch: 25   Global Step: 44460   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:26:01,271-Speed 13853.66 samples/sec   Loss 1.3842   LearningRate 0.0002   Epoch: 25   Global Step: 44470   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:26:19,066-Speed 13811.68 samples/sec   Loss 1.3975   LearningRate 0.0002   Epoch: 25   Global Step: 44480   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:26:36,903-Speed 13779.26 samples/sec   Loss 1.4008   LearningRate 0.0002   Epoch: 25   Global Step: 44490   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:26:54,704-Speed 13806.77 samples/sec   Loss 1.4039   LearningRate 0.0002   Epoch: 25   Global Step: 44500   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:27:12,630-Speed 13711.57 samples/sec   Loss 1.4082   LearningRate 0.0002   Epoch: 25   Global Step: 44510   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:27:30,390-Speed 13838.86 samples/sec   Loss 1.3987   LearningRate 0.0002   Epoch: 25   Global Step: 44520   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:27:48,230-Speed 13776.59 samples/sec   Loss 1.4020   LearningRate 0.0002   Epoch: 25   Global Step: 44530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:28:05,980-Speed 13846.09 samples/sec   Loss 1.4131   LearningRate 0.0002   Epoch: 25   Global Step: 44540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:28:23,809-Speed 13785.28 samples/sec   Loss 1.3999   LearningRate 0.0002   Epoch: 25   Global Step: 44550   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:28:41,691-Speed 13744.61 samples/sec   Loss 1.3945   LearningRate 0.0002   Epoch: 25   Global Step: 44560   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:28:59,531-Speed 13776.38 samples/sec   Loss 1.4047   LearningRate 0.0002   Epoch: 25   Global Step: 44570   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:29:17,270-Speed 13855.41 samples/sec   Loss 1.4012   LearningRate 0.0002   Epoch: 25   Global Step: 44580   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:29:34,995-Speed 13865.65 samples/sec   Loss 1.3978   LearningRate 0.0002   Epoch: 25   Global Step: 44590   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:29:52,790-Speed 13811.67 samples/sec   Loss 1.3829   LearningRate 0.0002   Epoch: 25   Global Step: 44600   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:30:10,589-Speed 13808.55 samples/sec   Loss 1.3944   LearningRate 0.0002   Epoch: 25   Global Step: 44610   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:30:28,421-Speed 13783.01 samples/sec   Loss 1.3861   LearningRate 0.0002   Epoch: 25   Global Step: 44620   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:30:46,182-Speed 13837.19 samples/sec   Loss 1.4113   LearningRate 0.0002   Epoch: 25   Global Step: 44630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:31:04,014-Speed 13783.89 samples/sec   Loss 1.4049   LearningRate 0.0002   Epoch: 25   Global Step: 44640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:31:21,831-Speed 13794.40 samples/sec   Loss 1.3963   LearningRate 0.0002   Epoch: 25   Global Step: 44650   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:31:39,619-Speed 13817.17 samples/sec   Loss 1.4018   LearningRate 0.0002   Epoch: 25   Global Step: 44660   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:31:57,473-Speed 13765.85 samples/sec   Loss 1.3994   LearningRate 0.0002   Epoch: 25   Global Step: 44670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:32:15,193-Speed 13869.78 samples/sec   Loss 1.4012   LearningRate 0.0002   Epoch: 25   Global Step: 44680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:32:32,900-Speed 13880.96 samples/sec   Loss 1.4019   LearningRate 0.0002   Epoch: 25   Global Step: 44690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:32:50,792-Speed 13736.39 samples/sec   Loss 1.3976   LearningRate 0.0002   Epoch: 25   Global Step: 44700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:33:08,547-Speed 13842.45 samples/sec   Loss 1.3978   LearningRate 0.0002   Epoch: 25   Global Step: 44710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:33:26,334-Speed 13818.21 samples/sec   Loss 1.3838   LearningRate 0.0002   Epoch: 25   Global Step: 44720   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:33:44,051-Speed 13872.55 samples/sec   Loss 1.4055   LearningRate 0.0002   Epoch: 25   Global Step: 44730   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:34:01,780-Speed 13862.53 samples/sec   Loss 1.3906   LearningRate 0.0002   Epoch: 25   Global Step: 44740   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:34:19,526-Speed 13849.90 samples/sec   Loss 1.3937   LearningRate 0.0002   Epoch: 25   Global Step: 44750   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:34:37,260-Speed 13858.70 samples/sec   Loss 1.4028   LearningRate 0.0002   Epoch: 25   Global Step: 44760   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:34:55,002-Speed 13853.34 samples/sec   Loss 1.3935   LearningRate 0.0002   Epoch: 25   Global Step: 44770   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:35:12,839-Speed 13779.79 samples/sec   Loss 1.4005   LearningRate 0.0002   Epoch: 25   Global Step: 44780   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:35:30,590-Speed 13845.46 samples/sec   Loss 1.3847   LearningRate 0.0002   Epoch: 25   Global Step: 44790   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:35:48,322-Speed 13860.96 samples/sec   Loss 1.3938   LearningRate 0.0002   Epoch: 25   Global Step: 44800   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:36:05,998-Speed 13904.85 samples/sec   Loss 1.3898   LearningRate 0.0002   Epoch: 25   Global Step: 44810   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:36:23,706-Speed 13878.71 samples/sec   Loss 1.3966   LearningRate 0.0002   Epoch: 25   Global Step: 44820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:36:41,501-Speed 13811.50 samples/sec   Loss 1.3879   LearningRate 0.0002   Epoch: 25   Global Step: 44830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:36:59,274-Speed 13829.14 samples/sec   Loss 1.3990   LearningRate 0.0002   Epoch: 25   Global Step: 44840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:37:17,000-Speed 13865.05 samples/sec   Loss 1.3991   LearningRate 0.0002   Epoch: 25   Global Step: 44850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:37:34,756-Speed 13842.31 samples/sec   Loss 1.3852   LearningRate 0.0002   Epoch: 25   Global Step: 44860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:37:52,486-Speed 13861.88 samples/sec   Loss 1.3937   LearningRate 0.0002   Epoch: 25   Global Step: 44870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:38:10,284-Speed 13809.00 samples/sec   Loss 1.4029   LearningRate 0.0002   Epoch: 25   Global Step: 44880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:38:28,108-Speed 13789.56 samples/sec   Loss 1.4052   LearningRate 0.0002   Epoch: 25   Global Step: 44890   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:38:45,889-Speed 13821.65 samples/sec   Loss 1.4033   LearningRate 0.0002   Epoch: 25   Global Step: 44900   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:39:03,629-Speed 13854.78 samples/sec   Loss 1.3997   LearningRate 0.0002   Epoch: 25   Global Step: 44910   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:39:21,446-Speed 13794.00 samples/sec   Loss 1.3918   LearningRate 0.0002   Epoch: 25   Global Step: 44920   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:39:39,300-Speed 13767.27 samples/sec   Loss 1.4098   LearningRate 0.0002   Epoch: 25   Global Step: 44930   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:40:47,802-Speed 3587.65 samples/sec   Loss 1.3985   LearningRate 0.0002   Epoch: 26   Global Step: 44940   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:41:05,450-Speed 13926.83 samples/sec   Loss 1.3711   LearningRate 0.0002   Epoch: 26   Global Step: 44950   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:41:23,174-Speed 13867.64 samples/sec   Loss 1.3850   LearningRate 0.0002   Epoch: 26   Global Step: 44960   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:41:40,983-Speed 13800.67 samples/sec   Loss 1.3775   LearningRate 0.0002   Epoch: 26   Global Step: 44970   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:41:58,972-Speed 13662.62 samples/sec   Loss 1.3741   LearningRate 0.0002   Epoch: 26   Global Step: 44980   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:42:16,893-Speed 13714.27 samples/sec   Loss 1.3814   LearningRate 0.0002   Epoch: 26   Global Step: 44990   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:42:34,714-Speed 13791.10 samples/sec   Loss 1.3792   LearningRate 0.0002   Epoch: 26   Global Step: 45000   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:42:52,599-Speed 13742.19 samples/sec   Loss 1.3815   LearningRate 0.0002   Epoch: 26   Global Step: 45010   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:43:10,458-Speed 13761.78 samples/sec   Loss 1.3831   LearningRate 0.0002   Epoch: 26   Global Step: 45020   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:43:28,429-Speed 13676.54 samples/sec   Loss 1.3935   LearningRate 0.0001   Epoch: 26   Global Step: 45030   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:43:46,196-Speed 13833.31 samples/sec   Loss 1.3830   LearningRate 0.0001   Epoch: 26   Global Step: 45040   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:44:04,140-Speed 13696.66 samples/sec   Loss 1.3792   LearningRate 0.0001   Epoch: 26   Global Step: 45050   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:44:21,806-Speed 13912.69 samples/sec   Loss 1.3871   LearningRate 0.0001   Epoch: 26   Global Step: 45060   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 06:44:39,517-Speed 13876.76 samples/sec   Loss 1.3768   LearningRate 0.0001   Epoch: 26   Global Step: 45070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:44:57,367-Speed 13768.46 samples/sec   Loss 1.3891   LearningRate 0.0001   Epoch: 26   Global Step: 45080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:45:15,270-Speed 13728.50 samples/sec   Loss 1.3701   LearningRate 0.0001   Epoch: 26   Global Step: 45090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:45:33,122-Speed 13767.97 samples/sec   Loss 1.3869   LearningRate 0.0001   Epoch: 26   Global Step: 45100   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:45:50,988-Speed 13757.01 samples/sec   Loss 1.3838   LearningRate 0.0001   Epoch: 26   Global Step: 45110   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:46:08,806-Speed 13794.71 samples/sec   Loss 1.3730   LearningRate 0.0001   Epoch: 26   Global Step: 45120   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:46:26,619-Speed 13798.02 samples/sec   Loss 1.3819   LearningRate 0.0001   Epoch: 26   Global Step: 45130   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:46:44,559-Speed 13700.12 samples/sec   Loss 1.3690   LearningRate 0.0001   Epoch: 26   Global Step: 45140   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:47:02,461-Speed 13728.94 samples/sec   Loss 1.3745   LearningRate 0.0001   Epoch: 26   Global Step: 45150   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:47:20,302-Speed 13776.20 samples/sec   Loss 1.3796   LearningRate 0.0001   Epoch: 26   Global Step: 45160   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:47:38,291-Speed 13662.33 samples/sec   Loss 1.3868   LearningRate 0.0001   Epoch: 26   Global Step: 45170   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:47:56,176-Speed 13743.67 samples/sec   Loss 1.3903   LearningRate 0.0001   Epoch: 26   Global Step: 45180   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:48:14,065-Speed 13739.12 samples/sec   Loss 1.3843   LearningRate 0.0001   Epoch: 26   Global Step: 45190   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:48:31,993-Speed 13709.47 samples/sec   Loss 1.3805   LearningRate 0.0001   Epoch: 26   Global Step: 45200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:48:49,794-Speed 13806.45 samples/sec   Loss 1.3799   LearningRate 0.0001   Epoch: 26   Global Step: 45210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:49:07,601-Speed 13802.13 samples/sec   Loss 1.3822   LearningRate 0.0001   Epoch: 26   Global Step: 45220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:49:25,450-Speed 13770.08 samples/sec   Loss 1.3833   LearningRate 0.0001   Epoch: 26   Global Step: 45230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:49:43,280-Speed 13783.80 samples/sec   Loss 1.3833   LearningRate 0.0001   Epoch: 26   Global Step: 45240   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:50:01,231-Speed 13691.79 samples/sec   Loss 1.3822   LearningRate 0.0001   Epoch: 26   Global Step: 45250   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:50:19,132-Speed 13729.78 samples/sec   Loss 1.3793   LearningRate 0.0001   Epoch: 26   Global Step: 45260   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:50:36,998-Speed 13756.64 samples/sec   Loss 1.3777   LearningRate 0.0001   Epoch: 26   Global Step: 45270   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:50:54,845-Speed 13771.51 samples/sec   Loss 1.3902   LearningRate 0.0001   Epoch: 26   Global Step: 45280   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:51:12,636-Speed 13814.29 samples/sec   Loss 1.3713   LearningRate 0.0001   Epoch: 26   Global Step: 45290   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:51:30,453-Speed 13796.38 samples/sec   Loss 1.3776   LearningRate 0.0001   Epoch: 26   Global Step: 45300   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:51:48,226-Speed 13828.70 samples/sec   Loss 1.3840   LearningRate 0.0001   Epoch: 26   Global Step: 45310   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:52:05,949-Speed 13867.07 samples/sec   Loss 1.3688   LearningRate 0.0001   Epoch: 26   Global Step: 45320   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:52:23,748-Speed 13808.88 samples/sec   Loss 1.3682   LearningRate 0.0001   Epoch: 26   Global Step: 45330   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:52:41,533-Speed 13820.61 samples/sec   Loss 1.3809   LearningRate 0.0001   Epoch: 26   Global Step: 45340   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:52:59,279-Speed 13850.07 samples/sec   Loss 1.3776   LearningRate 0.0001   Epoch: 26   Global Step: 45350   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:53:17,053-Speed 13827.64 samples/sec   Loss 1.3747   LearningRate 0.0001   Epoch: 26   Global Step: 45360   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:53:34,860-Speed 13801.89 samples/sec   Loss 1.3757   LearningRate 0.0001   Epoch: 26   Global Step: 45370   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:53:52,619-Speed 13840.35 samples/sec   Loss 1.3763   LearningRate 0.0001   Epoch: 26   Global Step: 45380   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:54:10,383-Speed 13835.35 samples/sec   Loss 1.3843   LearningRate 0.0001   Epoch: 26   Global Step: 45390   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:54:28,096-Speed 13875.63 samples/sec   Loss 1.3658   LearningRate 0.0001   Epoch: 26   Global Step: 45400   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:54:45,893-Speed 13809.77 samples/sec   Loss 1.3739   LearningRate 0.0001   Epoch: 26   Global Step: 45410   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:55:03,634-Speed 13853.50 samples/sec   Loss 1.3711   LearningRate 0.0001   Epoch: 26   Global Step: 45420   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:55:21,440-Speed 13803.05 samples/sec   Loss 1.3568   LearningRate 0.0001   Epoch: 26   Global Step: 45430   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:55:39,272-Speed 13783.66 samples/sec   Loss 1.3658   LearningRate 0.0001   Epoch: 26   Global Step: 45440   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:55:57,166-Speed 13735.18 samples/sec   Loss 1.3668   LearningRate 0.0001   Epoch: 26   Global Step: 45450   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:56:14,984-Speed 13793.66 samples/sec   Loss 1.3798   LearningRate 0.0001   Epoch: 26   Global Step: 45460   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:56:32,756-Speed 13829.24 samples/sec   Loss 1.3645   LearningRate 0.0001   Epoch: 26   Global Step: 45470   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:56:50,593-Speed 13778.87 samples/sec   Loss 1.3667   LearningRate 0.0001   Epoch: 26   Global Step: 45480   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:57:08,460-Speed 13755.99 samples/sec   Loss 1.3711   LearningRate 0.0001   Epoch: 26   Global Step: 45490   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 06:57:26,262-Speed 13806.60 samples/sec   Loss 1.3692   LearningRate 0.0001   Epoch: 26   Global Step: 45500   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:57:43,978-Speed 13872.62 samples/sec   Loss 1.3769   LearningRate 0.0001   Epoch: 26   Global Step: 45510   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:58:01,664-Speed 13896.85 samples/sec   Loss 1.3664   LearningRate 0.0001   Epoch: 26   Global Step: 45520   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 06:58:19,421-Speed 13840.76 samples/sec   Loss 1.3752   LearningRate 0.0001   Epoch: 26   Global Step: 45530   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:58:37,205-Speed 13819.58 samples/sec   Loss 1.3754   LearningRate 0.0001   Epoch: 26   Global Step: 45540   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:58:55,039-Speed 13781.78 samples/sec   Loss 1.3685   LearningRate 0.0001   Epoch: 26   Global Step: 45550   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:59:12,755-Speed 13873.04 samples/sec   Loss 1.3648   LearningRate 0.0001   Epoch: 26   Global Step: 45560   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:59:30,546-Speed 13814.45 samples/sec   Loss 1.3723   LearningRate 0.0001   Epoch: 26   Global Step: 45570   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 06:59:48,239-Speed 13891.23 samples/sec   Loss 1.3617   LearningRate 0.0001   Epoch: 26   Global Step: 45580   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:00:05,987-Speed 13848.46 samples/sec   Loss 1.3720   LearningRate 0.0001   Epoch: 26   Global Step: 45590   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:00:23,667-Speed 13901.35 samples/sec   Loss 1.3626   LearningRate 0.0001   Epoch: 26   Global Step: 45600   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:00:41,441-Speed 13827.84 samples/sec   Loss 1.3684   LearningRate 0.0001   Epoch: 26   Global Step: 45610   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:00:59,180-Speed 13855.07 samples/sec   Loss 1.3610   LearningRate 0.0001   Epoch: 26   Global Step: 45620   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:01:16,870-Speed 13893.53 samples/sec   Loss 1.3726   LearningRate 0.0001   Epoch: 26   Global Step: 45630   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:01:34,657-Speed 13817.66 samples/sec   Loss 1.3688   LearningRate 0.0001   Epoch: 26   Global Step: 45640   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:01:52,447-Speed 13815.77 samples/sec   Loss 1.3654   LearningRate 0.0001   Epoch: 26   Global Step: 45650   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:02:10,256-Speed 13800.67 samples/sec   Loss 1.3674   LearningRate 0.0001   Epoch: 26   Global Step: 45660   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:02:28,069-Speed 13797.65 samples/sec   Loss 1.3662   LearningRate 0.0001   Epoch: 26   Global Step: 45670   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:02:45,765-Speed 13888.46 samples/sec   Loss 1.3708   LearningRate 0.0001   Epoch: 26   Global Step: 45680   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:03:03,558-Speed 13813.04 samples/sec   Loss 1.3699   LearningRate 0.0001   Epoch: 26   Global Step: 45690   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:03:21,334-Speed 13826.49 samples/sec   Loss 1.3624   LearningRate 0.0001   Epoch: 26   Global Step: 45700   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:03:39,045-Speed 13877.84 samples/sec   Loss 1.3625   LearningRate 0.0001   Epoch: 26   Global Step: 45710   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:03:56,869-Speed 13789.34 samples/sec   Loss 1.3665   LearningRate 0.0001   Epoch: 26   Global Step: 45720   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:04:14,898-Speed 13631.78 samples/sec   Loss 1.3650   LearningRate 0.0001   Epoch: 26   Global Step: 45730   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:04:32,687-Speed 13816.16 samples/sec   Loss 1.3586   LearningRate 0.0001   Epoch: 26   Global Step: 45740   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:04:50,443-Speed 13842.72 samples/sec   Loss 1.3507   LearningRate 0.0001   Epoch: 26   Global Step: 45750   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:05:08,190-Speed 13848.60 samples/sec   Loss 1.3546   LearningRate 0.0001   Epoch: 26   Global Step: 45760   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:05:25,921-Speed 13861.95 samples/sec   Loss 1.3649   LearningRate 0.0001   Epoch: 26   Global Step: 45770   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:05:43,728-Speed 13801.93 samples/sec   Loss 1.3562   LearningRate 0.0001   Epoch: 26   Global Step: 45780   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:06:01,528-Speed 13807.47 samples/sec   Loss 1.3526   LearningRate 0.0001   Epoch: 26   Global Step: 45790   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:06:19,389-Speed 13761.00 samples/sec   Loss 1.3559   LearningRate 0.0001   Epoch: 26   Global Step: 45800   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:06:37,232-Speed 13775.21 samples/sec   Loss 1.3644   LearningRate 0.0001   Epoch: 26   Global Step: 45810   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:06:55,080-Speed 13770.37 samples/sec   Loss 1.3554   LearningRate 0.0001   Epoch: 26   Global Step: 45820   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:07:12,943-Speed 13758.39 samples/sec   Loss 1.3591   LearningRate 0.0001   Epoch: 26   Global Step: 45830   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:07:30,738-Speed 13811.96 samples/sec   Loss 1.3629   LearningRate 0.0001   Epoch: 26   Global Step: 45840   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:07:48,605-Speed 13756.23 samples/sec   Loss 1.3639   LearningRate 0.0001   Epoch: 26   Global Step: 45850   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:08:06,345-Speed 13854.09 samples/sec   Loss 1.3513   LearningRate 0.0001   Epoch: 26   Global Step: 45860   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:08:24,143-Speed 13808.87 samples/sec   Loss 1.3546   LearningRate 0.0001   Epoch: 26   Global Step: 45870   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:08:41,908-Speed 13835.02 samples/sec   Loss 1.3490   LearningRate 0.0001   Epoch: 26   Global Step: 45880   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:08:59,692-Speed 13820.37 samples/sec   Loss 1.3541   LearningRate 0.0001   Epoch: 26   Global Step: 45890   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:09:17,433-Speed 13853.66 samples/sec   Loss 1.3542   LearningRate 0.0001   Epoch: 26   Global Step: 45900   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:09:35,161-Speed 13862.94 samples/sec   Loss 1.3610   LearningRate 0.0001   Epoch: 26   Global Step: 45910   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:09:52,950-Speed 13818.25 samples/sec   Loss 1.3590   LearningRate 0.0001   Epoch: 26   Global Step: 45920   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 07:10:10,661-Speed 13876.63 samples/sec   Loss 1.3516   LearningRate 0.0001   Epoch: 26   Global Step: 45930   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 07:10:28,423-Speed 13837.35 samples/sec   Loss 1.3579   LearningRate 0.0001   Epoch: 26   Global Step: 45940   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 07:10:46,149-Speed 13866.52 samples/sec   Loss 1.3466   LearningRate 0.0001   Epoch: 26   Global Step: 45950   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 07:11:03,940-Speed 13813.94 samples/sec   Loss 1.3563   LearningRate 0.0001   Epoch: 26   Global Step: 45960   Fp16 Grad Scale: 65536   Required: 12 hours
Training: 2022-03-04 07:11:21,632-Speed 13892.27 samples/sec   Loss 1.3631   LearningRate 0.0001   Epoch: 26   Global Step: 45970   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:11:39,305-Speed 13907.04 samples/sec   Loss 1.3521   LearningRate 0.0001   Epoch: 26   Global Step: 45980   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:11:57,037-Speed 13860.64 samples/sec   Loss 1.3508   LearningRate 0.0001   Epoch: 26   Global Step: 45990   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:12:14,816-Speed 13824.49 samples/sec   Loss 1.3509   LearningRate 0.0001   Epoch: 26   Global Step: 46000   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:12:32,513-Speed 13888.13 samples/sec   Loss 1.3613   LearningRate 0.0001   Epoch: 26   Global Step: 46010   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:12:50,315-Speed 13806.06 samples/sec   Loss 1.3586   LearningRate 0.0001   Epoch: 26   Global Step: 46020   Fp16 Grad Scale: 32768   Required: 12 hours
Training: 2022-03-04 07:13:08,132-Speed 13793.74 samples/sec   Loss 1.3532   LearningRate 0.0001   Epoch: 26   Global Step: 46030   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:13:25,870-Speed 13855.93 samples/sec   Loss 1.3575   LearningRate 0.0001   Epoch: 26   Global Step: 46040   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:13:43,685-Speed 13799.04 samples/sec   Loss 1.3561   LearningRate 0.0001   Epoch: 26   Global Step: 46050   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:14:01,410-Speed 13866.04 samples/sec   Loss 1.3402   LearningRate 0.0001   Epoch: 26   Global Step: 46060   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:14:19,152-Speed 13853.11 samples/sec   Loss 1.3499   LearningRate 0.0001   Epoch: 26   Global Step: 46070   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:14:36,876-Speed 13866.77 samples/sec   Loss 1.3506   LearningRate 0.0001   Epoch: 26   Global Step: 46080   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:14:54,653-Speed 13825.92 samples/sec   Loss 1.3518   LearningRate 0.0001   Epoch: 26   Global Step: 46090   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:15:12,424-Speed 13830.15 samples/sec   Loss 1.3478   LearningRate 0.0001   Epoch: 26   Global Step: 46100   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:15:30,314-Speed 13737.88 samples/sec   Loss 1.3449   LearningRate 0.0001   Epoch: 26   Global Step: 46110   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:15:48,095-Speed 13821.85 samples/sec   Loss 1.3539   LearningRate 0.0001   Epoch: 26   Global Step: 46120   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:16:06,044-Speed 13693.70 samples/sec   Loss 1.3479   LearningRate 0.0001   Epoch: 26   Global Step: 46130   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:16:24,197-Speed 13539.77 samples/sec   Loss 1.3466   LearningRate 0.0001   Epoch: 26   Global Step: 46140   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:16:42,273-Speed 13598.77 samples/sec   Loss 1.3424   LearningRate 0.0001   Epoch: 26   Global Step: 46150   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:17:00,002-Speed 13862.36 samples/sec   Loss 1.3499   LearningRate 0.0001   Epoch: 26   Global Step: 46160   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:17:17,740-Speed 13855.80 samples/sec   Loss 1.3430   LearningRate 0.0001   Epoch: 26   Global Step: 46170   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:17:35,487-Speed 13849.07 samples/sec   Loss 1.3363   LearningRate 0.0001   Epoch: 26   Global Step: 46180   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:17:53,191-Speed 13882.41 samples/sec   Loss 1.3398   LearningRate 0.0001   Epoch: 26   Global Step: 46190   Fp16 Grad Scale: 8192   Required: 12 hours
Training: 2022-03-04 07:18:10,956-Speed 13835.02 samples/sec   Loss 1.3483   LearningRate 0.0001   Epoch: 26   Global Step: 46200   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:18:28,788-Speed 13783.57 samples/sec   Loss 1.3468   LearningRate 0.0001   Epoch: 26   Global Step: 46210   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:18:46,592-Speed 13804.51 samples/sec   Loss 1.3428   LearningRate 0.0001   Epoch: 26   Global Step: 46220   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:19:04,529-Speed 13702.35 samples/sec   Loss 1.3409   LearningRate 0.0001   Epoch: 26   Global Step: 46230   Fp16 Grad Scale: 16384   Required: 12 hours
Training: 2022-03-04 07:19:22,420-Speed 13737.08 samples/sec   Loss 1.3362   LearningRate 0.0001   Epoch: 26   Global Step: 46240   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:19:40,292-Speed 13752.47 samples/sec   Loss 1.3377   LearningRate 0.0001   Epoch: 26   Global Step: 46250   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:19:58,105-Speed 13797.36 samples/sec   Loss 1.3441   LearningRate 0.0001   Epoch: 26   Global Step: 46260   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:20:16,074-Speed 13677.62 samples/sec   Loss 1.3369   LearningRate 0.0001   Epoch: 26   Global Step: 46270   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:20:33,912-Speed 13778.91 samples/sec   Loss 1.3405   LearningRate 0.0001   Epoch: 26   Global Step: 46280   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:20:51,837-Speed 13710.76 samples/sec   Loss 1.3405   LearningRate 0.0001   Epoch: 26   Global Step: 46290   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:21:09,665-Speed 13786.48 samples/sec   Loss 1.3449   LearningRate 0.0001   Epoch: 26   Global Step: 46300   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:21:27,566-Speed 13729.31 samples/sec   Loss 1.3478   LearningRate 0.0001   Epoch: 26   Global Step: 46310   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:21:45,399-Speed 13782.33 samples/sec   Loss 1.3379   LearningRate 0.0001   Epoch: 26   Global Step: 46320   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:22:03,176-Speed 13825.48 samples/sec   Loss 1.3317   LearningRate 0.0001   Epoch: 26   Global Step: 46330   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:22:20,998-Speed 13790.93 samples/sec   Loss 1.3513   LearningRate 0.0001   Epoch: 26   Global Step: 46340   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:22:38,763-Speed 13835.04 samples/sec   Loss 1.3413   LearningRate 0.0001   Epoch: 26   Global Step: 46350   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:22:56,455-Speed 13891.80 samples/sec   Loss 1.3466   LearningRate 0.0001   Epoch: 26   Global Step: 46360   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:23:14,159-Speed 13882.76 samples/sec   Loss 1.3518   LearningRate 0.0001   Epoch: 26   Global Step: 46370   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:23:31,884-Speed 13865.95 samples/sec   Loss 1.3468   LearningRate 0.0001   Epoch: 26   Global Step: 46380   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:23:49,726-Speed 13775.40 samples/sec   Loss 1.3407   LearningRate 0.0001   Epoch: 26   Global Step: 46390   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:24:07,478-Speed 13844.72 samples/sec   Loss 1.3389   LearningRate 0.0001   Epoch: 26   Global Step: 46400   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:24:25,288-Speed 13800.46 samples/sec   Loss 1.3381   LearningRate 0.0001   Epoch: 26   Global Step: 46410   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:24:43,085-Speed 13810.63 samples/sec   Loss 1.3384   LearningRate 0.0001   Epoch: 26   Global Step: 46420   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:25:00,894-Speed 13800.25 samples/sec   Loss 1.3327   LearningRate 0.0001   Epoch: 26   Global Step: 46430   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:25:18,810-Speed 13718.61 samples/sec   Loss 1.3336   LearningRate 0.0001   Epoch: 26   Global Step: 46440   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:25:36,843-Speed 13628.88 samples/sec   Loss 1.3415   LearningRate 0.0001   Epoch: 26   Global Step: 46450   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:25:54,744-Speed 13730.14 samples/sec   Loss 1.3278   LearningRate 0.0001   Epoch: 26   Global Step: 46460   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:26:12,672-Speed 13708.70 samples/sec   Loss 1.3342   LearningRate 0.0001   Epoch: 26   Global Step: 46470   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:26:30,466-Speed 13812.79 samples/sec   Loss 1.3357   LearningRate 0.0001   Epoch: 26   Global Step: 46480   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:26:48,299-Speed 13781.42 samples/sec   Loss 1.3328   LearningRate 0.0001   Epoch: 26   Global Step: 46490   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:27:06,132-Speed 13783.21 samples/sec   Loss 1.3390   LearningRate 0.0001   Epoch: 26   Global Step: 46500   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:27:23,872-Speed 13854.37 samples/sec   Loss 1.3429   LearningRate 0.0001   Epoch: 26   Global Step: 46510   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:27:41,627-Speed 13842.63 samples/sec   Loss 1.3317   LearningRate 0.0001   Epoch: 26   Global Step: 46520   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:27:59,332-Speed 13880.89 samples/sec   Loss 1.3455   LearningRate 0.0001   Epoch: 26   Global Step: 46530   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:28:17,095-Speed 13836.31 samples/sec   Loss 1.3435   LearningRate 0.0001   Epoch: 26   Global Step: 46540   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:28:34,888-Speed 13813.41 samples/sec   Loss 1.3354   LearningRate 0.0001   Epoch: 26   Global Step: 46550   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:28:52,710-Speed 13790.83 samples/sec   Loss 1.3414   LearningRate 0.0001   Epoch: 26   Global Step: 46560   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:29:10,438-Speed 13863.66 samples/sec   Loss 1.3364   LearningRate 0.0001   Epoch: 26   Global Step: 46570   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:29:28,146-Speed 13879.03 samples/sec   Loss 1.3487   LearningRate 0.0001   Epoch: 26   Global Step: 46580   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:29:46,112-Speed 13680.51 samples/sec   Loss 1.3453   LearningRate 0.0001   Epoch: 26   Global Step: 46590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:30:04,206-Speed 13582.81 samples/sec   Loss 1.3318   LearningRate 0.0001   Epoch: 26   Global Step: 46600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:30:22,039-Speed 13782.15 samples/sec   Loss 1.3381   LearningRate 0.0001   Epoch: 26   Global Step: 46610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:30:39,801-Speed 13837.57 samples/sec   Loss 1.3379   LearningRate 0.0001   Epoch: 26   Global Step: 46620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:30:57,520-Speed 13871.13 samples/sec   Loss 1.3462   LearningRate 0.0001   Epoch: 26   Global Step: 46630   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:31:15,358-Speed 13778.28 samples/sec   Loss 1.3370   LearningRate 0.0001   Epoch: 26   Global Step: 46640   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:31:33,114-Speed 13842.26 samples/sec   Loss 1.3391   LearningRate 0.0001   Epoch: 26   Global Step: 46650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:31:50,964-Speed 13768.48 samples/sec   Loss 1.3561   LearningRate 0.0001   Epoch: 26   Global Step: 46660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:32:59,441-Speed 3589.03 samples/sec   Loss 1.3326   LearningRate 0.0001   Epoch: 27   Global Step: 46670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:33:17,135-Speed 13892.29 samples/sec   Loss 1.3333   LearningRate 0.0001   Epoch: 27   Global Step: 46680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:33:34,822-Speed 13896.37 samples/sec   Loss 1.3217   LearningRate 0.0001   Epoch: 27   Global Step: 46690   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:33:52,562-Speed 13853.58 samples/sec   Loss 1.3333   LearningRate 0.0001   Epoch: 27   Global Step: 46700   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:34:10,535-Speed 13674.72 samples/sec   Loss 1.3260   LearningRate 0.0001   Epoch: 27   Global Step: 46710   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:34:28,412-Speed 13748.70 samples/sec   Loss 1.3229   LearningRate 0.0001   Epoch: 27   Global Step: 46720   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:34:46,188-Speed 13825.84 samples/sec   Loss 1.3280   LearningRate 0.0001   Epoch: 27   Global Step: 46730   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:35:04,033-Speed 13773.16 samples/sec   Loss 1.3249   LearningRate 0.0001   Epoch: 27   Global Step: 46740   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:35:21,750-Speed 13872.21 samples/sec   Loss 1.3308   LearningRate 0.0001   Epoch: 27   Global Step: 46750   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:35:39,581-Speed 13784.01 samples/sec   Loss 1.3251   LearningRate 0.0001   Epoch: 27   Global Step: 46760   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:35:57,362-Speed 13822.15 samples/sec   Loss 1.3340   LearningRate 0.0001   Epoch: 27   Global Step: 46770   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:36:15,175-Speed 13797.40 samples/sec   Loss 1.3222   LearningRate 0.0001   Epoch: 27   Global Step: 46780   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:36:32,981-Speed 13803.09 samples/sec   Loss 1.3143   LearningRate 0.0001   Epoch: 27   Global Step: 46790   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:36:50,772-Speed 13814.54 samples/sec   Loss 1.3245   LearningRate 0.0001   Epoch: 27   Global Step: 46800   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:37:08,613-Speed 13776.51 samples/sec   Loss 1.3305   LearningRate 0.0001   Epoch: 27   Global Step: 46810   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:37:26,362-Speed 13847.97 samples/sec   Loss 1.3234   LearningRate 0.0001   Epoch: 27   Global Step: 46820   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:37:44,206-Speed 13773.09 samples/sec   Loss 1.3205   LearningRate 0.0001   Epoch: 27   Global Step: 46830   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:38:02,042-Speed 13779.90 samples/sec   Loss 1.3240   LearningRate 0.0001   Epoch: 27   Global Step: 46840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:38:19,849-Speed 13802.52 samples/sec   Loss 1.3264   LearningRate 0.0001   Epoch: 27   Global Step: 46850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:38:37,600-Speed 13845.54 samples/sec   Loss 1.3225   LearningRate 0.0001   Epoch: 27   Global Step: 46860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:38:55,382-Speed 13821.78 samples/sec   Loss 1.3232   LearningRate 0.0001   Epoch: 27   Global Step: 46870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:39:13,243-Speed 13760.22 samples/sec   Loss 1.3209   LearningRate 0.0001   Epoch: 27   Global Step: 46880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:39:31,011-Speed 13832.93 samples/sec   Loss 1.3254   LearningRate 0.0001   Epoch: 27   Global Step: 46890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:39:48,768-Speed 13840.91 samples/sec   Loss 1.3243   LearningRate 0.0001   Epoch: 27   Global Step: 46900   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:40:06,543-Speed 13826.81 samples/sec   Loss 1.3225   LearningRate 0.0001   Epoch: 27   Global Step: 46910   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:40:24,281-Speed 13855.94 samples/sec   Loss 1.3289   LearningRate 0.0001   Epoch: 27   Global Step: 46920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:40:42,026-Speed 13850.74 samples/sec   Loss 1.3153   LearningRate 0.0001   Epoch: 27   Global Step: 46930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:40:59,823-Speed 13810.06 samples/sec   Loss 1.3230   LearningRate 0.0001   Epoch: 27   Global Step: 46940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:41:17,656-Speed 13781.97 samples/sec   Loss 1.3277   LearningRate 0.0001   Epoch: 27   Global Step: 46950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:41:35,420-Speed 13835.76 samples/sec   Loss 1.3348   LearningRate 0.0001   Epoch: 27   Global Step: 46960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:41:53,195-Speed 13827.40 samples/sec   Loss 1.3228   LearningRate 0.0001   Epoch: 27   Global Step: 46970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:42:10,890-Speed 13889.39 samples/sec   Loss 1.3201   LearningRate 0.0001   Epoch: 27   Global Step: 46980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:42:28,646-Speed 13841.80 samples/sec   Loss 1.3186   LearningRate 0.0001   Epoch: 27   Global Step: 46990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:42:46,368-Speed 13868.23 samples/sec   Loss 1.3222   LearningRate 0.0001   Epoch: 27   Global Step: 47000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:43:04,115-Speed 13849.69 samples/sec   Loss 1.3317   LearningRate 0.0001   Epoch: 27   Global Step: 47010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:43:21,834-Speed 13870.77 samples/sec   Loss 1.3200   LearningRate 0.0001   Epoch: 27   Global Step: 47020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:43:39,588-Speed 13842.85 samples/sec   Loss 1.3258   LearningRate 0.0001   Epoch: 27   Global Step: 47030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:43:57,291-Speed 13883.50 samples/sec   Loss 1.3277   LearningRate 0.0001   Epoch: 27   Global Step: 47040   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:44:15,177-Speed 13741.28 samples/sec   Loss 1.3290   LearningRate 0.0001   Epoch: 27   Global Step: 47050   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:44:33,037-Speed 13761.32 samples/sec   Loss 1.3249   LearningRate 0.0001   Epoch: 27   Global Step: 47060   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:44:50,737-Speed 13885.09 samples/sec   Loss 1.3308   LearningRate 0.0001   Epoch: 27   Global Step: 47070   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:45:08,519-Speed 13822.16 samples/sec   Loss 1.3114   LearningRate 0.0001   Epoch: 27   Global Step: 47080   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:45:26,232-Speed 13875.05 samples/sec   Loss 1.3131   LearningRate 0.0001   Epoch: 27   Global Step: 47090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:45:43,938-Speed 13880.78 samples/sec   Loss 1.3136   LearningRate 0.0001   Epoch: 27   Global Step: 47100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:46:01,678-Speed 13854.79 samples/sec   Loss 1.3220   LearningRate 0.0001   Epoch: 27   Global Step: 47110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:46:19,417-Speed 13855.15 samples/sec   Loss 1.3248   LearningRate 0.0001   Epoch: 27   Global Step: 47120   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:46:37,176-Speed 13839.23 samples/sec   Loss 1.3093   LearningRate 0.0001   Epoch: 27   Global Step: 47130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:46:54,885-Speed 13878.31 samples/sec   Loss 1.3149   LearningRate 0.0001   Epoch: 27   Global Step: 47140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:47:12,593-Speed 13879.39 samples/sec   Loss 1.3265   LearningRate 0.0001   Epoch: 27   Global Step: 47150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:47:30,358-Speed 13834.86 samples/sec   Loss 1.3169   LearningRate 0.0001   Epoch: 27   Global Step: 47160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:47:48,127-Speed 13831.98 samples/sec   Loss 1.3102   LearningRate 0.0001   Epoch: 27   Global Step: 47170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:48:05,864-Speed 13857.59 samples/sec   Loss 1.3173   LearningRate 0.0001   Epoch: 27   Global Step: 47180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:48:23,524-Speed 13916.46 samples/sec   Loss 1.3151   LearningRate 0.0001   Epoch: 27   Global Step: 47190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:48:41,277-Speed 13844.22 samples/sec   Loss 1.3135   LearningRate 0.0001   Epoch: 27   Global Step: 47200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:48:58,982-Speed 13881.95 samples/sec   Loss 1.3157   LearningRate 0.0001   Epoch: 27   Global Step: 47210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:49:16,742-Speed 13840.24 samples/sec   Loss 1.3173   LearningRate 0.0001   Epoch: 27   Global Step: 47220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:49:34,472-Speed 13863.29 samples/sec   Loss 1.3044   LearningRate 0.0001   Epoch: 27   Global Step: 47230   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:49:52,362-Speed 13738.45 samples/sec   Loss 1.3172   LearningRate 0.0001   Epoch: 27   Global Step: 47240   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:50:10,037-Speed 13904.66 samples/sec   Loss 1.3193   LearningRate 0.0001   Epoch: 27   Global Step: 47250   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 07:50:27,726-Speed 13894.73 samples/sec   Loss 1.3160   LearningRate 0.0001   Epoch: 27   Global Step: 47260   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:50:45,488-Speed 13839.44 samples/sec   Loss 1.3162   LearningRate 0.0001   Epoch: 27   Global Step: 47270   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:51:03,274-Speed 13818.37 samples/sec   Loss 1.3072   LearningRate 0.0001   Epoch: 27   Global Step: 47280   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:51:21,092-Speed 13793.80 samples/sec   Loss 1.3121   LearningRate 0.0001   Epoch: 27   Global Step: 47290   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 07:51:38,899-Speed 13802.03 samples/sec   Loss 1.3156   LearningRate 0.0001   Epoch: 27   Global Step: 47300   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:51:57,065-Speed 13529.40 samples/sec   Loss 1.3167   LearningRate 0.0001   Epoch: 27   Global Step: 47310   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:52:15,364-Speed 13431.45 samples/sec   Loss 1.3268   LearningRate 0.0001   Epoch: 27   Global Step: 47320   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:52:33,479-Speed 13567.10 samples/sec   Loss 1.3092   LearningRate 0.0001   Epoch: 27   Global Step: 47330   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:52:51,644-Speed 13530.02 samples/sec   Loss 1.3108   LearningRate 0.0001   Epoch: 27   Global Step: 47340   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:53:09,718-Speed 13598.66 samples/sec   Loss 1.3140   LearningRate 0.0001   Epoch: 27   Global Step: 47350   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:53:27,537-Speed 13792.60 samples/sec   Loss 1.3123   LearningRate 0.0001   Epoch: 27   Global Step: 47360   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:53:45,316-Speed 13823.68 samples/sec   Loss 1.3072   LearningRate 0.0001   Epoch: 27   Global Step: 47370   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:54:03,104-Speed 13816.97 samples/sec   Loss 1.3049   LearningRate 0.0001   Epoch: 27   Global Step: 47380   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:54:20,861-Speed 13841.30 samples/sec   Loss 1.3140   LearningRate 0.0001   Epoch: 27   Global Step: 47390   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:54:38,632-Speed 13830.18 samples/sec   Loss 1.3036   LearningRate 0.0001   Epoch: 27   Global Step: 47400   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:54:56,365-Speed 13859.92 samples/sec   Loss 1.3085   LearningRate 0.0001   Epoch: 27   Global Step: 47410   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:55:14,133-Speed 13832.31 samples/sec   Loss 1.3052   LearningRate 0.0001   Epoch: 27   Global Step: 47420   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:55:31,936-Speed 13805.84 samples/sec   Loss 1.3029   LearningRate 0.0001   Epoch: 27   Global Step: 47430   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:55:49,736-Speed 13807.83 samples/sec   Loss 1.3140   LearningRate 0.0001   Epoch: 27   Global Step: 47440   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:56:07,530-Speed 13811.89 samples/sec   Loss 1.3041   LearningRate 0.0001   Epoch: 27   Global Step: 47450   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:56:25,359-Speed 13785.32 samples/sec   Loss 1.2984   LearningRate 0.0001   Epoch: 27   Global Step: 47460   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:56:43,178-Speed 13793.22 samples/sec   Loss 1.3088   LearningRate 0.0001   Epoch: 27   Global Step: 47470   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:57:00,982-Speed 13804.38 samples/sec   Loss 1.2958   LearningRate 0.0001   Epoch: 27   Global Step: 47480   Fp16 Grad Scale: 8192   Required: 11 hours
Training: 2022-03-04 07:57:18,777-Speed 13813.69 samples/sec   Loss 1.3067   LearningRate 0.0001   Epoch: 27   Global Step: 47490   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:57:36,592-Speed 13795.17 samples/sec   Loss 1.3076   LearningRate 0.0001   Epoch: 27   Global Step: 47500   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:57:54,401-Speed 13800.83 samples/sec   Loss 1.3100   LearningRate 0.0001   Epoch: 27   Global Step: 47510   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:58:12,241-Speed 13777.53 samples/sec   Loss 1.3092   LearningRate 0.0001   Epoch: 27   Global Step: 47520   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:58:30,053-Speed 13798.91 samples/sec   Loss 1.3004   LearningRate 0.0001   Epoch: 27   Global Step: 47530   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:58:47,878-Speed 13787.90 samples/sec   Loss 1.3096   LearningRate 0.0001   Epoch: 27   Global Step: 47540   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:59:05,713-Speed 13780.71 samples/sec   Loss 1.3114   LearningRate 0.0001   Epoch: 27   Global Step: 47550   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:59:23,536-Speed 13789.95 samples/sec   Loss 1.3071   LearningRate 0.0001   Epoch: 27   Global Step: 47560   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:59:41,331-Speed 13811.49 samples/sec   Loss 1.3070   LearningRate 0.0001   Epoch: 27   Global Step: 47570   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 07:59:59,115-Speed 13820.12 samples/sec   Loss 1.2907   LearningRate 0.0001   Epoch: 27   Global Step: 47580   Fp16 Grad Scale: 16384   Required: 11 hours
Training: 2022-03-04 08:00:16,919-Speed 13804.57 samples/sec   Loss 1.3034   LearningRate 0.0001   Epoch: 27   Global Step: 47590   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:00:34,691-Speed 13829.74 samples/sec   Loss 1.2927   LearningRate 0.0001   Epoch: 27   Global Step: 47600   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:00:52,487-Speed 13810.60 samples/sec   Loss 1.3065   LearningRate 0.0001   Epoch: 27   Global Step: 47610   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:01:10,283-Speed 13811.40 samples/sec   Loss 1.3101   LearningRate 0.0001   Epoch: 27   Global Step: 47620   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:01:28,150-Speed 13755.43 samples/sec   Loss 1.2978   LearningRate 0.0001   Epoch: 27   Global Step: 47630   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:01:45,856-Speed 13881.40 samples/sec   Loss 1.3158   LearningRate 0.0001   Epoch: 27   Global Step: 47640   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:02:03,718-Speed 13759.65 samples/sec   Loss 1.2932   LearningRate 0.0001   Epoch: 27   Global Step: 47650   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:02:21,425-Speed 13879.68 samples/sec   Loss 1.2984   LearningRate 0.0001   Epoch: 27   Global Step: 47660   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:02:39,174-Speed 13847.59 samples/sec   Loss 1.3065   LearningRate 0.0001   Epoch: 27   Global Step: 47670   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:02:56,974-Speed 13808.06 samples/sec   Loss 1.3087   LearningRate 0.0001   Epoch: 27   Global Step: 47680   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:03:14,758-Speed 13819.91 samples/sec   Loss 1.3011   LearningRate 0.0001   Epoch: 27   Global Step: 47690   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:03:32,483-Speed 13865.98 samples/sec   Loss 1.3020   LearningRate 0.0001   Epoch: 27   Global Step: 47700   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:03:50,209-Speed 13865.43 samples/sec   Loss 1.3018   LearningRate 0.0001   Epoch: 27   Global Step: 47710   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:04:07,927-Speed 13871.00 samples/sec   Loss 1.3009   LearningRate 0.0001   Epoch: 27   Global Step: 47720   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:04:25,788-Speed 13760.89 samples/sec   Loss 1.3062   LearningRate 0.0001   Epoch: 27   Global Step: 47730   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:04:43,550-Speed 13838.39 samples/sec   Loss 1.3065   LearningRate 0.0001   Epoch: 27   Global Step: 47740   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:05:01,276-Speed 13865.81 samples/sec   Loss 1.3057   LearningRate 0.0001   Epoch: 27   Global Step: 47750   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:05:19,181-Speed 13726.50 samples/sec   Loss 1.2973   LearningRate 0.0001   Epoch: 27   Global Step: 47760   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:05:37,076-Speed 13734.48 samples/sec   Loss 1.3052   LearningRate 0.0001   Epoch: 27   Global Step: 47770   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:05:55,008-Speed 13705.81 samples/sec   Loss 1.3091   LearningRate 0.0001   Epoch: 27   Global Step: 47780   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:06:13,041-Speed 13629.50 samples/sec   Loss 1.2966   LearningRate 0.0001   Epoch: 27   Global Step: 47790   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:06:31,089-Speed 13617.46 samples/sec   Loss 1.2922   LearningRate 0.0001   Epoch: 27   Global Step: 47800   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:06:49,137-Speed 13618.38 samples/sec   Loss 1.2933   LearningRate 0.0001   Epoch: 27   Global Step: 47810   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:07:07,198-Speed 13607.53 samples/sec   Loss 1.2953   LearningRate 0.0001   Epoch: 27   Global Step: 47820   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:07:25,477-Speed 13446.36 samples/sec   Loss 1.3008   LearningRate 0.0001   Epoch: 27   Global Step: 47830   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:07:43,432-Speed 13688.30 samples/sec   Loss 1.3030   LearningRate 0.0001   Epoch: 27   Global Step: 47840   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:08:01,182-Speed 13848.60 samples/sec   Loss 1.2940   LearningRate 0.0001   Epoch: 27   Global Step: 47850   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:08:18,916-Speed 13860.05 samples/sec   Loss 1.2917   LearningRate 0.0001   Epoch: 27   Global Step: 47860   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:08:36,617-Speed 13885.00 samples/sec   Loss 1.2982   LearningRate 0.0001   Epoch: 27   Global Step: 47870   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:08:54,384-Speed 13832.76 samples/sec   Loss 1.2856   LearningRate 0.0001   Epoch: 27   Global Step: 47880   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:09:12,164-Speed 13823.71 samples/sec   Loss 1.2956   LearningRate 0.0001   Epoch: 27   Global Step: 47890   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:09:29,920-Speed 13841.66 samples/sec   Loss 1.2945   LearningRate 0.0001   Epoch: 27   Global Step: 47900   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:09:47,738-Speed 13793.35 samples/sec   Loss 1.2938   LearningRate 0.0001   Epoch: 27   Global Step: 47910   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:10:05,514-Speed 13826.60 samples/sec   Loss 1.2893   LearningRate 0.0001   Epoch: 27   Global Step: 47920   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:10:23,305-Speed 13814.13 samples/sec   Loss 1.2924   LearningRate 0.0001   Epoch: 27   Global Step: 47930   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:10:41,026-Speed 13869.60 samples/sec   Loss 1.2906   LearningRate 0.0001   Epoch: 27   Global Step: 47940   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:10:58,777-Speed 13845.91 samples/sec   Loss 1.2942   LearningRate 0.0001   Epoch: 27   Global Step: 47950   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:11:16,552-Speed 13827.04 samples/sec   Loss 1.2949   LearningRate 0.0001   Epoch: 27   Global Step: 47960   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:11:34,286-Speed 13859.05 samples/sec   Loss 1.2901   LearningRate 0.0001   Epoch: 27   Global Step: 47970   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:11:52,053-Speed 13833.33 samples/sec   Loss 1.2874   LearningRate 0.0001   Epoch: 27   Global Step: 47980   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:12:09,784-Speed 13862.41 samples/sec   Loss 1.2857   LearningRate 0.0001   Epoch: 27   Global Step: 47990   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:12:27,481-Speed 13888.29 samples/sec   Loss 1.2883   LearningRate 0.0001   Epoch: 27   Global Step: 48000   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:12:45,243-Speed 13836.40 samples/sec   Loss 1.2827   LearningRate 0.0001   Epoch: 27   Global Step: 48010   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:13:02,947-Speed 13883.13 samples/sec   Loss 1.2889   LearningRate 0.0001   Epoch: 27   Global Step: 48020   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:13:20,663-Speed 13873.32 samples/sec   Loss 1.2794   LearningRate 0.0001   Epoch: 27   Global Step: 48030   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:13:38,404-Speed 13852.82 samples/sec   Loss 1.2863   LearningRate 0.0001   Epoch: 27   Global Step: 48040   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:13:56,119-Speed 13873.93 samples/sec   Loss 1.2840   LearningRate 0.0001   Epoch: 27   Global Step: 48050   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:14:13,827-Speed 13880.10 samples/sec   Loss 1.2892   LearningRate 0.0001   Epoch: 27   Global Step: 48060   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:14:31,612-Speed 13819.45 samples/sec   Loss 1.2765   LearningRate 0.0001   Epoch: 27   Global Step: 48070   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:14:49,413-Speed 13807.13 samples/sec   Loss 1.2820   LearningRate 0.0001   Epoch: 27   Global Step: 48080   Fp16 Grad Scale: 65536   Required: 11 hours
Training: 2022-03-04 08:15:07,190-Speed 13824.93 samples/sec   Loss 1.2930   LearningRate 0.0001   Epoch: 27   Global Step: 48090   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:15:24,887-Speed 13888.33 samples/sec   Loss 1.2826   LearningRate 0.0001   Epoch: 27   Global Step: 48100   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:15:42,667-Speed 13823.97 samples/sec   Loss 1.2958   LearningRate 0.0001   Epoch: 27   Global Step: 48110   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:16:00,388-Speed 13869.41 samples/sec   Loss 1.2935   LearningRate 0.0001   Epoch: 27   Global Step: 48120   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:16:18,160-Speed 13828.75 samples/sec   Loss 1.2916   LearningRate 0.0001   Epoch: 27   Global Step: 48130   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:16:35,941-Speed 13822.97 samples/sec   Loss 1.2886   LearningRate 0.0001   Epoch: 27   Global Step: 48140   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:16:53,695-Speed 13842.91 samples/sec   Loss 1.2878   LearningRate 0.0001   Epoch: 27   Global Step: 48150   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:17:11,499-Speed 13805.16 samples/sec   Loss 1.2841   LearningRate 0.0001   Epoch: 27   Global Step: 48160   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:17:29,416-Speed 13717.13 samples/sec   Loss 1.2879   LearningRate 0.0001   Epoch: 27   Global Step: 48170   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:17:47,204-Speed 13816.62 samples/sec   Loss 1.2849   LearningRate 0.0001   Epoch: 27   Global Step: 48180   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:18:04,924-Speed 13870.43 samples/sec   Loss 1.2896   LearningRate 0.0001   Epoch: 27   Global Step: 48190   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:18:22,653-Speed 13862.63 samples/sec   Loss 1.2889   LearningRate 0.0001   Epoch: 27   Global Step: 48200   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:18:40,390-Speed 13857.11 samples/sec   Loss 1.2973   LearningRate 0.0001   Epoch: 27   Global Step: 48210   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:18:58,100-Speed 13877.59 samples/sec   Loss 1.2895   LearningRate 0.0001   Epoch: 27   Global Step: 48220   Fp16 Grad Scale: 32768   Required: 11 hours
Training: 2022-03-04 08:19:15,894-Speed 13812.17 samples/sec   Loss 1.2820   LearningRate 0.0001   Epoch: 27   Global Step: 48230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:19:33,643-Speed 13847.65 samples/sec   Loss 1.2920   LearningRate 0.0001   Epoch: 27   Global Step: 48240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:19:51,459-Speed 13794.57 samples/sec   Loss 1.2885   LearningRate 0.0001   Epoch: 27   Global Step: 48250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:20:09,170-Speed 13877.48 samples/sec   Loss 1.2824   LearningRate 0.0001   Epoch: 27   Global Step: 48260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:20:26,899-Speed 13863.02 samples/sec   Loss 1.2914   LearningRate 0.0001   Epoch: 27   Global Step: 48270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:20:44,611-Speed 13878.61 samples/sec   Loss 1.2857   LearningRate 0.0001   Epoch: 27   Global Step: 48280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:21:02,379-Speed 13833.77 samples/sec   Loss 1.2889   LearningRate 0.0001   Epoch: 27   Global Step: 48290   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:21:20,091-Speed 13875.88 samples/sec   Loss 1.2802   LearningRate 0.0001   Epoch: 27   Global Step: 48300   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:21:37,840-Speed 13847.07 samples/sec   Loss 1.2820   LearningRate 0.0001   Epoch: 27   Global Step: 48310   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:21:55,561-Speed 13871.03 samples/sec   Loss 1.2912   LearningRate 0.0001   Epoch: 27   Global Step: 48320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:22:13,296-Speed 13858.79 samples/sec   Loss 1.2820   LearningRate 0.0001   Epoch: 27   Global Step: 48330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:22:31,113-Speed 13794.03 samples/sec   Loss 1.2827   LearningRate 0.0001   Epoch: 27   Global Step: 48340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:22:48,940-Speed 13786.27 samples/sec   Loss 1.2899   LearningRate 0.0001   Epoch: 27   Global Step: 48350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:23:06,738-Speed 13809.48 samples/sec   Loss 1.2928   LearningRate 0.0001   Epoch: 27   Global Step: 48360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:23:24,525-Speed 13817.74 samples/sec   Loss 1.2894   LearningRate 0.0001   Epoch: 27   Global Step: 48370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:23:42,349-Speed 13789.25 samples/sec   Loss 1.2893   LearningRate 0.0001   Epoch: 27   Global Step: 48380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:24:00,174-Speed 13788.49 samples/sec   Loss 1.2891   LearningRate 0.0001   Epoch: 27   Global Step: 48390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:25:08,215-Speed 3611.97 samples/sec   Loss 1.2847   LearningRate 0.0001   Epoch: 28   Global Step: 48400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:25:25,913-Speed 13887.51 samples/sec   Loss 1.2774   LearningRate 0.0001   Epoch: 28   Global Step: 48410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:25:43,589-Speed 13904.51 samples/sec   Loss 1.2691   LearningRate 0.0001   Epoch: 28   Global Step: 48420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:26:01,347-Speed 13840.19 samples/sec   Loss 1.2733   LearningRate 0.0001   Epoch: 28   Global Step: 48430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:26:18,944-Speed 13967.55 samples/sec   Loss 1.2772   LearningRate 0.0001   Epoch: 28   Global Step: 48440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:26:36,628-Speed 13898.11 samples/sec   Loss 1.2714   LearningRate 0.0001   Epoch: 28   Global Step: 48450   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:26:54,253-Speed 13945.06 samples/sec   Loss 1.2632   LearningRate 0.0001   Epoch: 28   Global Step: 48460   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:27:11,963-Speed 13877.52 samples/sec   Loss 1.2774   LearningRate 0.0001   Epoch: 28   Global Step: 48470   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:27:29,691-Speed 13864.10 samples/sec   Loss 1.2710   LearningRate 0.0001   Epoch: 28   Global Step: 48480   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:27:47,453-Speed 13837.30 samples/sec   Loss 1.2728   LearningRate 0.0001   Epoch: 28   Global Step: 48490   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:28:05,218-Speed 13835.13 samples/sec   Loss 1.2759   LearningRate 0.0001   Epoch: 28   Global Step: 48500   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:28:22,962-Speed 13852.23 samples/sec   Loss 1.2760   LearningRate 0.0001   Epoch: 28   Global Step: 48510   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:28:40,611-Speed 13925.42 samples/sec   Loss 1.2630   LearningRate 0.0001   Epoch: 28   Global Step: 48520   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:28:58,341-Speed 13862.16 samples/sec   Loss 1.2715   LearningRate 0.0001   Epoch: 28   Global Step: 48530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:29:15,989-Speed 13926.64 samples/sec   Loss 1.2744   LearningRate 0.0001   Epoch: 28   Global Step: 48540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:29:33,675-Speed 13896.60 samples/sec   Loss 1.2714   LearningRate 0.0001   Epoch: 28   Global Step: 48550   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:29:51,422-Speed 13848.71 samples/sec   Loss 1.2651   LearningRate 0.0001   Epoch: 28   Global Step: 48560   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:30:09,110-Speed 13895.74 samples/sec   Loss 1.2777   LearningRate 0.0001   Epoch: 28   Global Step: 48570   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:30:26,816-Speed 13880.49 samples/sec   Loss 1.2762   LearningRate 0.0001   Epoch: 28   Global Step: 48580   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:30:44,542-Speed 13865.55 samples/sec   Loss 1.2667   LearningRate 0.0001   Epoch: 28   Global Step: 48590   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:31:02,312-Speed 13830.67 samples/sec   Loss 1.2691   LearningRate 0.0001   Epoch: 28   Global Step: 48600   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:31:20,057-Speed 13850.60 samples/sec   Loss 1.2753   LearningRate 0.0001   Epoch: 28   Global Step: 48610   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:31:37,729-Speed 13907.86 samples/sec   Loss 1.2772   LearningRate 0.0001   Epoch: 28   Global Step: 48620   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:31:55,442-Speed 13874.94 samples/sec   Loss 1.2865   LearningRate 0.0001   Epoch: 28   Global Step: 48630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:32:13,207-Speed 13834.66 samples/sec   Loss 1.2797   LearningRate 0.0001   Epoch: 28   Global Step: 48640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:32:31,181-Speed 13673.98 samples/sec   Loss 1.2844   LearningRate 0.0001   Epoch: 28   Global Step: 48650   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:32:49,264-Speed 13591.58 samples/sec   Loss 1.2719   LearningRate 0.0001   Epoch: 28   Global Step: 48660   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:33:07,434-Speed 13526.32 samples/sec   Loss 1.2697   LearningRate 0.0001   Epoch: 28   Global Step: 48670   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:33:25,518-Speed 13591.09 samples/sec   Loss 1.2706   LearningRate 0.0001   Epoch: 28   Global Step: 48680   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:33:43,719-Speed 13502.90 samples/sec   Loss 1.2699   LearningRate 0.0001   Epoch: 28   Global Step: 48690   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:34:01,613-Speed 13735.82 samples/sec   Loss 1.2788   LearningRate 0.0001   Epoch: 28   Global Step: 48700   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:34:19,442-Speed 13785.45 samples/sec   Loss 1.2627   LearningRate 0.0001   Epoch: 28   Global Step: 48710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:34:37,158-Speed 13872.75 samples/sec   Loss 1.2654   LearningRate 0.0001   Epoch: 28   Global Step: 48720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:34:54,837-Speed 13901.98 samples/sec   Loss 1.2656   LearningRate 0.0001   Epoch: 28   Global Step: 48730   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:35:12,484-Speed 13926.99 samples/sec   Loss 1.2794   LearningRate 0.0001   Epoch: 28   Global Step: 48740   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:35:30,255-Speed 13830.77 samples/sec   Loss 1.2619   LearningRate 0.0001   Epoch: 28   Global Step: 48750   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:35:47,941-Speed 13896.61 samples/sec   Loss 1.2640   LearningRate 0.0001   Epoch: 28   Global Step: 48760   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:36:05,716-Speed 13827.12 samples/sec   Loss 1.2732   LearningRate 0.0001   Epoch: 28   Global Step: 48770   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:36:23,495-Speed 13823.06 samples/sec   Loss 1.2721   LearningRate 0.0001   Epoch: 28   Global Step: 48780   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:36:41,236-Speed 13853.90 samples/sec   Loss 1.2636   LearningRate 0.0001   Epoch: 28   Global Step: 48790   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:36:59,075-Speed 13777.66 samples/sec   Loss 1.2691   LearningRate 0.0001   Epoch: 28   Global Step: 48800   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:37:16,893-Speed 13793.25 samples/sec   Loss 1.2736   LearningRate 0.0001   Epoch: 28   Global Step: 48810   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:37:34,814-Speed 13714.63 samples/sec   Loss 1.2678   LearningRate 0.0001   Epoch: 28   Global Step: 48820   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:37:52,645-Speed 13783.84 samples/sec   Loss 1.2558   LearningRate 0.0001   Epoch: 28   Global Step: 48830   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:38:10,470-Speed 13788.76 samples/sec   Loss 1.2700   LearningRate 0.0001   Epoch: 28   Global Step: 48840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:38:28,333-Speed 13759.05 samples/sec   Loss 1.2611   LearningRate 0.0001   Epoch: 28   Global Step: 48850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:38:46,126-Speed 13812.48 samples/sec   Loss 1.2635   LearningRate 0.0001   Epoch: 28   Global Step: 48860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:39:04,085-Speed 13685.78 samples/sec   Loss 1.2593   LearningRate 0.0001   Epoch: 28   Global Step: 48870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:39:21,898-Speed 13797.39 samples/sec   Loss 1.2699   LearningRate 0.0001   Epoch: 28   Global Step: 48880   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:39:39,780-Speed 13744.29 samples/sec   Loss 1.2597   LearningRate 0.0001   Epoch: 28   Global Step: 48890   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:39:57,570-Speed 13816.89 samples/sec   Loss 1.2690   LearningRate 0.0001   Epoch: 28   Global Step: 48900   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:40:15,315-Speed 13851.36 samples/sec   Loss 1.2751   LearningRate 0.0001   Epoch: 28   Global Step: 48910   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:40:33,058-Speed 13851.51 samples/sec   Loss 1.2715   LearningRate 0.0001   Epoch: 28   Global Step: 48920   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:40:50,830-Speed 13829.39 samples/sec   Loss 1.2680   LearningRate 0.0001   Epoch: 28   Global Step: 48930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:41:08,645-Speed 13797.19 samples/sec   Loss 1.2641   LearningRate 0.0001   Epoch: 28   Global Step: 48940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:41:26,506-Speed 13760.13 samples/sec   Loss 1.2560   LearningRate 0.0001   Epoch: 28   Global Step: 48950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:41:44,385-Speed 13746.58 samples/sec   Loss 1.2610   LearningRate 0.0001   Epoch: 28   Global Step: 48960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:42:02,163-Speed 13825.00 samples/sec   Loss 1.2667   LearningRate 0.0001   Epoch: 28   Global Step: 48970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:42:19,966-Speed 13805.33 samples/sec   Loss 1.2561   LearningRate 0.0001   Epoch: 28   Global Step: 48980   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:42:37,711-Speed 13850.24 samples/sec   Loss 1.2613   LearningRate 0.0001   Epoch: 28   Global Step: 48990   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:42:55,504-Speed 13812.92 samples/sec   Loss 1.2673   LearningRate 0.0001   Epoch: 28   Global Step: 49000   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:43:13,308-Speed 13805.83 samples/sec   Loss 1.2763   LearningRate 0.0001   Epoch: 28   Global Step: 49010   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:43:31,028-Speed 13869.31 samples/sec   Loss 1.2628   LearningRate 0.0001   Epoch: 28   Global Step: 49020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:43:48,821-Speed 13813.29 samples/sec   Loss 1.2684   LearningRate 0.0001   Epoch: 28   Global Step: 49030   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:44:06,610-Speed 13816.81 samples/sec   Loss 1.2712   LearningRate 0.0001   Epoch: 28   Global Step: 49040   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:44:24,363-Speed 13844.50 samples/sec   Loss 1.2592   LearningRate 0.0001   Epoch: 28   Global Step: 49050   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:44:42,104-Speed 13853.26 samples/sec   Loss 1.2703   LearningRate 0.0001   Epoch: 28   Global Step: 49060   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:44:59,795-Speed 13892.85 samples/sec   Loss 1.2651   LearningRate 0.0001   Epoch: 28   Global Step: 49070   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:45:17,577-Speed 13821.83 samples/sec   Loss 1.2575   LearningRate 0.0001   Epoch: 28   Global Step: 49080   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:45:35,318-Speed 13853.79 samples/sec   Loss 1.2661   LearningRate 0.0001   Epoch: 28   Global Step: 49090   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:45:53,066-Speed 13848.00 samples/sec   Loss 1.2632   LearningRate 0.0001   Epoch: 28   Global Step: 49100   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:46:10,789-Speed 13868.01 samples/sec   Loss 1.2583   LearningRate 0.0001   Epoch: 28   Global Step: 49110   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:46:28,485-Speed 13888.92 samples/sec   Loss 1.2626   LearningRate 0.0001   Epoch: 28   Global Step: 49120   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:46:46,212-Speed 13864.72 samples/sec   Loss 1.2530   LearningRate 0.0001   Epoch: 28   Global Step: 49130   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:47:04,118-Speed 13725.28 samples/sec   Loss 1.2615   LearningRate 0.0001   Epoch: 28   Global Step: 49140   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:47:21,862-Speed 13851.25 samples/sec   Loss 1.2569   LearningRate 0.0001   Epoch: 28   Global Step: 49150   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:47:39,600-Speed 13856.13 samples/sec   Loss 1.2593   LearningRate 0.0001   Epoch: 28   Global Step: 49160   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:47:57,470-Speed 13753.68 samples/sec   Loss 1.2659   LearningRate 0.0001   Epoch: 28   Global Step: 49170   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:48:15,241-Speed 13829.83 samples/sec   Loss 1.2567   LearningRate 0.0001   Epoch: 28   Global Step: 49180   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:48:32,983-Speed 13852.96 samples/sec   Loss 1.2587   LearningRate 0.0001   Epoch: 28   Global Step: 49190   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:48:50,712-Speed 13863.33 samples/sec   Loss 1.2549   LearningRate 0.0001   Epoch: 28   Global Step: 49200   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:49:08,426-Speed 13874.51 samples/sec   Loss 1.2561   LearningRate 0.0001   Epoch: 28   Global Step: 49210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:49:26,188-Speed 13836.62 samples/sec   Loss 1.2495   LearningRate 0.0001   Epoch: 28   Global Step: 49220   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:49:43,982-Speed 13812.70 samples/sec   Loss 1.2524   LearningRate 0.0001   Epoch: 28   Global Step: 49230   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:50:01,725-Speed 13852.19 samples/sec   Loss 1.2432   LearningRate 0.0001   Epoch: 28   Global Step: 49240   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:50:19,438-Speed 13875.39 samples/sec   Loss 1.2487   LearningRate 0.0001   Epoch: 28   Global Step: 49250   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:50:37,170-Speed 13860.57 samples/sec   Loss 1.2469   LearningRate 0.0001   Epoch: 28   Global Step: 49260   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:50:54,846-Speed 13904.30 samples/sec   Loss 1.2528   LearningRate 0.0001   Epoch: 28   Global Step: 49270   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:51:12,705-Speed 13761.75 samples/sec   Loss 1.2442   LearningRate 0.0001   Epoch: 28   Global Step: 49280   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:51:30,522-Speed 13795.25 samples/sec   Loss 1.2512   LearningRate 0.0001   Epoch: 28   Global Step: 49290   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:51:48,273-Speed 13845.75 samples/sec   Loss 1.2433   LearningRate 0.0001   Epoch: 28   Global Step: 49300   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:52:06,028-Speed 13842.45 samples/sec   Loss 1.2446   LearningRate 0.0001   Epoch: 28   Global Step: 49310   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:52:23,660-Speed 13939.20 samples/sec   Loss 1.2520   LearningRate 0.0001   Epoch: 28   Global Step: 49320   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:52:41,511-Speed 13768.10 samples/sec   Loss 1.2465   LearningRate 0.0001   Epoch: 28   Global Step: 49330   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:52:59,184-Speed 13907.10 samples/sec   Loss 1.2549   LearningRate 0.0001   Epoch: 28   Global Step: 49340   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:53:16,824-Speed 13932.78 samples/sec   Loss 1.2553   LearningRate 0.0001   Epoch: 28   Global Step: 49350   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:53:34,522-Speed 13887.00 samples/sec   Loss 1.2452   LearningRate 0.0001   Epoch: 28   Global Step: 49360   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:53:52,256-Speed 13859.42 samples/sec   Loss 1.2550   LearningRate 0.0001   Epoch: 28   Global Step: 49370   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:54:09,987-Speed 13861.55 samples/sec   Loss 1.2502   LearningRate 0.0001   Epoch: 28   Global Step: 49380   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:54:27,729-Speed 13852.24 samples/sec   Loss 1.2469   LearningRate 0.0001   Epoch: 28   Global Step: 49390   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:54:45,409-Speed 13901.41 samples/sec   Loss 1.2568   LearningRate 0.0001   Epoch: 28   Global Step: 49400   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:55:03,185-Speed 13827.64 samples/sec   Loss 1.2509   LearningRate 0.0001   Epoch: 28   Global Step: 49410   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:55:20,888-Speed 13883.41 samples/sec   Loss 1.2498   LearningRate 0.0001   Epoch: 28   Global Step: 49420   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:55:38,682-Speed 13812.50 samples/sec   Loss 1.2550   LearningRate 0.0001   Epoch: 28   Global Step: 49430   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 08:55:56,424-Speed 13853.08 samples/sec   Loss 1.2452   LearningRate 0.0001   Epoch: 28   Global Step: 49440   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:56:14,159-Speed 13858.71 samples/sec   Loss 1.2565   LearningRate 0.0001   Epoch: 28   Global Step: 49450   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:56:31,948-Speed 13816.20 samples/sec   Loss 1.2408   LearningRate 0.0001   Epoch: 28   Global Step: 49460   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:56:49,723-Speed 13827.28 samples/sec   Loss 1.2583   LearningRate 0.0001   Epoch: 28   Global Step: 49470   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:57:07,441-Speed 13870.78 samples/sec   Loss 1.2484   LearningRate 0.0001   Epoch: 28   Global Step: 49480   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:57:25,143-Speed 13884.32 samples/sec   Loss 1.2398   LearningRate 0.0001   Epoch: 28   Global Step: 49490   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:57:42,838-Speed 13891.22 samples/sec   Loss 1.2425   LearningRate 0.0001   Epoch: 28   Global Step: 49500   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:58:00,490-Speed 13922.96 samples/sec   Loss 1.2498   LearningRate 0.0001   Epoch: 28   Global Step: 49510   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:58:18,187-Speed 13888.34 samples/sec   Loss 1.2478   LearningRate 0.0001   Epoch: 28   Global Step: 49520   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 08:58:35,926-Speed 13854.78 samples/sec   Loss 1.2466   LearningRate 0.0001   Epoch: 28   Global Step: 49530   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:58:53,744-Speed 13794.25 samples/sec   Loss 1.2396   LearningRate 0.0001   Epoch: 28   Global Step: 49540   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:59:11,599-Speed 13764.68 samples/sec   Loss 1.2459   LearningRate 0.0001   Epoch: 28   Global Step: 49550   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:59:29,338-Speed 13854.86 samples/sec   Loss 1.2484   LearningRate 0.0001   Epoch: 28   Global Step: 49560   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 08:59:47,125-Speed 13817.84 samples/sec   Loss 1.2501   LearningRate 0.0001   Epoch: 28   Global Step: 49570   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:00:04,917-Speed 13813.96 samples/sec   Loss 1.2429   LearningRate 0.0001   Epoch: 28   Global Step: 49580   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:00:22,611-Speed 13890.86 samples/sec   Loss 1.2473   LearningRate 0.0001   Epoch: 28   Global Step: 49590   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:00:40,301-Speed 13893.19 samples/sec   Loss 1.2365   LearningRate 0.0001   Epoch: 28   Global Step: 49600   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:00:58,013-Speed 13876.46 samples/sec   Loss 1.2529   LearningRate 0.0001   Epoch: 28   Global Step: 49610   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:01:15,831-Speed 13793.39 samples/sec   Loss 1.2458   LearningRate 0.0001   Epoch: 28   Global Step: 49620   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:01:33,670-Speed 13777.34 samples/sec   Loss 1.2342   LearningRate 0.0001   Epoch: 28   Global Step: 49630   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:01:51,395-Speed 13866.33 samples/sec   Loss 1.2329   LearningRate 0.0001   Epoch: 28   Global Step: 49640   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:02:09,209-Speed 13796.25 samples/sec   Loss 1.2375   LearningRate 0.0001   Epoch: 28   Global Step: 49650   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:02:26,981-Speed 13829.81 samples/sec   Loss 1.2379   LearningRate 0.0001   Epoch: 28   Global Step: 49660   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:02:44,706-Speed 13865.96 samples/sec   Loss 1.2441   LearningRate 0.0001   Epoch: 28   Global Step: 49670   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:03:02,424-Speed 13871.49 samples/sec   Loss 1.2387   LearningRate 0.0001   Epoch: 28   Global Step: 49680   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:03:20,148-Speed 13866.68 samples/sec   Loss 1.2383   LearningRate 0.0001   Epoch: 28   Global Step: 49690   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:03:37,965-Speed 13794.59 samples/sec   Loss 1.2401   LearningRate 0.0001   Epoch: 28   Global Step: 49700   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:03:55,773-Speed 13801.68 samples/sec   Loss 1.2385   LearningRate 0.0001   Epoch: 28   Global Step: 49710   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:04:13,605-Speed 13782.66 samples/sec   Loss 1.2380   LearningRate 0.0001   Epoch: 28   Global Step: 49720   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:04:31,402-Speed 13810.16 samples/sec   Loss 1.2451   LearningRate 0.0001   Epoch: 28   Global Step: 49730   Fp16 Grad Scale: 65536   Required: 10 hours
Training: 2022-03-04 09:04:49,077-Speed 13905.53 samples/sec   Loss 1.2529   LearningRate 0.0001   Epoch: 28   Global Step: 49740   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:05:06,779-Speed 13884.11 samples/sec   Loss 1.2408   LearningRate 0.0001   Epoch: 28   Global Step: 49750   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:05:24,515-Speed 13857.60 samples/sec   Loss 1.2402   LearningRate 0.0001   Epoch: 28   Global Step: 49760   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:05:42,244-Speed 13862.77 samples/sec   Loss 1.2442   LearningRate 0.0001   Epoch: 28   Global Step: 49770   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:05:59,950-Speed 13880.71 samples/sec   Loss 1.2418   LearningRate 0.0001   Epoch: 28   Global Step: 49780   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:06:17,711-Speed 13838.33 samples/sec   Loss 1.2369   LearningRate 0.0001   Epoch: 28   Global Step: 49790   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:06:35,463-Speed 13844.43 samples/sec   Loss 1.2310   LearningRate 0.0001   Epoch: 28   Global Step: 49800   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:06:53,239-Speed 13826.41 samples/sec   Loss 1.2396   LearningRate 0.0001   Epoch: 28   Global Step: 49810   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:07:10,954-Speed 13874.24 samples/sec   Loss 1.2517   LearningRate 0.0001   Epoch: 28   Global Step: 49820   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:07:28,805-Speed 13768.47 samples/sec   Loss 1.2390   LearningRate 0.0001   Epoch: 28   Global Step: 49830   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:07:46,523-Speed 13871.72 samples/sec   Loss 1.2395   LearningRate 0.0001   Epoch: 28   Global Step: 49840   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:08:04,376-Speed 13766.26 samples/sec   Loss 1.2343   LearningRate 0.0001   Epoch: 28   Global Step: 49850   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:08:22,200-Speed 13788.41 samples/sec   Loss 1.2385   LearningRate 0.0001   Epoch: 28   Global Step: 49860   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:08:39,991-Speed 13815.14 samples/sec   Loss 1.2451   LearningRate 0.0001   Epoch: 28   Global Step: 49870   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:08:57,717-Speed 13867.13 samples/sec   Loss 1.2408   LearningRate 0.0001   Epoch: 28   Global Step: 49880   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:09:15,474-Speed 13841.62 samples/sec   Loss 1.2300   LearningRate 0.0001   Epoch: 28   Global Step: 49890   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:09:33,264-Speed 13815.39 samples/sec   Loss 1.2450   LearningRate 0.0001   Epoch: 28   Global Step: 49900   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:09:51,024-Speed 13838.36 samples/sec   Loss 1.2361   LearningRate 0.0001   Epoch: 28   Global Step: 49910   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:10:08,826-Speed 13806.76 samples/sec   Loss 1.2439   LearningRate 0.0001   Epoch: 28   Global Step: 49920   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:10:26,676-Speed 13768.86 samples/sec   Loss 1.2325   LearningRate 0.0001   Epoch: 28   Global Step: 49930   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:10:44,736-Speed 13608.56 samples/sec   Loss 1.2297   LearningRate 0.0001   Epoch: 28   Global Step: 49940   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:11:02,809-Speed 13599.46 samples/sec   Loss 1.2309   LearningRate 0.0001   Epoch: 28   Global Step: 49950   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:11:20,895-Speed 13588.87 samples/sec   Loss 1.2326   LearningRate 0.0001   Epoch: 28   Global Step: 49960   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:11:38,958-Speed 13606.54 samples/sec   Loss 1.2360   LearningRate 0.0001   Epoch: 28   Global Step: 49970   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:11:57,065-Speed 13573.84 samples/sec   Loss 1.2404   LearningRate 0.0001   Epoch: 28   Global Step: 49980   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:12:15,170-Speed 13575.08 samples/sec   Loss 1.2311   LearningRate 0.0001   Epoch: 28   Global Step: 49990   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:12:33,304-Speed 13553.86 samples/sec   Loss 1.2303   LearningRate 0.0001   Epoch: 28   Global Step: 50000   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:12:51,446-Speed 13547.25 samples/sec   Loss 1.2323   LearningRate 0.0001   Epoch: 28   Global Step: 50010   Fp16 Grad Scale: 32768   Required: 10 hours
Training: 2022-03-04 09:13:09,534-Speed 13587.17 samples/sec   Loss 1.2311   LearningRate 0.0001   Epoch: 28   Global Step: 50020   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:13:27,639-Speed 13574.88 samples/sec   Loss 1.2345   LearningRate 0.0001   Epoch: 28   Global Step: 50030   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:13:45,667-Speed 13633.51 samples/sec   Loss 1.2346   LearningRate 0.0001   Epoch: 28   Global Step: 50040   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:14:03,717-Speed 13616.68 samples/sec   Loss 1.2416   LearningRate 0.0001   Epoch: 28   Global Step: 50050   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:14:21,825-Speed 13572.06 samples/sec   Loss 1.2349   LearningRate 0.0001   Epoch: 28   Global Step: 50060   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:14:39,909-Speed 13591.26 samples/sec   Loss 1.2424   LearningRate 0.0001   Epoch: 28   Global Step: 50070   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:14:57,679-Speed 13830.43 samples/sec   Loss 1.2398   LearningRate 0.0001   Epoch: 28   Global Step: 50080   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:15:15,608-Speed 13708.59 samples/sec   Loss 1.2396   LearningRate 0.0001   Epoch: 28   Global Step: 50090   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:15:33,436-Speed 13785.40 samples/sec   Loss 1.2355   LearningRate 0.0001   Epoch: 28   Global Step: 50100   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:15:51,233-Speed 13809.91 samples/sec   Loss 1.2494   LearningRate 0.0001   Epoch: 28   Global Step: 50110   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:16:09,065-Speed 13783.14 samples/sec   Loss 1.2481   LearningRate 0.0001   Epoch: 28   Global Step: 50120   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:17:16,119-Speed 3665.21 samples/sec   Loss 1.2336   LearningRate 0.0001   Epoch: 29   Global Step: 50130   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:17:33,827-Speed 13879.38 samples/sec   Loss 1.2270   LearningRate 0.0001   Epoch: 29   Global Step: 50140   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:17:51,691-Speed 13757.83 samples/sec   Loss 1.2248   LearningRate 0.0001   Epoch: 29   Global Step: 50150   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:18:09,614-Speed 13713.08 samples/sec   Loss 1.2317   LearningRate 0.0001   Epoch: 29   Global Step: 50160   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:18:27,489-Speed 13750.02 samples/sec   Loss 1.2304   LearningRate 0.0001   Epoch: 29   Global Step: 50170   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:18:45,419-Speed 13706.97 samples/sec   Loss 1.2281   LearningRate 0.0001   Epoch: 29   Global Step: 50180   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:19:03,196-Speed 13825.55 samples/sec   Loss 1.2160   LearningRate 0.0001   Epoch: 29   Global Step: 50190   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:19:21,020-Speed 13789.35 samples/sec   Loss 1.2256   LearningRate 0.0001   Epoch: 29   Global Step: 50200   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:19:38,715-Speed 13889.99 samples/sec   Loss 1.2193   LearningRate 0.0001   Epoch: 29   Global Step: 50210   Fp16 Grad Scale: 16384   Required: 10 hours
Training: 2022-03-04 09:19:56,474-Speed 13839.19 samples/sec   Loss 1.2249   LearningRate 0.0001   Epoch: 29   Global Step: 50220   Fp16 Grad Scale: 8192   Required: 10 hours
Training: 2022-03-04 09:20:14,199-Speed 13866.29 samples/sec   Loss 1.2261   LearningRate 0.0001   Epoch: 29   Global Step: 50230   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:20:31,936-Speed 13855.97 samples/sec   Loss 1.2292   LearningRate 0.0001   Epoch: 29   Global Step: 50240   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:20:49,661-Speed 13866.90 samples/sec   Loss 1.2329   LearningRate 0.0001   Epoch: 29   Global Step: 50250   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:21:07,369-Speed 13879.75 samples/sec   Loss 1.2163   LearningRate 0.0001   Epoch: 29   Global Step: 50260   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:21:25,156-Speed 13817.56 samples/sec   Loss 1.2113   LearningRate 0.0001   Epoch: 29   Global Step: 50270   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:21:42,857-Speed 13884.39 samples/sec   Loss 1.2248   LearningRate 0.0001   Epoch: 29   Global Step: 50280   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:22:00,618-Speed 13838.40 samples/sec   Loss 1.2162   LearningRate 0.0001   Epoch: 29   Global Step: 50290   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:22:18,360-Speed 13853.43 samples/sec   Loss 1.2189   LearningRate 0.0001   Epoch: 29   Global Step: 50300   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:22:36,057-Speed 13887.75 samples/sec   Loss 1.2270   LearningRate 0.0001   Epoch: 29   Global Step: 50310   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:22:53,810-Speed 13845.61 samples/sec   Loss 1.2181   LearningRate 0.0001   Epoch: 29   Global Step: 50320   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:23:11,515-Speed 13882.15 samples/sec   Loss 1.2305   LearningRate 0.0001   Epoch: 29   Global Step: 50330   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:23:29,253-Speed 13855.45 samples/sec   Loss 1.2253   LearningRate 0.0001   Epoch: 29   Global Step: 50340   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:23:47,091-Speed 13778.49 samples/sec   Loss 1.2237   LearningRate 0.0001   Epoch: 29   Global Step: 50350   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:24:04,841-Speed 13846.39 samples/sec   Loss 1.2115   LearningRate 0.0001   Epoch: 29   Global Step: 50360   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:24:22,625-Speed 13819.70 samples/sec   Loss 1.2257   LearningRate 0.0001   Epoch: 29   Global Step: 50370   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:24:40,500-Speed 13750.08 samples/sec   Loss 1.2288   LearningRate 0.0001   Epoch: 29   Global Step: 50380   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:24:58,174-Speed 13906.20 samples/sec   Loss 1.2229   LearningRate 0.0001   Epoch: 29   Global Step: 50390   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:25:15,918-Speed 13850.43 samples/sec   Loss 1.2287   LearningRate 0.0001   Epoch: 29   Global Step: 50400   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:25:33,672-Speed 13844.35 samples/sec   Loss 1.2269   LearningRate 0.0001   Epoch: 29   Global Step: 50410   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:25:51,433-Speed 13838.42 samples/sec   Loss 1.2202   LearningRate 0.0001   Epoch: 29   Global Step: 50420   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:26:09,404-Speed 13676.05 samples/sec   Loss 1.2276   LearningRate 0.0001   Epoch: 29   Global Step: 50430   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:26:27,535-Speed 13555.70 samples/sec   Loss 1.2229   LearningRate 0.0001   Epoch: 29   Global Step: 50440   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:26:45,525-Speed 13661.76 samples/sec   Loss 1.2248   LearningRate 0.0001   Epoch: 29   Global Step: 50450   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:27:03,541-Speed 13641.88 samples/sec   Loss 1.2178   LearningRate 0.0001   Epoch: 29   Global Step: 50460   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:27:21,499-Speed 13686.53 samples/sec   Loss 1.2235   LearningRate 0.0001   Epoch: 29   Global Step: 50470   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:27:39,223-Speed 13866.81 samples/sec   Loss 1.2233   LearningRate 0.0001   Epoch: 29   Global Step: 50480   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:27:56,969-Speed 13849.19 samples/sec   Loss 1.2254   LearningRate 0.0001   Epoch: 29   Global Step: 50490   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:28:14,707-Speed 13856.09 samples/sec   Loss 1.2171   LearningRate 0.0001   Epoch: 29   Global Step: 50500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:28:32,447-Speed 13854.41 samples/sec   Loss 1.2258   LearningRate 0.0001   Epoch: 29   Global Step: 50510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:28:50,180-Speed 13860.20 samples/sec   Loss 1.2340   LearningRate 0.0001   Epoch: 29   Global Step: 50520   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:29:07,940-Speed 13838.56 samples/sec   Loss 1.2141   LearningRate 0.0001   Epoch: 29   Global Step: 50530   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:29:25,734-Speed 13812.46 samples/sec   Loss 1.2253   LearningRate 0.0001   Epoch: 29   Global Step: 50540   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:29:43,409-Speed 13905.31 samples/sec   Loss 1.2214   LearningRate 0.0001   Epoch: 29   Global Step: 50550   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:30:01,117-Speed 13879.56 samples/sec   Loss 1.2159   LearningRate 0.0001   Epoch: 29   Global Step: 50560   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:30:19,061-Speed 13696.61 samples/sec   Loss 1.2266   LearningRate 0.0001   Epoch: 29   Global Step: 50570   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:30:36,835-Speed 13827.82 samples/sec   Loss 1.2202   LearningRate 0.0001   Epoch: 29   Global Step: 50580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:30:54,665-Speed 13785.17 samples/sec   Loss 1.2201   LearningRate 0.0001   Epoch: 29   Global Step: 50590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:31:12,394-Speed 13862.50 samples/sec   Loss 1.2133   LearningRate 0.0001   Epoch: 29   Global Step: 50600   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:31:30,208-Speed 13796.59 samples/sec   Loss 1.2227   LearningRate 0.0001   Epoch: 29   Global Step: 50610   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:31:48,041-Speed 13782.11 samples/sec   Loss 1.2167   LearningRate 0.0001   Epoch: 29   Global Step: 50620   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:32:05,788-Speed 13849.69 samples/sec   Loss 1.2228   LearningRate 0.0001   Epoch: 29   Global Step: 50630   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:32:23,525-Speed 13856.13 samples/sec   Loss 1.2177   LearningRate 0.0001   Epoch: 29   Global Step: 50640   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:32:41,223-Speed 13888.94 samples/sec   Loss 1.2164   LearningRate 0.0001   Epoch: 29   Global Step: 50650   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:32:59,031-Speed 13837.88 samples/sec   Loss 1.2155   LearningRate 0.0001   Epoch: 29   Global Step: 50660   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:33:16,775-Speed 13921.89 samples/sec   Loss 1.2132   LearningRate 0.0001   Epoch: 29   Global Step: 50670   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:33:34,539-Speed 13910.14 samples/sec   Loss 1.2151   LearningRate 0.0001   Epoch: 29   Global Step: 50680   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:33:52,286-Speed 13849.04 samples/sec   Loss 1.2164   LearningRate 0.0001   Epoch: 29   Global Step: 50690   Fp16 Grad Scale: 8192   Required: 9 hours
Training: 2022-03-04 09:34:10,014-Speed 13887.25 samples/sec   Loss 1.2161   LearningRate 0.0001   Epoch: 29   Global Step: 50700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:34:27,755-Speed 13900.53 samples/sec   Loss 1.2077   LearningRate 0.0001   Epoch: 29   Global Step: 50710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:34:45,484-Speed 13911.92 samples/sec   Loss 1.2107   LearningRate 0.0001   Epoch: 29   Global Step: 50720   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:35:03,176-Speed 13892.26 samples/sec   Loss 1.2155   LearningRate 0.0001   Epoch: 29   Global Step: 50730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:35:20,926-Speed 13907.99 samples/sec   Loss 1.2160   LearningRate 0.0001   Epoch: 29   Global Step: 50740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:35:38,718-Speed 13816.01 samples/sec   Loss 1.2045   LearningRate 0.0001   Epoch: 29   Global Step: 50750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:35:56,780-Speed 13899.64 samples/sec   Loss 1.2133   LearningRate 0.0001   Epoch: 29   Global Step: 50760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:36:14,748-Speed 13775.92 samples/sec   Loss 1.2162   LearningRate 0.0001   Epoch: 29   Global Step: 50770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:36:32,481-Speed 13876.72 samples/sec   Loss 1.2164   LearningRate 0.0001   Epoch: 29   Global Step: 50780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:36:50,227-Speed 13899.38 samples/sec   Loss 1.2174   LearningRate 0.0001   Epoch: 29   Global Step: 50790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:37:07,955-Speed 13883.01 samples/sec   Loss 1.2193   LearningRate 0.0001   Epoch: 29   Global Step: 50800   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:37:25,710-Speed 13843.20 samples/sec   Loss 1.2162   LearningRate 0.0001   Epoch: 29   Global Step: 50810   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:37:43,462-Speed 13924.40 samples/sec   Loss 1.2179   LearningRate 0.0001   Epoch: 29   Global Step: 50820   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:38:01,369-Speed 13771.49 samples/sec   Loss 1.2059   LearningRate 0.0001   Epoch: 29   Global Step: 50830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:38:19,004-Speed 13940.30 samples/sec   Loss 1.2191   LearningRate 0.0001   Epoch: 29   Global Step: 50840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:38:36,837-Speed 13874.41 samples/sec   Loss 1.2097   LearningRate 0.0001   Epoch: 29   Global Step: 50850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:38:54,686-Speed 13822.37 samples/sec   Loss 1.2052   LearningRate 0.0001   Epoch: 29   Global Step: 50860   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:39:12,634-Speed 13923.93 samples/sec   Loss 1.2066   LearningRate 0.0001   Epoch: 29   Global Step: 50870   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:39:30,462-Speed 13879.44 samples/sec   Loss 1.2112   LearningRate 0.0001   Epoch: 29   Global Step: 50880   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:39:48,296-Speed 13812.57 samples/sec   Loss 1.2062   LearningRate 0.0001   Epoch: 29   Global Step: 50890   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:40:06,114-Speed 13794.26 samples/sec   Loss 1.2174   LearningRate 0.0001   Epoch: 29   Global Step: 50900   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:40:23,892-Speed 13845.42 samples/sec   Loss 1.2113   LearningRate 0.0001   Epoch: 29   Global Step: 50910   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:40:41,664-Speed 13829.10 samples/sec   Loss 1.2128   LearningRate 0.0001   Epoch: 29   Global Step: 50920   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:40:59,529-Speed 13860.35 samples/sec   Loss 1.2106   LearningRate 0.0001   Epoch: 29   Global Step: 50930   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:41:17,350-Speed 13823.57 samples/sec   Loss 1.2046   LearningRate 0.0001   Epoch: 29   Global Step: 50940   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:41:35,031-Speed 13917.85 samples/sec   Loss 1.2021   LearningRate 0.0001   Epoch: 29   Global Step: 50950   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:41:52,870-Speed 13857.71 samples/sec   Loss 1.2142   LearningRate 0.0001   Epoch: 29   Global Step: 50960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:42:10,570-Speed 13913.57 samples/sec   Loss 1.1990   LearningRate 0.0001   Epoch: 29   Global Step: 50970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:42:28,234-Speed 13931.59 samples/sec   Loss 1.2051   LearningRate 0.0001   Epoch: 29   Global Step: 50980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:42:46,053-Speed 13815.75 samples/sec   Loss 1.1991   LearningRate 0.0001   Epoch: 29   Global Step: 50990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:43:03,798-Speed 13850.19 samples/sec   Loss 1.1974   LearningRate 0.0001   Epoch: 29   Global Step: 51000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:43:21,539-Speed 13854.40 samples/sec   Loss 1.2038   LearningRate 0.0001   Epoch: 29   Global Step: 51010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:43:39,429-Speed 13819.31 samples/sec   Loss 1.2054   LearningRate 0.0001   Epoch: 29   Global Step: 51020   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:43:57,167-Speed 13898.69 samples/sec   Loss 1.2027   LearningRate 0.0001   Epoch: 29   Global Step: 51030   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:44:14,972-Speed 13804.00 samples/sec   Loss 1.2078   LearningRate 0.0001   Epoch: 29   Global Step: 51040   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:44:32,836-Speed 13771.45 samples/sec   Loss 1.2025   LearningRate 0.0001   Epoch: 29   Global Step: 51050   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:44:50,851-Speed 13696.37 samples/sec   Loss 1.1955   LearningRate 0.0001   Epoch: 29   Global Step: 51060   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:45:08,818-Speed 13735.41 samples/sec   Loss 1.2105   LearningRate 0.0001   Epoch: 29   Global Step: 51070   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:45:26,599-Speed 13834.15 samples/sec   Loss 1.2075   LearningRate 0.0001   Epoch: 29   Global Step: 51080   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:45:44,408-Speed 13863.98 samples/sec   Loss 1.1961   LearningRate 0.0001   Epoch: 29   Global Step: 51090   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:46:02,166-Speed 13840.19 samples/sec   Loss 1.2012   LearningRate 0.0001   Epoch: 29   Global Step: 51100   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:46:19,903-Speed 13875.16 samples/sec   Loss 1.2124   LearningRate 0.0001   Epoch: 29   Global Step: 51110   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:46:37,664-Speed 13841.51 samples/sec   Loss 1.2017   LearningRate 0.0001   Epoch: 29   Global Step: 51120   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:46:55,467-Speed 13840.27 samples/sec   Loss 1.2044   LearningRate 0.0001   Epoch: 29   Global Step: 51130   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:47:13,306-Speed 13840.93 samples/sec   Loss 1.2089   LearningRate 0.0001   Epoch: 29   Global Step: 51140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:47:31,229-Speed 13756.19 samples/sec   Loss 1.2074   LearningRate 0.0001   Epoch: 29   Global Step: 51150   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:47:49,101-Speed 13799.36 samples/sec   Loss 1.2010   LearningRate 0.0001   Epoch: 29   Global Step: 51160   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:48:06,962-Speed 13831.65 samples/sec   Loss 1.1974   LearningRate 0.0001   Epoch: 29   Global Step: 51170   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:48:24,858-Speed 13778.81 samples/sec   Loss 1.2043   LearningRate 0.0001   Epoch: 29   Global Step: 51180   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:48:42,793-Speed 13745.17 samples/sec   Loss 1.2114   LearningRate 0.0001   Epoch: 29   Global Step: 51190   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:49:00,666-Speed 13813.44 samples/sec   Loss 1.1995   LearningRate 0.0001   Epoch: 29   Global Step: 51200   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:49:18,492-Speed 13789.11 samples/sec   Loss 1.2022   LearningRate 0.0001   Epoch: 29   Global Step: 51210   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:49:36,372-Speed 13763.30 samples/sec   Loss 1.1937   LearningRate 0.0001   Epoch: 29   Global Step: 51220   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:49:54,283-Speed 13770.30 samples/sec   Loss 1.1992   LearningRate 0.0001   Epoch: 29   Global Step: 51230   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:50:12,093-Speed 13810.25 samples/sec   Loss 1.2040   LearningRate 0.0001   Epoch: 29   Global Step: 51240   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:50:29,976-Speed 13743.43 samples/sec   Loss 1.2042   LearningRate 0.0001   Epoch: 29   Global Step: 51250   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:50:47,918-Speed 13712.38 samples/sec   Loss 1.1970   LearningRate 0.0001   Epoch: 29   Global Step: 51260   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:51:05,895-Speed 13672.24 samples/sec   Loss 1.2041   LearningRate 0.0001   Epoch: 29   Global Step: 51270   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:51:23,725-Speed 13815.59 samples/sec   Loss 1.2036   LearningRate 0.0001   Epoch: 29   Global Step: 51280   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:51:41,545-Speed 13793.14 samples/sec   Loss 1.2029   LearningRate 0.0001   Epoch: 29   Global Step: 51290   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:51:59,428-Speed 13769.58 samples/sec   Loss 1.1980   LearningRate 0.0001   Epoch: 29   Global Step: 51300   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:52:17,386-Speed 13741.70 samples/sec   Loss 1.1938   LearningRate 0.0001   Epoch: 29   Global Step: 51310   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:52:35,314-Speed 13710.23 samples/sec   Loss 1.2005   LearningRate 0.0001   Epoch: 29   Global Step: 51320   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:52:53,147-Speed 13782.41 samples/sec   Loss 1.2000   LearningRate 0.0001   Epoch: 29   Global Step: 51330   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:53:10,943-Speed 13810.30 samples/sec   Loss 1.1943   LearningRate 0.0001   Epoch: 29   Global Step: 51340   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:53:28,709-Speed 13833.78 samples/sec   Loss 1.1934   LearningRate 0.0001   Epoch: 29   Global Step: 51350   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:53:46,438-Speed 13864.19 samples/sec   Loss 1.2017   LearningRate 0.0001   Epoch: 29   Global Step: 51360   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:54:04,229-Speed 13814.86 samples/sec   Loss 1.1967   LearningRate 0.0001   Epoch: 29   Global Step: 51370   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:54:21,989-Speed 13838.91 samples/sec   Loss 1.1840   LearningRate 0.0001   Epoch: 29   Global Step: 51380   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:54:39,789-Speed 13807.17 samples/sec   Loss 1.1905   LearningRate 0.0001   Epoch: 29   Global Step: 51390   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:54:57,629-Speed 13777.15 samples/sec   Loss 1.1952   LearningRate 0.0001   Epoch: 29   Global Step: 51400   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:55:15,324-Speed 13889.83 samples/sec   Loss 1.1949   LearningRate 0.0001   Epoch: 29   Global Step: 51410   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:55:33,024-Speed 13885.68 samples/sec   Loss 1.1939   LearningRate 0.0001   Epoch: 29   Global Step: 51420   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:55:50,859-Speed 13779.91 samples/sec   Loss 1.1829   LearningRate 0.0001   Epoch: 29   Global Step: 51430   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:56:08,897-Speed 13625.88 samples/sec   Loss 1.1921   LearningRate 0.0001   Epoch: 29   Global Step: 51440   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:56:26,968-Speed 13600.36 samples/sec   Loss 1.1928   LearningRate 0.0001   Epoch: 29   Global Step: 51450   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:56:44,994-Speed 13635.04 samples/sec   Loss 1.1958   LearningRate 0.0001   Epoch: 29   Global Step: 51460   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:57:03,042-Speed 13617.54 samples/sec   Loss 1.1933   LearningRate 0.0001   Epoch: 29   Global Step: 51470   Fp16 Grad Scale: 65536   Required: 9 hours
Training: 2022-03-04 09:57:21,040-Speed 13655.91 samples/sec   Loss 1.1880   LearningRate 0.0001   Epoch: 29   Global Step: 51480   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:57:39,089-Speed 13617.29 samples/sec   Loss 1.1975   LearningRate 0.0001   Epoch: 29   Global Step: 51490   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 09:57:57,057-Speed 13678.26 samples/sec   Loss 1.1849   LearningRate 0.0001   Epoch: 29   Global Step: 51500   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:58:15,073-Speed 13642.15 samples/sec   Loss 1.1985   LearningRate 0.0001   Epoch: 29   Global Step: 51510   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:58:33,071-Speed 13655.75 samples/sec   Loss 1.1899   LearningRate 0.0001   Epoch: 29   Global Step: 51520   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:58:50,821-Speed 13846.42 samples/sec   Loss 1.1902   LearningRate 0.0001   Epoch: 29   Global Step: 51530   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:59:08,619-Speed 13809.06 samples/sec   Loss 1.1910   LearningRate 0.0001   Epoch: 29   Global Step: 51540   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:59:26,394-Speed 13826.92 samples/sec   Loss 1.1885   LearningRate 0.0001   Epoch: 29   Global Step: 51550   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 09:59:44,099-Speed 13881.85 samples/sec   Loss 1.1923   LearningRate 0.0001   Epoch: 29   Global Step: 51560   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:00:01,780-Speed 13900.82 samples/sec   Loss 1.1928   LearningRate 0.0001   Epoch: 29   Global Step: 51570   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:00:19,567-Speed 13817.47 samples/sec   Loss 1.1917   LearningRate 0.0001   Epoch: 29   Global Step: 51580   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:00:37,292-Speed 13865.81 samples/sec   Loss 1.2020   LearningRate 0.0001   Epoch: 29   Global Step: 51590   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:00:55,061-Speed 13831.92 samples/sec   Loss 1.1979   LearningRate 0.0001   Epoch: 29   Global Step: 51600   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:01:12,764-Speed 13883.66 samples/sec   Loss 1.1953   LearningRate 0.0001   Epoch: 29   Global Step: 51610   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:01:30,567-Speed 13804.95 samples/sec   Loss 1.1963   LearningRate 0.0001   Epoch: 29   Global Step: 51620   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:01:48,274-Speed 13880.62 samples/sec   Loss 1.1964   LearningRate 0.0001   Epoch: 29   Global Step: 51630   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:02:06,049-Speed 13826.37 samples/sec   Loss 1.2030   LearningRate 0.0001   Epoch: 29   Global Step: 51640   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:02:23,819-Speed 13831.29 samples/sec   Loss 1.1975   LearningRate 0.0001   Epoch: 29   Global Step: 51650   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:02:41,570-Speed 13846.04 samples/sec   Loss 1.1912   LearningRate 0.0001   Epoch: 29   Global Step: 51660   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:02:59,355-Speed 13819.63 samples/sec   Loss 1.1924   LearningRate 0.0001   Epoch: 29   Global Step: 51670   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:03:17,113-Speed 13840.13 samples/sec   Loss 1.1923   LearningRate 0.0001   Epoch: 29   Global Step: 51680   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:03:34,844-Speed 13861.12 samples/sec   Loss 1.1924   LearningRate 0.0001   Epoch: 29   Global Step: 51690   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:03:52,603-Speed 13839.86 samples/sec   Loss 1.1972   LearningRate 0.0001   Epoch: 29   Global Step: 51700   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:04:10,315-Speed 13876.21 samples/sec   Loss 1.1949   LearningRate 0.0001   Epoch: 29   Global Step: 51710   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:04:28,027-Speed 13876.30 samples/sec   Loss 1.1924   LearningRate 0.0001   Epoch: 29   Global Step: 51720   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:04:45,723-Speed 13888.64 samples/sec   Loss 1.1815   LearningRate 0.0001   Epoch: 29   Global Step: 51730   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:05:03,432-Speed 13878.46 samples/sec   Loss 1.1726   LearningRate 0.0001   Epoch: 29   Global Step: 51740   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:05:21,184-Speed 13844.81 samples/sec   Loss 1.1896   LearningRate 0.0001   Epoch: 29   Global Step: 51750   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:05:38,910-Speed 13865.28 samples/sec   Loss 1.1955   LearningRate 0.0001   Epoch: 29   Global Step: 51760   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:05:56,646-Speed 13857.70 samples/sec   Loss 1.1911   LearningRate 0.0001   Epoch: 29   Global Step: 51770   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:06:14,396-Speed 13846.91 samples/sec   Loss 1.1935   LearningRate 0.0001   Epoch: 29   Global Step: 51780   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:06:32,105-Speed 13878.37 samples/sec   Loss 1.1889   LearningRate 0.0001   Epoch: 29   Global Step: 51790   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:06:49,947-Speed 13774.57 samples/sec   Loss 1.1960   LearningRate 0.0001   Epoch: 29   Global Step: 51800   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:07:07,682-Speed 13858.70 samples/sec   Loss 1.1917   LearningRate 0.0001   Epoch: 29   Global Step: 51810   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:07:25,411-Speed 13863.78 samples/sec   Loss 1.1958   LearningRate 0.0001   Epoch: 29   Global Step: 51820   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:07:43,150-Speed 13855.41 samples/sec   Loss 1.1938   LearningRate 0.0001   Epoch: 29   Global Step: 51830   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:08:00,875-Speed 13865.95 samples/sec   Loss 1.1924   LearningRate 0.0001   Epoch: 29   Global Step: 51840   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:09:08,129-Speed 3654.22 samples/sec   Loss 1.1978   LearningRate 0.0001   Epoch: 30   Global Step: 51850   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:09:25,836-Speed 13880.65 samples/sec   Loss 1.1830   LearningRate 0.0001   Epoch: 30   Global Step: 51860   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:09:43,480-Speed 13930.13 samples/sec   Loss 1.1869   LearningRate 0.0001   Epoch: 30   Global Step: 51870   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:10:01,212-Speed 13860.40 samples/sec   Loss 1.1782   LearningRate 0.0001   Epoch: 30   Global Step: 51880   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:10:18,941-Speed 13863.06 samples/sec   Loss 1.1856   LearningRate 0.0001   Epoch: 30   Global Step: 51890   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:10:36,567-Speed 13944.31 samples/sec   Loss 1.1851   LearningRate 0.0001   Epoch: 30   Global Step: 51900   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:10:54,365-Speed 13809.00 samples/sec   Loss 1.1743   LearningRate 0.0001   Epoch: 30   Global Step: 51910   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:11:12,216-Speed 13768.54 samples/sec   Loss 1.1897   LearningRate 0.0001   Epoch: 30   Global Step: 51920   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:11:29,908-Speed 13891.30 samples/sec   Loss 1.1713   LearningRate 0.0001   Epoch: 30   Global Step: 51930   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:11:47,546-Speed 13934.70 samples/sec   Loss 1.1845   LearningRate 0.0001   Epoch: 30   Global Step: 51940   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:12:05,310-Speed 13836.02 samples/sec   Loss 1.1823   LearningRate 0.0001   Epoch: 30   Global Step: 51950   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:12:23,016-Speed 13881.04 samples/sec   Loss 1.1790   LearningRate 0.0001   Epoch: 30   Global Step: 51960   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:12:40,734-Speed 13870.85 samples/sec   Loss 1.1822   LearningRate 0.0001   Epoch: 30   Global Step: 51970   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:12:58,384-Speed 13925.47 samples/sec   Loss 1.1755   LearningRate 0.0001   Epoch: 30   Global Step: 51980   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:13:16,052-Speed 13910.55 samples/sec   Loss 1.1826   LearningRate 0.0001   Epoch: 30   Global Step: 51990   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:13:33,813-Speed 13838.65 samples/sec   Loss 1.1784   LearningRate 0.0001   Epoch: 30   Global Step: 52000   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:13:51,564-Speed 13846.83 samples/sec   Loss 1.1772   LearningRate 0.0001   Epoch: 30   Global Step: 52010   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:14:09,278-Speed 13874.79 samples/sec   Loss 1.1775   LearningRate 0.0001   Epoch: 30   Global Step: 52020   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:14:27,046-Speed 13832.41 samples/sec   Loss 1.1749   LearningRate 0.0001   Epoch: 30   Global Step: 52030   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:14:44,733-Speed 13895.84 samples/sec   Loss 1.1772   LearningRate 0.0001   Epoch: 30   Global Step: 52040   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:15:02,536-Speed 13805.90 samples/sec   Loss 1.1660   LearningRate 0.0001   Epoch: 30   Global Step: 52050   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:15:20,260-Speed 13866.17 samples/sec   Loss 1.1751   LearningRate 0.0001   Epoch: 30   Global Step: 52060   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:15:38,017-Speed 13841.52 samples/sec   Loss 1.1817   LearningRate 0.0001   Epoch: 30   Global Step: 52070   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:15:55,740-Speed 13867.47 samples/sec   Loss 1.1838   LearningRate 0.0001   Epoch: 30   Global Step: 52080   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:16:13,482-Speed 13853.20 samples/sec   Loss 1.1826   LearningRate 0.0001   Epoch: 30   Global Step: 52090   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:16:31,224-Speed 13852.56 samples/sec   Loss 1.1793   LearningRate 0.0001   Epoch: 30   Global Step: 52100   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:16:49,022-Speed 13808.86 samples/sec   Loss 1.1810   LearningRate 0.0001   Epoch: 30   Global Step: 52110   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:17:06,767-Speed 13851.08 samples/sec   Loss 1.1782   LearningRate 0.0001   Epoch: 30   Global Step: 52120   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:17:24,497-Speed 13862.06 samples/sec   Loss 1.1746   LearningRate 0.0001   Epoch: 30   Global Step: 52130   Fp16 Grad Scale: 16384   Required: 9 hours
Training: 2022-03-04 10:17:42,281-Speed 13880.68 samples/sec   Loss 1.1721   LearningRate 0.0001   Epoch: 30   Global Step: 52140   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:18:00,100-Speed 13883.98 samples/sec   Loss 1.1790   LearningRate 0.0001   Epoch: 30   Global Step: 52150   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:18:17,956-Speed 13764.14 samples/sec   Loss 1.1811   LearningRate 0.0001   Epoch: 30   Global Step: 52160   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:18:35,669-Speed 13875.21 samples/sec   Loss 1.1840   LearningRate 0.0001   Epoch: 30   Global Step: 52170   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:18:53,374-Speed 13882.17 samples/sec   Loss 1.1772   LearningRate 0.0001   Epoch: 30   Global Step: 52180   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:19:11,066-Speed 13892.01 samples/sec   Loss 1.1874   LearningRate 0.0001   Epoch: 30   Global Step: 52190   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:19:28,782-Speed 13873.05 samples/sec   Loss 1.1837   LearningRate 0.0001   Epoch: 30   Global Step: 52200   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:19:46,481-Speed 13886.57 samples/sec   Loss 1.1823   LearningRate 0.0001   Epoch: 30   Global Step: 52210   Fp16 Grad Scale: 32768   Required: 9 hours
Training: 2022-03-04 10:20:04,207-Speed 13865.78 samples/sec   Loss 1.1848   LearningRate 0.0001   Epoch: 30   Global Step: 52220   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:20:22,010-Speed 13805.21 samples/sec   Loss 1.1640   LearningRate 0.0001   Epoch: 30   Global Step: 52230   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:20:39,698-Speed 13895.15 samples/sec   Loss 1.1730   LearningRate 0.0001   Epoch: 30   Global Step: 52240   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:20:57,391-Speed 13890.65 samples/sec   Loss 1.1799   LearningRate 0.0001   Epoch: 30   Global Step: 52250   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:21:15,094-Speed 13883.44 samples/sec   Loss 1.1661   LearningRate 0.0001   Epoch: 30   Global Step: 52260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:21:32,801-Speed 13879.62 samples/sec   Loss 1.1779   LearningRate 0.0001   Epoch: 30   Global Step: 52270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:21:50,577-Speed 13827.85 samples/sec   Loss 1.1727   LearningRate 0.0001   Epoch: 30   Global Step: 52280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:22:08,292-Speed 13873.44 samples/sec   Loss 1.1775   LearningRate 0.0001   Epoch: 30   Global Step: 52290   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:22:26,100-Speed 13801.38 samples/sec   Loss 1.1742   LearningRate 0.0001   Epoch: 30   Global Step: 52300   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:22:43,867-Speed 13833.24 samples/sec   Loss 1.1766   LearningRate 0.0001   Epoch: 30   Global Step: 52310   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:23:01,516-Speed 13926.34 samples/sec   Loss 1.1753   LearningRate 0.0001   Epoch: 30   Global Step: 52320   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:23:19,327-Speed 13799.01 samples/sec   Loss 1.1790   LearningRate 0.0001   Epoch: 30   Global Step: 52330   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:23:37,176-Speed 13769.89 samples/sec   Loss 1.1780   LearningRate 0.0001   Epoch: 30   Global Step: 52340   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:23:55,043-Speed 13755.24 samples/sec   Loss 1.1676   LearningRate 0.0001   Epoch: 30   Global Step: 52350   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:24:12,852-Speed 13801.47 samples/sec   Loss 1.1918   LearningRate 0.0001   Epoch: 30   Global Step: 52360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:24:30,644-Speed 13813.59 samples/sec   Loss 1.1777   LearningRate 0.0001   Epoch: 30   Global Step: 52370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:24:48,533-Speed 13738.81 samples/sec   Loss 1.1719   LearningRate 0.0001   Epoch: 30   Global Step: 52380   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:25:06,366-Speed 13782.59 samples/sec   Loss 1.1677   LearningRate 0.0001   Epoch: 30   Global Step: 52390   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:25:24,119-Speed 13844.32 samples/sec   Loss 1.1695   LearningRate 0.0001   Epoch: 30   Global Step: 52400   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:25:42,052-Speed 13705.62 samples/sec   Loss 1.1788   LearningRate 0.0001   Epoch: 30   Global Step: 52410   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:25:59,953-Speed 13729.17 samples/sec   Loss 1.1680   LearningRate 0.0001   Epoch: 30   Global Step: 52420   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:26:17,714-Speed 13838.22 samples/sec   Loss 1.1763   LearningRate 0.0001   Epoch: 30   Global Step: 52430   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:26:35,561-Speed 13771.31 samples/sec   Loss 1.1750   LearningRate 0.0001   Epoch: 30   Global Step: 52440   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:26:53,381-Speed 13792.00 samples/sec   Loss 1.1729   LearningRate 0.0001   Epoch: 30   Global Step: 52450   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:27:11,206-Speed 13788.34 samples/sec   Loss 1.1701   LearningRate 0.0001   Epoch: 30   Global Step: 52460   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:27:29,201-Speed 13657.73 samples/sec   Loss 1.1633   LearningRate 0.0001   Epoch: 30   Global Step: 52470   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:27:47,314-Speed 13569.20 samples/sec   Loss 1.1802   LearningRate 0.0001   Epoch: 30   Global Step: 52480   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:28:05,353-Speed 13624.97 samples/sec   Loss 1.1681   LearningRate 0.0001   Epoch: 30   Global Step: 52490   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:28:23,478-Speed 13559.92 samples/sec   Loss 1.1683   LearningRate 0.0001   Epoch: 30   Global Step: 52500   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:28:41,613-Speed 13552.64 samples/sec   Loss 1.1661   LearningRate 0.0001   Epoch: 30   Global Step: 52510   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:28:59,680-Speed 13603.94 samples/sec   Loss 1.1636   LearningRate 0.0001   Epoch: 30   Global Step: 52520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:29:17,795-Speed 13567.31 samples/sec   Loss 1.1628   LearningRate 0.0001   Epoch: 30   Global Step: 52530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:29:35,936-Speed 13548.46 samples/sec   Loss 1.1683   LearningRate 0.0001   Epoch: 30   Global Step: 52540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:29:54,057-Speed 13562.46 samples/sec   Loss 1.1648   LearningRate 0.0001   Epoch: 30   Global Step: 52550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:30:12,201-Speed 13546.16 samples/sec   Loss 1.1637   LearningRate 0.0001   Epoch: 30   Global Step: 52560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:30:30,319-Speed 13565.00 samples/sec   Loss 1.1658   LearningRate 0.0001   Epoch: 30   Global Step: 52570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:30:48,439-Speed 13564.46 samples/sec   Loss 1.1700   LearningRate 0.0001   Epoch: 30   Global Step: 52580   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:31:06,535-Speed 13581.09 samples/sec   Loss 1.1697   LearningRate 0.0001   Epoch: 30   Global Step: 52590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:31:24,623-Speed 13588.16 samples/sec   Loss 1.1768   LearningRate 0.0001   Epoch: 30   Global Step: 52600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:31:42,685-Speed 13607.62 samples/sec   Loss 1.1670   LearningRate 0.0001   Epoch: 30   Global Step: 52610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:32:00,739-Speed 13614.49 samples/sec   Loss 1.1589   LearningRate 0.0001   Epoch: 30   Global Step: 52620   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:32:18,887-Speed 13542.87 samples/sec   Loss 1.1808   LearningRate 0.0001   Epoch: 30   Global Step: 52630   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:32:36,937-Speed 13616.01 samples/sec   Loss 1.1696   LearningRate 0.0001   Epoch: 30   Global Step: 52640   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:32:54,721-Speed 13820.24 samples/sec   Loss 1.1684   LearningRate 0.0001   Epoch: 30   Global Step: 52650   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:33:12,637-Speed 13718.05 samples/sec   Loss 1.1674   LearningRate 0.0001   Epoch: 30   Global Step: 52660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:33:30,454-Speed 13794.79 samples/sec   Loss 1.1616   LearningRate 0.0001   Epoch: 30   Global Step: 52670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:33:48,180-Speed 13865.28 samples/sec   Loss 1.1580   LearningRate 0.0001   Epoch: 30   Global Step: 52680   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:34:05,918-Speed 13855.43 samples/sec   Loss 1.1629   LearningRate 0.0001   Epoch: 30   Global Step: 52690   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:34:23,936-Speed 13641.10 samples/sec   Loss 1.1650   LearningRate 0.0001   Epoch: 30   Global Step: 52700   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:34:41,817-Speed 13745.22 samples/sec   Loss 1.1571   LearningRate 0.0001   Epoch: 30   Global Step: 52710   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:34:59,673-Speed 13763.78 samples/sec   Loss 1.1700   LearningRate 0.0001   Epoch: 30   Global Step: 52720   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:35:17,476-Speed 13806.32 samples/sec   Loss 1.1658   LearningRate 0.0001   Epoch: 30   Global Step: 52730   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:35:35,214-Speed 13856.15 samples/sec   Loss 1.1630   LearningRate 0.0001   Epoch: 30   Global Step: 52740   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:35:52,987-Speed 13828.29 samples/sec   Loss 1.1534   LearningRate 0.0001   Epoch: 30   Global Step: 52750   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:36:10,702-Speed 13874.13 samples/sec   Loss 1.1603   LearningRate 0.0001   Epoch: 30   Global Step: 52760   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:36:28,434-Speed 13860.44 samples/sec   Loss 1.1539   LearningRate 0.0001   Epoch: 30   Global Step: 52770   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:36:46,186-Speed 13845.51 samples/sec   Loss 1.1637   LearningRate 0.0001   Epoch: 30   Global Step: 52780   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:37:04,014-Speed 13785.53 samples/sec   Loss 1.1608   LearningRate 0.0001   Epoch: 30   Global Step: 52790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:37:21,826-Speed 13798.56 samples/sec   Loss 1.1670   LearningRate 0.0001   Epoch: 30   Global Step: 52800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:37:39,602-Speed 13825.87 samples/sec   Loss 1.1571   LearningRate 0.0001   Epoch: 30   Global Step: 52810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:37:57,309-Speed 13880.39 samples/sec   Loss 1.1531   LearningRate 0.0001   Epoch: 30   Global Step: 52820   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:38:15,055-Speed 13849.71 samples/sec   Loss 1.1610   LearningRate 0.0001   Epoch: 30   Global Step: 52830   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:38:32,771-Speed 13873.00 samples/sec   Loss 1.1670   LearningRate 0.0001   Epoch: 30   Global Step: 52840   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:38:50,575-Speed 13804.50 samples/sec   Loss 1.1638   LearningRate 0.0001   Epoch: 30   Global Step: 52850   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:39:08,419-Speed 13774.01 samples/sec   Loss 1.1600   LearningRate 0.0001   Epoch: 30   Global Step: 52860   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:39:26,280-Speed 13760.09 samples/sec   Loss 1.1608   LearningRate 0.0001   Epoch: 30   Global Step: 52870   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:39:44,024-Speed 13851.58 samples/sec   Loss 1.1605   LearningRate 0.0001   Epoch: 30   Global Step: 52880   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:40:01,825-Speed 13806.84 samples/sec   Loss 1.1549   LearningRate 0.0001   Epoch: 30   Global Step: 52890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:40:19,643-Speed 13794.20 samples/sec   Loss 1.1496   LearningRate 0.0001   Epoch: 30   Global Step: 52900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:40:37,355-Speed 13875.79 samples/sec   Loss 1.1656   LearningRate 0.0001   Epoch: 30   Global Step: 52910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:40:55,061-Speed 13880.88 samples/sec   Loss 1.1604   LearningRate 0.0001   Epoch: 30   Global Step: 52920   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:41:12,870-Speed 13800.68 samples/sec   Loss 1.1582   LearningRate 0.0001   Epoch: 30   Global Step: 52930   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:41:30,577-Speed 13880.45 samples/sec   Loss 1.1629   LearningRate 0.0001   Epoch: 30   Global Step: 52940   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 10:41:48,294-Speed 13874.11 samples/sec   Loss 1.1566   LearningRate 0.0001   Epoch: 30   Global Step: 52950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:42:06,093-Speed 13808.52 samples/sec   Loss 1.1506   LearningRate 0.0001   Epoch: 30   Global Step: 52960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:42:23,847-Speed 13843.24 samples/sec   Loss 1.1567   LearningRate 0.0001   Epoch: 30   Global Step: 52970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:42:41,643-Speed 13810.91 samples/sec   Loss 1.1670   LearningRate 0.0001   Epoch: 30   Global Step: 52980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:42:59,323-Speed 13900.97 samples/sec   Loss 1.1554   LearningRate 0.0001   Epoch: 30   Global Step: 52990   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:43:17,093-Speed 13831.15 samples/sec   Loss 1.1597   LearningRate 0.0001   Epoch: 30   Global Step: 53000   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:43:34,942-Speed 13769.63 samples/sec   Loss 1.1554   LearningRate 0.0001   Epoch: 30   Global Step: 53010   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:43:52,700-Speed 13840.38 samples/sec   Loss 1.1579   LearningRate 0.0001   Epoch: 30   Global Step: 53020   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:44:10,455-Speed 13842.80 samples/sec   Loss 1.1618   LearningRate 0.0001   Epoch: 30   Global Step: 53030   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:44:28,202-Speed 13848.99 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 30   Global Step: 53040   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:44:45,925-Speed 13867.41 samples/sec   Loss 1.1601   LearningRate 0.0001   Epoch: 30   Global Step: 53050   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:45:03,721-Speed 13810.61 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 30   Global Step: 53060   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:45:21,421-Speed 13886.12 samples/sec   Loss 1.1454   LearningRate 0.0001   Epoch: 30   Global Step: 53070   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:45:39,232-Speed 13799.10 samples/sec   Loss 1.1528   LearningRate 0.0001   Epoch: 30   Global Step: 53080   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:45:57,091-Speed 13762.34 samples/sec   Loss 1.1630   LearningRate 0.0001   Epoch: 30   Global Step: 53090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:46:14,874-Speed 13820.49 samples/sec   Loss 1.1507   LearningRate 0.0001   Epoch: 30   Global Step: 53100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:46:32,640-Speed 13834.05 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 30   Global Step: 53110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:46:50,342-Speed 13884.06 samples/sec   Loss 1.1535   LearningRate 0.0001   Epoch: 30   Global Step: 53120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:47:08,118-Speed 13827.38 samples/sec   Loss 1.1431   LearningRate 0.0001   Epoch: 30   Global Step: 53130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:47:25,984-Speed 13756.60 samples/sec   Loss 1.1586   LearningRate 0.0001   Epoch: 30   Global Step: 53140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:47:43,748-Speed 13835.63 samples/sec   Loss 1.1460   LearningRate 0.0001   Epoch: 30   Global Step: 53150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:48:01,459-Speed 13878.25 samples/sec   Loss 1.1526   LearningRate 0.0001   Epoch: 30   Global Step: 53160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:48:19,186-Speed 13864.39 samples/sec   Loss 1.1586   LearningRate 0.0001   Epoch: 30   Global Step: 53170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:48:36,981-Speed 13811.50 samples/sec   Loss 1.1610   LearningRate 0.0001   Epoch: 30   Global Step: 53180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:48:54,672-Speed 13892.60 samples/sec   Loss 1.1468   LearningRate 0.0001   Epoch: 30   Global Step: 53190   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:49:12,364-Speed 13891.91 samples/sec   Loss 1.1524   LearningRate 0.0001   Epoch: 30   Global Step: 53200   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:49:30,100-Speed 13857.56 samples/sec   Loss 1.1469   LearningRate 0.0001   Epoch: 30   Global Step: 53210   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:49:47,831-Speed 13860.84 samples/sec   Loss 1.1538   LearningRate 0.0001   Epoch: 30   Global Step: 53220   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:50:05,577-Speed 13849.67 samples/sec   Loss 1.1557   LearningRate 0.0001   Epoch: 30   Global Step: 53230   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:50:23,329-Speed 13845.07 samples/sec   Loss 1.1462   LearningRate 0.0001   Epoch: 30   Global Step: 53240   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:50:41,029-Speed 13885.62 samples/sec   Loss 1.1445   LearningRate 0.0001   Epoch: 30   Global Step: 53250   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:50:58,755-Speed 13865.21 samples/sec   Loss 1.1593   LearningRate 0.0001   Epoch: 30   Global Step: 53260   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:51:16,475-Speed 13870.00 samples/sec   Loss 1.1537   LearningRate 0.0001   Epoch: 30   Global Step: 53270   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:51:34,215-Speed 13854.73 samples/sec   Loss 1.1491   LearningRate 0.0001   Epoch: 30   Global Step: 53280   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:51:51,972-Speed 13841.48 samples/sec   Loss 1.1529   LearningRate 0.0001   Epoch: 30   Global Step: 53290   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:52:09,770-Speed 13808.73 samples/sec   Loss 1.1552   LearningRate 0.0001   Epoch: 30   Global Step: 53300   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:52:27,471-Speed 13884.99 samples/sec   Loss 1.1490   LearningRate 0.0001   Epoch: 30   Global Step: 53310   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:52:45,152-Speed 13900.05 samples/sec   Loss 1.1502   LearningRate 0.0001   Epoch: 30   Global Step: 53320   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:53:02,887-Speed 13858.57 samples/sec   Loss 1.1544   LearningRate 0.0001   Epoch: 30   Global Step: 53330   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:53:20,614-Speed 13864.78 samples/sec   Loss 1.1477   LearningRate 0.0001   Epoch: 30   Global Step: 53340   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:53:38,354-Speed 13854.08 samples/sec   Loss 1.1599   LearningRate 0.0001   Epoch: 30   Global Step: 53350   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:53:56,099-Speed 13849.94 samples/sec   Loss 1.1562   LearningRate 0.0001   Epoch: 30   Global Step: 53360   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:54:13,846-Speed 13849.77 samples/sec   Loss 1.1539   LearningRate 0.0001   Epoch: 30   Global Step: 53370   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:54:31,546-Speed 13885.32 samples/sec   Loss 1.1455   LearningRate 0.0001   Epoch: 30   Global Step: 53380   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:54:49,287-Speed 13854.24 samples/sec   Loss 1.1471   LearningRate 0.0001   Epoch: 30   Global Step: 53390   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:55:07,027-Speed 13854.95 samples/sec   Loss 1.1417   LearningRate 0.0001   Epoch: 30   Global Step: 53400   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:55:24,773-Speed 13850.86 samples/sec   Loss 1.1462   LearningRate 0.0001   Epoch: 30   Global Step: 53410   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:55:42,560-Speed 13819.15 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 30   Global Step: 53420   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:56:00,286-Speed 13866.06 samples/sec   Loss 1.1365   LearningRate 0.0001   Epoch: 30   Global Step: 53430   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:56:18,006-Speed 13869.26 samples/sec   Loss 1.1498   LearningRate 0.0001   Epoch: 30   Global Step: 53440   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:56:35,830-Speed 13789.57 samples/sec   Loss 1.1501   LearningRate 0.0001   Epoch: 30   Global Step: 53450   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:56:53,519-Speed 13894.53 samples/sec   Loss 1.1481   LearningRate 0.0001   Epoch: 30   Global Step: 53460   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:57:11,231-Speed 13876.43 samples/sec   Loss 1.1536   LearningRate 0.0001   Epoch: 30   Global Step: 53470   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 10:57:28,947-Speed 13872.36 samples/sec   Loss 1.1519   LearningRate 0.0001   Epoch: 30   Global Step: 53480   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:57:46,658-Speed 13877.65 samples/sec   Loss 1.1535   LearningRate 0.0001   Epoch: 30   Global Step: 53490   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:58:04,359-Speed 13884.90 samples/sec   Loss 1.1561   LearningRate 0.0001   Epoch: 30   Global Step: 53500   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:58:22,061-Speed 13884.14 samples/sec   Loss 1.1413   LearningRate 0.0001   Epoch: 30   Global Step: 53510   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:58:39,793-Speed 13860.38 samples/sec   Loss 1.1556   LearningRate 0.0001   Epoch: 30   Global Step: 53520   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:58:57,542-Speed 13847.06 samples/sec   Loss 1.1490   LearningRate 0.0001   Epoch: 30   Global Step: 53530   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:59:15,267-Speed 13866.65 samples/sec   Loss 1.1486   LearningRate 0.0001   Epoch: 30   Global Step: 53540   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:59:33,043-Speed 13826.31 samples/sec   Loss 1.1454   LearningRate 0.0001   Epoch: 30   Global Step: 53550   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 10:59:50,750-Speed 13879.82 samples/sec   Loss 1.1470   LearningRate 0.0001   Epoch: 30   Global Step: 53560   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:00:08,477-Speed 13864.44 samples/sec   Loss 1.1474   LearningRate 0.0001   Epoch: 30   Global Step: 53570   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:01:16,099-Speed 3634.36 samples/sec   Loss 1.1487   LearningRate 0.0001   Epoch: 31   Global Step: 53580   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 11:01:33,671-Speed 13989.27 samples/sec   Loss 1.1449   LearningRate 0.0001   Epoch: 31   Global Step: 53590   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:01:51,325-Speed 13922.03 samples/sec   Loss 1.1488   LearningRate 0.0001   Epoch: 31   Global Step: 53600   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:02:09,028-Speed 13883.22 samples/sec   Loss 1.1331   LearningRate 0.0001   Epoch: 31   Global Step: 53610   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:02:26,702-Speed 13906.75 samples/sec   Loss 1.1434   LearningRate 0.0001   Epoch: 31   Global Step: 53620   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:02:44,425-Speed 13866.99 samples/sec   Loss 1.1379   LearningRate 0.0001   Epoch: 31   Global Step: 53630   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:03:02,021-Speed 13967.74 samples/sec   Loss 1.1384   LearningRate 0.0001   Epoch: 31   Global Step: 53640   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:03:19,729-Speed 13879.40 samples/sec   Loss 1.1456   LearningRate 0.0001   Epoch: 31   Global Step: 53650   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:03:37,391-Speed 13915.74 samples/sec   Loss 1.1427   LearningRate 0.0001   Epoch: 31   Global Step: 53660   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:03:55,035-Speed 13929.69 samples/sec   Loss 1.1372   LearningRate 0.0001   Epoch: 31   Global Step: 53670   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:04:12,654-Speed 13949.86 samples/sec   Loss 1.1361   LearningRate 0.0001   Epoch: 31   Global Step: 53680   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:04:30,454-Speed 13807.50 samples/sec   Loss 1.1332   LearningRate 0.0001   Epoch: 31   Global Step: 53690   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:04:48,161-Speed 13880.15 samples/sec   Loss 1.1398   LearningRate 0.0001   Epoch: 31   Global Step: 53700   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:05:05,990-Speed 13786.59 samples/sec   Loss 1.1411   LearningRate 0.0001   Epoch: 31   Global Step: 53710   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:05:23,814-Speed 13789.39 samples/sec   Loss 1.1485   LearningRate 0.0001   Epoch: 31   Global Step: 53720   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:05:41,601-Speed 13817.46 samples/sec   Loss 1.1319   LearningRate 0.0001   Epoch: 31   Global Step: 53730   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:05:59,269-Speed 13910.68 samples/sec   Loss 1.1323   LearningRate 0.0001   Epoch: 31   Global Step: 53740   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:06:17,066-Speed 13810.13 samples/sec   Loss 1.1391   LearningRate 0.0001   Epoch: 31   Global Step: 53750   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:06:34,801-Speed 13859.09 samples/sec   Loss 1.1418   LearningRate 0.0001   Epoch: 31   Global Step: 53760   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:06:52,539-Speed 13856.25 samples/sec   Loss 1.1393   LearningRate 0.0001   Epoch: 31   Global Step: 53770   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:07:10,343-Speed 13804.46 samples/sec   Loss 1.1405   LearningRate 0.0001   Epoch: 31   Global Step: 53780   Fp16 Grad Scale: 8192   Required: 8 hours
Training: 2022-03-04 11:07:28,233-Speed 13738.50 samples/sec   Loss 1.1339   LearningRate 0.0001   Epoch: 31   Global Step: 53790   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:07:46,327-Speed 13583.28 samples/sec   Loss 1.1441   LearningRate 0.0001   Epoch: 31   Global Step: 53800   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:08:04,350-Speed 13636.92 samples/sec   Loss 1.1358   LearningRate 0.0001   Epoch: 31   Global Step: 53810   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:08:22,451-Speed 13577.92 samples/sec   Loss 1.1304   LearningRate 0.0001   Epoch: 31   Global Step: 53820   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:08:40,497-Speed 13619.24 samples/sec   Loss 1.1382   LearningRate 0.0001   Epoch: 31   Global Step: 53830   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:08:58,189-Speed 13892.66 samples/sec   Loss 1.1406   LearningRate 0.0001   Epoch: 31   Global Step: 53840   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:09:15,888-Speed 13885.88 samples/sec   Loss 1.1397   LearningRate 0.0001   Epoch: 31   Global Step: 53850   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:09:33,617-Speed 13866.47 samples/sec   Loss 1.1426   LearningRate 0.0001   Epoch: 31   Global Step: 53860   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:09:51,294-Speed 13903.59 samples/sec   Loss 1.1321   LearningRate 0.0001   Epoch: 31   Global Step: 53870   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:10:09,016-Speed 13868.35 samples/sec   Loss 1.1438   LearningRate 0.0001   Epoch: 31   Global Step: 53880   Fp16 Grad Scale: 16384   Required: 8 hours
Training: 2022-03-04 11:10:26,752-Speed 13857.98 samples/sec   Loss 1.1381   LearningRate 0.0001   Epoch: 31   Global Step: 53890   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:10:44,495-Speed 13851.64 samples/sec   Loss 1.1363   LearningRate 0.0001   Epoch: 31   Global Step: 53900   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:11:02,313-Speed 13794.08 samples/sec   Loss 1.1376   LearningRate 0.0001   Epoch: 31   Global Step: 53910   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:11:20,140-Speed 13786.38 samples/sec   Loss 1.1400   LearningRate 0.0001   Epoch: 31   Global Step: 53920   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:11:37,917-Speed 13825.93 samples/sec   Loss 1.1383   LearningRate 0.0001   Epoch: 31   Global Step: 53930   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:11:55,789-Speed 13751.39 samples/sec   Loss 1.1371   LearningRate 0.0001   Epoch: 31   Global Step: 53940   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:12:13,606-Speed 13794.96 samples/sec   Loss 1.1369   LearningRate 0.0001   Epoch: 31   Global Step: 53950   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:12:31,442-Speed 13779.85 samples/sec   Loss 1.1371   LearningRate 0.0001   Epoch: 31   Global Step: 53960   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:12:49,336-Speed 13735.77 samples/sec   Loss 1.1358   LearningRate 0.0001   Epoch: 31   Global Step: 53970   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:13:07,241-Speed 13726.53 samples/sec   Loss 1.1350   LearningRate 0.0001   Epoch: 31   Global Step: 53980   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:13:24,985-Speed 13851.92 samples/sec   Loss 1.1304   LearningRate 0.0001   Epoch: 31   Global Step: 53990   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 11:13:42,804-Speed 13792.84 samples/sec   Loss 1.1255   LearningRate 0.0001   Epoch: 31   Global Step: 54000   Fp16 Grad Scale: 65536   Required: 8 hours
Training: 2022-03-04 11:14:00,427-Speed 13946.18 samples/sec   Loss 1.1413   LearningRate 0.0001   Epoch: 31   Global Step: 54010   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:14:18,215-Speed 13816.71 samples/sec   Loss 1.1387   LearningRate 0.0001   Epoch: 31   Global Step: 54020   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:14:35,967-Speed 13845.11 samples/sec   Loss 1.1381   LearningRate 0.0001   Epoch: 31   Global Step: 54030   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:14:53,725-Speed 13839.87 samples/sec   Loss 1.1214   LearningRate 0.0001   Epoch: 31   Global Step: 54040   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:15:11,443-Speed 13872.09 samples/sec   Loss 1.1355   LearningRate 0.0001   Epoch: 31   Global Step: 54050   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:15:29,267-Speed 13788.66 samples/sec   Loss 1.1363   LearningRate 0.0001   Epoch: 31   Global Step: 54060   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:15:46,942-Speed 13905.48 samples/sec   Loss 1.1384   LearningRate 0.0001   Epoch: 31   Global Step: 54070   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:16:04,756-Speed 13796.44 samples/sec   Loss 1.1394   LearningRate 0.0001   Epoch: 31   Global Step: 54080   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:16:22,442-Speed 13896.99 samples/sec   Loss 1.1338   LearningRate 0.0001   Epoch: 31   Global Step: 54090   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:16:40,202-Speed 13838.53 samples/sec   Loss 1.1315   LearningRate 0.0001   Epoch: 31   Global Step: 54100   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:16:57,874-Speed 13907.52 samples/sec   Loss 1.1331   LearningRate 0.0001   Epoch: 31   Global Step: 54110   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:17:15,677-Speed 13805.89 samples/sec   Loss 1.1410   LearningRate 0.0001   Epoch: 31   Global Step: 54120   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:17:33,418-Speed 13853.36 samples/sec   Loss 1.1386   LearningRate 0.0001   Epoch: 31   Global Step: 54130   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:17:51,234-Speed 13795.36 samples/sec   Loss 1.1263   LearningRate 0.0001   Epoch: 31   Global Step: 54140   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:18:09,057-Speed 13789.15 samples/sec   Loss 1.1273   LearningRate 0.0001   Epoch: 31   Global Step: 54150   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:18:26,776-Speed 13871.43 samples/sec   Loss 1.1347   LearningRate 0.0001   Epoch: 31   Global Step: 54160   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:18:44,500-Speed 13867.03 samples/sec   Loss 1.1283   LearningRate 0.0001   Epoch: 31   Global Step: 54170   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:19:02,194-Speed 13889.52 samples/sec   Loss 1.1286   LearningRate 0.0001   Epoch: 31   Global Step: 54180   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:19:19,943-Speed 13847.30 samples/sec   Loss 1.1281   LearningRate 0.0001   Epoch: 31   Global Step: 54190   Fp16 Grad Scale: 32768   Required: 8 hours
Training: 2022-03-04 11:19:37,808-Speed 13758.02 samples/sec   Loss 1.1240   LearningRate 0.0001   Epoch: 31   Global Step: 54200   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:19:55,645-Speed 13780.80 samples/sec   Loss 1.1289   LearningRate 0.0001   Epoch: 31   Global Step: 54210   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-04 11:20:13,443-Speed 13810.95 samples/sec   Loss 1.1301   LearningRate 0.0001   Epoch: 31   Global Step: 54220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:20:31,254-Speed 13798.98 samples/sec   Loss 1.1291   LearningRate 0.0001   Epoch: 31   Global Step: 54230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:20:49,069-Speed 13796.11 samples/sec   Loss 1.1322   LearningRate 0.0001   Epoch: 31   Global Step: 54240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:21:06,890-Speed 13792.04 samples/sec   Loss 1.1365   LearningRate 0.0001   Epoch: 31   Global Step: 54250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:21:24,695-Speed 13803.54 samples/sec   Loss 1.1328   LearningRate 0.0001   Epoch: 31   Global Step: 54260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:21:42,512-Speed 13794.03 samples/sec   Loss 1.1335   LearningRate 0.0001   Epoch: 31   Global Step: 54270   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:22:00,307-Speed 13811.34 samples/sec   Loss 1.1269   LearningRate 0.0001   Epoch: 31   Global Step: 54280   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:22:18,089-Speed 13822.03 samples/sec   Loss 1.1241   LearningRate 0.0001   Epoch: 31   Global Step: 54290   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:22:35,889-Speed 13807.75 samples/sec   Loss 1.1298   LearningRate 0.0001   Epoch: 31   Global Step: 54300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:22:53,678-Speed 13817.94 samples/sec   Loss 1.1318   LearningRate 0.0001   Epoch: 31   Global Step: 54310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:23:11,542-Speed 13757.79 samples/sec   Loss 1.1237   LearningRate 0.0001   Epoch: 31   Global Step: 54320   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:23:29,342-Speed 13807.69 samples/sec   Loss 1.1320   LearningRate 0.0001   Epoch: 31   Global Step: 54330   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:23:47,117-Speed 13827.01 samples/sec   Loss 1.1239   LearningRate 0.0001   Epoch: 31   Global Step: 54340   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:24:04,847-Speed 13862.21 samples/sec   Loss 1.1238   LearningRate 0.0001   Epoch: 31   Global Step: 54350   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:24:22,629-Speed 13821.93 samples/sec   Loss 1.1302   LearningRate 0.0001   Epoch: 31   Global Step: 54360   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:24:40,466-Speed 13779.40 samples/sec   Loss 1.1351   LearningRate 0.0001   Epoch: 31   Global Step: 54370   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:24:58,235-Speed 13831.55 samples/sec   Loss 1.1249   LearningRate 0.0001   Epoch: 31   Global Step: 54380   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:25:16,037-Speed 13806.30 samples/sec   Loss 1.1265   LearningRate 0.0001   Epoch: 31   Global Step: 54390   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:25:33,749-Speed 13875.96 samples/sec   Loss 1.1231   LearningRate 0.0001   Epoch: 31   Global Step: 54400   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:25:51,518-Speed 13831.84 samples/sec   Loss 1.1223   LearningRate 0.0001   Epoch: 31   Global Step: 54410   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:26:09,250-Speed 13860.59 samples/sec   Loss 1.1247   LearningRate 0.0001   Epoch: 31   Global Step: 54420   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:26:27,095-Speed 13772.79 samples/sec   Loss 1.1156   LearningRate 0.0001   Epoch: 31   Global Step: 54430   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:26:44,866-Speed 13830.43 samples/sec   Loss 1.1112   LearningRate 0.0001   Epoch: 31   Global Step: 54440   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:27:02,631-Speed 13835.14 samples/sec   Loss 1.1225   LearningRate 0.0001   Epoch: 31   Global Step: 54450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:27:20,382-Speed 13845.71 samples/sec   Loss 1.1256   LearningRate 0.0001   Epoch: 31   Global Step: 54460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:27:38,154-Speed 13829.53 samples/sec   Loss 1.1213   LearningRate 0.0001   Epoch: 31   Global Step: 54470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:27:55,865-Speed 13876.74 samples/sec   Loss 1.1128   LearningRate 0.0001   Epoch: 31   Global Step: 54480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:28:13,596-Speed 13861.38 samples/sec   Loss 1.1173   LearningRate 0.0001   Epoch: 31   Global Step: 54490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:28:31,360-Speed 13835.46 samples/sec   Loss 1.1191   LearningRate 0.0001   Epoch: 31   Global Step: 54500   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:28:49,135-Speed 13827.20 samples/sec   Loss 1.1259   LearningRate 0.0001   Epoch: 31   Global Step: 54510   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:29:06,885-Speed 13847.07 samples/sec   Loss 1.1152   LearningRate 0.0001   Epoch: 31   Global Step: 54520   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:29:24,692-Speed 13802.36 samples/sec   Loss 1.1166   LearningRate 0.0001   Epoch: 31   Global Step: 54530   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:29:42,406-Speed 13875.07 samples/sec   Loss 1.1258   LearningRate 0.0001   Epoch: 31   Global Step: 54540   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:30:00,065-Speed 13917.74 samples/sec   Loss 1.1177   LearningRate 0.0001   Epoch: 31   Global Step: 54550   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:30:17,888-Speed 13789.41 samples/sec   Loss 1.1151   LearningRate 0.0001   Epoch: 31   Global Step: 54560   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:30:35,610-Speed 13868.17 samples/sec   Loss 1.1233   LearningRate 0.0001   Epoch: 31   Global Step: 54570   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:30:53,354-Speed 13851.56 samples/sec   Loss 1.1120   LearningRate 0.0001   Epoch: 31   Global Step: 54580   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:31:11,079-Speed 13866.44 samples/sec   Loss 1.1283   LearningRate 0.0001   Epoch: 31   Global Step: 54590   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:31:28,880-Speed 13806.84 samples/sec   Loss 1.1226   LearningRate 0.0001   Epoch: 31   Global Step: 54600   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:31:46,682-Speed 13805.56 samples/sec   Loss 1.1256   LearningRate 0.0001   Epoch: 31   Global Step: 54610   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:32:04,445-Speed 13836.23 samples/sec   Loss 1.1265   LearningRate 0.0001   Epoch: 31   Global Step: 54620   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:32:22,386-Speed 13699.88 samples/sec   Loss 1.1176   LearningRate 0.0001   Epoch: 31   Global Step: 54630   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:32:40,220-Speed 13781.17 samples/sec   Loss 1.1282   LearningRate 0.0001   Epoch: 31   Global Step: 54640   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:32:57,905-Speed 13896.90 samples/sec   Loss 1.1268   LearningRate 0.0001   Epoch: 31   Global Step: 54650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:33:15,710-Speed 13804.18 samples/sec   Loss 1.1157   LearningRate 0.0001   Epoch: 31   Global Step: 54660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:33:33,616-Speed 13725.72 samples/sec   Loss 1.1269   LearningRate 0.0001   Epoch: 31   Global Step: 54670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:33:51,350-Speed 13859.55 samples/sec   Loss 1.1249   LearningRate 0.0001   Epoch: 31   Global Step: 54680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:34:09,051-Speed 13884.32 samples/sec   Loss 1.1182   LearningRate 0.0001   Epoch: 31   Global Step: 54690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:34:26,856-Speed 13803.97 samples/sec   Loss 1.1125   LearningRate 0.0001   Epoch: 31   Global Step: 54700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:34:44,612-Speed 13842.19 samples/sec   Loss 1.1110   LearningRate 0.0001   Epoch: 31   Global Step: 54710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:35:02,511-Speed 13731.28 samples/sec   Loss 1.1203   LearningRate 0.0001   Epoch: 31   Global Step: 54720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:35:20,331-Speed 13791.81 samples/sec   Loss 1.1181   LearningRate 0.0001   Epoch: 31   Global Step: 54730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:35:38,118-Speed 13817.64 samples/sec   Loss 1.1211   LearningRate 0.0001   Epoch: 31   Global Step: 54740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:35:55,868-Speed 13846.65 samples/sec   Loss 1.1183   LearningRate 0.0001   Epoch: 31   Global Step: 54750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:36:13,552-Speed 13898.56 samples/sec   Loss 1.1277   LearningRate 0.0001   Epoch: 31   Global Step: 54760   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:36:31,289-Speed 13856.52 samples/sec   Loss 1.1095   LearningRate 0.0001   Epoch: 31   Global Step: 54770   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:36:49,013-Speed 13866.66 samples/sec   Loss 1.1220   LearningRate 0.0001   Epoch: 31   Global Step: 54780   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:37:06,799-Speed 13818.77 samples/sec   Loss 1.1083   LearningRate 0.0001   Epoch: 31   Global Step: 54790   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:37:24,497-Speed 13887.28 samples/sec   Loss 1.1119   LearningRate 0.0001   Epoch: 31   Global Step: 54800   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:37:42,296-Speed 13808.30 samples/sec   Loss 1.1221   LearningRate 0.0001   Epoch: 31   Global Step: 54810   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:38:00,041-Speed 13850.53 samples/sec   Loss 1.1122   LearningRate 0.0001   Epoch: 31   Global Step: 54820   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:38:17,780-Speed 13854.59 samples/sec   Loss 1.1207   LearningRate 0.0001   Epoch: 31   Global Step: 54830   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:38:35,489-Speed 13879.01 samples/sec   Loss 1.1080   LearningRate 0.0001   Epoch: 31   Global Step: 54840   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:38:53,315-Speed 13788.01 samples/sec   Loss 1.1104   LearningRate 0.0001   Epoch: 31   Global Step: 54850   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:39:11,053-Speed 13855.29 samples/sec   Loss 1.1152   LearningRate 0.0001   Epoch: 31   Global Step: 54860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:39:28,777-Speed 13867.02 samples/sec   Loss 1.1137   LearningRate 0.0001   Epoch: 31   Global Step: 54870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:39:46,550-Speed 13828.88 samples/sec   Loss 1.1128   LearningRate 0.0001   Epoch: 31   Global Step: 54880   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:40:04,349-Speed 13810.55 samples/sec   Loss 1.1167   LearningRate 0.0001   Epoch: 31   Global Step: 54890   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:40:22,106-Speed 13840.90 samples/sec   Loss 1.1063   LearningRate 0.0001   Epoch: 31   Global Step: 54900   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:40:39,891-Speed 13818.89 samples/sec   Loss 1.1151   LearningRate 0.0001   Epoch: 31   Global Step: 54910   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:40:57,657-Speed 13834.53 samples/sec   Loss 1.1186   LearningRate 0.0001   Epoch: 31   Global Step: 54920   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:41:15,418-Speed 13838.22 samples/sec   Loss 1.1098   LearningRate 0.0001   Epoch: 31   Global Step: 54930   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:41:33,208-Speed 13815.22 samples/sec   Loss 1.1125   LearningRate 0.0001   Epoch: 31   Global Step: 54940   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:41:50,913-Speed 13881.88 samples/sec   Loss 1.1069   LearningRate 0.0001   Epoch: 31   Global Step: 54950   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:42:08,642-Speed 13862.58 samples/sec   Loss 1.1128   LearningRate 0.0001   Epoch: 31   Global Step: 54960   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:42:26,335-Speed 13891.48 samples/sec   Loss 1.1188   LearningRate 0.0001   Epoch: 31   Global Step: 54970   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:42:44,105-Speed 13830.37 samples/sec   Loss 1.1095   LearningRate 0.0001   Epoch: 31   Global Step: 54980   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:43:01,919-Speed 13796.50 samples/sec   Loss 1.1129   LearningRate 0.0001   Epoch: 31   Global Step: 54990   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:43:19,652-Speed 13861.11 samples/sec   Loss 1.1026   LearningRate 0.0001   Epoch: 31   Global Step: 55000   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:43:37,378-Speed 13864.87 samples/sec   Loss 1.1185   LearningRate 0.0001   Epoch: 31   Global Step: 55010   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:43:55,048-Speed 13909.24 samples/sec   Loss 1.1157   LearningRate 0.0001   Epoch: 31   Global Step: 55020   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:44:12,840-Speed 13814.43 samples/sec   Loss 1.1146   LearningRate 0.0001   Epoch: 31   Global Step: 55030   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:44:30,548-Speed 13878.91 samples/sec   Loss 1.1179   LearningRate 0.0001   Epoch: 31   Global Step: 55040   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:44:48,301-Speed 13844.01 samples/sec   Loss 1.1108   LearningRate 0.0001   Epoch: 31   Global Step: 55050   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:45:06,115-Speed 13796.90 samples/sec   Loss 1.1150   LearningRate 0.0001   Epoch: 31   Global Step: 55060   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:45:23,941-Speed 13787.33 samples/sec   Loss 1.1155   LearningRate 0.0001   Epoch: 31   Global Step: 55070   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:45:41,771-Speed 13784.91 samples/sec   Loss 1.1118   LearningRate 0.0001   Epoch: 31   Global Step: 55080   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-04 11:45:59,625-Speed 13765.46 samples/sec   Loss 1.1091   LearningRate 0.0001   Epoch: 31   Global Step: 55090   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-04 11:46:17,496-Speed 13753.27 samples/sec   Loss 1.1094   LearningRate 0.0001   Epoch: 31   Global Step: 55100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:46:35,220-Speed 13868.85 samples/sec   Loss 1.1135   LearningRate 0.0001   Epoch: 31   Global Step: 55110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:46:53,097-Speed 13747.69 samples/sec   Loss 1.1089   LearningRate 0.0001   Epoch: 31   Global Step: 55120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:47:10,809-Speed 13876.56 samples/sec   Loss 1.1073   LearningRate 0.0001   Epoch: 31   Global Step: 55130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:47:28,733-Speed 13712.38 samples/sec   Loss 1.1134   LearningRate 0.0001   Epoch: 31   Global Step: 55140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:47:46,433-Speed 13885.71 samples/sec   Loss 1.1063   LearningRate 0.0001   Epoch: 31   Global Step: 55150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:48:04,253-Speed 13791.59 samples/sec   Loss 1.1188   LearningRate 0.0001   Epoch: 31   Global Step: 55160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:48:21,953-Speed 13885.76 samples/sec   Loss 1.1137   LearningRate 0.0001   Epoch: 31   Global Step: 55170   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:48:39,650-Speed 13888.06 samples/sec   Loss 1.1224   LearningRate 0.0001   Epoch: 31   Global Step: 55180   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:48:57,463-Speed 13797.71 samples/sec   Loss 1.1111   LearningRate 0.0001   Epoch: 31   Global Step: 55190   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:49:15,229-Speed 13834.01 samples/sec   Loss 1.1067   LearningRate 0.0001   Epoch: 31   Global Step: 55200   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:49:33,002-Speed 13829.89 samples/sec   Loss 1.1103   LearningRate 0.0000   Epoch: 31   Global Step: 55210   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:49:50,771-Speed 13833.11 samples/sec   Loss 1.1174   LearningRate 0.0000   Epoch: 31   Global Step: 55220   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:50:08,582-Speed 13799.14 samples/sec   Loss 1.1110   LearningRate 0.0000   Epoch: 31   Global Step: 55230   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:50:26,485-Speed 13728.33 samples/sec   Loss 1.1103   LearningRate 0.0000   Epoch: 31   Global Step: 55240   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:50:44,243-Speed 13839.80 samples/sec   Loss 1.0988   LearningRate 0.0000   Epoch: 31   Global Step: 55250   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:51:02,043-Speed 13808.24 samples/sec   Loss 1.1055   LearningRate 0.0000   Epoch: 31   Global Step: 55260   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:51:19,766-Speed 13867.17 samples/sec   Loss 1.1113   LearningRate 0.0000   Epoch: 31   Global Step: 55270   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:51:37,583-Speed 13794.26 samples/sec   Loss 1.1051   LearningRate 0.0000   Epoch: 31   Global Step: 55280   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:51:55,375-Speed 13814.21 samples/sec   Loss 1.1082   LearningRate 0.0000   Epoch: 31   Global Step: 55290   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:52:13,166-Speed 13814.58 samples/sec   Loss 1.1158   LearningRate 0.0000   Epoch: 31   Global Step: 55300   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:53:21,554-Speed 3593.71 samples/sec   Loss 1.1039   LearningRate 0.0000   Epoch: 32   Global Step: 55310   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:53:39,325-Speed 13829.55 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 32   Global Step: 55320   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:53:57,093-Speed 13832.90 samples/sec   Loss 1.1053   LearningRate 0.0000   Epoch: 32   Global Step: 55330   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:54:14,764-Speed 13908.58 samples/sec   Loss 1.1045   LearningRate 0.0000   Epoch: 32   Global Step: 55340   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:54:32,460-Speed 13888.41 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 32   Global Step: 55350   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:54:50,190-Speed 13862.20 samples/sec   Loss 1.1098   LearningRate 0.0000   Epoch: 32   Global Step: 55360   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:55:08,181-Speed 13661.16 samples/sec   Loss 1.1003   LearningRate 0.0000   Epoch: 32   Global Step: 55370   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:55:26,159-Speed 13672.15 samples/sec   Loss 1.1036   LearningRate 0.0000   Epoch: 32   Global Step: 55380   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:55:44,130-Speed 13676.21 samples/sec   Loss 1.1031   LearningRate 0.0000   Epoch: 32   Global Step: 55390   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:56:02,120-Speed 13662.28 samples/sec   Loss 1.1047   LearningRate 0.0000   Epoch: 32   Global Step: 55400   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:56:20,137-Speed 13641.17 samples/sec   Loss 1.0946   LearningRate 0.0000   Epoch: 32   Global Step: 55410   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:56:38,149-Speed 13645.21 samples/sec   Loss 1.1122   LearningRate 0.0000   Epoch: 32   Global Step: 55420   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:56:56,219-Speed 13601.45 samples/sec   Loss 1.1032   LearningRate 0.0000   Epoch: 32   Global Step: 55430   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:57:14,350-Speed 13555.04 samples/sec   Loss 1.0976   LearningRate 0.0000   Epoch: 32   Global Step: 55440   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 11:57:32,355-Speed 13651.06 samples/sec   Loss 1.1032   LearningRate 0.0000   Epoch: 32   Global Step: 55450   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:57:50,326-Speed 13675.97 samples/sec   Loss 1.0993   LearningRate 0.0000   Epoch: 32   Global Step: 55460   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:58:08,288-Speed 13683.21 samples/sec   Loss 1.1028   LearningRate 0.0000   Epoch: 32   Global Step: 55470   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:58:25,952-Speed 13913.78 samples/sec   Loss 1.1060   LearningRate 0.0000   Epoch: 32   Global Step: 55480   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:58:43,630-Speed 13902.82 samples/sec   Loss 1.1024   LearningRate 0.0000   Epoch: 32   Global Step: 55490   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:59:01,337-Speed 13880.61 samples/sec   Loss 1.0981   LearningRate 0.0000   Epoch: 32   Global Step: 55500   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:59:19,034-Speed 13887.65 samples/sec   Loss 1.1100   LearningRate 0.0000   Epoch: 32   Global Step: 55510   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:59:36,754-Speed 13869.99 samples/sec   Loss 1.0969   LearningRate 0.0000   Epoch: 32   Global Step: 55520   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 11:59:54,537-Speed 13821.18 samples/sec   Loss 1.0990   LearningRate 0.0000   Epoch: 32   Global Step: 55530   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:00:12,315-Speed 13824.60 samples/sec   Loss 1.0974   LearningRate 0.0000   Epoch: 32   Global Step: 55540   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:00:29,980-Speed 13913.01 samples/sec   Loss 1.1023   LearningRate 0.0000   Epoch: 32   Global Step: 55550   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:00:47,759-Speed 13823.84 samples/sec   Loss 1.1013   LearningRate 0.0000   Epoch: 32   Global Step: 55560   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:01:05,627-Speed 13756.56 samples/sec   Loss 1.1046   LearningRate 0.0000   Epoch: 32   Global Step: 55570   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:01:23,414-Speed 13817.66 samples/sec   Loss 1.1014   LearningRate 0.0000   Epoch: 32   Global Step: 55580   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:01:41,170-Speed 13841.73 samples/sec   Loss 1.0996   LearningRate 0.0000   Epoch: 32   Global Step: 55590   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:01:58,883-Speed 13875.92 samples/sec   Loss 1.0975   LearningRate 0.0000   Epoch: 32   Global Step: 55600   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:02:16,606-Speed 13867.27 samples/sec   Loss 1.0972   LearningRate 0.0000   Epoch: 32   Global Step: 55610   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:02:34,313-Speed 13880.03 samples/sec   Loss 1.1022   LearningRate 0.0000   Epoch: 32   Global Step: 55620   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:02:52,153-Speed 13776.63 samples/sec   Loss 1.1020   LearningRate 0.0000   Epoch: 32   Global Step: 55630   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:03:09,866-Speed 13876.15 samples/sec   Loss 1.1043   LearningRate 0.0000   Epoch: 32   Global Step: 55640   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:03:27,634-Speed 13832.68 samples/sec   Loss 1.0946   LearningRate 0.0000   Epoch: 32   Global Step: 55650   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:03:45,289-Speed 13920.28 samples/sec   Loss 1.0916   LearningRate 0.0000   Epoch: 32   Global Step: 55660   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:04:03,020-Speed 13861.70 samples/sec   Loss 1.1029   LearningRate 0.0000   Epoch: 32   Global Step: 55670   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:04:20,744-Speed 13866.97 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 32   Global Step: 55680   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:04:38,468-Speed 13867.09 samples/sec   Loss 1.1003   LearningRate 0.0000   Epoch: 32   Global Step: 55690   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:04:56,170-Speed 13883.47 samples/sec   Loss 1.0950   LearningRate 0.0000   Epoch: 32   Global Step: 55700   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:05:13,871-Speed 13885.95 samples/sec   Loss 1.0999   LearningRate 0.0000   Epoch: 32   Global Step: 55710   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:05:31,570-Speed 13886.81 samples/sec   Loss 1.1048   LearningRate 0.0000   Epoch: 32   Global Step: 55720   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:05:49,419-Speed 13769.18 samples/sec   Loss 1.0998   LearningRate 0.0000   Epoch: 32   Global Step: 55730   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:06:07,194-Speed 13827.38 samples/sec   Loss 1.0964   LearningRate 0.0000   Epoch: 32   Global Step: 55740   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:06:24,965-Speed 13830.49 samples/sec   Loss 1.1015   LearningRate 0.0000   Epoch: 32   Global Step: 55750   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:06:42,624-Speed 13917.14 samples/sec   Loss 1.0982   LearningRate 0.0000   Epoch: 32   Global Step: 55760   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:07:00,329-Speed 13882.17 samples/sec   Loss 1.0984   LearningRate 0.0000   Epoch: 32   Global Step: 55770   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:07:18,104-Speed 13827.24 samples/sec   Loss 1.0972   LearningRate 0.0000   Epoch: 32   Global Step: 55780   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:07:35,819-Speed 13873.86 samples/sec   Loss 1.1075   LearningRate 0.0000   Epoch: 32   Global Step: 55790   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:07:53,555-Speed 13857.38 samples/sec   Loss 1.0988   LearningRate 0.0000   Epoch: 32   Global Step: 55800   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:08:11,245-Speed 13893.53 samples/sec   Loss 1.0961   LearningRate 0.0000   Epoch: 32   Global Step: 55810   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:08:28,955-Speed 13877.74 samples/sec   Loss 1.0962   LearningRate 0.0000   Epoch: 32   Global Step: 55820   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:08:46,743-Speed 13816.53 samples/sec   Loss 1.0970   LearningRate 0.0000   Epoch: 32   Global Step: 55830   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:09:04,514-Speed 13830.43 samples/sec   Loss 1.0992   LearningRate 0.0000   Epoch: 32   Global Step: 55840   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:09:22,238-Speed 13866.52 samples/sec   Loss 1.0892   LearningRate 0.0000   Epoch: 32   Global Step: 55850   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-04 12:09:39,991-Speed 13844.63 samples/sec   Loss 1.0981   LearningRate 0.0000   Epoch: 32   Global Step: 55860   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:09:57,770-Speed 13824.23 samples/sec   Loss 1.0954   LearningRate 0.0000   Epoch: 32   Global Step: 55870   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:10:15,519-Speed 13847.19 samples/sec   Loss 1.0937   LearningRate 0.0000   Epoch: 32   Global Step: 55880   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:10:33,236-Speed 13872.27 samples/sec   Loss 1.0940   LearningRate 0.0000   Epoch: 32   Global Step: 55890   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:10:51,014-Speed 13825.40 samples/sec   Loss 1.0976   LearningRate 0.0000   Epoch: 32   Global Step: 55900   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:11:08,758-Speed 13851.04 samples/sec   Loss 1.0922   LearningRate 0.0000   Epoch: 32   Global Step: 55910   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:11:26,549-Speed 13814.43 samples/sec   Loss 1.0986   LearningRate 0.0000   Epoch: 32   Global Step: 55920   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:11:44,373-Speed 13788.99 samples/sec   Loss 1.0990   LearningRate 0.0000   Epoch: 32   Global Step: 55930   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:12:02,157-Speed 13821.39 samples/sec   Loss 1.0983   LearningRate 0.0000   Epoch: 32   Global Step: 55940   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:12:19,905-Speed 13847.87 samples/sec   Loss 1.0944   LearningRate 0.0000   Epoch: 32   Global Step: 55950   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:12:37,750-Speed 13772.69 samples/sec   Loss 1.0906   LearningRate 0.0000   Epoch: 32   Global Step: 55960   Fp16 Grad Scale: 65536   Required: 7 hours
Training: 2022-03-04 12:12:55,429-Speed 13902.20 samples/sec   Loss 1.0930   LearningRate 0.0000   Epoch: 32   Global Step: 55970   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:13:13,134-Speed 13881.78 samples/sec   Loss 1.0894   LearningRate 0.0000   Epoch: 32   Global Step: 55980   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:13:30,980-Speed 13771.91 samples/sec   Loss 1.1007   LearningRate 0.0000   Epoch: 32   Global Step: 55990   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:13:48,691-Speed 13877.42 samples/sec   Loss 1.0914   LearningRate 0.0000   Epoch: 32   Global Step: 56000   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:14:06,434-Speed 13851.67 samples/sec   Loss 1.0881   LearningRate 0.0000   Epoch: 32   Global Step: 56010   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:14:24,309-Speed 13750.56 samples/sec   Loss 1.0946   LearningRate 0.0000   Epoch: 32   Global Step: 56020   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:14:42,068-Speed 13839.53 samples/sec   Loss 1.0906   LearningRate 0.0000   Epoch: 32   Global Step: 56030   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:14:59,822-Speed 13844.39 samples/sec   Loss 1.0927   LearningRate 0.0000   Epoch: 32   Global Step: 56040   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:15:17,669-Speed 13770.61 samples/sec   Loss 1.0897   LearningRate 0.0000   Epoch: 32   Global Step: 56050   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:15:35,447-Speed 13825.15 samples/sec   Loss 1.0925   LearningRate 0.0000   Epoch: 32   Global Step: 56060   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:15:53,192-Speed 13850.29 samples/sec   Loss 1.0916   LearningRate 0.0000   Epoch: 32   Global Step: 56070   Fp16 Grad Scale: 16384   Required: 7 hours
Training: 2022-03-04 12:16:10,900-Speed 13879.74 samples/sec   Loss 1.0839   LearningRate 0.0000   Epoch: 32   Global Step: 56080   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:16:28,594-Speed 13889.76 samples/sec   Loss 1.0862   LearningRate 0.0000   Epoch: 32   Global Step: 56090   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:16:46,329-Speed 13858.53 samples/sec   Loss 1.0911   LearningRate 0.0000   Epoch: 32   Global Step: 56100   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:17:04,068-Speed 13855.50 samples/sec   Loss 1.0902   LearningRate 0.0000   Epoch: 32   Global Step: 56110   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:17:21,826-Speed 13840.21 samples/sec   Loss 1.0859   LearningRate 0.0000   Epoch: 32   Global Step: 56120   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:17:39,578-Speed 13844.64 samples/sec   Loss 1.1048   LearningRate 0.0000   Epoch: 32   Global Step: 56130   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:17:57,353-Speed 13826.91 samples/sec   Loss 1.0907   LearningRate 0.0000   Epoch: 32   Global Step: 56140   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:18:15,093-Speed 13854.77 samples/sec   Loss 1.0872   LearningRate 0.0000   Epoch: 32   Global Step: 56150   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:18:32,850-Speed 13841.40 samples/sec   Loss 1.0919   LearningRate 0.0000   Epoch: 32   Global Step: 56160   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:18:50,669-Speed 13792.43 samples/sec   Loss 1.0964   LearningRate 0.0000   Epoch: 32   Global Step: 56170   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:19:08,430-Speed 13837.88 samples/sec   Loss 1.0926   LearningRate 0.0000   Epoch: 32   Global Step: 56180   Fp16 Grad Scale: 32768   Required: 7 hours
Training: 2022-03-04 12:19:26,100-Speed 13909.23 samples/sec   Loss 1.0893   LearningRate 0.0000   Epoch: 32   Global Step: 56190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:19:43,881-Speed 13822.07 samples/sec   Loss 1.0941   LearningRate 0.0000   Epoch: 32   Global Step: 56200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:20:01,633-Speed 13845.08 samples/sec   Loss 1.0886   LearningRate 0.0000   Epoch: 32   Global Step: 56210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:20:19,437-Speed 13805.02 samples/sec   Loss 1.0872   LearningRate 0.0000   Epoch: 32   Global Step: 56220   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:20:37,168-Speed 13860.95 samples/sec   Loss 1.0823   LearningRate 0.0000   Epoch: 32   Global Step: 56230   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:20:55,014-Speed 13772.13 samples/sec   Loss 1.0894   LearningRate 0.0000   Epoch: 32   Global Step: 56240   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:21:12,792-Speed 13826.19 samples/sec   Loss 1.0795   LearningRate 0.0000   Epoch: 32   Global Step: 56250   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:21:30,594-Speed 13806.07 samples/sec   Loss 1.0850   LearningRate 0.0000   Epoch: 32   Global Step: 56260   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:21:48,373-Speed 13824.02 samples/sec   Loss 1.0853   LearningRate 0.0000   Epoch: 32   Global Step: 56270   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:22:06,132-Speed 13840.37 samples/sec   Loss 1.0854   LearningRate 0.0000   Epoch: 32   Global Step: 56280   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:22:23,912-Speed 13823.16 samples/sec   Loss 1.0877   LearningRate 0.0000   Epoch: 32   Global Step: 56290   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:22:41,671-Speed 13840.00 samples/sec   Loss 1.0808   LearningRate 0.0000   Epoch: 32   Global Step: 56300   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:22:59,380-Speed 13878.80 samples/sec   Loss 1.0912   LearningRate 0.0000   Epoch: 32   Global Step: 56310   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:23:17,065-Speed 13897.13 samples/sec   Loss 1.0772   LearningRate 0.0000   Epoch: 32   Global Step: 56320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:23:34,877-Speed 13798.94 samples/sec   Loss 1.0820   LearningRate 0.0000   Epoch: 32   Global Step: 56330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:23:52,705-Speed 13785.47 samples/sec   Loss 1.0800   LearningRate 0.0000   Epoch: 32   Global Step: 56340   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:24:10,460-Speed 13842.36 samples/sec   Loss 1.0963   LearningRate 0.0000   Epoch: 32   Global Step: 56350   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:24:28,137-Speed 13903.82 samples/sec   Loss 1.0821   LearningRate 0.0000   Epoch: 32   Global Step: 56360   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:24:45,828-Speed 13893.09 samples/sec   Loss 1.0821   LearningRate 0.0000   Epoch: 32   Global Step: 56370   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:25:03,573-Speed 13850.73 samples/sec   Loss 1.0893   LearningRate 0.0000   Epoch: 32   Global Step: 56380   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:25:21,333-Speed 13837.92 samples/sec   Loss 1.0800   LearningRate 0.0000   Epoch: 32   Global Step: 56390   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:25:39,147-Speed 13796.86 samples/sec   Loss 1.0883   LearningRate 0.0000   Epoch: 32   Global Step: 56400   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:25:56,873-Speed 13865.67 samples/sec   Loss 1.0875   LearningRate 0.0000   Epoch: 32   Global Step: 56410   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:26:14,591-Speed 13871.43 samples/sec   Loss 1.0812   LearningRate 0.0000   Epoch: 32   Global Step: 56420   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:26:32,293-Speed 13883.76 samples/sec   Loss 1.0896   LearningRate 0.0000   Epoch: 32   Global Step: 56430   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:26:50,039-Speed 13849.59 samples/sec   Loss 1.0828   LearningRate 0.0000   Epoch: 32   Global Step: 56440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:27:07,829-Speed 13815.99 samples/sec   Loss 1.0870   LearningRate 0.0000   Epoch: 32   Global Step: 56450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:27:25,792-Speed 13682.11 samples/sec   Loss 1.0821   LearningRate 0.0000   Epoch: 32   Global Step: 56460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:27:43,499-Speed 13881.47 samples/sec   Loss 1.0844   LearningRate 0.0000   Epoch: 32   Global Step: 56470   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:28:01,412-Speed 13720.68 samples/sec   Loss 1.0835   LearningRate 0.0000   Epoch: 32   Global Step: 56480   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:28:19,213-Speed 13806.75 samples/sec   Loss 1.0831   LearningRate 0.0000   Epoch: 32   Global Step: 56490   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:28:36,950-Speed 13856.81 samples/sec   Loss 1.0818   LearningRate 0.0000   Epoch: 32   Global Step: 56500   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:28:54,777-Speed 13787.48 samples/sec   Loss 1.0816   LearningRate 0.0000   Epoch: 32   Global Step: 56510   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:29:12,603-Speed 13787.32 samples/sec   Loss 1.0746   LearningRate 0.0000   Epoch: 32   Global Step: 56520   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:29:30,500-Speed 13734.04 samples/sec   Loss 1.0843   LearningRate 0.0000   Epoch: 32   Global Step: 56530   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:29:48,305-Speed 13803.68 samples/sec   Loss 1.0740   LearningRate 0.0000   Epoch: 32   Global Step: 56540   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:30:06,061-Speed 13842.05 samples/sec   Loss 1.0797   LearningRate 0.0000   Epoch: 32   Global Step: 56550   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:30:23,982-Speed 13715.59 samples/sec   Loss 1.0876   LearningRate 0.0000   Epoch: 32   Global Step: 56560   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:30:41,747-Speed 13837.56 samples/sec   Loss 1.0794   LearningRate 0.0000   Epoch: 32   Global Step: 56570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:30:59,916-Speed 13528.73 samples/sec   Loss 1.0777   LearningRate 0.0000   Epoch: 32   Global Step: 56580   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:31:18,070-Speed 13538.04 samples/sec   Loss 1.0818   LearningRate 0.0000   Epoch: 32   Global Step: 56590   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:31:36,104-Speed 13628.46 samples/sec   Loss 1.0805   LearningRate 0.0000   Epoch: 32   Global Step: 56600   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:31:54,197-Speed 13584.16 samples/sec   Loss 1.0757   LearningRate 0.0000   Epoch: 32   Global Step: 56610   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:32:12,285-Speed 13587.55 samples/sec   Loss 1.0809   LearningRate 0.0000   Epoch: 32   Global Step: 56620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:32:30,392-Speed 13573.61 samples/sec   Loss 1.0821   LearningRate 0.0000   Epoch: 32   Global Step: 56630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:32:48,559-Speed 13529.15 samples/sec   Loss 1.0724   LearningRate 0.0000   Epoch: 32   Global Step: 56640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:33:06,585-Speed 13634.55 samples/sec   Loss 1.0873   LearningRate 0.0000   Epoch: 32   Global Step: 56650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:33:24,649-Speed 13605.87 samples/sec   Loss 1.0862   LearningRate 0.0000   Epoch: 32   Global Step: 56660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:33:42,756-Speed 13573.29 samples/sec   Loss 1.0811   LearningRate 0.0000   Epoch: 32   Global Step: 56670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:34:00,854-Speed 13580.54 samples/sec   Loss 1.0739   LearningRate 0.0000   Epoch: 32   Global Step: 56680   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:34:18,930-Speed 13596.31 samples/sec   Loss 1.0803   LearningRate 0.0000   Epoch: 32   Global Step: 56690   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:34:37,021-Speed 13586.29 samples/sec   Loss 1.0833   LearningRate 0.0000   Epoch: 32   Global Step: 56700   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:34:55,109-Speed 13587.63 samples/sec   Loss 1.0767   LearningRate 0.0000   Epoch: 32   Global Step: 56710   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:35:13,159-Speed 13616.11 samples/sec   Loss 1.0705   LearningRate 0.0000   Epoch: 32   Global Step: 56720   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:35:31,255-Speed 13581.23 samples/sec   Loss 1.0827   LearningRate 0.0000   Epoch: 32   Global Step: 56730   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:35:48,934-Speed 13903.05 samples/sec   Loss 1.0852   LearningRate 0.0000   Epoch: 32   Global Step: 56740   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:36:06,671-Speed 13857.16 samples/sec   Loss 1.0748   LearningRate 0.0000   Epoch: 32   Global Step: 56750   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:36:24,402-Speed 13860.84 samples/sec   Loss 1.0756   LearningRate 0.0000   Epoch: 32   Global Step: 56760   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:36:42,184-Speed 13822.40 samples/sec   Loss 1.0747   LearningRate 0.0000   Epoch: 32   Global Step: 56770   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:36:59,869-Speed 13898.37 samples/sec   Loss 1.0759   LearningRate 0.0000   Epoch: 32   Global Step: 56780   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:37:17,644-Speed 13826.73 samples/sec   Loss 1.0809   LearningRate 0.0000   Epoch: 32   Global Step: 56790   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:37:35,399-Speed 13842.09 samples/sec   Loss 1.0789   LearningRate 0.0000   Epoch: 32   Global Step: 56800   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:37:53,335-Speed 13703.47 samples/sec   Loss 1.0748   LearningRate 0.0000   Epoch: 32   Global Step: 56810   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:38:11,082-Speed 13848.47 samples/sec   Loss 1.0846   LearningRate 0.0000   Epoch: 32   Global Step: 56820   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:38:28,864-Speed 13822.04 samples/sec   Loss 1.0855   LearningRate 0.0000   Epoch: 32   Global Step: 56830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:38:46,533-Speed 13909.75 samples/sec   Loss 1.0618   LearningRate 0.0000   Epoch: 32   Global Step: 56840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:39:04,343-Speed 13799.78 samples/sec   Loss 1.0787   LearningRate 0.0000   Epoch: 32   Global Step: 56850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:39:22,060-Speed 13872.65 samples/sec   Loss 1.0727   LearningRate 0.0000   Epoch: 32   Global Step: 56860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:39:39,733-Speed 13907.15 samples/sec   Loss 1.0814   LearningRate 0.0000   Epoch: 32   Global Step: 56870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:39:57,500-Speed 13833.10 samples/sec   Loss 1.0749   LearningRate 0.0000   Epoch: 32   Global Step: 56880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:40:15,212-Speed 13876.89 samples/sec   Loss 1.0834   LearningRate 0.0000   Epoch: 32   Global Step: 56890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:40:32,953-Speed 13853.27 samples/sec   Loss 1.0770   LearningRate 0.0000   Epoch: 32   Global Step: 56900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:40:50,637-Speed 13899.11 samples/sec   Loss 1.0824   LearningRate 0.0000   Epoch: 32   Global Step: 56910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:41:08,409-Speed 13829.37 samples/sec   Loss 1.0732   LearningRate 0.0000   Epoch: 32   Global Step: 56920   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:41:26,112-Speed 13883.64 samples/sec   Loss 1.0939   LearningRate 0.0000   Epoch: 32   Global Step: 56930   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:41:43,906-Speed 13812.30 samples/sec   Loss 1.0735   LearningRate 0.0000   Epoch: 32   Global Step: 56940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:42:01,676-Speed 13832.18 samples/sec   Loss 1.0729   LearningRate 0.0000   Epoch: 32   Global Step: 56950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:42:19,351-Speed 13905.47 samples/sec   Loss 1.0772   LearningRate 0.0000   Epoch: 32   Global Step: 56960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:42:37,179-Speed 13788.13 samples/sec   Loss 1.0753   LearningRate 0.0000   Epoch: 32   Global Step: 56970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:42:54,894-Speed 13873.58 samples/sec   Loss 1.0758   LearningRate 0.0000   Epoch: 32   Global Step: 56980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:43:12,649-Speed 13842.44 samples/sec   Loss 1.0826   LearningRate 0.0000   Epoch: 32   Global Step: 56990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:43:30,417-Speed 13832.56 samples/sec   Loss 1.0826   LearningRate 0.0000   Epoch: 32   Global Step: 57000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:43:48,135-Speed 13872.39 samples/sec   Loss 1.0721   LearningRate 0.0000   Epoch: 32   Global Step: 57010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:44:05,913-Speed 13825.20 samples/sec   Loss 1.0776   LearningRate 0.0000   Epoch: 32   Global Step: 57020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:44:23,669-Speed 13842.07 samples/sec   Loss 1.0768   LearningRate 0.0000   Epoch: 32   Global Step: 57030   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 12:45:30,793-Speed 3661.33 samples/sec   Loss 1.0612   LearningRate 0.0000   Epoch: 33   Global Step: 57040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:45:48,491-Speed 13888.05 samples/sec   Loss 1.0777   LearningRate 0.0000   Epoch: 33   Global Step: 57050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:46:06,209-Speed 13874.47 samples/sec   Loss 1.0693   LearningRate 0.0000   Epoch: 33   Global Step: 57060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:46:23,920-Speed 13876.91 samples/sec   Loss 1.0669   LearningRate 0.0000   Epoch: 33   Global Step: 57070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:46:41,595-Speed 13905.33 samples/sec   Loss 1.0694   LearningRate 0.0000   Epoch: 33   Global Step: 57080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:46:59,242-Speed 13927.32 samples/sec   Loss 1.0695   LearningRate 0.0000   Epoch: 33   Global Step: 57090   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:47:16,969-Speed 13865.52 samples/sec   Loss 1.0700   LearningRate 0.0000   Epoch: 33   Global Step: 57100   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:47:34,633-Speed 13914.36 samples/sec   Loss 1.0716   LearningRate 0.0000   Epoch: 33   Global Step: 57110   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:47:52,345-Speed 13876.65 samples/sec   Loss 1.0674   LearningRate 0.0000   Epoch: 33   Global Step: 57120   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:48:10,044-Speed 13886.99 samples/sec   Loss 1.0798   LearningRate 0.0000   Epoch: 33   Global Step: 57130   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:48:27,846-Speed 13806.45 samples/sec   Loss 1.0718   LearningRate 0.0000   Epoch: 33   Global Step: 57140   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:48:45,672-Speed 13787.32 samples/sec   Loss 1.0673   LearningRate 0.0000   Epoch: 33   Global Step: 57150   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:49:03,403-Speed 13861.67 samples/sec   Loss 1.0688   LearningRate 0.0000   Epoch: 33   Global Step: 57160   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:49:21,032-Speed 13941.42 samples/sec   Loss 1.0593   LearningRate 0.0000   Epoch: 33   Global Step: 57170   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:49:38,782-Speed 13846.24 samples/sec   Loss 1.0686   LearningRate 0.0000   Epoch: 33   Global Step: 57180   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:49:56,490-Speed 13879.41 samples/sec   Loss 1.0677   LearningRate 0.0000   Epoch: 33   Global Step: 57190   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:50:14,161-Speed 13909.42 samples/sec   Loss 1.0607   LearningRate 0.0000   Epoch: 33   Global Step: 57200   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:50:31,828-Speed 13911.86 samples/sec   Loss 1.0716   LearningRate 0.0000   Epoch: 33   Global Step: 57210   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:50:49,569-Speed 13852.96 samples/sec   Loss 1.0639   LearningRate 0.0000   Epoch: 33   Global Step: 57220   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:51:07,238-Speed 13911.64 samples/sec   Loss 1.0675   LearningRate 0.0000   Epoch: 33   Global Step: 57230   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:51:24,978-Speed 13855.14 samples/sec   Loss 1.0667   LearningRate 0.0000   Epoch: 33   Global Step: 57240   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:51:42,692-Speed 13874.53 samples/sec   Loss 1.0663   LearningRate 0.0000   Epoch: 33   Global Step: 57250   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:52:00,387-Speed 13889.48 samples/sec   Loss 1.0744   LearningRate 0.0000   Epoch: 33   Global Step: 57260   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:52:18,102-Speed 13873.97 samples/sec   Loss 1.0655   LearningRate 0.0000   Epoch: 33   Global Step: 57270   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:52:35,829-Speed 13865.52 samples/sec   Loss 1.0671   LearningRate 0.0000   Epoch: 33   Global Step: 57280   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:52:53,599-Speed 13830.59 samples/sec   Loss 1.0709   LearningRate 0.0000   Epoch: 33   Global Step: 57290   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:53:11,346-Speed 13848.65 samples/sec   Loss 1.0681   LearningRate 0.0000   Epoch: 33   Global Step: 57300   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:53:29,040-Speed 13890.30 samples/sec   Loss 1.0679   LearningRate 0.0000   Epoch: 33   Global Step: 57310   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:53:46,808-Speed 13833.03 samples/sec   Loss 1.0756   LearningRate 0.0000   Epoch: 33   Global Step: 57320   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:54:04,560-Speed 13845.25 samples/sec   Loss 1.0599   LearningRate 0.0000   Epoch: 33   Global Step: 57330   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:54:22,368-Speed 13801.76 samples/sec   Loss 1.0702   LearningRate 0.0000   Epoch: 33   Global Step: 57340   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:54:40,192-Speed 13788.85 samples/sec   Loss 1.0709   LearningRate 0.0000   Epoch: 33   Global Step: 57350   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:54:57,845-Speed 13921.79 samples/sec   Loss 1.0759   LearningRate 0.0000   Epoch: 33   Global Step: 57360   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:55:15,660-Speed 13796.74 samples/sec   Loss 1.0698   LearningRate 0.0000   Epoch: 33   Global Step: 57370   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:55:33,337-Speed 13903.74 samples/sec   Loss 1.0679   LearningRate 0.0000   Epoch: 33   Global Step: 57380   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:55:51,046-Speed 13878.22 samples/sec   Loss 1.0645   LearningRate 0.0000   Epoch: 33   Global Step: 57390   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:56:08,746-Speed 13885.58 samples/sec   Loss 1.0695   LearningRate 0.0000   Epoch: 33   Global Step: 57400   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:56:26,583-Speed 13778.95 samples/sec   Loss 1.0742   LearningRate 0.0000   Epoch: 33   Global Step: 57410   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:56:44,380-Speed 13810.44 samples/sec   Loss 1.0729   LearningRate 0.0000   Epoch: 33   Global Step: 57420   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:57:02,116-Speed 13856.76 samples/sec   Loss 1.0657   LearningRate 0.0000   Epoch: 33   Global Step: 57430   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:57:19,857-Speed 13853.87 samples/sec   Loss 1.0677   LearningRate 0.0000   Epoch: 33   Global Step: 57440   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:57:37,661-Speed 13804.70 samples/sec   Loss 1.0615   LearningRate 0.0000   Epoch: 33   Global Step: 57450   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:57:55,468-Speed 13802.62 samples/sec   Loss 1.0598   LearningRate 0.0000   Epoch: 33   Global Step: 57460   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 12:58:13,265-Speed 13810.45 samples/sec   Loss 1.0684   LearningRate 0.0000   Epoch: 33   Global Step: 57470   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:58:31,051-Speed 13818.44 samples/sec   Loss 1.0645   LearningRate 0.0000   Epoch: 33   Global Step: 57480   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:58:48,925-Speed 13750.20 samples/sec   Loss 1.0634   LearningRate 0.0000   Epoch: 33   Global Step: 57490   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:59:06,729-Speed 13804.62 samples/sec   Loss 1.0587   LearningRate 0.0000   Epoch: 33   Global Step: 57500   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:59:24,521-Speed 13813.49 samples/sec   Loss 1.0667   LearningRate 0.0000   Epoch: 33   Global Step: 57510   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 12:59:42,243-Speed 13868.88 samples/sec   Loss 1.0692   LearningRate 0.0000   Epoch: 33   Global Step: 57520   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:00:00,072-Speed 13784.39 samples/sec   Loss 1.0687   LearningRate 0.0000   Epoch: 33   Global Step: 57530   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:00:17,870-Speed 13810.21 samples/sec   Loss 1.0662   LearningRate 0.0000   Epoch: 33   Global Step: 57540   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:00:35,689-Speed 13793.03 samples/sec   Loss 1.0610   LearningRate 0.0000   Epoch: 33   Global Step: 57550   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:00:53,439-Speed 13846.25 samples/sec   Loss 1.0627   LearningRate 0.0000   Epoch: 33   Global Step: 57560   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:01:11,138-Speed 13886.41 samples/sec   Loss 1.0720   LearningRate 0.0000   Epoch: 33   Global Step: 57570   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:01:28,873-Speed 13857.88 samples/sec   Loss 1.0680   LearningRate 0.0000   Epoch: 33   Global Step: 57580   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:01:46,567-Speed 13890.71 samples/sec   Loss 1.0635   LearningRate 0.0000   Epoch: 33   Global Step: 57590   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:02:04,368-Speed 13807.23 samples/sec   Loss 1.0702   LearningRate 0.0000   Epoch: 33   Global Step: 57600   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:02:22,077-Speed 13878.44 samples/sec   Loss 1.0628   LearningRate 0.0000   Epoch: 33   Global Step: 57610   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:02:39,860-Speed 13821.21 samples/sec   Loss 1.0619   LearningRate 0.0000   Epoch: 33   Global Step: 57620   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:02:57,571-Speed 13876.86 samples/sec   Loss 1.0652   LearningRate 0.0000   Epoch: 33   Global Step: 57630   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:03:15,351-Speed 13823.17 samples/sec   Loss 1.0563   LearningRate 0.0000   Epoch: 33   Global Step: 57640   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:03:33,125-Speed 13827.46 samples/sec   Loss 1.0616   LearningRate 0.0000   Epoch: 33   Global Step: 57650   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:03:50,855-Speed 13862.44 samples/sec   Loss 1.0645   LearningRate 0.0000   Epoch: 33   Global Step: 57660   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:04:08,532-Speed 13903.65 samples/sec   Loss 1.0549   LearningRate 0.0000   Epoch: 33   Global Step: 57670   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:04:26,289-Speed 13841.61 samples/sec   Loss 1.0683   LearningRate 0.0000   Epoch: 33   Global Step: 57680   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:04:43,989-Speed 13885.82 samples/sec   Loss 1.0634   LearningRate 0.0000   Epoch: 33   Global Step: 57690   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:05:01,718-Speed 13863.97 samples/sec   Loss 1.0595   LearningRate 0.0000   Epoch: 33   Global Step: 57700   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:05:19,446-Speed 13866.00 samples/sec   Loss 1.0611   LearningRate 0.0000   Epoch: 33   Global Step: 57710   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:05:37,265-Speed 13792.73 samples/sec   Loss 1.0594   LearningRate 0.0000   Epoch: 33   Global Step: 57720   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:05:54,971-Speed 13880.54 samples/sec   Loss 1.0598   LearningRate 0.0000   Epoch: 33   Global Step: 57730   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:06:12,716-Speed 13850.42 samples/sec   Loss 1.0671   LearningRate 0.0000   Epoch: 33   Global Step: 57740   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:06:30,499-Speed 13821.23 samples/sec   Loss 1.0728   LearningRate 0.0000   Epoch: 33   Global Step: 57750   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:06:48,306-Speed 13802.28 samples/sec   Loss 1.0583   LearningRate 0.0000   Epoch: 33   Global Step: 57760   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:07:06,088-Speed 13823.75 samples/sec   Loss 1.0706   LearningRate 0.0000   Epoch: 33   Global Step: 57770   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:07:23,771-Speed 13898.96 samples/sec   Loss 1.0610   LearningRate 0.0000   Epoch: 33   Global Step: 57780   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:07:41,478-Speed 13880.76 samples/sec   Loss 1.0545   LearningRate 0.0000   Epoch: 33   Global Step: 57790   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:07:59,277-Speed 13808.14 samples/sec   Loss 1.0543   LearningRate 0.0000   Epoch: 33   Global Step: 57800   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:08:17,017-Speed 13854.29 samples/sec   Loss 1.0589   LearningRate 0.0000   Epoch: 33   Global Step: 57810   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:08:34,875-Speed 13763.35 samples/sec   Loss 1.0613   LearningRate 0.0000   Epoch: 33   Global Step: 57820   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 13:08:52,557-Speed 13899.90 samples/sec   Loss 1.0571   LearningRate 0.0000   Epoch: 33   Global Step: 57830   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:09:10,325-Speed 13832.63 samples/sec   Loss 1.0475   LearningRate 0.0000   Epoch: 33   Global Step: 57840   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:09:28,104-Speed 13823.73 samples/sec   Loss 1.0498   LearningRate 0.0000   Epoch: 33   Global Step: 57850   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:09:45,844-Speed 13854.71 samples/sec   Loss 1.0625   LearningRate 0.0000   Epoch: 33   Global Step: 57860   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:10:03,615-Speed 13830.39 samples/sec   Loss 1.0641   LearningRate 0.0000   Epoch: 33   Global Step: 57870   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:10:21,457-Speed 13774.73 samples/sec   Loss 1.0592   LearningRate 0.0000   Epoch: 33   Global Step: 57880   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:10:39,190-Speed 13859.88 samples/sec   Loss 1.0637   LearningRate 0.0000   Epoch: 33   Global Step: 57890   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:10:56,944-Speed 13843.31 samples/sec   Loss 1.0646   LearningRate 0.0000   Epoch: 33   Global Step: 57900   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:11:15,009-Speed 13605.26 samples/sec   Loss 1.0634   LearningRate 0.0000   Epoch: 33   Global Step: 57910   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:11:33,018-Speed 13647.03 samples/sec   Loss 1.0503   LearningRate 0.0000   Epoch: 33   Global Step: 57920   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:11:51,037-Speed 13639.56 samples/sec   Loss 1.0519   LearningRate 0.0000   Epoch: 33   Global Step: 57930   Fp16 Grad Scale: 65536   Required: 6 hours
Training: 2022-03-04 13:12:09,120-Speed 13591.50 samples/sec   Loss 1.0572   LearningRate 0.0000   Epoch: 33   Global Step: 57940   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:12:27,075-Speed 13689.00 samples/sec   Loss 1.0529   LearningRate 0.0000   Epoch: 33   Global Step: 57950   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:12:45,085-Speed 13646.26 samples/sec   Loss 1.0611   LearningRate 0.0000   Epoch: 33   Global Step: 57960   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:13:03,131-Speed 13619.27 samples/sec   Loss 1.0495   LearningRate 0.0000   Epoch: 33   Global Step: 57970   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:13:21,146-Speed 13643.02 samples/sec   Loss 1.0591   LearningRate 0.0000   Epoch: 33   Global Step: 57980   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:13:39,165-Speed 13640.27 samples/sec   Loss 1.0534   LearningRate 0.0000   Epoch: 33   Global Step: 57990   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:13:57,203-Speed 13625.55 samples/sec   Loss 1.0481   LearningRate 0.0000   Epoch: 33   Global Step: 58000   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:14:15,277-Speed 13598.69 samples/sec   Loss 1.0504   LearningRate 0.0000   Epoch: 33   Global Step: 58010   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:14:33,291-Speed 13643.30 samples/sec   Loss 1.0536   LearningRate 0.0000   Epoch: 33   Global Step: 58020   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:14:51,272-Speed 13668.70 samples/sec   Loss 1.0543   LearningRate 0.0000   Epoch: 33   Global Step: 58030   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:15:09,286-Speed 13643.81 samples/sec   Loss 1.0573   LearningRate 0.0000   Epoch: 33   Global Step: 58040   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:15:27,263-Speed 13671.73 samples/sec   Loss 1.0576   LearningRate 0.0000   Epoch: 33   Global Step: 58050   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:15:45,306-Speed 13622.19 samples/sec   Loss 1.0552   LearningRate 0.0000   Epoch: 33   Global Step: 58060   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:16:03,369-Speed 13606.54 samples/sec   Loss 1.0563   LearningRate 0.0000   Epoch: 33   Global Step: 58070   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:16:21,454-Speed 13590.12 samples/sec   Loss 1.0509   LearningRate 0.0000   Epoch: 33   Global Step: 58080   Fp16 Grad Scale: 32768   Required: 6 hours
Training: 2022-03-04 13:16:39,417-Speed 13681.98 samples/sec   Loss 1.0574   LearningRate 0.0000   Epoch: 33   Global Step: 58090   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:16:57,396-Speed 13669.95 samples/sec   Loss 1.0525   LearningRate 0.0000   Epoch: 33   Global Step: 58100   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:17:15,380-Speed 13666.75 samples/sec   Loss 1.0602   LearningRate 0.0000   Epoch: 33   Global Step: 58110   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:17:33,489-Speed 13572.37 samples/sec   Loss 1.0501   LearningRate 0.0000   Epoch: 33   Global Step: 58120   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:17:51,498-Speed 13646.72 samples/sec   Loss 1.0552   LearningRate 0.0000   Epoch: 33   Global Step: 58130   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:18:09,487-Speed 13662.68 samples/sec   Loss 1.0537   LearningRate 0.0000   Epoch: 33   Global Step: 58140   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:18:27,502-Speed 13642.73 samples/sec   Loss 1.0584   LearningRate 0.0000   Epoch: 33   Global Step: 58150   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:18:45,490-Speed 13663.83 samples/sec   Loss 1.0580   LearningRate 0.0000   Epoch: 33   Global Step: 58160   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:19:03,504-Speed 13643.21 samples/sec   Loss 1.0504   LearningRate 0.0000   Epoch: 33   Global Step: 58170   Fp16 Grad Scale: 16384   Required: 6 hours
Training: 2022-03-04 13:19:21,453-Speed 13692.91 samples/sec   Loss 1.0527   LearningRate 0.0000   Epoch: 33   Global Step: 58180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:19:39,458-Speed 13650.62 samples/sec   Loss 1.0518   LearningRate 0.0000   Epoch: 33   Global Step: 58190   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:19:57,446-Speed 13663.21 samples/sec   Loss 1.0492   LearningRate 0.0000   Epoch: 33   Global Step: 58200   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:20:15,469-Speed 13636.97 samples/sec   Loss 1.0482   LearningRate 0.0000   Epoch: 33   Global Step: 58210   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:20:33,445-Speed 13672.14 samples/sec   Loss 1.0649   LearningRate 0.0000   Epoch: 33   Global Step: 58220   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:20:51,508-Speed 13606.74 samples/sec   Loss 1.0479   LearningRate 0.0000   Epoch: 33   Global Step: 58230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:21:09,531-Speed 13637.61 samples/sec   Loss 1.0608   LearningRate 0.0000   Epoch: 33   Global Step: 58240   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:21:27,553-Speed 13637.05 samples/sec   Loss 1.0555   LearningRate 0.0000   Epoch: 33   Global Step: 58250   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:21:45,574-Speed 13638.29 samples/sec   Loss 1.0461   LearningRate 0.0000   Epoch: 33   Global Step: 58260   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:22:03,676-Speed 13577.39 samples/sec   Loss 1.0553   LearningRate 0.0000   Epoch: 33   Global Step: 58270   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:22:21,818-Speed 13547.17 samples/sec   Loss 1.0541   LearningRate 0.0000   Epoch: 33   Global Step: 58280   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:22:39,842-Speed 13636.39 samples/sec   Loss 1.0444   LearningRate 0.0000   Epoch: 33   Global Step: 58290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:22:57,793-Speed 13691.24 samples/sec   Loss 1.0572   LearningRate 0.0000   Epoch: 33   Global Step: 58300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:23:15,841-Speed 13617.91 samples/sec   Loss 1.0545   LearningRate 0.0000   Epoch: 33   Global Step: 58310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:23:33,835-Speed 13660.97 samples/sec   Loss 1.0452   LearningRate 0.0000   Epoch: 33   Global Step: 58320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:23:51,907-Speed 13599.46 samples/sec   Loss 1.0480   LearningRate 0.0000   Epoch: 33   Global Step: 58330   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:24:09,919-Speed 13645.26 samples/sec   Loss 1.0538   LearningRate 0.0000   Epoch: 33   Global Step: 58340   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:24:27,927-Speed 13647.83 samples/sec   Loss 1.0433   LearningRate 0.0000   Epoch: 33   Global Step: 58350   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:24:45,951-Speed 13636.44 samples/sec   Loss 1.0463   LearningRate 0.0000   Epoch: 33   Global Step: 58360   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:25:03,943-Speed 13659.71 samples/sec   Loss 1.0489   LearningRate 0.0000   Epoch: 33   Global Step: 58370   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:25:21,902-Speed 13685.37 samples/sec   Loss 1.0459   LearningRate 0.0000   Epoch: 33   Global Step: 58380   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:25:39,896-Speed 13659.25 samples/sec   Loss 1.0436   LearningRate 0.0000   Epoch: 33   Global Step: 58390   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:25:57,953-Speed 13611.33 samples/sec   Loss 1.0411   LearningRate 0.0000   Epoch: 33   Global Step: 58400   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:26:16,011-Speed 13610.37 samples/sec   Loss 1.0551   LearningRate 0.0000   Epoch: 33   Global Step: 58410   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:26:34,009-Speed 13655.28 samples/sec   Loss 1.0497   LearningRate 0.0000   Epoch: 33   Global Step: 58420   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:26:52,070-Speed 13608.26 samples/sec   Loss 1.0440   LearningRate 0.0000   Epoch: 33   Global Step: 58430   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:27:10,079-Speed 13647.36 samples/sec   Loss 1.0513   LearningRate 0.0000   Epoch: 33   Global Step: 58440   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:27:28,059-Speed 13669.27 samples/sec   Loss 1.0509   LearningRate 0.0000   Epoch: 33   Global Step: 58450   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:27:46,099-Speed 13623.98 samples/sec   Loss 1.0478   LearningRate 0.0000   Epoch: 33   Global Step: 58460   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:28:04,119-Speed 13639.04 samples/sec   Loss 1.0565   LearningRate 0.0000   Epoch: 33   Global Step: 58470   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:28:22,145-Speed 13634.33 samples/sec   Loss 1.0538   LearningRate 0.0000   Epoch: 33   Global Step: 58480   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:28:40,093-Speed 13693.92 samples/sec   Loss 1.0480   LearningRate 0.0000   Epoch: 33   Global Step: 58490   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:28:58,167-Speed 13598.25 samples/sec   Loss 1.0479   LearningRate 0.0000   Epoch: 33   Global Step: 58500   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:29:16,200-Speed 13629.13 samples/sec   Loss 1.0439   LearningRate 0.0000   Epoch: 33   Global Step: 58510   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:29:34,259-Speed 13609.85 samples/sec   Loss 1.0491   LearningRate 0.0000   Epoch: 33   Global Step: 58520   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:29:52,249-Speed 13661.85 samples/sec   Loss 1.0502   LearningRate 0.0000   Epoch: 33   Global Step: 58530   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-04 13:30:10,234-Speed 13666.07 samples/sec   Loss 1.0470   LearningRate 0.0000   Epoch: 33   Global Step: 58540   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-04 13:30:28,338-Speed 13575.19 samples/sec   Loss 1.0491   LearningRate 0.0000   Epoch: 33   Global Step: 58550   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-04 13:30:46,449-Speed 13570.55 samples/sec   Loss 1.0429   LearningRate 0.0000   Epoch: 33   Global Step: 58560   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:31:04,451-Speed 13652.66 samples/sec   Loss 1.0482   LearningRate 0.0000   Epoch: 33   Global Step: 58570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:31:22,502-Speed 13617.18 samples/sec   Loss 1.0437   LearningRate 0.0000   Epoch: 33   Global Step: 58580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:31:40,646-Speed 13546.47 samples/sec   Loss 1.0424   LearningRate 0.0000   Epoch: 33   Global Step: 58590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:31:58,650-Speed 13650.94 samples/sec   Loss 1.0378   LearningRate 0.0000   Epoch: 33   Global Step: 58600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:32:16,757-Speed 13573.49 samples/sec   Loss 1.0547   LearningRate 0.0000   Epoch: 33   Global Step: 58610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:32:34,794-Speed 13625.91 samples/sec   Loss 1.0469   LearningRate 0.0000   Epoch: 33   Global Step: 58620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:32:52,893-Speed 13580.03 samples/sec   Loss 1.0477   LearningRate 0.0000   Epoch: 33   Global Step: 58630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:33:10,920-Speed 13633.49 samples/sec   Loss 1.0551   LearningRate 0.0000   Epoch: 33   Global Step: 58640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:33:28,982-Speed 13607.43 samples/sec   Loss 1.0503   LearningRate 0.0000   Epoch: 33   Global Step: 58650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:33:46,960-Speed 13670.62 samples/sec   Loss 1.0460   LearningRate 0.0000   Epoch: 33   Global Step: 58660   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-04 13:34:04,947-Speed 13663.97 samples/sec   Loss 1.0495   LearningRate 0.0000   Epoch: 33   Global Step: 58670   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:34:23,057-Speed 13571.84 samples/sec   Loss 1.0509   LearningRate 0.0000   Epoch: 33   Global Step: 58680   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:34:41,035-Speed 13670.64 samples/sec   Loss 1.0442   LearningRate 0.0000   Epoch: 33   Global Step: 58690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:34:58,995-Speed 13684.59 samples/sec   Loss 1.0455   LearningRate 0.0000   Epoch: 33   Global Step: 58700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:35:16,993-Speed 13655.85 samples/sec   Loss 1.0423   LearningRate 0.0000   Epoch: 33   Global Step: 58710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:35:34,983-Speed 13662.61 samples/sec   Loss 1.0443   LearningRate 0.0000   Epoch: 33   Global Step: 58720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:35:53,065-Speed 13592.89 samples/sec   Loss 1.0451   LearningRate 0.0000   Epoch: 33   Global Step: 58730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:36:10,987-Speed 13713.57 samples/sec   Loss 1.0511   LearningRate 0.0000   Epoch: 33   Global Step: 58740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:36:29,008-Speed 13638.21 samples/sec   Loss 1.0542   LearningRate 0.0000   Epoch: 33   Global Step: 58750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:36:47,008-Speed 13654.55 samples/sec   Loss 1.0472   LearningRate 0.0000   Epoch: 33   Global Step: 58760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:37:56,234-Speed 3550.19 samples/sec   Loss 1.0436   LearningRate 0.0000   Epoch: 34   Global Step: 58770   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:38:14,173-Speed 13700.57 samples/sec   Loss 1.0386   LearningRate 0.0000   Epoch: 34   Global Step: 58780   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:38:32,133-Speed 13684.62 samples/sec   Loss 1.0512   LearningRate 0.0000   Epoch: 34   Global Step: 58790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:38:50,122-Speed 13662.36 samples/sec   Loss 1.0375   LearningRate 0.0000   Epoch: 34   Global Step: 58800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:39:08,099-Speed 13672.56 samples/sec   Loss 1.0427   LearningRate 0.0000   Epoch: 34   Global Step: 58810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:39:26,107-Speed 13647.67 samples/sec   Loss 1.0358   LearningRate 0.0000   Epoch: 34   Global Step: 58820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:39:44,138-Speed 13630.70 samples/sec   Loss 1.0417   LearningRate 0.0000   Epoch: 34   Global Step: 58830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:40:02,002-Speed 13758.02 samples/sec   Loss 1.0404   LearningRate 0.0000   Epoch: 34   Global Step: 58840   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:40:19,935-Speed 13705.99 samples/sec   Loss 1.0440   LearningRate 0.0000   Epoch: 34   Global Step: 58850   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:40:37,901-Speed 13679.89 samples/sec   Loss 1.0353   LearningRate 0.0000   Epoch: 34   Global Step: 58860   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:40:55,843-Speed 13698.92 samples/sec   Loss 1.0381   LearningRate 0.0000   Epoch: 34   Global Step: 58870   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:41:13,825-Speed 13667.56 samples/sec   Loss 1.0368   LearningRate 0.0000   Epoch: 34   Global Step: 58880   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:41:31,790-Speed 13680.71 samples/sec   Loss 1.0351   LearningRate 0.0000   Epoch: 34   Global Step: 58890   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:41:49,723-Speed 13705.47 samples/sec   Loss 1.0421   LearningRate 0.0000   Epoch: 34   Global Step: 58900   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:42:07,793-Speed 13601.76 samples/sec   Loss 1.0455   LearningRate 0.0000   Epoch: 34   Global Step: 58910   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:42:25,786-Speed 13659.14 samples/sec   Loss 1.0406   LearningRate 0.0000   Epoch: 34   Global Step: 58920   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:42:43,840-Speed 13613.77 samples/sec   Loss 1.0376   LearningRate 0.0000   Epoch: 34   Global Step: 58930   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:43:01,746-Speed 13725.15 samples/sec   Loss 1.0346   LearningRate 0.0000   Epoch: 34   Global Step: 58940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:43:19,721-Speed 13674.32 samples/sec   Loss 1.0438   LearningRate 0.0000   Epoch: 34   Global Step: 58950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:43:37,665-Speed 13696.26 samples/sec   Loss 1.0385   LearningRate 0.0000   Epoch: 34   Global Step: 58960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:43:55,649-Speed 13666.38 samples/sec   Loss 1.0399   LearningRate 0.0000   Epoch: 34   Global Step: 58970   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:44:13,608-Speed 13685.72 samples/sec   Loss 1.0368   LearningRate 0.0000   Epoch: 34   Global Step: 58980   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:44:31,541-Speed 13705.35 samples/sec   Loss 1.0381   LearningRate 0.0000   Epoch: 34   Global Step: 58990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:44:49,528-Speed 13664.00 samples/sec   Loss 1.0384   LearningRate 0.0000   Epoch: 34   Global Step: 59000   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:45:07,503-Speed 13673.18 samples/sec   Loss 1.0446   LearningRate 0.0000   Epoch: 34   Global Step: 59010   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:45:25,453-Speed 13692.86 samples/sec   Loss 1.0388   LearningRate 0.0000   Epoch: 34   Global Step: 59020   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:45:43,422-Speed 13677.41 samples/sec   Loss 1.0429   LearningRate 0.0000   Epoch: 34   Global Step: 59030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:46:01,308-Speed 13741.14 samples/sec   Loss 1.0434   LearningRate 0.0000   Epoch: 34   Global Step: 59040   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:46:19,272-Speed 13681.88 samples/sec   Loss 1.0442   LearningRate 0.0000   Epoch: 34   Global Step: 59050   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:46:37,252-Speed 13669.24 samples/sec   Loss 1.0374   LearningRate 0.0000   Epoch: 34   Global Step: 59060   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:46:55,197-Speed 13696.29 samples/sec   Loss 1.0396   LearningRate 0.0000   Epoch: 34   Global Step: 59070   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:47:13,126-Speed 13708.65 samples/sec   Loss 1.0481   LearningRate 0.0000   Epoch: 34   Global Step: 59080   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:47:31,120-Speed 13658.24 samples/sec   Loss 1.0446   LearningRate 0.0000   Epoch: 34   Global Step: 59090   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:47:49,029-Speed 13724.15 samples/sec   Loss 1.0376   LearningRate 0.0000   Epoch: 34   Global Step: 59100   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:48:07,036-Speed 13648.76 samples/sec   Loss 1.0410   LearningRate 0.0000   Epoch: 34   Global Step: 59110   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:48:25,046-Speed 13646.81 samples/sec   Loss 1.0382   LearningRate 0.0000   Epoch: 34   Global Step: 59120   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:48:43,053-Speed 13648.93 samples/sec   Loss 1.0444   LearningRate 0.0000   Epoch: 34   Global Step: 59130   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:49:01,040-Speed 13664.12 samples/sec   Loss 1.0401   LearningRate 0.0000   Epoch: 34   Global Step: 59140   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:49:19,117-Speed 13595.52 samples/sec   Loss 1.0403   LearningRate 0.0000   Epoch: 34   Global Step: 59150   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:49:37,105-Speed 13664.00 samples/sec   Loss 1.0336   LearningRate 0.0000   Epoch: 34   Global Step: 59160   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:49:55,089-Speed 13666.01 samples/sec   Loss 1.0425   LearningRate 0.0000   Epoch: 34   Global Step: 59170   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:50:13,091-Speed 13652.81 samples/sec   Loss 1.0395   LearningRate 0.0000   Epoch: 34   Global Step: 59180   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:50:31,063-Speed 13675.73 samples/sec   Loss 1.0402   LearningRate 0.0000   Epoch: 34   Global Step: 59190   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:50:49,077-Speed 13643.20 samples/sec   Loss 1.0388   LearningRate 0.0000   Epoch: 34   Global Step: 59200   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:51:07,098-Speed 13638.38 samples/sec   Loss 1.0428   LearningRate 0.0000   Epoch: 34   Global Step: 59210   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:51:25,060-Speed 13683.23 samples/sec   Loss 1.0337   LearningRate 0.0000   Epoch: 34   Global Step: 59220   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:51:43,038-Speed 13671.88 samples/sec   Loss 1.0368   LearningRate 0.0000   Epoch: 34   Global Step: 59230   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:52:01,102-Speed 13605.74 samples/sec   Loss 1.0436   LearningRate 0.0000   Epoch: 34   Global Step: 59240   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:52:19,183-Speed 13592.68 samples/sec   Loss 1.0379   LearningRate 0.0000   Epoch: 34   Global Step: 59250   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:52:37,195-Speed 13644.97 samples/sec   Loss 1.0395   LearningRate 0.0000   Epoch: 34   Global Step: 59260   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:52:55,285-Speed 13586.61 samples/sec   Loss 1.0353   LearningRate 0.0000   Epoch: 34   Global Step: 59270   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:53:13,309-Speed 13636.12 samples/sec   Loss 1.0286   LearningRate 0.0000   Epoch: 34   Global Step: 59280   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 13:53:31,334-Speed 13634.86 samples/sec   Loss 1.0348   LearningRate 0.0000   Epoch: 34   Global Step: 59290   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:53:49,374-Speed 13623.57 samples/sec   Loss 1.0378   LearningRate 0.0000   Epoch: 34   Global Step: 59300   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:54:07,401-Speed 13634.44 samples/sec   Loss 1.0376   LearningRate 0.0000   Epoch: 34   Global Step: 59310   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:54:25,411-Speed 13647.28 samples/sec   Loss 1.0335   LearningRate 0.0000   Epoch: 34   Global Step: 59320   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:54:43,499-Speed 13587.56 samples/sec   Loss 1.0323   LearningRate 0.0000   Epoch: 34   Global Step: 59330   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:55:01,533-Speed 13628.29 samples/sec   Loss 1.0421   LearningRate 0.0000   Epoch: 34   Global Step: 59340   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:55:19,338-Speed 13804.28 samples/sec   Loss 1.0370   LearningRate 0.0000   Epoch: 34   Global Step: 59350   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:55:37,045-Speed 13880.14 samples/sec   Loss 1.0325   LearningRate 0.0000   Epoch: 34   Global Step: 59360   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:55:54,697-Speed 13923.43 samples/sec   Loss 1.0317   LearningRate 0.0000   Epoch: 34   Global Step: 59370   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:56:12,397-Speed 13885.03 samples/sec   Loss 1.0276   LearningRate 0.0000   Epoch: 34   Global Step: 59380   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:56:30,082-Speed 13898.19 samples/sec   Loss 1.0323   LearningRate 0.0000   Epoch: 34   Global Step: 59390   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:56:47,730-Speed 13927.37 samples/sec   Loss 1.0429   LearningRate 0.0000   Epoch: 34   Global Step: 59400   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:57:05,433-Speed 13883.21 samples/sec   Loss 1.0366   LearningRate 0.0000   Epoch: 34   Global Step: 59410   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:57:23,185-Speed 13844.35 samples/sec   Loss 1.0362   LearningRate 0.0000   Epoch: 34   Global Step: 59420   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:57:40,866-Speed 13901.49 samples/sec   Loss 1.0306   LearningRate 0.0000   Epoch: 34   Global Step: 59430   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:57:58,603-Speed 13855.97 samples/sec   Loss 1.0323   LearningRate 0.0000   Epoch: 34   Global Step: 59440   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:58:16,316-Speed 13875.78 samples/sec   Loss 1.0379   LearningRate 0.0000   Epoch: 34   Global Step: 59450   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:58:34,019-Speed 13884.26 samples/sec   Loss 1.0342   LearningRate 0.0000   Epoch: 34   Global Step: 59460   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 13:58:51,714-Speed 13889.89 samples/sec   Loss 1.0309   LearningRate 0.0000   Epoch: 34   Global Step: 59470   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:59:09,453-Speed 13854.67 samples/sec   Loss 1.0360   LearningRate 0.0000   Epoch: 34   Global Step: 59480   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:59:27,195-Speed 13853.19 samples/sec   Loss 1.0374   LearningRate 0.0000   Epoch: 34   Global Step: 59490   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 13:59:44,891-Speed 13888.36 samples/sec   Loss 1.0277   LearningRate 0.0000   Epoch: 34   Global Step: 59500   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:00:02,603-Speed 13876.67 samples/sec   Loss 1.0398   LearningRate 0.0000   Epoch: 34   Global Step: 59510   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:00:20,314-Speed 13876.89 samples/sec   Loss 1.0340   LearningRate 0.0000   Epoch: 34   Global Step: 59520   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:00:38,098-Speed 13819.97 samples/sec   Loss 1.0308   LearningRate 0.0000   Epoch: 34   Global Step: 59530   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:00:55,799-Speed 13884.29 samples/sec   Loss 1.0353   LearningRate 0.0000   Epoch: 34   Global Step: 59540   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:01:13,514-Speed 13875.50 samples/sec   Loss 1.0250   LearningRate 0.0000   Epoch: 34   Global Step: 59550   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:01:31,192-Speed 13902.45 samples/sec   Loss 1.0312   LearningRate 0.0000   Epoch: 34   Global Step: 59560   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:01:48,900-Speed 13879.77 samples/sec   Loss 1.0308   LearningRate 0.0000   Epoch: 34   Global Step: 59570   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:02:06,612-Speed 13876.09 samples/sec   Loss 1.0317   LearningRate 0.0000   Epoch: 34   Global Step: 59580   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:02:24,283-Speed 13908.48 samples/sec   Loss 1.0338   LearningRate 0.0000   Epoch: 34   Global Step: 59590   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:02:41,967-Speed 13898.33 samples/sec   Loss 1.0350   LearningRate 0.0000   Epoch: 34   Global Step: 59600   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:02:59,661-Speed 13890.63 samples/sec   Loss 1.0310   LearningRate 0.0000   Epoch: 34   Global Step: 59610   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:03:17,331-Speed 13908.74 samples/sec   Loss 1.0354   LearningRate 0.0000   Epoch: 34   Global Step: 59620   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:03:35,051-Speed 13870.13 samples/sec   Loss 1.0253   LearningRate 0.0000   Epoch: 34   Global Step: 59630   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:03:52,764-Speed 13875.24 samples/sec   Loss 1.0227   LearningRate 0.0000   Epoch: 34   Global Step: 59640   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:04:10,498-Speed 13859.22 samples/sec   Loss 1.0277   LearningRate 0.0000   Epoch: 34   Global Step: 59650   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:04:28,147-Speed 13926.42 samples/sec   Loss 1.0305   LearningRate 0.0000   Epoch: 34   Global Step: 59660   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:04:45,919-Speed 13828.71 samples/sec   Loss 1.0320   LearningRate 0.0000   Epoch: 34   Global Step: 59670   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:05:03,592-Speed 13907.47 samples/sec   Loss 1.0242   LearningRate 0.0000   Epoch: 34   Global Step: 59680   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:05:21,243-Speed 13925.45 samples/sec   Loss 1.0292   LearningRate 0.0000   Epoch: 34   Global Step: 59690   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:05:39,019-Speed 13826.22 samples/sec   Loss 1.0236   LearningRate 0.0000   Epoch: 34   Global Step: 59700   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:05:56,689-Speed 13908.77 samples/sec   Loss 1.0290   LearningRate 0.0000   Epoch: 34   Global Step: 59710   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:06:14,431-Speed 13853.45 samples/sec   Loss 1.0330   LearningRate 0.0000   Epoch: 34   Global Step: 59720   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:06:32,142-Speed 13876.63 samples/sec   Loss 1.0286   LearningRate 0.0000   Epoch: 34   Global Step: 59730   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:06:49,859-Speed 13872.31 samples/sec   Loss 1.0287   LearningRate 0.0000   Epoch: 34   Global Step: 59740   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:07:07,572-Speed 13875.79 samples/sec   Loss 1.0385   LearningRate 0.0000   Epoch: 34   Global Step: 59750   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:07:25,241-Speed 13910.63 samples/sec   Loss 1.0328   LearningRate 0.0000   Epoch: 34   Global Step: 59760   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:07:43,071-Speed 13783.99 samples/sec   Loss 1.0349   LearningRate 0.0000   Epoch: 34   Global Step: 59770   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:08:00,748-Speed 13903.60 samples/sec   Loss 1.0287   LearningRate 0.0000   Epoch: 34   Global Step: 59780   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:08:18,456-Speed 13881.69 samples/sec   Loss 1.0342   LearningRate 0.0000   Epoch: 34   Global Step: 59790   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:08:36,152-Speed 13888.67 samples/sec   Loss 1.0355   LearningRate 0.0000   Epoch: 34   Global Step: 59800   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:08:53,903-Speed 13845.61 samples/sec   Loss 1.0258   LearningRate 0.0000   Epoch: 34   Global Step: 59810   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:09:11,632-Speed 13863.44 samples/sec   Loss 1.0310   LearningRate 0.0000   Epoch: 34   Global Step: 59820   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:09:29,347-Speed 13873.73 samples/sec   Loss 1.0272   LearningRate 0.0000   Epoch: 34   Global Step: 59830   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:09:47,010-Speed 13914.24 samples/sec   Loss 1.0272   LearningRate 0.0000   Epoch: 34   Global Step: 59840   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:10:04,745-Speed 13858.94 samples/sec   Loss 1.0276   LearningRate 0.0000   Epoch: 34   Global Step: 59850   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:10:22,364-Speed 13949.73 samples/sec   Loss 1.0247   LearningRate 0.0000   Epoch: 34   Global Step: 59860   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:10:40,002-Speed 13934.19 samples/sec   Loss 1.0179   LearningRate 0.0000   Epoch: 34   Global Step: 59870   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:10:57,660-Speed 13918.28 samples/sec   Loss 1.0270   LearningRate 0.0000   Epoch: 34   Global Step: 59880   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:11:15,382-Speed 13868.76 samples/sec   Loss 1.0283   LearningRate 0.0000   Epoch: 34   Global Step: 59890   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:11:33,055-Speed 13907.05 samples/sec   Loss 1.0298   LearningRate 0.0000   Epoch: 34   Global Step: 59900   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:11:50,782-Speed 13864.42 samples/sec   Loss 1.0262   LearningRate 0.0000   Epoch: 34   Global Step: 59910   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:12:08,441-Speed 13917.47 samples/sec   Loss 1.0285   LearningRate 0.0000   Epoch: 34   Global Step: 59920   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:12:26,211-Speed 13831.40 samples/sec   Loss 1.0309   LearningRate 0.0000   Epoch: 34   Global Step: 59930   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:12:43,906-Speed 13889.55 samples/sec   Loss 1.0237   LearningRate 0.0000   Epoch: 34   Global Step: 59940   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:13:01,597-Speed 13892.50 samples/sec   Loss 1.0261   LearningRate 0.0000   Epoch: 34   Global Step: 59950   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:13:19,323-Speed 13864.70 samples/sec   Loss 1.0256   LearningRate 0.0000   Epoch: 34   Global Step: 59960   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:13:37,095-Speed 13829.82 samples/sec   Loss 1.0215   LearningRate 0.0000   Epoch: 34   Global Step: 59970   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-04 14:13:54,758-Speed 13915.06 samples/sec   Loss 1.0302   LearningRate 0.0000   Epoch: 34   Global Step: 59980   Fp16 Grad Scale: 65536   Required: 5 hours
Training: 2022-03-04 14:14:12,451-Speed 13891.31 samples/sec   Loss 1.0170   LearningRate 0.0000   Epoch: 34   Global Step: 59990   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:14:30,181-Speed 13862.10 samples/sec   Loss 1.0284   LearningRate 0.0000   Epoch: 34   Global Step: 60000   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:14:47,982-Speed 13806.28 samples/sec   Loss 1.0168   LearningRate 0.0000   Epoch: 34   Global Step: 60010   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:15:05,659-Speed 13904.41 samples/sec   Loss 1.0277   LearningRate 0.0000   Epoch: 34   Global Step: 60020   Fp16 Grad Scale: 32768   Required: 5 hours
Training: 2022-03-04 14:15:23,365-Speed 13880.41 samples/sec   Loss 1.0257   LearningRate 0.0000   Epoch: 34   Global Step: 60030   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:15:41,040-Speed 13905.37 samples/sec   Loss 1.0245   LearningRate 0.0000   Epoch: 34   Global Step: 60040   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:15:58,715-Speed 13905.40 samples/sec   Loss 1.0262   LearningRate 0.0000   Epoch: 34   Global Step: 60050   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:16:16,425-Speed 13878.36 samples/sec   Loss 1.0195   LearningRate 0.0000   Epoch: 34   Global Step: 60060   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:16:34,180-Speed 13841.76 samples/sec   Loss 1.0266   LearningRate 0.0000   Epoch: 34   Global Step: 60070   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:16:51,937-Speed 13840.94 samples/sec   Loss 1.0208   LearningRate 0.0000   Epoch: 34   Global Step: 60080   Fp16 Grad Scale: 16384   Required: 5 hours
Training: 2022-03-04 14:17:09,671-Speed 13859.24 samples/sec   Loss 1.0190   LearningRate 0.0000   Epoch: 34   Global Step: 60090   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:17:27,398-Speed 13864.60 samples/sec   Loss 1.0394   LearningRate 0.0000   Epoch: 34   Global Step: 60100   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:17:45,064-Speed 13913.05 samples/sec   Loss 1.0248   LearningRate 0.0000   Epoch: 34   Global Step: 60110   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:18:02,740-Speed 13903.94 samples/sec   Loss 1.0212   LearningRate 0.0000   Epoch: 34   Global Step: 60120   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:18:20,524-Speed 13830.27 samples/sec   Loss 1.0294   LearningRate 0.0000   Epoch: 34   Global Step: 60130   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:18:38,275-Speed 13845.61 samples/sec   Loss 1.0232   LearningRate 0.0000   Epoch: 34   Global Step: 60140   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:18:56,007-Speed 13860.92 samples/sec   Loss 1.0262   LearningRate 0.0000   Epoch: 34   Global Step: 60150   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:19:13,696-Speed 13894.61 samples/sec   Loss 1.0182   LearningRate 0.0000   Epoch: 34   Global Step: 60160   Fp16 Grad Scale: 8192   Required: 5 hours
Training: 2022-03-04 14:19:31,400-Speed 13882.31 samples/sec   Loss 1.0177   LearningRate 0.0000   Epoch: 34   Global Step: 60170   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 14:19:49,144-Speed 13851.24 samples/sec   Loss 1.0237   LearningRate 0.0000   Epoch: 34   Global Step: 60180   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 14:20:06,873-Speed 13862.73 samples/sec   Loss 1.0152   LearningRate 0.0000   Epoch: 34   Global Step: 60190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:20:24,550-Speed 13903.72 samples/sec   Loss 1.0177   LearningRate 0.0000   Epoch: 34   Global Step: 60200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:20:42,284-Speed 13859.41 samples/sec   Loss 1.0223   LearningRate 0.0000   Epoch: 34   Global Step: 60210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:20:59,950-Speed 13914.21 samples/sec   Loss 1.0170   LearningRate 0.0000   Epoch: 34   Global Step: 60220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:21:17,670-Speed 13870.01 samples/sec   Loss 1.0211   LearningRate 0.0000   Epoch: 34   Global Step: 60230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:21:35,394-Speed 13866.83 samples/sec   Loss 1.0241   LearningRate 0.0000   Epoch: 34   Global Step: 60240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:21:53,153-Speed 13839.28 samples/sec   Loss 1.0271   LearningRate 0.0000   Epoch: 34   Global Step: 60250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:22:10,831-Speed 13902.54 samples/sec   Loss 1.0271   LearningRate 0.0000   Epoch: 34   Global Step: 60260   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:22:28,550-Speed 13871.24 samples/sec   Loss 1.0159   LearningRate 0.0000   Epoch: 34   Global Step: 60270   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:22:46,216-Speed 13912.18 samples/sec   Loss 1.0276   LearningRate 0.0000   Epoch: 34   Global Step: 60280   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:23:03,939-Speed 13867.94 samples/sec   Loss 1.0274   LearningRate 0.0000   Epoch: 34   Global Step: 60290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:23:21,703-Speed 13835.58 samples/sec   Loss 1.0273   LearningRate 0.0000   Epoch: 34   Global Step: 60300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:23:39,462-Speed 13839.63 samples/sec   Loss 1.0302   LearningRate 0.0000   Epoch: 34   Global Step: 60310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:23:57,274-Speed 13798.34 samples/sec   Loss 1.0212   LearningRate 0.0000   Epoch: 34   Global Step: 60320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:24:14,987-Speed 13876.01 samples/sec   Loss 1.0170   LearningRate 0.0000   Epoch: 34   Global Step: 60330   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:24:32,743-Speed 13841.63 samples/sec   Loss 1.0237   LearningRate 0.0000   Epoch: 34   Global Step: 60340   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:24:50,463-Speed 13869.60 samples/sec   Loss 1.0190   LearningRate 0.0000   Epoch: 34   Global Step: 60350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:25:08,226-Speed 13836.55 samples/sec   Loss 1.0199   LearningRate 0.0000   Epoch: 34   Global Step: 60360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:25:26,043-Speed 13794.80 samples/sec   Loss 1.0209   LearningRate 0.0000   Epoch: 34   Global Step: 60370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:25:43,776-Speed 13861.20 samples/sec   Loss 1.0233   LearningRate 0.0000   Epoch: 34   Global Step: 60380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:26:01,469-Speed 13890.41 samples/sec   Loss 1.0282   LearningRate 0.0000   Epoch: 34   Global Step: 60390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:26:19,273-Speed 13805.06 samples/sec   Loss 1.0307   LearningRate 0.0000   Epoch: 34   Global Step: 60400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:26:36,972-Speed 13886.38 samples/sec   Loss 1.0221   LearningRate 0.0000   Epoch: 34   Global Step: 60410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:26:54,674-Speed 13883.80 samples/sec   Loss 1.0205   LearningRate 0.0000   Epoch: 34   Global Step: 60420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:27:12,333-Speed 13918.06 samples/sec   Loss 1.0339   LearningRate 0.0000   Epoch: 34   Global Step: 60430   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:27:30,144-Speed 13798.80 samples/sec   Loss 1.0184   LearningRate 0.0000   Epoch: 34   Global Step: 60440   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:27:47,887-Speed 13852.01 samples/sec   Loss 1.0221   LearningRate 0.0000   Epoch: 34   Global Step: 60450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:28:05,595-Speed 13879.76 samples/sec   Loss 1.0253   LearningRate 0.0000   Epoch: 34   Global Step: 60460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:28:23,324-Speed 13863.21 samples/sec   Loss 1.0207   LearningRate 0.0000   Epoch: 34   Global Step: 60470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:28:41,036-Speed 13876.07 samples/sec   Loss 1.0163   LearningRate 0.0000   Epoch: 34   Global Step: 60480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:28:58,709-Speed 13906.79 samples/sec   Loss 1.0186   LearningRate 0.0000   Epoch: 34   Global Step: 60490   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:30:07,663-Speed 3564.17 samples/sec   Loss 1.0206   LearningRate 0.0000   Epoch: 35   Global Step: 60500   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:30:25,362-Speed 13886.56 samples/sec   Loss 1.0293   LearningRate 0.0000   Epoch: 35   Global Step: 60510   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:30:43,013-Speed 13923.59 samples/sec   Loss 1.0210   LearningRate 0.0000   Epoch: 35   Global Step: 60520   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:31:00,665-Speed 13923.86 samples/sec   Loss 1.0191   LearningRate 0.0000   Epoch: 35   Global Step: 60530   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:31:18,368-Speed 13883.27 samples/sec   Loss 1.0216   LearningRate 0.0000   Epoch: 35   Global Step: 60540   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:31:36,119-Speed 13845.43 samples/sec   Loss 1.0222   LearningRate 0.0000   Epoch: 35   Global Step: 60550   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:31:53,795-Speed 13904.71 samples/sec   Loss 1.0115   LearningRate 0.0000   Epoch: 35   Global Step: 60560   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:32:11,522-Speed 13864.72 samples/sec   Loss 1.0121   LearningRate 0.0000   Epoch: 35   Global Step: 60570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:32:29,240-Speed 13871.99 samples/sec   Loss 1.0171   LearningRate 0.0000   Epoch: 35   Global Step: 60580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:32:46,938-Speed 13886.41 samples/sec   Loss 1.0110   LearningRate 0.0000   Epoch: 35   Global Step: 60590   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:33:04,626-Speed 13894.92 samples/sec   Loss 1.0166   LearningRate 0.0000   Epoch: 35   Global Step: 60600   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:33:22,319-Speed 13891.94 samples/sec   Loss 1.0163   LearningRate 0.0000   Epoch: 35   Global Step: 60610   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:33:40,017-Speed 13887.60 samples/sec   Loss 1.0154   LearningRate 0.0000   Epoch: 35   Global Step: 60620   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:33:57,740-Speed 13867.31 samples/sec   Loss 1.0124   LearningRate 0.0000   Epoch: 35   Global Step: 60630   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:34:15,484-Speed 13851.31 samples/sec   Loss 1.0185   LearningRate 0.0000   Epoch: 35   Global Step: 60640   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:34:33,291-Speed 13802.21 samples/sec   Loss 1.0206   LearningRate 0.0000   Epoch: 35   Global Step: 60650   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:34:50,954-Speed 13914.85 samples/sec   Loss 1.0179   LearningRate 0.0000   Epoch: 35   Global Step: 60660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:35:08,667-Speed 13875.65 samples/sec   Loss 1.0196   LearningRate 0.0000   Epoch: 35   Global Step: 60670   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:35:26,521-Speed 13765.77 samples/sec   Loss 1.0122   LearningRate 0.0000   Epoch: 35   Global Step: 60680   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:35:44,209-Speed 13895.05 samples/sec   Loss 1.0123   LearningRate 0.0000   Epoch: 35   Global Step: 60690   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:36:01,872-Speed 13914.70 samples/sec   Loss 1.0121   LearningRate 0.0000   Epoch: 35   Global Step: 60700   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:36:19,644-Speed 13829.43 samples/sec   Loss 1.0127   LearningRate 0.0000   Epoch: 35   Global Step: 60710   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:36:37,423-Speed 13825.07 samples/sec   Loss 1.0152   LearningRate 0.0000   Epoch: 35   Global Step: 60720   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:36:55,163-Speed 13853.60 samples/sec   Loss 1.0199   LearningRate 0.0000   Epoch: 35   Global Step: 60730   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:37:12,901-Speed 13856.23 samples/sec   Loss 1.0179   LearningRate 0.0000   Epoch: 35   Global Step: 60740   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:37:30,620-Speed 13870.99 samples/sec   Loss 1.0161   LearningRate 0.0000   Epoch: 35   Global Step: 60750   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:37:48,354-Speed 13859.08 samples/sec   Loss 1.0150   LearningRate 0.0000   Epoch: 35   Global Step: 60760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:38:06,030-Speed 13904.40 samples/sec   Loss 1.0259   LearningRate 0.0000   Epoch: 35   Global Step: 60770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:38:23,731-Speed 13885.25 samples/sec   Loss 1.0140   LearningRate 0.0000   Epoch: 35   Global Step: 60780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:38:41,461-Speed 13862.01 samples/sec   Loss 1.0074   LearningRate 0.0000   Epoch: 35   Global Step: 60790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:38:59,107-Speed 13927.65 samples/sec   Loss 1.0138   LearningRate 0.0000   Epoch: 35   Global Step: 60800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:39:16,793-Speed 13896.83 samples/sec   Loss 1.0191   LearningRate 0.0000   Epoch: 35   Global Step: 60810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:39:34,543-Speed 13849.58 samples/sec   Loss 1.0180   LearningRate 0.0000   Epoch: 35   Global Step: 60820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:39:52,397-Speed 13766.68 samples/sec   Loss 1.0110   LearningRate 0.0000   Epoch: 35   Global Step: 60830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:40:10,227-Speed 13784.35 samples/sec   Loss 1.0153   LearningRate 0.0000   Epoch: 35   Global Step: 60840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:40:27,894-Speed 13910.84 samples/sec   Loss 1.0167   LearningRate 0.0000   Epoch: 35   Global Step: 60850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:40:45,559-Speed 13913.49 samples/sec   Loss 1.0212   LearningRate 0.0000   Epoch: 35   Global Step: 60860   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:41:03,252-Speed 13891.66 samples/sec   Loss 1.0160   LearningRate 0.0000   Epoch: 35   Global Step: 60870   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:41:20,938-Speed 13896.60 samples/sec   Loss 1.0235   LearningRate 0.0000   Epoch: 35   Global Step: 60880   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:41:38,589-Speed 13924.29 samples/sec   Loss 1.0222   LearningRate 0.0000   Epoch: 35   Global Step: 60890   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:41:56,320-Speed 13860.85 samples/sec   Loss 1.0109   LearningRate 0.0000   Epoch: 35   Global Step: 60900   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:42:13,983-Speed 13915.27 samples/sec   Loss 1.0147   LearningRate 0.0000   Epoch: 35   Global Step: 60910   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:42:31,682-Speed 13886.61 samples/sec   Loss 1.0117   LearningRate 0.0000   Epoch: 35   Global Step: 60920   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:42:49,360-Speed 13904.06 samples/sec   Loss 1.0145   LearningRate 0.0000   Epoch: 35   Global Step: 60930   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:43:07,151-Speed 13813.95 samples/sec   Loss 1.0144   LearningRate 0.0000   Epoch: 35   Global Step: 60940   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:43:24,834-Speed 13899.12 samples/sec   Loss 1.0191   LearningRate 0.0000   Epoch: 35   Global Step: 60950   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:43:42,544-Speed 13878.83 samples/sec   Loss 1.0136   LearningRate 0.0000   Epoch: 35   Global Step: 60960   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:44:00,271-Speed 13865.01 samples/sec   Loss 1.0119   LearningRate 0.0000   Epoch: 35   Global Step: 60970   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:44:17,960-Speed 13894.63 samples/sec   Loss 1.0149   LearningRate 0.0000   Epoch: 35   Global Step: 60980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:44:35,650-Speed 13893.39 samples/sec   Loss 1.0116   LearningRate 0.0000   Epoch: 35   Global Step: 60990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:44:53,442-Speed 13813.92 samples/sec   Loss 1.0274   LearningRate 0.0000   Epoch: 35   Global Step: 61000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:45:11,334-Speed 13736.90 samples/sec   Loss 1.0109   LearningRate 0.0000   Epoch: 35   Global Step: 61010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:45:29,158-Speed 13789.33 samples/sec   Loss 1.0229   LearningRate 0.0000   Epoch: 35   Global Step: 61020   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:45:46,872-Speed 13874.38 samples/sec   Loss 1.0068   LearningRate 0.0000   Epoch: 35   Global Step: 61030   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:46:04,627-Speed 13843.35 samples/sec   Loss 1.0101   LearningRate 0.0000   Epoch: 35   Global Step: 61040   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:46:22,364-Speed 13856.07 samples/sec   Loss 1.0080   LearningRate 0.0000   Epoch: 35   Global Step: 61050   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:46:40,067-Speed 13883.35 samples/sec   Loss 1.0142   LearningRate 0.0000   Epoch: 35   Global Step: 61060   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:46:57,741-Speed 13906.69 samples/sec   Loss 1.0131   LearningRate 0.0000   Epoch: 35   Global Step: 61070   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:47:15,417-Speed 13904.83 samples/sec   Loss 1.0080   LearningRate 0.0000   Epoch: 35   Global Step: 61080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:47:33,105-Speed 13894.89 samples/sec   Loss 1.0068   LearningRate 0.0000   Epoch: 35   Global Step: 61090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:47:50,835-Speed 13862.26 samples/sec   Loss 1.0098   LearningRate 0.0000   Epoch: 35   Global Step: 61100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:48:08,588-Speed 13844.16 samples/sec   Loss 1.0078   LearningRate 0.0000   Epoch: 35   Global Step: 61110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:48:26,397-Speed 13800.75 samples/sec   Loss 1.0151   LearningRate 0.0000   Epoch: 35   Global Step: 61120   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-04 14:48:44,130-Speed 13860.32 samples/sec   Loss 1.0156   LearningRate 0.0000   Epoch: 35   Global Step: 61130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:49:01,891-Speed 13837.73 samples/sec   Loss 1.0130   LearningRate 0.0000   Epoch: 35   Global Step: 61140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:49:19,629-Speed 13855.21 samples/sec   Loss 1.0112   LearningRate 0.0000   Epoch: 35   Global Step: 61150   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:49:37,420-Speed 13814.88 samples/sec   Loss 1.0139   LearningRate 0.0000   Epoch: 35   Global Step: 61160   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:49:55,211-Speed 13814.90 samples/sec   Loss 1.0174   LearningRate 0.0000   Epoch: 35   Global Step: 61170   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:50:12,971-Speed 13838.67 samples/sec   Loss 1.0116   LearningRate 0.0000   Epoch: 35   Global Step: 61180   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:50:30,846-Speed 13749.61 samples/sec   Loss 1.0158   LearningRate 0.0000   Epoch: 35   Global Step: 61190   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:50:48,652-Speed 13803.81 samples/sec   Loss 1.0187   LearningRate 0.0000   Epoch: 35   Global Step: 61200   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:51:06,349-Speed 13888.13 samples/sec   Loss 1.0108   LearningRate 0.0000   Epoch: 35   Global Step: 61210   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:51:24,106-Speed 13840.65 samples/sec   Loss 1.0149   LearningRate 0.0000   Epoch: 35   Global Step: 61220   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:51:41,971-Speed 13757.34 samples/sec   Loss 1.0042   LearningRate 0.0000   Epoch: 35   Global Step: 61230   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:51:59,735-Speed 13835.51 samples/sec   Loss 1.0157   LearningRate 0.0000   Epoch: 35   Global Step: 61240   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:52:17,465-Speed 13862.53 samples/sec   Loss 1.0098   LearningRate 0.0000   Epoch: 35   Global Step: 61250   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:52:35,188-Speed 13867.74 samples/sec   Loss 1.0131   LearningRate 0.0000   Epoch: 35   Global Step: 61260   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:52:52,945-Speed 13840.60 samples/sec   Loss 1.0073   LearningRate 0.0000   Epoch: 35   Global Step: 61270   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:53:10,715-Speed 13831.04 samples/sec   Loss 1.0127   LearningRate 0.0000   Epoch: 35   Global Step: 61280   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:53:28,478-Speed 13837.13 samples/sec   Loss 1.0081   LearningRate 0.0000   Epoch: 35   Global Step: 61290   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:53:46,210-Speed 13859.83 samples/sec   Loss 1.0133   LearningRate 0.0000   Epoch: 35   Global Step: 61300   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:54:04,110-Speed 13730.60 samples/sec   Loss 1.0022   LearningRate 0.0000   Epoch: 35   Global Step: 61310   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:54:21,967-Speed 13763.88 samples/sec   Loss 1.0102   LearningRate 0.0000   Epoch: 35   Global Step: 61320   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:54:39,673-Speed 13880.77 samples/sec   Loss 1.0122   LearningRate 0.0000   Epoch: 35   Global Step: 61330   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:54:57,394-Speed 13869.41 samples/sec   Loss 1.0088   LearningRate 0.0000   Epoch: 35   Global Step: 61340   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:55:15,268-Speed 13750.12 samples/sec   Loss 1.0003   LearningRate 0.0000   Epoch: 35   Global Step: 61350   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:55:32,979-Speed 13877.71 samples/sec   Loss 1.0034   LearningRate 0.0000   Epoch: 35   Global Step: 61360   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:55:50,759-Speed 13823.53 samples/sec   Loss 0.9977   LearningRate 0.0000   Epoch: 35   Global Step: 61370   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:56:08,454-Speed 13889.65 samples/sec   Loss 0.9994   LearningRate 0.0000   Epoch: 35   Global Step: 61380   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:56:26,217-Speed 13836.25 samples/sec   Loss 1.0008   LearningRate 0.0000   Epoch: 35   Global Step: 61390   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:56:43,940-Speed 13868.01 samples/sec   Loss 1.0007   LearningRate 0.0000   Epoch: 35   Global Step: 61400   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:57:01,673-Speed 13860.30 samples/sec   Loss 1.0075   LearningRate 0.0000   Epoch: 35   Global Step: 61410   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:57:19,455-Speed 13821.66 samples/sec   Loss 1.0012   LearningRate 0.0000   Epoch: 35   Global Step: 61420   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 14:57:37,163-Speed 13878.97 samples/sec   Loss 1.0126   LearningRate 0.0000   Epoch: 35   Global Step: 61430   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:57:54,920-Speed 13840.95 samples/sec   Loss 1.0178   LearningRate 0.0000   Epoch: 35   Global Step: 61440   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:58:12,649-Speed 13863.10 samples/sec   Loss 1.0124   LearningRate 0.0000   Epoch: 35   Global Step: 61450   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:58:30,416-Speed 13833.44 samples/sec   Loss 1.0130   LearningRate 0.0000   Epoch: 35   Global Step: 61460   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:58:48,173-Speed 13841.22 samples/sec   Loss 1.0072   LearningRate 0.0000   Epoch: 35   Global Step: 61470   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:59:05,992-Speed 13792.49 samples/sec   Loss 1.0060   LearningRate 0.0000   Epoch: 35   Global Step: 61480   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:59:23,677-Speed 13897.26 samples/sec   Loss 0.9988   LearningRate 0.0000   Epoch: 35   Global Step: 61490   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:59:41,406-Speed 13863.26 samples/sec   Loss 1.0087   LearningRate 0.0000   Epoch: 35   Global Step: 61500   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 14:59:59,135-Speed 13863.24 samples/sec   Loss 1.0049   LearningRate 0.0000   Epoch: 35   Global Step: 61510   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:00:16,857-Speed 13868.36 samples/sec   Loss 0.9945   LearningRate 0.0000   Epoch: 35   Global Step: 61520   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:00:34,591-Speed 13858.42 samples/sec   Loss 1.0158   LearningRate 0.0000   Epoch: 35   Global Step: 61530   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-04 15:00:52,314-Speed 13868.04 samples/sec   Loss 1.0045   LearningRate 0.0000   Epoch: 35   Global Step: 61540   Fp16 Grad Scale: 65536   Required: 4 hours
Training: 2022-03-04 15:01:10,026-Speed 13876.40 samples/sec   Loss 1.0084   LearningRate 0.0000   Epoch: 35   Global Step: 61550   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:01:27,748-Speed 13867.94 samples/sec   Loss 0.9982   LearningRate 0.0000   Epoch: 35   Global Step: 61560   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:01:45,482-Speed 13859.17 samples/sec   Loss 1.0120   LearningRate 0.0000   Epoch: 35   Global Step: 61570   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:02:03,278-Speed 13810.33 samples/sec   Loss 1.0096   LearningRate 0.0000   Epoch: 35   Global Step: 61580   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:02:21,024-Speed 13852.86 samples/sec   Loss 1.0061   LearningRate 0.0000   Epoch: 35   Global Step: 61590   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:02:38,757-Speed 13859.89 samples/sec   Loss 1.0112   LearningRate 0.0000   Epoch: 35   Global Step: 61600   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:02:56,520-Speed 13836.07 samples/sec   Loss 0.9982   LearningRate 0.0000   Epoch: 35   Global Step: 61610   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:03:14,169-Speed 13925.37 samples/sec   Loss 1.0004   LearningRate 0.0000   Epoch: 35   Global Step: 61620   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:03:31,806-Speed 13936.12 samples/sec   Loss 1.0052   LearningRate 0.0000   Epoch: 35   Global Step: 61630   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:03:49,526-Speed 13870.90 samples/sec   Loss 1.0115   LearningRate 0.0000   Epoch: 35   Global Step: 61640   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:04:07,282-Speed 13841.31 samples/sec   Loss 1.0162   LearningRate 0.0000   Epoch: 35   Global Step: 61650   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:04:24,977-Speed 13893.30 samples/sec   Loss 1.0119   LearningRate 0.0000   Epoch: 35   Global Step: 61660   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:04:42,674-Speed 13888.19 samples/sec   Loss 1.0052   LearningRate 0.0000   Epoch: 35   Global Step: 61670   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:05:00,401-Speed 13864.04 samples/sec   Loss 1.0030   LearningRate 0.0000   Epoch: 35   Global Step: 61680   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:05:18,077-Speed 13905.56 samples/sec   Loss 0.9968   LearningRate 0.0000   Epoch: 35   Global Step: 61690   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:05:35,820-Speed 13851.83 samples/sec   Loss 1.0036   LearningRate 0.0000   Epoch: 35   Global Step: 61700   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:05:53,554-Speed 13858.63 samples/sec   Loss 1.0036   LearningRate 0.0000   Epoch: 35   Global Step: 61710   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:06:11,397-Speed 13774.34 samples/sec   Loss 1.0073   LearningRate 0.0000   Epoch: 35   Global Step: 61720   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:06:29,192-Speed 13811.43 samples/sec   Loss 1.0104   LearningRate 0.0000   Epoch: 35   Global Step: 61730   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:06:46,989-Speed 13810.47 samples/sec   Loss 1.0025   LearningRate 0.0000   Epoch: 35   Global Step: 61740   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:07:04,729-Speed 13853.99 samples/sec   Loss 0.9985   LearningRate 0.0000   Epoch: 35   Global Step: 61750   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:07:22,469-Speed 13854.73 samples/sec   Loss 0.9989   LearningRate 0.0000   Epoch: 35   Global Step: 61760   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:07:40,166-Speed 13887.57 samples/sec   Loss 1.0014   LearningRate 0.0000   Epoch: 35   Global Step: 61770   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:07:57,933-Speed 13833.32 samples/sec   Loss 0.9960   LearningRate 0.0000   Epoch: 35   Global Step: 61780   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:08:15,741-Speed 13801.32 samples/sec   Loss 1.0067   LearningRate 0.0000   Epoch: 35   Global Step: 61790   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:08:33,475-Speed 13859.27 samples/sec   Loss 1.0043   LearningRate 0.0000   Epoch: 35   Global Step: 61800   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:08:51,314-Speed 13777.57 samples/sec   Loss 1.0043   LearningRate 0.0000   Epoch: 35   Global Step: 61810   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:09:09,085-Speed 13829.67 samples/sec   Loss 1.0079   LearningRate 0.0000   Epoch: 35   Global Step: 61820   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:09:26,864-Speed 13824.05 samples/sec   Loss 1.0048   LearningRate 0.0000   Epoch: 35   Global Step: 61830   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:09:44,628-Speed 13835.66 samples/sec   Loss 1.0030   LearningRate 0.0000   Epoch: 35   Global Step: 61840   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:10:02,466-Speed 13778.55 samples/sec   Loss 1.0022   LearningRate 0.0000   Epoch: 35   Global Step: 61850   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:10:20,334-Speed 13754.31 samples/sec   Loss 1.0110   LearningRate 0.0000   Epoch: 35   Global Step: 61860   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:10:38,187-Speed 13767.11 samples/sec   Loss 1.0031   LearningRate 0.0000   Epoch: 35   Global Step: 61870   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:10:55,958-Speed 13829.67 samples/sec   Loss 0.9954   LearningRate 0.0000   Epoch: 35   Global Step: 61880   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:11:13,823-Speed 13757.81 samples/sec   Loss 1.0045   LearningRate 0.0000   Epoch: 35   Global Step: 61890   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:11:31,695-Speed 13752.02 samples/sec   Loss 0.9974   LearningRate 0.0000   Epoch: 35   Global Step: 61900   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:11:49,502-Speed 13801.99 samples/sec   Loss 1.0028   LearningRate 0.0000   Epoch: 35   Global Step: 61910   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:12:07,369-Speed 13755.72 samples/sec   Loss 1.0012   LearningRate 0.0000   Epoch: 35   Global Step: 61920   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:12:25,314-Speed 13696.15 samples/sec   Loss 1.0043   LearningRate 0.0000   Epoch: 35   Global Step: 61930   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:12:43,238-Speed 13712.52 samples/sec   Loss 1.0043   LearningRate 0.0000   Epoch: 35   Global Step: 61940   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:13:01,012-Speed 13828.40 samples/sec   Loss 1.0028   LearningRate 0.0000   Epoch: 35   Global Step: 61950   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:13:18,782-Speed 13830.25 samples/sec   Loss 1.0112   LearningRate 0.0000   Epoch: 35   Global Step: 61960   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:13:36,653-Speed 13753.31 samples/sec   Loss 0.9980   LearningRate 0.0000   Epoch: 35   Global Step: 61970   Fp16 Grad Scale: 8192   Required: 4 hours
Training: 2022-03-04 15:13:54,544-Speed 13737.39 samples/sec   Loss 1.0065   LearningRate 0.0000   Epoch: 35   Global Step: 61980   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:14:12,402-Speed 13762.59 samples/sec   Loss 1.0010   LearningRate 0.0000   Epoch: 35   Global Step: 61990   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:14:30,228-Speed 13787.86 samples/sec   Loss 1.0014   LearningRate 0.0000   Epoch: 35   Global Step: 62000   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:14:48,095-Speed 13755.95 samples/sec   Loss 0.9999   LearningRate 0.0000   Epoch: 35   Global Step: 62010   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:15:05,949-Speed 13765.66 samples/sec   Loss 1.0092   LearningRate 0.0000   Epoch: 35   Global Step: 62020   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:15:23,818-Speed 13754.47 samples/sec   Loss 0.9971   LearningRate 0.0000   Epoch: 35   Global Step: 62030   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:15:41,682-Speed 13758.50 samples/sec   Loss 1.0114   LearningRate 0.0000   Epoch: 35   Global Step: 62040   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:15:59,559-Speed 13748.50 samples/sec   Loss 0.9939   LearningRate 0.0000   Epoch: 35   Global Step: 62050   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:16:17,381-Speed 13790.76 samples/sec   Loss 1.0027   LearningRate 0.0000   Epoch: 35   Global Step: 62060   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:16:35,240-Speed 13762.19 samples/sec   Loss 0.9978   LearningRate 0.0000   Epoch: 35   Global Step: 62070   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:16:53,170-Speed 13707.00 samples/sec   Loss 1.0072   LearningRate 0.0000   Epoch: 35   Global Step: 62080   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:17:10,964-Speed 13812.13 samples/sec   Loss 1.0089   LearningRate 0.0000   Epoch: 35   Global Step: 62090   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:17:28,801-Speed 13779.63 samples/sec   Loss 0.9988   LearningRate 0.0000   Epoch: 35   Global Step: 62100   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:17:46,638-Speed 13778.95 samples/sec   Loss 1.0025   LearningRate 0.0000   Epoch: 35   Global Step: 62110   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:18:04,487-Speed 13771.07 samples/sec   Loss 1.0013   LearningRate 0.0000   Epoch: 35   Global Step: 62120   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:18:22,382-Speed 13734.19 samples/sec   Loss 1.0005   LearningRate 0.0000   Epoch: 35   Global Step: 62130   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:18:40,238-Speed 13764.97 samples/sec   Loss 1.0023   LearningRate 0.0000   Epoch: 35   Global Step: 62140   Fp16 Grad Scale: 32768   Required: 4 hours
Training: 2022-03-04 15:18:58,019-Speed 13821.91 samples/sec   Loss 1.0011   LearningRate 0.0000   Epoch: 35   Global Step: 62150   Fp16 Grad Scale: 16384   Required: 4 hours
Training: 2022-03-04 15:19:15,904-Speed 13742.49 samples/sec   Loss 1.0065   LearningRate 0.0000   Epoch: 35   Global Step: 62160   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:19:33,720-Speed 13794.44 samples/sec   Loss 1.0041   LearningRate 0.0000   Epoch: 35   Global Step: 62170   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:19:51,515-Speed 13811.95 samples/sec   Loss 1.0022   LearningRate 0.0000   Epoch: 35   Global Step: 62180   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:20:09,346-Speed 13783.90 samples/sec   Loss 1.0019   LearningRate 0.0000   Epoch: 35   Global Step: 62190   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:20:27,277-Speed 13706.15 samples/sec   Loss 1.0037   LearningRate 0.0000   Epoch: 35   Global Step: 62200   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:20:45,171-Speed 13734.95 samples/sec   Loss 0.9951   LearningRate 0.0000   Epoch: 35   Global Step: 62210   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:21:53,981-Speed 3571.67 samples/sec   Loss 1.0056   LearningRate 0.0000   Epoch: 36   Global Step: 62220   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:22:11,757-Speed 13825.93 samples/sec   Loss 0.9978   LearningRate 0.0000   Epoch: 36   Global Step: 62230   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:22:29,514-Speed 13840.76 samples/sec   Loss 0.9970   LearningRate 0.0000   Epoch: 36   Global Step: 62240   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:22:47,343-Speed 13785.02 samples/sec   Loss 1.0043   LearningRate 0.0000   Epoch: 36   Global Step: 62250   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:23:05,172-Speed 13785.73 samples/sec   Loss 1.0038   LearningRate 0.0000   Epoch: 36   Global Step: 62260   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:23:22,955-Speed 13822.05 samples/sec   Loss 0.9985   LearningRate 0.0000   Epoch: 36   Global Step: 62270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:23:40,741-Speed 13818.18 samples/sec   Loss 0.9916   LearningRate 0.0000   Epoch: 36   Global Step: 62280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:23:58,576-Speed 13780.46 samples/sec   Loss 0.9994   LearningRate 0.0000   Epoch: 36   Global Step: 62290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:24:16,366-Speed 13815.26 samples/sec   Loss 0.9995   LearningRate 0.0000   Epoch: 36   Global Step: 62300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:24:34,182-Speed 13795.06 samples/sec   Loss 0.9983   LearningRate 0.0000   Epoch: 36   Global Step: 62310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:24:52,188-Speed 13650.13 samples/sec   Loss 1.0022   LearningRate 0.0000   Epoch: 36   Global Step: 62320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:25:09,987-Speed 13808.30 samples/sec   Loss 0.9900   LearningRate 0.0000   Epoch: 36   Global Step: 62330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:25:27,829-Speed 13775.02 samples/sec   Loss 0.9957   LearningRate 0.0000   Epoch: 36   Global Step: 62340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:25:45,622-Speed 13813.44 samples/sec   Loss 0.9891   LearningRate 0.0000   Epoch: 36   Global Step: 62350   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 15:26:03,429-Speed 13802.65 samples/sec   Loss 1.0042   LearningRate 0.0000   Epoch: 36   Global Step: 62360   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:26:21,261-Speed 13782.39 samples/sec   Loss 0.9983   LearningRate 0.0000   Epoch: 36   Global Step: 62370   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:26:39,017-Speed 13841.56 samples/sec   Loss 0.9962   LearningRate 0.0000   Epoch: 36   Global Step: 62380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:26:56,836-Speed 13793.28 samples/sec   Loss 0.9940   LearningRate 0.0000   Epoch: 36   Global Step: 62390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:27:14,618-Speed 13821.50 samples/sec   Loss 0.9996   LearningRate 0.0000   Epoch: 36   Global Step: 62400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:27:32,570-Speed 13690.99 samples/sec   Loss 0.9955   LearningRate 0.0000   Epoch: 36   Global Step: 62410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:27:50,357-Speed 13817.78 samples/sec   Loss 0.9929   LearningRate 0.0000   Epoch: 36   Global Step: 62420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:28:08,247-Speed 13738.33 samples/sec   Loss 0.9993   LearningRate 0.0000   Epoch: 36   Global Step: 62430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:28:26,157-Speed 13722.52 samples/sec   Loss 1.0016   LearningRate 0.0000   Epoch: 36   Global Step: 62440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:28:43,948-Speed 13814.90 samples/sec   Loss 0.9898   LearningRate 0.0000   Epoch: 36   Global Step: 62450   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:29:01,782-Speed 13781.05 samples/sec   Loss 0.9995   LearningRate 0.0000   Epoch: 36   Global Step: 62460   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:29:19,567-Speed 13819.07 samples/sec   Loss 0.9920   LearningRate 0.0000   Epoch: 36   Global Step: 62470   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:29:37,482-Speed 13719.28 samples/sec   Loss 0.9973   LearningRate 0.0000   Epoch: 36   Global Step: 62480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:29:55,368-Speed 13741.05 samples/sec   Loss 0.9952   LearningRate 0.0000   Epoch: 36   Global Step: 62490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:30:13,175-Speed 13801.88 samples/sec   Loss 0.9951   LearningRate 0.0000   Epoch: 36   Global Step: 62500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:30:31,038-Speed 13759.48 samples/sec   Loss 0.9968   LearningRate 0.0000   Epoch: 36   Global Step: 62510   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:30:49,034-Speed 13657.57 samples/sec   Loss 0.9958   LearningRate 0.0000   Epoch: 36   Global Step: 62520   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:31:06,900-Speed 13756.34 samples/sec   Loss 1.0033   LearningRate 0.0000   Epoch: 36   Global Step: 62530   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:31:24,678-Speed 13824.97 samples/sec   Loss 0.9942   LearningRate 0.0000   Epoch: 36   Global Step: 62540   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:31:42,611-Speed 13704.89 samples/sec   Loss 1.0032   LearningRate 0.0000   Epoch: 36   Global Step: 62550   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:32:00,492-Speed 13745.13 samples/sec   Loss 0.9985   LearningRate 0.0000   Epoch: 36   Global Step: 62560   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:32:18,320-Speed 13785.71 samples/sec   Loss 0.9967   LearningRate 0.0000   Epoch: 36   Global Step: 62570   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:32:36,212-Speed 13736.46 samples/sec   Loss 0.9989   LearningRate 0.0000   Epoch: 36   Global Step: 62580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:32:54,061-Speed 13769.86 samples/sec   Loss 0.9989   LearningRate 0.0000   Epoch: 36   Global Step: 62590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:33:11,945-Speed 13744.34 samples/sec   Loss 1.0012   LearningRate 0.0000   Epoch: 36   Global Step: 62600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:33:29,862-Speed 13717.49 samples/sec   Loss 0.9948   LearningRate 0.0000   Epoch: 36   Global Step: 62610   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:33:47,813-Speed 13691.89 samples/sec   Loss 0.9943   LearningRate 0.0000   Epoch: 36   Global Step: 62620   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:34:05,766-Speed 13689.64 samples/sec   Loss 0.9982   LearningRate 0.0000   Epoch: 36   Global Step: 62630   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:34:23,567-Speed 13806.93 samples/sec   Loss 0.9974   LearningRate 0.0000   Epoch: 36   Global Step: 62640   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:34:41,403-Speed 13780.03 samples/sec   Loss 0.9881   LearningRate 0.0000   Epoch: 36   Global Step: 62650   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:34:59,280-Speed 13747.72 samples/sec   Loss 1.0004   LearningRate 0.0000   Epoch: 36   Global Step: 62660   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:35:17,322-Speed 13622.80 samples/sec   Loss 0.9981   LearningRate 0.0000   Epoch: 36   Global Step: 62670   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:35:35,157-Speed 13780.77 samples/sec   Loss 1.0001   LearningRate 0.0000   Epoch: 36   Global Step: 62680   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:35:53,058-Speed 13729.16 samples/sec   Loss 0.9995   LearningRate 0.0000   Epoch: 36   Global Step: 62690   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:36:11,023-Speed 13683.32 samples/sec   Loss 0.9932   LearningRate 0.0000   Epoch: 36   Global Step: 62700   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:36:28,875-Speed 13766.58 samples/sec   Loss 0.9853   LearningRate 0.0000   Epoch: 36   Global Step: 62710   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 15:36:46,671-Speed 13811.40 samples/sec   Loss 0.9944   LearningRate 0.0000   Epoch: 36   Global Step: 62720   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 15:37:04,554-Speed 13743.06 samples/sec   Loss 0.9929   LearningRate 0.0000   Epoch: 36   Global Step: 62730   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:37:22,347-Speed 13813.64 samples/sec   Loss 0.9956   LearningRate 0.0000   Epoch: 36   Global Step: 62740   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:37:40,178-Speed 13783.10 samples/sec   Loss 1.0034   LearningRate 0.0000   Epoch: 36   Global Step: 62750   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:37:57,962-Speed 13819.88 samples/sec   Loss 0.9917   LearningRate 0.0000   Epoch: 36   Global Step: 62760   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:38:15,865-Speed 13728.62 samples/sec   Loss 0.9950   LearningRate 0.0000   Epoch: 36   Global Step: 62770   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:38:33,774-Speed 13723.05 samples/sec   Loss 0.9872   LearningRate 0.0000   Epoch: 36   Global Step: 62780   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:38:51,732-Speed 13686.17 samples/sec   Loss 0.9965   LearningRate 0.0000   Epoch: 36   Global Step: 62790   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:39:09,584-Speed 13769.64 samples/sec   Loss 0.9950   LearningRate 0.0000   Epoch: 36   Global Step: 62800   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:39:27,408-Speed 13789.55 samples/sec   Loss 0.9920   LearningRate 0.0000   Epoch: 36   Global Step: 62810   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:39:45,409-Speed 13653.51 samples/sec   Loss 0.9926   LearningRate 0.0000   Epoch: 36   Global Step: 62820   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:40:03,293-Speed 13742.88 samples/sec   Loss 0.9998   LearningRate 0.0000   Epoch: 36   Global Step: 62830   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:40:21,205-Speed 13721.18 samples/sec   Loss 0.9911   LearningRate 0.0000   Epoch: 36   Global Step: 62840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:40:39,055-Speed 13768.71 samples/sec   Loss 0.9880   LearningRate 0.0000   Epoch: 36   Global Step: 62850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:40:56,982-Speed 13710.04 samples/sec   Loss 0.9941   LearningRate 0.0000   Epoch: 36   Global Step: 62860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:41:14,763-Speed 13821.98 samples/sec   Loss 0.9900   LearningRate 0.0000   Epoch: 36   Global Step: 62870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:41:32,545-Speed 13822.14 samples/sec   Loss 1.0052   LearningRate 0.0000   Epoch: 36   Global Step: 62880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:41:50,458-Speed 13720.42 samples/sec   Loss 0.9980   LearningRate 0.0000   Epoch: 36   Global Step: 62890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:42:08,299-Speed 13775.75 samples/sec   Loss 0.9898   LearningRate 0.0000   Epoch: 36   Global Step: 62900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:42:26,263-Speed 13681.94 samples/sec   Loss 0.9939   LearningRate 0.0000   Epoch: 36   Global Step: 62910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:42:44,140-Speed 13747.95 samples/sec   Loss 0.9964   LearningRate 0.0000   Epoch: 36   Global Step: 62920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:43:02,027-Speed 13740.81 samples/sec   Loss 1.0020   LearningRate 0.0000   Epoch: 36   Global Step: 62930   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 15:43:19,824-Speed 13809.91 samples/sec   Loss 0.9926   LearningRate 0.0000   Epoch: 36   Global Step: 62940   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:43:37,646-Speed 13790.33 samples/sec   Loss 0.9944   LearningRate 0.0000   Epoch: 36   Global Step: 62950   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:43:55,533-Speed 13740.22 samples/sec   Loss 0.9919   LearningRate 0.0000   Epoch: 36   Global Step: 62960   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:44:13,352-Speed 13793.16 samples/sec   Loss 0.9901   LearningRate 0.0000   Epoch: 36   Global Step: 62970   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:44:31,273-Speed 13714.25 samples/sec   Loss 0.9936   LearningRate 0.0000   Epoch: 36   Global Step: 62980   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:44:49,134-Speed 13760.79 samples/sec   Loss 0.9934   LearningRate 0.0000   Epoch: 36   Global Step: 62990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:45:06,995-Speed 13759.72 samples/sec   Loss 0.9920   LearningRate 0.0000   Epoch: 36   Global Step: 63000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:45:24,837-Speed 13775.16 samples/sec   Loss 0.9947   LearningRate 0.0000   Epoch: 36   Global Step: 63010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:45:42,737-Speed 13731.16 samples/sec   Loss 0.9908   LearningRate 0.0000   Epoch: 36   Global Step: 63020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:46:00,571-Speed 13781.32 samples/sec   Loss 0.9923   LearningRate 0.0000   Epoch: 36   Global Step: 63030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:46:18,398-Speed 13786.76 samples/sec   Loss 0.9973   LearningRate 0.0000   Epoch: 36   Global Step: 63040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:46:36,334-Speed 13702.46 samples/sec   Loss 0.9837   LearningRate 0.0000   Epoch: 36   Global Step: 63050   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:46:54,191-Speed 13763.67 samples/sec   Loss 0.9912   LearningRate 0.0000   Epoch: 36   Global Step: 63060   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:47:11,971-Speed 13823.43 samples/sec   Loss 0.9893   LearningRate 0.0000   Epoch: 36   Global Step: 63070   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:47:29,796-Speed 13787.80 samples/sec   Loss 0.9929   LearningRate 0.0000   Epoch: 36   Global Step: 63080   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:47:47,645-Speed 13769.85 samples/sec   Loss 0.9791   LearningRate 0.0000   Epoch: 36   Global Step: 63090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:48:05,536-Speed 13737.75 samples/sec   Loss 0.9889   LearningRate 0.0000   Epoch: 36   Global Step: 63100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:48:23,352-Speed 13795.03 samples/sec   Loss 0.9913   LearningRate 0.0000   Epoch: 36   Global Step: 63110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:48:41,213-Speed 13760.40 samples/sec   Loss 0.9920   LearningRate 0.0000   Epoch: 36   Global Step: 63120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:48:59,027-Speed 13796.77 samples/sec   Loss 0.9859   LearningRate 0.0000   Epoch: 36   Global Step: 63130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:49:16,856-Speed 13785.54 samples/sec   Loss 0.9826   LearningRate 0.0000   Epoch: 36   Global Step: 63140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:49:34,691-Speed 13780.83 samples/sec   Loss 0.9830   LearningRate 0.0000   Epoch: 36   Global Step: 63150   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:49:52,557-Speed 13756.57 samples/sec   Loss 0.9872   LearningRate 0.0000   Epoch: 36   Global Step: 63160   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:50:10,392-Speed 13781.18 samples/sec   Loss 0.9828   LearningRate 0.0000   Epoch: 36   Global Step: 63170   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:50:28,258-Speed 13756.70 samples/sec   Loss 0.9937   LearningRate 0.0000   Epoch: 36   Global Step: 63180   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:50:46,101-Speed 13774.67 samples/sec   Loss 0.9856   LearningRate 0.0000   Epoch: 36   Global Step: 63190   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:51:04,022-Speed 13713.82 samples/sec   Loss 0.9891   LearningRate 0.0000   Epoch: 36   Global Step: 63200   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:51:21,836-Speed 13796.82 samples/sec   Loss 0.9862   LearningRate 0.0000   Epoch: 36   Global Step: 63210   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:51:39,665-Speed 13785.87 samples/sec   Loss 0.9902   LearningRate 0.0000   Epoch: 36   Global Step: 63220   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:51:57,411-Speed 13849.09 samples/sec   Loss 0.9872   LearningRate 0.0000   Epoch: 36   Global Step: 63230   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:52:15,179-Speed 13832.80 samples/sec   Loss 0.9920   LearningRate 0.0000   Epoch: 36   Global Step: 63240   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:52:33,118-Speed 13700.31 samples/sec   Loss 0.9864   LearningRate 0.0000   Epoch: 36   Global Step: 63250   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 15:52:50,888-Speed 13830.93 samples/sec   Loss 0.9956   LearningRate 0.0000   Epoch: 36   Global Step: 63260   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 15:53:08,726-Speed 13778.24 samples/sec   Loss 0.9925   LearningRate 0.0000   Epoch: 36   Global Step: 63270   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:53:26,495-Speed 13831.58 samples/sec   Loss 0.9882   LearningRate 0.0000   Epoch: 36   Global Step: 63280   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:53:44,248-Speed 13844.82 samples/sec   Loss 0.9970   LearningRate 0.0000   Epoch: 36   Global Step: 63290   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:54:02,143-Speed 13734.42 samples/sec   Loss 0.9903   LearningRate 0.0000   Epoch: 36   Global Step: 63300   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:54:19,880-Speed 13856.35 samples/sec   Loss 0.9939   LearningRate 0.0000   Epoch: 36   Global Step: 63310   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:54:37,685-Speed 13803.42 samples/sec   Loss 0.9891   LearningRate 0.0000   Epoch: 36   Global Step: 63320   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:54:55,524-Speed 13777.63 samples/sec   Loss 0.9968   LearningRate 0.0000   Epoch: 36   Global Step: 63330   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:55:13,358-Speed 13781.63 samples/sec   Loss 0.9900   LearningRate 0.0000   Epoch: 36   Global Step: 63340   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:55:31,140-Speed 13821.63 samples/sec   Loss 0.9900   LearningRate 0.0000   Epoch: 36   Global Step: 63350   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:55:48,962-Speed 13790.76 samples/sec   Loss 0.9866   LearningRate 0.0000   Epoch: 36   Global Step: 63360   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:56:06,790-Speed 13786.10 samples/sec   Loss 0.9960   LearningRate 0.0000   Epoch: 36   Global Step: 63370   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:56:24,650-Speed 13761.80 samples/sec   Loss 0.9916   LearningRate 0.0000   Epoch: 36   Global Step: 63380   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:56:42,561-Speed 13721.93 samples/sec   Loss 0.9896   LearningRate 0.0000   Epoch: 36   Global Step: 63390   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:57:00,366-Speed 13803.44 samples/sec   Loss 0.9853   LearningRate 0.0000   Epoch: 36   Global Step: 63400   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:57:18,136-Speed 13830.97 samples/sec   Loss 0.9856   LearningRate 0.0000   Epoch: 36   Global Step: 63410   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:57:35,908-Speed 13829.21 samples/sec   Loss 0.9938   LearningRate 0.0000   Epoch: 36   Global Step: 63420   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:57:53,680-Speed 13829.59 samples/sec   Loss 0.9923   LearningRate 0.0000   Epoch: 36   Global Step: 63430   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:58:11,531-Speed 13768.03 samples/sec   Loss 0.9849   LearningRate 0.0000   Epoch: 36   Global Step: 63440   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 15:58:29,384-Speed 13766.44 samples/sec   Loss 0.9791   LearningRate 0.0000   Epoch: 36   Global Step: 63450   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:58:47,193-Speed 13801.14 samples/sec   Loss 0.9835   LearningRate 0.0000   Epoch: 36   Global Step: 63460   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:59:05,054-Speed 13760.30 samples/sec   Loss 0.9831   LearningRate 0.0000   Epoch: 36   Global Step: 63470   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:59:22,852-Speed 13809.41 samples/sec   Loss 0.9861   LearningRate 0.0000   Epoch: 36   Global Step: 63480   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:59:40,686-Speed 13781.13 samples/sec   Loss 0.9829   LearningRate 0.0000   Epoch: 36   Global Step: 63490   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 15:59:58,707-Speed 13638.03 samples/sec   Loss 0.9880   LearningRate 0.0000   Epoch: 36   Global Step: 63500   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:00:17,004-Speed 13433.30 samples/sec   Loss 0.9931   LearningRate 0.0000   Epoch: 36   Global Step: 63510   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:00:34,825-Speed 13790.84 samples/sec   Loss 0.9758   LearningRate 0.0000   Epoch: 36   Global Step: 63520   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:00:52,644-Speed 13792.76 samples/sec   Loss 0.9905   LearningRate 0.0000   Epoch: 36   Global Step: 63530   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:01:10,482-Speed 13779.21 samples/sec   Loss 0.9851   LearningRate 0.0000   Epoch: 36   Global Step: 63540   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:01:28,270-Speed 13817.38 samples/sec   Loss 0.9873   LearningRate 0.0000   Epoch: 36   Global Step: 63550   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 16:01:46,151-Speed 13744.56 samples/sec   Loss 0.9884   LearningRate 0.0000   Epoch: 36   Global Step: 63560   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 16:02:03,933-Speed 13821.63 samples/sec   Loss 0.9799   LearningRate 0.0000   Epoch: 36   Global Step: 63570   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:02:21,781-Speed 13771.20 samples/sec   Loss 0.9864   LearningRate 0.0000   Epoch: 36   Global Step: 63580   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:02:39,609-Speed 13786.61 samples/sec   Loss 0.9913   LearningRate 0.0000   Epoch: 36   Global Step: 63590   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:02:57,469-Speed 13761.74 samples/sec   Loss 0.9858   LearningRate 0.0000   Epoch: 36   Global Step: 63600   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:03:15,274-Speed 13803.32 samples/sec   Loss 0.9827   LearningRate 0.0000   Epoch: 36   Global Step: 63610   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:03:33,058-Speed 13819.67 samples/sec   Loss 0.9892   LearningRate 0.0000   Epoch: 36   Global Step: 63620   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:03:50,886-Speed 13786.66 samples/sec   Loss 0.9940   LearningRate 0.0000   Epoch: 36   Global Step: 63630   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:04:08,793-Speed 13725.36 samples/sec   Loss 0.9868   LearningRate 0.0000   Epoch: 36   Global Step: 63640   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:04:26,638-Speed 13772.99 samples/sec   Loss 0.9861   LearningRate 0.0000   Epoch: 36   Global Step: 63650   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:04:44,439-Speed 13807.79 samples/sec   Loss 0.9916   LearningRate 0.0000   Epoch: 36   Global Step: 63660   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:05:02,225-Speed 13818.40 samples/sec   Loss 0.9845   LearningRate 0.0000   Epoch: 36   Global Step: 63670   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:05:19,988-Speed 13836.21 samples/sec   Loss 0.9865   LearningRate 0.0000   Epoch: 36   Global Step: 63680   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:05:37,814-Speed 13787.06 samples/sec   Loss 0.9798   LearningRate 0.0000   Epoch: 36   Global Step: 63690   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:05:55,675-Speed 13760.87 samples/sec   Loss 0.9850   LearningRate 0.0000   Epoch: 36   Global Step: 63700   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:06:13,509-Speed 13781.06 samples/sec   Loss 0.9842   LearningRate 0.0000   Epoch: 36   Global Step: 63710   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:06:31,368-Speed 13761.69 samples/sec   Loss 0.9892   LearningRate 0.0000   Epoch: 36   Global Step: 63720   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:06:49,138-Speed 13831.53 samples/sec   Loss 0.9892   LearningRate 0.0000   Epoch: 36   Global Step: 63730   Fp16 Grad Scale: 8192   Required: 3 hours
Training: 2022-03-04 16:07:07,047-Speed 13723.45 samples/sec   Loss 0.9831   LearningRate 0.0000   Epoch: 36   Global Step: 63740   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:07:25,280-Speed 13479.47 samples/sec   Loss 0.9934   LearningRate 0.0000   Epoch: 36   Global Step: 63750   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:07:43,545-Speed 13456.37 samples/sec   Loss 0.9788   LearningRate 0.0000   Epoch: 36   Global Step: 63760   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:08:01,373-Speed 13787.10 samples/sec   Loss 0.9979   LearningRate 0.0000   Epoch: 36   Global Step: 63770   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:08:19,297-Speed 13712.34 samples/sec   Loss 0.9839   LearningRate 0.0000   Epoch: 36   Global Step: 63780   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:08:37,183-Speed 13741.01 samples/sec   Loss 0.9860   LearningRate 0.0000   Epoch: 36   Global Step: 63790   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:08:55,073-Speed 13738.11 samples/sec   Loss 0.9829   LearningRate 0.0000   Epoch: 36   Global Step: 63800   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:09:12,918-Speed 13773.33 samples/sec   Loss 0.9857   LearningRate 0.0000   Epoch: 36   Global Step: 63810   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:09:30,799-Speed 13744.58 samples/sec   Loss 0.9826   LearningRate 0.0000   Epoch: 36   Global Step: 63820   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:09:48,589-Speed 13815.61 samples/sec   Loss 0.9834   LearningRate 0.0000   Epoch: 36   Global Step: 63830   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:10:06,406-Speed 13795.69 samples/sec   Loss 0.9896   LearningRate 0.0000   Epoch: 36   Global Step: 63840   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:10:24,232-Speed 13787.45 samples/sec   Loss 0.9855   LearningRate 0.0000   Epoch: 36   Global Step: 63850   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:10:41,972-Speed 13854.30 samples/sec   Loss 0.9919   LearningRate 0.0000   Epoch: 36   Global Step: 63860   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:10:59,775-Speed 13805.82 samples/sec   Loss 0.9830   LearningRate 0.0000   Epoch: 36   Global Step: 63870   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:11:17,648-Speed 13751.06 samples/sec   Loss 0.9884   LearningRate 0.0000   Epoch: 36   Global Step: 63880   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:11:35,489-Speed 13776.02 samples/sec   Loss 0.9850   LearningRate 0.0000   Epoch: 36   Global Step: 63890   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:11:53,352-Speed 13758.55 samples/sec   Loss 0.9878   LearningRate 0.0000   Epoch: 36   Global Step: 63900   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:12:11,235-Speed 13744.94 samples/sec   Loss 0.9913   LearningRate 0.0000   Epoch: 36   Global Step: 63910   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:12:29,233-Speed 13655.70 samples/sec   Loss 0.9869   LearningRate 0.0000   Epoch: 36   Global Step: 63920   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:12:47,174-Speed 13699.10 samples/sec   Loss 0.9858   LearningRate 0.0000   Epoch: 36   Global Step: 63930   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:13:05,012-Speed 13778.57 samples/sec   Loss 0.9800   LearningRate 0.0000   Epoch: 36   Global Step: 63940   Fp16 Grad Scale: 65536   Required: 3 hours
Training: 2022-03-04 16:14:13,298-Speed 3599.06 samples/sec   Loss 0.9862   LearningRate 0.0000   Epoch: 37   Global Step: 63950   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:14:31,001-Speed 13883.08 samples/sec   Loss 0.9854   LearningRate 0.0000   Epoch: 37   Global Step: 63960   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:14:48,695-Speed 13890.11 samples/sec   Loss 0.9822   LearningRate 0.0000   Epoch: 37   Global Step: 63970   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:15:06,537-Speed 13775.27 samples/sec   Loss 0.9888   LearningRate 0.0000   Epoch: 37   Global Step: 63980   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:15:24,416-Speed 13746.43 samples/sec   Loss 0.9872   LearningRate 0.0000   Epoch: 37   Global Step: 63990   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:15:42,276-Speed 13763.42 samples/sec   Loss 0.9749   LearningRate 0.0000   Epoch: 37   Global Step: 64000   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:16:00,054-Speed 13825.03 samples/sec   Loss 0.9876   LearningRate 0.0000   Epoch: 37   Global Step: 64010   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:16:17,808-Speed 13844.12 samples/sec   Loss 0.9877   LearningRate 0.0000   Epoch: 37   Global Step: 64020   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:16:35,644-Speed 13779.96 samples/sec   Loss 0.9750   LearningRate 0.0000   Epoch: 37   Global Step: 64030   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:16:54,222-Speed 13229.01 samples/sec   Loss 0.9845   LearningRate 0.0000   Epoch: 37   Global Step: 64040   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:17:11,986-Speed 13836.31 samples/sec   Loss 0.9830   LearningRate 0.0000   Epoch: 37   Global Step: 64050   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:17:29,826-Speed 13776.33 samples/sec   Loss 0.9881   LearningRate 0.0000   Epoch: 37   Global Step: 64060   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:17:48,356-Speed 13263.87 samples/sec   Loss 0.9832   LearningRate 0.0000   Epoch: 37   Global Step: 64070   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:18:06,196-Speed 13776.61 samples/sec   Loss 0.9880   LearningRate 0.0000   Epoch: 37   Global Step: 64080   Fp16 Grad Scale: 16384   Required: 3 hours
Training: 2022-03-04 16:18:24,025-Speed 13785.68 samples/sec   Loss 0.9842   LearningRate 0.0000   Epoch: 37   Global Step: 64090   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:18:41,824-Speed 13808.29 samples/sec   Loss 0.9741   LearningRate 0.0000   Epoch: 37   Global Step: 64100   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:18:59,612-Speed 13818.16 samples/sec   Loss 0.9765   LearningRate 0.0000   Epoch: 37   Global Step: 64110   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:19:17,562-Speed 13692.24 samples/sec   Loss 0.9784   LearningRate 0.0000   Epoch: 37   Global Step: 64120   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:19:35,360-Speed 13809.53 samples/sec   Loss 0.9814   LearningRate 0.0000   Epoch: 37   Global Step: 64130   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:19:53,175-Speed 13796.37 samples/sec   Loss 0.9799   LearningRate 0.0000   Epoch: 37   Global Step: 64140   Fp16 Grad Scale: 32768   Required: 3 hours
Training: 2022-03-04 16:20:10,992-Speed 13794.02 samples/sec   Loss 0.9811   LearningRate 0.0000   Epoch: 37   Global Step: 64150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:20:28,904-Speed 13721.35 samples/sec   Loss 0.9782   LearningRate 0.0000   Epoch: 37   Global Step: 64160   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:20:46,673-Speed 13831.98 samples/sec   Loss 0.9825   LearningRate 0.0000   Epoch: 37   Global Step: 64170   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:21:04,446-Speed 13828.91 samples/sec   Loss 0.9722   LearningRate 0.0000   Epoch: 37   Global Step: 64180   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:21:22,327-Speed 13744.93 samples/sec   Loss 0.9732   LearningRate 0.0000   Epoch: 37   Global Step: 64190   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-04 16:21:40,164-Speed 13779.12 samples/sec   Loss 0.9855   LearningRate 0.0000   Epoch: 37   Global Step: 64200   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:21:57,883-Speed 13871.03 samples/sec   Loss 0.9809   LearningRate 0.0000   Epoch: 37   Global Step: 64210   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:22:15,634-Speed 13845.80 samples/sec   Loss 0.9790   LearningRate 0.0000   Epoch: 37   Global Step: 64220   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:22:33,444-Speed 13799.55 samples/sec   Loss 0.9840   LearningRate 0.0000   Epoch: 37   Global Step: 64230   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:22:51,223-Speed 13823.76 samples/sec   Loss 0.9737   LearningRate 0.0000   Epoch: 37   Global Step: 64240   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:23:09,018-Speed 13811.02 samples/sec   Loss 0.9826   LearningRate 0.0000   Epoch: 37   Global Step: 64250   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:23:26,728-Speed 13878.37 samples/sec   Loss 0.9921   LearningRate 0.0000   Epoch: 37   Global Step: 64260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:23:44,470-Speed 13852.44 samples/sec   Loss 0.9799   LearningRate 0.0000   Epoch: 37   Global Step: 64270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:24:02,293-Speed 13790.43 samples/sec   Loss 0.9836   LearningRate 0.0000   Epoch: 37   Global Step: 64280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:24:20,141-Speed 13771.95 samples/sec   Loss 0.9905   LearningRate 0.0000   Epoch: 37   Global Step: 64290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:24:37,955-Speed 13797.47 samples/sec   Loss 0.9902   LearningRate 0.0000   Epoch: 37   Global Step: 64300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:24:55,788-Speed 13781.62 samples/sec   Loss 0.9840   LearningRate 0.0000   Epoch: 37   Global Step: 64310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:25:13,719-Speed 13706.86 samples/sec   Loss 0.9908   LearningRate 0.0000   Epoch: 37   Global Step: 64320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:25:31,561-Speed 13775.39 samples/sec   Loss 0.9792   LearningRate 0.0000   Epoch: 37   Global Step: 64330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:25:50,140-Speed 13235.30 samples/sec   Loss 0.9845   LearningRate 0.0000   Epoch: 37   Global Step: 64340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:26:07,911-Speed 13830.22 samples/sec   Loss 0.9828   LearningRate 0.0000   Epoch: 37   Global Step: 64350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:26:25,770-Speed 13762.12 samples/sec   Loss 0.9751   LearningRate 0.0000   Epoch: 37   Global Step: 64360   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:26:43,570-Speed 13809.10 samples/sec   Loss 0.9856   LearningRate 0.0000   Epoch: 37   Global Step: 64370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:27:01,325-Speed 13844.12 samples/sec   Loss 0.9779   LearningRate 0.0000   Epoch: 37   Global Step: 64380   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:27:19,864-Speed 13257.12 samples/sec   Loss 0.9836   LearningRate 0.0000   Epoch: 37   Global Step: 64390   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:27:37,666-Speed 13805.84 samples/sec   Loss 0.9788   LearningRate 0.0000   Epoch: 37   Global Step: 64400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:27:55,474-Speed 13801.58 samples/sec   Loss 0.9787   LearningRate 0.0000   Epoch: 37   Global Step: 64410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:28:13,396-Speed 13713.70 samples/sec   Loss 0.9827   LearningRate 0.0000   Epoch: 37   Global Step: 64420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:28:31,209-Speed 13797.50 samples/sec   Loss 0.9834   LearningRate 0.0000   Epoch: 37   Global Step: 64430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:28:48,973-Speed 13835.52 samples/sec   Loss 0.9822   LearningRate 0.0000   Epoch: 37   Global Step: 64440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:29:06,761-Speed 13816.71 samples/sec   Loss 0.9891   LearningRate 0.0000   Epoch: 37   Global Step: 64450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:29:24,620-Speed 13762.66 samples/sec   Loss 0.9821   LearningRate 0.0000   Epoch: 37   Global Step: 64460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:29:42,409-Speed 13816.05 samples/sec   Loss 0.9867   LearningRate 0.0000   Epoch: 37   Global Step: 64470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:30:00,240-Speed 13783.22 samples/sec   Loss 0.9849   LearningRate 0.0000   Epoch: 37   Global Step: 64480   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:30:18,049-Speed 13800.85 samples/sec   Loss 0.9780   LearningRate 0.0000   Epoch: 37   Global Step: 64490   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:30:35,804-Speed 13842.94 samples/sec   Loss 0.9846   LearningRate 0.0000   Epoch: 37   Global Step: 64500   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:30:53,632-Speed 13785.44 samples/sec   Loss 0.9800   LearningRate 0.0000   Epoch: 37   Global Step: 64510   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:31:11,452-Speed 13792.17 samples/sec   Loss 0.9796   LearningRate 0.0000   Epoch: 37   Global Step: 64520   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:31:29,596-Speed 13545.85 samples/sec   Loss 0.9815   LearningRate 0.0000   Epoch: 37   Global Step: 64530   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:31:47,939-Speed 13399.02 samples/sec   Loss 0.9756   LearningRate 0.0000   Epoch: 37   Global Step: 64540   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:32:05,780-Speed 13776.08 samples/sec   Loss 0.9846   LearningRate 0.0000   Epoch: 37   Global Step: 64550   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:32:23,549-Speed 13831.78 samples/sec   Loss 0.9781   LearningRate 0.0000   Epoch: 37   Global Step: 64560   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:32:41,434-Speed 13741.73 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 37   Global Step: 64570   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:33:00,033-Speed 13214.75 samples/sec   Loss 0.9890   LearningRate 0.0000   Epoch: 37   Global Step: 64580   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:33:18,658-Speed 13196.43 samples/sec   Loss 0.9819   LearningRate 0.0000   Epoch: 37   Global Step: 64590   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:33:36,457-Speed 13807.73 samples/sec   Loss 0.9837   LearningRate 0.0000   Epoch: 37   Global Step: 64600   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:33:54,719-Speed 13458.19 samples/sec   Loss 0.9837   LearningRate 0.0000   Epoch: 37   Global Step: 64610   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:34:12,824-Speed 13575.22 samples/sec   Loss 0.9826   LearningRate 0.0000   Epoch: 37   Global Step: 64620   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 16:34:30,588-Speed 13835.81 samples/sec   Loss 0.9809   LearningRate 0.0000   Epoch: 37   Global Step: 64630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:34:48,316-Speed 13863.77 samples/sec   Loss 0.9881   LearningRate 0.0000   Epoch: 37   Global Step: 64640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:35:05,978-Speed 13915.35 samples/sec   Loss 0.9871   LearningRate 0.0000   Epoch: 37   Global Step: 64650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:35:23,727-Speed 13847.47 samples/sec   Loss 0.9793   LearningRate 0.0000   Epoch: 37   Global Step: 64660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:35:41,440-Speed 13875.78 samples/sec   Loss 0.9760   LearningRate 0.0000   Epoch: 37   Global Step: 64670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:35:59,217-Speed 13825.95 samples/sec   Loss 0.9685   LearningRate 0.0000   Epoch: 37   Global Step: 64680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:36:16,982-Speed 13834.46 samples/sec   Loss 0.9832   LearningRate 0.0000   Epoch: 37   Global Step: 64690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:36:34,652-Speed 13909.72 samples/sec   Loss 0.9719   LearningRate 0.0000   Epoch: 37   Global Step: 64700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:36:52,478-Speed 13787.23 samples/sec   Loss 0.9798   LearningRate 0.0000   Epoch: 37   Global Step: 64710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:37:10,243-Speed 13834.69 samples/sec   Loss 0.9724   LearningRate 0.0000   Epoch: 37   Global Step: 64720   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:37:28,017-Speed 13828.40 samples/sec   Loss 0.9857   LearningRate 0.0000   Epoch: 37   Global Step: 64730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:37:45,768-Speed 13849.54 samples/sec   Loss 0.9830   LearningRate 0.0000   Epoch: 37   Global Step: 64740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:38:03,585-Speed 13794.53 samples/sec   Loss 0.9762   LearningRate 0.0000   Epoch: 37   Global Step: 64750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:38:21,380-Speed 13811.07 samples/sec   Loss 0.9752   LearningRate 0.0000   Epoch: 37   Global Step: 64760   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:38:39,107-Speed 13864.59 samples/sec   Loss 0.9726   LearningRate 0.0000   Epoch: 37   Global Step: 64770   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:38:56,775-Speed 13911.02 samples/sec   Loss 0.9705   LearningRate 0.0000   Epoch: 37   Global Step: 64780   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:39:14,676-Speed 13729.95 samples/sec   Loss 0.9762   LearningRate 0.0000   Epoch: 37   Global Step: 64790   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:39:32,523-Speed 13770.89 samples/sec   Loss 0.9761   LearningRate 0.0000   Epoch: 37   Global Step: 64800   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:39:50,676-Speed 13538.89 samples/sec   Loss 0.9665   LearningRate 0.0000   Epoch: 37   Global Step: 64810   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:40:08,902-Speed 13484.68 samples/sec   Loss 0.9765   LearningRate 0.0000   Epoch: 37   Global Step: 64820   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:40:26,625-Speed 13867.90 samples/sec   Loss 0.9777   LearningRate 0.0000   Epoch: 37   Global Step: 64830   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-04 16:40:44,343-Speed 13872.17 samples/sec   Loss 0.9814   LearningRate 0.0000   Epoch: 37   Global Step: 64840   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:41:02,175-Speed 13782.79 samples/sec   Loss 0.9742   LearningRate 0.0000   Epoch: 37   Global Step: 64850   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:41:20,418-Speed 13472.45 samples/sec   Loss 0.9840   LearningRate 0.0000   Epoch: 37   Global Step: 64860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:41:38,559-Speed 13547.59 samples/sec   Loss 0.9736   LearningRate 0.0000   Epoch: 37   Global Step: 64870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:41:56,286-Speed 13864.34 samples/sec   Loss 0.9780   LearningRate 0.0000   Epoch: 37   Global Step: 64880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:42:13,997-Speed 13877.36 samples/sec   Loss 0.9710   LearningRate 0.0000   Epoch: 37   Global Step: 64890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:42:31,796-Speed 13808.14 samples/sec   Loss 0.9794   LearningRate 0.0000   Epoch: 37   Global Step: 64900   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:42:49,488-Speed 13893.06 samples/sec   Loss 0.9801   LearningRate 0.0000   Epoch: 37   Global Step: 64910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:43:07,278-Speed 13815.34 samples/sec   Loss 0.9801   LearningRate 0.0000   Epoch: 37   Global Step: 64920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:43:25,035-Speed 13841.20 samples/sec   Loss 0.9739   LearningRate 0.0000   Epoch: 37   Global Step: 64930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:43:42,824-Speed 13816.66 samples/sec   Loss 0.9840   LearningRate 0.0000   Epoch: 37   Global Step: 64940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:44:00,534-Speed 13877.34 samples/sec   Loss 0.9785   LearningRate 0.0000   Epoch: 37   Global Step: 64950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:44:18,344-Speed 13800.22 samples/sec   Loss 0.9705   LearningRate 0.0000   Epoch: 37   Global Step: 64960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:44:36,185-Speed 13776.28 samples/sec   Loss 0.9701   LearningRate 0.0000   Epoch: 37   Global Step: 64970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:44:53,926-Speed 13853.19 samples/sec   Loss 0.9831   LearningRate 0.0000   Epoch: 37   Global Step: 64980   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:45:11,700-Speed 13828.05 samples/sec   Loss 0.9716   LearningRate 0.0000   Epoch: 37   Global Step: 64990   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:45:29,406-Speed 13880.86 samples/sec   Loss 0.9758   LearningRate 0.0000   Epoch: 37   Global Step: 65000   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:45:47,325-Speed 13716.17 samples/sec   Loss 0.9860   LearningRate 0.0000   Epoch: 37   Global Step: 65010   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:46:05,060-Speed 13858.11 samples/sec   Loss 0.9636   LearningRate 0.0000   Epoch: 37   Global Step: 65020   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:46:22,823-Speed 13835.96 samples/sec   Loss 0.9770   LearningRate 0.0000   Epoch: 37   Global Step: 65030   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:46:40,528-Speed 13882.37 samples/sec   Loss 0.9707   LearningRate 0.0000   Epoch: 37   Global Step: 65040   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:46:58,202-Speed 13906.08 samples/sec   Loss 0.9753   LearningRate 0.0000   Epoch: 37   Global Step: 65050   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:47:15,887-Speed 13897.22 samples/sec   Loss 0.9786   LearningRate 0.0000   Epoch: 37   Global Step: 65060   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:47:33,695-Speed 13801.40 samples/sec   Loss 0.9754   LearningRate 0.0000   Epoch: 37   Global Step: 65070   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:47:51,450-Speed 13842.78 samples/sec   Loss 0.9825   LearningRate 0.0000   Epoch: 37   Global Step: 65080   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:48:09,223-Speed 13828.97 samples/sec   Loss 0.9794   LearningRate 0.0000   Epoch: 37   Global Step: 65090   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:48:26,957-Speed 13858.75 samples/sec   Loss 0.9754   LearningRate 0.0000   Epoch: 37   Global Step: 65100   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:48:44,632-Speed 13905.56 samples/sec   Loss 0.9747   LearningRate 0.0000   Epoch: 37   Global Step: 65110   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-04 16:49:02,317-Speed 13896.99 samples/sec   Loss 0.9715   LearningRate 0.0000   Epoch: 37   Global Step: 65120   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:49:20,116-Speed 13809.03 samples/sec   Loss 0.9770   LearningRate 0.0000   Epoch: 37   Global Step: 65130   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:49:37,777-Speed 13915.82 samples/sec   Loss 0.9812   LearningRate 0.0000   Epoch: 37   Global Step: 65140   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:49:55,480-Speed 13883.05 samples/sec   Loss 0.9793   LearningRate 0.0000   Epoch: 37   Global Step: 65150   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:50:13,194-Speed 13875.38 samples/sec   Loss 0.9636   LearningRate 0.0000   Epoch: 37   Global Step: 65160   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:50:30,857-Speed 13914.93 samples/sec   Loss 0.9751   LearningRate 0.0000   Epoch: 37   Global Step: 65170   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:50:48,541-Speed 13897.33 samples/sec   Loss 0.9740   LearningRate 0.0000   Epoch: 37   Global Step: 65180   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:51:06,330-Speed 13816.44 samples/sec   Loss 0.9760   LearningRate 0.0000   Epoch: 37   Global Step: 65190   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:51:24,010-Speed 13901.89 samples/sec   Loss 0.9765   LearningRate 0.0000   Epoch: 37   Global Step: 65200   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:51:41,707-Speed 13888.14 samples/sec   Loss 0.9690   LearningRate 0.0000   Epoch: 37   Global Step: 65210   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:51:59,510-Speed 13805.30 samples/sec   Loss 0.9739   LearningRate 0.0000   Epoch: 37   Global Step: 65220   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:52:17,311-Speed 13806.82 samples/sec   Loss 0.9778   LearningRate 0.0000   Epoch: 37   Global Step: 65230   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:52:34,973-Speed 13915.62 samples/sec   Loss 0.9736   LearningRate 0.0000   Epoch: 37   Global Step: 65240   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:52:52,672-Speed 13886.98 samples/sec   Loss 0.9832   LearningRate 0.0000   Epoch: 37   Global Step: 65250   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:53:10,378-Speed 13880.56 samples/sec   Loss 0.9727   LearningRate 0.0000   Epoch: 37   Global Step: 65260   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:53:28,152-Speed 13827.45 samples/sec   Loss 0.9746   LearningRate 0.0000   Epoch: 37   Global Step: 65270   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:53:45,905-Speed 13844.22 samples/sec   Loss 0.9790   LearningRate 0.0000   Epoch: 37   Global Step: 65280   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:54:03,635-Speed 13862.24 samples/sec   Loss 0.9836   LearningRate 0.0000   Epoch: 37   Global Step: 65290   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:54:21,313-Speed 13903.01 samples/sec   Loss 0.9756   LearningRate 0.0000   Epoch: 37   Global Step: 65300   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:54:39,148-Speed 13780.85 samples/sec   Loss 0.9761   LearningRate 0.0000   Epoch: 37   Global Step: 65310   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:54:56,940-Speed 13813.60 samples/sec   Loss 0.9744   LearningRate 0.0000   Epoch: 37   Global Step: 65320   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:55:14,643-Speed 13883.70 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 37   Global Step: 65330   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:55:32,533-Speed 13737.86 samples/sec   Loss 0.9680   LearningRate 0.0000   Epoch: 37   Global Step: 65340   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:55:50,242-Speed 13879.12 samples/sec   Loss 0.9731   LearningRate 0.0000   Epoch: 37   Global Step: 65350   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:56:07,898-Speed 13920.50 samples/sec   Loss 0.9725   LearningRate 0.0000   Epoch: 37   Global Step: 65360   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-04 16:56:25,590-Speed 13892.00 samples/sec   Loss 0.9777   LearningRate 0.0000   Epoch: 37   Global Step: 65370   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:56:43,343-Speed 13844.14 samples/sec   Loss 0.9774   LearningRate 0.0000   Epoch: 37   Global Step: 65380   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:57:01,066-Speed 13867.76 samples/sec   Loss 0.9787   LearningRate 0.0000   Epoch: 37   Global Step: 65390   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:57:18,806-Speed 13853.72 samples/sec   Loss 0.9683   LearningRate 0.0000   Epoch: 37   Global Step: 65400   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:57:36,540-Speed 13858.98 samples/sec   Loss 0.9786   LearningRate 0.0000   Epoch: 37   Global Step: 65410   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:57:54,255-Speed 13874.91 samples/sec   Loss 0.9758   LearningRate 0.0000   Epoch: 37   Global Step: 65420   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:58:12,029-Speed 13828.11 samples/sec   Loss 0.9745   LearningRate 0.0000   Epoch: 37   Global Step: 65430   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:58:29,733-Speed 13882.20 samples/sec   Loss 0.9857   LearningRate 0.0000   Epoch: 37   Global Step: 65440   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:58:47,557-Speed 13789.09 samples/sec   Loss 0.9822   LearningRate 0.0000   Epoch: 37   Global Step: 65450   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:59:05,322-Speed 13835.20 samples/sec   Loss 0.9673   LearningRate 0.0000   Epoch: 37   Global Step: 65460   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:59:23,017-Speed 13888.88 samples/sec   Loss 0.9750   LearningRate 0.0000   Epoch: 37   Global Step: 65470   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 16:59:40,775-Speed 13840.44 samples/sec   Loss 0.9812   LearningRate 0.0000   Epoch: 37   Global Step: 65480   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 16:59:58,488-Speed 13875.31 samples/sec   Loss 0.9701   LearningRate 0.0000   Epoch: 37   Global Step: 65490   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:00:16,183-Speed 13889.92 samples/sec   Loss 0.9732   LearningRate 0.0000   Epoch: 37   Global Step: 65500   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:00:33,933-Speed 13846.60 samples/sec   Loss 0.9685   LearningRate 0.0000   Epoch: 37   Global Step: 65510   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:00:51,668-Speed 13858.52 samples/sec   Loss 0.9761   LearningRate 0.0000   Epoch: 37   Global Step: 65520   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:01:09,359-Speed 13892.97 samples/sec   Loss 0.9676   LearningRate 0.0000   Epoch: 37   Global Step: 65530   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:01:27,080-Speed 13869.09 samples/sec   Loss 0.9725   LearningRate 0.0000   Epoch: 37   Global Step: 65540   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:01:44,888-Speed 13801.27 samples/sec   Loss 0.9751   LearningRate 0.0000   Epoch: 37   Global Step: 65550   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:02:02,566-Speed 13902.72 samples/sec   Loss 0.9746   LearningRate 0.0000   Epoch: 37   Global Step: 65560   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:02:20,421-Speed 13765.81 samples/sec   Loss 0.9765   LearningRate 0.0000   Epoch: 37   Global Step: 65570   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:02:38,152-Speed 13861.68 samples/sec   Loss 0.9736   LearningRate 0.0000   Epoch: 37   Global Step: 65580   Fp16 Grad Scale: 65536   Required: 2 hours
Training: 2022-03-04 17:02:55,863-Speed 13876.95 samples/sec   Loss 0.9790   LearningRate 0.0000   Epoch: 37   Global Step: 65590   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:03:13,562-Speed 13886.67 samples/sec   Loss 0.9755   LearningRate 0.0000   Epoch: 37   Global Step: 65600   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:03:31,403-Speed 13775.86 samples/sec   Loss 0.9759   LearningRate 0.0000   Epoch: 37   Global Step: 65610   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:03:49,087-Speed 13898.22 samples/sec   Loss 0.9797   LearningRate 0.0000   Epoch: 37   Global Step: 65620   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:04:06,765-Speed 13903.41 samples/sec   Loss 0.9754   LearningRate 0.0000   Epoch: 37   Global Step: 65630   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:04:24,536-Speed 13830.03 samples/sec   Loss 0.9793   LearningRate 0.0000   Epoch: 37   Global Step: 65640   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:04:42,391-Speed 13764.36 samples/sec   Loss 0.9759   LearningRate 0.0000   Epoch: 37   Global Step: 65650   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:05:00,079-Speed 13895.40 samples/sec   Loss 0.9768   LearningRate 0.0000   Epoch: 37   Global Step: 65660   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:05:17,797-Speed 13871.42 samples/sec   Loss 0.9686   LearningRate 0.0000   Epoch: 37   Global Step: 65670   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:06:25,735-Speed 3617.53 samples/sec   Loss 0.9787   LearningRate 0.0000   Epoch: 38   Global Step: 65680   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:06:43,411-Speed 13904.28 samples/sec   Loss 0.9816   LearningRate 0.0000   Epoch: 38   Global Step: 65690   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:07:01,177-Speed 13834.33 samples/sec   Loss 0.9742   LearningRate 0.0000   Epoch: 38   Global Step: 65700   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:07:18,819-Speed 13931.46 samples/sec   Loss 0.9714   LearningRate 0.0000   Epoch: 38   Global Step: 65710   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:07:36,636-Speed 13794.75 samples/sec   Loss 0.9648   LearningRate 0.0000   Epoch: 38   Global Step: 65720   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:07:54,593-Speed 13687.08 samples/sec   Loss 0.9822   LearningRate 0.0000   Epoch: 38   Global Step: 65730   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:08:12,598-Speed 13650.27 samples/sec   Loss 0.9683   LearningRate 0.0000   Epoch: 38   Global Step: 65740   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:08:30,600-Speed 13652.78 samples/sec   Loss 0.9721   LearningRate 0.0000   Epoch: 38   Global Step: 65750   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:08:48,532-Speed 13705.97 samples/sec   Loss 0.9613   LearningRate 0.0000   Epoch: 38   Global Step: 65760   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:09:06,550-Speed 13640.80 samples/sec   Loss 0.9741   LearningRate 0.0000   Epoch: 38   Global Step: 65770   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:09:24,525-Speed 13673.18 samples/sec   Loss 0.9673   LearningRate 0.0000   Epoch: 38   Global Step: 65780   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:09:42,521-Speed 13657.16 samples/sec   Loss 0.9785   LearningRate 0.0000   Epoch: 38   Global Step: 65790   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:10:00,614-Speed 13584.20 samples/sec   Loss 0.9674   LearningRate 0.0000   Epoch: 38   Global Step: 65800   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:10:18,737-Speed 13561.26 samples/sec   Loss 0.9731   LearningRate 0.0000   Epoch: 38   Global Step: 65810   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:10:36,682-Speed 13696.37 samples/sec   Loss 0.9629   LearningRate 0.0000   Epoch: 38   Global Step: 65820   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:10:54,848-Speed 13529.49 samples/sec   Loss 0.9768   LearningRate 0.0000   Epoch: 38   Global Step: 65830   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:11:12,855-Speed 13648.49 samples/sec   Loss 0.9732   LearningRate 0.0000   Epoch: 38   Global Step: 65840   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:11:30,833-Speed 13671.24 samples/sec   Loss 0.9685   LearningRate 0.0000   Epoch: 38   Global Step: 65850   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:11:48,818-Speed 13666.32 samples/sec   Loss 0.9695   LearningRate 0.0000   Epoch: 38   Global Step: 65860   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:12:06,807-Speed 13662.60 samples/sec   Loss 0.9771   LearningRate 0.0000   Epoch: 38   Global Step: 65870   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:12:24,979-Speed 13525.05 samples/sec   Loss 0.9663   LearningRate 0.0000   Epoch: 38   Global Step: 65880   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:12:43,007-Speed 13632.76 samples/sec   Loss 0.9705   LearningRate 0.0000   Epoch: 38   Global Step: 65890   Fp16 Grad Scale: 32768   Required: 2 hours
Training: 2022-03-04 17:13:00,939-Speed 13706.09 samples/sec   Loss 0.9670   LearningRate 0.0000   Epoch: 38   Global Step: 65900   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:13:18,938-Speed 13654.95 samples/sec   Loss 0.9859   LearningRate 0.0000   Epoch: 38   Global Step: 65910   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:13:36,999-Speed 13608.27 samples/sec   Loss 0.9707   LearningRate 0.0000   Epoch: 38   Global Step: 65920   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:13:55,017-Speed 13640.51 samples/sec   Loss 0.9707   LearningRate 0.0000   Epoch: 38   Global Step: 65930   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:14:13,044-Speed 13633.62 samples/sec   Loss 0.9703   LearningRate 0.0000   Epoch: 38   Global Step: 65940   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:14:31,087-Speed 13622.16 samples/sec   Loss 0.9770   LearningRate 0.0000   Epoch: 38   Global Step: 65950   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:14:49,109-Speed 13636.88 samples/sec   Loss 0.9735   LearningRate 0.0000   Epoch: 38   Global Step: 65960   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:15:07,099-Speed 13661.57 samples/sec   Loss 0.9763   LearningRate 0.0000   Epoch: 38   Global Step: 65970   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:15:25,077-Speed 13671.18 samples/sec   Loss 0.9772   LearningRate 0.0000   Epoch: 38   Global Step: 65980   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:15:43,035-Speed 13686.94 samples/sec   Loss 0.9747   LearningRate 0.0000   Epoch: 38   Global Step: 65990   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:16:01,060-Speed 13636.38 samples/sec   Loss 0.9760   LearningRate 0.0000   Epoch: 38   Global Step: 66000   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:16:19,140-Speed 13593.35 samples/sec   Loss 0.9785   LearningRate 0.0000   Epoch: 38   Global Step: 66010   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:16:37,133-Speed 13659.57 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 38   Global Step: 66020   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:16:55,167-Speed 13628.94 samples/sec   Loss 0.9706   LearningRate 0.0000   Epoch: 38   Global Step: 66030   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:17:13,345-Speed 13520.53 samples/sec   Loss 0.9762   LearningRate 0.0000   Epoch: 38   Global Step: 66040   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:17:31,428-Speed 13591.59 samples/sec   Loss 0.9738   LearningRate 0.0000   Epoch: 38   Global Step: 66050   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:17:49,472-Speed 13620.67 samples/sec   Loss 0.9697   LearningRate 0.0000   Epoch: 38   Global Step: 66060   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:18:07,488-Speed 13641.98 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 38   Global Step: 66070   Fp16 Grad Scale: 8192   Required: 2 hours
Training: 2022-03-04 17:18:25,502-Speed 13644.16 samples/sec   Loss 0.9715   LearningRate 0.0000   Epoch: 38   Global Step: 66080   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:18:43,524-Speed 13636.93 samples/sec   Loss 0.9786   LearningRate 0.0000   Epoch: 38   Global Step: 66090   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:19:01,537-Speed 13644.69 samples/sec   Loss 0.9648   LearningRate 0.0000   Epoch: 38   Global Step: 66100   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:19:19,563-Speed 13636.40 samples/sec   Loss 0.9685   LearningRate 0.0000   Epoch: 38   Global Step: 66110   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:19:37,540-Speed 13671.61 samples/sec   Loss 0.9720   LearningRate 0.0000   Epoch: 38   Global Step: 66120   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:19:55,560-Speed 13639.11 samples/sec   Loss 0.9772   LearningRate 0.0000   Epoch: 38   Global Step: 66130   Fp16 Grad Scale: 16384   Required: 2 hours
Training: 2022-03-04 17:20:13,576-Speed 13641.33 samples/sec   Loss 0.9773   LearningRate 0.0000   Epoch: 38   Global Step: 66140   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:20:31,665-Speed 13587.97 samples/sec   Loss 0.9745   LearningRate 0.0000   Epoch: 38   Global Step: 66150   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:20:49,785-Speed 13563.28 samples/sec   Loss 0.9756   LearningRate 0.0000   Epoch: 38   Global Step: 66160   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:21:07,804-Speed 13639.84 samples/sec   Loss 0.9686   LearningRate 0.0000   Epoch: 38   Global Step: 66170   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:21:25,772-Speed 13678.69 samples/sec   Loss 0.9720   LearningRate 0.0000   Epoch: 38   Global Step: 66180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:21:43,804-Speed 13629.76 samples/sec   Loss 0.9723   LearningRate 0.0000   Epoch: 38   Global Step: 66190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:22:01,842-Speed 13625.27 samples/sec   Loss 0.9706   LearningRate 0.0000   Epoch: 38   Global Step: 66200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:22:19,926-Speed 13590.82 samples/sec   Loss 0.9663   LearningRate 0.0000   Epoch: 38   Global Step: 66210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:22:38,010-Speed 13590.74 samples/sec   Loss 0.9711   LearningRate 0.0000   Epoch: 38   Global Step: 66220   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:22:56,040-Speed 13631.43 samples/sec   Loss 0.9761   LearningRate 0.0000   Epoch: 38   Global Step: 66230   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:23:14,010-Speed 13677.42 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 38   Global Step: 66240   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:23:32,032-Speed 13637.02 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 38   Global Step: 66250   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:23:49,999-Speed 13679.56 samples/sec   Loss 0.9808   LearningRate 0.0000   Epoch: 38   Global Step: 66260   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:24:08,039-Speed 13623.62 samples/sec   Loss 0.9715   LearningRate 0.0000   Epoch: 38   Global Step: 66270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:24:26,062-Speed 13636.76 samples/sec   Loss 0.9678   LearningRate 0.0000   Epoch: 38   Global Step: 66280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:24:44,050-Speed 13663.61 samples/sec   Loss 0.9803   LearningRate 0.0000   Epoch: 38   Global Step: 66290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:25:02,121-Speed 13600.75 samples/sec   Loss 0.9674   LearningRate 0.0000   Epoch: 38   Global Step: 66300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:25:20,118-Speed 13656.49 samples/sec   Loss 0.9710   LearningRate 0.0000   Epoch: 38   Global Step: 66310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:25:38,218-Speed 13578.25 samples/sec   Loss 0.9687   LearningRate 0.0000   Epoch: 38   Global Step: 66320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:25:56,174-Speed 13688.46 samples/sec   Loss 0.9700   LearningRate 0.0000   Epoch: 38   Global Step: 66330   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:26:14,146-Speed 13674.65 samples/sec   Loss 0.9698   LearningRate 0.0000   Epoch: 38   Global Step: 66340   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:26:32,164-Speed 13640.60 samples/sec   Loss 0.9644   LearningRate 0.0000   Epoch: 38   Global Step: 66350   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:26:50,287-Speed 13561.87 samples/sec   Loss 0.9714   LearningRate 0.0000   Epoch: 38   Global Step: 66360   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:27:08,295-Speed 13648.15 samples/sec   Loss 0.9736   LearningRate 0.0000   Epoch: 38   Global Step: 66370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:27:26,297-Speed 13653.03 samples/sec   Loss 0.9764   LearningRate 0.0000   Epoch: 38   Global Step: 66380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:27:44,344-Speed 13617.88 samples/sec   Loss 0.9669   LearningRate 0.0000   Epoch: 38   Global Step: 66390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:28:02,358-Speed 13644.13 samples/sec   Loss 0.9694   LearningRate 0.0000   Epoch: 38   Global Step: 66400   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:28:20,431-Speed 13599.33 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 38   Global Step: 66410   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:28:38,434-Speed 13651.21 samples/sec   Loss 0.9762   LearningRate 0.0000   Epoch: 38   Global Step: 66420   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:28:56,423-Speed 13663.04 samples/sec   Loss 0.9700   LearningRate 0.0000   Epoch: 38   Global Step: 66430   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:29:14,424-Speed 13652.82 samples/sec   Loss 0.9763   LearningRate 0.0000   Epoch: 38   Global Step: 66440   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:29:32,566-Speed 13547.82 samples/sec   Loss 0.9670   LearningRate 0.0000   Epoch: 38   Global Step: 66450   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:29:50,617-Speed 13615.98 samples/sec   Loss 0.9674   LearningRate 0.0000   Epoch: 38   Global Step: 66460   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:30:08,655-Speed 13624.96 samples/sec   Loss 0.9706   LearningRate 0.0000   Epoch: 38   Global Step: 66470   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:30:26,708-Speed 13651.23 samples/sec   Loss 0.9649   LearningRate 0.0000   Epoch: 38   Global Step: 66480   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:30:44,803-Speed 13582.97 samples/sec   Loss 0.9736   LearningRate 0.0000   Epoch: 38   Global Step: 66490   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:31:02,893-Speed 13586.13 samples/sec   Loss 0.9692   LearningRate 0.0000   Epoch: 38   Global Step: 66500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:31:20,923-Speed 13631.49 samples/sec   Loss 0.9675   LearningRate 0.0000   Epoch: 38   Global Step: 66510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:31:38,898-Speed 13673.54 samples/sec   Loss 0.9611   LearningRate 0.0000   Epoch: 38   Global Step: 66520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:31:56,891-Speed 13659.68 samples/sec   Loss 0.9796   LearningRate 0.0000   Epoch: 38   Global Step: 66530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:32:14,920-Speed 13631.88 samples/sec   Loss 0.9747   LearningRate 0.0000   Epoch: 38   Global Step: 66540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:32:32,934-Speed 13643.73 samples/sec   Loss 0.9619   LearningRate 0.0000   Epoch: 38   Global Step: 66550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:32:51,010-Speed 13596.23 samples/sec   Loss 0.9658   LearningRate 0.0000   Epoch: 38   Global Step: 66560   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:33:09,059-Speed 13617.67 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 38   Global Step: 66570   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-04 17:33:27,056-Speed 13656.27 samples/sec   Loss 0.9638   LearningRate 0.0000   Epoch: 38   Global Step: 66580   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:33:45,063-Speed 13648.99 samples/sec   Loss 0.9626   LearningRate 0.0000   Epoch: 38   Global Step: 66590   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:34:03,059-Speed 13657.44 samples/sec   Loss 0.9752   LearningRate 0.0000   Epoch: 38   Global Step: 66600   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:34:21,104-Speed 13622.11 samples/sec   Loss 0.9719   LearningRate 0.0000   Epoch: 38   Global Step: 66610   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:34:39,111-Speed 13649.63 samples/sec   Loss 0.9681   LearningRate 0.0000   Epoch: 38   Global Step: 66620   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:34:57,156-Speed 13619.84 samples/sec   Loss 0.9680   LearningRate 0.0000   Epoch: 38   Global Step: 66630   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:35:15,178-Speed 13636.94 samples/sec   Loss 0.9696   LearningRate 0.0000   Epoch: 38   Global Step: 66640   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:35:33,215-Speed 13626.35 samples/sec   Loss 0.9692   LearningRate 0.0000   Epoch: 38   Global Step: 66650   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:35:51,237-Speed 13637.67 samples/sec   Loss 0.9653   LearningRate 0.0000   Epoch: 38   Global Step: 66660   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:36:09,329-Speed 13585.38 samples/sec   Loss 0.9666   LearningRate 0.0000   Epoch: 38   Global Step: 66670   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:36:27,412-Speed 13591.37 samples/sec   Loss 0.9681   LearningRate 0.0000   Epoch: 38   Global Step: 66680   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:36:45,466-Speed 13613.47 samples/sec   Loss 0.9728   LearningRate 0.0000   Epoch: 38   Global Step: 66690   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:37:03,526-Speed 13609.05 samples/sec   Loss 0.9715   LearningRate 0.0000   Epoch: 38   Global Step: 66700   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:37:21,592-Speed 13603.91 samples/sec   Loss 0.9695   LearningRate 0.0000   Epoch: 38   Global Step: 66710   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:37:39,606-Speed 13643.29 samples/sec   Loss 0.9723   LearningRate 0.0000   Epoch: 38   Global Step: 66720   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:37:57,788-Speed 13518.05 samples/sec   Loss 0.9713   LearningRate 0.0000   Epoch: 38   Global Step: 66730   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:38:15,775-Speed 13664.23 samples/sec   Loss 0.9655   LearningRate 0.0000   Epoch: 38   Global Step: 66740   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:38:33,780-Speed 13650.00 samples/sec   Loss 0.9686   LearningRate 0.0000   Epoch: 38   Global Step: 66750   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:38:51,912-Speed 13554.99 samples/sec   Loss 0.9613   LearningRate 0.0000   Epoch: 38   Global Step: 66760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:39:10,070-Speed 13535.22 samples/sec   Loss 0.9771   LearningRate 0.0000   Epoch: 38   Global Step: 66770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:39:28,180-Speed 13571.56 samples/sec   Loss 0.9678   LearningRate 0.0000   Epoch: 38   Global Step: 66780   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-04 17:39:46,117-Speed 13703.88 samples/sec   Loss 0.9703   LearningRate 0.0000   Epoch: 38   Global Step: 66790   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:40:04,165-Speed 13618.17 samples/sec   Loss 0.9687   LearningRate 0.0000   Epoch: 38   Global Step: 66800   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:40:22,216-Speed 13615.66 samples/sec   Loss 0.9731   LearningRate 0.0000   Epoch: 38   Global Step: 66810   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:40:40,170-Speed 13688.94 samples/sec   Loss 0.9682   LearningRate 0.0000   Epoch: 38   Global Step: 66820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:40:58,228-Speed 13610.45 samples/sec   Loss 0.9733   LearningRate 0.0000   Epoch: 38   Global Step: 66830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:41:16,251-Speed 13636.88 samples/sec   Loss 0.9709   LearningRate 0.0000   Epoch: 38   Global Step: 66840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:41:34,205-Speed 13689.22 samples/sec   Loss 0.9672   LearningRate 0.0000   Epoch: 38   Global Step: 66850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:41:52,329-Speed 13561.19 samples/sec   Loss 0.9670   LearningRate 0.0000   Epoch: 38   Global Step: 66860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:42:10,457-Speed 13557.78 samples/sec   Loss 0.9686   LearningRate 0.0000   Epoch: 38   Global Step: 66870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:42:28,513-Speed 13611.35 samples/sec   Loss 0.9681   LearningRate 0.0000   Epoch: 38   Global Step: 66880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:42:46,551-Speed 13625.78 samples/sec   Loss 0.9653   LearningRate 0.0000   Epoch: 38   Global Step: 66890   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:43:04,610-Speed 13609.58 samples/sec   Loss 0.9674   LearningRate 0.0000   Epoch: 38   Global Step: 66900   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:43:22,578-Speed 13678.87 samples/sec   Loss 0.9671   LearningRate 0.0000   Epoch: 38   Global Step: 66910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:43:40,621-Speed 13621.11 samples/sec   Loss 0.9672   LearningRate 0.0000   Epoch: 38   Global Step: 66920   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:43:58,602-Speed 13669.76 samples/sec   Loss 0.9661   LearningRate 0.0000   Epoch: 38   Global Step: 66930   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:44:16,642-Speed 13623.67 samples/sec   Loss 0.9697   LearningRate 0.0000   Epoch: 38   Global Step: 66940   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:44:34,707-Speed 13605.61 samples/sec   Loss 0.9642   LearningRate 0.0000   Epoch: 38   Global Step: 66950   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:44:52,677-Speed 13676.62 samples/sec   Loss 0.9675   LearningRate 0.0000   Epoch: 38   Global Step: 66960   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:45:10,650-Speed 13674.55 samples/sec   Loss 0.9632   LearningRate 0.0000   Epoch: 38   Global Step: 66970   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:45:28,767-Speed 13567.14 samples/sec   Loss 0.9571   LearningRate 0.0000   Epoch: 38   Global Step: 66980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:45:46,774-Speed 13648.61 samples/sec   Loss 0.9624   LearningRate 0.0000   Epoch: 38   Global Step: 66990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:46:04,808-Speed 13628.46 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 38   Global Step: 67000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:46:22,890-Speed 13592.31 samples/sec   Loss 0.9639   LearningRate 0.0000   Epoch: 38   Global Step: 67010   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:46:40,883-Speed 13659.79 samples/sec   Loss 0.9719   LearningRate 0.0000   Epoch: 38   Global Step: 67020   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:46:58,896-Speed 13643.97 samples/sec   Loss 0.9667   LearningRate 0.0000   Epoch: 38   Global Step: 67030   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:47:16,923-Speed 13634.00 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 38   Global Step: 67040   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:47:35,099-Speed 13685.74 samples/sec   Loss 0.9621   LearningRate 0.0000   Epoch: 38   Global Step: 67050   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:47:53,196-Speed 13581.29 samples/sec   Loss 0.9799   LearningRate 0.0000   Epoch: 38   Global Step: 67060   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:48:11,195-Speed 13655.46 samples/sec   Loss 0.9662   LearningRate 0.0000   Epoch: 38   Global Step: 67070   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:48:29,209-Speed 13644.71 samples/sec   Loss 0.9642   LearningRate 0.0000   Epoch: 38   Global Step: 67080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:48:47,251-Speed 13622.41 samples/sec   Loss 0.9663   LearningRate 0.0000   Epoch: 38   Global Step: 67090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:49:05,235-Speed 13665.81 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 38   Global Step: 67100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:49:23,296-Speed 13608.45 samples/sec   Loss 0.9682   LearningRate 0.0000   Epoch: 38   Global Step: 67110   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-04 17:49:41,252-Speed 13687.78 samples/sec   Loss 0.9703   LearningRate 0.0000   Epoch: 38   Global Step: 67120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:49:59,298-Speed 13619.42 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 38   Global Step: 67130   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:50:17,310-Speed 13644.89 samples/sec   Loss 0.9614   LearningRate 0.0000   Epoch: 38   Global Step: 67140   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:50:35,437-Speed 13558.68 samples/sec   Loss 0.9713   LearningRate 0.0000   Epoch: 38   Global Step: 67150   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:50:53,532-Speed 13582.16 samples/sec   Loss 0.9668   LearningRate 0.0000   Epoch: 38   Global Step: 67160   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:51:11,664-Speed 13554.85 samples/sec   Loss 0.9765   LearningRate 0.0000   Epoch: 38   Global Step: 67170   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:51:29,707-Speed 13621.40 samples/sec   Loss 0.9645   LearningRate 0.0000   Epoch: 38   Global Step: 67180   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:51:47,729-Speed 13638.31 samples/sec   Loss 0.9761   LearningRate 0.0000   Epoch: 38   Global Step: 67190   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:52:05,781-Speed 13614.92 samples/sec   Loss 0.9762   LearningRate 0.0000   Epoch: 38   Global Step: 67200   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:52:23,556-Speed 13826.71 samples/sec   Loss 0.9707   LearningRate 0.0000   Epoch: 38   Global Step: 67210   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:52:41,247-Speed 13892.78 samples/sec   Loss 0.9619   LearningRate 0.0000   Epoch: 38   Global Step: 67220   Fp16 Grad Scale: 65536   Required: 1 hours
Training: 2022-03-04 17:52:58,955-Speed 13879.48 samples/sec   Loss 0.9778   LearningRate 0.0000   Epoch: 38   Global Step: 67230   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:53:16,700-Speed 13850.43 samples/sec   Loss 0.9607   LearningRate 0.0000   Epoch: 38   Global Step: 67240   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:53:34,502-Speed 13805.75 samples/sec   Loss 0.9584   LearningRate 0.0000   Epoch: 38   Global Step: 67250   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:53:52,251-Speed 13847.07 samples/sec   Loss 0.9640   LearningRate 0.0000   Epoch: 38   Global Step: 67260   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:54:09,922-Speed 13909.00 samples/sec   Loss 0.9640   LearningRate 0.0000   Epoch: 38   Global Step: 67270   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:54:27,719-Speed 13809.95 samples/sec   Loss 0.9745   LearningRate 0.0000   Epoch: 38   Global Step: 67280   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:54:45,412-Speed 13891.01 samples/sec   Loss 0.9628   LearningRate 0.0000   Epoch: 38   Global Step: 67290   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:55:03,113-Speed 13884.41 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 38   Global Step: 67300   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:55:20,811-Speed 13887.54 samples/sec   Loss 0.9663   LearningRate 0.0000   Epoch: 38   Global Step: 67310   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:55:38,590-Speed 13824.76 samples/sec   Loss 0.9615   LearningRate 0.0000   Epoch: 38   Global Step: 67320   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:55:56,315-Speed 13865.63 samples/sec   Loss 0.9709   LearningRate 0.0000   Epoch: 38   Global Step: 67330   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:56:14,039-Speed 13866.75 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 38   Global Step: 67340   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:56:31,772-Speed 13859.50 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 38   Global Step: 67350   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:56:49,441-Speed 13910.26 samples/sec   Loss 0.9667   LearningRate 0.0000   Epoch: 38   Global Step: 67360   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:57:07,231-Speed 13815.21 samples/sec   Loss 0.9776   LearningRate 0.0000   Epoch: 38   Global Step: 67370   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:57:24,932-Speed 13885.08 samples/sec   Loss 0.9722   LearningRate 0.0000   Epoch: 38   Global Step: 67380   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:57:42,699-Speed 13833.28 samples/sec   Loss 0.9723   LearningRate 0.0000   Epoch: 38   Global Step: 67390   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 17:58:00,348-Speed 13925.91 samples/sec   Loss 0.9700   LearningRate 0.0000   Epoch: 38   Global Step: 67400   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:59:08,417-Speed 3610.54 samples/sec   Loss 0.9729   LearningRate 0.0000   Epoch: 39   Global Step: 67410   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:59:25,989-Speed 13986.41 samples/sec   Loss 0.9703   LearningRate 0.0000   Epoch: 39   Global Step: 67420   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 17:59:43,682-Speed 13890.92 samples/sec   Loss 0.9661   LearningRate 0.0000   Epoch: 39   Global Step: 67430   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:00:01,389-Speed 13880.34 samples/sec   Loss 0.9634   LearningRate 0.0000   Epoch: 39   Global Step: 67440   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:00:19,121-Speed 13860.76 samples/sec   Loss 0.9752   LearningRate 0.0000   Epoch: 39   Global Step: 67450   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:00:36,825-Speed 13882.72 samples/sec   Loss 0.9635   LearningRate 0.0000   Epoch: 39   Global Step: 67460   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:00:54,650-Speed 13788.21 samples/sec   Loss 0.9684   LearningRate 0.0000   Epoch: 39   Global Step: 67470   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:01:12,394-Speed 13850.84 samples/sec   Loss 0.9663   LearningRate 0.0000   Epoch: 39   Global Step: 67480   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:01:30,086-Speed 13892.48 samples/sec   Loss 0.9627   LearningRate 0.0000   Epoch: 39   Global Step: 67490   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:01:47,743-Speed 13919.55 samples/sec   Loss 0.9717   LearningRate 0.0000   Epoch: 39   Global Step: 67500   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:02:05,397-Speed 13921.68 samples/sec   Loss 0.9627   LearningRate 0.0000   Epoch: 39   Global Step: 67510   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:02:23,056-Speed 13917.69 samples/sec   Loss 0.9648   LearningRate 0.0000   Epoch: 39   Global Step: 67520   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:02:40,831-Speed 13827.45 samples/sec   Loss 0.9578   LearningRate 0.0000   Epoch: 39   Global Step: 67530   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:02:58,550-Speed 13870.87 samples/sec   Loss 0.9636   LearningRate 0.0000   Epoch: 39   Global Step: 67540   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:03:16,321-Speed 13830.43 samples/sec   Loss 0.9660   LearningRate 0.0000   Epoch: 39   Global Step: 67550   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:03:34,019-Speed 13886.59 samples/sec   Loss 0.9522   LearningRate 0.0000   Epoch: 39   Global Step: 67560   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:03:51,700-Speed 13900.91 samples/sec   Loss 0.9689   LearningRate 0.0000   Epoch: 39   Global Step: 67570   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:04:09,385-Speed 13898.27 samples/sec   Loss 0.9748   LearningRate 0.0000   Epoch: 39   Global Step: 67580   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:04:27,094-Speed 13878.11 samples/sec   Loss 0.9698   LearningRate 0.0000   Epoch: 39   Global Step: 67590   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:04:44,791-Speed 13888.19 samples/sec   Loss 0.9692   LearningRate 0.0000   Epoch: 39   Global Step: 67600   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:05:02,453-Speed 13915.45 samples/sec   Loss 0.9609   LearningRate 0.0000   Epoch: 39   Global Step: 67610   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:05:20,227-Speed 13827.99 samples/sec   Loss 0.9713   LearningRate 0.0000   Epoch: 39   Global Step: 67620   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:05:37,942-Speed 13873.47 samples/sec   Loss 0.9642   LearningRate 0.0000   Epoch: 39   Global Step: 67630   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:05:55,666-Speed 13867.40 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 39   Global Step: 67640   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:06:13,358-Speed 13891.40 samples/sec   Loss 0.9707   LearningRate 0.0000   Epoch: 39   Global Step: 67650   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:06:31,121-Speed 13836.66 samples/sec   Loss 0.9552   LearningRate 0.0000   Epoch: 39   Global Step: 67660   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:06:48,838-Speed 13872.14 samples/sec   Loss 0.9701   LearningRate 0.0000   Epoch: 39   Global Step: 67670   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:07:06,596-Speed 13840.78 samples/sec   Loss 0.9708   LearningRate 0.0000   Epoch: 39   Global Step: 67680   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:07:24,361-Speed 13834.73 samples/sec   Loss 0.9711   LearningRate 0.0000   Epoch: 39   Global Step: 67690   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:07:42,049-Speed 13895.02 samples/sec   Loss 0.9667   LearningRate 0.0000   Epoch: 39   Global Step: 67700   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:07:59,745-Speed 13888.39 samples/sec   Loss 0.9660   LearningRate 0.0000   Epoch: 39   Global Step: 67710   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:08:17,447-Speed 13884.65 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 39   Global Step: 67720   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:08:35,152-Speed 13881.14 samples/sec   Loss 0.9660   LearningRate 0.0000   Epoch: 39   Global Step: 67730   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:08:52,893-Speed 13853.92 samples/sec   Loss 0.9638   LearningRate 0.0000   Epoch: 39   Global Step: 67740   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:09:10,591-Speed 13888.37 samples/sec   Loss 0.9645   LearningRate 0.0000   Epoch: 39   Global Step: 67750   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:09:28,265-Speed 13906.37 samples/sec   Loss 0.9695   LearningRate 0.0000   Epoch: 39   Global Step: 67760   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:09:45,992-Speed 13864.13 samples/sec   Loss 0.9637   LearningRate 0.0000   Epoch: 39   Global Step: 67770   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:10:03,682-Speed 13893.95 samples/sec   Loss 0.9687   LearningRate 0.0000   Epoch: 39   Global Step: 67780   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:10:21,370-Speed 13894.81 samples/sec   Loss 0.9718   LearningRate 0.0000   Epoch: 39   Global Step: 67790   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:10:39,106-Speed 13857.34 samples/sec   Loss 0.9722   LearningRate 0.0000   Epoch: 39   Global Step: 67800   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:10:56,780-Speed 13906.43 samples/sec   Loss 0.9608   LearningRate 0.0000   Epoch: 39   Global Step: 67810   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:11:14,496-Speed 13872.82 samples/sec   Loss 0.9611   LearningRate 0.0000   Epoch: 39   Global Step: 67820   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:11:32,210-Speed 13874.93 samples/sec   Loss 0.9679   LearningRate 0.0000   Epoch: 39   Global Step: 67830   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:11:49,896-Speed 13897.30 samples/sec   Loss 0.9665   LearningRate 0.0000   Epoch: 39   Global Step: 67840   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:12:07,574-Speed 13903.29 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 39   Global Step: 67850   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:12:25,293-Speed 13870.23 samples/sec   Loss 0.9597   LearningRate 0.0000   Epoch: 39   Global Step: 67860   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:12:43,034-Speed 13854.66 samples/sec   Loss 0.9696   LearningRate 0.0000   Epoch: 39   Global Step: 67870   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:13:00,817-Speed 13820.83 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 39   Global Step: 67880   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:13:18,462-Speed 13929.21 samples/sec   Loss 0.9724   LearningRate 0.0000   Epoch: 39   Global Step: 67890   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:13:36,092-Speed 13941.04 samples/sec   Loss 0.9707   LearningRate 0.0000   Epoch: 39   Global Step: 67900   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:13:53,807-Speed 13874.10 samples/sec   Loss 0.9718   LearningRate 0.0000   Epoch: 39   Global Step: 67910   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:14:11,556-Speed 13847.05 samples/sec   Loss 0.9706   LearningRate 0.0000   Epoch: 39   Global Step: 67920   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:14:29,291-Speed 13859.24 samples/sec   Loss 0.9625   LearningRate 0.0000   Epoch: 39   Global Step: 67930   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:14:47,042-Speed 13845.39 samples/sec   Loss 0.9619   LearningRate 0.0000   Epoch: 39   Global Step: 67940   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:15:04,758-Speed 13873.14 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 39   Global Step: 67950   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:15:22,432-Speed 13906.54 samples/sec   Loss 0.9622   LearningRate 0.0000   Epoch: 39   Global Step: 67960   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:15:40,157-Speed 13865.70 samples/sec   Loss 0.9690   LearningRate 0.0000   Epoch: 39   Global Step: 67970   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:15:57,829-Speed 13907.35 samples/sec   Loss 0.9587   LearningRate 0.0000   Epoch: 39   Global Step: 67980   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:16:15,640-Speed 13798.97 samples/sec   Loss 0.9706   LearningRate 0.0000   Epoch: 39   Global Step: 67990   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:16:33,424-Speed 13820.36 samples/sec   Loss 0.9637   LearningRate 0.0000   Epoch: 39   Global Step: 68000   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:16:51,078-Speed 13921.71 samples/sec   Loss 0.9647   LearningRate 0.0000   Epoch: 39   Global Step: 68010   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:17:08,857-Speed 13824.35 samples/sec   Loss 0.9704   LearningRate 0.0000   Epoch: 39   Global Step: 68020   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:17:26,556-Speed 13886.02 samples/sec   Loss 0.9622   LearningRate 0.0000   Epoch: 39   Global Step: 68030   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:17:44,310-Speed 13843.75 samples/sec   Loss 0.9674   LearningRate 0.0000   Epoch: 39   Global Step: 68040   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:18:02,020-Speed 13878.51 samples/sec   Loss 0.9687   LearningRate 0.0000   Epoch: 39   Global Step: 68050   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:18:19,770-Speed 13846.38 samples/sec   Loss 0.9725   LearningRate 0.0000   Epoch: 39   Global Step: 68060   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:18:37,478-Speed 13879.43 samples/sec   Loss 0.9732   LearningRate 0.0000   Epoch: 39   Global Step: 68070   Fp16 Grad Scale: 16384   Required: 1 hours
Training: 2022-03-04 18:18:55,198-Speed 13869.90 samples/sec   Loss 0.9663   LearningRate 0.0000   Epoch: 39   Global Step: 68080   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:19:12,925-Speed 13864.89 samples/sec   Loss 0.9636   LearningRate 0.0000   Epoch: 39   Global Step: 68090   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:19:30,628-Speed 13883.65 samples/sec   Loss 0.9650   LearningRate 0.0000   Epoch: 39   Global Step: 68100   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:19:48,325-Speed 13887.66 samples/sec   Loss 0.9774   LearningRate 0.0000   Epoch: 39   Global Step: 68110   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:20:06,109-Speed 13820.21 samples/sec   Loss 0.9575   LearningRate 0.0000   Epoch: 39   Global Step: 68120   Fp16 Grad Scale: 32768   Required: 1 hours
Training: 2022-03-04 18:20:23,943-Speed 13780.57 samples/sec   Loss 0.9685   LearningRate 0.0000   Epoch: 39   Global Step: 68130   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:20:41,755-Speed 13798.57 samples/sec   Loss 0.9695   LearningRate 0.0000   Epoch: 39   Global Step: 68140   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:20:59,434-Speed 13901.71 samples/sec   Loss 0.9665   LearningRate 0.0000   Epoch: 39   Global Step: 68150   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:21:17,298-Speed 13757.90 samples/sec   Loss 0.9605   LearningRate 0.0000   Epoch: 39   Global Step: 68160   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:21:35,010-Speed 13876.55 samples/sec   Loss 0.9767   LearningRate 0.0000   Epoch: 39   Global Step: 68170   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:21:52,757-Speed 13848.27 samples/sec   Loss 0.9617   LearningRate 0.0000   Epoch: 39   Global Step: 68180   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:22:10,502-Speed 13850.07 samples/sec   Loss 0.9694   LearningRate 0.0000   Epoch: 39   Global Step: 68190   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:22:28,312-Speed 13799.52 samples/sec   Loss 0.9720   LearningRate 0.0000   Epoch: 39   Global Step: 68200   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:22:46,096-Speed 13820.06 samples/sec   Loss 0.9592   LearningRate 0.0000   Epoch: 39   Global Step: 68210   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:23:03,827-Speed 13860.94 samples/sec   Loss 0.9626   LearningRate 0.0000   Epoch: 39   Global Step: 68220   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:23:21,617-Speed 13815.36 samples/sec   Loss 0.9706   LearningRate 0.0000   Epoch: 39   Global Step: 68230   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:23:39,327-Speed 13877.34 samples/sec   Loss 0.9580   LearningRate 0.0000   Epoch: 39   Global Step: 68240   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:23:57,084-Speed 13840.89 samples/sec   Loss 0.9603   LearningRate 0.0000   Epoch: 39   Global Step: 68250   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:24:14,851-Speed 13833.34 samples/sec   Loss 0.9657   LearningRate 0.0000   Epoch: 39   Global Step: 68260   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:24:32,607-Speed 13841.95 samples/sec   Loss 0.9546   LearningRate 0.0000   Epoch: 39   Global Step: 68270   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:24:50,341-Speed 13858.21 samples/sec   Loss 0.9622   LearningRate 0.0000   Epoch: 39   Global Step: 68280   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-04 18:25:08,007-Speed 13912.16 samples/sec   Loss 0.9646   LearningRate 0.0000   Epoch: 39   Global Step: 68290   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:25:25,753-Speed 13849.27 samples/sec   Loss 0.9624   LearningRate 0.0000   Epoch: 39   Global Step: 68300   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:25:43,483-Speed 13861.80 samples/sec   Loss 0.9549   LearningRate 0.0000   Epoch: 39   Global Step: 68310   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:26:01,243-Speed 13838.97 samples/sec   Loss 0.9574   LearningRate 0.0000   Epoch: 39   Global Step: 68320   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:26:18,999-Speed 13841.11 samples/sec   Loss 0.9660   LearningRate 0.0000   Epoch: 39   Global Step: 68330   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:26:36,753-Speed 13843.11 samples/sec   Loss 0.9621   LearningRate 0.0000   Epoch: 39   Global Step: 68340   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:26:54,525-Speed 13829.21 samples/sec   Loss 0.9669   LearningRate 0.0000   Epoch: 39   Global Step: 68350   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:27:12,250-Speed 13866.32 samples/sec   Loss 0.9597   LearningRate 0.0000   Epoch: 39   Global Step: 68360   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:27:30,086-Speed 13778.98 samples/sec   Loss 0.9671   LearningRate 0.0000   Epoch: 39   Global Step: 68370   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:27:47,886-Speed 13807.12 samples/sec   Loss 0.9596   LearningRate 0.0000   Epoch: 39   Global Step: 68380   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:28:05,777-Speed 13737.99 samples/sec   Loss 0.9618   LearningRate 0.0000   Epoch: 39   Global Step: 68390   Fp16 Grad Scale: 65536   Required: 0 hours
Training: 2022-03-04 18:28:23,519-Speed 13852.87 samples/sec   Loss 0.9664   LearningRate 0.0000   Epoch: 39   Global Step: 68400   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:28:41,304-Speed 13818.61 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 39   Global Step: 68410   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:28:59,017-Speed 13875.23 samples/sec   Loss 0.9672   LearningRate 0.0000   Epoch: 39   Global Step: 68420   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:29:16,780-Speed 13836.04 samples/sec   Loss 0.9692   LearningRate 0.0000   Epoch: 39   Global Step: 68430   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:29:34,574-Speed 13812.13 samples/sec   Loss 0.9661   LearningRate 0.0000   Epoch: 39   Global Step: 68440   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:29:52,216-Speed 13931.24 samples/sec   Loss 0.9692   LearningRate 0.0000   Epoch: 39   Global Step: 68450   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:30:09,882-Speed 13912.08 samples/sec   Loss 0.9631   LearningRate 0.0000   Epoch: 39   Global Step: 68460   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:30:27,591-Speed 13878.06 samples/sec   Loss 0.9743   LearningRate 0.0000   Epoch: 39   Global Step: 68470   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:30:45,331-Speed 13854.20 samples/sec   Loss 0.9655   LearningRate 0.0000   Epoch: 39   Global Step: 68480   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:31:03,101-Speed 13830.75 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 39   Global Step: 68490   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:31:20,850-Speed 13846.51 samples/sec   Loss 0.9743   LearningRate 0.0000   Epoch: 39   Global Step: 68500   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:31:38,642-Speed 13813.93 samples/sec   Loss 0.9689   LearningRate 0.0000   Epoch: 39   Global Step: 68510   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:31:56,350-Speed 13879.80 samples/sec   Loss 0.9637   LearningRate 0.0000   Epoch: 39   Global Step: 68520   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:32:14,038-Speed 13894.59 samples/sec   Loss 0.9642   LearningRate 0.0000   Epoch: 39   Global Step: 68530   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:32:31,760-Speed 13867.55 samples/sec   Loss 0.9659   LearningRate 0.0000   Epoch: 39   Global Step: 68540   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:32:49,467-Speed 13880.47 samples/sec   Loss 0.9647   LearningRate 0.0000   Epoch: 39   Global Step: 68550   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:33:07,220-Speed 13843.82 samples/sec   Loss 0.9606   LearningRate 0.0000   Epoch: 39   Global Step: 68560   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:33:25,028-Speed 13801.44 samples/sec   Loss 0.9627   LearningRate 0.0000   Epoch: 39   Global Step: 68570   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:33:42,788-Speed 13838.10 samples/sec   Loss 0.9640   LearningRate 0.0000   Epoch: 39   Global Step: 68580   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:34:00,557-Speed 13831.66 samples/sec   Loss 0.9582   LearningRate 0.0000   Epoch: 39   Global Step: 68590   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:34:18,283-Speed 13864.90 samples/sec   Loss 0.9722   LearningRate 0.0000   Epoch: 39   Global Step: 68600   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:34:36,062-Speed 13823.94 samples/sec   Loss 0.9634   LearningRate 0.0000   Epoch: 39   Global Step: 68610   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:34:53,809-Speed 13848.58 samples/sec   Loss 0.9689   LearningRate 0.0000   Epoch: 39   Global Step: 68620   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:35:11,539-Speed 13861.18 samples/sec   Loss 0.9563   LearningRate 0.0000   Epoch: 39   Global Step: 68630   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:35:29,227-Speed 13895.31 samples/sec   Loss 0.9632   LearningRate 0.0000   Epoch: 39   Global Step: 68640   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:35:47,120-Speed 13907.40 samples/sec   Loss 0.9682   LearningRate 0.0000   Epoch: 39   Global Step: 68650   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:36:04,855-Speed 13857.73 samples/sec   Loss 0.9701   LearningRate 0.0000   Epoch: 39   Global Step: 68660   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:36:22,610-Speed 13842.35 samples/sec   Loss 0.9648   LearningRate 0.0000   Epoch: 39   Global Step: 68670   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:36:40,476-Speed 13756.75 samples/sec   Loss 0.9673   LearningRate 0.0000   Epoch: 39   Global Step: 68680   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:36:58,163-Speed 13895.58 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 39   Global Step: 68690   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:37:15,927-Speed 13834.98 samples/sec   Loss 0.9664   LearningRate 0.0000   Epoch: 39   Global Step: 68700   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:37:33,774-Speed 13771.04 samples/sec   Loss 0.9623   LearningRate 0.0000   Epoch: 39   Global Step: 68710   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:37:51,493-Speed 13870.90 samples/sec   Loss 0.9649   LearningRate 0.0000   Epoch: 39   Global Step: 68720   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:38:09,141-Speed 13926.49 samples/sec   Loss 0.9579   LearningRate 0.0000   Epoch: 39   Global Step: 68730   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:38:27,023-Speed 13744.37 samples/sec   Loss 0.9677   LearningRate 0.0000   Epoch: 39   Global Step: 68740   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:38:44,761-Speed 13855.25 samples/sec   Loss 0.9644   LearningRate 0.0000   Epoch: 39   Global Step: 68750   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:39:02,555-Speed 13812.46 samples/sec   Loss 0.9699   LearningRate 0.0000   Epoch: 39   Global Step: 68760   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:39:20,303-Speed 13847.66 samples/sec   Loss 0.9670   LearningRate 0.0000   Epoch: 39   Global Step: 68770   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:39:38,062-Speed 13839.13 samples/sec   Loss 0.9694   LearningRate 0.0000   Epoch: 39   Global Step: 68780   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:39:55,792-Speed 13861.51 samples/sec   Loss 0.9637   LearningRate 0.0000   Epoch: 39   Global Step: 68790   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:40:13,542-Speed 13846.20 samples/sec   Loss 0.9594   LearningRate 0.0000   Epoch: 39   Global Step: 68800   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:40:31,352-Speed 13799.91 samples/sec   Loss 0.9664   LearningRate 0.0000   Epoch: 39   Global Step: 68810   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:40:49,088-Speed 13857.76 samples/sec   Loss 0.9665   LearningRate 0.0000   Epoch: 39   Global Step: 68820   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:41:06,873-Speed 13818.21 samples/sec   Loss 0.9688   LearningRate 0.0000   Epoch: 39   Global Step: 68830   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:41:24,584-Speed 13876.75 samples/sec   Loss 0.9705   LearningRate 0.0000   Epoch: 39   Global Step: 68840   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:41:42,324-Speed 13854.56 samples/sec   Loss 0.9659   LearningRate 0.0000   Epoch: 39   Global Step: 68850   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:42:00,043-Speed 13870.43 samples/sec   Loss 0.9600   LearningRate 0.0000   Epoch: 39   Global Step: 68860   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:42:17,788-Speed 13850.15 samples/sec   Loss 0.9658   LearningRate 0.0000   Epoch: 39   Global Step: 68870   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:42:35,600-Speed 13797.85 samples/sec   Loss 0.9655   LearningRate 0.0000   Epoch: 39   Global Step: 68880   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:42:53,297-Speed 13887.87 samples/sec   Loss 0.9640   LearningRate 0.0000   Epoch: 39   Global Step: 68890   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:43:10,978-Speed 13901.18 samples/sec   Loss 0.9672   LearningRate 0.0000   Epoch: 39   Global Step: 68900   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:43:28,762-Speed 13819.21 samples/sec   Loss 0.9712   LearningRate 0.0000   Epoch: 39   Global Step: 68910   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:43:46,489-Speed 13864.22 samples/sec   Loss 0.9617   LearningRate 0.0000   Epoch: 39   Global Step: 68920   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:44:04,155-Speed 13911.95 samples/sec   Loss 0.9660   LearningRate 0.0000   Epoch: 39   Global Step: 68930   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:44:22,012-Speed 13763.80 samples/sec   Loss 0.9705   LearningRate 0.0000   Epoch: 39   Global Step: 68940   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:44:39,735-Speed 13867.13 samples/sec   Loss 0.9654   LearningRate 0.0000   Epoch: 39   Global Step: 68950   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:44:57,472-Speed 13856.19 samples/sec   Loss 0.9712   LearningRate 0.0000   Epoch: 39   Global Step: 68960   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:45:15,154-Speed 13899.43 samples/sec   Loss 0.9564   LearningRate 0.0000   Epoch: 39   Global Step: 68970   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:45:32,962-Speed 13801.77 samples/sec   Loss 0.9683   LearningRate 0.0000   Epoch: 39   Global Step: 68980   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:45:50,770-Speed 13800.55 samples/sec   Loss 0.9693   LearningRate 0.0000   Epoch: 39   Global Step: 68990   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:46:08,541-Speed 13829.83 samples/sec   Loss 0.9703   LearningRate 0.0000   Epoch: 39   Global Step: 69000   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:46:26,266-Speed 13865.69 samples/sec   Loss 0.9635   LearningRate 0.0000   Epoch: 39   Global Step: 69010   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:46:44,025-Speed 13840.06 samples/sec   Loss 0.9702   LearningRate 0.0000   Epoch: 39   Global Step: 69020   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:47:01,871-Speed 13771.70 samples/sec   Loss 0.9643   LearningRate 0.0000   Epoch: 39   Global Step: 69030   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:47:19,662-Speed 13813.94 samples/sec   Loss 0.9678   LearningRate 0.0000   Epoch: 39   Global Step: 69040   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:47:37,427-Speed 13834.26 samples/sec   Loss 0.9625   LearningRate 0.0000   Epoch: 39   Global Step: 69050   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:47:55,145-Speed 13871.61 samples/sec   Loss 0.9700   LearningRate 0.0000   Epoch: 39   Global Step: 69060   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:48:12,865-Speed 13870.25 samples/sec   Loss 0.9650   LearningRate 0.0000   Epoch: 39   Global Step: 69070   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:48:30,619-Speed 13842.76 samples/sec   Loss 0.9644   LearningRate 0.0000   Epoch: 39   Global Step: 69080   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:48:48,315-Speed 13888.78 samples/sec   Loss 0.9705   LearningRate 0.0000   Epoch: 39   Global Step: 69090   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:49:06,001-Speed 13896.58 samples/sec   Loss 0.9623   LearningRate 0.0000   Epoch: 39   Global Step: 69100   Fp16 Grad Scale: 16384   Required: 0 hours
Training: 2022-03-04 18:49:23,706-Speed 13881.23 samples/sec   Loss 0.9679   LearningRate 0.0000   Epoch: 39   Global Step: 69110   Fp16 Grad Scale: 32768   Required: 0 hours
Training: 2022-03-04 18:49:41,451-Speed 13850.28 samples/sec   Loss 0.9683   LearningRate 0.0000   Epoch: 39   Global Step: 69120   Fp16 Grad Scale: 32768   Required: -0 hours
Training: 2022-03-04 18:49:59,190-Speed 13854.37 samples/sec   Loss 0.9662   LearningRate 0.0000   Epoch: 39   Global Step: 69130   Fp16 Grad Scale: 32768   Required: -0 hours