Training: 2022-03-03 08:04:25,066-rank_id: 0
Training: 2022-03-03 08:05:28,575-Speed 13912.69 samples/sec Loss 42.4892 LearningRate 0.0000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-03 08:05:46,223-Speed 13926.31 samples/sec Loss 42.4764 LearningRate 0.0000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-03 08:06:04,049-Speed 13787.92 samples/sec Loss 42.4636 LearningRate 0.0000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-03 08:06:21,863-Speed 13797.58 samples/sec Loss 42.4250 LearningRate 0.0000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 65536 Required: 35 hours
Training: 2022-03-03 08:06:39,678-Speed 13795.80 samples/sec Loss 42.3780 LearningRate 0.0000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:06:57,458-Speed 13824.54 samples/sec Loss 42.2783 LearningRate 0.0000 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:07:15,280-Speed 13791.01 samples/sec Loss 42.1334 LearningRate 0.0000 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:07:33,029-Speed 13847.08 samples/sec Loss 41.9596 LearningRate 0.0000 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:07:50,813-Speed 13820.37 samples/sec Loss 41.7365 LearningRate 0.0000 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:08:08,717-Speed 13727.80 samples/sec Loss 41.4752 LearningRate 0.0000 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:08:26,526-Speed 13800.93 samples/sec Loss 41.2082 LearningRate 0.0000 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:08:44,253-Speed 13863.98 samples/sec Loss 40.9147 LearningRate 0.0000 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:09:02,054-Speed 13807.67 samples/sec Loss 40.6283 LearningRate 0.0000 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:09:19,820-Speed 13834.18 samples/sec Loss 40.3172 LearningRate 0.0000 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:09:37,573-Speed 13844.00 samples/sec Loss 40.0319 LearningRate 0.0000 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:09:55,367-Speed 13812.18 samples/sec Loss 39.7825 LearningRate 0.0000 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:10:13,153-Speed 13818.63 samples/sec Loss 39.5631 LearningRate 0.0000 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:10:30,955-Speed 13806.28 samples/sec Loss 39.3847 LearningRate 0.0000 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:10:48,686-Speed 13860.77 samples/sec Loss 39.2179 LearningRate 0.0000 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:11:06,539-Speed 13766.89 samples/sec Loss 39.0972 LearningRate 0.0000 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:11:24,268-Speed 13862.74 samples/sec Loss 39.0031 LearningRate 0.0000 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:11:41,981-Speed 13875.46 samples/sec Loss 38.9293 LearningRate 0.0000 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:11:59,738-Speed 13841.07 samples/sec Loss 38.8636 LearningRate 0.0000 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:12:17,509-Speed 13830.24 samples/sec Loss 38.8420 LearningRate 0.0000 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:12:35,194-Speed 13897.54 samples/sec Loss 38.8140 LearningRate 0.0000 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:12:52,882-Speed 13896.15 samples/sec Loss 38.8162 LearningRate 0.0000 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:13:10,561-Speed 13902.39 samples/sec Loss 38.8487 LearningRate 0.0000 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:13:28,305-Speed 13851.31 samples/sec Loss 38.8465 LearningRate 0.0000 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:13:46,162-Speed 13763.55 samples/sec Loss 38.8397 LearningRate 0.0000 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:14:03,883-Speed 13868.49 samples/sec Loss 38.8301 LearningRate 0.0000 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:14:21,639-Speed 13842.04 samples/sec Loss 38.8364 LearningRate 0.0000 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:14:39,389-Speed 13846.51 samples/sec Loss 38.8474 LearningRate 0.0000 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:14:57,179-Speed 13815.29 samples/sec Loss 38.8848 LearningRate 0.0000 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:15:14,898-Speed 13870.53 samples/sec Loss 38.8806 LearningRate 0.0001 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:15:32,633-Speed 13858.63 samples/sec Loss 38.8829 LearningRate 0.0001 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:15:50,422-Speed 13816.21 samples/sec Loss 38.8849 LearningRate 0.0001 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:16:08,167-Speed 13850.07 samples/sec Loss 38.8941 LearningRate 0.0001 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:16:25,951-Speed 13819.86 samples/sec Loss 38.8790 LearningRate 0.0001 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:16:43,637-Speed 13896.74 samples/sec Loss 38.8755 LearningRate 0.0001 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:17:01,346-Speed 13878.60 samples/sec Loss 38.8982 LearningRate 0.0001 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:17:19,087-Speed 13853.36 samples/sec Loss 38.9245 LearningRate 0.0001 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:17:36,834-Speed 13848.57 samples/sec Loss 38.8834 LearningRate 0.0001 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:17:54,591-Speed 13842.23 samples/sec Loss 38.8782 LearningRate 0.0001 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:18:12,424-Speed 13783.54 samples/sec Loss 38.8602 LearningRate 0.0001 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:18:30,198-Speed 13827.63 samples/sec Loss 38.8489 LearningRate 0.0001 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:18:47,945-Speed 13848.76 samples/sec Loss 38.8465 LearningRate 0.0001 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:19:05,705-Speed 13839.12 samples/sec Loss 38.8471 LearningRate 0.0001 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:19:23,500-Speed 13811.29 samples/sec Loss 38.8400 LearningRate 0.0001 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:19:41,328-Speed 13785.90 samples/sec Loss 38.8598 LearningRate 0.0001 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:19:59,100-Speed 13829.85 samples/sec Loss 38.8535 LearningRate 0.0001 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:20:16,842-Speed 13852.16 samples/sec Loss 38.8620 LearningRate 0.0001 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:20:34,592-Speed 13847.07 samples/sec Loss 38.8628 LearningRate 0.0001 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:20:52,312-Speed 13869.23 samples/sec Loss 38.8776 LearningRate 0.0001 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:21:10,073-Speed 13838.32 samples/sec Loss 38.8709 LearningRate 0.0001 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:21:27,871-Speed 13809.66 samples/sec Loss 38.8748 LearningRate 0.0001 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:21:45,689-Speed 13793.13 samples/sec Loss 38.8965 LearningRate 0.0001 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:22:03,579-Speed 13738.15 samples/sec Loss 38.9041 LearningRate 0.0001 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:22:21,328-Speed 13847.18 samples/sec Loss 38.9106 LearningRate 0.0001 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:22:39,078-Speed 13846.92 samples/sec Loss 38.9325 LearningRate 0.0001 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:22:56,853-Speed 13826.43 samples/sec Loss 38.9278 LearningRate 0.0001 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:23:14,503-Speed 13925.14 samples/sec Loss 38.9422 LearningRate 0.0001 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:23:32,260-Speed 13840.69 samples/sec Loss 38.9742 LearningRate 0.0001 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:23:50,046-Speed 13818.72 samples/sec Loss 38.9523 LearningRate 0.0001 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:24:07,818-Speed 13829.47 samples/sec Loss 39.0715 LearningRate 0.0001 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:24:25,570-Speed 13844.92 samples/sec Loss 39.0198 LearningRate 0.0001 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:24:43,276-Speed 13880.68 samples/sec Loss 39.0395 LearningRate 0.0001 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:25:01,091-Speed 13795.41 samples/sec Loss 39.0070 LearningRate 0.0001 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:25:18,859-Speed 13832.52 samples/sec Loss 39.0112 LearningRate 0.0001 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:25:36,660-Speed 13807.04 samples/sec Loss 39.0098 LearningRate 0.0001 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:25:54,396-Speed 13856.76 samples/sec Loss 39.0003 LearningRate 0.0001 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:26:12,224-Speed 13786.34 samples/sec Loss 39.0102 LearningRate 0.0001 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:26:29,977-Speed 13843.96 samples/sec Loss 39.0031 LearningRate 0.0001 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:26:47,798-Speed 13791.24 samples/sec Loss 39.0007 LearningRate 0.0001 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:27:05,589-Speed 13814.13 samples/sec Loss 39.0052 LearningRate 0.0001 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:27:23,349-Speed 13839.08 samples/sec Loss 38.9972 LearningRate 0.0001 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:27:41,054-Speed 13881.22 samples/sec Loss 38.9906 LearningRate 0.0001 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:27:58,805-Speed 13846.07 samples/sec Loss 38.9871 LearningRate 0.0001 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:28:16,523-Speed 13871.55 samples/sec Loss 38.9822 LearningRate 0.0001 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:28:34,270-Speed 13848.87 samples/sec Loss 38.9683 LearningRate 0.0001 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:28:52,018-Speed 13848.00 samples/sec Loss 38.9722 LearningRate 0.0001 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:29:09,801-Speed 13820.80 samples/sec Loss 38.9758 LearningRate 0.0001 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:29:27,525-Speed 13866.94 samples/sec Loss 38.9755 LearningRate 0.0001 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:29:45,323-Speed 13808.78 samples/sec Loss 38.9726 LearningRate 0.0001 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:30:03,178-Speed 13765.34 samples/sec Loss 38.9720 LearningRate 0.0001 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:30:21,023-Speed 13772.40 samples/sec Loss 38.9793 LearningRate 0.0001 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:30:38,793-Speed 13831.31 samples/sec Loss 38.9842 LearningRate 0.0001 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:30:56,595-Speed 13806.20 samples/sec Loss 38.9836 LearningRate 0.0001 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:31:14,319-Speed 13866.70 samples/sec Loss 38.9835 LearningRate 0.0001 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:31:32,045-Speed 13865.02 samples/sec Loss 38.9756 LearningRate 0.0001 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:31:49,880-Speed 13780.51 samples/sec Loss 38.9804 LearningRate 0.0001 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:32:07,650-Speed 13831.58 samples/sec Loss 38.9767 LearningRate 0.0001 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:32:25,385-Speed 13858.11 samples/sec Loss 38.9758 LearningRate 0.0001 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:32:43,123-Speed 13855.88 samples/sec Loss 38.9730 LearningRate 0.0001 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:33:00,864-Speed 13852.99 samples/sec Loss 38.9514 LearningRate 0.0001 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:33:18,685-Speed 13791.73 samples/sec Loss 38.9424 LearningRate 0.0001 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:33:36,387-Speed 13884.18 samples/sec Loss 38.9416 LearningRate 0.0001 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:33:54,093-Speed 13880.81 samples/sec Loss 38.9334 LearningRate 0.0001 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:34:11,916-Speed 13789.72 samples/sec Loss 38.9140 LearningRate 0.0001 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:34:29,740-Speed 13788.90 samples/sec Loss 38.8864 LearningRate 0.0001 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:34:47,602-Speed 13759.92 samples/sec Loss 38.8724 LearningRate 0.0001 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:35:05,319-Speed 13871.98 samples/sec Loss 38.8473 LearningRate 0.0001 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:35:23,096-Speed 13824.77 samples/sec Loss 38.8155 LearningRate 0.0001 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:35:40,923-Speed 13787.13 samples/sec Loss 38.7901 LearningRate 0.0002 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:35:58,678-Speed 13842.70 samples/sec Loss 38.7585 LearningRate 0.0002 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:36:16,459-Speed 13822.10 samples/sec Loss 38.7198 LearningRate 0.0002 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:36:34,246-Speed 13817.67 samples/sec Loss 38.6768 LearningRate 0.0002 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:36:51,994-Speed 13847.91 samples/sec Loss 38.6483 LearningRate 0.0002 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:37:09,808-Speed 13798.74 samples/sec Loss 38.6080 LearningRate 0.0002 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:37:27,588-Speed 13823.00 samples/sec Loss 38.5714 LearningRate 0.0002 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:37:45,344-Speed 13841.63 samples/sec Loss 38.5292 LearningRate 0.0002 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:38:03,111-Speed 13833.50 samples/sec Loss 38.5053 LearningRate 0.0002 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:38:20,893-Speed 13821.86 samples/sec Loss 38.4691 LearningRate 0.0002 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:38:38,745-Speed 13767.48 samples/sec Loss 38.4542 LearningRate 0.0002 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 65536 Required: 34 hours
Training: 2022-03-03 08:38:56,547-Speed 13805.91 samples/sec Loss 38.4065 LearningRate 0.0002 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 34 hours
Training: 2022-03-03 08:39:14,320-Speed 13828.61 samples/sec Loss 38.3869 LearningRate 0.0002 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:39:32,097-Speed 13825.75 samples/sec Loss 38.3578 LearningRate 0.0002 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:39:49,890-Speed 13812.76 samples/sec Loss 38.3266 LearningRate 0.0002 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 32768 Required: 34 hours
Training: 2022-03-03 08:40:07,738-Speed 13770.69 samples/sec Loss 38.3128 LearningRate 0.0002 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:40:25,583-Speed 13772.86 samples/sec Loss 38.2927 LearningRate 0.0002 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:40:43,370-Speed 13817.65 samples/sec Loss 38.3706 LearningRate 0.0002 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:41:01,209-Speed 13776.92 samples/sec Loss 38.3209 LearningRate 0.0002 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:41:19,071-Speed 13765.50 samples/sec Loss 38.2093 LearningRate 0.0002 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:41:36,903-Speed 13782.51 samples/sec Loss 38.1617 LearningRate 0.0002 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:41:54,742-Speed 13777.37 samples/sec Loss 38.1224 LearningRate 0.0002 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:42:12,624-Speed 13744.14 samples/sec Loss 38.0936 LearningRate 0.0002 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:42:30,456-Speed 13783.69 samples/sec Loss 38.0492 LearningRate 0.0002 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:42:48,311-Speed 13764.58 samples/sec Loss 38.0031 LearningRate 0.0002 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:43:06,204-Speed 13736.24 samples/sec Loss 37.9594 LearningRate 0.0002 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:43:24,038-Speed 13781.29 samples/sec Loss 37.9136 LearningRate 0.0002 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 08:43:41,978-Speed 13700.12 samples/sec Loss 37.8599 LearningRate 0.0002 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:43:59,882-Speed 13727.38 samples/sec Loss 37.8138 LearningRate 0.0002 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 08:44:17,756-Speed 13750.10 samples/sec Loss 37.7872 LearningRate 0.0002 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:44:35,530-Speed 13827.79 samples/sec Loss 37.8208 LearningRate 0.0002 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:44:53,472-Speed 13698.74 samples/sec Loss 37.7613 LearningRate 0.0002 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:45:11,236-Speed 13834.91 samples/sec Loss 37.6893 LearningRate 0.0002 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:45:29,059-Speed 13789.80 samples/sec Loss 37.5833 LearningRate 0.0002 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:45:46,976-Speed 13717.71 samples/sec Loss 37.5570 LearningRate 0.0002 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:46:05,092-Speed 13567.19 samples/sec Loss 37.4757 LearningRate 0.0002 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:46:23,228-Speed 13551.73 samples/sec Loss 37.3808 LearningRate 0.0002 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:46:41,398-Speed 13526.43 samples/sec Loss 37.3408 LearningRate 0.0002 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:46:59,571-Speed 13524.15 samples/sec Loss 37.3004 LearningRate 0.0002 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:47:17,716-Speed 13544.78 samples/sec Loss 37.2942 LearningRate 0.0002 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:47:35,822-Speed 13574.75 samples/sec Loss 37.2175 LearningRate 0.0002 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:47:53,969-Speed 13543.38 samples/sec Loss 37.1724 LearningRate 0.0002 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:48:12,074-Speed 13575.27 samples/sec Loss 37.0700 LearningRate 0.0002 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:48:30,220-Speed 13543.76 samples/sec Loss 37.0468 LearningRate 0.0002 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:48:48,312-Speed 13584.54 samples/sec Loss 37.0402 LearningRate 0.0002 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:49:06,626-Speed 13422.38 samples/sec Loss 36.9097 LearningRate 0.0002 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:49:24,776-Speed 13541.34 samples/sec Loss 36.8841 LearningRate 0.0002 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 8192 Required: 33 hours
Training: 2022-03-03 08:49:42,933-Speed 13535.28 samples/sec Loss 36.8346 LearningRate 0.0002 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:50:01,071-Speed 13550.17 samples/sec Loss 36.7794 LearningRate 0.0002 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:50:19,197-Speed 13559.59 samples/sec Loss 36.6977 LearningRate 0.0002 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:50:37,425-Speed 13483.22 samples/sec Loss 36.6119 LearningRate 0.0002 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:50:55,536-Speed 13571.42 samples/sec Loss 36.5650 LearningRate 0.0002 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 4096 Required: 33 hours
Training: 2022-03-03 08:51:13,729-Speed 13509.15 samples/sec Loss 36.6514 LearningRate 0.0002 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:51:31,790-Speed 13608.64 samples/sec Loss 36.4679 LearningRate 0.0002 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:51:49,877-Speed 13588.24 samples/sec Loss 36.4191 LearningRate 0.0002 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:52:08,014-Speed 13550.97 samples/sec Loss 36.3615 LearningRate 0.0002 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:52:26,159-Speed 13545.35 samples/sec Loss 36.3309 LearningRate 0.0002 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:52:44,298-Speed 13549.19 samples/sec Loss 36.3660 LearningRate 0.0002 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:53:02,452-Speed 13538.40 samples/sec Loss 36.2173 LearningRate 0.0002 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:53:20,568-Speed 13566.57 samples/sec Loss 36.1629 LearningRate 0.0002 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:53:38,774-Speed 13500.01 samples/sec Loss 36.0479 LearningRate 0.0002 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:53:56,673-Speed 13731.63 samples/sec Loss 35.9773 LearningRate 0.0002 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:54:14,529-Speed 13763.75 samples/sec Loss 35.8711 LearningRate 0.0002 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 1024 Required: 33 hours
Training: 2022-03-03 08:54:32,301-Speed 13830.88 samples/sec Loss 35.8540 LearningRate 0.0002 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:54:50,101-Speed 13807.64 samples/sec Loss 35.7735 LearningRate 0.0002 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:55:07,781-Speed 13901.30 samples/sec Loss 35.7210 LearningRate 0.0002 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:55:25,515-Speed 13858.76 samples/sec Loss 35.6461 LearningRate 0.0002 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:55:43,246-Speed 13861.38 samples/sec Loss 35.5800 LearningRate 0.0002 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:56:00,947-Speed 13884.52 samples/sec Loss 35.5038 LearningRate 0.0002 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 2048 Required: 33 hours
Training: 2022-03-03 08:57:08,583-Speed 3633.66 samples/sec Loss 35.5531 LearningRate 0.0003 Epoch: 1 Global Step: 1730 Fp16 Grad Scale: 2048 Required: 34 hours
Training: 2022-03-03 08:57:26,309-Speed 13864.73 samples/sec Loss 35.5557 LearningRate 0.0003 Epoch: 1 Global Step: 1740 Fp16 Grad Scale: 2048 Required: 34 hours
Training: 2022-03-03 08:57:44,043-Speed 13859.43 samples/sec Loss 35.4544 LearningRate 0.0003 Epoch: 1 Global Step: 1750 Fp16 Grad Scale: 2048 Required: 34 hours
Training: 2022-03-03 08:58:01,798-Speed 13842.65 samples/sec Loss 35.4012 LearningRate 0.0003 Epoch: 1 Global Step: 1760 Fp16 Grad Scale: 2048 Required: 34 hours
Training: 2022-03-03 08:58:19,514-Speed 13873.03 samples/sec Loss 35.3628 LearningRate 0.0003 Epoch: 1 Global Step: 1770 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:58:37,355-Speed 13775.54 samples/sec Loss 35.3447 LearningRate 0.0003 Epoch: 1 Global Step: 1780 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:58:55,129-Speed 13828.19 samples/sec Loss 35.2740 LearningRate 0.0003 Epoch: 1 Global Step: 1790 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:59:12,910-Speed 13822.55 samples/sec Loss 35.1900 LearningRate 0.0003 Epoch: 1 Global Step: 1800 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:59:30,607-Speed 13888.20 samples/sec Loss 35.1373 LearningRate 0.0003 Epoch: 1 Global Step: 1810 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 08:59:48,344-Speed 13856.45 samples/sec Loss 35.0047 LearningRate 0.0003 Epoch: 1 Global Step: 1820 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 09:00:06,117-Speed 13828.50 samples/sec Loss 34.9075 LearningRate 0.0003 Epoch: 1 Global Step: 1830 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 09:00:23,869-Speed 13844.93 samples/sec Loss 34.8102 LearningRate 0.0003 Epoch: 1 Global Step: 1840 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 09:00:41,630-Speed 13838.42 samples/sec Loss 34.7652 LearningRate 0.0003 Epoch: 1 Global Step: 1850 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 09:00:59,465-Speed 13780.76 samples/sec Loss 34.6772 LearningRate 0.0003 Epoch: 1 Global Step: 1860 Fp16 Grad Scale: 4096 Required: 34 hours
Training: 2022-03-03 09:01:17,168-Speed 13882.71 samples/sec Loss 34.5755 LearningRate 0.0003 Epoch: 1 Global Step: 1870 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:01:34,865-Speed 13887.89 samples/sec Loss 34.5566 LearningRate 0.0003 Epoch: 1 Global Step: 1880 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:01:52,704-Speed 13777.61 samples/sec Loss 34.5804 LearningRate 0.0003 Epoch: 1 Global Step: 1890 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:02:10,431-Speed 13864.35 samples/sec Loss 34.4526 LearningRate 0.0003 Epoch: 1 Global Step: 1900 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:02:28,181-Speed 13846.54 samples/sec Loss 34.3351 LearningRate 0.0003 Epoch: 1 Global Step: 1910 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:02:45,920-Speed 13854.59 samples/sec Loss 34.2268 LearningRate 0.0003 Epoch: 1 Global Step: 1920 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:03:03,640-Speed 13870.02 samples/sec Loss 34.1568 LearningRate 0.0003 Epoch: 1 Global Step: 1930 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:03:21,344-Speed 13882.62 samples/sec Loss 34.1038 LearningRate 0.0003 Epoch: 1 Global Step: 1940 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:03:39,129-Speed 13819.35 samples/sec Loss 34.0145 LearningRate 0.0003 Epoch: 1 Global Step: 1950 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:03:56,881-Speed 13844.66 samples/sec Loss 33.9539 LearningRate 0.0003 Epoch: 1 Global Step: 1960 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:04:14,624-Speed 13852.10 samples/sec Loss 33.8635 LearningRate 0.0003 Epoch: 1 Global Step: 1970 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 09:04:32,405-Speed 13822.33 samples/sec Loss 33.8227 LearningRate 0.0003 Epoch: 1 Global Step: 1980 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 09:04:50,123-Speed 13871.28 samples/sec Loss 33.7152 LearningRate 0.0003 Epoch: 1 Global Step: 1990 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:05:07,904-Speed 13822.59 samples/sec Loss 33.6241 LearningRate 0.0003 Epoch: 1 Global Step: 2000 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:05:25,663-Speed 13838.95 samples/sec Loss 33.6186 LearningRate 0.0003 Epoch: 1 Global Step: 2010 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:05:43,437-Speed 13828.59 samples/sec Loss 33.5551 LearningRate 0.0003 Epoch: 1 Global Step: 2020 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:06:01,204-Speed 13832.62 samples/sec Loss 33.4624 LearningRate 0.0003 Epoch: 1 Global Step: 2030 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:06:18,904-Speed 13886.19 samples/sec Loss 33.4992 LearningRate 0.0003 Epoch: 1 Global Step: 2040 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:06:36,705-Speed 13806.62 samples/sec Loss 33.3382 LearningRate 0.0003 Epoch: 1 Global Step: 2050 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:06:54,528-Speed 13789.80 samples/sec Loss 33.2099 LearningRate 0.0003 Epoch: 1 Global Step: 2060 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:07:12,344-Speed 13795.44 samples/sec Loss 33.1453 LearningRate 0.0003 Epoch: 1 Global Step: 2070 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:07:30,033-Speed 13894.05 samples/sec Loss 33.0261 LearningRate 0.0003 Epoch: 1 Global Step: 2080 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:07:47,791-Speed 13840.48 samples/sec Loss 32.9128 LearningRate 0.0003 Epoch: 1 Global Step: 2090 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 09:08:05,589-Speed 13810.22 samples/sec Loss 32.7946 LearningRate 0.0003 Epoch: 1 Global Step: 2100 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 09:08:23,346-Speed 13840.72 samples/sec Loss 32.7730 LearningRate 0.0003 Epoch: 1 Global Step: 2110 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:08:41,069-Speed 13867.57 samples/sec Loss 32.6987 LearningRate 0.0003 Epoch: 1 Global Step: 2120 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:08:58,774-Speed 13881.55 samples/sec Loss 32.5884 LearningRate 0.0003 Epoch: 1 Global Step: 2130 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:09:16,584-Speed 13800.10 samples/sec Loss 32.4788 LearningRate 0.0003 Epoch: 1 Global Step: 2140 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:09:34,300-Speed 13873.17 samples/sec Loss 32.4069 LearningRate 0.0003 Epoch: 1 Global Step: 2150 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:09:52,122-Speed 13790.50 samples/sec Loss 32.3353 LearningRate 0.0003 Epoch: 1 Global Step: 2160 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:10:09,839-Speed 13871.86 samples/sec Loss 32.2968 LearningRate 0.0003 Epoch: 1 Global Step: 2170 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:10:27,515-Speed 13904.67 samples/sec Loss 32.1148 LearningRate 0.0003 Epoch: 1 Global Step: 2180 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:10:45,332-Speed 13794.39 samples/sec Loss 32.0056 LearningRate 0.0003 Epoch: 1 Global Step: 2190 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:11:03,128-Speed 13810.83 samples/sec Loss 31.9613 LearningRate 0.0003 Epoch: 1 Global Step: 2200 Fp16 Grad Scale: 8192 Required: 34 hours
Training: 2022-03-03 09:11:20,820-Speed 13892.19 samples/sec Loss 31.9026 LearningRate 0.0003 Epoch: 1 Global Step: 2210 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 09:11:38,512-Speed 13891.54 samples/sec Loss 31.7751 LearningRate 0.0003 Epoch: 1 Global Step: 2220 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-03-03 09:11:56,306-Speed 13812.78 samples/sec Loss 31.6323 LearningRate 0.0003
Epoch: 1 Global Step: 2230 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-03-03 09:12:14,101-Speed 13811.13 samples/sec Loss 31.5413 LearningRate 0.0003 Epoch: 1 Global Step: 2240 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-03-03 09:12:31,862-Speed 13837.78 samples/sec Loss 31.4241 LearningRate 0.0003 Epoch: 1 Global Step: 2250 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-03-03 09:12:49,580-Speed 13871.74 samples/sec Loss 31.3262 LearningRate 0.0003 Epoch: 1 Global Step: 2260 Fp16 Grad Scale: 16384 Required: 34 hours Training: 2022-03-03 09:13:07,255-Speed 13904.82 samples/sec Loss 31.1940 LearningRate 0.0003 Epoch: 1 Global Step: 2270 Fp16 Grad Scale: 8192 Required: 34 hours Training: 2022-03-03 09:13:25,041-Speed 13818.61 samples/sec Loss 31.1200 LearningRate 0.0003 Epoch: 1 Global Step: 2280 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:13:42,713-Speed 13907.76 samples/sec Loss 31.0600 LearningRate 0.0003 Epoch: 1 Global Step: 2290 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:14:00,411-Speed 13888.44 samples/sec Loss 31.0208 LearningRate 0.0003 Epoch: 1 Global Step: 2300 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:14:18,184-Speed 13828.20 samples/sec Loss 30.8744 LearningRate 0.0003 Epoch: 1 Global Step: 2310 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:14:35,873-Speed 13893.84 samples/sec Loss 30.7826 LearningRate 0.0003 Epoch: 1 Global Step: 2320 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:14:53,644-Speed 13830.94 samples/sec Loss 30.6314 LearningRate 0.0003 Epoch: 1 Global Step: 2330 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:15:11,506-Speed 13759.68 samples/sec Loss 30.5017 LearningRate 0.0003 Epoch: 1 Global Step: 2340 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:15:29,276-Speed 13830.44 samples/sec Loss 30.3838 LearningRate 0.0003 Epoch: 1 Global Step: 2350 Fp16 Grad Scale: 8192 
Required: 33 hours Training: 2022-03-03 09:15:47,050-Speed 13828.03 samples/sec Loss 30.2962 LearningRate 0.0003 Epoch: 1 Global Step: 2360 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:16:04,781-Speed 13861.73 samples/sec Loss 30.2379 LearningRate 0.0003 Epoch: 1 Global Step: 2370 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:16:22,468-Speed 13895.60 samples/sec Loss 30.1137 LearningRate 0.0003 Epoch: 1 Global Step: 2380 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:16:40,211-Speed 13851.96 samples/sec Loss 29.9769 LearningRate 0.0003 Epoch: 1 Global Step: 2390 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:16:57,982-Speed 13830.26 samples/sec Loss 29.8317 LearningRate 0.0003 Epoch: 1 Global Step: 2400 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:17:15,729-Speed 13849.03 samples/sec Loss 29.7221 LearningRate 0.0003 Epoch: 1 Global Step: 2410 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:17:33,433-Speed 13881.91 samples/sec Loss 29.7048 LearningRate 0.0004 Epoch: 1 Global Step: 2420 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:17:51,313-Speed 13745.93 samples/sec Loss 29.5675 LearningRate 0.0004 Epoch: 1 Global Step: 2430 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:18:09,124-Speed 13799.36 samples/sec Loss 29.3802 LearningRate 0.0004 Epoch: 1 Global Step: 2440 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:18:26,878-Speed 13843.54 samples/sec Loss 29.3104 LearningRate 0.0004 Epoch: 1 Global Step: 2450 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:18:44,714-Speed 13779.11 samples/sec Loss 29.2667 LearningRate 0.0004 Epoch: 1 Global Step: 2460 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:19:02,488-Speed 13828.27 samples/sec Loss 29.0750 LearningRate 0.0004 Epoch: 1 Global Step: 2470 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 
09:19:20,297-Speed 13800.69 samples/sec Loss 28.9467 LearningRate 0.0004 Epoch: 1 Global Step: 2480 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:19:38,151-Speed 13766.06 samples/sec Loss 28.8525 LearningRate 0.0004 Epoch: 1 Global Step: 2490 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:19:56,076-Speed 13711.87 samples/sec Loss 28.7785 LearningRate 0.0004 Epoch: 1 Global Step: 2500 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:20:13,872-Speed 13810.35 samples/sec Loss 28.6038 LearningRate 0.0004 Epoch: 1 Global Step: 2510 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:20:31,680-Speed 13801.40 samples/sec Loss 28.5469 LearningRate 0.0004 Epoch: 1 Global Step: 2520 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:20:49,584-Speed 13728.51 samples/sec Loss 28.5782 LearningRate 0.0004 Epoch: 1 Global Step: 2530 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:21:07,475-Speed 13737.44 samples/sec Loss 28.4046 LearningRate 0.0004 Epoch: 1 Global Step: 2540 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:21:25,522-Speed 13618.41 samples/sec Loss 28.2257 LearningRate 0.0004 Epoch: 1 Global Step: 2550 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:21:43,383-Speed 13760.41 samples/sec Loss 28.1340 LearningRate 0.0004 Epoch: 1 Global Step: 2560 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:22:01,298-Speed 13719.13 samples/sec Loss 27.9651 LearningRate 0.0004 Epoch: 1 Global Step: 2570 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:22:19,145-Speed 13771.33 samples/sec Loss 27.8427 LearningRate 0.0004 Epoch: 1 Global Step: 2580 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:22:36,891-Speed 13848.94 samples/sec Loss 27.7203 LearningRate 0.0004 Epoch: 1 Global Step: 2590 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:22:54,744-Speed 13767.28 samples/sec Loss 27.5573 
LearningRate 0.0004 Epoch: 1 Global Step: 2600 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:23:12,550-Speed 13803.12 samples/sec Loss 27.4450 LearningRate 0.0004 Epoch: 1 Global Step: 2610 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:23:30,394-Speed 13773.40 samples/sec Loss 27.3745 LearningRate 0.0004 Epoch: 1 Global Step: 2620 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:23:48,245-Speed 13769.09 samples/sec Loss 27.2276 LearningRate 0.0004 Epoch: 1 Global Step: 2630 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:24:06,052-Speed 13802.99 samples/sec Loss 27.1099 LearningRate 0.0004 Epoch: 1 Global Step: 2640 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:24:23,880-Speed 13785.54 samples/sec Loss 27.0195 LearningRate 0.0004 Epoch: 1 Global Step: 2650 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:24:41,709-Speed 13785.15 samples/sec Loss 26.8865 LearningRate 0.0004 Epoch: 1 Global Step: 2660 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:24:59,524-Speed 13796.53 samples/sec Loss 26.7253 LearningRate 0.0004 Epoch: 1 Global Step: 2670 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:25:17,458-Speed 13704.15 samples/sec Loss 26.6498 LearningRate 0.0004 Epoch: 1 Global Step: 2680 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:25:35,249-Speed 13814.37 samples/sec Loss 26.5064 LearningRate 0.0004 Epoch: 1 Global Step: 2690 Fp16 Grad Scale: 2048 Required: 33 hours Training: 2022-03-03 09:25:53,121-Speed 13752.32 samples/sec Loss 26.3562 LearningRate 0.0004 Epoch: 1 Global Step: 2700 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:26:10,915-Speed 13812.55 samples/sec Loss 26.2177 LearningRate 0.0004 Epoch: 1 Global Step: 2710 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:26:28,718-Speed 13804.55 samples/sec Loss 26.0584 LearningRate 0.0004 Epoch: 1 Global Step: 2720 Fp16 
Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:26:46,644-Speed 13710.86 samples/sec Loss 25.9390 LearningRate 0.0004 Epoch: 1 Global Step: 2730 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:27:04,494-Speed 13769.02 samples/sec Loss 25.8979 LearningRate 0.0004 Epoch: 1 Global Step: 2740 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:27:22,283-Speed 13816.17 samples/sec Loss 25.7764 LearningRate 0.0004 Epoch: 1 Global Step: 2750 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:27:40,123-Speed 13776.87 samples/sec Loss 25.5662 LearningRate 0.0004 Epoch: 1 Global Step: 2760 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:27:57,918-Speed 13815.01 samples/sec Loss 25.4346 LearningRate 0.0004 Epoch: 1 Global Step: 2770 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:28:15,703-Speed 13818.92 samples/sec Loss 25.3088 LearningRate 0.0004 Epoch: 1 Global Step: 2780 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:28:33,558-Speed 13766.15 samples/sec Loss 25.2201 LearningRate 0.0004 Epoch: 1 Global Step: 2790 Fp16 Grad Scale: 4096 Required: 33 hours Training: 2022-03-03 09:28:51,381-Speed 13789.38 samples/sec Loss 25.0652 LearningRate 0.0004 Epoch: 1 Global Step: 2800 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:29:09,195-Speed 13796.51 samples/sec Loss 24.9586 LearningRate 0.0004 Epoch: 1 Global Step: 2810 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:29:27,153-Speed 13686.54 samples/sec Loss 24.8858 LearningRate 0.0004 Epoch: 1 Global Step: 2820 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:29:44,963-Speed 13799.90 samples/sec Loss 24.7069 LearningRate 0.0004 Epoch: 1 Global Step: 2830 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:30:02,815-Speed 13767.66 samples/sec Loss 24.5654 LearningRate 0.0004 Epoch: 1 Global Step: 2840 Fp16 Grad Scale: 8192 Required: 33 hours Training: 
2022-03-03 09:30:20,597-Speed 13821.01 samples/sec Loss 24.4343 LearningRate 0.0004 Epoch: 1 Global Step: 2850 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:30:38,396-Speed 13808.99 samples/sec Loss 24.3576 LearningRate 0.0004 Epoch: 1 Global Step: 2860 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:30:56,208-Speed 13798.14 samples/sec Loss 24.1971 LearningRate 0.0004 Epoch: 1 Global Step: 2870 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:31:14,008-Speed 13807.32 samples/sec Loss 24.1181 LearningRate 0.0004 Epoch: 1 Global Step: 2880 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:31:31,839-Speed 13783.74 samples/sec Loss 23.9296 LearningRate 0.0004 Epoch: 1 Global Step: 2890 Fp16 Grad Scale: 8192 Required: 33 hours Training: 2022-03-03 09:31:49,662-Speed 13789.73 samples/sec Loss 23.8119 LearningRate 0.0004 Epoch: 1 Global Step: 2900 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:32:07,546-Speed 13744.14 samples/sec Loss 23.6697 LearningRate 0.0004 Epoch: 1 Global Step: 2910 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:32:25,414-Speed 13754.51 samples/sec Loss 23.4899 LearningRate 0.0004 Epoch: 1 Global Step: 2920 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:32:43,274-Speed 13761.40 samples/sec Loss 23.3738 LearningRate 0.0004 Epoch: 1 Global Step: 2930 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:33:01,145-Speed 13752.93 samples/sec Loss 23.2855 LearningRate 0.0004 Epoch: 1 Global Step: 2940 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:33:18,974-Speed 13785.14 samples/sec Loss 23.1657 LearningRate 0.0004 Epoch: 1 Global Step: 2950 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:33:36,793-Speed 13792.80 samples/sec Loss 22.9788 LearningRate 0.0004 Epoch: 1 Global Step: 2960 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:33:54,685-Speed 13737.15 
samples/sec Loss 22.9435 LearningRate 0.0004 Epoch: 1 Global Step: 2970 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:34:12,498-Speed 13798.32 samples/sec Loss 22.7832 LearningRate 0.0004 Epoch: 1 Global Step: 2980 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:34:30,299-Speed 13806.38 samples/sec Loss 22.6576 LearningRate 0.0004 Epoch: 1 Global Step: 2990 Fp16 Grad Scale: 16384 Required: 33 hours Training: 2022-03-03 09:34:48,126-Speed 13787.02 samples/sec Loss 22.5278 LearningRate 0.0004 Epoch: 1 Global Step: 3000 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:35:05,991-Speed 13756.91 samples/sec Loss 22.3702 LearningRate 0.0004 Epoch: 1 Global Step: 3010 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:35:23,878-Speed 13740.51 samples/sec Loss 22.3141 LearningRate 0.0004 Epoch: 1 Global Step: 3020 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:35:41,764-Speed 13741.11 samples/sec Loss 22.1622 LearningRate 0.0004 Epoch: 1 Global Step: 3030 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:35:59,586-Speed 13790.78 samples/sec Loss 21.9911 LearningRate 0.0004 Epoch: 1 Global Step: 3040 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:36:17,473-Speed 13740.26 samples/sec Loss 21.9280 LearningRate 0.0004 Epoch: 1 Global Step: 3050 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:36:35,605-Speed 13554.32 samples/sec Loss 21.7514 LearningRate 0.0004 Epoch: 1 Global Step: 3060 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:36:53,449-Speed 13773.99 samples/sec Loss 21.6066 LearningRate 0.0004 Epoch: 1 Global Step: 3070 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:37:11,382-Speed 13705.38 samples/sec Loss 21.4856 LearningRate 0.0004 Epoch: 1 Global Step: 3080 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:37:29,257-Speed 13749.76 samples/sec Loss 21.3891 LearningRate 
0.0004 Epoch: 1 Global Step: 3090 Fp16 Grad Scale: 32768 Required: 33 hours Training: 2022-03-03 09:37:47,108-Speed 13768.06 samples/sec Loss 21.2793 LearningRate 0.0004 Epoch: 1 Global Step: 3100 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:38:04,989-Speed 13745.07 samples/sec Loss 21.1248 LearningRate 0.0004 Epoch: 1 Global Step: 3110 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:38:22,793-Speed 13804.79 samples/sec Loss 20.9951 LearningRate 0.0005 Epoch: 1 Global Step: 3120 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:38:40,684-Speed 13737.66 samples/sec Loss 20.8648 LearningRate 0.0005 Epoch: 1 Global Step: 3130 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:38:58,625-Speed 13698.84 samples/sec Loss 20.7772 LearningRate 0.0005 Epoch: 1 Global Step: 3140 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:39:16,451-Speed 13787.01 samples/sec Loss 20.7054 LearningRate 0.0005 Epoch: 1 Global Step: 3150 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:39:34,292-Speed 13776.42 samples/sec Loss 20.5445 LearningRate 0.0005 Epoch: 1 Global Step: 3160 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:39:52,211-Speed 13715.63 samples/sec Loss 20.5153 LearningRate 0.0005 Epoch: 1 Global Step: 3170 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:40:10,170-Speed 13685.23 samples/sec Loss 20.3372 LearningRate 0.0005 Epoch: 1 Global Step: 3180 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:40:28,098-Speed 13708.98 samples/sec Loss 20.2067 LearningRate 0.0005 Epoch: 1 Global Step: 3190 Fp16 Grad Scale: 65536 Required: 33 hours Training: 2022-03-03 09:40:46,024-Speed 13710.96 samples/sec Loss 20.1509 LearningRate 0.0005 Epoch: 1 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:41:03,892-Speed 13756.85 samples/sec Loss 20.0018 LearningRate 0.0005 Epoch: 1 Global Step: 3210 Fp16 
Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:41:21,840-Speed 13693.20 samples/sec Loss 19.8522 LearningRate 0.0005 Epoch: 1 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:41:39,711-Speed 13753.16 samples/sec Loss 19.7650 LearningRate 0.0005 Epoch: 1 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:41:57,666-Speed 13688.29 samples/sec Loss 19.6733 LearningRate 0.0005 Epoch: 1 Global Step: 3240 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:42:15,512-Speed 13772.17 samples/sec Loss 19.5675 LearningRate 0.0005 Epoch: 1 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:42:33,401-Speed 13738.68 samples/sec Loss 19.4078 LearningRate 0.0005 Epoch: 1 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:42:51,381-Speed 13669.61 samples/sec Loss 19.3313 LearningRate 0.0005 Epoch: 1 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:43:09,282-Speed 13730.03 samples/sec Loss 19.2461 LearningRate 0.0005 Epoch: 1 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:43:27,157-Speed 13749.70 samples/sec Loss 19.1304 LearningRate 0.0005 Epoch: 1 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:43:44,991-Speed 13780.59 samples/sec Loss 19.0403 LearningRate 0.0005 Epoch: 1 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:44:03,008-Speed 13641.51 samples/sec Loss 18.9700 LearningRate 0.0005 Epoch: 1 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:44:20,807-Speed 13808.53 samples/sec Loss 18.8407 LearningRate 0.0005 Epoch: 1 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:44:38,645-Speed 13778.82 samples/sec Loss 18.6800 LearningRate 0.0005 Epoch: 1 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 33 
hours Training: 2022-03-03 09:44:56,498-Speed 13766.54 samples/sec Loss 18.6049 LearningRate 0.0005 Epoch: 1 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:45:14,384-Speed 13741.48 samples/sec Loss 18.5446 LearningRate 0.0005 Epoch: 1 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:45:32,263-Speed 13746.52 samples/sec Loss 18.4337 LearningRate 0.0005 Epoch: 1 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:45:50,157-Speed 13735.12 samples/sec Loss 18.3089 LearningRate 0.0005 Epoch: 1 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:46:08,065-Speed 13724.17 samples/sec Loss 18.1975 LearningRate 0.0005 Epoch: 1 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:46:25,961-Speed 13734.57 samples/sec Loss 18.0874 LearningRate 0.0005 Epoch: 1 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:46:43,887-Speed 13710.29 samples/sec Loss 18.0297 LearningRate 0.0005 Epoch: 1 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:47:01,709-Speed 13790.66 samples/sec Loss 17.8903 LearningRate 0.0005 Epoch: 1 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:47:19,682-Speed 13674.53 samples/sec Loss 17.8396 LearningRate 0.0005 Epoch: 1 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:47:37,540-Speed 13763.25 samples/sec Loss 17.7496 LearningRate 0.0005 Epoch: 1 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:47:55,399-Speed 13763.76 samples/sec Loss 17.6247 LearningRate 0.0005 Epoch: 1 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:48:13,254-Speed 13764.72 samples/sec Loss 17.5910 LearningRate 0.0005 Epoch: 1 Global Step: 3450 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 
09:49:21,084-Speed 3623.24 samples/sec Loss 17.4380 LearningRate 0.0005 Epoch: 2 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:49:38,953-Speed 13755.02 samples/sec Loss 17.2721 LearningRate 0.0005 Epoch: 2 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:49:56,762-Speed 13799.96 samples/sec Loss 17.2015 LearningRate 0.0005 Epoch: 2 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:50:14,694-Speed 13706.33 samples/sec Loss 17.1504 LearningRate 0.0005 Epoch: 2 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:50:32,556-Speed 13759.83 samples/sec Loss 17.0511 LearningRate 0.0005 Epoch: 2 Global Step: 3500 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-03-03 09:50:50,423-Speed 13756.00 samples/sec Loss 16.9772 LearningRate 0.0005 Epoch: 2 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:51:08,431-Speed 13647.24 samples/sec Loss 16.9112 LearningRate 0.0005 Epoch: 2 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:51:26,306-Speed 13749.59 samples/sec Loss 16.7676 LearningRate 0.0005 Epoch: 2 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:51:44,155-Speed 13770.34 samples/sec Loss 16.6862 LearningRate 0.0005 Epoch: 2 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:52:02,010-Speed 13765.30 samples/sec Loss 16.6608 LearningRate 0.0005 Epoch: 2 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:52:19,897-Speed 13740.21 samples/sec Loss 16.6435 LearningRate 0.0005 Epoch: 2 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:52:38,023-Speed 13559.35 samples/sec Loss 16.4541 LearningRate 0.0005 Epoch: 2 Global Step: 3570 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:52:55,992-Speed 13677.60 
samples/sec Loss 16.3415 LearningRate 0.0005 Epoch: 2 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:53:13,997-Speed 13650.44 samples/sec Loss 16.2446 LearningRate 0.0005 Epoch: 2 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:53:31,963-Speed 13680.01 samples/sec Loss 16.2461 LearningRate 0.0005 Epoch: 2 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:53:49,926-Speed 13681.93 samples/sec Loss 16.1302 LearningRate 0.0005 Epoch: 2 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:54:07,846-Speed 13715.33 samples/sec Loss 16.0572 LearningRate 0.0005 Epoch: 2 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:54:25,725-Speed 13746.89 samples/sec Loss 16.0477 LearningRate 0.0005 Epoch: 2 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:54:43,820-Speed 13582.35 samples/sec Loss 15.9614 LearningRate 0.0005 Epoch: 2 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:55:01,745-Speed 13711.72 samples/sec Loss 15.8111 LearningRate 0.0005 Epoch: 2 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:55:19,583-Speed 13778.06 samples/sec Loss 15.7780 LearningRate 0.0005 Epoch: 2 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:55:37,532-Speed 13692.67 samples/sec Loss 15.6485 LearningRate 0.0005 Epoch: 2 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:55:55,412-Speed 13746.06 samples/sec Loss 15.5662 LearningRate 0.0005 Epoch: 2 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:56:13,349-Speed 13702.44 samples/sec Loss 15.4928 LearningRate 0.0005 Epoch: 2 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:56:31,241-Speed 13736.88 samples/sec Loss 15.4393 
LearningRate 0.0005 Epoch: 2 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:56:49,032-Speed 13814.71 samples/sec Loss 15.3704 LearningRate 0.0005 Epoch: 2 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:57:06,972-Speed 13699.68 samples/sec Loss 15.2785 LearningRate 0.0005 Epoch: 2 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:57:24,871-Speed 13731.06 samples/sec Loss 15.2103 LearningRate 0.0005 Epoch: 2 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:57:42,727-Speed 13764.54 samples/sec Loss 15.1511 LearningRate 0.0005 Epoch: 2 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:58:00,559-Speed 13783.20 samples/sec Loss 15.1083 LearningRate 0.0005 Epoch: 2 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:58:18,462-Speed 13728.07 samples/sec Loss 15.0011 LearningRate 0.0005 Epoch: 2 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:58:36,350-Speed 13739.81 samples/sec Loss 14.8975 LearningRate 0.0005 Epoch: 2 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:58:54,281-Speed 13706.85 samples/sec Loss 14.8750 LearningRate 0.0005 Epoch: 2 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:59:12,119-Speed 13777.66 samples/sec Loss 14.7159 LearningRate 0.0005 Epoch: 2 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:59:29,971-Speed 13767.59 samples/sec Loss 14.7004 LearningRate 0.0005 Epoch: 2 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 09:59:47,843-Speed 13751.87 samples/sec Loss 14.6226 LearningRate 0.0006 Epoch: 2 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:00:05,870-Speed 13633.56 samples/sec Loss 14.6076 LearningRate 0.0006 Epoch: 2 
Global Step: 3820 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:00:23,695-Speed 13788.07 samples/sec Loss 14.5568 LearningRate 0.0006 Epoch: 2 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:00:41,509-Speed 13797.46 samples/sec Loss 14.4441 LearningRate 0.0006 Epoch: 2 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:00:59,450-Speed 13699.00 samples/sec Loss 14.2926 LearningRate 0.0006 Epoch: 2 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:01:17,300-Speed 13768.58 samples/sec Loss 14.2915 LearningRate 0.0006 Epoch: 2 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:01:35,159-Speed 13761.83 samples/sec Loss 14.1843 LearningRate 0.0006 Epoch: 2 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:01:53,071-Speed 13721.96 samples/sec Loss 14.1817 LearningRate 0.0006 Epoch: 2 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:02:11,030-Speed 13684.88 samples/sec Loss 14.1055 LearningRate 0.0006 Epoch: 2 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:02:28,894-Speed 13758.43 samples/sec Loss 14.0630 LearningRate 0.0006 Epoch: 2 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:02:46,721-Speed 13786.96 samples/sec Loss 13.9432 LearningRate 0.0006 Epoch: 2 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:03:04,650-Speed 13708.36 samples/sec Loss 13.8881 LearningRate 0.0006 Epoch: 2 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:03:22,647-Speed 13656.44 samples/sec Loss 13.8539 LearningRate 0.0006 Epoch: 2 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:03:40,598-Speed 13691.38 samples/sec Loss 13.8137 LearningRate 0.0006 Epoch: 2 Global Step: 3940 Fp16 Grad 
Scale: 131072 Required: 33 hours Training: 2022-03-03 10:03:58,481-Speed 13743.62 samples/sec Loss 13.7871 LearningRate 0.0006 Epoch: 2 Global Step: 3950 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:04:16,374-Speed 13735.77 samples/sec Loss 13.7344 LearningRate 0.0006 Epoch: 2 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:04:34,214-Speed 13777.27 samples/sec Loss 13.6199 LearningRate 0.0006 Epoch: 2 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:04:52,060-Speed 13771.69 samples/sec Loss 13.5125 LearningRate 0.0006 Epoch: 2 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:05:09,903-Speed 13773.85 samples/sec Loss 13.4485 LearningRate 0.0006 Epoch: 2 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:05:27,674-Speed 13831.15 samples/sec Loss 13.4629 LearningRate 0.0006 Epoch: 2 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:05:45,446-Speed 13829.03 samples/sec Loss 13.4473 LearningRate 0.0006 Epoch: 2 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:06:03,271-Speed 13788.68 samples/sec Loss 13.3518 LearningRate 0.0006 Epoch: 2 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:06:21,054-Speed 13820.89 samples/sec Loss 13.2399 LearningRate 0.0006 Epoch: 2 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:06:38,857-Speed 13805.22 samples/sec Loss 13.1824 LearningRate 0.0006 Epoch: 2 Global Step: 4040 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:06:56,572-Speed 13873.79 samples/sec Loss 13.1503 LearningRate 0.0006 Epoch: 2 Global Step: 4050 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:07:14,320-Speed 13847.65 samples/sec Loss 13.0836 LearningRate 0.0006 Epoch: 2 Global Step: 4060 Fp16 Grad Scale: 131072 Required: 33 
hours Training: 2022-03-03 10:07:32,087-Speed 13835.50 samples/sec Loss 13.0263 LearningRate 0.0006 Epoch: 2 Global Step: 4070 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:07:49,898-Speed 13800.55 samples/sec Loss 12.9696 LearningRate 0.0006 Epoch: 2 Global Step: 4080 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:08:07,700-Speed 13806.51 samples/sec Loss 12.9287 LearningRate 0.0006 Epoch: 2 Global Step: 4090 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:08:25,562-Speed 13759.23 samples/sec Loss 12.8667 LearningRate 0.0006 Epoch: 2 Global Step: 4100 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:08:43,296-Speed 13859.67 samples/sec Loss 12.9377 LearningRate 0.0006 Epoch: 2 Global Step: 4110 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:09:01,128-Speed 13782.40 samples/sec Loss 12.8275 LearningRate 0.0006 Epoch: 2 Global Step: 4120 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:09:18,946-Speed 13793.47 samples/sec Loss 12.7534 LearningRate 0.0006 Epoch: 2 Global Step: 4130 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:09:36,737-Speed 13814.65 samples/sec Loss 12.7555 LearningRate 0.0006 Epoch: 2 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:09:54,487-Speed 13846.74 samples/sec Loss 12.6587 LearningRate 0.0006 Epoch: 2 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:10:12,348-Speed 13760.76 samples/sec Loss 12.5846 LearningRate 0.0006 Epoch: 2 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:10:30,123-Speed 13826.83 samples/sec Loss 12.5126 LearningRate 0.0006 Epoch: 2 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:10:47,900-Speed 13825.15 samples/sec Loss 12.5199 LearningRate 0.0006 Epoch: 2 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 
10:11:05,698-Speed 13810.04 samples/sec Loss 12.4278 LearningRate 0.0006 Epoch: 2 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:11:23,508-Speed 13799.85 samples/sec Loss 12.3860 LearningRate 0.0006 Epoch: 2 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:11:41,249-Speed 13852.99 samples/sec Loss 12.3051 LearningRate 0.0006 Epoch: 2 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:11:58,946-Speed 13887.75 samples/sec Loss 12.2179 LearningRate 0.0006 Epoch: 2 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:12:16,757-Speed 13799.28 samples/sec Loss 12.2156 LearningRate 0.0006 Epoch: 2 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:12:34,501-Speed 13851.15 samples/sec Loss 12.1773 LearningRate 0.0006 Epoch: 2 Global Step: 4240 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:12:52,294-Speed 13813.45 samples/sec Loss 12.1781 LearningRate 0.0006 Epoch: 2 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:13:10,089-Speed 13811.51 samples/sec Loss 12.0810 LearningRate 0.0006 Epoch: 2 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:13:27,924-Speed 13780.81 samples/sec Loss 12.0239 LearningRate 0.0006 Epoch: 2 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:13:45,681-Speed 13842.22 samples/sec Loss 11.9613 LearningRate 0.0006 Epoch: 2 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:14:03,500-Speed 13792.99 samples/sec Loss 12.0319 LearningRate 0.0006 Epoch: 2 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:14:21,364-Speed 13758.18 samples/sec Loss 11.9673 LearningRate 0.0006 Epoch: 2 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:14:39,116-Speed 13844.58 
samples/sec Loss 11.8649 LearningRate 0.0006 Epoch: 2 Global Step: 4310 Fp16 Grad Scale: 262144 Required: 33 hours Training: 2022-03-03 10:14:56,855-Speed 13854.88 samples/sec Loss 11.8540 LearningRate 0.0006 Epoch: 2 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:15:14,629-Speed 13828.66 samples/sec Loss 11.7987 LearningRate 0.0006 Epoch: 2 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:15:32,425-Speed 13811.50 samples/sec Loss 11.7578 LearningRate 0.0006 Epoch: 2 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:15:50,186-Speed 13838.07 samples/sec Loss 11.7257 LearningRate 0.0006 Epoch: 2 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 33 hours Training: 2022-03-03 10:16:07,968-Speed 13821.07 samples/sec Loss 11.7093 LearningRate 0.0006 Epoch: 2 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:16:25,842-Speed 13750.38 samples/sec Loss 11.6251 LearningRate 0.0006 Epoch: 2 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:16:43,577-Speed 13858.37 samples/sec Loss 11.5432 LearningRate 0.0006 Epoch: 2 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:17:01,306-Speed 13862.94 samples/sec Loss 11.5216 LearningRate 0.0006 Epoch: 2 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:17:19,039-Speed 13860.16 samples/sec Loss 11.5026 LearningRate 0.0006 Epoch: 2 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:17:36,830-Speed 13814.14 samples/sec Loss 11.4639 LearningRate 0.0006 Epoch: 2 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:17:54,591-Speed 13838.24 samples/sec Loss 11.3888 LearningRate 0.0006 Epoch: 2 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:18:12,326-Speed 13859.19 samples/sec Loss 11.3967 
LearningRate 0.0006 Epoch: 2 Global Step: 4430 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:18:30,147-Speed 13790.87 samples/sec Loss 11.3638 LearningRate 0.0006 Epoch: 2 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:18:48,099-Speed 13690.99 samples/sec Loss 11.3500 LearningRate 0.0006 Epoch: 2 Global Step: 4450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:19:06,171-Speed 13600.34 samples/sec Loss 11.3081 LearningRate 0.0006 Epoch: 2 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:19:24,163-Speed 13659.79 samples/sec Loss 11.2517 LearningRate 0.0006 Epoch: 2 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:19:42,271-Speed 13572.77 samples/sec Loss 11.2499 LearningRate 0.0006 Epoch: 2 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:20:00,336-Speed 13605.37 samples/sec Loss 11.1613 LearningRate 0.0006 Epoch: 2 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:20:18,438-Speed 13577.30 samples/sec Loss 11.1475 LearningRate 0.0007 Epoch: 2 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:20:36,495-Speed 13611.00 samples/sec Loss 11.0924 LearningRate 0.0007 Epoch: 2 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:20:54,585-Speed 13586.46 samples/sec Loss 11.0383 LearningRate 0.0007 Epoch: 2 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:21:12,678-Speed 13583.60 samples/sec Loss 11.0638 LearningRate 0.0007 Epoch: 2 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:21:30,686-Speed 13648.36 samples/sec Loss 11.0390 LearningRate 0.0007 Epoch: 2 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:21:48,732-Speed 13618.98 samples/sec Loss 10.9646 LearningRate 0.0007 Epoch: 2 
Global Step: 4550 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:22:06,846-Speed 13568.94 samples/sec Loss 10.8788 LearningRate 0.0007 Epoch: 2 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:22:24,887-Speed 13622.37 samples/sec Loss 10.8641 LearningRate 0.0007 Epoch: 2 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:22:42,967-Speed 13593.93 samples/sec Loss 10.8553 LearningRate 0.0007 Epoch: 2 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:23:01,092-Speed 13560.25 samples/sec Loss 10.7868 LearningRate 0.0007 Epoch: 2 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:23:19,165-Speed 13599.02 samples/sec Loss 10.7302 LearningRate 0.0007 Epoch: 2 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:23:37,191-Speed 13634.71 samples/sec Loss 10.7951 LearningRate 0.0007 Epoch: 2 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:23:55,283-Speed 13583.72 samples/sec Loss 10.7043 LearningRate 0.0007 Epoch: 2 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:24:13,374-Speed 13586.23 samples/sec Loss 10.7075 LearningRate 0.0007 Epoch: 2 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:24:31,420-Speed 13619.50 samples/sec Loss 10.6534 LearningRate 0.0007 Epoch: 2 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:24:49,534-Speed 13568.19 samples/sec Loss 10.6392 LearningRate 0.0007 Epoch: 2 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:25:07,719-Speed 13515.18 samples/sec Loss 10.6968 LearningRate 0.0007 Epoch: 2 Global Step: 4660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:25:25,783-Speed 13605.09 samples/sec Loss 10.5744 LearningRate 0.0007 Epoch: 2 Global Step: 4670 Fp16 Grad 
Scale: 131072 Required: 32 hours Training: 2022-03-03 10:25:43,880-Speed 13581.20 samples/sec Loss 10.4908 LearningRate 0.0007 Epoch: 2 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:26:02,144-Speed 13456.35 samples/sec Loss 10.5948 LearningRate 0.0007 Epoch: 2 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:26:20,176-Speed 13630.09 samples/sec Loss 10.4815 LearningRate 0.0007 Epoch: 2 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:26:38,207-Speed 13631.19 samples/sec Loss 10.4757 LearningRate 0.0007 Epoch: 2 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:26:56,263-Speed 13612.01 samples/sec Loss 10.3633 LearningRate 0.0007 Epoch: 2 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:27:14,344-Speed 13592.44 samples/sec Loss 10.3554 LearningRate 0.0007 Epoch: 2 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:27:32,517-Speed 13524.58 samples/sec Loss 10.3275 LearningRate 0.0007 Epoch: 2 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:27:50,583-Speed 13604.16 samples/sec Loss 10.3043 LearningRate 0.0007 Epoch: 2 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:28:08,736-Speed 13539.26 samples/sec Loss 10.2927 LearningRate 0.0007 Epoch: 2 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:28:26,513-Speed 13824.63 samples/sec Loss 10.2059 LearningRate 0.0007 Epoch: 2 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:28:44,305-Speed 13814.55 samples/sec Loss 10.2282 LearningRate 0.0007 Epoch: 2 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:29:02,042-Speed 13856.45 samples/sec Loss 10.1813 LearningRate 0.0007 Epoch: 2 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 32 
hours Training: 2022-03-03 10:29:19,771-Speed 13862.55 samples/sec Loss 10.2093 LearningRate 0.0007 Epoch: 2 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:29:37,557-Speed 13818.64 samples/sec Loss 10.2023 LearningRate 0.0007 Epoch: 2 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:29:55,306-Speed 13847.63 samples/sec Loss 10.0780 LearningRate 0.0007 Epoch: 2 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:30:13,092-Speed 13818.91 samples/sec Loss 10.0414 LearningRate 0.0007 Epoch: 2 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:30:30,819-Speed 13863.95 samples/sec Loss 10.0319 LearningRate 0.0007 Epoch: 2 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:30:48,529-Speed 13878.14 samples/sec Loss 10.0415 LearningRate 0.0007 Epoch: 2 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:31:06,278-Speed 13847.75 samples/sec Loss 10.0393 LearningRate 0.0007 Epoch: 2 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:31:24,096-Speed 13793.49 samples/sec Loss 9.9965 LearningRate 0.0007 Epoch: 2 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:31:41,922-Speed 13787.66 samples/sec Loss 9.9941 LearningRate 0.0007 Epoch: 2 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:31:59,816-Speed 13734.78 samples/sec Loss 9.8827 LearningRate 0.0007 Epoch: 2 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:32:17,606-Speed 13815.42 samples/sec Loss 9.9247 LearningRate 0.0007 Epoch: 2 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:32:35,360-Speed 13843.64 samples/sec Loss 9.8317 LearningRate 0.0007 Epoch: 2 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 
10:32:53,113-Speed 13843.86 samples/sec Loss 9.9766 LearningRate 0.0007 Epoch: 2 Global Step: 4920 Fp16 Grad Scale: 262144 Required: 32 hours Training: 2022-03-03 10:33:10,824-Speed 13877.20 samples/sec Loss 9.8587 LearningRate 0.0007 Epoch: 2 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:33:28,637-Speed 13797.54 samples/sec Loss 9.8553 LearningRate 0.0007 Epoch: 2 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:33:46,365-Speed 13863.66 samples/sec Loss 9.8242 LearningRate 0.0007 Epoch: 2 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:34:04,185-Speed 13792.55 samples/sec Loss 9.7939 LearningRate 0.0007 Epoch: 2 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:34:21,973-Speed 13816.51 samples/sec Loss 9.7381 LearningRate 0.0007 Epoch: 2 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:34:39,719-Speed 13850.18 samples/sec Loss 9.7562 LearningRate 0.0007 Epoch: 2 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:34:57,472-Speed 13843.53 samples/sec Loss 9.6810 LearningRate 0.0007 Epoch: 2 Global Step: 4990 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:35:15,270-Speed 13809.55 samples/sec Loss 9.6715 LearningRate 0.0007 Epoch: 2 Global Step: 5000 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:35:33,042-Speed 13829.19 samples/sec Loss 9.6479 LearningRate 0.0007 Epoch: 2 Global Step: 5010 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:35:50,802-Speed 13838.53 samples/sec Loss 9.6420 LearningRate 0.0007 Epoch: 2 Global Step: 5020 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:36:08,598-Speed 13811.13 samples/sec Loss 9.6333 LearningRate 0.0007 Epoch: 2 Global Step: 5030 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:36:26,334-Speed 13856.94 samples/sec Loss 
9.5596 LearningRate 0.0007 Epoch: 2 Global Step: 5040 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:36:44,228-Speed 13735.48 samples/sec Loss 9.5516 LearningRate 0.0007 Epoch: 2 Global Step: 5050 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:37:02,061-Speed 13782.02 samples/sec Loss 9.6145 LearningRate 0.0007 Epoch: 2 Global Step: 5060 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:37:19,812-Speed 13845.33 samples/sec Loss 9.5411 LearningRate 0.0007 Epoch: 2 Global Step: 5070 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:37:37,642-Speed 13784.73 samples/sec Loss 9.4940 LearningRate 0.0007 Epoch: 2 Global Step: 5080 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:37:55,386-Speed 13851.64 samples/sec Loss 9.4702 LearningRate 0.0007 Epoch: 2 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:38:13,092-Speed 13880.24 samples/sec Loss 9.4059 LearningRate 0.0007 Epoch: 2 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:38:30,782-Speed 13893.89 samples/sec Loss 9.4620 LearningRate 0.0007 Epoch: 2 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:38:48,573-Speed 13815.31 samples/sec Loss 9.4602 LearningRate 0.0007 Epoch: 2 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:39:06,326-Speed 13844.42 samples/sec Loss 9.4277 LearningRate 0.0007 Epoch: 2 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:39:24,036-Speed 13878.12 samples/sec Loss 9.4240 LearningRate 0.0007 Epoch: 2 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:39:41,761-Speed 13865.71 samples/sec Loss 9.4218 LearningRate 0.0007 Epoch: 2 Global Step: 5150 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:39:59,486-Speed 13866.20 samples/sec Loss 9.3741 LearningRate 0.0007 Epoch: 2 Global Step: 
5160 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:40:17,201-Speed 13873.76 samples/sec Loss 9.4015 LearningRate 0.0007 Epoch: 2 Global Step: 5170 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:40:34,944-Speed 13853.98 samples/sec Loss 9.3555 LearningRate 0.0007 Epoch: 2 Global Step: 5180 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:41:42,586-Speed 3633.26 samples/sec Loss 9.2058 LearningRate 0.0008 Epoch: 3 Global Step: 5190 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:42:00,306-Speed 13869.76 samples/sec Loss 9.1614 LearningRate 0.0008 Epoch: 3 Global Step: 5200 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:42:18,054-Speed 13848.31 samples/sec Loss 9.1329 LearningRate 0.0008 Epoch: 3 Global Step: 5210 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:42:35,771-Speed 13872.12 samples/sec Loss 9.1619 LearningRate 0.0008 Epoch: 3 Global Step: 5220 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:42:53,495-Speed 13867.20 samples/sec Loss 9.1391 LearningRate 0.0008 Epoch: 3 Global Step: 5230 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:43:11,295-Speed 13807.64 samples/sec Loss 9.0847 LearningRate 0.0008 Epoch: 3 Global Step: 5240 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:43:28,986-Speed 13892.48 samples/sec Loss 9.0987 LearningRate 0.0008 Epoch: 3 Global Step: 5250 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:43:46,771-Speed 13819.34 samples/sec Loss 9.0874 LearningRate 0.0008 Epoch: 3 Global Step: 5260 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:44:04,587-Speed 13794.77 samples/sec Loss 9.0138 LearningRate 0.0008 Epoch: 3 Global Step: 5270 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:44:22,350-Speed 13836.64 samples/sec Loss 9.0256 LearningRate 0.0008 Epoch: 3 Global Step: 5280 Fp16 Grad Scale: 32768 Required: 32 hours Training: 
2022-03-03 10:44:40,137-Speed 13818.94 samples/sec Loss 9.0360 LearningRate 0.0008 Epoch: 3 Global Step: 5290 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:44:57,867-Speed 13861.61 samples/sec Loss 8.9814 LearningRate 0.0008 Epoch: 3 Global Step: 5300 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:45:15,662-Speed 13811.98 samples/sec Loss 8.9942 LearningRate 0.0008 Epoch: 3 Global Step: 5310 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:45:33,501-Speed 13776.72 samples/sec Loss 8.9716 LearningRate 0.0008 Epoch: 3 Global Step: 5320 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:45:51,301-Speed 13808.02 samples/sec Loss 8.9904 LearningRate 0.0008 Epoch: 3 Global Step: 5330 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:46:09,233-Speed 13705.98 samples/sec Loss 8.9571 LearningRate 0.0008 Epoch: 3 Global Step: 5340 Fp16 Grad Scale: 32768 Required: 32 hours Training: 2022-03-03 10:46:27,115-Speed 13743.69 samples/sec Loss 9.0349 LearningRate 0.0008 Epoch: 3 Global Step: 5350 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:46:44,836-Speed 13869.09 samples/sec Loss 8.9203 LearningRate 0.0008 Epoch: 3 Global Step: 5360 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:47:02,708-Speed 13751.98 samples/sec Loss 8.8564 LearningRate 0.0008 Epoch: 3 Global Step: 5370 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:47:20,470-Speed 13837.33 samples/sec Loss 8.8409 LearningRate 0.0008 Epoch: 3 Global Step: 5380 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:47:38,202-Speed 13860.15 samples/sec Loss 8.8298 LearningRate 0.0008 Epoch: 3 Global Step: 5390 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:47:55,977-Speed 13827.63 samples/sec Loss 8.8980 LearningRate 0.0008 Epoch: 3 Global Step: 5400 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:48:13,753-Speed 13826.42 samples/sec Loss 
8.8237 LearningRate 0.0008 Epoch: 3 Global Step: 5410 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:48:31,518-Speed 13834.72 samples/sec Loss 8.8194 LearningRate 0.0008 Epoch: 3 Global Step: 5420 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:48:49,266-Speed 13847.83 samples/sec Loss 8.7002 LearningRate 0.0008 Epoch: 3 Global Step: 5430 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:49:07,064-Speed 13809.50 samples/sec Loss 8.7778 LearningRate 0.0008 Epoch: 3 Global Step: 5440 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 10:49:24,774-Speed 13877.81 samples/sec Loss 8.7554 LearningRate 0.0008 Epoch: 3 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:49:42,547-Speed 13828.46 samples/sec Loss 8.8205 LearningRate 0.0008 Epoch: 3 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:50:00,371-Speed 13789.22 samples/sec Loss 8.7396 LearningRate 0.0008 Epoch: 3 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:50:18,143-Speed 13829.03 samples/sec Loss 8.7064 LearningRate 0.0008 Epoch: 3 Global Step: 5480 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:50:35,886-Speed 13851.87 samples/sec Loss 8.6784 LearningRate 0.0008 Epoch: 3 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:50:53,655-Speed 13831.96 samples/sec Loss 8.6061 LearningRate 0.0008 Epoch: 3 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:51:11,455-Speed 13807.61 samples/sec Loss 8.6355 LearningRate 0.0008 Epoch: 3 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:51:29,274-Speed 13793.34 samples/sec Loss 8.6202 LearningRate 0.0008 Epoch: 3 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:51:47,084-Speed 13799.71 samples/sec Loss 8.5954 LearningRate 0.0008 Epoch: 3 Global 
Step: 5530 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:52:04,876-Speed 13813.77 samples/sec Loss 8.6206 LearningRate 0.0008 Epoch: 3 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:52:22,618-Speed 13852.59 samples/sec Loss 8.6838 LearningRate 0.0008 Epoch: 3 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:52:40,423-Speed 13804.63 samples/sec Loss 8.6419 LearningRate 0.0008 Epoch: 3 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:52:58,206-Speed 13822.10 samples/sec Loss 8.5643 LearningRate 0.0008 Epoch: 3 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:53:15,951-Speed 13850.22 samples/sec Loss 8.5148 LearningRate 0.0008 Epoch: 3 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:53:33,676-Speed 13866.06 samples/sec Loss 8.4501 LearningRate 0.0008 Epoch: 3 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:53:51,502-Speed 13787.14 samples/sec Loss 8.5865 LearningRate 0.0008 Epoch: 3 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:54:09,223-Speed 13869.70 samples/sec Loss 8.4834 LearningRate 0.0008 Epoch: 3 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:54:26,993-Speed 13830.85 samples/sec Loss 8.4266 LearningRate 0.0008 Epoch: 3 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:54:44,760-Speed 13833.19 samples/sec Loss 8.4901 LearningRate 0.0008 Epoch: 3 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:55:02,598-Speed 13778.03 samples/sec Loss 8.4355 LearningRate 0.0008 Epoch: 3 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:55:20,383-Speed 13819.13 samples/sec Loss 8.4356 LearningRate 0.0008 Epoch: 3 Global Step: 5650 Fp16 Grad Scale: 131072 
Required: 32 hours Training: 2022-03-03 10:55:38,164-Speed 13823.80 samples/sec Loss 8.4333 LearningRate 0.0008 Epoch: 3 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:55:55,983-Speed 13792.52 samples/sec Loss 8.3614 LearningRate 0.0008 Epoch: 3 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:56:13,738-Speed 13843.23 samples/sec Loss 8.3327 LearningRate 0.0008 Epoch: 3 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:56:31,577-Speed 13778.36 samples/sec Loss 8.3549 LearningRate 0.0008 Epoch: 3 Global Step: 5690 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:56:49,321-Speed 13851.02 samples/sec Loss 8.3495 LearningRate 0.0008 Epoch: 3 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:57:07,114-Speed 13813.47 samples/sec Loss 8.3570 LearningRate 0.0008 Epoch: 3 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:57:25,028-Speed 13719.35 samples/sec Loss 8.3023 LearningRate 0.0008 Epoch: 3 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:57:42,848-Speed 13791.86 samples/sec Loss 8.2486 LearningRate 0.0008 Epoch: 3 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:58:00,690-Speed 13775.38 samples/sec Loss 8.2982 LearningRate 0.0008 Epoch: 3 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:58:18,545-Speed 13765.52 samples/sec Loss 8.3405 LearningRate 0.0008 Epoch: 3 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:58:36,266-Speed 13868.96 samples/sec Loss 8.3031 LearningRate 0.0008 Epoch: 3 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:58:53,981-Speed 13873.65 samples/sec Loss 8.2758 LearningRate 0.0008 Epoch: 3 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 
10:59:11,719-Speed 13856.07 samples/sec Loss 8.2624 LearningRate 0.0008 Epoch: 3 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:59:29,499-Speed 13823.06 samples/sec Loss 8.1798 LearningRate 0.0008 Epoch: 3 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 10:59:47,297-Speed 13809.51 samples/sec Loss 8.1933 LearningRate 0.0008 Epoch: 3 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:00:05,084-Speed 13817.34 samples/sec Loss 8.1857 LearningRate 0.0008 Epoch: 3 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:00:22,841-Speed 13841.19 samples/sec Loss 8.1749 LearningRate 0.0008 Epoch: 3 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:00:40,663-Speed 13790.43 samples/sec Loss 8.1440 LearningRate 0.0008 Epoch: 3 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:00:58,370-Speed 13880.22 samples/sec Loss 8.1235 LearningRate 0.0008 Epoch: 3 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:01:16,049-Speed 13902.55 samples/sec Loss 8.1656 LearningRate 0.0008 Epoch: 3 Global Step: 5850 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:01:33,819-Speed 13830.72 samples/sec Loss 8.0975 LearningRate 0.0008 Epoch: 3 Global Step: 5860 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:01:51,564-Speed 13850.29 samples/sec Loss 8.0329 LearningRate 0.0008 Epoch: 3 Global Step: 5870 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:02:09,329-Speed 13834.90 samples/sec Loss 8.0611 LearningRate 0.0009 Epoch: 3 Global Step: 5880 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:02:27,071-Speed 13852.51 samples/sec Loss 8.0166 LearningRate 0.0009 Epoch: 3 Global Step: 5890 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:02:44,859-Speed 13817.46 samples/sec Loss 
8.0842 LearningRate 0.0009 Epoch: 3 Global Step: 5900 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:03:02,536-Speed 13903.69 samples/sec Loss 8.0597 LearningRate 0.0009 Epoch: 3 Global Step: 5910 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:03:20,412-Speed 13749.93 samples/sec Loss 8.0502 LearningRate 0.0009 Epoch: 3 Global Step: 5920 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:03:38,096-Speed 13897.92 samples/sec Loss 7.9858 LearningRate 0.0009 Epoch: 3 Global Step: 5930 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:03:55,838-Speed 13853.28 samples/sec Loss 8.0511 LearningRate 0.0009 Epoch: 3 Global Step: 5940 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:04:13,603-Speed 13835.49 samples/sec Loss 8.0138 LearningRate 0.0009 Epoch: 3 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:04:31,272-Speed 13909.81 samples/sec Loss 7.9892 LearningRate 0.0009 Epoch: 3 Global Step: 5960 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:04:49,003-Speed 13862.33 samples/sec Loss 7.9692 LearningRate 0.0009 Epoch: 3 Global Step: 5970 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:05:06,808-Speed 13803.93 samples/sec Loss 7.9789 LearningRate 0.0009 Epoch: 3 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:05:24,577-Speed 13832.23 samples/sec Loss 7.9013 LearningRate 0.0009 Epoch: 3 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:05:42,260-Speed 13898.31 samples/sec Loss 7.8475 LearningRate 0.0009 Epoch: 3 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:06:00,007-Speed 13849.61 samples/sec Loss 7.8437 LearningRate 0.0009 Epoch: 3 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:06:17,756-Speed 13847.07 samples/sec Loss 7.8577 LearningRate 0.0009 Epoch: 3 Global Step: 6020 
Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:06:35,461-Speed 13882.09 samples/sec Loss 7.8226 LearningRate 0.0009 Epoch: 3 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:06:53,173-Speed 13875.40 samples/sec Loss 7.8788 LearningRate 0.0009 Epoch: 3 Global Step: 6040 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:07:10,850-Speed 13903.56 samples/sec Loss 7.8260 LearningRate 0.0009 Epoch: 3 Global Step: 6050 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:07:28,626-Speed 13828.02 samples/sec Loss 7.7779 LearningRate 0.0009 Epoch: 3 Global Step: 6060 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:07:46,431-Speed 13804.07 samples/sec Loss 7.7919 LearningRate 0.0009 Epoch: 3 Global Step: 6070 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:08:04,187-Speed 13841.43 samples/sec Loss 7.7967 LearningRate 0.0009 Epoch: 3 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:08:21,866-Speed 13902.36 samples/sec Loss 7.8695 LearningRate 0.0009 Epoch: 3 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:08:39,560-Speed 13890.83 samples/sec Loss 7.7519 LearningRate 0.0009 Epoch: 3 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:08:57,344-Speed 13819.37 samples/sec Loss 7.7390 LearningRate 0.0009 Epoch: 3 Global Step: 6110 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:09:15,094-Speed 13846.66 samples/sec Loss 7.7998 LearningRate 0.0009 Epoch: 3 Global Step: 6120 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:09:32,894-Speed 13807.67 samples/sec Loss 7.7781 LearningRate 0.0009 Epoch: 3 Global Step: 6130 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:09:50,692-Speed 13809.73 samples/sec Loss 7.7308 LearningRate 0.0009 Epoch: 3 Global Step: 6140 Fp16 Grad Scale: 65536 Required: 32 hours 
Training: 2022-03-03 11:10:08,493-Speed 13806.52 samples/sec Loss 7.6352 LearningRate 0.0009 Epoch: 3 Global Step: 6150 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:10:26,213-Speed 13869.55 samples/sec Loss 7.6544 LearningRate 0.0009 Epoch: 3 Global Step: 6160 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:10:43,931-Speed 13871.97 samples/sec Loss 7.6337 LearningRate 0.0009 Epoch: 3 Global Step: 6170 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:11:01,667-Speed 13857.31 samples/sec Loss 7.6323 LearningRate 0.0009 Epoch: 3 Global Step: 6180 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:11:19,356-Speed 13894.92 samples/sec Loss 7.6765 LearningRate 0.0009 Epoch: 3 Global Step: 6190 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:11:37,136-Speed 13823.47 samples/sec Loss 7.6130 LearningRate 0.0009 Epoch: 3 Global Step: 6200 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:11:55,041-Speed 13726.18 samples/sec Loss 7.6326 LearningRate 0.0009 Epoch: 3 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:12:12,915-Speed 13750.34 samples/sec Loss 7.5820 LearningRate 0.0009 Epoch: 3 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:12:30,637-Speed 13868.26 samples/sec Loss 7.6288 LearningRate 0.0009 Epoch: 3 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:12:48,381-Speed 13851.51 samples/sec Loss 7.5696 LearningRate 0.0009 Epoch: 3 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:13:06,146-Speed 13834.65 samples/sec Loss 7.5637 LearningRate 0.0009 Epoch: 3 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:13:23,838-Speed 13892.65 samples/sec Loss 7.5502 LearningRate 0.0009 Epoch: 3 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:13:41,546-Speed 13879.08 
samples/sec Loss 7.5417 LearningRate 0.0009 Epoch: 3 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:13:59,307-Speed 13838.12 samples/sec Loss 7.5335 LearningRate 0.0009 Epoch: 3 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:14:17,071-Speed 13835.84 samples/sec Loss 7.4937 LearningRate 0.0009 Epoch: 3 Global Step: 6290 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:14:34,795-Speed 13866.65 samples/sec Loss 7.5509 LearningRate 0.0009 Epoch: 3 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:14:52,491-Speed 13888.63 samples/sec Loss 7.5001 LearningRate 0.0009 Epoch: 3 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:15:10,254-Speed 13837.34 samples/sec Loss 7.5129 LearningRate 0.0009 Epoch: 3 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:15:28,005-Speed 13846.15 samples/sec Loss 7.4951 LearningRate 0.0009 Epoch: 3 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 32 hours Training: 2022-03-03 11:15:45,845-Speed 13776.29 samples/sec Loss 7.4669 LearningRate 0.0009 Epoch: 3 Global Step: 6340 Fp16 Grad Scale: 65536 Required: 32 hours Training: 2022-03-03 11:16:03,561-Speed 13873.50 samples/sec Loss 7.4764 LearningRate 0.0009 Epoch: 3 Global Step: 6350 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:16:21,329-Speed 13832.35 samples/sec Loss 7.5298 LearningRate 0.0009 Epoch: 3 Global Step: 6360 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:16:39,123-Speed 13812.68 samples/sec Loss 7.4059 LearningRate 0.0009 Epoch: 3 Global Step: 6370 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:16:56,854-Speed 13860.84 samples/sec Loss 7.4255 LearningRate 0.0009 Epoch: 3 Global Step: 6380 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:17:14,687-Speed 13782.17 samples/sec Loss 7.4025 LearningRate 0.0009 
Epoch: 3 Global Step: 6390 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:17:32,468-Speed 13822.66 samples/sec Loss 7.3947 LearningRate 0.0009 Epoch: 3 Global Step: 6400 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:17:50,343-Speed 13749.50 samples/sec Loss 7.4148 LearningRate 0.0009 Epoch: 3 Global Step: 6410 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:18:08,135-Speed 13813.63 samples/sec Loss 7.3469 LearningRate 0.0009 Epoch: 3 Global Step: 6420 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:18:25,846-Speed 13878.11 samples/sec Loss 7.3420 LearningRate 0.0009 Epoch: 3 Global Step: 6430 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:18:43,560-Speed 13874.38 samples/sec Loss 7.3463 LearningRate 0.0009 Epoch: 3 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:19:01,347-Speed 13818.31 samples/sec Loss 7.3787 LearningRate 0.0009 Epoch: 3 Global Step: 6450 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:19:19,153-Speed 13802.24 samples/sec Loss 7.3445 LearningRate 0.0009 Epoch: 3 Global Step: 6460 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:19:36,931-Speed 13825.47 samples/sec Loss 7.3244 LearningRate 0.0009 Epoch: 3 Global Step: 6470 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:19:54,647-Speed 13872.97 samples/sec Loss 7.3392 LearningRate 0.0009 Epoch: 3 Global Step: 6480 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:20:12,434-Speed 13817.25 samples/sec Loss 7.3297 LearningRate 0.0009 Epoch: 3 Global Step: 6490 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:20:30,197-Speed 13836.53 samples/sec Loss 7.2753 LearningRate 0.0009 Epoch: 3 Global Step: 6500 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:20:47,910-Speed 13875.58 samples/sec Loss 7.2408 LearningRate 0.0009 Epoch: 3 Global Step: 6510 Fp16 Grad Scale: 65536 
Required: 31 hours Training: 2022-03-03 11:21:05,632-Speed 13868.55 samples/sec Loss 7.3368 LearningRate 0.0009 Epoch: 3 Global Step: 6520 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:21:23,332-Speed 13885.40 samples/sec Loss 7.3690 LearningRate 0.0009 Epoch: 3 Global Step: 6530 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:21:41,037-Speed 13881.24 samples/sec Loss 7.3183 LearningRate 0.0009 Epoch: 3 Global Step: 6540 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:21:58,813-Speed 13826.92 samples/sec Loss 7.2687 LearningRate 0.0009 Epoch: 3 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:22:16,671-Speed 13762.01 samples/sec Loss 7.2648 LearningRate 0.0009 Epoch: 3 Global Step: 6560 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:22:34,356-Speed 13897.52 samples/sec Loss 7.2354 LearningRate 0.0010 Epoch: 3 Global Step: 6570 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:22:52,054-Speed 13887.71 samples/sec Loss 7.2721 LearningRate 0.0010 Epoch: 3 Global Step: 6580 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:23:09,724-Speed 13909.08 samples/sec Loss 7.1960 LearningRate 0.0010 Epoch: 3 Global Step: 6590 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:23:27,420-Speed 13888.34 samples/sec Loss 7.1189 LearningRate 0.0010 Epoch: 3 Global Step: 6600 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:23:45,147-Speed 13864.35 samples/sec Loss 7.2032 LearningRate 0.0010 Epoch: 3 Global Step: 6610 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:24:02,896-Speed 13846.94 samples/sec Loss 7.2223 LearningRate 0.0010 Epoch: 3 Global Step: 6620 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:24:20,641-Speed 13850.75 samples/sec Loss 7.1755 LearningRate 0.0010 Epoch: 3 Global Step: 6630 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 
11:24:38,396-Speed 13842.39 samples/sec Loss 7.1712 LearningRate 0.0010 Epoch: 3 Global Step: 6640 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:24:56,185-Speed 13816.03 samples/sec Loss 7.1562 LearningRate 0.0010 Epoch: 3 Global Step: 6650 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:25:13,928-Speed 13851.30 samples/sec Loss 7.1379 LearningRate 0.0010 Epoch: 3 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:25:31,652-Speed 13867.12 samples/sec Loss 7.0824 LearningRate 0.0010 Epoch: 3 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:25:49,392-Speed 13853.76 samples/sec Loss 7.1319 LearningRate 0.0010 Epoch: 3 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:26:07,202-Speed 13800.03 samples/sec Loss 7.1017 LearningRate 0.0010 Epoch: 3 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:26:24,978-Speed 13826.55 samples/sec Loss 7.1224 LearningRate 0.0010 Epoch: 3 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:26:42,729-Speed 13846.10 samples/sec Loss 7.0844 LearningRate 0.0010 Epoch: 3 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:27:00,476-Speed 13848.16 samples/sec Loss 7.1206 LearningRate 0.0010 Epoch: 3 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:27:18,216-Speed 13854.57 samples/sec Loss 7.0672 LearningRate 0.0010 Epoch: 3 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:27:35,896-Speed 13901.79 samples/sec Loss 7.0799 LearningRate 0.0010 Epoch: 3 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:27:53,630-Speed 13858.58 samples/sec Loss 7.0640 LearningRate 0.0010 Epoch: 3 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:28:11,315-Speed 13897.42 samples/sec Loss 
7.0414 LearningRate 0.0010 Epoch: 3 Global Step: 6760 Fp16 Grad Scale: 262144 Required: 31 hours Training: 2022-03-03 11:28:29,010-Speed 13889.07 samples/sec Loss 7.0256 LearningRate 0.0010 Epoch: 3 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:28:46,668-Speed 13919.05 samples/sec Loss 6.9943 LearningRate 0.0010 Epoch: 3 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:29:04,473-Speed 13803.44 samples/sec Loss 7.0218 LearningRate 0.0010 Epoch: 3 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:29:22,288-Speed 13796.24 samples/sec Loss 7.0620 LearningRate 0.0010 Epoch: 3 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:29:39,969-Speed 13900.48 samples/sec Loss 7.0222 LearningRate 0.0010 Epoch: 3 Global Step: 6810 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:29:57,692-Speed 13867.83 samples/sec Loss 6.9909 LearningRate 0.0010 Epoch: 3 Global Step: 6820 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:30:15,463-Speed 13829.70 samples/sec Loss 6.9650 LearningRate 0.0010 Epoch: 3 Global Step: 6830 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:30:33,296-Speed 13781.89 samples/sec Loss 7.0010 LearningRate 0.0010 Epoch: 3 Global Step: 6840 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:30:51,039-Speed 13852.69 samples/sec Loss 6.9439 LearningRate 0.0010 Epoch: 3 Global Step: 6850 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:31:08,740-Speed 13884.54 samples/sec Loss 6.9994 LearningRate 0.0010 Epoch: 3 Global Step: 6860 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:31:26,476-Speed 13857.61 samples/sec Loss 6.9402 LearningRate 0.0010 Epoch: 3 Global Step: 6870 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:31:44,314-Speed 13777.63 samples/sec Loss 7.0064 LearningRate 0.0010 Epoch: 3 Global Step: 
6880 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:32:02,015-Speed 13885.33 samples/sec Loss 7.0281 LearningRate 0.0010 Epoch: 3 Global Step: 6890 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:32:19,824-Speed 13800.55 samples/sec Loss 6.9503 LearningRate 0.0010 Epoch: 3 Global Step: 6900 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:32:37,579-Speed 13842.60 samples/sec Loss 6.9493 LearningRate 0.0010 Epoch: 3 Global Step: 6910 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:33:46,918-Speed 3544.37 samples/sec Loss 6.8595 LearningRate 0.0010 Epoch: 4 Global Step: 6920 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:34:04,771-Speed 13766.95 samples/sec Loss 6.8145 LearningRate 0.0010 Epoch: 4 Global Step: 6930 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:34:22,421-Speed 13925.61 samples/sec Loss 6.7803 LearningRate 0.0010 Epoch: 4 Global Step: 6940 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:34:40,110-Speed 13894.36 samples/sec Loss 6.8227 LearningRate 0.0010 Epoch: 4 Global Step: 6950 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:34:57,821-Speed 13877.44 samples/sec Loss 6.8824 LearningRate 0.0010 Epoch: 4 Global Step: 6960 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:35:15,536-Speed 13874.26 samples/sec Loss 6.7643 LearningRate 0.0010 Epoch: 4 Global Step: 6970 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:35:33,211-Speed 13904.69 samples/sec Loss 6.7486 LearningRate 0.0010 Epoch: 4 Global Step: 6980 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:35:50,997-Speed 13818.47 samples/sec Loss 6.7839 LearningRate 0.0010 Epoch: 4 Global Step: 6990 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:36:08,707-Speed 13877.89 samples/sec Loss 6.7819 LearningRate 0.0010 Epoch: 4 Global Step: 7000 Fp16 Grad Scale: 65536 Required: 31 hours Training: 
2022-03-03 11:36:26,434-Speed 13864.20 samples/sec Loss 6.7266 LearningRate 0.0010 Epoch: 4 Global Step: 7010 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:36:44,314-Speed 13745.55 samples/sec Loss 6.6949 LearningRate 0.0010 Epoch: 4 Global Step: 7020 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:37:02,091-Speed 13825.74 samples/sec Loss 6.7691 LearningRate 0.0010 Epoch: 4 Global Step: 7030 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:37:19,854-Speed 13836.39 samples/sec Loss 6.7311 LearningRate 0.0010 Epoch: 4 Global Step: 7040 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:37:37,540-Speed 13896.52 samples/sec Loss 6.6866 LearningRate 0.0010 Epoch: 4 Global Step: 7050 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:37:55,268-Speed 13863.83 samples/sec Loss 6.7489 LearningRate 0.0010 Epoch: 4 Global Step: 7060 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:38:13,109-Speed 13775.42 samples/sec Loss 6.6964 LearningRate 0.0010 Epoch: 4 Global Step: 7070 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:38:30,800-Speed 13892.93 samples/sec Loss 6.6824 LearningRate 0.0010 Epoch: 4 Global Step: 7080 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:38:48,499-Speed 13886.93 samples/sec Loss 6.6523 LearningRate 0.0010 Epoch: 4 Global Step: 7090 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:39:06,170-Speed 13907.65 samples/sec Loss 6.6479 LearningRate 0.0010 Epoch: 4 Global Step: 7100 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:39:23,933-Speed 13836.25 samples/sec Loss 6.6578 LearningRate 0.0010 Epoch: 4 Global Step: 7110 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:39:41,615-Speed 13900.04 samples/sec Loss 6.6562 LearningRate 0.0010 Epoch: 4 Global Step: 7120 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:39:59,340-Speed 13866.34 
samples/sec Loss 6.6721 LearningRate 0.0010 Epoch: 4 Global Step: 7130 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:40:17,062-Speed 13869.77 samples/sec Loss 6.6484 LearningRate 0.0010 Epoch: 4 Global Step: 7140 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:40:34,779-Speed 13872.22 samples/sec Loss 6.5812 LearningRate 0.0010 Epoch: 4 Global Step: 7150 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:40:52,542-Speed 13837.30 samples/sec Loss 6.6120 LearningRate 0.0010 Epoch: 4 Global Step: 7160 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:41:10,253-Speed 13877.07 samples/sec Loss 6.6608 LearningRate 0.0010 Epoch: 4 Global Step: 7170 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:41:27,939-Speed 13895.74 samples/sec Loss 6.5617 LearningRate 0.0010 Epoch: 4 Global Step: 7180 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:41:45,782-Speed 13774.29 samples/sec Loss 6.5695 LearningRate 0.0010 Epoch: 4 Global Step: 7190 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:42:03,475-Speed 13893.25 samples/sec Loss 6.5599 LearningRate 0.0010 Epoch: 4 Global Step: 7200 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:42:21,212-Speed 13856.38 samples/sec Loss 6.5351 LearningRate 0.0010 Epoch: 4 Global Step: 7210 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:42:38,957-Speed 13850.82 samples/sec Loss 6.5420 LearningRate 0.0010 Epoch: 4 Global Step: 7220 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:42:56,702-Speed 13850.59 samples/sec Loss 6.5210 LearningRate 0.0010 Epoch: 4 Global Step: 7230 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:43:14,375-Speed 13906.35 samples/sec Loss 6.5523 LearningRate 0.0010 Epoch: 4 Global Step: 7240 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:43:32,095-Speed 13870.57 samples/sec Loss 6.4744 LearningRate 0.0010 Epoch: 4 
Global Step: 7250 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:43:49,839-Speed 13850.86 samples/sec Loss 6.4732 LearningRate 0.0010 Epoch: 4 Global Step: 7260 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:44:07,547-Speed 13880.18 samples/sec Loss 6.4895 LearningRate 0.0010 Epoch: 4 Global Step: 7270 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:44:25,287-Speed 13853.88 samples/sec Loss 6.5068 LearningRate 0.0010 Epoch: 4 Global Step: 7280 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:44:42,979-Speed 13892.36 samples/sec Loss 6.4917 LearningRate 0.0010 Epoch: 4 Global Step: 7290 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:45:00,673-Speed 13889.66 samples/sec Loss 6.4930 LearningRate 0.0010 Epoch: 4 Global Step: 7300 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:45:18,363-Speed 13894.34 samples/sec Loss 6.4371 LearningRate 0.0010 Epoch: 4 Global Step: 7310 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:45:36,087-Speed 13866.97 samples/sec Loss 6.4642 LearningRate 0.0010 Epoch: 4 Global Step: 7320 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:45:53,862-Speed 13826.86 samples/sec Loss 6.4216 LearningRate 0.0010 Epoch: 4 Global Step: 7330 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:46:11,652-Speed 13815.28 samples/sec Loss 6.4242 LearningRate 0.0010 Epoch: 4 Global Step: 7340 Fp16 Grad Scale: 32768 Required: 31 hours Training: 2022-03-03 11:46:29,344-Speed 13892.26 samples/sec Loss 6.4084 LearningRate 0.0010 Epoch: 4 Global Step: 7350 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:46:47,172-Speed 13785.64 samples/sec Loss 6.4823 LearningRate 0.0010 Epoch: 4 Global Step: 7360 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:47:04,960-Speed 13816.52 samples/sec Loss 6.4028 LearningRate 0.0010 Epoch: 4 Global Step: 7370 Fp16 Grad Scale: 65536 Required: 31 
hours Training: 2022-03-03 11:47:22,682-Speed 13868.85 samples/sec Loss 6.3541 LearningRate 0.0010 Epoch: 4 Global Step: 7380 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:47:40,497-Speed 13796.21 samples/sec Loss 6.3768 LearningRate 0.0010 Epoch: 4 Global Step: 7390 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:47:58,275-Speed 13824.61 samples/sec Loss 6.3632 LearningRate 0.0010 Epoch: 4 Global Step: 7400 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:48:16,086-Speed 13799.07 samples/sec Loss 6.3117 LearningRate 0.0010 Epoch: 4 Global Step: 7410 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:48:33,863-Speed 13825.45 samples/sec Loss 6.3535 LearningRate 0.0010 Epoch: 4 Global Step: 7420 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:48:51,535-Speed 13907.46 samples/sec Loss 6.3571 LearningRate 0.0010 Epoch: 4 Global Step: 7430 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:49:09,355-Speed 13795.24 samples/sec Loss 6.2703 LearningRate 0.0010 Epoch: 4 Global Step: 7440 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:49:27,058-Speed 13883.11 samples/sec Loss 6.2320 LearningRate 0.0010 Epoch: 4 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:49:44,783-Speed 13866.31 samples/sec Loss 6.2565 LearningRate 0.0010 Epoch: 4 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:50:02,500-Speed 13872.11 samples/sec Loss 6.2434 LearningRate 0.0010 Epoch: 4 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:50:20,270-Speed 13832.41 samples/sec Loss 6.3493 LearningRate 0.0010 Epoch: 4 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:50:37,987-Speed 13873.30 samples/sec Loss 6.2754 LearningRate 0.0010 Epoch: 4 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:50:55,692-Speed 
13881.93 samples/sec Loss 6.2093 LearningRate 0.0010 Epoch: 4 Global Step: 7500 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:51:13,392-Speed 13885.43 samples/sec Loss 6.2338 LearningRate 0.0010 Epoch: 4 Global Step: 7510 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:51:31,088-Speed 13889.34 samples/sec Loss 6.2292 LearningRate 0.0010 Epoch: 4 Global Step: 7520 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:51:48,897-Speed 13800.00 samples/sec Loss 6.2225 LearningRate 0.0010 Epoch: 4 Global Step: 7530 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:52:06,701-Speed 13805.82 samples/sec Loss 6.1897 LearningRate 0.0010 Epoch: 4 Global Step: 7540 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:52:24,422-Speed 13868.88 samples/sec Loss 6.1918 LearningRate 0.0010 Epoch: 4 Global Step: 7550 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:52:42,131-Speed 13879.06 samples/sec Loss 6.1886 LearningRate 0.0010 Epoch: 4 Global Step: 7560 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:52:59,938-Speed 13802.92 samples/sec Loss 6.1791 LearningRate 0.0010 Epoch: 4 Global Step: 7570 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:53:17,708-Speed 13830.77 samples/sec Loss 6.1477 LearningRate 0.0010 Epoch: 4 Global Step: 7580 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:53:35,455-Speed 13848.56 samples/sec Loss 6.2307 LearningRate 0.0010 Epoch: 4 Global Step: 7590 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:53:53,161-Speed 13880.95 samples/sec Loss 6.1754 LearningRate 0.0010 Epoch: 4 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:54:10,861-Speed 13886.16 samples/sec Loss 6.1653 LearningRate 0.0010 Epoch: 4 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:54:28,680-Speed 13792.79 samples/sec Loss 6.1261 LearningRate 0.0010 
Epoch: 4 Global Step: 7620 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:54:46,388-Speed 13879.52 samples/sec Loss 6.1076 LearningRate 0.0010 Epoch: 4 Global Step: 7630 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:55:04,124-Speed 13856.96 samples/sec Loss 6.0950 LearningRate 0.0010 Epoch: 4 Global Step: 7640 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:55:21,842-Speed 13872.18 samples/sec Loss 6.1017 LearningRate 0.0010 Epoch: 4 Global Step: 7650 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:55:39,638-Speed 13810.30 samples/sec Loss 6.1015 LearningRate 0.0010 Epoch: 4 Global Step: 7660 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:55:57,361-Speed 13867.82 samples/sec Loss 6.1118 LearningRate 0.0010 Epoch: 4 Global Step: 7670 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:56:15,113-Speed 13845.02 samples/sec Loss 6.0885 LearningRate 0.0010 Epoch: 4 Global Step: 7680 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:56:32,845-Speed 13860.20 samples/sec Loss 6.0642 LearningRate 0.0010 Epoch: 4 Global Step: 7690 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:56:50,511-Speed 13912.27 samples/sec Loss 6.1169 LearningRate 0.0010 Epoch: 4 Global Step: 7700 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:57:08,247-Speed 13857.08 samples/sec Loss 6.0657 LearningRate 0.0010 Epoch: 4 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:57:25,977-Speed 13862.42 samples/sec Loss 5.9842 LearningRate 0.0010 Epoch: 4 Global Step: 7720 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:57:43,837-Speed 13761.03 samples/sec Loss 5.9849 LearningRate 0.0010 Epoch: 4 Global Step: 7730 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:58:01,639-Speed 13806.57 samples/sec Loss 6.0537 LearningRate 0.0010 Epoch: 4 Global Step: 7740 Fp16 Grad Scale: 131072 
Required: 31 hours Training: 2022-03-03 11:58:19,331-Speed 13891.32 samples/sec Loss 5.9869 LearningRate 0.0010 Epoch: 4 Global Step: 7750 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:58:37,055-Speed 13866.84 samples/sec Loss 5.9425 LearningRate 0.0010 Epoch: 4 Global Step: 7760 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 11:58:54,806-Speed 13846.30 samples/sec Loss 6.0490 LearningRate 0.0010 Epoch: 4 Global Step: 7770 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:59:12,538-Speed 13860.08 samples/sec Loss 5.9986 LearningRate 0.0010 Epoch: 4 Global Step: 7780 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:59:30,292-Speed 13845.11 samples/sec Loss 5.9317 LearningRate 0.0010 Epoch: 4 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 11:59:47,985-Speed 13890.87 samples/sec Loss 5.9845 LearningRate 0.0010 Epoch: 4 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:00:05,694-Speed 13878.31 samples/sec Loss 5.9729 LearningRate 0.0010 Epoch: 4 Global Step: 7810 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:00:23,418-Speed 13866.87 samples/sec Loss 5.9748 LearningRate 0.0010 Epoch: 4 Global Step: 7820 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:00:41,121-Speed 13883.66 samples/sec Loss 5.9268 LearningRate 0.0010 Epoch: 4 Global Step: 7830 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:00:58,795-Speed 13906.85 samples/sec Loss 5.8962 LearningRate 0.0010 Epoch: 4 Global Step: 7840 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:01:16,527-Speed 13861.18 samples/sec Loss 5.9139 LearningRate 0.0010 Epoch: 4 Global Step: 7850 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:01:34,220-Speed 13890.47 samples/sec Loss 5.8925 LearningRate 0.0010 Epoch: 4 Global Step: 7860 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 
12:01:51,964-Speed 13851.69 samples/sec Loss 5.8865 LearningRate 0.0010 Epoch: 4 Global Step: 7870 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:02:09,724-Speed 13838.63 samples/sec Loss 5.8703 LearningRate 0.0010 Epoch: 4 Global Step: 7880 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:02:27,500-Speed 13827.19 samples/sec Loss 5.8644 LearningRate 0.0010 Epoch: 4 Global Step: 7890 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:02:45,346-Speed 13771.86 samples/sec Loss 5.9126 LearningRate 0.0010 Epoch: 4 Global Step: 7900 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:03:03,150-Speed 13805.09 samples/sec Loss 5.8778 LearningRate 0.0010 Epoch: 4 Global Step: 7910 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:03:20,947-Speed 13810.64 samples/sec Loss 5.8631 LearningRate 0.0010 Epoch: 4 Global Step: 7920 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:03:38,763-Speed 13795.00 samples/sec Loss 5.8581 LearningRate 0.0010 Epoch: 4 Global Step: 7930 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:03:56,450-Speed 13895.15 samples/sec Loss 5.8795 LearningRate 0.0010 Epoch: 4 Global Step: 7940 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:04:14,153-Speed 13883.69 samples/sec Loss 5.8325 LearningRate 0.0010 Epoch: 4 Global Step: 7950 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:04:31,838-Speed 13897.41 samples/sec Loss 5.7853 LearningRate 0.0010 Epoch: 4 Global Step: 7960 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:04:49,534-Speed 13888.95 samples/sec Loss 5.7646 LearningRate 0.0010 Epoch: 4 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:05:07,247-Speed 13874.97 samples/sec Loss 5.8718 LearningRate 0.0010 Epoch: 4 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:05:24,976-Speed 13862.76 samples/sec Loss 5.9171 
LearningRate 0.0010 Epoch: 4 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:05:42,730-Speed 13844.14 samples/sec Loss 5.8141 LearningRate 0.0010 Epoch: 4 Global Step: 8000 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:06:00,463-Speed 13860.00 samples/sec Loss 5.7564 LearningRate 0.0010 Epoch: 4 Global Step: 8010 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:06:18,127-Speed 13913.85 samples/sec Loss 5.8101 LearningRate 0.0010 Epoch: 4 Global Step: 8020 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:06:35,883-Speed 13841.35 samples/sec Loss 5.7684 LearningRate 0.0010 Epoch: 4 Global Step: 8030 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:06:53,591-Speed 13879.44 samples/sec Loss 5.7504 LearningRate 0.0010 Epoch: 4 Global Step: 8040 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:07:11,321-Speed 13862.13 samples/sec Loss 5.7338 LearningRate 0.0010 Epoch: 4 Global Step: 8050 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:07:28,978-Speed 13920.77 samples/sec Loss 5.7265 LearningRate 0.0010 Epoch: 4 Global Step: 8060 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:07:46,696-Speed 13870.78 samples/sec Loss 5.7123 LearningRate 0.0010 Epoch: 4 Global Step: 8070 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:08:04,576-Speed 13747.40 samples/sec Loss 5.7168 LearningRate 0.0010 Epoch: 4 Global Step: 8080 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:08:22,388-Speed 13797.97 samples/sec Loss 5.7055 LearningRate 0.0010 Epoch: 4 Global Step: 8090 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:08:40,134-Speed 13850.21 samples/sec Loss 5.6660 LearningRate 0.0010 Epoch: 4 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:08:57,949-Speed 13796.66 samples/sec Loss 5.6923 LearningRate 0.0010 Epoch: 4 Global Step: 8110 Fp16 
Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:09:15,639-Speed 13893.58 samples/sec Loss 5.6847 LearningRate 0.0010 Epoch: 4 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:09:33,311-Speed 13907.45 samples/sec Loss 5.6898 LearningRate 0.0010 Epoch: 4 Global Step: 8130 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:09:51,064-Speed 13843.81 samples/sec Loss 5.6937 LearningRate 0.0010 Epoch: 4 Global Step: 8140 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:10:08,831-Speed 13833.49 samples/sec Loss 5.6979 LearningRate 0.0010 Epoch: 4 Global Step: 8150 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:10:26,539-Speed 13879.53 samples/sec Loss 5.6308 LearningRate 0.0010 Epoch: 4 Global Step: 8160 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:10:44,377-Speed 13778.22 samples/sec Loss 5.5972 LearningRate 0.0010 Epoch: 4 Global Step: 8170 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:11:02,066-Speed 13894.01 samples/sec Loss 5.6429 LearningRate 0.0010 Epoch: 4 Global Step: 8180 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:11:19,777-Speed 13877.39 samples/sec Loss 5.6453 LearningRate 0.0010 Epoch: 4 Global Step: 8190 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:11:37,494-Speed 13871.77 samples/sec Loss 5.6337 LearningRate 0.0010 Epoch: 4 Global Step: 8200 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:11:55,193-Speed 13886.31 samples/sec Loss 5.6203 LearningRate 0.0010 Epoch: 4 Global Step: 8210 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:12:12,890-Speed 13888.15 samples/sec Loss 5.5945 LearningRate 0.0010 Epoch: 4 Global Step: 8220 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:12:30,695-Speed 13803.68 samples/sec Loss 5.6367 LearningRate 0.0010 Epoch: 4 Global Step: 8230 Fp16 Grad Scale: 131072 Required: 31 hours Training: 
2022-03-03 12:12:48,390-Speed 13890.21 samples/sec Loss 5.5799 LearningRate 0.0010 Epoch: 4 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:13:06,097-Speed 13879.53 samples/sec Loss 5.6133 LearningRate 0.0010 Epoch: 4 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:13:23,849-Speed 13845.03 samples/sec Loss 5.5780 LearningRate 0.0010 Epoch: 4 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:13:41,546-Speed 13887.97 samples/sec Loss 5.5837 LearningRate 0.0010 Epoch: 4 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 31 hours Training: 2022-03-03 12:13:59,229-Speed 13899.63 samples/sec Loss 5.5545 LearningRate 0.0010 Epoch: 4 Global Step: 8280 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:14:16,955-Speed 13865.17 samples/sec Loss 5.5665 LearningRate 0.0010 Epoch: 4 Global Step: 8290 Fp16 Grad Scale: 65536 Required: 31 hours Training: 2022-03-03 12:14:34,700-Speed 13849.92 samples/sec Loss 5.6413 LearningRate 0.0010 Epoch: 4 Global Step: 8300 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:14:52,415-Speed 13873.82 samples/sec Loss 5.5377 LearningRate 0.0010 Epoch: 4 Global Step: 8310 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:15:10,188-Speed 13829.26 samples/sec Loss 5.5849 LearningRate 0.0010 Epoch: 4 Global Step: 8320 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:15:27,925-Speed 13856.60 samples/sec Loss 5.5117 LearningRate 0.0010 Epoch: 4 Global Step: 8330 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:15:45,661-Speed 13857.06 samples/sec Loss 5.5481 LearningRate 0.0010 Epoch: 4 Global Step: 8340 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:16:03,388-Speed 13865.14 samples/sec Loss 5.5072 LearningRate 0.0010 Epoch: 4 Global Step: 8350 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:16:21,119-Speed 13861.23 samples/sec 
Loss 5.5238 LearningRate 0.0010 Epoch: 4 Global Step: 8360 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:16:38,827-Speed 13878.78 samples/sec Loss 5.4468 LearningRate 0.0010 Epoch: 4 Global Step: 8370 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:16:56,565-Speed 13856.45 samples/sec Loss 5.4910 LearningRate 0.0010 Epoch: 4 Global Step: 8380 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:17:14,274-Speed 13878.64 samples/sec Loss 5.4645 LearningRate 0.0010 Epoch: 4 Global Step: 8390 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:17:31,984-Speed 13877.81 samples/sec Loss 5.4844 LearningRate 0.0010 Epoch: 4 Global Step: 8400 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:17:49,819-Speed 13780.38 samples/sec Loss 5.5219 LearningRate 0.0010 Epoch: 4 Global Step: 8410 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:18:07,543-Speed 13866.89 samples/sec Loss 5.4591 LearningRate 0.0010 Epoch: 4 Global Step: 8420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:18:25,287-Speed 13851.35 samples/sec Loss 5.4413 LearningRate 0.0010 Epoch: 4 Global Step: 8430 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:18:42,975-Speed 13894.46 samples/sec Loss 5.4389 LearningRate 0.0010 Epoch: 4 Global Step: 8440 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:19:00,697-Speed 13868.04 samples/sec Loss 5.4626 LearningRate 0.0010 Epoch: 4 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:19:18,448-Speed 13845.77 samples/sec Loss 5.4210 LearningRate 0.0010 Epoch: 4 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:19:36,154-Speed 13881.18 samples/sec Loss 5.3834 LearningRate 0.0010 Epoch: 4 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:19:53,854-Speed 13886.33 samples/sec Loss 5.4207 LearningRate 0.0010 Epoch: 4 Global Step: 
8480 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:20:11,590-Speed 13857.54 samples/sec Loss 5.4703 LearningRate 0.0009 Epoch: 4 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:20:29,253-Speed 13915.25 samples/sec Loss 5.4452 LearningRate 0.0009 Epoch: 4 Global Step: 8500 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:20:46,947-Speed 13889.98 samples/sec Loss 5.3985 LearningRate 0.0009 Epoch: 4 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:21:04,702-Speed 13842.55 samples/sec Loss 5.3866 LearningRate 0.0009 Epoch: 4 Global Step: 8520 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:21:22,417-Speed 13874.28 samples/sec Loss 5.3425 LearningRate 0.0009 Epoch: 4 Global Step: 8530 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:21:40,172-Speed 13842.37 samples/sec Loss 5.4026 LearningRate 0.0009 Epoch: 4 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:21:57,908-Speed 13857.11 samples/sec Loss 5.3651 LearningRate 0.0009 Epoch: 4 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:22:15,612-Speed 13882.35 samples/sec Loss 5.3694 LearningRate 0.0009 Epoch: 4 Global Step: 8560 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:22:33,260-Speed 13927.22 samples/sec Loss 5.4168 LearningRate 0.0009 Epoch: 4 Global Step: 8570 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:22:50,968-Speed 13879.43 samples/sec Loss 5.4318 LearningRate 0.0009 Epoch: 4 Global Step: 8580 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:23:08,708-Speed 13854.30 samples/sec Loss 5.3632 LearningRate 0.0009 Epoch: 4 Global Step: 8590 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:23:26,359-Speed 13924.10 samples/sec Loss 5.3415 LearningRate 0.0009 Epoch: 4 Global Step: 8600 Fp16 Grad Scale: 32768 Required: 30 hours 
Training: 2022-03-03 12:23:44,095-Speed 13858.18 samples/sec Loss 5.3200 LearningRate 0.0009 Epoch: 4 Global Step: 8610 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:24:01,860-Speed 13834.95 samples/sec Loss 5.4301 LearningRate 0.0009 Epoch: 4 Global Step: 8620 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:24:19,598-Speed 13855.59 samples/sec Loss 5.3968 LearningRate 0.0009 Epoch: 4 Global Step: 8630 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:24:37,319-Speed 13869.18 samples/sec Loss 5.3822 LearningRate 0.0009 Epoch: 4 Global Step: 8640 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:25:45,088-Speed 3626.50 samples/sec Loss 5.2538 LearningRate 0.0009 Epoch: 5 Global Step: 8650 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:26:02,780-Speed 13891.67 samples/sec Loss 5.2590 LearningRate 0.0009 Epoch: 5 Global Step: 8660 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:26:20,467-Speed 13896.11 samples/sec Loss 5.2486 LearningRate 0.0009 Epoch: 5 Global Step: 8670 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:26:38,101-Speed 13937.79 samples/sec Loss 5.2361 LearningRate 0.0009 Epoch: 5 Global Step: 8680 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:26:55,888-Speed 13817.88 samples/sec Loss 5.2126 LearningRate 0.0009 Epoch: 5 Global Step: 8690 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:27:13,611-Speed 13867.51 samples/sec Loss 5.2032 LearningRate 0.0009 Epoch: 5 Global Step: 8700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:27:31,360-Speed 13847.29 samples/sec Loss 5.2472 LearningRate 0.0009 Epoch: 5 Global Step: 8710 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:27:49,063-Speed 13883.26 samples/sec Loss 5.2589 LearningRate 0.0009 Epoch: 5 Global Step: 8720 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:28:06,974-Speed 13721.49 
samples/sec Loss 5.2320 LearningRate 0.0009 Epoch: 5 Global Step: 8730 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:28:24,773-Speed 13808.92 samples/sec Loss 5.2187 LearningRate 0.0009 Epoch: 5 Global Step: 8740 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:28:42,548-Speed 13827.95 samples/sec Loss 5.2072 LearningRate 0.0009 Epoch: 5 Global Step: 8750 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:29:00,289-Speed 13853.71 samples/sec Loss 5.2125 LearningRate 0.0009 Epoch: 5 Global Step: 8760 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:29:17,988-Speed 13887.12 samples/sec Loss 5.2141 LearningRate 0.0009 Epoch: 5 Global Step: 8770 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:29:35,665-Speed 13902.75 samples/sec Loss 5.2140 LearningRate 0.0009 Epoch: 5 Global Step: 8780 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:29:53,423-Speed 13840.51 samples/sec Loss 5.2261 LearningRate 0.0009 Epoch: 5 Global Step: 8790 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:30:11,141-Speed 13871.43 samples/sec Loss 5.1935 LearningRate 0.0009 Epoch: 5 Global Step: 8800 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:30:28,844-Speed 13883.77 samples/sec Loss 5.1517 LearningRate 0.0009 Epoch: 5 Global Step: 8810 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:30:46,722-Speed 13747.44 samples/sec Loss 5.1879 LearningRate 0.0009 Epoch: 5 Global Step: 8820 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:31:04,429-Speed 13879.09 samples/sec Loss 5.2668 LearningRate 0.0009 Epoch: 5 Global Step: 8830 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:31:22,152-Speed 13868.27 samples/sec Loss 5.1722 LearningRate 0.0009 Epoch: 5 Global Step: 8840 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:31:39,871-Speed 13870.28 samples/sec Loss 5.1607 LearningRate 0.0009 Epoch: 5 
Global Step: 8850 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:31:57,537-Speed 13912.63 samples/sec Loss 5.1710 LearningRate 0.0009 Epoch: 5 Global Step: 8860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:32:15,300-Speed 13836.68 samples/sec Loss 5.1177 LearningRate 0.0009 Epoch: 5 Global Step: 8870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:32:33,028-Speed 13863.76 samples/sec Loss 5.1800 LearningRate 0.0009 Epoch: 5 Global Step: 8880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:32:50,701-Speed 13906.51 samples/sec Loss 5.1770 LearningRate 0.0009 Epoch: 5 Global Step: 8890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:33:08,403-Speed 13884.75 samples/sec Loss 5.1484 LearningRate 0.0009 Epoch: 5 Global Step: 8900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:33:26,115-Speed 13875.51 samples/sec Loss 5.1315 LearningRate 0.0009 Epoch: 5 Global Step: 8910 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:33:43,899-Speed 13820.61 samples/sec Loss 5.1557 LearningRate 0.0009 Epoch: 5 Global Step: 8920 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:34:01,628-Speed 13862.82 samples/sec Loss 5.1618 LearningRate 0.0009 Epoch: 5 Global Step: 8930 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:34:19,410-Speed 13821.76 samples/sec Loss 5.2165 LearningRate 0.0009 Epoch: 5 Global Step: 8940 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:34:37,121-Speed 13877.10 samples/sec Loss 5.1359 LearningRate 0.0009 Epoch: 5 Global Step: 8950 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:34:54,848-Speed 13864.31 samples/sec Loss 5.1074 LearningRate 0.0009 Epoch: 5 Global Step: 8960 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:35:12,544-Speed 13889.32 samples/sec Loss 5.1146 LearningRate 0.0009 Epoch: 5 Global Step: 8970 Fp16 Grad Scale: 32768 Required: 30 
hours Training: 2022-03-03 12:35:30,223-Speed 13902.00 samples/sec Loss 5.1149 LearningRate 0.0009 Epoch: 5 Global Step: 8980 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:35:47,984-Speed 13838.70 samples/sec Loss 5.0975 LearningRate 0.0009 Epoch: 5 Global Step: 8990 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:36:05,651-Speed 13912.73 samples/sec Loss 5.1304 LearningRate 0.0009 Epoch: 5 Global Step: 9000 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:36:23,420-Speed 13831.59 samples/sec Loss 5.1105 LearningRate 0.0009 Epoch: 5 Global Step: 9010 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:36:41,142-Speed 13868.46 samples/sec Loss 5.0839 LearningRate 0.0009 Epoch: 5 Global Step: 9020 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:36:58,798-Speed 13920.48 samples/sec Loss 5.0750 LearningRate 0.0009 Epoch: 5 Global Step: 9030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:37:16,578-Speed 13825.16 samples/sec Loss 5.0688 LearningRate 0.0009 Epoch: 5 Global Step: 9040 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:37:34,282-Speed 13883.20 samples/sec Loss 5.0592 LearningRate 0.0009 Epoch: 5 Global Step: 9050 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:37:51,931-Speed 13925.46 samples/sec Loss 5.0309 LearningRate 0.0009 Epoch: 5 Global Step: 9060 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:38:09,566-Speed 13937.01 samples/sec Loss 5.0588 LearningRate 0.0009 Epoch: 5 Global Step: 9070 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:38:27,277-Speed 13876.85 samples/sec Loss 5.0716 LearningRate 0.0009 Epoch: 5 Global Step: 9080 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:38:45,155-Speed 13747.25 samples/sec Loss 5.0358 LearningRate 0.0009 Epoch: 5 Global Step: 9090 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:39:02,883-Speed 13863.10 
samples/sec Loss 5.0187 LearningRate 0.0009 Epoch: 5 Global Step: 9100 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:39:20,660-Speed 13826.19 samples/sec Loss 5.0667 LearningRate 0.0009 Epoch: 5 Global Step: 9110 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:39:38,342-Speed 13899.62 samples/sec Loss 5.0335 LearningRate 0.0009 Epoch: 5 Global Step: 9120 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:39:56,061-Speed 13871.86 samples/sec Loss 5.0512 LearningRate 0.0009 Epoch: 5 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:40:13,741-Speed 13901.47 samples/sec Loss 5.0174 LearningRate 0.0009 Epoch: 5 Global Step: 9140 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:40:31,534-Speed 13815.06 samples/sec Loss 5.0282 LearningRate 0.0009 Epoch: 5 Global Step: 9150 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:40:49,260-Speed 13865.10 samples/sec Loss 5.0220 LearningRate 0.0009 Epoch: 5 Global Step: 9160 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:41:06,940-Speed 13901.18 samples/sec Loss 4.9853 LearningRate 0.0009 Epoch: 5 Global Step: 9170 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:41:24,618-Speed 13903.39 samples/sec Loss 5.0486 LearningRate 0.0009 Epoch: 5 Global Step: 9180 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:41:42,305-Speed 13896.08 samples/sec Loss 4.9868 LearningRate 0.0009 Epoch: 5 Global Step: 9190 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:42:00,050-Speed 13850.53 samples/sec Loss 4.9471 LearningRate 0.0009 Epoch: 5 Global Step: 9200 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:42:17,794-Speed 13851.20 samples/sec Loss 4.9594 LearningRate 0.0009 Epoch: 5 Global Step: 9210 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:42:35,542-Speed 13847.83 samples/sec Loss 4.9822 LearningRate 0.0009 Epoch: 5 
Global Step: 9220 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:42:53,342-Speed 13808.51 samples/sec Loss 5.0119 LearningRate 0.0009 Epoch: 5 Global Step: 9230 Fp16 Grad Scale: 16384 Required: 30 hours Training: 2022-03-03 12:43:11,071-Speed 13862.93 samples/sec Loss 5.0280 LearningRate 0.0009 Epoch: 5 Global Step: 9240 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:43:28,825-Speed 13843.38 samples/sec Loss 4.9590 LearningRate 0.0009 Epoch: 5 Global Step: 9250 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:43:46,520-Speed 13888.79 samples/sec Loss 4.9672 LearningRate 0.0009 Epoch: 5 Global Step: 9260 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:44:04,244-Speed 13868.41 samples/sec Loss 4.9393 LearningRate 0.0009 Epoch: 5 Global Step: 9270 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:44:21,976-Speed 13860.27 samples/sec Loss 4.9227 LearningRate 0.0009 Epoch: 5 Global Step: 9280 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:44:39,723-Speed 13849.65 samples/sec Loss 4.9632 LearningRate 0.0009 Epoch: 5 Global Step: 9290 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:44:57,524-Speed 13806.50 samples/sec Loss 4.9243 LearningRate 0.0009 Epoch: 5 Global Step: 9300 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:45:15,214-Speed 13893.98 samples/sec Loss 4.9456 LearningRate 0.0009 Epoch: 5 Global Step: 9310 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:45:32,925-Speed 13877.51 samples/sec Loss 4.9745 LearningRate 0.0009 Epoch: 5 Global Step: 9320 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:45:50,764-Speed 13776.76 samples/sec Loss 4.8887 LearningRate 0.0009 Epoch: 5 Global Step: 9330 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:46:08,507-Speed 13852.08 samples/sec Loss 4.9672 LearningRate 0.0009 Epoch: 5 Global Step: 9340 Fp16 Grad Scale: 65536 Required: 30 
hours Training: 2022-03-03 12:46:26,221-Speed 13875.42 samples/sec Loss 4.9097 LearningRate 0.0009 Epoch: 5 Global Step: 9350 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:46:43,958-Speed 13855.67 samples/sec Loss 4.9202 LearningRate 0.0009 Epoch: 5 Global Step: 9360 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:47:01,597-Speed 13933.93 samples/sec Loss 4.8767 LearningRate 0.0009 Epoch: 5 Global Step: 9370 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:47:19,288-Speed 13892.54 samples/sec Loss 4.9070 LearningRate 0.0009 Epoch: 5 Global Step: 9380 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:47:37,004-Speed 13873.49 samples/sec Loss 4.9208 LearningRate 0.0009 Epoch: 5 Global Step: 9390 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:47:54,827-Speed 13790.51 samples/sec Loss 4.9760 LearningRate 0.0009 Epoch: 5 Global Step: 9400 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:48:12,616-Speed 13817.17 samples/sec Loss 4.8860 LearningRate 0.0009 Epoch: 5 Global Step: 9410 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:48:30,345-Speed 13863.36 samples/sec Loss 4.8421 LearningRate 0.0009 Epoch: 5 Global Step: 9420 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:48:48,041-Speed 13888.22 samples/sec Loss 4.8891 LearningRate 0.0009 Epoch: 5 Global Step: 9430 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:49:05,778-Speed 13856.79 samples/sec Loss 4.8751 LearningRate 0.0009 Epoch: 5 Global Step: 9440 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:49:23,554-Speed 13826.23 samples/sec Loss 4.8713 LearningRate 0.0009 Epoch: 5 Global Step: 9450 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:49:41,369-Speed 13796.14 samples/sec Loss 4.8659 LearningRate 0.0009 Epoch: 5 Global Step: 9460 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:49:59,117-Speed 
13848.09 samples/sec Loss 4.8399 LearningRate 0.0009 Epoch: 5 Global Step: 9470 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:50:16,877-Speed 13839.18 samples/sec Loss 4.8295 LearningRate 0.0009 Epoch: 5 Global Step: 9480 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:50:34,695-Speed 13793.53 samples/sec Loss 4.8555 LearningRate 0.0009 Epoch: 5 Global Step: 9490 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:50:52,452-Speed 13840.99 samples/sec Loss 4.8607 LearningRate 0.0009 Epoch: 5 Global Step: 9500 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:51:10,296-Speed 13773.58 samples/sec Loss 4.8234 LearningRate 0.0009 Epoch: 5 Global Step: 9510 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:51:28,047-Speed 13845.67 samples/sec Loss 4.8291 LearningRate 0.0009 Epoch: 5 Global Step: 9520 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:51:45,772-Speed 13866.29 samples/sec Loss 4.8258 LearningRate 0.0009 Epoch: 5 Global Step: 9530 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:52:03,483-Speed 13876.74 samples/sec Loss 4.8458 LearningRate 0.0009 Epoch: 5 Global Step: 9540 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:52:21,260-Speed 13826.60 samples/sec Loss 4.8755 LearningRate 0.0009 Epoch: 5 Global Step: 9550 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:52:39,008-Speed 13848.00 samples/sec Loss 4.8168 LearningRate 0.0009 Epoch: 5 Global Step: 9560 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 12:52:56,790-Speed 13821.26 samples/sec Loss 4.8153 LearningRate 0.0009 Epoch: 5 Global Step: 9570 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:53:14,585-Speed 13812.02 samples/sec Loss 4.7891 LearningRate 0.0009 Epoch: 5 Global Step: 9580 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:53:32,366-Speed 13822.38 samples/sec Loss 4.7897 LearningRate 0.0009 
Epoch: 5 Global Step: 9590 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:53:50,124-Speed 13839.89 samples/sec Loss 4.7952 LearningRate 0.0009 Epoch: 5 Global Step: 9600 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:54:08,067-Speed 13698.58 samples/sec Loss 4.7872 LearningRate 0.0009 Epoch: 5 Global Step: 9610 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:54:25,824-Speed 13841.13 samples/sec Loss 4.8123 LearningRate 0.0009 Epoch: 5 Global Step: 9620 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:54:43,562-Speed 13855.77 samples/sec Loss 4.7709 LearningRate 0.0009 Epoch: 5 Global Step: 9630 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:55:01,316-Speed 13842.67 samples/sec Loss 4.7686 LearningRate 0.0009 Epoch: 5 Global Step: 9640 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:55:19,043-Speed 13865.20 samples/sec Loss 4.7669 LearningRate 0.0009 Epoch: 5 Global Step: 9650 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:55:36,851-Speed 13802.83 samples/sec Loss 4.7883 LearningRate 0.0009 Epoch: 5 Global Step: 9660 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:55:54,621-Speed 13830.43 samples/sec Loss 4.7549 LearningRate 0.0009 Epoch: 5 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:56:12,381-Speed 13838.55 samples/sec Loss 4.8214 LearningRate 0.0009 Epoch: 5 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:56:30,149-Speed 13832.79 samples/sec Loss 4.7804 LearningRate 0.0009 Epoch: 5 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 12:56:47,846-Speed 13887.74 samples/sec Loss 4.7429 LearningRate 0.0009 Epoch: 5 Global Step: 9700 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:57:05,599-Speed 13844.21 samples/sec Loss 4.7618 LearningRate 0.0009 Epoch: 5 Global Step: 9710 Fp16 Grad Scale: 65536 
Required: 30 hours Training: 2022-03-03 12:57:23,297-Speed 13887.23 samples/sec Loss 4.7299 LearningRate 0.0009 Epoch: 5 Global Step: 9720 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:57:41,008-Speed 13877.32 samples/sec Loss 4.7755 LearningRate 0.0009 Epoch: 5 Global Step: 9730 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:57:58,698-Speed 13893.27 samples/sec Loss 4.7079 LearningRate 0.0009 Epoch: 5 Global Step: 9740 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:58:16,386-Speed 13895.59 samples/sec Loss 4.7475 LearningRate 0.0009 Epoch: 5 Global Step: 9750 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:58:34,076-Speed 13893.53 samples/sec Loss 4.7513 LearningRate 0.0009 Epoch: 5 Global Step: 9760 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:58:51,847-Speed 13829.97 samples/sec Loss 4.7016 LearningRate 0.0009 Epoch: 5 Global Step: 9770 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:59:09,643-Speed 13811.14 samples/sec Loss 4.7397 LearningRate 0.0009 Epoch: 5 Global Step: 9780 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:59:27,348-Speed 13881.93 samples/sec Loss 4.7068 LearningRate 0.0009 Epoch: 5 Global Step: 9790 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 12:59:45,178-Speed 13784.49 samples/sec Loss 4.6934 LearningRate 0.0009 Epoch: 5 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 13:00:02,932-Speed 13843.79 samples/sec Loss 4.7009 LearningRate 0.0009 Epoch: 5 Global Step: 9810 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:00:20,720-Speed 13816.40 samples/sec Loss 4.6957 LearningRate 0.0009 Epoch: 5 Global Step: 9820 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:00:38,438-Speed 13871.60 samples/sec Loss 4.7186 LearningRate 0.0009 Epoch: 5 Global Step: 9830 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 
13:00:56,124-Speed 13896.11 samples/sec Loss 4.6883 LearningRate 0.0009 Epoch: 5 Global Step: 9840 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:01:13,853-Speed 13863.07 samples/sec Loss 4.7098 LearningRate 0.0009 Epoch: 5 Global Step: 9850 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:01:31,560-Speed 13880.18 samples/sec Loss 4.7058 LearningRate 0.0009 Epoch: 5 Global Step: 9860 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:01:49,370-Speed 13799.93 samples/sec Loss 4.6503 LearningRate 0.0009 Epoch: 5 Global Step: 9870 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:02:07,172-Speed 13805.85 samples/sec Loss 4.6414 LearningRate 0.0009 Epoch: 5 Global Step: 9880 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:02:24,871-Speed 13886.77 samples/sec Loss 4.6894 LearningRate 0.0009 Epoch: 5 Global Step: 9890 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:02:42,680-Speed 13800.39 samples/sec Loss 4.6821 LearningRate 0.0009 Epoch: 5 Global Step: 9900 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:03:00,458-Speed 13825.17 samples/sec Loss 4.6539 LearningRate 0.0009 Epoch: 5 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 30 hours Training: 2022-03-03 13:03:18,114-Speed 13920.27 samples/sec Loss 4.6407 LearningRate 0.0009 Epoch: 5 Global Step: 9920 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:03:35,835-Speed 13868.37 samples/sec Loss 4.6177 LearningRate 0.0009 Epoch: 5 Global Step: 9930 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:03:53,688-Speed 13767.33 samples/sec Loss 4.6223 LearningRate 0.0009 Epoch: 5 Global Step: 9940 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:04:11,442-Speed 13843.59 samples/sec Loss 4.6491 LearningRate 0.0009 Epoch: 5 Global Step: 9950 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:04:29,159-Speed 13871.75 samples/sec Loss 4.5989 
LearningRate 0.0009 Epoch: 5 Global Step: 9960 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:04:46,923-Speed 13835.52 samples/sec Loss 4.6926 LearningRate 0.0009 Epoch: 5 Global Step: 9970 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:05:04,658-Speed 13858.80 samples/sec Loss 4.6977 LearningRate 0.0009 Epoch: 5 Global Step: 9980 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:05:22,379-Speed 13868.74 samples/sec Loss 4.6733 LearningRate 0.0009 Epoch: 5 Global Step: 9990 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:05:40,181-Speed 13806.88 samples/sec Loss 4.6660 LearningRate 0.0009 Epoch: 5 Global Step: 10000 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:05:57,943-Speed 13836.55 samples/sec Loss 4.6398 LearningRate 0.0009 Epoch: 5 Global Step: 10010 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:06:15,708-Speed 13835.29 samples/sec Loss 4.6150 LearningRate 0.0009 Epoch: 5 Global Step: 10020 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:06:33,396-Speed 13894.88 samples/sec Loss 4.5863 LearningRate 0.0009 Epoch: 5 Global Step: 10030 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:06:51,083-Speed 13895.79 samples/sec Loss 4.6099 LearningRate 0.0009 Epoch: 5 Global Step: 10040 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:07:08,809-Speed 13865.48 samples/sec Loss 4.5835 LearningRate 0.0009 Epoch: 5 Global Step: 10050 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:07:26,543-Speed 13858.42 samples/sec Loss 4.6457 LearningRate 0.0009 Epoch: 5 Global Step: 10060 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:07:44,295-Speed 13845.37 samples/sec Loss 4.6141 LearningRate 0.0009 Epoch: 5 Global Step: 10070 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:08:01,980-Speed 13897.21 samples/sec Loss 4.5827 LearningRate 0.0009 Epoch: 5 Global Step: 10080 
Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:08:19,761-Speed 13822.60 samples/sec Loss 4.5661 LearningRate 0.0009 Epoch: 5 Global Step: 10090 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:08:37,498-Speed 13857.08 samples/sec Loss 4.5692 LearningRate 0.0009 Epoch: 5 Global Step: 10100 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:08:55,222-Speed 13866.52 samples/sec Loss 4.5770 LearningRate 0.0009 Epoch: 5 Global Step: 10110 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:09:12,900-Speed 13903.77 samples/sec Loss 4.5989 LearningRate 0.0009 Epoch: 5 Global Step: 10120 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:09:30,633-Speed 13860.82 samples/sec Loss 4.5757 LearningRate 0.0009 Epoch: 5 Global Step: 10130 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:09:48,426-Speed 13813.39 samples/sec Loss 4.5630 LearningRate 0.0009 Epoch: 5 Global Step: 10140 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:10:06,241-Speed 13795.43 samples/sec Loss 4.5321 LearningRate 0.0009 Epoch: 5 Global Step: 10150 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:10:24,023-Speed 13823.02 samples/sec Loss 4.5641 LearningRate 0.0009 Epoch: 5 Global Step: 10160 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:10:41,824-Speed 13806.55 samples/sec Loss 4.5852 LearningRate 0.0009 Epoch: 5 Global Step: 10170 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:10:59,571-Speed 13849.30 samples/sec Loss 4.5667 LearningRate 0.0009 Epoch: 5 Global Step: 10180 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:11:17,229-Speed 13918.43 samples/sec Loss 4.5368 LearningRate 0.0009 Epoch: 5 Global Step: 10190 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:11:34,941-Speed 13876.38 samples/sec Loss 4.5643 LearningRate 0.0009 Epoch: 5 Global Step: 10200 Fp16 Grad Scale: 32768 Required: 30 hours 
Training: 2022-03-03 13:11:52,688-Speed 13848.55 samples/sec Loss 4.5362 LearningRate 0.0009 Epoch: 5 Global Step: 10210 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:12:10,428-Speed 13854.46 samples/sec Loss 4.5372 LearningRate 0.0009 Epoch: 5 Global Step: 10220 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:12:28,179-Speed 13846.12 samples/sec Loss 4.5440 LearningRate 0.0009 Epoch: 5 Global Step: 10230 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:12:45,939-Speed 13838.97 samples/sec Loss 4.5281 LearningRate 0.0009 Epoch: 5 Global Step: 10240 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:13:03,653-Speed 13874.61 samples/sec Loss 4.5034 LearningRate 0.0009 Epoch: 5 Global Step: 10250 Fp16 Grad Scale: 65536 Required: 30 hours Training: 2022-03-03 13:13:21,491-Speed 13777.88 samples/sec Loss 4.5442 LearningRate 0.0009 Epoch: 5 Global Step: 10260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:13:39,248-Speed 13841.27 samples/sec Loss 4.5154 LearningRate 0.0009 Epoch: 5 Global Step: 10270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:13:57,002-Speed 13843.44 samples/sec Loss 4.5502 LearningRate 0.0009 Epoch: 5 Global Step: 10280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:14:14,733-Speed 13861.11 samples/sec Loss 4.5341 LearningRate 0.0009 Epoch: 5 Global Step: 10290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:14:32,414-Speed 13900.69 samples/sec Loss 4.5079 LearningRate 0.0009 Epoch: 5 Global Step: 10300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:14:50,137-Speed 13866.98 samples/sec Loss 4.5216 LearningRate 0.0009 Epoch: 5 Global Step: 10310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:15:07,842-Speed 13882.11 samples/sec Loss 4.4972 LearningRate 0.0009 Epoch: 5 Global Step: 10320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:15:25,563-Speed 
13868.95 samples/sec Loss 4.4960 LearningRate 0.0009 Epoch: 5 Global Step: 10330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:15:43,361-Speed 13809.28 samples/sec Loss 4.5869 LearningRate 0.0009 Epoch: 5 Global Step: 10340 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:16:01,063-Speed 13883.24 samples/sec Loss 4.5318 LearningRate 0.0009 Epoch: 5 Global Step: 10350 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:16:18,818-Speed 13843.23 samples/sec Loss 4.5826 LearningRate 0.0009 Epoch: 5 Global Step: 10360 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:16:36,484-Speed 13912.48 samples/sec Loss 4.5705 LearningRate 0.0009 Epoch: 5 Global Step: 10370 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:17:45,343-Speed 3569.10 samples/sec Loss 4.4227 LearningRate 0.0009 Epoch: 6 Global Step: 10380 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:18:03,077-Speed 13858.56 samples/sec Loss 4.3843 LearningRate 0.0009 Epoch: 6 Global Step: 10390 Fp16 Grad Scale: 32768 Required: 30 hours Training: 2022-03-03 13:18:20,764-Speed 13897.35 samples/sec Loss 4.4755 LearningRate 0.0009 Epoch: 6 Global Step: 10400 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:18:38,457-Speed 13890.67 samples/sec Loss 4.4648 LearningRate 0.0009 Epoch: 6 Global Step: 10410 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:18:56,178-Speed 13869.18 samples/sec Loss 4.4196 LearningRate 0.0009 Epoch: 6 Global Step: 10420 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:19:13,936-Speed 13839.86 samples/sec Loss 4.4214 LearningRate 0.0009 Epoch: 6 Global Step: 10430 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:19:31,762-Speed 13787.78 samples/sec Loss 4.4639 LearningRate 0.0009 Epoch: 6 Global Step: 10440 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:19:49,490-Speed 13863.57 samples/sec Loss 4.3857 
LearningRate 0.0009 Epoch: 6 Global Step: 10450 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:20:07,170-Speed 13901.58 samples/sec Loss 4.3911 LearningRate 0.0009 Epoch: 6 Global Step: 10460 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:20:24,861-Speed 13893.22 samples/sec Loss 4.4117 LearningRate 0.0009 Epoch: 6 Global Step: 10470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:20:42,584-Speed 13867.61 samples/sec Loss 4.4262 LearningRate 0.0009 Epoch: 6 Global Step: 10480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:21:00,300-Speed 13873.06 samples/sec Loss 4.4024 LearningRate 0.0009 Epoch: 6 Global Step: 10490 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:21:18,044-Speed 13850.53 samples/sec Loss 4.4290 LearningRate 0.0009 Epoch: 6 Global Step: 10500 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:21:35,771-Speed 13865.08 samples/sec Loss 4.4426 LearningRate 0.0009 Epoch: 6 Global Step: 10510 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:21:53,486-Speed 13874.05 samples/sec Loss 4.4113 LearningRate 0.0009 Epoch: 6 Global Step: 10520 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:22:11,189-Speed 13882.77 samples/sec Loss 4.4396 LearningRate 0.0009 Epoch: 6 Global Step: 10530 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:22:28,897-Speed 13880.03 samples/sec Loss 4.4107 LearningRate 0.0009 Epoch: 6 Global Step: 10540 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:22:46,653-Speed 13841.76 samples/sec Loss 4.4087 LearningRate 0.0009 Epoch: 6 Global Step: 10550 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:23:04,365-Speed 13876.40 samples/sec Loss 4.3965 LearningRate 0.0009 Epoch: 6 Global Step: 10560 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:23:22,079-Speed 13874.62 samples/sec Loss 4.4331 LearningRate 0.0009 Epoch: 6 Global Step: 
10570 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:23:39,808-Speed 13863.12 samples/sec Loss 4.4328 LearningRate 0.0009 Epoch: 6 Global Step: 10580 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:23:57,509-Speed 13884.80 samples/sec Loss 4.3935 LearningRate 0.0009 Epoch: 6 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:24:15,207-Speed 13887.02 samples/sec Loss 4.3629 LearningRate 0.0009 Epoch: 6 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:24:33,004-Speed 13809.87 samples/sec Loss 4.3856 LearningRate 0.0009 Epoch: 6 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:24:50,718-Speed 13874.95 samples/sec Loss 4.4524 LearningRate 0.0009 Epoch: 6 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:25:08,409-Speed 13892.57 samples/sec Loss 4.4106 LearningRate 0.0009 Epoch: 6 Global Step: 10630 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:25:26,141-Speed 13861.02 samples/sec Loss 4.3618 LearningRate 0.0009 Epoch: 6 Global Step: 10640 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:25:43,870-Speed 13862.36 samples/sec Loss 4.3721 LearningRate 0.0009 Epoch: 6 Global Step: 10650 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:26:01,631-Speed 13838.05 samples/sec Loss 4.3764 LearningRate 0.0009 Epoch: 6 Global Step: 10660 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:26:19,435-Speed 13804.80 samples/sec Loss 4.3913 LearningRate 0.0009 Epoch: 6 Global Step: 10670 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:26:37,176-Speed 13853.21 samples/sec Loss 4.3751 LearningRate 0.0009 Epoch: 6 Global Step: 10680 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:26:55,019-Speed 13774.66 samples/sec Loss 4.3620 LearningRate 0.0009 Epoch: 6 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 29 
hours Training: 2022-03-03 13:27:12,729-Speed 13878.39 samples/sec Loss 4.3311 LearningRate 0.0009 Epoch: 6 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:27:30,485-Speed 13842.13 samples/sec Loss 4.3745 LearningRate 0.0009 Epoch: 6 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:27:48,240-Speed 13843.54 samples/sec Loss 4.3587 LearningRate 0.0009 Epoch: 6 Global Step: 10720 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:28:05,983-Speed 13852.45 samples/sec Loss 4.3436 LearningRate 0.0009 Epoch: 6 Global Step: 10730 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:28:23,735-Speed 13844.59 samples/sec Loss 4.3117 LearningRate 0.0009 Epoch: 6 Global Step: 10740 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:28:41,515-Speed 13823.07 samples/sec Loss 4.3916 LearningRate 0.0009 Epoch: 6 Global Step: 10750 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:28:59,332-Speed 13794.28 samples/sec Loss 4.3839 LearningRate 0.0009 Epoch: 6 Global Step: 10760 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:29:17,088-Speed 13842.31 samples/sec Loss 4.3289 LearningRate 0.0009 Epoch: 6 Global Step: 10770 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:29:34,828-Speed 13854.64 samples/sec Loss 4.3220 LearningRate 0.0009 Epoch: 6 Global Step: 10780 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:29:52,537-Speed 13878.50 samples/sec Loss 4.3435 LearningRate 0.0009 Epoch: 6 Global Step: 10790 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:30:10,311-Speed 13827.33 samples/sec Loss 4.4349 LearningRate 0.0009 Epoch: 6 Global Step: 10800 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:30:28,062-Speed 13846.36 samples/sec Loss 4.3427 LearningRate 0.0009 Epoch: 6 Global Step: 10810 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 
13:30:45,768-Speed 13880.90 samples/sec Loss 4.3200 LearningRate 0.0009 Epoch: 6 Global Step: 10820 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:31:03,499-Speed 13861.35 samples/sec Loss 4.3191 LearningRate 0.0009 Epoch: 6 Global Step: 10830 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:31:21,202-Speed 13883.01 samples/sec Loss 4.3318 LearningRate 0.0009 Epoch: 6 Global Step: 10840 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:31:38,916-Speed 13874.68 samples/sec Loss 4.3203 LearningRate 0.0009 Epoch: 6 Global Step: 10850 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:31:56,651-Speed 13858.26 samples/sec Loss 4.3102 LearningRate 0.0009 Epoch: 6 Global Step: 10860 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:32:14,386-Speed 13858.32 samples/sec Loss 4.2910 LearningRate 0.0009 Epoch: 6 Global Step: 10870 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:32:31,997-Speed 13955.97 samples/sec Loss 4.2909 LearningRate 0.0009 Epoch: 6 Global Step: 10880 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:32:49,746-Speed 13847.16 samples/sec Loss 4.3356 LearningRate 0.0009 Epoch: 6 Global Step: 10890 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:33:07,504-Speed 13840.40 samples/sec Loss 4.3302 LearningRate 0.0009 Epoch: 6 Global Step: 10900 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:33:25,170-Speed 13912.00 samples/sec Loss 4.2901 LearningRate 0.0009 Epoch: 6 Global Step: 10910 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:33:42,936-Speed 13833.88 samples/sec Loss 4.2812 LearningRate 0.0009 Epoch: 6 Global Step: 10920 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:34:00,669-Speed 13860.26 samples/sec Loss 4.2549 LearningRate 0.0009 Epoch: 6 Global Step: 10930 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:34:18,428-Speed 13839.58 samples/sec 
Loss 4.2731 LearningRate 0.0009 Epoch: 6 Global Step: 10940 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:34:36,188-Speed 13838.30 samples/sec Loss 4.2904 LearningRate 0.0009 Epoch: 6 Global Step: 10950 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:34:53,972-Speed 13820.22 samples/sec Loss 4.3379 LearningRate 0.0009 Epoch: 6 Global Step: 10960 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:35:11,678-Speed 13880.89 samples/sec Loss 4.2952 LearningRate 0.0009 Epoch: 6 Global Step: 10970 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:35:29,432-Speed 13843.79 samples/sec Loss 4.2316 LearningRate 0.0009 Epoch: 6 Global Step: 10980 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:35:47,155-Speed 13866.83 samples/sec Loss 4.2504 LearningRate 0.0009 Epoch: 6 Global Step: 10990 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:36:05,061-Speed 13726.73 samples/sec Loss 4.2834 LearningRate 0.0009 Epoch: 6 Global Step: 11000 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:36:22,757-Speed 13888.90 samples/sec Loss 4.2595 LearningRate 0.0009 Epoch: 6 Global Step: 11010 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:36:40,545-Speed 13816.62 samples/sec Loss 4.2533 LearningRate 0.0009 Epoch: 6 Global Step: 11020 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:36:58,239-Speed 13890.18 samples/sec Loss 4.2350 LearningRate 0.0009 Epoch: 6 Global Step: 11030 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:37:15,917-Speed 13903.35 samples/sec Loss 4.2259 LearningRate 0.0009 Epoch: 6 Global Step: 11040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:37:33,588-Speed 13908.15 samples/sec Loss 4.2807 LearningRate 0.0009 Epoch: 6 Global Step: 11050 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:37:51,311-Speed 13867.78 samples/sec Loss 4.2715 LearningRate 0.0009 Epoch: 6 
Global Step: 11060 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:38:09,042-Speed 13861.40 samples/sec Loss 4.2694 LearningRate 0.0009 Epoch: 6 Global Step: 11070 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:38:26,825-Speed 13820.79 samples/sec Loss 4.2325 LearningRate 0.0009 Epoch: 6 Global Step: 11080 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:38:44,560-Speed 13858.30 samples/sec Loss 4.2862 LearningRate 0.0009 Epoch: 6 Global Step: 11090 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:39:02,383-Speed 13789.81 samples/sec Loss 4.2568 LearningRate 0.0009 Epoch: 6 Global Step: 11100 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:39:20,153-Speed 13831.17 samples/sec Loss 4.2010 LearningRate 0.0009 Epoch: 6 Global Step: 11110 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:39:37,888-Speed 13857.86 samples/sec Loss 4.2337 LearningRate 0.0009 Epoch: 6 Global Step: 11120 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:39:55,619-Speed 13861.42 samples/sec Loss 4.2153 LearningRate 0.0009 Epoch: 6 Global Step: 11130 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:40:13,430-Speed 13799.20 samples/sec Loss 4.1995 LearningRate 0.0009 Epoch: 6 Global Step: 11140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:40:31,150-Speed 13869.70 samples/sec Loss 4.2040 LearningRate 0.0009 Epoch: 6 Global Step: 11150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:40:48,905-Speed 13842.93 samples/sec Loss 4.1988 LearningRate 0.0009 Epoch: 6 Global Step: 11160 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:41:06,621-Speed 13873.12 samples/sec Loss 4.2231 LearningRate 0.0009 Epoch: 6 Global Step: 11170 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:41:24,340-Speed 13870.63 samples/sec Loss 4.1961 LearningRate 0.0009 Epoch: 6 Global Step: 11180 Fp16 Grad Scale: 32768 
Required: 29 hours Training: 2022-03-03 13:41:42,023-Speed 13898.68 samples/sec Loss 4.2056 LearningRate 0.0009 Epoch: 6 Global Step: 11190 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:41:59,782-Speed 13839.72 samples/sec Loss 4.2104 LearningRate 0.0009 Epoch: 6 Global Step: 11200 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:42:17,493-Speed 13876.98 samples/sec Loss 4.1848 LearningRate 0.0009 Epoch: 6 Global Step: 11210 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:42:35,191-Speed 13887.40 samples/sec Loss 4.2488 LearningRate 0.0009 Epoch: 6 Global Step: 11220 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:42:52,882-Speed 13892.87 samples/sec Loss 4.1959 LearningRate 0.0009 Epoch: 6 Global Step: 11230 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:43:10,644-Speed 13837.06 samples/sec Loss 4.2263 LearningRate 0.0009 Epoch: 6 Global Step: 11240 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:43:28,348-Speed 13882.43 samples/sec Loss 4.1974 LearningRate 0.0009 Epoch: 6 Global Step: 11250 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 13:43:46,031-Speed 13899.54 samples/sec Loss 4.2117 LearningRate 0.0009 Epoch: 6 Global Step: 11260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:44:03,752-Speed 13869.21 samples/sec Loss 4.1336 LearningRate 0.0009 Epoch: 6 Global Step: 11270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:44:21,407-Speed 13920.74 samples/sec Loss 4.1619 LearningRate 0.0009 Epoch: 6 Global Step: 11280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:44:39,151-Speed 13851.15 samples/sec Loss 4.1779 LearningRate 0.0009 Epoch: 6 Global Step: 11290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:44:56,945-Speed 13812.21 samples/sec Loss 4.1527 LearningRate 0.0009 Epoch: 6 Global Step: 11300 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 
13:45:14,639-Speed 13890.82 samples/sec Loss 4.1755 LearningRate 0.0009 Epoch: 6 Global Step: 11310 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:45:32,323-Speed 13897.81 samples/sec Loss 4.1804 LearningRate 0.0009 Epoch: 6 Global Step: 11320 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:45:49,990-Speed 13911.32 samples/sec Loss 4.1268 LearningRate 0.0009 Epoch: 6 Global Step: 11330 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:46:07,711-Speed 13869.68 samples/sec Loss 4.1987 LearningRate 0.0009 Epoch: 6 Global Step: 11340 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:46:25,383-Speed 13907.69 samples/sec Loss 4.1651 LearningRate 0.0009 Epoch: 6 Global Step: 11350 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:46:43,127-Speed 13851.17 samples/sec Loss 4.1667 LearningRate 0.0009 Epoch: 6 Global Step: 11360 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:47:00,865-Speed 13855.45 samples/sec Loss 4.1361 LearningRate 0.0009 Epoch: 6 Global Step: 11370 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:47:18,549-Speed 13898.63 samples/sec Loss 4.1578 LearningRate 0.0009 Epoch: 6 Global Step: 11380 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:47:36,261-Speed 13875.74 samples/sec Loss 4.1711 LearningRate 0.0009 Epoch: 6 Global Step: 11390 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:47:54,009-Speed 13847.99 samples/sec Loss 4.1418 LearningRate 0.0009 Epoch: 6 Global Step: 11400 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:48:11,665-Speed 13920.21 samples/sec Loss 4.1073 LearningRate 0.0009 Epoch: 6 Global Step: 11410 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:48:29,356-Speed 13893.50 samples/sec Loss 4.1232 LearningRate 0.0009 Epoch: 6 Global Step: 11420 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:48:47,062-Speed 13881.26 samples/sec 
Loss 4.0710 LearningRate 0.0009 Epoch: 6 Global Step: 11430 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:49:04,810-Speed 13848.40 samples/sec Loss 4.1820 LearningRate 0.0009 Epoch: 6 Global Step: 11440 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:49:22,518-Speed 13880.17 samples/sec Loss 4.1844 LearningRate 0.0009 Epoch: 6 Global Step: 11450 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:49:40,258-Speed 13853.96 samples/sec Loss 4.1287 LearningRate 0.0009 Epoch: 6 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:49:57,940-Speed 13899.58 samples/sec Loss 4.1520 LearningRate 0.0009 Epoch: 6 Global Step: 11470 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:50:15,686-Speed 13849.89 samples/sec Loss 4.0918 LearningRate 0.0009 Epoch: 6 Global Step: 11480 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:50:33,392-Speed 13881.28 samples/sec Loss 4.0890 LearningRate 0.0009 Epoch: 6 Global Step: 11490 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:50:51,150-Speed 13840.43 samples/sec Loss 4.1343 LearningRate 0.0009 Epoch: 6 Global Step: 11500 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:51:08,850-Speed 13885.21 samples/sec Loss 4.1636 LearningRate 0.0009 Epoch: 6 Global Step: 11510 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:51:26,573-Speed 13867.95 samples/sec Loss 4.0960 LearningRate 0.0009 Epoch: 6 Global Step: 11520 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:51:44,379-Speed 13803.08 samples/sec Loss 4.1059 LearningRate 0.0009 Epoch: 6 Global Step: 11530 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:52:02,167-Speed 13817.21 samples/sec Loss 4.0907 LearningRate 0.0009 Epoch: 6 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:52:19,907-Speed 13853.74 samples/sec Loss 4.0987 LearningRate 0.0009 Epoch: 6 
Global Step: 11550 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:52:37,567-Speed 13917.12 samples/sec Loss 4.1204 LearningRate 0.0009 Epoch: 6 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:52:55,402-Speed 13781.08 samples/sec Loss 4.0884 LearningRate 0.0009 Epoch: 6 Global Step: 11570 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:53:13,081-Speed 13902.29 samples/sec Loss 4.1062 LearningRate 0.0009 Epoch: 6 Global Step: 11580 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:53:30,807-Speed 13864.86 samples/sec Loss 4.0900 LearningRate 0.0009 Epoch: 6 Global Step: 11590 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:53:48,560-Speed 13843.63 samples/sec Loss 4.1061 LearningRate 0.0009 Epoch: 6 Global Step: 11600 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:54:06,332-Speed 13830.29 samples/sec Loss 4.0990 LearningRate 0.0009 Epoch: 6 Global Step: 11610 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:54:24,070-Speed 13855.40 samples/sec Loss 4.0880 LearningRate 0.0009 Epoch: 6 Global Step: 11620 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:54:41,870-Speed 13807.81 samples/sec Loss 4.1153 LearningRate 0.0009 Epoch: 6 Global Step: 11630 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:54:59,664-Speed 13812.34 samples/sec Loss 4.0500 LearningRate 0.0009 Epoch: 6 Global Step: 11640 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:55:17,357-Speed 13891.34 samples/sec Loss 4.0478 LearningRate 0.0009 Epoch: 6 Global Step: 11650 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:55:35,088-Speed 13861.18 samples/sec Loss 4.0816 LearningRate 0.0009 Epoch: 6 Global Step: 11660 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:55:52,864-Speed 13826.04 samples/sec Loss 4.0267 LearningRate 0.0009 Epoch: 6 Global Step: 11670 Fp16 Grad Scale: 
65536 Required: 29 hours Training: 2022-03-03 13:56:10,552-Speed 13894.98 samples/sec Loss 4.0950 LearningRate 0.0009 Epoch: 6 Global Step: 11680 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:56:28,240-Speed 13895.28 samples/sec Loss 4.0496 LearningRate 0.0009 Epoch: 6 Global Step: 11690 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:56:45,957-Speed 13872.07 samples/sec Loss 4.0884 LearningRate 0.0009 Epoch: 6 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:57:03,656-Speed 13886.38 samples/sec Loss 4.0551 LearningRate 0.0009 Epoch: 6 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:57:21,375-Speed 13870.73 samples/sec Loss 4.0835 LearningRate 0.0009 Epoch: 6 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 13:57:39,037-Speed 13915.51 samples/sec Loss 4.0287 LearningRate 0.0009 Epoch: 6 Global Step: 11730 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:57:56,778-Speed 13853.86 samples/sec Loss 4.0561 LearningRate 0.0009 Epoch: 6 Global Step: 11740 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:58:14,479-Speed 13884.45 samples/sec Loss 4.0495 LearningRate 0.0009 Epoch: 6 Global Step: 11750 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:58:32,248-Speed 13831.74 samples/sec Loss 4.0361 LearningRate 0.0009 Epoch: 6 Global Step: 11760 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:58:49,983-Speed 13858.39 samples/sec Loss 4.0344 LearningRate 0.0008 Epoch: 6 Global Step: 11770 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:59:07,699-Speed 13873.36 samples/sec Loss 4.0179 LearningRate 0.0008 Epoch: 6 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 13:59:25,378-Speed 13902.01 samples/sec Loss 4.0404 LearningRate 0.0008 Epoch: 6 Global Step: 11790 Fp16 Grad Scale: 65536 Required: 29 hours Training: 
2022-03-03 13:59:43,052-Speed 13906.33 samples/sec Loss 4.0356 LearningRate 0.0008 Epoch: 6 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:00:00,780-Speed 13863.40 samples/sec Loss 4.0842 LearningRate 0.0008 Epoch: 6 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:00:18,534-Speed 13843.78 samples/sec Loss 4.0269 LearningRate 0.0008 Epoch: 6 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:00:36,254-Speed 13869.76 samples/sec Loss 4.0065 LearningRate 0.0008 Epoch: 6 Global Step: 11830 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 14:00:54,021-Speed 13835.41 samples/sec Loss 4.0479 LearningRate 0.0008 Epoch: 6 Global Step: 11840 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 14:01:11,811-Speed 13815.40 samples/sec Loss 4.0335 LearningRate 0.0008 Epoch: 6 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:01:29,488-Speed 13903.13 samples/sec Loss 4.0328 LearningRate 0.0008 Epoch: 6 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:01:47,195-Speed 13880.61 samples/sec Loss 4.0386 LearningRate 0.0008 Epoch: 6 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:02:04,903-Speed 13879.75 samples/sec Loss 4.0038 LearningRate 0.0008 Epoch: 6 Global Step: 11880 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:02:22,580-Speed 13903.04 samples/sec Loss 3.9863 LearningRate 0.0008 Epoch: 6 Global Step: 11890 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:02:40,354-Speed 13828.47 samples/sec Loss 3.9696 LearningRate 0.0008 Epoch: 6 Global Step: 11900 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:02:58,057-Speed 13883.07 samples/sec Loss 4.0315 LearningRate 0.0008 Epoch: 6 Global Step: 11910 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:03:15,779-Speed 13868.20 
samples/sec Loss 4.0844 LearningRate 0.0008 Epoch: 6 Global Step: 11920 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:03:33,547-Speed 13834.20 samples/sec Loss 4.0250 LearningRate 0.0008 Epoch: 6 Global Step: 11930 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:03:51,313-Speed 13833.89 samples/sec Loss 4.0171 LearningRate 0.0008 Epoch: 6 Global Step: 11940 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:04:09,122-Speed 13800.25 samples/sec Loss 3.9711 LearningRate 0.0008 Epoch: 6 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 14:04:26,915-Speed 13813.36 samples/sec Loss 3.9838 LearningRate 0.0008 Epoch: 6 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 29 hours Training: 2022-03-03 14:04:44,635-Speed 13869.70 samples/sec Loss 3.9764 LearningRate 0.0008 Epoch: 6 Global Step: 11970 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:05:02,459-Speed 13788.72 samples/sec Loss 3.9848 LearningRate 0.0008 Epoch: 6 Global Step: 11980 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:05:20,186-Speed 13867.65 samples/sec Loss 4.0272 LearningRate 0.0008 Epoch: 6 Global Step: 11990 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:05:37,862-Speed 13904.08 samples/sec Loss 3.9593 LearningRate 0.0008 Epoch: 6 Global Step: 12000 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:05:55,582-Speed 13870.17 samples/sec Loss 3.9987 LearningRate 0.0008 Epoch: 6 Global Step: 12010 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:06:13,287-Speed 13881.96 samples/sec Loss 4.0003 LearningRate 0.0008 Epoch: 6 Global Step: 12020 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:06:30,978-Speed 13892.54 samples/sec Loss 3.9565 LearningRate 0.0008 Epoch: 6 Global Step: 12030 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:06:48,720-Speed 13852.95 samples/sec Loss 3.9762 LearningRate 
0.0008 Epoch: 6 Global Step: 12040 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:07:06,463-Speed 13851.82 samples/sec Loss 3.9890 LearningRate 0.0008 Epoch: 6 Global Step: 12050 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:07:24,244-Speed 13822.31 samples/sec Loss 3.9883 LearningRate 0.0008 Epoch: 6 Global Step: 12060 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:07:41,971-Speed 13864.67 samples/sec Loss 3.9880 LearningRate 0.0008 Epoch: 6 Global Step: 12070 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:07:59,659-Speed 13894.79 samples/sec Loss 4.0565 LearningRate 0.0008 Epoch: 6 Global Step: 12080 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:08:17,468-Speed 13800.97 samples/sec Loss 4.0223 LearningRate 0.0008 Epoch: 6 Global Step: 12090 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:09:25,635-Speed 3605.35 samples/sec Loss 3.9772 LearningRate 0.0008 Epoch: 7 Global Step: 12100 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:09:43,307-Speed 13907.18 samples/sec Loss 3.9027 LearningRate 0.0008 Epoch: 7 Global Step: 12110 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:10:01,080-Speed 13828.37 samples/sec Loss 3.9277 LearningRate 0.0008 Epoch: 7 Global Step: 12120 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:10:18,774-Speed 13890.37 samples/sec Loss 3.9066 LearningRate 0.0008 Epoch: 7 Global Step: 12130 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:10:36,529-Speed 13842.71 samples/sec Loss 3.9121 LearningRate 0.0008 Epoch: 7 Global Step: 12140 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:10:54,395-Speed 13756.73 samples/sec Loss 3.9282 LearningRate 0.0008 Epoch: 7 Global Step: 12150 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:11:12,196-Speed 13806.98 samples/sec Loss 3.9125 LearningRate 0.0008 Epoch: 7 Global Step: 12160 Fp16 Grad 
Scale: 32768 Required: 29 hours Training: 2022-03-03 14:11:29,955-Speed 13840.28 samples/sec Loss 3.9020 LearningRate 0.0008 Epoch: 7 Global Step: 12170 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:11:47,851-Speed 13733.60 samples/sec Loss 3.9000 LearningRate 0.0008 Epoch: 7 Global Step: 12180 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:12:05,679-Speed 13785.94 samples/sec Loss 3.8904 LearningRate 0.0008 Epoch: 7 Global Step: 12190 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:12:23,451-Speed 13829.34 samples/sec Loss 3.9042 LearningRate 0.0008 Epoch: 7 Global Step: 12200 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:12:41,236-Speed 13819.39 samples/sec Loss 3.9471 LearningRate 0.0008 Epoch: 7 Global Step: 12210 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:12:59,041-Speed 13803.84 samples/sec Loss 3.8942 LearningRate 0.0008 Epoch: 7 Global Step: 12220 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:13:16,876-Speed 13780.56 samples/sec Loss 3.9240 LearningRate 0.0008 Epoch: 7 Global Step: 12230 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:13:34,701-Speed 13788.25 samples/sec Loss 3.9141 LearningRate 0.0008 Epoch: 7 Global Step: 12240 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:13:52,466-Speed 13834.45 samples/sec Loss 3.9293 LearningRate 0.0008 Epoch: 7 Global Step: 12250 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:14:10,311-Speed 13773.57 samples/sec Loss 3.8795 LearningRate 0.0008 Epoch: 7 Global Step: 12260 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:14:28,133-Speed 13790.32 samples/sec Loss 3.9009 LearningRate 0.0008 Epoch: 7 Global Step: 12270 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:14:45,932-Speed 13808.50 samples/sec Loss 3.8710 LearningRate 0.0008 Epoch: 7 Global Step: 12280 Fp16 Grad Scale: 65536 Required: 29 hours Training: 
2022-03-03 14:15:03,773-Speed 13775.68 samples/sec Loss 3.8969 LearningRate 0.0008 Epoch: 7 Global Step: 12290 Fp16 Grad Scale: 65536 Required: 29 hours Training: 2022-03-03 14:15:21,500-Speed 13864.72 samples/sec Loss 3.9424 LearningRate 0.0008 Epoch: 7 Global Step: 12300 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:15:39,264-Speed 13835.35 samples/sec Loss 3.8876 LearningRate 0.0008 Epoch: 7 Global Step: 12310 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:15:57,054-Speed 13815.14 samples/sec Loss 3.9175 LearningRate 0.0008 Epoch: 7 Global Step: 12320 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:16:14,966-Speed 13721.91 samples/sec Loss 3.8803 LearningRate 0.0008 Epoch: 7 Global Step: 12330 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:16:32,831-Speed 13757.19 samples/sec Loss 3.8720 LearningRate 0.0008 Epoch: 7 Global Step: 12340 Fp16 Grad Scale: 32768 Required: 29 hours Training: 2022-03-03 14:16:50,655-Speed 13789.14 samples/sec Loss 3.8897 LearningRate 0.0008 Epoch: 7 Global Step: 12350 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:17:08,390-Speed 13858.19 samples/sec Loss 3.8888 LearningRate 0.0008 Epoch: 7 Global Step: 12360 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:17:26,134-Speed 13851.58 samples/sec Loss 3.9031 LearningRate 0.0008 Epoch: 7 Global Step: 12370 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:17:43,845-Speed 13877.47 samples/sec Loss 3.8828 LearningRate 0.0008 Epoch: 7 Global Step: 12380 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:18:01,608-Speed 13836.07 samples/sec Loss 3.8501 LearningRate 0.0008 Epoch: 7 Global Step: 12390 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:18:19,390-Speed 13821.88 samples/sec Loss 3.8735 LearningRate 0.0008 Epoch: 7 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:18:37,113-Speed 13867.40 
samples/sec Loss 3.8932 LearningRate 0.0008 Epoch: 7 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:18:55,002-Speed 13739.00 samples/sec Loss 3.9244 LearningRate 0.0008 Epoch: 7 Global Step: 12420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:19:12,714-Speed 13876.39 samples/sec Loss 3.9157 LearningRate 0.0008 Epoch: 7 Global Step: 12430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:19:30,383-Speed 13909.77 samples/sec Loss 3.8561 LearningRate 0.0008 Epoch: 7 Global Step: 12440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:19:48,033-Speed 13925.27 samples/sec Loss 3.8797 LearningRate 0.0008 Epoch: 7 Global Step: 12450 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:20:05,739-Speed 13880.78 samples/sec Loss 3.8566 LearningRate 0.0008 Epoch: 7 Global Step: 12460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:20:23,399-Speed 13917.68 samples/sec Loss 3.8480 LearningRate 0.0008 Epoch: 7 Global Step: 12470 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:20:41,134-Speed 13858.44 samples/sec Loss 3.8749 LearningRate 0.0008 Epoch: 7 Global Step: 12480 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:20:58,796-Speed 13915.46 samples/sec Loss 3.8819 LearningRate 0.0008 Epoch: 7 Global Step: 12490 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:21:16,470-Speed 13906.48 samples/sec Loss 3.8782 LearningRate 0.0008 Epoch: 7 Global Step: 12500 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:21:34,185-Speed 13874.15 samples/sec Loss 3.8330 LearningRate 0.0008 Epoch: 7 Global Step: 12510 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:21:51,922-Speed 13856.77 samples/sec Loss 3.8247 LearningRate 0.0008 Epoch: 7 Global Step: 12520 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:22:09,818-Speed 13733.68 samples/sec Loss 3.8916 LearningRate 0.0008 
Epoch: 7 Global Step: 12530 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:22:27,529-Speed 13877.21 samples/sec Loss 3.8774 LearningRate 0.0008 Epoch: 7 Global Step: 12540 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:22:45,239-Speed 13877.60 samples/sec Loss 3.8327 LearningRate 0.0008 Epoch: 7 Global Step: 12550 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:23:03,001-Speed 13837.17 samples/sec Loss 3.8245 LearningRate 0.0008 Epoch: 7 Global Step: 12560 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:23:20,728-Speed 13864.32 samples/sec Loss 3.8647 LearningRate 0.0008 Epoch: 7 Global Step: 12570 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:23:38,390-Speed 13915.62 samples/sec Loss 3.8235 LearningRate 0.0008 Epoch: 7 Global Step: 12580 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:23:56,126-Speed 13857.76 samples/sec Loss 3.8272 LearningRate 0.0008 Epoch: 7 Global Step: 12590 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:24:13,828-Speed 13883.72 samples/sec Loss 3.8405 LearningRate 0.0008 Epoch: 7 Global Step: 12600 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:24:31,543-Speed 13874.75 samples/sec Loss 3.8157 LearningRate 0.0008 Epoch: 7 Global Step: 12610 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:24:49,315-Speed 13828.84 samples/sec Loss 3.8131 LearningRate 0.0008 Epoch: 7 Global Step: 12620 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:25:07,137-Speed 13790.83 samples/sec Loss 3.8518 LearningRate 0.0008 Epoch: 7 Global Step: 12630 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:25:24,819-Speed 13900.04 samples/sec Loss 3.8565 LearningRate 0.0008 Epoch: 7 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:25:42,553-Speed 13859.70 samples/sec Loss 3.8163 LearningRate 0.0008 Epoch: 7 Global Step: 12650 Fp16 Grad 
Scale: 65536 Required: 28 hours Training: 2022-03-03 14:26:00,262-Speed 13878.28 samples/sec Loss 3.8265 LearningRate 0.0008 Epoch: 7 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:26:17,945-Speed 13898.95 samples/sec Loss 3.8207 LearningRate 0.0008 Epoch: 7 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:26:35,653-Speed 13879.22 samples/sec Loss 3.7937 LearningRate 0.0008 Epoch: 7 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:26:53,420-Speed 13833.43 samples/sec Loss 3.7835 LearningRate 0.0008 Epoch: 7 Global Step: 12690 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:27:11,226-Speed 13803.36 samples/sec Loss 3.7732 LearningRate 0.0008 Epoch: 7 Global Step: 12700 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:27:28,961-Speed 13857.58 samples/sec Loss 3.8449 LearningRate 0.0008 Epoch: 7 Global Step: 12710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:27:46,731-Speed 13831.37 samples/sec Loss 3.8107 LearningRate 0.0008 Epoch: 7 Global Step: 12720 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:28:04,475-Speed 13851.55 samples/sec Loss 3.7895 LearningRate 0.0008 Epoch: 7 Global Step: 12730 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:28:22,165-Speed 13893.28 samples/sec Loss 3.8021 LearningRate 0.0008 Epoch: 7 Global Step: 12740 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:28:39,918-Speed 13844.09 samples/sec Loss 3.8359 LearningRate 0.0008 Epoch: 7 Global Step: 12750 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:28:57,608-Speed 13893.24 samples/sec Loss 3.8157 LearningRate 0.0008 Epoch: 7 Global Step: 12760 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:29:15,284-Speed 13905.06 samples/sec Loss 3.7779 LearningRate 0.0008 Epoch: 7 Global Step: 12770 Fp16 Grad Scale: 65536 Required: 28 hours Training: 
2022-03-03 14:29:33,000-Speed 13873.26 samples/sec Loss 3.7810 LearningRate 0.0008 Epoch: 7 Global Step: 12780 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:29:50,831-Speed 13782.99 samples/sec Loss 3.7648 LearningRate 0.0008 Epoch: 7 Global Step: 12790 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:30:08,524-Speed 13891.80 samples/sec Loss 3.7958 LearningRate 0.0008 Epoch: 7 Global Step: 12800 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:30:26,389-Speed 13757.59 samples/sec Loss 3.7963 LearningRate 0.0008 Epoch: 7 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:30:44,106-Speed 13871.85 samples/sec Loss 3.7834 LearningRate 0.0008 Epoch: 7 Global Step: 12820 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:31:01,769-Speed 13914.85 samples/sec Loss 3.7670 LearningRate 0.0008 Epoch: 7 Global Step: 12830 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:31:19,519-Speed 13846.15 samples/sec Loss 3.8476 LearningRate 0.0008 Epoch: 7 Global Step: 12840 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:31:37,247-Speed 13863.97 samples/sec Loss 3.8015 LearningRate 0.0008 Epoch: 7 Global Step: 12850 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:31:54,962-Speed 13875.40 samples/sec Loss 3.7568 LearningRate 0.0008 Epoch: 7 Global Step: 12860 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:32:12,698-Speed 13857.12 samples/sec Loss 3.7969 LearningRate 0.0008 Epoch: 7 Global Step: 12870 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:32:30,473-Speed 13827.23 samples/sec Loss 3.7615 LearningRate 0.0008 Epoch: 7 Global Step: 12880 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:32:48,167-Speed 13892.36 samples/sec Loss 3.7791 LearningRate 0.0008 Epoch: 7 Global Step: 12890 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:33:05,793-Speed 13943.91 
samples/sec Loss 3.7934 LearningRate 0.0008 Epoch: 7 Global Step: 12900 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:33:23,576-Speed 13820.85 samples/sec Loss 3.7554 LearningRate 0.0008 Epoch: 7 Global Step: 12910 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:33:41,271-Speed 13891.24 samples/sec Loss 3.7254 LearningRate 0.0008 Epoch: 7 Global Step: 12920 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:33:59,077-Speed 13803.22 samples/sec Loss 3.7645 LearningRate 0.0008 Epoch: 7 Global Step: 12930 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:34:16,752-Speed 13905.20 samples/sec Loss 3.8233 LearningRate 0.0008 Epoch: 7 Global Step: 12940 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:34:34,451-Speed 13886.22 samples/sec Loss 3.7483 LearningRate 0.0008 Epoch: 7 Global Step: 12950 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:34:52,208-Speed 13840.96 samples/sec Loss 3.7414 LearningRate 0.0008 Epoch: 7 Global Step: 12960 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:35:09,909-Speed 13885.08 samples/sec Loss 3.7384 LearningRate 0.0008 Epoch: 7 Global Step: 12970 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:35:27,653-Speed 13851.56 samples/sec Loss 3.7579 LearningRate 0.0008 Epoch: 7 Global Step: 12980 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:35:45,365-Speed 13876.15 samples/sec Loss 3.7539 LearningRate 0.0008 Epoch: 7 Global Step: 12990 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:36:03,093-Speed 13863.78 samples/sec Loss 3.7229 LearningRate 0.0008 Epoch: 7 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:36:20,782-Speed 13894.26 samples/sec Loss 3.7070 LearningRate 0.0008 Epoch: 7 Global Step: 13010 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:36:38,523-Speed 13853.84 samples/sec Loss 3.7459 LearningRate 0.0008 
Epoch: 7 Global Step: 13020 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:36:56,281-Speed 13840.25 samples/sec Loss 3.7247 LearningRate 0.0008 Epoch: 7 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:37:14,020-Speed 13855.09 samples/sec Loss 3.7448 LearningRate 0.0008 Epoch: 7 Global Step: 13040 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:37:31,701-Speed 13900.45 samples/sec Loss 3.7002 LearningRate 0.0008 Epoch: 7 Global Step: 13050 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:37:49,382-Speed 13900.50 samples/sec Loss 3.6871 LearningRate 0.0008 Epoch: 7 Global Step: 13060 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:38:07,144-Speed 13838.28 samples/sec Loss 3.7151 LearningRate 0.0008 Epoch: 7 Global Step: 13070 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:38:24,821-Speed 13903.88 samples/sec Loss 3.7259 LearningRate 0.0008 Epoch: 7 Global Step: 13080 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:38:42,556-Speed 13858.71 samples/sec Loss 3.7103 LearningRate 0.0008 Epoch: 7 Global Step: 13090 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:39:00,278-Speed 13868.32 samples/sec Loss 3.7389 LearningRate 0.0008 Epoch: 7 Global Step: 13100 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:39:18,075-Speed 13809.88 samples/sec Loss 3.7108 LearningRate 0.0008 Epoch: 7 Global Step: 13110 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:39:35,772-Speed 13888.25 samples/sec Loss 3.7044 LearningRate 0.0008 Epoch: 7 Global Step: 13120 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:39:53,522-Speed 13846.24 samples/sec Loss 3.7048 LearningRate 0.0008 Epoch: 7 Global Step: 13130 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:40:11,291-Speed 13831.99 samples/sec Loss 3.7257 LearningRate 0.0008 Epoch: 7 Global Step: 13140 Fp16 Grad 
Scale: 65536 Required: 28 hours Training: 2022-03-03 14:40:28,991-Speed 13885.84 samples/sec Loss 3.6882 LearningRate 0.0008 Epoch: 7 Global Step: 13150 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:40:46,756-Speed 13834.39 samples/sec Loss 3.7231 LearningRate 0.0008 Epoch: 7 Global Step: 13160 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:41:04,603-Speed 13772.37 samples/sec Loss 3.7099 LearningRate 0.0008 Epoch: 7 Global Step: 13170 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:41:22,348-Speed 13850.36 samples/sec Loss 3.7211 LearningRate 0.0008 Epoch: 7 Global Step: 13180 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:41:40,154-Speed 13812.29 samples/sec Loss 3.7005 LearningRate 0.0008 Epoch: 7 Global Step: 13190 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:41:57,932-Speed 13824.25 samples/sec Loss 3.6590 LearningRate 0.0008 Epoch: 7 Global Step: 13200 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:42:15,734-Speed 13811.69 samples/sec Loss 3.7195 LearningRate 0.0008 Epoch: 7 Global Step: 13210 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:42:33,543-Speed 13800.39 samples/sec Loss 3.7045 LearningRate 0.0008 Epoch: 7 Global Step: 13220 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:42:51,290-Speed 13858.37 samples/sec Loss 3.6679 LearningRate 0.0008 Epoch: 7 Global Step: 13230 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:43:09,068-Speed 13824.36 samples/sec Loss 3.6915 LearningRate 0.0008 Epoch: 7 Global Step: 13240 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:43:26,837-Speed 13836.41 samples/sec Loss 3.6813 LearningRate 0.0008 Epoch: 7 Global Step: 13250 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:43:44,628-Speed 13815.07 samples/sec Loss 3.6863 LearningRate 0.0008 Epoch: 7 Global Step: 13260 Fp16 Grad Scale: 65536 Required: 28 hours Training: 
2022-03-03 14:44:02,346-Speed 13871.26 samples/sec Loss 3.6693 LearningRate 0.0008 Epoch: 7 Global Step: 13270 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:44:20,146-Speed 13807.59 samples/sec Loss 3.6932 LearningRate 0.0008 Epoch: 7 Global Step: 13280 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:44:38,040-Speed 13734.99 samples/sec Loss 3.6738 LearningRate 0.0008 Epoch: 7 Global Step: 13290 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:44:55,810-Speed 13831.48 samples/sec Loss 3.6880 LearningRate 0.0008 Epoch: 7 Global Step: 13300 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:45:13,580-Speed 13830.83 samples/sec Loss 3.6591 LearningRate 0.0008 Epoch: 7 Global Step: 13310 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:45:31,401-Speed 13791.70 samples/sec Loss 3.6767 LearningRate 0.0008 Epoch: 7 Global Step: 13320 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:45:49,173-Speed 13829.15 samples/sec Loss 3.6701 LearningRate 0.0008 Epoch: 7 Global Step: 13330 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:46:07,018-Speed 13772.87 samples/sec Loss 3.6827 LearningRate 0.0008 Epoch: 7 Global Step: 13340 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:46:24,816-Speed 13809.06 samples/sec Loss 3.6917 LearningRate 0.0008 Epoch: 7 Global Step: 13350 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:46:42,770-Speed 13689.02 samples/sec Loss 3.7077 LearningRate 0.0008 Epoch: 7 Global Step: 13360 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:47:00,543-Speed 13829.32 samples/sec Loss 3.6856 LearningRate 0.0008 Epoch: 7 Global Step: 13370 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:47:18,427-Speed 13750.95 samples/sec Loss 3.6326 LearningRate 0.0008 Epoch: 7 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:47:36,262-Speed 13780.77 
samples/sec Loss 3.6324 LearningRate 0.0008 Epoch: 7 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:47:54,081-Speed 13792.58 samples/sec Loss 3.6912 LearningRate 0.0008 Epoch: 7 Global Step: 13400 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:48:11,929-Speed 13770.82 samples/sec Loss 3.6319 LearningRate 0.0008 Epoch: 7 Global Step: 13410 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:48:29,846-Speed 13717.49 samples/sec Loss 3.6321 LearningRate 0.0008 Epoch: 7 Global Step: 13420 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:48:47,658-Speed 13813.40 samples/sec Loss 3.6442 LearningRate 0.0008 Epoch: 7 Global Step: 13430 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:49:05,429-Speed 13830.38 samples/sec Loss 3.6566 LearningRate 0.0008 Epoch: 7 Global Step: 13440 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:49:23,253-Speed 13795.51 samples/sec Loss 3.6858 LearningRate 0.0008 Epoch: 7 Global Step: 13450 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:49:41,029-Speed 13826.01 samples/sec Loss 3.6729 LearningRate 0.0008 Epoch: 7 Global Step: 13460 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:49:58,808-Speed 13830.62 samples/sec Loss 3.6417 LearningRate 0.0008 Epoch: 7 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:50:16,720-Speed 13721.39 samples/sec Loss 3.6439 LearningRate 0.0008 Epoch: 7 Global Step: 13480 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:50:34,496-Speed 13831.30 samples/sec Loss 3.6549 LearningRate 0.0008 Epoch: 7 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:50:52,358-Speed 13759.65 samples/sec Loss 3.6402 LearningRate 0.0008 Epoch: 7 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:51:10,141-Speed 13820.68 samples/sec Loss 3.6555 LearningRate 
0.0008 Epoch: 7 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:51:27,893-Speed 13845.50 samples/sec Loss 3.6451 LearningRate 0.0008 Epoch: 7 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:51:45,706-Speed 13797.24 samples/sec Loss 3.6364 LearningRate 0.0008 Epoch: 7 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:52:03,540-Speed 13781.74 samples/sec Loss 3.6599 LearningRate 0.0008 Epoch: 7 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 28 hours Training: 2022-03-03 14:52:21,419-Speed 13746.25 samples/sec Loss 3.6063 LearningRate 0.0008 Epoch: 7 Global Step: 13550 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:52:39,185-Speed 13834.51 samples/sec Loss 3.6113 LearningRate 0.0008 Epoch: 7 Global Step: 13560 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:52:56,978-Speed 13812.86 samples/sec Loss 3.6393 LearningRate 0.0008 Epoch: 7 Global Step: 13570 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:53:14,864-Speed 13740.99 samples/sec Loss 3.6483 LearningRate 0.0008 Epoch: 7 Global Step: 13580 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:53:32,644-Speed 13823.63 samples/sec Loss 3.6159 LearningRate 0.0008 Epoch: 7 Global Step: 13590 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:53:50,464-Speed 13791.67 samples/sec Loss 3.6294 LearningRate 0.0008 Epoch: 7 Global Step: 13600 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:54:08,546-Speed 13592.32 samples/sec Loss 3.5961 LearningRate 0.0008 Epoch: 7 Global Step: 13610 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:54:26,669-Speed 13561.58 samples/sec Loss 3.6226 LearningRate 0.0008 Epoch: 7 Global Step: 13620 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:54:44,782-Speed 13569.28 samples/sec Loss 3.6438 LearningRate 0.0008 Epoch: 7 Global Step: 13630 Fp16 
Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:55:02,901-Speed 13564.61 samples/sec Loss 3.6401 LearningRate 0.0008 Epoch: 7 Global Step: 13640 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:55:21,038-Speed 13550.90 samples/sec Loss 3.5787 LearningRate 0.0008 Epoch: 7 Global Step: 13650 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:55:39,131-Speed 13583.59 samples/sec Loss 3.6069 LearningRate 0.0008 Epoch: 7 Global Step: 13660 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:55:57,200-Speed 13602.46 samples/sec Loss 3.5681 LearningRate 0.0008 Epoch: 7 Global Step: 13670 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:56:15,373-Speed 13524.07 samples/sec Loss 3.5806 LearningRate 0.0008 Epoch: 7 Global Step: 13680 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:56:33,507-Speed 13553.49 samples/sec Loss 3.6188 LearningRate 0.0008 Epoch: 7 Global Step: 13690 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:56:51,604-Speed 13590.00 samples/sec Loss 3.6253 LearningRate 0.0008 Epoch: 7 Global Step: 13700 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:57:09,660-Speed 13612.37 samples/sec Loss 3.6945 LearningRate 0.0008 Epoch: 7 Global Step: 13710 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:57:27,769-Speed 13571.62 samples/sec Loss 3.6078 LearningRate 0.0008 Epoch: 7 Global Step: 13720 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:57:45,840-Speed 13601.11 samples/sec Loss 3.5777 LearningRate 0.0008 Epoch: 7 Global Step: 13730 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 14:58:03,947-Speed 13572.85 samples/sec Loss 3.5815 LearningRate 0.0008 Epoch: 7 Global Step: 13740 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:58:22,073-Speed 13566.69 samples/sec Loss 3.5639 LearningRate 0.0008 Epoch: 7 Global Step: 13750 Fp16 Grad Scale: 32768 Required: 28 hours 
Training: 2022-03-03 14:58:40,197-Speed 13561.09 samples/sec Loss 3.5572 LearningRate 0.0008 Epoch: 7 Global Step: 13760 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:58:58,272-Speed 13603.95 samples/sec Loss 3.6025 LearningRate 0.0008 Epoch: 7 Global Step: 13770 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:59:16,398-Speed 13559.16 samples/sec Loss 3.6677 LearningRate 0.0008 Epoch: 7 Global Step: 13780 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:59:34,462-Speed 13606.11 samples/sec Loss 3.6275 LearningRate 0.0008 Epoch: 7 Global Step: 13790 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 14:59:52,622-Speed 13534.16 samples/sec Loss 3.6130 LearningRate 0.0008 Epoch: 7 Global Step: 13800 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:00:10,706-Speed 13590.89 samples/sec Loss 3.6209 LearningRate 0.0008 Epoch: 7 Global Step: 13810 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:00:28,828-Speed 13562.13 samples/sec Loss 3.6315 LearningRate 0.0008 Epoch: 7 Global Step: 13820 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:01:44,005-Speed 3269.19 samples/sec Loss 3.5854 LearningRate 0.0008 Epoch: 8 Global Step: 13830 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:02:02,063-Speed 13610.65 samples/sec Loss 3.5567 LearningRate 0.0008 Epoch: 8 Global Step: 13840 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:02:20,108-Speed 13628.84 samples/sec Loss 3.5499 LearningRate 0.0008 Epoch: 8 Global Step: 13850 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:02:37,804-Speed 13888.92 samples/sec Loss 3.5261 LearningRate 0.0008 Epoch: 8 Global Step: 13860 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:02:55,490-Speed 13896.92 samples/sec Loss 3.5480 LearningRate 0.0008 Epoch: 8 Global Step: 13870 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:03:13,137-Speed 
13927.90 samples/sec Loss 3.5540 LearningRate 0.0008 Epoch: 8 Global Step: 13880 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:03:30,800-Speed 13915.31 samples/sec Loss 3.5402 LearningRate 0.0008 Epoch: 8 Global Step: 13890 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:03:48,598-Speed 13809.38 samples/sec Loss 3.5572 LearningRate 0.0008 Epoch: 8 Global Step: 13900 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:04:06,239-Speed 13931.96 samples/sec Loss 3.5301 LearningRate 0.0008 Epoch: 8 Global Step: 13910 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:04:23,955-Speed 13873.24 samples/sec Loss 3.5193 LearningRate 0.0008 Epoch: 8 Global Step: 13920 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:04:41,711-Speed 13841.47 samples/sec Loss 3.5545 LearningRate 0.0008 Epoch: 8 Global Step: 13930 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:04:59,420-Speed 13883.68 samples/sec Loss 3.5202 LearningRate 0.0008 Epoch: 8 Global Step: 13940 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:05:17,190-Speed 13830.85 samples/sec Loss 3.5485 LearningRate 0.0008 Epoch: 8 Global Step: 13950 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:05:34,953-Speed 13840.50 samples/sec Loss 3.5635 LearningRate 0.0008 Epoch: 8 Global Step: 13960 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:05:52,682-Speed 13863.17 samples/sec Loss 3.5471 LearningRate 0.0008 Epoch: 8 Global Step: 13970 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:06:10,410-Speed 13868.79 samples/sec Loss 3.5230 LearningRate 0.0008 Epoch: 8 Global Step: 13980 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:06:28,063-Speed 13922.98 samples/sec Loss 3.4903 LearningRate 0.0008 Epoch: 8 Global Step: 13990 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:06:45,838-Speed 13836.92 samples/sec Loss 3.5558 
LearningRate 0.0008 Epoch: 8 Global Step: 14000 Fp16 Grad Scale: 16384 Required: 28 hours Training: 2022-03-03 15:07:03,507-Speed 13910.25 samples/sec Loss 3.5234 LearningRate 0.0008 Epoch: 8 Global Step: 14010 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:07:21,164-Speed 13925.93 samples/sec Loss 3.5264 LearningRate 0.0008 Epoch: 8 Global Step: 14020 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:07:38,957-Speed 13812.98 samples/sec Loss 3.5457 LearningRate 0.0008 Epoch: 8 Global Step: 14030 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:07:56,630-Speed 13913.32 samples/sec Loss 3.6096 LearningRate 0.0008 Epoch: 8 Global Step: 14040 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:08:14,284-Speed 13922.15 samples/sec Loss 3.5387 LearningRate 0.0008 Epoch: 8 Global Step: 14050 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:08:31,910-Speed 13952.55 samples/sec Loss 3.5106 LearningRate 0.0008 Epoch: 8 Global Step: 14060 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:08:49,599-Speed 13893.95 samples/sec Loss 3.5271 LearningRate 0.0008 Epoch: 8 Global Step: 14070 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:09:07,272-Speed 13907.08 samples/sec Loss 3.5500 LearningRate 0.0008 Epoch: 8 Global Step: 14080 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:09:24,983-Speed 13877.00 samples/sec Loss 3.5394 LearningRate 0.0008 Epoch: 8 Global Step: 14090 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:09:42,730-Speed 13849.54 samples/sec Loss 3.4978 LearningRate 0.0008 Epoch: 8 Global Step: 14100 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:10:00,434-Speed 13882.26 samples/sec Loss 3.5349 LearningRate 0.0008 Epoch: 8 Global Step: 14110 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:10:18,151-Speed 13872.38 samples/sec Loss 3.5252 LearningRate 0.0008 Epoch: 8 Global Step: 
14120 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:10:35,927-Speed 13825.97 samples/sec Loss 3.5336 LearningRate 0.0008 Epoch: 8 Global Step: 14130 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:10:53,582-Speed 13921.36 samples/sec Loss 3.4882 LearningRate 0.0008 Epoch: 8 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:11:11,394-Speed 13797.72 samples/sec Loss 3.5173 LearningRate 0.0008 Epoch: 8 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:11:29,081-Speed 13896.18 samples/sec Loss 3.5220 LearningRate 0.0008 Epoch: 8 Global Step: 14160 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:11:46,763-Speed 13899.66 samples/sec Loss 3.5101 LearningRate 0.0008 Epoch: 8 Global Step: 14170 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:12:04,456-Speed 13891.63 samples/sec Loss 3.5305 LearningRate 0.0008 Epoch: 8 Global Step: 14180 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:12:22,144-Speed 13894.69 samples/sec Loss 3.5191 LearningRate 0.0008 Epoch: 8 Global Step: 14190 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:12:39,819-Speed 13905.63 samples/sec Loss 3.4985 LearningRate 0.0008 Epoch: 8 Global Step: 14200 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:12:57,653-Speed 13781.72 samples/sec Loss 3.5623 LearningRate 0.0008 Epoch: 8 Global Step: 14210 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:13:15,368-Speed 13874.31 samples/sec Loss 3.5118 LearningRate 0.0008 Epoch: 8 Global Step: 14220 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:13:33,235-Speed 13755.29 samples/sec Loss 3.4631 LearningRate 0.0008 Epoch: 8 Global Step: 14230 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:13:50,971-Speed 13857.97 samples/sec Loss 3.4838 LearningRate 0.0008 Epoch: 8 Global Step: 14240 Fp16 Grad Scale: 32768 Required: 28 
hours Training: 2022-03-03 15:14:08,786-Speed 13795.91 samples/sec Loss 3.5072 LearningRate 0.0008 Epoch: 8 Global Step: 14250 Fp16 Grad Scale: 32768 Required: 28 hours Training: 2022-03-03 15:14:26,608-Speed 13790.71 samples/sec Loss 3.5193 LearningRate 0.0008 Epoch: 8 Global Step: 14260 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:14:44,282-Speed 13905.39 samples/sec Loss 3.4902 LearningRate 0.0008 Epoch: 8 Global Step: 14270 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:15:02,023-Speed 13853.74 samples/sec Loss 3.4905 LearningRate 0.0008 Epoch: 8 Global Step: 14280 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:15:19,718-Speed 13890.78 samples/sec Loss 3.4574 LearningRate 0.0008 Epoch: 8 Global Step: 14290 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:15:37,480-Speed 13838.02 samples/sec Loss 3.4934 LearningRate 0.0008 Epoch: 8 Global Step: 14300 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:15:55,265-Speed 13819.69 samples/sec Loss 3.5238 LearningRate 0.0008 Epoch: 8 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:16:13,009-Speed 13850.71 samples/sec Loss 3.4889 LearningRate 0.0008 Epoch: 8 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:16:30,740-Speed 13861.47 samples/sec Loss 3.5030 LearningRate 0.0008 Epoch: 8 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:16:48,477-Speed 13857.03 samples/sec Loss 3.4825 LearningRate 0.0008 Epoch: 8 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 28 hours Training: 2022-03-03 15:17:06,275-Speed 13814.10 samples/sec Loss 3.4653 LearningRate 0.0008 Epoch: 8 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:17:24,185-Speed 13723.27 samples/sec Loss 3.4770 LearningRate 0.0008 Epoch: 8 Global Step: 14360 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 
15:17:42,176-Speed 13660.91 samples/sec Loss 3.4940 LearningRate 0.0008 Epoch: 8 Global Step: 14370 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:18:00,044-Speed 13755.02 samples/sec Loss 3.4973 LearningRate 0.0008 Epoch: 8 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:18:17,808-Speed 13835.46 samples/sec Loss 3.4404 LearningRate 0.0008 Epoch: 8 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:18:35,626-Speed 13800.77 samples/sec Loss 3.4436 LearningRate 0.0008 Epoch: 8 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:18:53,495-Speed 13754.66 samples/sec Loss 3.4905 LearningRate 0.0008 Epoch: 8 Global Step: 14410 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:19:11,331-Speed 13780.58 samples/sec Loss 3.4436 LearningRate 0.0008 Epoch: 8 Global Step: 14420 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:19:29,159-Speed 13786.14 samples/sec Loss 3.4757 LearningRate 0.0008 Epoch: 8 Global Step: 14430 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:19:47,007-Speed 13770.84 samples/sec Loss 3.4542 LearningRate 0.0008 Epoch: 8 Global Step: 14440 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:20:04,874-Speed 13755.24 samples/sec Loss 3.4539 LearningRate 0.0008 Epoch: 8 Global Step: 14450 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:20:22,761-Speed 13741.17 samples/sec Loss 3.4476 LearningRate 0.0008 Epoch: 8 Global Step: 14460 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:20:40,651-Speed 13738.04 samples/sec Loss 3.4779 LearningRate 0.0008 Epoch: 8 Global Step: 14470 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:20:58,557-Speed 13725.81 samples/sec Loss 3.4863 LearningRate 0.0008 Epoch: 8 Global Step: 14480 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:21:16,531-Speed 13673.32 samples/sec 
Loss 3.4450 LearningRate 0.0008 Epoch: 8 Global Step: 14490 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:21:34,404-Speed 13751.85 samples/sec Loss 3.4394 LearningRate 0.0008 Epoch: 8 Global Step: 14500 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:21:52,176-Speed 13829.44 samples/sec Loss 3.4774 LearningRate 0.0008 Epoch: 8 Global Step: 14510 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:22:10,051-Speed 13749.52 samples/sec Loss 3.4703 LearningRate 0.0008 Epoch: 8 Global Step: 14520 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:22:28,102-Speed 13615.27 samples/sec Loss 3.4700 LearningRate 0.0008 Epoch: 8 Global Step: 14530 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:22:46,026-Speed 13713.54 samples/sec Loss 3.4774 LearningRate 0.0008 Epoch: 8 Global Step: 14540 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:23:03,783-Speed 13841.42 samples/sec Loss 3.4615 LearningRate 0.0008 Epoch: 8 Global Step: 14550 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:23:21,652-Speed 13754.06 samples/sec Loss 3.4376 LearningRate 0.0008 Epoch: 8 Global Step: 14560 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:23:39,555-Speed 13728.61 samples/sec Loss 3.4401 LearningRate 0.0008 Epoch: 8 Global Step: 14570 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:23:57,395-Speed 13776.43 samples/sec Loss 3.4647 LearningRate 0.0008 Epoch: 8 Global Step: 14580 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:24:15,400-Speed 13650.63 samples/sec Loss 3.4548 LearningRate 0.0008 Epoch: 8 Global Step: 14590 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:24:33,341-Speed 13699.27 samples/sec Loss 3.4043 LearningRate 0.0008 Epoch: 8 Global Step: 14600 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:24:51,205-Speed 13758.07 samples/sec Loss 3.4338 LearningRate 0.0008 Epoch: 8 
Global Step: 14610 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:25:08,995-Speed 13819.78 samples/sec Loss 3.4537 LearningRate 0.0008 Epoch: 8 Global Step: 14620 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:25:26,900-Speed 13726.65 samples/sec Loss 3.4437 LearningRate 0.0008 Epoch: 8 Global Step: 14630 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:25:44,773-Speed 13756.34 samples/sec Loss 3.4478 LearningRate 0.0008 Epoch: 8 Global Step: 14640 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:26:02,703-Speed 13707.30 samples/sec Loss 3.4436 LearningRate 0.0008 Epoch: 8 Global Step: 14650 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:26:20,533-Speed 13791.14 samples/sec Loss 3.4297 LearningRate 0.0008 Epoch: 8 Global Step: 14660 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:26:38,379-Speed 13771.77 samples/sec Loss 3.4299 LearningRate 0.0008 Epoch: 8 Global Step: 14670 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:26:56,212-Speed 13782.08 samples/sec Loss 3.4322 LearningRate 0.0008 Epoch: 8 Global Step: 14680 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:27:14,019-Speed 13801.94 samples/sec Loss 3.4328 LearningRate 0.0008 Epoch: 8 Global Step: 14690 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:27:32,015-Speed 13657.96 samples/sec Loss 3.4263 LearningRate 0.0008 Epoch: 8 Global Step: 14700 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:27:49,814-Speed 13815.03 samples/sec Loss 3.4026 LearningRate 0.0008 Epoch: 8 Global Step: 14710 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:28:07,677-Speed 13758.83 samples/sec Loss 3.3964 LearningRate 0.0008 Epoch: 8 Global Step: 14720 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:28:25,522-Speed 13773.36 samples/sec Loss 3.4145 LearningRate 0.0008 Epoch: 8 Global Step: 14730 Fp16 Grad Scale: 
65536 Required: 27 hours Training: 2022-03-03 15:28:43,393-Speed 13753.46 samples/sec Loss 3.4371 LearningRate 0.0008 Epoch: 8 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:29:01,244-Speed 13767.94 samples/sec Loss 3.4237 LearningRate 0.0008 Epoch: 8 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:29:19,124-Speed 13778.74 samples/sec Loss 3.3897 LearningRate 0.0008 Epoch: 8 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:29:37,037-Speed 13720.71 samples/sec Loss 3.4138 LearningRate 0.0008 Epoch: 8 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:29:55,055-Speed 13640.98 samples/sec Loss 3.4203 LearningRate 0.0008 Epoch: 8 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:30:12,879-Speed 13788.67 samples/sec Loss 3.3840 LearningRate 0.0008 Epoch: 8 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:30:30,857-Speed 13671.12 samples/sec Loss 3.3737 LearningRate 0.0008 Epoch: 8 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:30:48,719-Speed 13759.68 samples/sec Loss 3.4029 LearningRate 0.0008 Epoch: 8 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:31:06,708-Speed 13662.88 samples/sec Loss 3.3790 LearningRate 0.0008 Epoch: 8 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:31:24,669-Speed 13683.46 samples/sec Loss 3.3935 LearningRate 0.0008 Epoch: 8 Global Step: 14830 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:31:42,663-Speed 13659.02 samples/sec Loss 3.4130 LearningRate 0.0008 Epoch: 8 Global Step: 14840 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:32:00,559-Speed 13733.48 samples/sec Loss 3.3957 LearningRate 0.0008 Epoch: 8 Global Step: 14850 Fp16 Grad Scale: 65536 Required: 27 hours Training: 
2022-03-03 15:32:18,466-Speed 13725.48 samples/sec Loss 3.4082 LearningRate 0.0008 Epoch: 8 Global Step: 14860 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:32:36,387-Speed 13714.18 samples/sec Loss 3.4045 LearningRate 0.0008 Epoch: 8 Global Step: 14870 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:32:54,226-Speed 13778.01 samples/sec Loss 3.3693 LearningRate 0.0008 Epoch: 8 Global Step: 14880 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:33:12,005-Speed 13823.97 samples/sec Loss 3.3586 LearningRate 0.0008 Epoch: 8 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:33:29,841-Speed 13780.09 samples/sec Loss 3.3792 LearningRate 0.0008 Epoch: 8 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 27 hours Training: 2022-03-03 15:33:47,780-Speed 13700.68 samples/sec Loss 3.4166 LearningRate 0.0008 Epoch: 8 Global Step: 14910 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:34:05,789-Speed 13647.17 samples/sec Loss 3.4056 LearningRate 0.0008 Epoch: 8 Global Step: 14920 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:34:23,666-Speed 13755.20 samples/sec Loss 3.3570 LearningRate 0.0008 Epoch: 8 Global Step: 14930 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:34:41,507-Speed 13775.90 samples/sec Loss 3.3726 LearningRate 0.0008 Epoch: 8 Global Step: 14940 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:34:59,417-Speed 13722.35 samples/sec Loss 3.3605 LearningRate 0.0008 Epoch: 8 Global Step: 14950 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:35:17,322-Speed 13726.87 samples/sec Loss 3.3711 LearningRate 0.0008 Epoch: 8 Global Step: 14960 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:35:35,120-Speed 13810.70 samples/sec Loss 3.3741 LearningRate 0.0008 Epoch: 8 Global Step: 14970 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:35:52,985-Speed 13760.24 
samples/sec Loss 3.3826 LearningRate 0.0008 Epoch: 8 Global Step: 14980 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:36:10,801-Speed 13794.64 samples/sec Loss 3.3706 LearningRate 0.0008 Epoch: 8 Global Step: 14990 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:36:28,697-Speed 13743.24 samples/sec Loss 3.3658 LearningRate 0.0008 Epoch: 8 Global Step: 15000 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:36:46,537-Speed 13776.78 samples/sec Loss 3.3515 LearningRate 0.0008 Epoch: 8 Global Step: 15010 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:37:04,382-Speed 13772.71 samples/sec Loss 3.3527 LearningRate 0.0008 Epoch: 8 Global Step: 15020 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:37:22,246-Speed 13758.36 samples/sec Loss 3.3922 LearningRate 0.0008 Epoch: 8 Global Step: 15030 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:37:40,076-Speed 13784.29 samples/sec Loss 3.3635 LearningRate 0.0008 Epoch: 8 Global Step: 15040 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:37:57,936-Speed 13761.57 samples/sec Loss 3.3377 LearningRate 0.0008 Epoch: 8 Global Step: 15050 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:38:15,760-Speed 13788.52 samples/sec Loss 3.3545 LearningRate 0.0008 Epoch: 8 Global Step: 15060 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:38:33,690-Speed 13707.67 samples/sec Loss 3.3822 LearningRate 0.0008 Epoch: 8 Global Step: 15070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:38:51,493-Speed 13805.42 samples/sec Loss 3.3722 LearningRate 0.0008 Epoch: 8 Global Step: 15080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:39:09,431-Speed 13701.28 samples/sec Loss 3.4035 LearningRate 0.0008 Epoch: 8 Global Step: 15090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:39:27,512-Speed 13593.25 samples/sec Loss 3.3426 LearningRate 0.0008 
Epoch: 8 Global Step: 15100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:39:45,328-Speed 13795.15 samples/sec Loss 3.3538 LearningRate 0.0008 Epoch: 8 Global Step: 15110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:40:03,190-Speed 13759.99 samples/sec Loss 3.3515 LearningRate 0.0008 Epoch: 8 Global Step: 15120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:40:21,146-Speed 13688.05 samples/sec Loss 3.3589 LearningRate 0.0008 Epoch: 8 Global Step: 15130 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:40:39,128-Speed 13667.90 samples/sec Loss 3.3207 LearningRate 0.0008 Epoch: 8 Global Step: 15140 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:40:57,062-Speed 13704.53 samples/sec Loss 3.3454 LearningRate 0.0008 Epoch: 8 Global Step: 15150 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:41:15,048-Speed 13664.69 samples/sec Loss 3.3698 LearningRate 0.0008 Epoch: 8 Global Step: 15160 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:41:32,865-Speed 13796.01 samples/sec Loss 3.3531 LearningRate 0.0008 Epoch: 8 Global Step: 15170 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:41:50,733-Speed 13755.42 samples/sec Loss 3.3068 LearningRate 0.0008 Epoch: 8 Global Step: 15180 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:42:08,693-Speed 13684.51 samples/sec Loss 3.3934 LearningRate 0.0008 Epoch: 8 Global Step: 15190 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:42:26,584-Speed 13737.42 samples/sec Loss 3.3796 LearningRate 0.0008 Epoch: 8 Global Step: 15200 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:42:44,541-Speed 13687.03 samples/sec Loss 3.3262 LearningRate 0.0008 Epoch: 8 Global Step: 15210 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:43:02,458-Speed 13717.67 samples/sec Loss 3.3332 LearningRate 0.0008 Epoch: 8 Global Step: 15220 Fp16 Grad 
Scale: 32768 Required: 27 hours Training: 2022-03-03 15:43:20,291-Speed 13782.07 samples/sec Loss 3.3275 LearningRate 0.0008 Epoch: 8 Global Step: 15230 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:43:38,096-Speed 13803.62 samples/sec Loss 3.3219 LearningRate 0.0008 Epoch: 8 Global Step: 15240 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:43:55,942-Speed 13772.02 samples/sec Loss 3.2921 LearningRate 0.0007 Epoch: 8 Global Step: 15250 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:44:13,816-Speed 13750.74 samples/sec Loss 3.2830 LearningRate 0.0007 Epoch: 8 Global Step: 15260 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:44:31,689-Speed 13751.77 samples/sec Loss 3.3247 LearningRate 0.0007 Epoch: 8 Global Step: 15270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:44:49,536-Speed 13771.02 samples/sec Loss 3.3417 LearningRate 0.0007 Epoch: 8 Global Step: 15280 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:45:07,281-Speed 13850.29 samples/sec Loss 3.3739 LearningRate 0.0007 Epoch: 8 Global Step: 15290 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:45:24,995-Speed 13874.91 samples/sec Loss 3.2961 LearningRate 0.0007 Epoch: 8 Global Step: 15300 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:45:42,782-Speed 13818.18 samples/sec Loss 3.3283 LearningRate 0.0007 Epoch: 8 Global Step: 15310 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:46:00,514-Speed 13860.76 samples/sec Loss 3.3611 LearningRate 0.0007 Epoch: 8 Global Step: 15320 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:46:18,250-Speed 13857.13 samples/sec Loss 3.3314 LearningRate 0.0007 Epoch: 8 Global Step: 15330 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:46:36,021-Speed 13830.01 samples/sec Loss 3.3219 LearningRate 0.0007 Epoch: 8 Global Step: 15340 Fp16 Grad Scale: 32768 Required: 27 hours Training: 
2022-03-03 15:46:53,717-Speed 13889.16 samples/sec Loss 3.3183 LearningRate 0.0007 Epoch: 8 Global Step: 15350 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:47:11,508-Speed 13815.22 samples/sec Loss 3.3346 LearningRate 0.0007 Epoch: 8 Global Step: 15360 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:47:29,295-Speed 13817.58 samples/sec Loss 3.3418 LearningRate 0.0007 Epoch: 8 Global Step: 15370 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:47:46,963-Speed 13910.30 samples/sec Loss 3.3028 LearningRate 0.0007 Epoch: 8 Global Step: 15380 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:48:04,794-Speed 13783.85 samples/sec Loss 3.3103 LearningRate 0.0007 Epoch: 8 Global Step: 15390 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:48:22,544-Speed 13846.81 samples/sec Loss 3.3187 LearningRate 0.0007 Epoch: 8 Global Step: 15400 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:48:40,246-Speed 13883.80 samples/sec Loss 3.2837 LearningRate 0.0007 Epoch: 8 Global Step: 15410 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:48:57,927-Speed 13901.08 samples/sec Loss 3.2792 LearningRate 0.0007 Epoch: 8 Global Step: 15420 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:49:15,644-Speed 13872.55 samples/sec Loss 3.3819 LearningRate 0.0007 Epoch: 8 Global Step: 15430 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:49:33,362-Speed 13871.98 samples/sec Loss 3.3756 LearningRate 0.0007 Epoch: 8 Global Step: 15440 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:49:51,244-Speed 13743.68 samples/sec Loss 3.3068 LearningRate 0.0007 Epoch: 8 Global Step: 15450 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:50:09,323-Speed 13595.24 samples/sec Loss 3.2934 LearningRate 0.0007 Epoch: 8 Global Step: 15460 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:50:27,149-Speed 13787.27 
samples/sec Loss 3.3275 LearningRate 0.0007 Epoch: 8 Global Step: 15470 Fp16 Grad Scale: 16384 Required: 27 hours Training: 2022-03-03 15:50:45,051-Speed 13730.19 samples/sec Loss 3.2823 LearningRate 0.0007 Epoch: 8 Global Step: 15480 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:51:03,199-Speed 13543.41 samples/sec Loss 3.2807 LearningRate 0.0007 Epoch: 8 Global Step: 15490 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:51:20,943-Speed 13850.93 samples/sec Loss 3.3478 LearningRate 0.0007 Epoch: 8 Global Step: 15500 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:51:38,691-Speed 13848.39 samples/sec Loss 3.3629 LearningRate 0.0007 Epoch: 8 Global Step: 15510 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:51:56,404-Speed 13875.32 samples/sec Loss 3.3480 LearningRate 0.0007 Epoch: 8 Global Step: 15520 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:52:14,151-Speed 13848.79 samples/sec Loss 3.3022 LearningRate 0.0007 Epoch: 8 Global Step: 15530 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:52:31,896-Speed 13850.80 samples/sec Loss 3.3004 LearningRate 0.0007 Epoch: 8 Global Step: 15540 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:52:49,820-Speed 13711.97 samples/sec Loss 3.3370 LearningRate 0.0007 Epoch: 8 Global Step: 15550 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:53:57,159-Speed 3649.71 samples/sec Loss 3.3412 LearningRate 0.0007 Epoch: 9 Global Step: 15560 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:54:14,905-Speed 13849.35 samples/sec Loss 3.2789 LearningRate 0.0007 Epoch: 9 Global Step: 15570 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:54:32,670-Speed 13834.97 samples/sec Loss 3.2407 LearningRate 0.0007 Epoch: 9 Global Step: 15580 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:54:50,487-Speed 13794.58 samples/sec Loss 3.2447 LearningRate 0.0007 
Epoch: 9 Global Step: 15590 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:55:08,263-Speed 13826.05 samples/sec Loss 3.2676 LearningRate 0.0007 Epoch: 9 Global Step: 15600 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:55:26,126-Speed 13759.18 samples/sec Loss 3.2364 LearningRate 0.0007 Epoch: 9 Global Step: 15610 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:55:43,938-Speed 13798.08 samples/sec Loss 3.2731 LearningRate 0.0007 Epoch: 9 Global Step: 15620 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:56:01,742-Speed 13804.31 samples/sec Loss 3.2320 LearningRate 0.0007 Epoch: 9 Global Step: 15630 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:56:20,292-Speed 13249.85 samples/sec Loss 3.2507 LearningRate 0.0007 Epoch: 9 Global Step: 15640 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:56:38,264-Speed 13781.30 samples/sec Loss 3.2568 LearningRate 0.0007 Epoch: 9 Global Step: 15650 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:56:56,082-Speed 13793.35 samples/sec Loss 3.2503 LearningRate 0.0007 Epoch: 9 Global Step: 15660 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:57:13,919-Speed 13778.89 samples/sec Loss 3.2615 LearningRate 0.0007 Epoch: 9 Global Step: 15670 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:57:31,820-Speed 13730.27 samples/sec Loss 3.2505 LearningRate 0.0007 Epoch: 9 Global Step: 15680 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:57:49,637-Speed 13794.30 samples/sec Loss 3.2864 LearningRate 0.0007 Epoch: 9 Global Step: 15690 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:58:07,513-Speed 13749.65 samples/sec Loss 3.2901 LearningRate 0.0007 Epoch: 9 Global Step: 15700 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 15:58:25,370-Speed 13763.58 samples/sec Loss 3.2468 LearningRate 0.0007 Epoch: 9 Global Step: 15710 Fp16 Grad 
Scale: 65536 Required: 27 hours Training: 2022-03-03 15:58:43,147-Speed 13825.86 samples/sec Loss 3.2305 LearningRate 0.0007 Epoch: 9 Global Step: 15720 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:59:00,866-Speed 13870.80 samples/sec Loss 3.2313 LearningRate 0.0007 Epoch: 9 Global Step: 15730 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:59:18,668-Speed 13805.52 samples/sec Loss 3.2588 LearningRate 0.0007 Epoch: 9 Global Step: 15740 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:59:36,451-Speed 13820.77 samples/sec Loss 3.2390 LearningRate 0.0007 Epoch: 9 Global Step: 15750 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 15:59:54,308-Speed 13763.87 samples/sec Loss 3.2621 LearningRate 0.0007 Epoch: 9 Global Step: 15760 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:00:12,117-Speed 13800.71 samples/sec Loss 3.2603 LearningRate 0.0007 Epoch: 9 Global Step: 15770 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:00:29,983-Speed 13756.19 samples/sec Loss 3.2797 LearningRate 0.0007 Epoch: 9 Global Step: 15780 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:00:47,817-Speed 13782.22 samples/sec Loss 3.2756 LearningRate 0.0007 Epoch: 9 Global Step: 15790 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:01:05,614-Speed 13809.90 samples/sec Loss 3.2439 LearningRate 0.0007 Epoch: 9 Global Step: 15800 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:01:23,385-Speed 13830.27 samples/sec Loss 3.2306 LearningRate 0.0007 Epoch: 9 Global Step: 15810 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:01:41,207-Speed 13790.55 samples/sec Loss 3.2170 LearningRate 0.0007 Epoch: 9 Global Step: 15820 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:01:59,007-Speed 13807.99 samples/sec Loss 3.2555 LearningRate 0.0007 Epoch: 9 Global Step: 15830 Fp16 Grad Scale: 32768 Required: 27 hours Training: 
2022-03-03 16:02:16,834-Speed 13786.23 samples/sec Loss 3.2404 LearningRate 0.0007 Epoch: 9 Global Step: 15840 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:02:34,657-Speed 13790.39 samples/sec Loss 3.2139 LearningRate 0.0007 Epoch: 9 Global Step: 15850 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:02:52,493-Speed 13779.88 samples/sec Loss 3.2298 LearningRate 0.0007 Epoch: 9 Global Step: 15860 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:03:10,346-Speed 13765.99 samples/sec Loss 3.2868 LearningRate 0.0007 Epoch: 9 Global Step: 15870 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:03:28,151-Speed 13803.96 samples/sec Loss 3.3042 LearningRate 0.0007 Epoch: 9 Global Step: 15880 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:03:46,062-Speed 13722.50 samples/sec Loss 3.2579 LearningRate 0.0007 Epoch: 9 Global Step: 15890 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:04:04,693-Speed 13192.18 samples/sec Loss 3.2048 LearningRate 0.0007 Epoch: 9 Global Step: 15900 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:04:22,545-Speed 13768.04 samples/sec Loss 3.2214 LearningRate 0.0007 Epoch: 9 Global Step: 15910 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:04:40,403-Speed 13762.65 samples/sec Loss 3.2281 LearningRate 0.0007 Epoch: 9 Global Step: 15920 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:04:58,228-Speed 13787.70 samples/sec Loss 3.2288 LearningRate 0.0007 Epoch: 9 Global Step: 15930 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:05:16,983-Speed 13106.69 samples/sec Loss 3.2659 LearningRate 0.0007 Epoch: 9 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:05:34,823-Speed 13777.26 samples/sec Loss 3.2140 LearningRate 0.0007 Epoch: 9 Global Step: 15950 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:05:52,564-Speed 13853.50 
samples/sec Loss 3.2238 LearningRate 0.0007 Epoch: 9 Global Step: 15960 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:06:10,449-Speed 13742.22 samples/sec Loss 3.2151 LearningRate 0.0007 Epoch: 9 Global Step: 15970 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:06:28,328-Speed 13747.39 samples/sec Loss 3.2421 LearningRate 0.0007 Epoch: 9 Global Step: 15980 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:06:46,974-Speed 13181.08 samples/sec Loss 3.2421 LearningRate 0.0007 Epoch: 9 Global Step: 15990 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:07:04,813-Speed 13777.07 samples/sec Loss 3.2190 LearningRate 0.0007 Epoch: 9 Global Step: 16000 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:07:22,625-Speed 13798.97 samples/sec Loss 3.2367 LearningRate 0.0007 Epoch: 9 Global Step: 16010 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:07:40,424-Speed 13808.46 samples/sec Loss 3.2015 LearningRate 0.0007 Epoch: 9 Global Step: 16020 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:07:58,249-Speed 13788.90 samples/sec Loss 3.1949 LearningRate 0.0007 Epoch: 9 Global Step: 16030 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:08:16,088-Speed 13777.12 samples/sec Loss 3.2128 LearningRate 0.0007 Epoch: 9 Global Step: 16040 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:08:33,908-Speed 13791.87 samples/sec Loss 3.2284 LearningRate 0.0007 Epoch: 9 Global Step: 16050 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:08:51,753-Speed 13772.98 samples/sec Loss 3.2627 LearningRate 0.0007 Epoch: 9 Global Step: 16060 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:09:09,580-Speed 13787.01 samples/sec Loss 3.2203 LearningRate 0.0007 Epoch: 9 Global Step: 16070 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:09:27,383-Speed 13804.84 samples/sec Loss 3.1926 LearningRate 0.0007 
Epoch: 9 Global Step: 16080 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:09:45,200-Speed 13794.24 samples/sec Loss 3.2052 LearningRate 0.0007 Epoch: 9 Global Step: 16090 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:10:03,070-Speed 13754.38 samples/sec Loss 3.2229 LearningRate 0.0007 Epoch: 9 Global Step: 16100 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:10:20,872-Speed 13805.90 samples/sec Loss 3.1862 LearningRate 0.0007 Epoch: 9 Global Step: 16110 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:10:38,725-Speed 13766.62 samples/sec Loss 3.1820 LearningRate 0.0007 Epoch: 9 Global Step: 16120 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:10:56,525-Speed 13807.84 samples/sec Loss 3.1894 LearningRate 0.0007 Epoch: 9 Global Step: 16130 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:11:14,737-Speed 13495.39 samples/sec Loss 3.2053 LearningRate 0.0007 Epoch: 9 Global Step: 16140 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:11:33,119-Speed 13370.46 samples/sec Loss 3.1987 LearningRate 0.0007 Epoch: 9 Global Step: 16150 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:11:50,887-Speed 13832.76 samples/sec Loss 3.2051 LearningRate 0.0007 Epoch: 9 Global Step: 16160 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:12:08,660-Speed 13829.01 samples/sec Loss 3.2048 LearningRate 0.0007 Epoch: 9 Global Step: 16170 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:12:26,467-Speed 13802.66 samples/sec Loss 3.1724 LearningRate 0.0007 Epoch: 9 Global Step: 16180 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:12:44,264-Speed 13809.79 samples/sec Loss 3.1794 LearningRate 0.0007 Epoch: 9 Global Step: 16190 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:13:02,147-Speed 13743.51 samples/sec Loss 3.1966 LearningRate 0.0007 Epoch: 9 Global Step: 16200 Fp16 Grad 
Scale: 32768 Required: 27 hours Training: 2022-03-03 16:13:19,988-Speed 13775.89 samples/sec Loss 3.2001 LearningRate 0.0007 Epoch: 9 Global Step: 16210 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:13:37,790-Speed 13805.38 samples/sec Loss 3.2012 LearningRate 0.0007 Epoch: 9 Global Step: 16220 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:13:55,643-Speed 13767.22 samples/sec Loss 3.2076 LearningRate 0.0007 Epoch: 9 Global Step: 16230 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:14:13,509-Speed 13757.25 samples/sec Loss 3.1818 LearningRate 0.0007 Epoch: 9 Global Step: 16240 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:14:31,374-Speed 13758.71 samples/sec Loss 3.1791 LearningRate 0.0007 Epoch: 9 Global Step: 16250 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:14:49,188-Speed 13796.47 samples/sec Loss 3.1768 LearningRate 0.0007 Epoch: 9 Global Step: 16260 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:15:07,057-Speed 13754.72 samples/sec Loss 3.1929 LearningRate 0.0007 Epoch: 9 Global Step: 16270 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:15:24,888-Speed 13783.70 samples/sec Loss 3.2151 LearningRate 0.0007 Epoch: 9 Global Step: 16280 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:15:42,816-Speed 13709.03 samples/sec Loss 3.2011 LearningRate 0.0007 Epoch: 9 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:16:00,729-Speed 13720.02 samples/sec Loss 3.1923 LearningRate 0.0007 Epoch: 9 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 27 hours Training: 2022-03-03 16:16:18,651-Speed 13714.40 samples/sec Loss 3.1823 LearningRate 0.0007 Epoch: 9 Global Step: 16310 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:16:36,440-Speed 13815.69 samples/sec Loss 3.1757 LearningRate 0.0007 Epoch: 9 Global Step: 16320 Fp16 Grad Scale: 32768 Required: 27 hours Training: 
2022-03-03 16:16:54,173-Speed 13860.11 samples/sec Loss 3.1677 LearningRate 0.0007 Epoch: 9 Global Step: 16330 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:17:11,973-Speed 13807.63 samples/sec Loss 3.1443 LearningRate 0.0007 Epoch: 9 Global Step: 16340 Fp16 Grad Scale: 32768 Required: 27 hours Training: 2022-03-03 16:17:29,730-Speed 13842.30 samples/sec Loss 3.1642 LearningRate 0.0007 Epoch: 9 Global Step: 16350 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:17:47,559-Speed 13785.07 samples/sec Loss 3.1795 LearningRate 0.0007 Epoch: 9 Global Step: 16360 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:18:05,356-Speed 13810.65 samples/sec Loss 3.1739 LearningRate 0.0007 Epoch: 9 Global Step: 16370 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:18:23,130-Speed 13827.67 samples/sec Loss 3.1813 LearningRate 0.0007 Epoch: 9 Global Step: 16380 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:18:40,874-Speed 13851.94 samples/sec Loss 3.1811 LearningRate 0.0007 Epoch: 9 Global Step: 16390 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:18:58,639-Speed 13835.13 samples/sec Loss 3.1849 LearningRate 0.0007 Epoch: 9 Global Step: 16400 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:19:16,442-Speed 13805.35 samples/sec Loss 3.1993 LearningRate 0.0007 Epoch: 9 Global Step: 16410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:19:34,272-Speed 13784.53 samples/sec Loss 3.1661 LearningRate 0.0007 Epoch: 9 Global Step: 16420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:19:52,035-Speed 13836.02 samples/sec Loss 3.1639 LearningRate 0.0007 Epoch: 9 Global Step: 16430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:20:09,836-Speed 13807.46 samples/sec Loss 3.1617 LearningRate 0.0007 Epoch: 9 Global Step: 16440 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:20:27,652-Speed 13794.64 
samples/sec Loss 3.1473 LearningRate 0.0007 Epoch: 9 Global Step: 16450 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:20:45,523-Speed 13752.68 samples/sec Loss 3.1601 LearningRate 0.0007 Epoch: 9 Global Step: 16460 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:21:03,342-Speed 13793.25 samples/sec Loss 3.1759 LearningRate 0.0007 Epoch: 9 Global Step: 16470 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:21:21,157-Speed 13796.16 samples/sec Loss 3.1603 LearningRate 0.0007 Epoch: 9 Global Step: 16480 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:21:39,033-Speed 13749.50 samples/sec Loss 3.1657 LearningRate 0.0007 Epoch: 9 Global Step: 16490 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:21:56,893-Speed 13760.82 samples/sec Loss 3.1260 LearningRate 0.0007 Epoch: 9 Global Step: 16500 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:22:14,700-Speed 13802.24 samples/sec Loss 3.1409 LearningRate 0.0007 Epoch: 9 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:22:32,544-Speed 13773.71 samples/sec Loss 3.1620 LearningRate 0.0007 Epoch: 9 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:22:50,320-Speed 13826.20 samples/sec Loss 3.1476 LearningRate 0.0007 Epoch: 9 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:23:08,228-Speed 13724.53 samples/sec Loss 3.1216 LearningRate 0.0007 Epoch: 9 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:23:26,071-Speed 13774.39 samples/sec Loss 3.1560 LearningRate 0.0007 Epoch: 9 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:23:43,905-Speed 13781.03 samples/sec Loss 3.1552 LearningRate 0.0007 Epoch: 9 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:24:01,740-Speed 13780.23 samples/sec Loss 3.1346 LearningRate 
0.0007 Epoch: 9 Global Step: 16570 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:24:19,554-Speed 13797.48 samples/sec Loss 3.1510 LearningRate 0.0007 Epoch: 9 Global Step: 16580 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:24:37,366-Speed 13798.73 samples/sec Loss 3.1111 LearningRate 0.0007 Epoch: 9 Global Step: 16590 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:24:55,119-Speed 13843.91 samples/sec Loss 3.1470 LearningRate 0.0007 Epoch: 9 Global Step: 16600 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:25:12,952-Speed 13781.98 samples/sec Loss 3.1418 LearningRate 0.0007 Epoch: 9 Global Step: 16610 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:25:30,741-Speed 13817.93 samples/sec Loss 3.1411 LearningRate 0.0007 Epoch: 9 Global Step: 16620 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:25:48,529-Speed 13816.51 samples/sec Loss 3.1514 LearningRate 0.0007 Epoch: 9 Global Step: 16630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:26:06,398-Speed 13755.12 samples/sec Loss 3.1374 LearningRate 0.0007 Epoch: 9 Global Step: 16640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:26:24,230-Speed 13782.76 samples/sec Loss 3.1183 LearningRate 0.0007 Epoch: 9 Global Step: 16650 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:26:42,189-Speed 13686.80 samples/sec Loss 3.1361 LearningRate 0.0007 Epoch: 9 Global Step: 16660 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:27:00,003-Speed 13796.64 samples/sec Loss 3.1299 LearningRate 0.0007 Epoch: 9 Global Step: 16670 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:27:17,836-Speed 13782.35 samples/sec Loss 3.1287 LearningRate 0.0007 Epoch: 9 Global Step: 16680 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:27:35,668-Speed 13782.73 samples/sec Loss 3.1115 LearningRate 0.0007 Epoch: 9 Global Step: 16690 Fp16 
Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:27:53,475-Speed 13802.36 samples/sec Loss 3.1490 LearningRate 0.0007 Epoch: 9 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:28:11,345-Speed 13753.36 samples/sec Loss 3.1454 LearningRate 0.0007 Epoch: 9 Global Step: 16710 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:28:29,111-Speed 13834.61 samples/sec Loss 3.0783 LearningRate 0.0007 Epoch: 9 Global Step: 16720 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:28:46,975-Speed 13757.92 samples/sec Loss 3.1167 LearningRate 0.0007 Epoch: 9 Global Step: 16730 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:29:04,823-Speed 13770.22 samples/sec Loss 3.1386 LearningRate 0.0007 Epoch: 9 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:29:22,625-Speed 13805.72 samples/sec Loss 3.1336 LearningRate 0.0007 Epoch: 9 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:29:40,403-Speed 13825.32 samples/sec Loss 3.1310 LearningRate 0.0007 Epoch: 9 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:29:58,242-Speed 13777.41 samples/sec Loss 3.1093 LearningRate 0.0007 Epoch: 9 Global Step: 16770 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:30:16,122-Speed 13747.13 samples/sec Loss 3.1437 LearningRate 0.0007 Epoch: 9 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:30:33,880-Speed 13840.04 samples/sec Loss 3.1104 LearningRate 0.0007 Epoch: 9 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:30:51,649-Speed 13831.73 samples/sec Loss 3.1096 LearningRate 0.0007 Epoch: 9 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:31:09,441-Speed 13813.59 samples/sec Loss 3.1075 LearningRate 0.0007 Epoch: 9 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 26 hours 
Training: 2022-03-03 16:31:27,205-Speed 13836.10 samples/sec Loss 3.1124 LearningRate 0.0007 Epoch: 9 Global Step: 16820 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:31:45,042-Speed 13779.28 samples/sec Loss 3.0991 LearningRate 0.0007 Epoch: 9 Global Step: 16830 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:32:02,911-Speed 13754.52 samples/sec Loss 3.1266 LearningRate 0.0007 Epoch: 9 Global Step: 16840 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:32:20,881-Speed 13676.66 samples/sec Loss 3.0908 LearningRate 0.0007 Epoch: 9 Global Step: 16850 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:32:39,380-Speed 13286.44 samples/sec Loss 3.0938 LearningRate 0.0007 Epoch: 9 Global Step: 16860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:32:57,180-Speed 13807.14 samples/sec Loss 3.0888 LearningRate 0.0007 Epoch: 9 Global Step: 16870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:33:15,108-Speed 13709.09 samples/sec Loss 3.1038 LearningRate 0.0007 Epoch: 9 Global Step: 16880 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:33:32,886-Speed 13824.44 samples/sec Loss 3.1016 LearningRate 0.0007 Epoch: 9 Global Step: 16890 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:33:51,541-Speed 13174.82 samples/sec Loss 3.1180 LearningRate 0.0007 Epoch: 9 Global Step: 16900 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:34:09,536-Speed 13658.32 samples/sec Loss 3.0866 LearningRate 0.0007 Epoch: 9 Global Step: 16910 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:34:27,318-Speed 13821.58 samples/sec Loss 3.1033 LearningRate 0.0007 Epoch: 9 Global Step: 16920 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:34:45,098-Speed 13823.28 samples/sec Loss 3.1165 LearningRate 0.0007 Epoch: 9 Global Step: 16930 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:35:02,930-Speed 
13783.18 samples/sec Loss 3.0808 LearningRate 0.0007 Epoch: 9 Global Step: 16940 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:35:20,966-Speed 13626.99 samples/sec Loss 3.1100 LearningRate 0.0007 Epoch: 9 Global Step: 16950 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:35:39,433-Speed 13308.73 samples/sec Loss 3.1031 LearningRate 0.0007 Epoch: 9 Global Step: 16960 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:35:57,370-Speed 13701.74 samples/sec Loss 3.0752 LearningRate 0.0007 Epoch: 9 Global Step: 16970 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:36:15,155-Speed 13819.86 samples/sec Loss 3.0848 LearningRate 0.0007 Epoch: 9 Global Step: 16980 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:36:32,939-Speed 13819.92 samples/sec Loss 3.0668 LearningRate 0.0007 Epoch: 9 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:36:50,812-Speed 13751.70 samples/sec Loss 3.0936 LearningRate 0.0007 Epoch: 9 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:37:09,518-Speed 13138.76 samples/sec Loss 3.1252 LearningRate 0.0007 Epoch: 9 Global Step: 17010 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:37:27,209-Speed 13892.83 samples/sec Loss 3.0890 LearningRate 0.0007 Epoch: 9 Global Step: 17020 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:37:44,935-Speed 13865.46 samples/sec Loss 3.0701 LearningRate 0.0007 Epoch: 9 Global Step: 17030 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:38:02,807-Speed 13751.84 samples/sec Loss 3.0901 LearningRate 0.0007 Epoch: 9 Global Step: 17040 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:38:20,470-Speed 13915.75 samples/sec Loss 3.0938 LearningRate 0.0007 Epoch: 9 Global Step: 17050 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:38:38,166-Speed 13889.03 samples/sec Loss 3.0937 
LearningRate 0.0007 Epoch: 9 Global Step: 17060 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:38:55,925-Speed 13839.39 samples/sec Loss 3.0882 LearningRate 0.0007 Epoch: 9 Global Step: 17070 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:39:13,705-Speed 13823.18 samples/sec Loss 3.0668 LearningRate 0.0007 Epoch: 9 Global Step: 17080 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:39:31,375-Speed 13908.92 samples/sec Loss 3.0707 LearningRate 0.0007 Epoch: 9 Global Step: 17090 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:39:49,125-Speed 13846.68 samples/sec Loss 3.0786 LearningRate 0.0007 Epoch: 9 Global Step: 17100 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:40:06,858-Speed 13860.36 samples/sec Loss 3.0730 LearningRate 0.0007 Epoch: 9 Global Step: 17110 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:40:24,661-Speed 13804.97 samples/sec Loss 3.0589 LearningRate 0.0007 Epoch: 9 Global Step: 17120 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:40:42,417-Speed 13841.90 samples/sec Loss 3.0828 LearningRate 0.0007 Epoch: 9 Global Step: 17130 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:41:00,232-Speed 13796.94 samples/sec Loss 3.0949 LearningRate 0.0007 Epoch: 9 Global Step: 17140 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:41:18,003-Speed 13830.36 samples/sec Loss 3.0837 LearningRate 0.0007 Epoch: 9 Global Step: 17150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:41:35,744-Speed 13854.03 samples/sec Loss 3.0616 LearningRate 0.0007 Epoch: 9 Global Step: 17160 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:41:53,500-Speed 13841.76 samples/sec Loss 3.0630 LearningRate 0.0007 Epoch: 9 Global Step: 17170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:42:11,228-Speed 13863.87 samples/sec Loss 3.0577 LearningRate 0.0007 Epoch: 9 Global Step: 
17180 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:42:28,955-Speed 13864.50 samples/sec Loss 3.0842 LearningRate 0.0007 Epoch: 9 Global Step: 17190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:42:46,687-Speed 13860.96 samples/sec Loss 3.0783 LearningRate 0.0007 Epoch: 9 Global Step: 17200 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:43:04,404-Speed 13871.93 samples/sec Loss 3.0728 LearningRate 0.0007 Epoch: 9 Global Step: 17210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:43:22,174-Speed 13830.74 samples/sec Loss 3.0773 LearningRate 0.0007 Epoch: 9 Global Step: 17220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:43:39,915-Speed 13854.42 samples/sec Loss 3.0651 LearningRate 0.0007 Epoch: 9 Global Step: 17230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:43:58,566-Speed 13179.16 samples/sec Loss 3.0650 LearningRate 0.0007 Epoch: 9 Global Step: 17240 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:44:16,346-Speed 13822.85 samples/sec Loss 3.0849 LearningRate 0.0007 Epoch: 9 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:44:34,082-Speed 13857.37 samples/sec Loss 3.0949 LearningRate 0.0007 Epoch: 9 Global Step: 17260 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:44:51,862-Speed 13823.56 samples/sec Loss 3.0897 LearningRate 0.0007 Epoch: 9 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:45:09,565-Speed 13883.38 samples/sec Loss 3.0961 LearningRate 0.0007 Epoch: 9 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:46:17,217-Speed 3632.76 samples/sec Loss 3.0635 LearningRate 0.0007 Epoch: 10 Global Step: 17290 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:46:35,602-Speed 13368.07 samples/sec Loss 3.0191 LearningRate 0.0007 Epoch: 10 Global Step: 17300 Fp16 Grad Scale: 65536 Required: 
26 hours Training: 2022-03-03 16:46:53,262-Speed 13917.36 samples/sec Loss 3.0543 LearningRate 0.0007 Epoch: 10 Global Step: 17310 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:47:11,022-Speed 13838.74 samples/sec Loss 3.0267 LearningRate 0.0007 Epoch: 10 Global Step: 17320 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:47:28,785-Speed 13836.44 samples/sec Loss 3.0164 LearningRate 0.0007 Epoch: 10 Global Step: 17330 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:47:47,344-Speed 13242.83 samples/sec Loss 3.0198 LearningRate 0.0007 Epoch: 10 Global Step: 17340 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:48:05,100-Speed 13842.12 samples/sec Loss 3.0245 LearningRate 0.0007 Epoch: 10 Global Step: 17350 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:48:22,935-Speed 13780.44 samples/sec Loss 3.0090 LearningRate 0.0007 Epoch: 10 Global Step: 17360 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:48:41,382-Speed 13323.73 samples/sec Loss 3.0315 LearningRate 0.0007 Epoch: 10 Global Step: 17370 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:48:59,029-Speed 13927.72 samples/sec Loss 3.0137 LearningRate 0.0007 Epoch: 10 Global Step: 17380 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:49:16,730-Speed 13884.60 samples/sec Loss 3.0364 LearningRate 0.0007 Epoch: 10 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 16:49:34,407-Speed 13904.37 samples/sec Loss 3.0371 LearningRate 0.0007 Epoch: 10 Global Step: 17400 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:49:52,061-Speed 13921.38 samples/sec Loss 3.0704 LearningRate 0.0007 Epoch: 10 Global Step: 17410 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:50:09,722-Speed 13916.32 samples/sec Loss 3.0465 LearningRate 0.0007 Epoch: 10 Global Step: 17420 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 
16:50:27,424-Speed 13884.01 samples/sec Loss 3.0587 LearningRate 0.0007 Epoch: 10 Global Step: 17430 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:50:45,117-Speed 13891.13 samples/sec Loss 3.0242 LearningRate 0.0007 Epoch: 10 Global Step: 17440 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:51:02,896-Speed 13824.16 samples/sec Loss 3.0273 LearningRate 0.0007 Epoch: 10 Global Step: 17450 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:51:20,603-Speed 13880.57 samples/sec Loss 3.0059 LearningRate 0.0007 Epoch: 10 Global Step: 17460 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:51:38,276-Speed 13907.17 samples/sec Loss 3.0357 LearningRate 0.0007 Epoch: 10 Global Step: 17470 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:51:56,018-Speed 13852.12 samples/sec Loss 3.0206 LearningRate 0.0007 Epoch: 10 Global Step: 17480 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:52:13,912-Speed 13735.08 samples/sec Loss 3.0234 LearningRate 0.0007 Epoch: 10 Global Step: 17490 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:52:31,677-Speed 13835.01 samples/sec Loss 3.0192 LearningRate 0.0007 Epoch: 10 Global Step: 17500 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:52:49,365-Speed 13894.53 samples/sec Loss 3.0117 LearningRate 0.0007 Epoch: 10 Global Step: 17510 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:53:07,032-Speed 13912.60 samples/sec Loss 3.0132 LearningRate 0.0007 Epoch: 10 Global Step: 17520 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:53:24,736-Speed 13882.04 samples/sec Loss 3.0296 LearningRate 0.0007 Epoch: 10 Global Step: 17530 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:53:42,436-Speed 13885.27 samples/sec Loss 3.0579 LearningRate 0.0007 Epoch: 10 Global Step: 17540 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:54:00,144-Speed 13879.73 
samples/sec Loss 3.0235 LearningRate 0.0007 Epoch: 10 Global Step: 17550 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:54:17,909-Speed 13835.41 samples/sec Loss 3.0125 LearningRate 0.0007 Epoch: 10 Global Step: 17560 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:54:35,617-Speed 13881.27 samples/sec Loss 3.0139 LearningRate 0.0007 Epoch: 10 Global Step: 17570 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:54:53,439-Speed 13790.06 samples/sec Loss 3.0487 LearningRate 0.0007 Epoch: 10 Global Step: 17580 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:55:11,082-Speed 13930.66 samples/sec Loss 3.0187 LearningRate 0.0007 Epoch: 10 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:55:28,757-Speed 13905.68 samples/sec Loss 3.0383 LearningRate 0.0007 Epoch: 10 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:55:46,517-Speed 13838.97 samples/sec Loss 3.0396 LearningRate 0.0007 Epoch: 10 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:56:04,241-Speed 13866.04 samples/sec Loss 2.9991 LearningRate 0.0007 Epoch: 10 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:56:21,969-Speed 13864.07 samples/sec Loss 3.0005 LearningRate 0.0007 Epoch: 10 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:56:39,699-Speed 13861.82 samples/sec Loss 3.0231 LearningRate 0.0007 Epoch: 10 Global Step: 17640 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 16:56:57,356-Speed 13920.14 samples/sec Loss 3.0210 LearningRate 0.0007 Epoch: 10 Global Step: 17650 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:57:15,067-Speed 13876.66 samples/sec Loss 3.0017 LearningRate 0.0007 Epoch: 10 Global Step: 17660 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:57:32,775-Speed 13879.06 samples/sec Loss 3.0169 
LearningRate 0.0007 Epoch: 10 Global Step: 17670 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:57:50,511-Speed 13857.53 samples/sec Loss 3.0267 LearningRate 0.0007 Epoch: 10 Global Step: 17680 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:58:08,284-Speed 13829.39 samples/sec Loss 3.0081 LearningRate 0.0007 Epoch: 10 Global Step: 17690 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:58:25,997-Speed 13875.14 samples/sec Loss 3.0088 LearningRate 0.0007 Epoch: 10 Global Step: 17700 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:58:43,692-Speed 13889.76 samples/sec Loss 2.9937 LearningRate 0.0007 Epoch: 10 Global Step: 17710 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:59:01,394-Speed 13883.61 samples/sec Loss 2.9903 LearningRate 0.0007 Epoch: 10 Global Step: 17720 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:59:19,107-Speed 13875.86 samples/sec Loss 3.0010 LearningRate 0.0007 Epoch: 10 Global Step: 17730 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:59:36,838-Speed 13861.22 samples/sec Loss 3.0091 LearningRate 0.0007 Epoch: 10 Global Step: 17740 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 16:59:54,524-Speed 13896.63 samples/sec Loss 2.9919 LearningRate 0.0007 Epoch: 10 Global Step: 17750 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:00:12,278-Speed 13843.15 samples/sec Loss 2.9950 LearningRate 0.0007 Epoch: 10 Global Step: 17760 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:00:30,072-Speed 13812.35 samples/sec Loss 3.0102 LearningRate 0.0007 Epoch: 10 Global Step: 17770 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:00:47,820-Speed 13848.98 samples/sec Loss 2.9773 LearningRate 0.0007 Epoch: 10 Global Step: 17780 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:01:05,555-Speed 13857.94 samples/sec Loss 3.0015 LearningRate 0.0007 Epoch: 10 
Global Step: 17790 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:01:23,306-Speed 13845.30 samples/sec Loss 3.0144 LearningRate 0.0007 Epoch: 10 Global Step: 17800 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:01:41,052-Speed 13849.50 samples/sec Loss 3.0253 LearningRate 0.0007 Epoch: 10 Global Step: 17810 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:01:58,749-Speed 13888.84 samples/sec Loss 3.0051 LearningRate 0.0007 Epoch: 10 Global Step: 17820 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:02:16,435-Speed 13896.17 samples/sec Loss 2.9761 LearningRate 0.0007 Epoch: 10 Global Step: 17830 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:02:34,082-Speed 13927.39 samples/sec Loss 2.9828 LearningRate 0.0007 Epoch: 10 Global Step: 17840 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:02:51,799-Speed 13872.43 samples/sec Loss 2.9829 LearningRate 0.0007 Epoch: 10 Global Step: 17850 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:03:09,493-Speed 13890.56 samples/sec Loss 2.9735 LearningRate 0.0007 Epoch: 10 Global Step: 17860 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:03:27,224-Speed 13861.54 samples/sec Loss 2.9830 LearningRate 0.0007 Epoch: 10 Global Step: 17870 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:03:44,983-Speed 13839.13 samples/sec Loss 3.0161 LearningRate 0.0007 Epoch: 10 Global Step: 17880 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:04:02,739-Speed 13841.46 samples/sec Loss 2.9918 LearningRate 0.0007 Epoch: 10 Global Step: 17890 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:04:20,408-Speed 13910.70 samples/sec Loss 2.9606 LearningRate 0.0007 Epoch: 10 Global Step: 17900 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:04:38,144-Speed 13857.71 samples/sec Loss 2.9632 LearningRate 0.0007 Epoch: 10 Global Step: 17910 Fp16 Grad 
Scale: 65536 Required: 26 hours Training: 2022-03-03 17:04:55,828-Speed 13898.36 samples/sec Loss 2.9838 LearningRate 0.0007 Epoch: 10 Global Step: 17920 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:05:13,569-Speed 13853.78 samples/sec Loss 2.9891 LearningRate 0.0007 Epoch: 10 Global Step: 17930 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:05:31,297-Speed 13863.45 samples/sec Loss 2.9870 LearningRate 0.0007 Epoch: 10 Global Step: 17940 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:05:49,033-Speed 13857.85 samples/sec Loss 2.9921 LearningRate 0.0007 Epoch: 10 Global Step: 17950 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:06:06,739-Speed 13881.34 samples/sec Loss 2.9800 LearningRate 0.0007 Epoch: 10 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 17:06:24,419-Speed 13900.64 samples/sec Loss 2.9428 LearningRate 0.0007 Epoch: 10 Global Step: 17970 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:06:42,131-Speed 13876.65 samples/sec Loss 2.9436 LearningRate 0.0007 Epoch: 10 Global Step: 17980 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:06:59,866-Speed 13858.48 samples/sec Loss 2.9832 LearningRate 0.0007 Epoch: 10 Global Step: 17990 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:07:17,590-Speed 13866.91 samples/sec Loss 2.9707 LearningRate 0.0007 Epoch: 10 Global Step: 18000 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:07:35,313-Speed 13867.65 samples/sec Loss 2.9794 LearningRate 0.0007 Epoch: 10 Global Step: 18010 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:07:53,013-Speed 13885.03 samples/sec Loss 2.9613 LearningRate 0.0007 Epoch: 10 Global Step: 18020 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:08:10,750-Speed 13857.42 samples/sec Loss 2.9603 LearningRate 0.0007 Epoch: 10 Global Step: 18030 Fp16 Grad Scale: 32768 Required: 26 
hours Training: 2022-03-03 17:08:28,489-Speed 13854.78 samples/sec Loss 2.9694 LearningRate 0.0007 Epoch: 10 Global Step: 18040 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:08:46,267-Speed 13824.98 samples/sec Loss 2.9843 LearningRate 0.0007 Epoch: 10 Global Step: 18050 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:09:03,998-Speed 13861.25 samples/sec Loss 3.0008 LearningRate 0.0007 Epoch: 10 Global Step: 18060 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:09:21,734-Speed 13857.49 samples/sec Loss 2.9594 LearningRate 0.0007 Epoch: 10 Global Step: 18070 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:09:39,420-Speed 13897.13 samples/sec Loss 2.9590 LearningRate 0.0007 Epoch: 10 Global Step: 18080 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:09:57,195-Speed 13826.98 samples/sec Loss 2.9538 LearningRate 0.0007 Epoch: 10 Global Step: 18090 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:10:14,944-Speed 13847.33 samples/sec Loss 2.9430 LearningRate 0.0007 Epoch: 10 Global Step: 18100 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:10:32,681-Speed 13856.31 samples/sec Loss 2.9225 LearningRate 0.0007 Epoch: 10 Global Step: 18110 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:10:50,414-Speed 13860.20 samples/sec Loss 2.9596 LearningRate 0.0007 Epoch: 10 Global Step: 18120 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:11:08,099-Speed 13897.61 samples/sec Loss 2.9412 LearningRate 0.0007 Epoch: 10 Global Step: 18130 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:11:25,825-Speed 13865.10 samples/sec Loss 2.9424 LearningRate 0.0007 Epoch: 10 Global Step: 18140 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:11:43,544-Speed 13870.71 samples/sec Loss 2.9851 LearningRate 0.0007 Epoch: 10 Global Step: 18150 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 
17:12:01,207-Speed 13914.64 samples/sec Loss 2.9633 LearningRate 0.0007 Epoch: 10 Global Step: 18160 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:12:18,881-Speed 13906.19 samples/sec Loss 2.9412 LearningRate 0.0007 Epoch: 10 Global Step: 18170 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:12:36,664-Speed 13821.50 samples/sec Loss 2.9291 LearningRate 0.0007 Epoch: 10 Global Step: 18180 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:12:54,390-Speed 13864.87 samples/sec Loss 2.9377 LearningRate 0.0007 Epoch: 10 Global Step: 18190 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:13:12,094-Speed 13882.42 samples/sec Loss 2.9382 LearningRate 0.0007 Epoch: 10 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 26 hours Training: 2022-03-03 17:13:29,811-Speed 13872.68 samples/sec Loss 2.9433 LearningRate 0.0007 Epoch: 10 Global Step: 18210 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:13:47,592-Speed 13824.31 samples/sec Loss 2.9568 LearningRate 0.0007 Epoch: 10 Global Step: 18220 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:14:05,400-Speed 13801.88 samples/sec Loss 2.9364 LearningRate 0.0007 Epoch: 10 Global Step: 18230 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:14:23,100-Speed 13886.03 samples/sec Loss 2.9350 LearningRate 0.0007 Epoch: 10 Global Step: 18240 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:14:40,860-Speed 13838.72 samples/sec Loss 2.9435 LearningRate 0.0007 Epoch: 10 Global Step: 18250 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:14:58,569-Speed 13878.00 samples/sec Loss 2.9455 LearningRate 0.0007 Epoch: 10 Global Step: 18260 Fp16 Grad Scale: 65536 Required: 26 hours Training: 2022-03-03 17:15:16,349-Speed 13823.25 samples/sec Loss 2.9215 LearningRate 0.0007 Epoch: 10 Global Step: 18270 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:15:34,042-Speed 13891.59 
samples/sec Loss 2.9200 LearningRate 0.0007 Epoch: 10 Global Step: 18280 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:15:51,775-Speed 13859.60 samples/sec Loss 2.9062 LearningRate 0.0007 Epoch: 10 Global Step: 18290 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:16:09,595-Speed 13791.85 samples/sec Loss 2.9442 LearningRate 0.0007 Epoch: 10 Global Step: 18300 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:16:27,317-Speed 13868.61 samples/sec Loss 2.9439 LearningRate 0.0007 Epoch: 10 Global Step: 18310 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:16:44,978-Speed 13916.52 samples/sec Loss 2.9262 LearningRate 0.0007 Epoch: 10 Global Step: 18320 Fp16 Grad Scale: 32768 Required: 26 hours Training: 2022-03-03 17:17:02,762-Speed 13819.30 samples/sec Loss 2.9511 LearningRate 0.0007 Epoch: 10 Global Step: 18330 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:17:20,435-Speed 13906.81 samples/sec Loss 2.9241 LearningRate 0.0007 Epoch: 10 Global Step: 18340 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:17:38,228-Speed 13813.52 samples/sec Loss 2.9336 LearningRate 0.0007 Epoch: 10 Global Step: 18350 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:17:55,898-Speed 13909.74 samples/sec Loss 2.9367 LearningRate 0.0007 Epoch: 10 Global Step: 18360 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:18:13,642-Speed 13852.27 samples/sec Loss 2.9256 LearningRate 0.0007 Epoch: 10 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:18:31,408-Speed 13834.37 samples/sec Loss 2.9560 LearningRate 0.0007 Epoch: 10 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:18:49,219-Speed 13798.62 samples/sec Loss 2.9589 LearningRate 0.0007 Epoch: 10 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:19:06,871-Speed 13923.31 samples/sec Loss 2.9056 
LearningRate 0.0007 Epoch: 10 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:19:24,621-Speed 13846.83 samples/sec Loss 2.9164 LearningRate 0.0007 Epoch: 10 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:19:42,339-Speed 13872.13 samples/sec Loss 2.9338 LearningRate 0.0007 Epoch: 10 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:20:00,052-Speed 13875.60 samples/sec Loss 2.9063 LearningRate 0.0007 Epoch: 10 Global Step: 18430 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:20:17,861-Speed 13799.94 samples/sec Loss 2.9225 LearningRate 0.0007 Epoch: 10 Global Step: 18440 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:20:35,594-Speed 13860.33 samples/sec Loss 2.9392 LearningRate 0.0007 Epoch: 10 Global Step: 18450 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:20:53,331-Speed 13856.54 samples/sec Loss 2.9149 LearningRate 0.0007 Epoch: 10 Global Step: 18460 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:21:11,044-Speed 13875.62 samples/sec Loss 2.9201 LearningRate 0.0007 Epoch: 10 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:21:28,788-Speed 13851.55 samples/sec Loss 2.9120 LearningRate 0.0007 Epoch: 10 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:21:46,523-Speed 13858.43 samples/sec Loss 2.8831 LearningRate 0.0007 Epoch: 10 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:22:04,241-Speed 13872.20 samples/sec Loss 2.9140 LearningRate 0.0007 Epoch: 10 Global Step: 18500 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:22:21,907-Speed 13911.79 samples/sec Loss 2.9709 LearningRate 0.0007 Epoch: 10 Global Step: 18510 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:22:39,642-Speed 13858.79 samples/sec Loss 2.9022 LearningRate 0.0007 Epoch: 
10 Global Step: 18520 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:22:57,320-Speed 13902.70 samples/sec Loss 2.8977 LearningRate 0.0007 Epoch: 10 Global Step: 18530 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:23:15,042-Speed 13868.81 samples/sec Loss 2.9414 LearningRate 0.0007 Epoch: 10 Global Step: 18540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:23:32,701-Speed 13917.83 samples/sec Loss 2.9015 LearningRate 0.0007 Epoch: 10 Global Step: 18550 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:23:50,456-Speed 13842.68 samples/sec Loss 2.9083 LearningRate 0.0007 Epoch: 10 Global Step: 18560 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:24:08,123-Speed 13911.81 samples/sec Loss 2.8600 LearningRate 0.0007 Epoch: 10 Global Step: 18570 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:24:25,813-Speed 13893.18 samples/sec Loss 2.9127 LearningRate 0.0007 Epoch: 10 Global Step: 18580 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:24:43,539-Speed 13865.72 samples/sec Loss 2.9270 LearningRate 0.0007 Epoch: 10 Global Step: 18590 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:25:01,285-Speed 13849.16 samples/sec Loss 2.9019 LearningRate 0.0007 Epoch: 10 Global Step: 18600 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:25:19,023-Speed 13856.00 samples/sec Loss 2.8862 LearningRate 0.0007 Epoch: 10 Global Step: 18610 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:25:36,754-Speed 13861.83 samples/sec Loss 2.9019 LearningRate 0.0007 Epoch: 10 Global Step: 18620 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:25:54,534-Speed 13822.76 samples/sec Loss 2.9174 LearningRate 0.0007 Epoch: 10 Global Step: 18630 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:26:12,244-Speed 13878.69 samples/sec Loss 2.8947 LearningRate 0.0007 Epoch: 10 Global Step: 18640 Fp16 Grad 
Scale: 32768 Required: 25 hours Training: 2022-03-03 17:26:30,022-Speed 13825.03 samples/sec Loss 2.9127 LearningRate 0.0007 Epoch: 10 Global Step: 18650 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:26:47,703-Speed 13900.80 samples/sec Loss 2.8929 LearningRate 0.0007 Epoch: 10 Global Step: 18660 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:27:05,482-Speed 13823.25 samples/sec Loss 2.8855 LearningRate 0.0007 Epoch: 10 Global Step: 18670 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:27:23,378-Speed 13733.68 samples/sec Loss 2.9057 LearningRate 0.0007 Epoch: 10 Global Step: 18680 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:27:41,131-Speed 13844.19 samples/sec Loss 2.9152 LearningRate 0.0007 Epoch: 10 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:27:59,056-Speed 13711.73 samples/sec Loss 2.9067 LearningRate 0.0007 Epoch: 10 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:28:16,771-Speed 13873.89 samples/sec Loss 2.8696 LearningRate 0.0007 Epoch: 10 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:28:34,514-Speed 13851.69 samples/sec Loss 2.8995 LearningRate 0.0007 Epoch: 10 Global Step: 18720 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:28:52,257-Speed 13852.25 samples/sec Loss 2.8950 LearningRate 0.0007 Epoch: 10 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:29:09,984-Speed 13864.71 samples/sec Loss 2.8833 LearningRate 0.0007 Epoch: 10 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:29:27,809-Speed 13788.67 samples/sec Loss 2.8588 LearningRate 0.0007 Epoch: 10 Global Step: 18750 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:29:45,554-Speed 13850.31 samples/sec Loss 2.8986 LearningRate 0.0007 Epoch: 10 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 25 
hours Training: 2022-03-03 17:30:03,318-Speed 13835.50 samples/sec Loss 2.8909 LearningRate 0.0007 Epoch: 10 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:30:21,019-Speed 13885.26 samples/sec Loss 2.8910 LearningRate 0.0007 Epoch: 10 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:30:38,762-Speed 13851.64 samples/sec Loss 2.8805 LearningRate 0.0007 Epoch: 10 Global Step: 18790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:30:56,583-Speed 13791.33 samples/sec Loss 2.8653 LearningRate 0.0007 Epoch: 10 Global Step: 18800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:31:14,320-Speed 13856.98 samples/sec Loss 2.8673 LearningRate 0.0007 Epoch: 10 Global Step: 18810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:31:32,032-Speed 13876.20 samples/sec Loss 2.8941 LearningRate 0.0007 Epoch: 10 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:31:49,735-Speed 13883.60 samples/sec Loss 2.8790 LearningRate 0.0007 Epoch: 10 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:32:07,499-Speed 13835.32 samples/sec Loss 2.8857 LearningRate 0.0007 Epoch: 10 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:32:25,187-Speed 13895.39 samples/sec Loss 2.8660 LearningRate 0.0007 Epoch: 10 Global Step: 18850 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:32:42,900-Speed 13875.46 samples/sec Loss 2.8685 LearningRate 0.0007 Epoch: 10 Global Step: 18860 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:33:00,703-Speed 13804.93 samples/sec Loss 2.8745 LearningRate 0.0007 Epoch: 10 Global Step: 18870 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:33:18,434-Speed 13861.45 samples/sec Loss 2.9025 LearningRate 0.0007 Epoch: 10 Global Step: 18880 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 
17:33:36,115-Speed 13900.95 samples/sec Loss 2.8978 LearningRate 0.0007 Epoch: 10 Global Step: 18890 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:33:53,835-Speed 13869.65 samples/sec Loss 2.8795 LearningRate 0.0007 Epoch: 10 Global Step: 18900 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:34:11,553-Speed 13871.95 samples/sec Loss 2.8629 LearningRate 0.0007 Epoch: 10 Global Step: 18910 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:34:29,311-Speed 13840.21 samples/sec Loss 2.8570 LearningRate 0.0007 Epoch: 10 Global Step: 18920 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:34:47,069-Speed 13840.70 samples/sec Loss 2.8448 LearningRate 0.0007 Epoch: 10 Global Step: 18930 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:35:04,895-Speed 13787.37 samples/sec Loss 2.8665 LearningRate 0.0007 Epoch: 10 Global Step: 18940 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:35:22,659-Speed 13835.85 samples/sec Loss 2.9062 LearningRate 0.0007 Epoch: 10 Global Step: 18950 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:35:40,396-Speed 13856.65 samples/sec Loss 2.8729 LearningRate 0.0007 Epoch: 10 Global Step: 18960 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:35:58,092-Speed 13888.42 samples/sec Loss 2.8947 LearningRate 0.0006 Epoch: 10 Global Step: 18970 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:36:15,869-Speed 13824.92 samples/sec Loss 2.8805 LearningRate 0.0006 Epoch: 10 Global Step: 18980 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:36:33,681-Speed 13801.11 samples/sec Loss 2.8711 LearningRate 0.0006 Epoch: 10 Global Step: 18990 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:36:51,381-Speed 13885.97 samples/sec Loss 2.8687 LearningRate 0.0006 Epoch: 10 Global Step: 19000 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:37:09,151-Speed 13830.30 
samples/sec Loss 2.9261 LearningRate 0.0006 Epoch: 10 Global Step: 19010 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:38:16,686-Speed 3639.08 samples/sec Loss 2.8415 LearningRate 0.0006 Epoch: 11 Global Step: 19020 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:38:34,410-Speed 13866.44 samples/sec Loss 2.8228 LearningRate 0.0006 Epoch: 11 Global Step: 19030 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:38:52,075-Speed 13913.90 samples/sec Loss 2.8350 LearningRate 0.0006 Epoch: 11 Global Step: 19040 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:39:09,808-Speed 13859.10 samples/sec Loss 2.8362 LearningRate 0.0006 Epoch: 11 Global Step: 19050 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:39:27,557-Speed 13847.29 samples/sec Loss 2.8287 LearningRate 0.0006 Epoch: 11 Global Step: 19060 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:39:45,285-Speed 13863.93 samples/sec Loss 2.8569 LearningRate 0.0006 Epoch: 11 Global Step: 19070 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:40:03,003-Speed 13871.76 samples/sec Loss 2.8640 LearningRate 0.0006 Epoch: 11 Global Step: 19080 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:40:20,715-Speed 13876.32 samples/sec Loss 2.8705 LearningRate 0.0006 Epoch: 11 Global Step: 19090 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:40:38,449-Speed 13858.42 samples/sec Loss 2.8469 LearningRate 0.0006 Epoch: 11 Global Step: 19100 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:40:56,120-Speed 13908.93 samples/sec Loss 2.8204 LearningRate 0.0006 Epoch: 11 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:41:13,886-Speed 13833.85 samples/sec Loss 2.8335 LearningRate 0.0006 Epoch: 11 Global Step: 19120 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:41:31,673-Speed 13818.04 samples/sec Loss 2.8485 
LearningRate 0.0006 Epoch: 11 Global Step: 19130 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:41:49,405-Speed 13859.93 samples/sec Loss 2.8360 LearningRate 0.0006 Epoch: 11 Global Step: 19140 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:42:07,145-Speed 13854.97 samples/sec Loss 2.8165 LearningRate 0.0006 Epoch: 11 Global Step: 19150 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:42:24,902-Speed 13840.64 samples/sec Loss 2.8370 LearningRate 0.0006 Epoch: 11 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:42:42,594-Speed 13892.58 samples/sec Loss 2.8456 LearningRate 0.0006 Epoch: 11 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:43:00,388-Speed 13811.59 samples/sec Loss 2.8441 LearningRate 0.0006 Epoch: 11 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:43:18,176-Speed 13817.91 samples/sec Loss 2.8324 LearningRate 0.0006 Epoch: 11 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:43:35,964-Speed 13816.79 samples/sec Loss 2.8490 LearningRate 0.0006 Epoch: 11 Global Step: 19200 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:43:53,739-Speed 13826.64 samples/sec Loss 2.8389 LearningRate 0.0006 Epoch: 11 Global Step: 19210 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:44:11,507-Speed 13833.08 samples/sec Loss 2.8238 LearningRate 0.0006 Epoch: 11 Global Step: 19220 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:44:29,217-Speed 13877.75 samples/sec Loss 2.8301 LearningRate 0.0006 Epoch: 11 Global Step: 19230 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:44:46,958-Speed 13853.90 samples/sec Loss 2.8158 LearningRate 0.0006 Epoch: 11 Global Step: 19240 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:45:04,664-Speed 13880.84 samples/sec Loss 2.8434 LearningRate 0.0006 Epoch: 
11 Global Step: 19250 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:45:22,409-Speed 13850.88 samples/sec Loss 2.8381 LearningRate 0.0006 Epoch: 11 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:45:40,166-Speed 13840.66 samples/sec Loss 2.8388 LearningRate 0.0006 Epoch: 11 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:45:57,937-Speed 13830.57 samples/sec Loss 2.8211 LearningRate 0.0006 Epoch: 11 Global Step: 19280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:46:15,702-Speed 13834.99 samples/sec Loss 2.8174 LearningRate 0.0006 Epoch: 11 Global Step: 19290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:46:33,444-Speed 13852.18 samples/sec Loss 2.8027 LearningRate 0.0006 Epoch: 11 Global Step: 19300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:46:51,134-Speed 13893.61 samples/sec Loss 2.8403 LearningRate 0.0006 Epoch: 11 Global Step: 19310 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:47:08,844-Speed 13878.15 samples/sec Loss 2.8605 LearningRate 0.0006 Epoch: 11 Global Step: 19320 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:47:26,596-Speed 13844.59 samples/sec Loss 2.8397 LearningRate 0.0006 Epoch: 11 Global Step: 19330 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:47:44,314-Speed 13871.75 samples/sec Loss 2.8112 LearningRate 0.0006 Epoch: 11 Global Step: 19340 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:48:02,076-Speed 13836.87 samples/sec Loss 2.8186 LearningRate 0.0006 Epoch: 11 Global Step: 19350 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:48:19,884-Speed 13801.94 samples/sec Loss 2.8019 LearningRate 0.0006 Epoch: 11 Global Step: 19360 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:48:37,603-Speed 13871.35 samples/sec Loss 2.8288 LearningRate 0.0006 Epoch: 11 Global Step: 19370 Fp16 
Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:48:55,410-Speed 13801.59 samples/sec Loss 2.8139 LearningRate 0.0006 Epoch: 11 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:49:13,107-Speed 13888.30 samples/sec Loss 2.8661 LearningRate 0.0006 Epoch: 11 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:49:30,870-Speed 13836.57 samples/sec Loss 2.8414 LearningRate 0.0006 Epoch: 11 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:49:48,828-Speed 13686.39 samples/sec Loss 2.8084 LearningRate 0.0006 Epoch: 11 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:50:06,633-Speed 13803.22 samples/sec Loss 2.8131 LearningRate 0.0006 Epoch: 11 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:50:24,389-Speed 13841.56 samples/sec Loss 2.7864 LearningRate 0.0006 Epoch: 11 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 25 hours Training: 2022-03-03 17:50:42,115-Speed 13865.56 samples/sec Loss 2.8030 LearningRate 0.0006 Epoch: 11 Global Step: 19440 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:50:59,963-Speed 13770.85 samples/sec Loss 2.7975 LearningRate 0.0006 Epoch: 11 Global Step: 19450 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:51:17,773-Speed 13800.31 samples/sec Loss 2.8566 LearningRate 0.0006 Epoch: 11 Global Step: 19460 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:51:35,579-Speed 13802.43 samples/sec Loss 2.8270 LearningRate 0.0006 Epoch: 11 Global Step: 19470 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:51:53,446-Speed 13755.76 samples/sec Loss 2.8331 LearningRate 0.0006 Epoch: 11 Global Step: 19480 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:52:11,240-Speed 13812.66 samples/sec Loss 2.8059 LearningRate 0.0006 Epoch: 11 Global Step: 19490 Fp16 Grad Scale: 32768 
Required: 25 hours Training: 2022-03-03 17:52:29,023-Speed 13820.96 samples/sec Loss 2.7922 LearningRate 0.0006 Epoch: 11 Global Step: 19500 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:52:46,714-Speed 13892.08 samples/sec Loss 2.8314 LearningRate 0.0006 Epoch: 11 Global Step: 19510 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:53:04,493-Speed 13824.07 samples/sec Loss 2.8080 LearningRate 0.0006 Epoch: 11 Global Step: 19520 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:53:22,219-Speed 13865.28 samples/sec Loss 2.8116 LearningRate 0.0006 Epoch: 11 Global Step: 19530 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:53:39,967-Speed 13848.60 samples/sec Loss 2.8082 LearningRate 0.0006 Epoch: 11 Global Step: 19540 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 17:53:57,757-Speed 13815.01 samples/sec Loss 2.7787 LearningRate 0.0006 Epoch: 11 Global Step: 19550 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:54:15,568-Speed 13798.94 samples/sec Loss 2.8672 LearningRate 0.0006 Epoch: 11 Global Step: 19560 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:54:33,288-Speed 13870.87 samples/sec Loss 2.8075 LearningRate 0.0006 Epoch: 11 Global Step: 19570 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:54:51,112-Speed 13789.11 samples/sec Loss 2.8123 LearningRate 0.0006 Epoch: 11 Global Step: 19580 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:55:08,817-Speed 13881.81 samples/sec Loss 2.7881 LearningRate 0.0006 Epoch: 11 Global Step: 19590 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:55:26,491-Speed 13905.68 samples/sec Loss 2.7797 LearningRate 0.0006 Epoch: 11 Global Step: 19600 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:55:44,261-Speed 13831.25 samples/sec Loss 2.7838 LearningRate 0.0006 Epoch: 11 Global Step: 19610 Fp16 Grad Scale: 16384 Required: 25 hours Training: 
2022-03-03 17:56:02,057-Speed 13810.92 samples/sec Loss 2.7871 LearningRate 0.0006 Epoch: 11 Global Step: 19620 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:56:19,841-Speed 13819.95 samples/sec Loss 2.7865 LearningRate 0.0006 Epoch: 11 Global Step: 19630 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:56:37,548-Speed 13880.11 samples/sec Loss 2.7960 LearningRate 0.0006 Epoch: 11 Global Step: 19640 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:56:55,357-Speed 13800.78 samples/sec Loss 2.8076 LearningRate 0.0006 Epoch: 11 Global Step: 19650 Fp16 Grad Scale: 16384 Required: 25 hours Training: 2022-03-03 17:57:13,105-Speed 13848.03 samples/sec Loss 2.7937 LearningRate 0.0006 Epoch: 11 Global Step: 19660 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:57:30,854-Speed 13847.88 samples/sec Loss 2.8104 LearningRate 0.0006 Epoch: 11 Global Step: 19670 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:57:48,651-Speed 13809.23 samples/sec Loss 2.7975 LearningRate 0.0006 Epoch: 11 Global Step: 19680 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:58:06,376-Speed 13866.48 samples/sec Loss 2.7990 LearningRate 0.0006 Epoch: 11 Global Step: 19690 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:58:24,126-Speed 13846.46 samples/sec Loss 2.7975 LearningRate 0.0006 Epoch: 11 Global Step: 19700 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:58:41,845-Speed 13871.27 samples/sec Loss 2.7868 LearningRate 0.0006 Epoch: 11 Global Step: 19710 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:58:59,603-Speed 13839.86 samples/sec Loss 2.7995 LearningRate 0.0006 Epoch: 11 Global Step: 19720 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:59:17,317-Speed 13875.44 samples/sec Loss 2.7800 LearningRate 0.0006 Epoch: 11 Global Step: 19730 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:59:35,055-Speed 
13857.10 samples/sec Loss 2.7885 LearningRate 0.0006 Epoch: 11 Global Step: 19740 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 17:59:52,853-Speed 13809.47 samples/sec Loss 2.7934 LearningRate 0.0006 Epoch: 11 Global Step: 19750 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:00:10,581-Speed 13863.21 samples/sec Loss 2.8118 LearningRate 0.0006 Epoch: 11 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:00:28,393-Speed 13799.62 samples/sec Loss 2.7937 LearningRate 0.0006 Epoch: 11 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:00:46,133-Speed 13853.99 samples/sec Loss 2.8024 LearningRate 0.0006 Epoch: 11 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:01:03,920-Speed 13817.76 samples/sec Loss 2.7590 LearningRate 0.0006 Epoch: 11 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:01:21,650-Speed 13862.07 samples/sec Loss 2.7655 LearningRate 0.0006 Epoch: 11 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:01:39,439-Speed 13816.53 samples/sec Loss 2.7730 LearningRate 0.0006 Epoch: 11 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:01:57,250-Speed 13798.73 samples/sec Loss 2.7767 LearningRate 0.0006 Epoch: 11 Global Step: 19820 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:02:15,015-Speed 13835.26 samples/sec Loss 2.7687 LearningRate 0.0006 Epoch: 11 Global Step: 19830 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:02:32,731-Speed 13873.68 samples/sec Loss 2.8150 LearningRate 0.0006 Epoch: 11 Global Step: 19840 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:02:50,542-Speed 13798.70 samples/sec Loss 2.7821 LearningRate 0.0006 Epoch: 11 Global Step: 19850 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:03:08,242-Speed 13886.06 samples/sec Loss 
2.7742 LearningRate 0.0006 Epoch: 11 Global Step: 19860 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:03:26,038-Speed 13810.69 samples/sec Loss 2.7539 LearningRate 0.0006 Epoch: 11 Global Step: 19870 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:03:43,798-Speed 13838.17 samples/sec Loss 2.7704 LearningRate 0.0006 Epoch: 11 Global Step: 19880 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:04:01,578-Speed 13823.59 samples/sec Loss 2.7360 LearningRate 0.0006 Epoch: 11 Global Step: 19890 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:04:19,372-Speed 13813.31 samples/sec Loss 2.7793 LearningRate 0.0006 Epoch: 11 Global Step: 19900 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:04:37,105-Speed 13860.64 samples/sec Loss 2.7522 LearningRate 0.0006 Epoch: 11 Global Step: 19910 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:04:54,842-Speed 13856.98 samples/sec Loss 2.7434 LearningRate 0.0006 Epoch: 11 Global Step: 19920 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:05:12,600-Speed 13839.76 samples/sec Loss 2.7476 LearningRate 0.0006 Epoch: 11 Global Step: 19930 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:05:30,410-Speed 13799.90 samples/sec Loss 2.7787 LearningRate 0.0006 Epoch: 11 Global Step: 19940 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:05:48,190-Speed 13823.14 samples/sec Loss 2.7691 LearningRate 0.0006 Epoch: 11 Global Step: 19950 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:06:05,988-Speed 13809.28 samples/sec Loss 2.7506 LearningRate 0.0006 Epoch: 11 Global Step: 19960 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:06:23,744-Speed 13842.74 samples/sec Loss 2.7435 LearningRate 0.0006 Epoch: 11 Global Step: 19970 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:06:41,511-Speed 13833.17 samples/sec Loss 2.7602 LearningRate 0.0006 
Epoch: 11 Global Step: 19980 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:06:59,273-Speed 13837.47 samples/sec Loss 2.7717 LearningRate 0.0006 Epoch: 11 Global Step: 19990 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:07:16,983-Speed 13877.52 samples/sec Loss 2.7690 LearningRate 0.0006 Epoch: 11 Global Step: 20000 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:07:34,740-Speed 13841.28 samples/sec Loss 2.7511 LearningRate 0.0006 Epoch: 11 Global Step: 20010 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:07:52,465-Speed 13865.61 samples/sec Loss 2.7601 LearningRate 0.0006 Epoch: 11 Global Step: 20020 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:08:10,207-Speed 13853.60 samples/sec Loss 2.7730 LearningRate 0.0006 Epoch: 11 Global Step: 20030 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:08:28,020-Speed 13796.87 samples/sec Loss 2.7746 LearningRate 0.0006 Epoch: 11 Global Step: 20040 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:08:45,713-Speed 13891.22 samples/sec Loss 2.7688 LearningRate 0.0006 Epoch: 11 Global Step: 20050 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:09:03,410-Speed 13888.45 samples/sec Loss 2.7719 LearningRate 0.0006 Epoch: 11 Global Step: 20060 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:09:21,183-Speed 13828.31 samples/sec Loss 2.7577 LearningRate 0.0006 Epoch: 11 Global Step: 20070 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:09:38,926-Speed 13852.24 samples/sec Loss 2.7271 LearningRate 0.0006 Epoch: 11 Global Step: 20080 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:09:56,646-Speed 13869.58 samples/sec Loss 2.7474 LearningRate 0.0006 Epoch: 11 Global Step: 20090 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:10:14,381-Speed 13858.08 samples/sec Loss 2.7858 LearningRate 0.0006 Epoch: 11 Global Step: 20100 
Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:10:32,127-Speed 13849.60 samples/sec Loss 2.7708 LearningRate 0.0006 Epoch: 11 Global Step: 20110 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:10:49,924-Speed 13810.93 samples/sec Loss 2.7497 LearningRate 0.0006 Epoch: 11 Global Step: 20120 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:11:07,748-Speed 13789.48 samples/sec Loss 2.7335 LearningRate 0.0006 Epoch: 11 Global Step: 20130 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:11:25,539-Speed 13813.83 samples/sec Loss 2.7349 LearningRate 0.0006 Epoch: 11 Global Step: 20140 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:11:43,277-Speed 13856.37 samples/sec Loss 2.7596 LearningRate 0.0006 Epoch: 11 Global Step: 20150 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:12:01,035-Speed 13840.87 samples/sec Loss 2.7401 LearningRate 0.0006 Epoch: 11 Global Step: 20160 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:12:18,757-Speed 13868.99 samples/sec Loss 2.7690 LearningRate 0.0006 Epoch: 11 Global Step: 20170 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:12:36,513-Speed 13841.41 samples/sec Loss 2.7537 LearningRate 0.0006 Epoch: 11 Global Step: 20180 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:12:54,244-Speed 13861.10 samples/sec Loss 2.7221 LearningRate 0.0006 Epoch: 11 Global Step: 20190 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:13:11,967-Speed 13867.60 samples/sec Loss 2.7213 LearningRate 0.0006 Epoch: 11 Global Step: 20200 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:13:29,694-Speed 13864.92 samples/sec Loss 2.7434 LearningRate 0.0006 Epoch: 11 Global Step: 20210 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:13:47,466-Speed 13829.25 samples/sec Loss 2.7482 LearningRate 0.0006 Epoch: 11 Global Step: 20220 Fp16 Grad Scale: 32768 
Required: 25 hours Training: 2022-03-03 18:14:05,271-Speed 13804.47 samples/sec Loss 2.7194 LearningRate 0.0006 Epoch: 11 Global Step: 20230 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:14:23,010-Speed 13855.02 samples/sec Loss 2.7111 LearningRate 0.0006 Epoch: 11 Global Step: 20240 Fp16 Grad Scale: 32768 Required: 25 hours Training: 2022-03-03 18:14:40,805-Speed 13811.90 samples/sec Loss 2.7411 LearningRate 0.0006 Epoch: 11 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:14:58,563-Speed 13840.77 samples/sec Loss 2.7304 LearningRate 0.0006 Epoch: 11 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:15:16,279-Speed 13873.14 samples/sec Loss 2.7558 LearningRate 0.0006 Epoch: 11 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:15:33,997-Speed 13870.87 samples/sec Loss 2.7253 LearningRate 0.0006 Epoch: 11 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:15:51,796-Speed 13809.29 samples/sec Loss 2.7045 LearningRate 0.0006 Epoch: 11 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:16:09,574-Speed 13824.74 samples/sec Loss 2.7534 LearningRate 0.0006 Epoch: 11 Global Step: 20300 Fp16 Grad Scale: 65536 Required: 25 hours Training: 2022-03-03 18:16:27,290-Speed 13872.77 samples/sec Loss 2.7500 LearningRate 0.0006 Epoch: 11 Global Step: 20310 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:16:45,000-Speed 13877.74 samples/sec Loss 2.7371 LearningRate 0.0006 Epoch: 11 Global Step: 20320 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:17:02,802-Speed 13806.47 samples/sec Loss 2.7284 LearningRate 0.0006 Epoch: 11 Global Step: 20330 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:17:20,491-Speed 13893.94 samples/sec Loss 2.7494 LearningRate 0.0006 Epoch: 11 Global Step: 20340 Fp16 Grad Scale: 65536 Required: 24 hours Training: 
2022-03-03 18:17:38,416-Speed 13711.01 samples/sec Loss 2.7420 LearningRate 0.0006 Epoch: 11 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:17:56,147-Speed 13861.68 samples/sec Loss 2.7166 LearningRate 0.0006 Epoch: 11 Global Step: 20360 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:18:13,921-Speed 13828.05 samples/sec Loss 2.7194 LearningRate 0.0006 Epoch: 11 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:18:31,613-Speed 13891.76 samples/sec Loss 2.7078 LearningRate 0.0006 Epoch: 11 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:18:49,376-Speed 13836.79 samples/sec Loss 2.7135 LearningRate 0.0006 Epoch: 11 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:19:07,096-Speed 13869.43 samples/sec Loss 2.7356 LearningRate 0.0006 Epoch: 11 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:19:24,741-Speed 13929.09 samples/sec Loss 2.7000 LearningRate 0.0006 Epoch: 11 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:19:42,441-Speed 13886.29 samples/sec Loss 2.7289 LearningRate 0.0006 Epoch: 11 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:20:00,178-Speed 13856.14 samples/sec Loss 2.7252 LearningRate 0.0006 Epoch: 11 Global Step: 20430 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:20:17,940-Speed 13837.43 samples/sec Loss 2.7251 LearningRate 0.0006 Epoch: 11 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:20:35,615-Speed 13904.98 samples/sec Loss 2.6987 LearningRate 0.0006 Epoch: 11 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:20:53,336-Speed 13869.33 samples/sec Loss 2.7036 LearningRate 0.0006 Epoch: 11 Global Step: 20460 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:21:10,980-Speed 
13930.01 samples/sec Loss 2.7054 LearningRate 0.0006 Epoch: 11 Global Step: 20470 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:21:28,720-Speed 13853.80 samples/sec Loss 2.7879 LearningRate 0.0006 Epoch: 11 Global Step: 20480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:21:46,419-Speed 13886.88 samples/sec Loss 2.7170 LearningRate 0.0006 Epoch: 11 Global Step: 20490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:22:04,125-Speed 13880.75 samples/sec Loss 2.7283 LearningRate 0.0006 Epoch: 11 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:22:21,832-Speed 13880.19 samples/sec Loss 2.6926 LearningRate 0.0006 Epoch: 11 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:22:39,606-Speed 13828.01 samples/sec Loss 2.7066 LearningRate 0.0006 Epoch: 11 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:22:57,240-Speed 13936.86 samples/sec Loss 2.7441 LearningRate 0.0006 Epoch: 11 Global Step: 20530 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:23:15,023-Speed 13821.39 samples/sec Loss 2.7496 LearningRate 0.0006 Epoch: 11 Global Step: 20540 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:23:32,790-Speed 13833.45 samples/sec Loss 2.7403 LearningRate 0.0006 Epoch: 11 Global Step: 20550 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:23:50,462-Speed 13907.66 samples/sec Loss 2.7216 LearningRate 0.0006 Epoch: 11 Global Step: 20560 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:24:08,171-Speed 13878.26 samples/sec Loss 2.6997 LearningRate 0.0006 Epoch: 11 Global Step: 20570 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:24:25,865-Speed 13890.93 samples/sec Loss 2.7082 LearningRate 0.0006 Epoch: 11 Global Step: 20580 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:24:43,564-Speed 13886.67 samples/sec Loss 
2.6991 LearningRate 0.0006 Epoch: 11 Global Step: 20590 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:25:01,325-Speed 13837.56 samples/sec Loss 2.7135 LearningRate 0.0006 Epoch: 11 Global Step: 20600 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:25:19,086-Speed 13837.78 samples/sec Loss 2.7103 LearningRate 0.0006 Epoch: 11 Global Step: 20610 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:25:36,867-Speed 13823.06 samples/sec Loss 2.6963 LearningRate 0.0006 Epoch: 11 Global Step: 20620 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:25:54,528-Speed 13916.03 samples/sec Loss 2.6825 LearningRate 0.0006 Epoch: 11 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:26:12,273-Speed 13850.66 samples/sec Loss 2.7118 LearningRate 0.0006 Epoch: 11 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:26:30,061-Speed 13816.43 samples/sec Loss 2.7187 LearningRate 0.0006 Epoch: 11 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:26:47,797-Speed 13857.40 samples/sec Loss 2.7164 LearningRate 0.0006 Epoch: 11 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:27:05,566-Speed 13832.07 samples/sec Loss 2.7413 LearningRate 0.0006 Epoch: 11 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:27:23,330-Speed 13836.12 samples/sec Loss 2.7369 LearningRate 0.0006 Epoch: 11 Global Step: 20680 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:27:41,098-Speed 13832.39 samples/sec Loss 2.6919 LearningRate 0.0006 Epoch: 11 Global Step: 20690 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:27:58,845-Speed 13848.82 samples/sec Loss 2.6821 LearningRate 0.0006 Epoch: 11 Global Step: 20700 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:28:16,507-Speed 13916.78 samples/sec Loss 2.7268 LearningRate 0.0006 
Epoch: 11 Global Step: 20710 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:28:34,251-Speed 13851.02 samples/sec Loss 2.7293 LearningRate 0.0006 Epoch: 11 Global Step: 20720 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:28:51,926-Speed 13905.45 samples/sec Loss 2.7159 LearningRate 0.0006 Epoch: 11 Global Step: 20730 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:29:09,635-Speed 13878.55 samples/sec Loss 2.7063 LearningRate 0.0006 Epoch: 11 Global Step: 20740 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:30:18,552-Speed 3566.08 samples/sec Loss 2.6495 LearningRate 0.0006 Epoch: 12 Global Step: 20750 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:30:36,189-Speed 13935.70 samples/sec Loss 2.6696 LearningRate 0.0006 Epoch: 12 Global Step: 20760 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:30:53,892-Speed 13882.93 samples/sec Loss 2.6570 LearningRate 0.0006 Epoch: 12 Global Step: 20770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:31:11,622-Speed 13863.12 samples/sec Loss 2.6313 LearningRate 0.0006 Epoch: 12 Global Step: 20780 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:31:29,345-Speed 13867.00 samples/sec Loss 2.6914 LearningRate 0.0006 Epoch: 12 Global Step: 20790 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:31:47,098-Speed 13844.73 samples/sec Loss 2.6799 LearningRate 0.0006 Epoch: 12 Global Step: 20800 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:32:04,950-Speed 13767.69 samples/sec Loss 2.6562 LearningRate 0.0006 Epoch: 12 Global Step: 20810 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:32:22,811-Speed 13760.22 samples/sec Loss 2.6710 LearningRate 0.0006 Epoch: 12 Global Step: 20820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:32:40,693-Speed 13744.08 samples/sec Loss 2.6578 LearningRate 0.0006 Epoch: 12 Global Step: 20830 
Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:32:58,525-Speed 13783.43 samples/sec Loss 2.6547 LearningRate 0.0006 Epoch: 12 Global Step: 20840 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:33:16,323-Speed 13808.71 samples/sec Loss 2.6596 LearningRate 0.0006 Epoch: 12 Global Step: 20850 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:33:34,128-Speed 13803.49 samples/sec Loss 2.6572 LearningRate 0.0006 Epoch: 12 Global Step: 20860 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:33:51,865-Speed 13857.03 samples/sec Loss 2.7075 LearningRate 0.0006 Epoch: 12 Global Step: 20870 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:34:09,621-Speed 13841.95 samples/sec Loss 2.6665 LearningRate 0.0006 Epoch: 12 Global Step: 20880 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:34:27,516-Speed 13734.75 samples/sec Loss 2.6556 LearningRate 0.0006 Epoch: 12 Global Step: 20890 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:34:45,623-Speed 13572.85 samples/sec Loss 2.6553 LearningRate 0.0006 Epoch: 12 Global Step: 20900 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:35:03,616-Speed 13660.10 samples/sec Loss 2.6789 LearningRate 0.0006 Epoch: 12 Global Step: 20910 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:35:21,281-Speed 13913.06 samples/sec Loss 2.6688 LearningRate 0.0006 Epoch: 12 Global Step: 20920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:35:39,040-Speed 13839.43 samples/sec Loss 2.6735 LearningRate 0.0006 Epoch: 12 Global Step: 20930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:35:56,811-Speed 13831.19 samples/sec Loss 2.6669 LearningRate 0.0006 Epoch: 12 Global Step: 20940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:36:14,523-Speed 13876.46 samples/sec Loss 2.7000 LearningRate 0.0006 Epoch: 12 Global Step: 20950 Fp16 Grad Scale: 65536 
Required: 24 hours Training: 2022-03-03 18:36:32,245-Speed 13868.65 samples/sec Loss 2.6788 LearningRate 0.0006 Epoch: 12 Global Step: 20960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:36:50,079-Speed 13781.11 samples/sec Loss 2.6642 LearningRate 0.0006 Epoch: 12 Global Step: 20970 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:37:07,777-Speed 13887.41 samples/sec Loss 2.6388 LearningRate 0.0006 Epoch: 12 Global Step: 20980 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:37:25,561-Speed 13819.49 samples/sec Loss 2.6581 LearningRate 0.0006 Epoch: 12 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:37:43,373-Speed 13798.94 samples/sec Loss 2.7295 LearningRate 0.0006 Epoch: 12 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:38:01,296-Speed 13713.08 samples/sec Loss 2.6646 LearningRate 0.0006 Epoch: 12 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:38:19,273-Speed 13671.94 samples/sec Loss 2.6514 LearningRate 0.0006 Epoch: 12 Global Step: 21020 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:38:37,214-Speed 13698.73 samples/sec Loss 2.6796 LearningRate 0.0006 Epoch: 12 Global Step: 21030 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:38:55,029-Speed 13796.74 samples/sec Loss 2.6711 LearningRate 0.0006 Epoch: 12 Global Step: 21040 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:39:12,879-Speed 13769.06 samples/sec Loss 2.6644 LearningRate 0.0006 Epoch: 12 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:39:30,696-Speed 13794.30 samples/sec Loss 2.6436 LearningRate 0.0006 Epoch: 12 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:39:48,573-Speed 13748.35 samples/sec Loss 2.6808 LearningRate 0.0006 Epoch: 12 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 24 hours Training: 
2022-03-03 18:40:06,499-Speed 13710.63 samples/sec Loss 2.6878 LearningRate 0.0006 Epoch: 12 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:40:24,307-Speed 13800.98 samples/sec Loss 2.6557 LearningRate 0.0006 Epoch: 12 Global Step: 21090 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:40:42,150-Speed 13774.54 samples/sec Loss 2.6498 LearningRate 0.0006 Epoch: 12 Global Step: 21100 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:40:59,932-Speed 13822.09 samples/sec Loss 2.6466 LearningRate 0.0006 Epoch: 12 Global Step: 21110 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:41:17,633-Speed 13884.42 samples/sec Loss 2.6515 LearningRate 0.0006 Epoch: 12 Global Step: 21120 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:41:35,308-Speed 13905.77 samples/sec Loss 2.6477 LearningRate 0.0006 Epoch: 12 Global Step: 21130 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:41:52,966-Speed 13918.15 samples/sec Loss 2.6613 LearningRate 0.0006 Epoch: 12 Global Step: 21140 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:42:10,766-Speed 13807.66 samples/sec Loss 2.6623 LearningRate 0.0006 Epoch: 12 Global Step: 21150 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:42:28,588-Speed 13790.17 samples/sec Loss 2.6835 LearningRate 0.0006 Epoch: 12 Global Step: 21160 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:42:46,336-Speed 13848.04 samples/sec Loss 2.6747 LearningRate 0.0006 Epoch: 12 Global Step: 21170 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:43:04,191-Speed 13765.21 samples/sec Loss 2.6688 LearningRate 0.0006 Epoch: 12 Global Step: 21180 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:43:21,992-Speed 13807.00 samples/sec Loss 2.6702 LearningRate 0.0006 Epoch: 12 Global Step: 21190 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:43:39,835-Speed 
13774.80 samples/sec Loss 2.6529 LearningRate 0.0006 Epoch: 12 Global Step: 21200 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:43:57,640-Speed 13803.60 samples/sec Loss 2.6659 LearningRate 0.0006 Epoch: 12 Global Step: 21210 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:44:15,524-Speed 13742.51 samples/sec Loss 2.6458 LearningRate 0.0006 Epoch: 12 Global Step: 21220 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:44:33,342-Speed 13793.86 samples/sec Loss 2.6731 LearningRate 0.0006 Epoch: 12 Global Step: 21230 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:44:51,169-Speed 13786.70 samples/sec Loss 2.6539 LearningRate 0.0006 Epoch: 12 Global Step: 21240 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:45:08,988-Speed 13792.51 samples/sec Loss 2.6473 LearningRate 0.0006 Epoch: 12 Global Step: 21250 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:45:26,815-Speed 13787.43 samples/sec Loss 2.6557 LearningRate 0.0006 Epoch: 12 Global Step: 21260 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:45:44,664-Speed 13769.74 samples/sec Loss 2.6462 LearningRate 0.0006 Epoch: 12 Global Step: 21270 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:46:02,489-Speed 13787.72 samples/sec Loss 2.6384 LearningRate 0.0006 Epoch: 12 Global Step: 21280 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:46:20,283-Speed 13812.68 samples/sec Loss 2.6526 LearningRate 0.0006 Epoch: 12 Global Step: 21290 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:46:38,035-Speed 13845.13 samples/sec Loss 2.6486 LearningRate 0.0006 Epoch: 12 Global Step: 21300 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:46:55,806-Speed 13830.14 samples/sec Loss 2.6318 LearningRate 0.0006 Epoch: 12 Global Step: 21310 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:47:13,604-Speed 13809.09 samples/sec Loss 
2.6240 LearningRate 0.0006 Epoch: 12 Global Step: 21320 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:47:31,423-Speed 13792.89 samples/sec Loss 2.6334 LearningRate 0.0006 Epoch: 12 Global Step: 21330 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:47:49,225-Speed 13806.18 samples/sec Loss 2.6181 LearningRate 0.0006 Epoch: 12 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:48:07,074-Speed 13769.74 samples/sec Loss 2.6371 LearningRate 0.0006 Epoch: 12 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:48:24,861-Speed 13818.01 samples/sec Loss 2.6302 LearningRate 0.0006 Epoch: 12 Global Step: 21360 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:48:42,674-Speed 13797.38 samples/sec Loss 2.6431 LearningRate 0.0006 Epoch: 12 Global Step: 21370 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:49:00,557-Speed 13742.97 samples/sec Loss 2.6383 LearningRate 0.0006 Epoch: 12 Global Step: 21380 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:49:18,336-Speed 13824.35 samples/sec Loss 2.6585 LearningRate 0.0006 Epoch: 12 Global Step: 21390 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:49:36,125-Speed 13816.54 samples/sec Loss 2.6311 LearningRate 0.0006 Epoch: 12 Global Step: 21400 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:49:53,975-Speed 13768.81 samples/sec Loss 2.6375 LearningRate 0.0006 Epoch: 12 Global Step: 21410 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:50:11,794-Speed 13792.90 samples/sec Loss 2.6348 LearningRate 0.0006 Epoch: 12 Global Step: 21420 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:50:29,575-Speed 13822.49 samples/sec Loss 2.6278 LearningRate 0.0006 Epoch: 12 Global Step: 21430 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:50:47,399-Speed 13788.34 samples/sec Loss 2.6169 LearningRate 0.0006 
Epoch: 12 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:51:05,207-Speed 13801.22 samples/sec Loss 2.6125 LearningRate 0.0006 Epoch: 12 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:51:23,027-Speed 13792.59 samples/sec Loss 2.6214 LearningRate 0.0006 Epoch: 12 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:51:40,791-Speed 13835.25 samples/sec Loss 2.6326 LearningRate 0.0006 Epoch: 12 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:51:58,666-Speed 13752.41 samples/sec Loss 2.6163 LearningRate 0.0006 Epoch: 12 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:52:16,577-Speed 13722.07 samples/sec Loss 2.6320 LearningRate 0.0006 Epoch: 12 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:52:34,342-Speed 13834.78 samples/sec Loss 2.6380 LearningRate 0.0006 Epoch: 12 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:52:52,149-Speed 13802.26 samples/sec Loss 2.6078 LearningRate 0.0006 Epoch: 12 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:53:09,910-Speed 13837.58 samples/sec Loss 2.6173 LearningRate 0.0006 Epoch: 12 Global Step: 21520 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:53:27,652-Speed 13852.72 samples/sec Loss 2.6450 LearningRate 0.0006 Epoch: 12 Global Step: 21530 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:53:45,409-Speed 13842.31 samples/sec Loss 2.6319 LearningRate 0.0006 Epoch: 12 Global Step: 21540 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:54:03,207-Speed 13809.61 samples/sec Loss 2.6345 LearningRate 0.0006 Epoch: 12 Global Step: 21550 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:54:20,940-Speed 13859.72 samples/sec Loss 2.5856 LearningRate 0.0006 Epoch: 12 Global Step: 21560 
Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:54:38,697-Speed 13841.28 samples/sec Loss 2.5925 LearningRate 0.0006 Epoch: 12 Global Step: 21570 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:54:56,359-Speed 13914.91 samples/sec Loss 2.5998 LearningRate 0.0006 Epoch: 12 Global Step: 21580 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:55:14,342-Speed 13667.60 samples/sec Loss 2.5924 LearningRate 0.0006 Epoch: 12 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:55:32,323-Speed 13668.61 samples/sec Loss 2.6173 LearningRate 0.0006 Epoch: 12 Global Step: 21600 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 18:55:50,232-Speed 13723.86 samples/sec Loss 2.6472 LearningRate 0.0006 Epoch: 12 Global Step: 21610 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:56:08,199-Speed 13680.34 samples/sec Loss 2.5933 LearningRate 0.0006 Epoch: 12 Global Step: 21620 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:56:26,193-Speed 13658.57 samples/sec Loss 2.6000 LearningRate 0.0006 Epoch: 12 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:56:44,263-Speed 13601.52 samples/sec Loss 2.6014 LearningRate 0.0006 Epoch: 12 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 18:57:02,209-Speed 13695.56 samples/sec Loss 2.5909 LearningRate 0.0006 Epoch: 12 Global Step: 21650 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:57:20,205-Speed 13656.45 samples/sec Loss 2.6189 LearningRate 0.0006 Epoch: 12 Global Step: 21660 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 18:57:38,162-Speed 13686.86 samples/sec Loss 2.6281 LearningRate 0.0006 Epoch: 12 Global Step: 21670 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:57:56,150-Speed 13663.53 samples/sec Loss 2.6144 LearningRate 0.0006 Epoch: 12 Global Step: 21680 Fp16 Grad Scale: 16384 
Required: 24 hours Training: 2022-03-03 18:58:14,254-Speed 13576.76 samples/sec Loss 2.5969 LearningRate 0.0006 Epoch: 12 Global Step: 21690 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:58:32,211-Speed 13686.77 samples/sec Loss 2.5607 LearningRate 0.0006 Epoch: 12 Global Step: 21700 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:58:50,169-Speed 13686.11 samples/sec Loss 2.5963 LearningRate 0.0006 Epoch: 12 Global Step: 21710 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:59:08,102-Speed 13704.70 samples/sec Loss 2.6054 LearningRate 0.0006 Epoch: 12 Global Step: 21720 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:59:26,069-Speed 13679.87 samples/sec Loss 2.6090 LearningRate 0.0006 Epoch: 12 Global Step: 21730 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 18:59:44,210-Speed 13547.69 samples/sec Loss 2.6096 LearningRate 0.0006 Epoch: 12 Global Step: 21740 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 19:00:02,294-Speed 13590.89 samples/sec Loss 2.6221 LearningRate 0.0006 Epoch: 12 Global Step: 21750 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 19:00:20,383-Speed 13588.25 samples/sec Loss 2.6192 LearningRate 0.0006 Epoch: 12 Global Step: 21760 Fp16 Grad Scale: 16384 Required: 24 hours Training: 2022-03-03 19:00:38,339-Speed 13687.84 samples/sec Loss 2.6180 LearningRate 0.0006 Epoch: 12 Global Step: 21770 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:00:56,343-Speed 13651.37 samples/sec Loss 2.5955 LearningRate 0.0006 Epoch: 12 Global Step: 21780 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:01:14,337-Speed 13658.17 samples/sec Loss 2.5978 LearningRate 0.0006 Epoch: 12 Global Step: 21790 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:01:32,309-Speed 13675.33 samples/sec Loss 2.5975 LearningRate 0.0006 Epoch: 12 Global Step: 21800 Fp16 Grad Scale: 32768 Required: 24 hours Training: 
2022-03-03 19:01:50,272-Speed 13683.79 samples/sec Loss 2.6248 LearningRate 0.0006 Epoch: 12 Global Step: 21810 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:02:08,266-Speed 13658.85 samples/sec Loss 2.6116 LearningRate 0.0006 Epoch: 12 Global Step: 21820 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:02:26,274-Speed 13647.87 samples/sec Loss 2.5926 LearningRate 0.0006 Epoch: 12 Global Step: 21830 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:02:44,312-Speed 13625.39 samples/sec Loss 2.5945 LearningRate 0.0006 Epoch: 12 Global Step: 21840 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:03:02,291-Speed 13670.50 samples/sec Loss 2.5905 LearningRate 0.0006 Epoch: 12 Global Step: 21850 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:03:20,237-Speed 13695.69 samples/sec Loss 2.5593 LearningRate 0.0006 Epoch: 12 Global Step: 21860 Fp16 Grad Scale: 32768 Required: 24 hours Training: 2022-03-03 19:03:38,260-Speed 13638.31 samples/sec Loss 2.5534 LearningRate 0.0006 Epoch: 12 Global Step: 21870 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:03:56,247-Speed 13664.29 samples/sec Loss 2.5955 LearningRate 0.0006 Epoch: 12 Global Step: 21880 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:04:14,263-Speed 13642.43 samples/sec Loss 2.5839 LearningRate 0.0006 Epoch: 12 Global Step: 21890 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:04:32,265-Speed 13652.64 samples/sec Loss 2.5813 LearningRate 0.0006 Epoch: 12 Global Step: 21900 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:04:50,266-Speed 13653.36 samples/sec Loss 2.5886 LearningRate 0.0006 Epoch: 12 Global Step: 21910 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:05:08,305-Speed 13624.62 samples/sec Loss 2.6273 LearningRate 0.0006 Epoch: 12 Global Step: 21920 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:05:26,304-Speed 
13655.02 samples/sec Loss 2.5808 LearningRate 0.0006 Epoch: 12 Global Step: 21930 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:05:44,356-Speed 13614.50 samples/sec Loss 2.5820 LearningRate 0.0006 Epoch: 12 Global Step: 21940 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:06:02,298-Speed 13698.38 samples/sec Loss 2.5831 LearningRate 0.0006 Epoch: 12 Global Step: 21950 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:06:20,275-Speed 13671.68 samples/sec Loss 2.5768 LearningRate 0.0006 Epoch: 12 Global Step: 21960 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:06:38,295-Speed 13639.33 samples/sec Loss 2.5755 LearningRate 0.0006 Epoch: 12 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 19:06:56,291-Speed 13656.86 samples/sec Loss 2.5642 LearningRate 0.0006 Epoch: 12 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 19:07:14,293-Speed 13652.89 samples/sec Loss 2.5932 LearningRate 0.0006 Epoch: 12 Global Step: 21990 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:07:32,326-Speed 13628.81 samples/sec Loss 2.5765 LearningRate 0.0006 Epoch: 12 Global Step: 22000 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:07:50,446-Speed 13564.31 samples/sec Loss 2.5645 LearningRate 0.0006 Epoch: 12 Global Step: 22010 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:08:08,531-Speed 13590.28 samples/sec Loss 2.5656 LearningRate 0.0006 Epoch: 12 Global Step: 22020 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:08:26,605-Speed 13597.90 samples/sec Loss 2.6068 LearningRate 0.0006 Epoch: 12 Global Step: 22030 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:08:44,581-Speed 13672.65 samples/sec Loss 2.5732 LearningRate 0.0006 Epoch: 12 Global Step: 22040 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:09:02,454-Speed 13750.73 samples/sec Loss 
2.5597 LearningRate 0.0006 Epoch: 12 Global Step: 22050 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:09:20,212-Speed 13841.83 samples/sec Loss 2.5641 LearningRate 0.0006 Epoch: 12 Global Step: 22060 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:09:37,949-Speed 13856.69 samples/sec Loss 2.5885 LearningRate 0.0006 Epoch: 12 Global Step: 22070 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:09:55,622-Speed 13906.69 samples/sec Loss 2.5656 LearningRate 0.0006 Epoch: 12 Global Step: 22080 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:10:13,386-Speed 13835.19 samples/sec Loss 2.5721 LearningRate 0.0006 Epoch: 12 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 19:10:31,099-Speed 13875.97 samples/sec Loss 2.5955 LearningRate 0.0006 Epoch: 12 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 19:10:49,014-Speed 13718.44 samples/sec Loss 2.5662 LearningRate 0.0006 Epoch: 12 Global Step: 22110 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:11:06,819-Speed 13803.61 samples/sec Loss 2.5729 LearningRate 0.0006 Epoch: 12 Global Step: 22120 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:11:24,673-Speed 13766.21 samples/sec Loss 2.5781 LearningRate 0.0006 Epoch: 12 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:11:42,408-Speed 13858.41 samples/sec Loss 2.5667 LearningRate 0.0006 Epoch: 12 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:12:00,160-Speed 13844.50 samples/sec Loss 2.5757 LearningRate 0.0006 Epoch: 12 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:12:17,910-Speed 13846.76 samples/sec Loss 2.5598 LearningRate 0.0006 Epoch: 12 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:12:35,593-Speed 13898.92 samples/sec Loss 2.5538 LearningRate 0.0006 
Epoch: 12 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:12:53,298-Speed 13881.62 samples/sec Loss 2.5659 LearningRate 0.0006 Epoch: 12 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:13:11,034-Speed 13857.56 samples/sec Loss 2.5794 LearningRate 0.0006 Epoch: 12 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:13:28,809-Speed 13826.84 samples/sec Loss 2.5727 LearningRate 0.0006 Epoch: 12 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:13:46,534-Speed 13866.24 samples/sec Loss 2.5817 LearningRate 0.0006 Epoch: 12 Global Step: 22210 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-03-03 19:14:04,335-Speed 13806.83 samples/sec Loss 2.5818 LearningRate 0.0006 Epoch: 12 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:14:22,036-Speed 13884.80 samples/sec Loss 2.5604 LearningRate 0.0006 Epoch: 12 Global Step: 22230 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:14:39,741-Speed 13881.90 samples/sec Loss 2.5616 LearningRate 0.0006 Epoch: 12 Global Step: 22240 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:14:57,488-Speed 13848.93 samples/sec Loss 2.5580 LearningRate 0.0006 Epoch: 12 Global Step: 22250 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:15:15,340-Speed 13767.49 samples/sec Loss 2.5686 LearningRate 0.0006 Epoch: 12 Global Step: 22260 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:15:33,072-Speed 13860.40 samples/sec Loss 2.5648 LearningRate 0.0006 Epoch: 12 Global Step: 22270 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:15:50,830-Speed 13840.71 samples/sec Loss 2.5804 LearningRate 0.0006 Epoch: 12 Global Step: 22280 Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:16:08,635-Speed 13803.56 samples/sec Loss 2.5522 LearningRate 0.0006 Epoch: 12 Global Step: 22290 
Fp16 Grad Scale: 65536 Required: 24 hours Training: 2022-03-03 19:16:26,354-Speed 13870.12 samples/sec Loss 2.5470 LearningRate 0.0006 Epoch: 12 Global Step: 22300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:16:44,119-Speed 13835.83 samples/sec Loss 2.5469 LearningRate 0.0006 Epoch: 12 Global Step: 22310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:17:01,894-Speed 13826.93 samples/sec Loss 2.5456 LearningRate 0.0006 Epoch: 12 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:17:19,576-Speed 13900.36 samples/sec Loss 2.6125 LearningRate 0.0006 Epoch: 12 Global Step: 22330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:17:37,301-Speed 13865.60 samples/sec Loss 2.5772 LearningRate 0.0006 Epoch: 12 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:17:55,034-Speed 13859.45 samples/sec Loss 2.5663 LearningRate 0.0006 Epoch: 12 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:18:12,724-Speed 13894.16 samples/sec Loss 2.5564 LearningRate 0.0006 Epoch: 12 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:18:30,511-Speed 13817.53 samples/sec Loss 2.5486 LearningRate 0.0006 Epoch: 12 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:18:48,192-Speed 13900.66 samples/sec Loss 2.5555 LearningRate 0.0006 Epoch: 12 Global Step: 22380 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:19:05,903-Speed 13877.21 samples/sec Loss 2.5445 LearningRate 0.0006 Epoch: 12 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:19:23,609-Speed 13881.10 samples/sec Loss 2.5759 LearningRate 0.0006 Epoch: 12 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:19:41,332-Speed 13867.85 samples/sec Loss 2.5652 LearningRate 0.0006 Epoch: 12 Global Step: 22410 Fp16 Grad Scale: 65536 
Required: 23 hours Training: 2022-03-03 19:19:59,121-Speed 13815.64 samples/sec Loss 2.5674 LearningRate 0.0006 Epoch: 12 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:20:16,909-Speed 13817.48 samples/sec Loss 2.5510 LearningRate 0.0006 Epoch: 12 Global Step: 22430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:20:34,761-Speed 13769.07 samples/sec Loss 2.5684 LearningRate 0.0006 Epoch: 12 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:20:52,448-Speed 13895.52 samples/sec Loss 2.5781 LearningRate 0.0006 Epoch: 12 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:21:10,215-Speed 13833.28 samples/sec Loss 2.6140 LearningRate 0.0006 Epoch: 12 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:22:19,288-Speed 3558.04 samples/sec Loss 2.6045 LearningRate 0.0006 Epoch: 13 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:22:36,968-Speed 13901.88 samples/sec Loss 2.5316 LearningRate 0.0006 Epoch: 13 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:22:54,734-Speed 13834.16 samples/sec Loss 2.5144 LearningRate 0.0006 Epoch: 13 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:23:12,475-Speed 13853.69 samples/sec Loss 2.5129 LearningRate 0.0006 Epoch: 13 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:23:30,233-Speed 13840.47 samples/sec Loss 2.5207 LearningRate 0.0006 Epoch: 13 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:23:48,057-Speed 13788.98 samples/sec Loss 2.5124 LearningRate 0.0006 Epoch: 13 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:24:05,897-Speed 13776.58 samples/sec Loss 2.5157 LearningRate 0.0006 Epoch: 13 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 23 hours Training: 
2022-03-03 19:24:23,722-Speed 13788.17 samples/sec Loss 2.5399 LearningRate 0.0006 Epoch: 13 Global Step: 22540 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:24:41,523-Speed 13806.64 samples/sec Loss 2.5273 LearningRate 0.0006 Epoch: 13 Global Step: 22550 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:24:59,278-Speed 13843.31 samples/sec Loss 2.5125 LearningRate 0.0006 Epoch: 13 Global Step: 22560 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:25:17,180-Speed 13728.59 samples/sec Loss 2.5148 LearningRate 0.0006 Epoch: 13 Global Step: 22570 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:25:35,749-Speed 13236.07 samples/sec Loss 2.5133 LearningRate 0.0006 Epoch: 13 Global Step: 22580 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:25:53,567-Speed 13793.43 samples/sec Loss 2.5188 LearningRate 0.0006 Epoch: 13 Global Step: 22590 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:26:11,338-Speed 13830.62 samples/sec Loss 2.5106 LearningRate 0.0006 Epoch: 13 Global Step: 22600 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:26:29,958-Speed 13199.53 samples/sec Loss 2.5166 LearningRate 0.0006 Epoch: 13 Global Step: 22610 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:26:47,691-Speed 13860.01 samples/sec Loss 2.5211 LearningRate 0.0006 Epoch: 13 Global Step: 22620 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:27:05,407-Speed 13872.55 samples/sec Loss 2.5518 LearningRate 0.0006 Epoch: 13 Global Step: 22630 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:27:23,216-Speed 13800.47 samples/sec Loss 2.5236 LearningRate 0.0006 Epoch: 13 Global Step: 22640 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:27:40,965-Speed 13848.58 samples/sec Loss 2.5261 LearningRate 0.0006 Epoch: 13 Global Step: 22650 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:27:58,699-Speed 
13859.04 samples/sec Loss 2.5028 LearningRate 0.0006 Epoch: 13 Global Step: 22660 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:28:16,572-Speed 13751.31 samples/sec Loss 2.5268 LearningRate 0.0006 Epoch: 13 Global Step: 22670 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:28:34,319-Speed 13848.25 samples/sec Loss 2.5197 LearningRate 0.0006 Epoch: 13 Global Step: 22680 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:28:52,078-Speed 13840.52 samples/sec Loss 2.5138 LearningRate 0.0006 Epoch: 13 Global Step: 22690 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:29:09,802-Speed 13866.79 samples/sec Loss 2.5178 LearningRate 0.0006 Epoch: 13 Global Step: 22700 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:29:27,538-Speed 13857.24 samples/sec Loss 2.5132 LearningRate 0.0006 Epoch: 13 Global Step: 22710 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:29:45,298-Speed 13839.23 samples/sec Loss 2.5087 LearningRate 0.0006 Epoch: 13 Global Step: 22720 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:30:03,065-Speed 13833.28 samples/sec Loss 2.5278 LearningRate 0.0006 Epoch: 13 Global Step: 22730 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:30:20,753-Speed 13895.53 samples/sec Loss 2.5171 LearningRate 0.0006 Epoch: 13 Global Step: 22740 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:30:38,551-Speed 13808.92 samples/sec Loss 2.5380 LearningRate 0.0006 Epoch: 13 Global Step: 22750 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:30:56,316-Speed 13834.89 samples/sec Loss 2.4935 LearningRate 0.0006 Epoch: 13 Global Step: 22760 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:31:14,100-Speed 13819.98 samples/sec Loss 2.5236 LearningRate 0.0006 Epoch: 13 Global Step: 22770 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:31:31,847-Speed 13848.92 samples/sec Loss 
2.5172 LearningRate 0.0006 Epoch: 13 Global Step: 22780 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:31:49,581-Speed 13859.54 samples/sec Loss 2.5237 LearningRate 0.0006 Epoch: 13 Global Step: 22790 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:32:07,361-Speed 13823.04 samples/sec Loss 2.5129 LearningRate 0.0006 Epoch: 13 Global Step: 22800 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:32:25,122-Speed 13838.02 samples/sec Loss 2.5340 LearningRate 0.0006 Epoch: 13 Global Step: 22810 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:32:42,801-Speed 13901.95 samples/sec Loss 2.5524 LearningRate 0.0006 Epoch: 13 Global Step: 22820 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:33:00,526-Speed 13866.52 samples/sec Loss 2.5636 LearningRate 0.0006 Epoch: 13 Global Step: 22830 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:33:18,286-Speed 13838.31 samples/sec Loss 2.5204 LearningRate 0.0006 Epoch: 13 Global Step: 22840 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:33:36,009-Speed 13867.91 samples/sec Loss 2.5263 LearningRate 0.0006 Epoch: 13 Global Step: 22850 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:33:53,828-Speed 13792.59 samples/sec Loss 2.5292 LearningRate 0.0006 Epoch: 13 Global Step: 22860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:34:11,619-Speed 13814.65 samples/sec Loss 2.5126 LearningRate 0.0006 Epoch: 13 Global Step: 22870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:34:29,428-Speed 13800.51 samples/sec Loss 2.5009 LearningRate 0.0006 Epoch: 13 Global Step: 22880 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:34:47,150-Speed 13868.18 samples/sec Loss 2.5026 LearningRate 0.0006 Epoch: 13 Global Step: 22890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:35:04,878-Speed 13864.35 samples/sec Loss 2.4947 LearningRate 0.0006 
Epoch: 13 Global Step: 22900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:35:22,684-Speed 13803.04 samples/sec Loss 2.5201 LearningRate 0.0006 Epoch: 13 Global Step: 22910 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:35:40,463-Speed 13823.97 samples/sec Loss 2.5217 LearningRate 0.0006 Epoch: 13 Global Step: 22920 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:35:58,237-Speed 13827.59 samples/sec Loss 2.4995 LearningRate 0.0006 Epoch: 13 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:36:16,055-Speed 13793.98 samples/sec Loss 2.4964 LearningRate 0.0006 Epoch: 13 Global Step: 22940 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:36:33,835-Speed 13822.98 samples/sec Loss 2.5200 LearningRate 0.0006 Epoch: 13 Global Step: 22950 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:36:51,684-Speed 13769.51 samples/sec Loss 2.4961 LearningRate 0.0006 Epoch: 13 Global Step: 22960 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:37:10,021-Speed 13403.68 samples/sec Loss 2.5032 LearningRate 0.0006 Epoch: 13 Global Step: 22970 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:37:28,315-Speed 13434.38 samples/sec Loss 2.5000 LearningRate 0.0006 Epoch: 13 Global Step: 22980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:37:46,611-Speed 13433.19 samples/sec Loss 2.5093 LearningRate 0.0005 Epoch: 13 Global Step: 22990 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:38:04,417-Speed 13803.17 samples/sec Loss 2.4973 LearningRate 0.0005 Epoch: 13 Global Step: 23000 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:38:22,171-Speed 13843.37 samples/sec Loss 2.4785 LearningRate 0.0005 Epoch: 13 Global Step: 23010 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:38:40,626-Speed 13317.52 samples/sec Loss 2.4824 LearningRate 0.0005 Epoch: 13 Global Step: 23020 
Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:38:58,304-Speed 13902.92 samples/sec Loss 2.4892 LearningRate 0.0005 Epoch: 13 Global Step: 23030 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:39:16,213-Speed 13723.89 samples/sec Loss 2.5176 LearningRate 0.0005 Epoch: 13 Global Step: 23040 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:39:34,740-Speed 13265.66 samples/sec Loss 2.5042 LearningRate 0.0005 Epoch: 13 Global Step: 23050 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:39:53,169-Speed 13336.28 samples/sec Loss 2.4820 LearningRate 0.0005 Epoch: 13 Global Step: 23060 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:40:10,935-Speed 13834.00 samples/sec Loss 2.5132 LearningRate 0.0005 Epoch: 13 Global Step: 23070 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:40:28,692-Speed 13841.62 samples/sec Loss 2.4798 LearningRate 0.0005 Epoch: 13 Global Step: 23080 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:40:46,428-Speed 13857.35 samples/sec Loss 2.4762 LearningRate 0.0005 Epoch: 13 Global Step: 23090 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:41:05,053-Speed 13195.82 samples/sec Loss 2.4896 LearningRate 0.0005 Epoch: 13 Global Step: 23100 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:41:22,777-Speed 13866.66 samples/sec Loss 2.4720 LearningRate 0.0005 Epoch: 13 Global Step: 23110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:41:41,434-Speed 13173.26 samples/sec Loss 2.4718 LearningRate 0.0005 Epoch: 13 Global Step: 23120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:41:59,146-Speed 13876.16 samples/sec Loss 2.5072 LearningRate 0.0005 Epoch: 13 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:42:17,038-Speed 13736.49 samples/sec Loss 2.5062 LearningRate 0.0005 Epoch: 13 Global Step: 23140 Fp16 Grad Scale: 65536 
Required: 23 hours Training: 2022-03-03 19:42:35,668-Speed 13192.79 samples/sec Loss 2.4836 LearningRate 0.0005 Epoch: 13 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:42:53,413-Speed 13850.16 samples/sec Loss 2.5049 LearningRate 0.0005 Epoch: 13 Global Step: 23160 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:43:11,863-Speed 13321.73 samples/sec Loss 2.5114 LearningRate 0.0005 Epoch: 13 Global Step: 23170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:43:29,927-Speed 13605.57 samples/sec Loss 2.5079 LearningRate 0.0005 Epoch: 13 Global Step: 23180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:43:48,303-Speed 13375.15 samples/sec Loss 2.4810 LearningRate 0.0005 Epoch: 13 Global Step: 23190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:44:06,172-Speed 13754.46 samples/sec Loss 2.4919 LearningRate 0.0005 Epoch: 13 Global Step: 23200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:44:24,783-Speed 13205.41 samples/sec Loss 2.4931 LearningRate 0.0005 Epoch: 13 Global Step: 23210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:44:42,681-Speed 13731.90 samples/sec Loss 2.4700 LearningRate 0.0005 Epoch: 13 Global Step: 23220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:45:00,470-Speed 13816.44 samples/sec Loss 2.4858 LearningRate 0.0005 Epoch: 13 Global Step: 23230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:45:18,534-Speed 13606.05 samples/sec Loss 2.4870 LearningRate 0.0005 Epoch: 13 Global Step: 23240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:45:36,868-Speed 13405.58 samples/sec Loss 2.4931 LearningRate 0.0005 Epoch: 13 Global Step: 23250 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:45:54,579-Speed 13876.88 samples/sec Loss 2.4728 LearningRate 0.0005 Epoch: 13 Global Step: 23260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 
2022-03-03 19:46:13,186-Speed 13208.91 samples/sec Loss 2.4921 LearningRate 0.0005 Epoch: 13 Global Step: 23270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:46:30,923-Speed 13856.67 samples/sec Loss 2.4865 LearningRate 0.0005 Epoch: 13 Global Step: 23280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:46:48,688-Speed 13835.28 samples/sec Loss 2.4850 LearningRate 0.0005 Epoch: 13 Global Step: 23290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:47:06,464-Speed 13826.06 samples/sec Loss 2.4518 LearningRate 0.0005 Epoch: 13 Global Step: 23300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:47:24,238-Speed 13827.85 samples/sec Loss 2.4522 LearningRate 0.0005 Epoch: 13 Global Step: 23310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:47:41,975-Speed 13856.77 samples/sec Loss 2.4790 LearningRate 0.0005 Epoch: 13 Global Step: 23320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:47:59,750-Speed 13827.52 samples/sec Loss 2.4816 LearningRate 0.0005 Epoch: 13 Global Step: 23330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:48:17,523-Speed 13827.88 samples/sec Loss 2.4698 LearningRate 0.0005 Epoch: 13 Global Step: 23340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:48:35,246-Speed 13868.13 samples/sec Loss 2.4719 LearningRate 0.0005 Epoch: 13 Global Step: 23350 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:48:53,034-Speed 13817.00 samples/sec Loss 2.4802 LearningRate 0.0005 Epoch: 13 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:49:10,830-Speed 13810.65 samples/sec Loss 2.4645 LearningRate 0.0005 Epoch: 13 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 19:49:28,620-Speed 13815.86 samples/sec Loss 2.4612 LearningRate 0.0005 Epoch: 13 Global Step: 23380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 
19:49:46,609-Speed 13662.84 samples/sec Loss 2.4616 LearningRate 0.0005 Epoch: 13 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:50:05,194-Speed 13224.52 samples/sec Loss 2.4622 LearningRate 0.0005 Epoch: 13 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:50:22,877-Speed 13898.61 samples/sec Loss 2.4810 LearningRate 0.0005 Epoch: 13 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:50:41,029-Speed 13540.27 samples/sec Loss 2.4536 LearningRate 0.0005 Epoch: 13 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:50:59,201-Speed 13525.05 samples/sec Loss 2.4492 LearningRate 0.0005 Epoch: 13 Global Step: 23430 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:51:17,798-Speed 13215.94 samples/sec Loss 2.4812 LearningRate 0.0005 Epoch: 13 Global Step: 23440 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:51:35,587-Speed 13817.05 samples/sec Loss 2.4649 LearningRate 0.0005 Epoch: 13 Global Step: 23450 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:51:53,408-Speed 13791.11 samples/sec Loss 2.4652 LearningRate 0.0005 Epoch: 13 Global Step: 23460 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:52:12,024-Speed 13201.95 samples/sec Loss 2.4517 LearningRate 0.0005 Epoch: 13 Global Step: 23470 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:52:30,006-Speed 13668.63 samples/sec Loss 2.4647 LearningRate 0.0005 Epoch: 13 Global Step: 23480 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:52:48,624-Speed 13202.22 samples/sec Loss 2.5379 LearningRate 0.0005 Epoch: 13 Global Step: 23490 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:53:06,975-Speed 13392.74 samples/sec Loss 2.4993 LearningRate 0.0005 Epoch: 13 Global Step: 23500 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:53:25,057-Speed 13593.46 
samples/sec Loss 2.4635 LearningRate 0.0005 Epoch: 13 Global Step: 23510 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:53:43,426-Speed 13379.58 samples/sec Loss 2.4513 LearningRate 0.0005 Epoch: 13 Global Step: 23520 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:54:01,228-Speed 13805.97 samples/sec Loss 2.4368 LearningRate 0.0005 Epoch: 13 Global Step: 23530 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:54:18,973-Speed 13850.35 samples/sec Loss 2.4524 LearningRate 0.0005 Epoch: 13 Global Step: 23540 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:54:36,747-Speed 13828.35 samples/sec Loss 2.4618 LearningRate 0.0005 Epoch: 13 Global Step: 23550 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:54:54,517-Speed 13830.54 samples/sec Loss 2.4454 LearningRate 0.0005 Epoch: 13 Global Step: 23560 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:55:12,320-Speed 13805.73 samples/sec Loss 2.4449 LearningRate 0.0005 Epoch: 13 Global Step: 23570 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:55:29,982-Speed 13915.66 samples/sec Loss 2.4401 LearningRate 0.0005 Epoch: 13 Global Step: 23580 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:55:47,699-Speed 13872.92 samples/sec Loss 2.4551 LearningRate 0.0005 Epoch: 13 Global Step: 23590 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:56:05,388-Speed 13894.87 samples/sec Loss 2.4660 LearningRate 0.0005 Epoch: 13 Global Step: 23600 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:56:23,120-Speed 13860.22 samples/sec Loss 2.4220 LearningRate 0.0005 Epoch: 13 Global Step: 23610 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:56:40,894-Speed 13827.99 samples/sec Loss 2.4404 LearningRate 0.0005 Epoch: 13 Global Step: 23620 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:56:58,624-Speed 13865.23 samples/sec Loss 2.4472 
LearningRate 0.0005 Epoch: 13 Global Step: 23630 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:57:16,354-Speed 13861.95 samples/sec Loss 2.4720 LearningRate 0.0005 Epoch: 13 Global Step: 23640 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:57:34,032-Speed 13902.57 samples/sec Loss 2.5009 LearningRate 0.0005 Epoch: 13 Global Step: 23650 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:57:51,760-Speed 13864.07 samples/sec Loss 2.4859 LearningRate 0.0005 Epoch: 13 Global Step: 23660 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 19:58:09,606-Speed 13771.96 samples/sec Loss 2.4429 LearningRate 0.0005 Epoch: 13 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:58:27,321-Speed 13874.32 samples/sec Loss 2.4349 LearningRate 0.0005 Epoch: 13 Global Step: 23680 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:58:45,046-Speed 13865.84 samples/sec Loss 2.4555 LearningRate 0.0005 Epoch: 13 Global Step: 23690 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:59:02,812-Speed 13834.87 samples/sec Loss 2.4574 LearningRate 0.0005 Epoch: 13 Global Step: 23700 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:59:20,580-Speed 13832.81 samples/sec Loss 2.4276 LearningRate 0.0005 Epoch: 13 Global Step: 23710 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:59:38,251-Speed 13908.08 samples/sec Loss 2.4139 LearningRate 0.0005 Epoch: 13 Global Step: 23720 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 19:59:56,024-Speed 13829.05 samples/sec Loss 2.4426 LearningRate 0.0005 Epoch: 13 Global Step: 23730 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:00:13,856-Speed 13782.89 samples/sec Loss 2.4283 LearningRate 0.0005 Epoch: 13 Global Step: 23740 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:00:31,658-Speed 13806.39 samples/sec Loss 2.4294 LearningRate 0.0005 Epoch: 13 
Global Step: 23750 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:00:49,386-Speed 13863.82 samples/sec Loss 2.4369 LearningRate 0.0005 Epoch: 13 Global Step: 23760 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:01:07,162-Speed 13825.97 samples/sec Loss 2.4327 LearningRate 0.0005 Epoch: 13 Global Step: 23770 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:01:24,989-Speed 13786.50 samples/sec Loss 2.4477 LearningRate 0.0005 Epoch: 13 Global Step: 23780 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:01:42,729-Speed 13854.32 samples/sec Loss 2.4559 LearningRate 0.0005 Epoch: 13 Global Step: 23790 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:02:00,690-Speed 13684.09 samples/sec Loss 2.4451 LearningRate 0.0005 Epoch: 13 Global Step: 23800 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:02:18,636-Speed 13695.65 samples/sec Loss 2.4284 LearningRate 0.0005 Epoch: 13 Global Step: 23810 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:02:36,461-Speed 13787.69 samples/sec Loss 2.4447 LearningRate 0.0005 Epoch: 13 Global Step: 23820 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:02:54,326-Speed 13757.91 samples/sec Loss 2.4397 LearningRate 0.0005 Epoch: 13 Global Step: 23830 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:03:12,100-Speed 13827.36 samples/sec Loss 2.4429 LearningRate 0.0005 Epoch: 13 Global Step: 23840 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:03:29,885-Speed 13819.30 samples/sec Loss 2.4357 LearningRate 0.0005 Epoch: 13 Global Step: 23850 Fp16 Grad Scale: 16384 Required: 23 hours Training: 2022-03-03 20:03:47,650-Speed 13834.73 samples/sec Loss 2.4352 LearningRate 0.0005 Epoch: 13 Global Step: 23860 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:04:05,499-Speed 13770.05 samples/sec Loss 2.4270 LearningRate 0.0005 Epoch: 13 Global Step: 23870 Fp16 Grad 
Scale: 32768 Required: 23 hours Training: 2022-03-03 20:04:23,285-Speed 13818.68 samples/sec Loss 2.4310 LearningRate 0.0005 Epoch: 13 Global Step: 23880 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:04:41,213-Speed 13709.04 samples/sec Loss 2.4262 LearningRate 0.0005 Epoch: 13 Global Step: 23890 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:04:59,323-Speed 13571.28 samples/sec Loss 2.4209 LearningRate 0.0005 Epoch: 13 Global Step: 23900 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:05:17,422-Speed 13580.97 samples/sec Loss 2.4411 LearningRate 0.0005 Epoch: 13 Global Step: 23910 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:05:35,162-Speed 13854.18 samples/sec Loss 2.4505 LearningRate 0.0005 Epoch: 13 Global Step: 23920 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:05:52,861-Speed 13886.40 samples/sec Loss 2.4550 LearningRate 0.0005 Epoch: 13 Global Step: 23930 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:06:10,581-Speed 13869.73 samples/sec Loss 2.4443 LearningRate 0.0005 Epoch: 13 Global Step: 23940 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:06:28,310-Speed 13863.38 samples/sec Loss 2.4231 LearningRate 0.0005 Epoch: 13 Global Step: 23950 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:06:46,027-Speed 13872.27 samples/sec Loss 2.4425 LearningRate 0.0005 Epoch: 13 Global Step: 23960 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:07:03,762-Speed 13858.36 samples/sec Loss 2.4301 LearningRate 0.0005 Epoch: 13 Global Step: 23970 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:07:21,496-Speed 13858.95 samples/sec Loss 2.4235 LearningRate 0.0005 Epoch: 13 Global Step: 23980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:07:39,431-Speed 13703.21 samples/sec Loss 2.4381 LearningRate 0.0005 Epoch: 13 Global Step: 23990 Fp16 Grad Scale: 65536 Required: 23 hours 
Training: 2022-03-03 20:07:57,342-Speed 13723.24 samples/sec Loss 2.4081 LearningRate 0.0005 Epoch: 13 Global Step: 24000 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:08:15,273-Speed 13707.35 samples/sec Loss 2.4364 LearningRate 0.0005 Epoch: 13 Global Step: 24010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:08:33,170-Speed 13732.52 samples/sec Loss 2.4567 LearningRate 0.0005 Epoch: 13 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:08:50,925-Speed 13843.05 samples/sec Loss 2.4263 LearningRate 0.0005 Epoch: 13 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:09:08,621-Speed 13888.66 samples/sec Loss 2.4149 LearningRate 0.0005 Epoch: 13 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:09:26,369-Speed 13848.22 samples/sec Loss 2.4307 LearningRate 0.0005 Epoch: 13 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:09:44,018-Speed 13926.74 samples/sec Loss 2.4382 LearningRate 0.0005 Epoch: 13 Global Step: 24060 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 20:10:01,788-Speed 13831.02 samples/sec Loss 2.4296 LearningRate 0.0005 Epoch: 13 Global Step: 24070 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-03-03 20:10:19,481-Speed 13890.92 samples/sec Loss 2.4162 LearningRate 0.0005 Epoch: 13 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:10:37,224-Speed 13852.56 samples/sec Loss 2.4076 LearningRate 0.0005 Epoch: 13 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:10:55,129-Speed 13732.54 samples/sec Loss 2.4296 LearningRate 0.0005 Epoch: 13 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:11:12,849-Speed 13869.52 samples/sec Loss 2.4365 LearningRate 0.0005 Epoch: 13 Global Step: 24110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 
20:11:30,516-Speed 13912.21 samples/sec Loss 2.4245 LearningRate 0.0005 Epoch: 13 Global Step: 24120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:11:48,267-Speed 13846.21 samples/sec Loss 2.4339 LearningRate 0.0005 Epoch: 13 Global Step: 24130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:12:06,034-Speed 13833.21 samples/sec Loss 2.4293 LearningRate 0.0005 Epoch: 13 Global Step: 24140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:12:23,683-Speed 13925.47 samples/sec Loss 2.4280 LearningRate 0.0005 Epoch: 13 Global Step: 24150 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:12:41,336-Speed 13922.83 samples/sec Loss 2.4418 LearningRate 0.0005 Epoch: 13 Global Step: 24160 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:12:59,061-Speed 13866.45 samples/sec Loss 2.4211 LearningRate 0.0005 Epoch: 13 Global Step: 24170 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:13:16,792-Speed 13861.37 samples/sec Loss 2.4336 LearningRate 0.0005 Epoch: 13 Global Step: 24180 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:13:34,451-Speed 13917.97 samples/sec Loss 2.4433 LearningRate 0.0005 Epoch: 13 Global Step: 24190 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:14:41,689-Speed 3655.10 samples/sec Loss 2.4223 LearningRate 0.0005 Epoch: 14 Global Step: 24200 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:14:59,317-Speed 13942.34 samples/sec Loss 2.4217 LearningRate 0.0005 Epoch: 14 Global Step: 24210 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:15:16,969-Speed 13923.39 samples/sec Loss 2.3981 LearningRate 0.0005 Epoch: 14 Global Step: 24220 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:15:34,618-Speed 13925.64 samples/sec Loss 2.3797 LearningRate 0.0005 Epoch: 14 Global Step: 24230 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:15:52,339-Speed 13869.76 
samples/sec Loss 2.3759 LearningRate 0.0005 Epoch: 14 Global Step: 24240 Fp16 Grad Scale: 32768 Required: 23 hours Training: 2022-03-03 20:16:10,147-Speed 13801.14 samples/sec Loss 2.3915 LearningRate 0.0005 Epoch: 14 Global Step: 24250 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:16:27,830-Speed 13898.82 samples/sec Loss 2.3860 LearningRate 0.0005 Epoch: 14 Global Step: 24260 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:16:45,499-Speed 13910.31 samples/sec Loss 2.3788 LearningRate 0.0005 Epoch: 14 Global Step: 24270 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:17:03,189-Speed 13893.18 samples/sec Loss 2.3903 LearningRate 0.0005 Epoch: 14 Global Step: 24280 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:17:20,834-Speed 13928.89 samples/sec Loss 2.3700 LearningRate 0.0005 Epoch: 14 Global Step: 24290 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:17:38,497-Speed 13915.27 samples/sec Loss 2.4038 LearningRate 0.0005 Epoch: 14 Global Step: 24300 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:17:56,214-Speed 13872.16 samples/sec Loss 2.4018 LearningRate 0.0005 Epoch: 14 Global Step: 24310 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:18:13,869-Speed 13920.54 samples/sec Loss 2.3734 LearningRate 0.0005 Epoch: 14 Global Step: 24320 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:18:31,690-Speed 13791.70 samples/sec Loss 2.3785 LearningRate 0.0005 Epoch: 14 Global Step: 24330 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:18:49,372-Speed 13899.89 samples/sec Loss 2.3905 LearningRate 0.0005 Epoch: 14 Global Step: 24340 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-03-03 20:19:07,163-Speed 13814.22 samples/sec Loss 2.4163 LearningRate 0.0005 Epoch: 14 Global Step: 24350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:19:24,993-Speed 13784.63 samples/sec Loss 2.4044 
LearningRate 0.0005 Epoch: 14 Global Step: 24360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:19:42,673-Speed 13901.53 samples/sec Loss 2.4152 LearningRate 0.0005 Epoch: 14 Global Step: 24370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:20:00,309-Speed 13935.73 samples/sec Loss 2.3948 LearningRate 0.0005 Epoch: 14 Global Step: 24380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:20:18,026-Speed 13872.29 samples/sec Loss 2.3809 LearningRate 0.0005 Epoch: 14 Global Step: 24390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:20:35,734-Speed 13879.05 samples/sec Loss 2.3778 LearningRate 0.0005 Epoch: 14 Global Step: 24400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:20:53,494-Speed 13839.27 samples/sec Loss 2.3854 LearningRate 0.0005 Epoch: 14 Global Step: 24410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:21:11,243-Speed 13849.48 samples/sec Loss 2.3823 LearningRate 0.0005 Epoch: 14 Global Step: 24420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:21:29,003-Speed 13838.96 samples/sec Loss 2.3838 LearningRate 0.0005 Epoch: 14 Global Step: 24430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:21:46,915-Speed 13721.22 samples/sec Loss 2.3882 LearningRate 0.0005 Epoch: 14 Global Step: 24440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:22:04,800-Speed 13742.15 samples/sec Loss 2.4157 LearningRate 0.0005 Epoch: 14 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:22:22,591-Speed 13815.11 samples/sec Loss 2.4214 LearningRate 0.0005 Epoch: 14 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:22:40,232-Speed 13931.85 samples/sec Loss 2.3881 LearningRate 0.0005 Epoch: 14 Global Step: 24470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:22:57,925-Speed 13891.06 samples/sec Loss 2.3909 LearningRate 0.0005 Epoch: 14 
Global Step: 24480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:23:15,670-Speed 13850.34 samples/sec Loss 2.3705 LearningRate 0.0005 Epoch: 14 Global Step: 24490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:23:33,369-Speed 13886.95 samples/sec Loss 2.3747 LearningRate 0.0005 Epoch: 14 Global Step: 24500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:23:51,087-Speed 13871.47 samples/sec Loss 2.3706 LearningRate 0.0005 Epoch: 14 Global Step: 24510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:24:08,766-Speed 13902.50 samples/sec Loss 2.3949 LearningRate 0.0005 Epoch: 14 Global Step: 24520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:24:26,486-Speed 13869.98 samples/sec Loss 2.3848 LearningRate 0.0005 Epoch: 14 Global Step: 24530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:24:44,158-Speed 13907.89 samples/sec Loss 2.3737 LearningRate 0.0005 Epoch: 14 Global Step: 24540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:25:01,861-Speed 13883.61 samples/sec Loss 2.3980 LearningRate 0.0005 Epoch: 14 Global Step: 24550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:25:19,617-Speed 13842.00 samples/sec Loss 2.3808 LearningRate 0.0005 Epoch: 14 Global Step: 24560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:25:37,393-Speed 13826.81 samples/sec Loss 2.3698 LearningRate 0.0005 Epoch: 14 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:25:55,099-Speed 13880.68 samples/sec Loss 2.3952 LearningRate 0.0005 Epoch: 14 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:26:12,912-Speed 13798.12 samples/sec Loss 2.3936 LearningRate 0.0005 Epoch: 14 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:26:30,716-Speed 13803.92 samples/sec Loss 2.3914 LearningRate 0.0005 Epoch: 14 Global Step: 24600 Fp16 Grad 
Scale: 131072 Required: 22 hours Training: 2022-03-03 20:26:48,447-Speed 13861.37 samples/sec Loss 2.3820 LearningRate 0.0005 Epoch: 14 Global Step: 24610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:27:06,253-Speed 13803.10 samples/sec Loss 2.3618 LearningRate 0.0005 Epoch: 14 Global Step: 24620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:27:24,050-Speed 13810.07 samples/sec Loss 2.3736 LearningRate 0.0005 Epoch: 14 Global Step: 24630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:27:41,952-Speed 13729.18 samples/sec Loss 2.3895 LearningRate 0.0005 Epoch: 14 Global Step: 24640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:27:59,733-Speed 13822.27 samples/sec Loss 2.3841 LearningRate 0.0005 Epoch: 14 Global Step: 24650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:28:17,558-Speed 13787.56 samples/sec Loss 2.3896 LearningRate 0.0005 Epoch: 14 Global Step: 24660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:28:35,415-Speed 13764.42 samples/sec Loss 2.3671 LearningRate 0.0005 Epoch: 14 Global Step: 24670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:28:53,214-Speed 13807.82 samples/sec Loss 2.3994 LearningRate 0.0005 Epoch: 14 Global Step: 24680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:29:11,166-Speed 13690.97 samples/sec Loss 2.3875 LearningRate 0.0005 Epoch: 14 Global Step: 24690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:29:29,321-Speed 13537.38 samples/sec Loss 2.4167 LearningRate 0.0005 Epoch: 14 Global Step: 24700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:29:47,153-Speed 13782.94 samples/sec Loss 2.3693 LearningRate 0.0005 Epoch: 14 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:30:04,875-Speed 13868.47 samples/sec Loss 2.3583 LearningRate 0.0005 Epoch: 14 Global Step: 24720 Fp16 Grad Scale: 65536 Required: 22 
hours Training: 2022-03-03 20:30:22,588-Speed 13875.68 samples/sec Loss 2.3705 LearningRate 0.0005 Epoch: 14 Global Step: 24730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:30:40,420-Speed 13782.42 samples/sec Loss 2.3639 LearningRate 0.0005 Epoch: 14 Global Step: 24740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:30:58,197-Speed 13826.15 samples/sec Loss 2.3834 LearningRate 0.0005 Epoch: 14 Global Step: 24750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:31:15,889-Speed 13891.65 samples/sec Loss 2.3699 LearningRate 0.0005 Epoch: 14 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:31:33,617-Speed 13863.47 samples/sec Loss 2.3485 LearningRate 0.0005 Epoch: 14 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:31:51,366-Speed 13847.25 samples/sec Loss 2.3570 LearningRate 0.0005 Epoch: 14 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:32:09,106-Speed 13854.53 samples/sec Loss 2.3749 LearningRate 0.0005 Epoch: 14 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:32:26,797-Speed 13892.95 samples/sec Loss 2.3734 LearningRate 0.0005 Epoch: 14 Global Step: 24800 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:32:44,531-Speed 13858.92 samples/sec Loss 2.3585 LearningRate 0.0005 Epoch: 14 Global Step: 24810 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:33:02,338-Speed 13801.76 samples/sec Loss 2.3449 LearningRate 0.0005 Epoch: 14 Global Step: 24820 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:33:20,074-Speed 13857.61 samples/sec Loss 2.4066 LearningRate 0.0005 Epoch: 14 Global Step: 24830 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:33:37,858-Speed 13820.75 samples/sec Loss 2.3839 LearningRate 0.0005 Epoch: 14 Global Step: 24840 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 
20:33:55,609-Speed 13845.52 samples/sec Loss 2.3610 LearningRate 0.0005 Epoch: 14 Global Step: 24850 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:34:13,475-Speed 13756.47 samples/sec Loss 2.3649 LearningRate 0.0005 Epoch: 14 Global Step: 24860 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:34:31,192-Speed 13872.78 samples/sec Loss 2.3656 LearningRate 0.0005 Epoch: 14 Global Step: 24870 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:34:48,924-Speed 13860.49 samples/sec Loss 2.3581 LearningRate 0.0005 Epoch: 14 Global Step: 24880 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:35:06,785-Speed 13760.61 samples/sec Loss 2.3602 LearningRate 0.0005 Epoch: 14 Global Step: 24890 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:35:24,658-Speed 13751.15 samples/sec Loss 2.3660 LearningRate 0.0005 Epoch: 14 Global Step: 24900 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:35:42,505-Speed 13771.20 samples/sec Loss 2.3379 LearningRate 0.0005 Epoch: 14 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:36:00,345-Speed 13776.43 samples/sec Loss 2.3521 LearningRate 0.0005 Epoch: 14 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:36:18,231-Speed 13741.37 samples/sec Loss 2.3773 LearningRate 0.0005 Epoch: 14 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:36:35,960-Speed 13862.80 samples/sec Loss 2.3468 LearningRate 0.0005 Epoch: 14 Global Step: 24940 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:36:53,708-Speed 13848.40 samples/sec Loss 2.3649 LearningRate 0.0005 Epoch: 14 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:37:11,518-Speed 13799.95 samples/sec Loss 2.3478 LearningRate 0.0005 Epoch: 14 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:37:29,198-Speed 13900.94 
samples/sec Loss 2.3582 LearningRate 0.0005 Epoch: 14 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:37:46,917-Speed 13870.79 samples/sec Loss 2.3573 LearningRate 0.0005 Epoch: 14 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:38:04,738-Speed 13791.56 samples/sec Loss 2.3425 LearningRate 0.0005 Epoch: 14 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:38:22,594-Speed 13764.45 samples/sec Loss 2.3465 LearningRate 0.0005 Epoch: 14 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:38:40,406-Speed 13797.83 samples/sec Loss 2.3885 LearningRate 0.0005 Epoch: 14 Global Step: 25010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:38:58,272-Speed 13756.82 samples/sec Loss 2.3834 LearningRate 0.0005 Epoch: 14 Global Step: 25020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:39:16,142-Speed 13753.57 samples/sec Loss 2.3605 LearningRate 0.0005 Epoch: 14 Global Step: 25030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:39:33,906-Speed 13835.67 samples/sec Loss 2.3626 LearningRate 0.0005 Epoch: 14 Global Step: 25040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:39:51,750-Speed 13774.05 samples/sec Loss 2.3340 LearningRate 0.0005 Epoch: 14 Global Step: 25050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:40:09,509-Speed 13839.45 samples/sec Loss 2.3253 LearningRate 0.0005 Epoch: 14 Global Step: 25060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:40:27,319-Speed 13799.69 samples/sec Loss 2.3528 LearningRate 0.0005 Epoch: 14 Global Step: 25070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:40:45,066-Speed 13848.56 samples/sec Loss 2.3433 LearningRate 0.0005 Epoch: 14 Global Step: 25080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:41:02,818-Speed 13845.39 samples/sec Loss 2.3309 
LearningRate 0.0005 Epoch: 14 Global Step: 25090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:41:20,522-Speed 13882.50 samples/sec Loss 2.3348 LearningRate 0.0005 Epoch: 14 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:41:38,270-Speed 13847.60 samples/sec Loss 2.3475 LearningRate 0.0005 Epoch: 14 Global Step: 25110 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:41:56,102-Speed 13782.79 samples/sec Loss 2.3586 LearningRate 0.0005 Epoch: 14 Global Step: 25120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:42:13,978-Speed 13749.66 samples/sec Loss 2.3429 LearningRate 0.0005 Epoch: 14 Global Step: 25130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:42:31,757-Speed 13823.54 samples/sec Loss 2.3396 LearningRate 0.0005 Epoch: 14 Global Step: 25140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:42:49,632-Speed 13749.43 samples/sec Loss 2.3360 LearningRate 0.0005 Epoch: 14 Global Step: 25150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:43:07,475-Speed 13774.91 samples/sec Loss 2.3462 LearningRate 0.0005 Epoch: 14 Global Step: 25160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:43:25,229-Speed 13843.28 samples/sec Loss 2.3353 LearningRate 0.0005 Epoch: 14 Global Step: 25170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:43:43,007-Speed 13825.13 samples/sec Loss 2.3391 LearningRate 0.0005 Epoch: 14 Global Step: 25180 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:44:00,841-Speed 13783.55 samples/sec Loss 2.3231 LearningRate 0.0005 Epoch: 14 Global Step: 25190 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:44:18,733-Speed 13736.47 samples/sec Loss 2.3233 LearningRate 0.0005 Epoch: 14 Global Step: 25200 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:44:36,617-Speed 13742.70 samples/sec Loss 2.3431 LearningRate 0.0005 Epoch: 14 
Global Step: 25210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:44:54,461-Speed 13773.74 samples/sec Loss 2.3402 LearningRate 0.0005 Epoch: 14 Global Step: 25220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:45:12,353-Speed 13736.48 samples/sec Loss 2.3339 LearningRate 0.0005 Epoch: 14 Global Step: 25230 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:45:30,197-Speed 13773.43 samples/sec Loss 2.3246 LearningRate 0.0005 Epoch: 14 Global Step: 25240 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:45:48,002-Speed 13803.50 samples/sec Loss 2.3318 LearningRate 0.0005 Epoch: 14 Global Step: 25250 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:46:05,924-Speed 13714.08 samples/sec Loss 2.3276 LearningRate 0.0005 Epoch: 14 Global Step: 25260 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:46:23,747-Speed 13789.73 samples/sec Loss 2.3361 LearningRate 0.0005 Epoch: 14 Global Step: 25270 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:46:41,544-Speed 13809.93 samples/sec Loss 2.3402 LearningRate 0.0005 Epoch: 14 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:46:59,437-Speed 13735.97 samples/sec Loss 2.3404 LearningRate 0.0005 Epoch: 14 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:47:17,268-Speed 13783.68 samples/sec Loss 2.3404 LearningRate 0.0005 Epoch: 14 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:47:35,133-Speed 13757.96 samples/sec Loss 2.3192 LearningRate 0.0005 Epoch: 14 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:47:52,949-Speed 13795.05 samples/sec Loss 2.3248 LearningRate 0.0005 Epoch: 14 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:48:10,924-Speed 13673.07 samples/sec Loss 2.3343 LearningRate 0.0005 Epoch: 14 Global Step: 25330 Fp16 Grad 
Scale: 65536 Required: 22 hours Training: 2022-03-03 20:48:28,737-Speed 13797.24 samples/sec Loss 2.3226 LearningRate 0.0005 Epoch: 14 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:48:46,543-Speed 13803.32 samples/sec Loss 2.3526 LearningRate 0.0005 Epoch: 14 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:49:04,382-Speed 13777.94 samples/sec Loss 2.3300 LearningRate 0.0005 Epoch: 14 Global Step: 25360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:49:22,223-Speed 13775.75 samples/sec Loss 2.3444 LearningRate 0.0005 Epoch: 14 Global Step: 25370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:49:40,107-Speed 13742.60 samples/sec Loss 2.3156 LearningRate 0.0005 Epoch: 14 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:49:57,909-Speed 13806.14 samples/sec Loss 2.3140 LearningRate 0.0005 Epoch: 14 Global Step: 25390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 20:50:15,655-Speed 13849.32 samples/sec Loss 2.3322 LearningRate 0.0005 Epoch: 14 Global Step: 25400 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:50:33,514-Speed 13762.54 samples/sec Loss 2.2914 LearningRate 0.0005 Epoch: 14 Global Step: 25410 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:50:51,413-Speed 13730.92 samples/sec Loss 2.3252 LearningRate 0.0005 Epoch: 14 Global Step: 25420 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:51:09,170-Speed 13841.68 samples/sec Loss 2.3060 LearningRate 0.0005 Epoch: 14 Global Step: 25430 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:51:27,016-Speed 13772.13 samples/sec Loss 2.3267 LearningRate 0.0005 Epoch: 14 Global Step: 25440 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:51:44,783-Speed 13833.37 samples/sec Loss 2.3266 LearningRate 0.0005 Epoch: 14 Global Step: 25450 Fp16 Grad Scale: 32768 Required: 22 
hours Training: 2022-03-03 20:52:02,658-Speed 13749.17 samples/sec Loss 2.3253 LearningRate 0.0005 Epoch: 14 Global Step: 25460 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:52:20,426-Speed 13833.18 samples/sec Loss 2.3143 LearningRate 0.0005 Epoch: 14 Global Step: 25470 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:52:38,222-Speed 13810.46 samples/sec Loss 2.3133 LearningRate 0.0005 Epoch: 14 Global Step: 25480 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:52:56,081-Speed 13762.20 samples/sec Loss 2.3138 LearningRate 0.0005 Epoch: 14 Global Step: 25490 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:53:13,874-Speed 13812.99 samples/sec Loss 2.3064 LearningRate 0.0005 Epoch: 14 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:53:31,650-Speed 13826.04 samples/sec Loss 2.3310 LearningRate 0.0005 Epoch: 14 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:53:49,413-Speed 13837.27 samples/sec Loss 2.3232 LearningRate 0.0005 Epoch: 14 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:54:07,304-Speed 13737.22 samples/sec Loss 2.3149 LearningRate 0.0005 Epoch: 14 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:54:25,151-Speed 13771.38 samples/sec Loss 2.2989 LearningRate 0.0005 Epoch: 14 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:54:42,962-Speed 13798.57 samples/sec Loss 2.3161 LearningRate 0.0005 Epoch: 14 Global Step: 25550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:55:00,734-Speed 13829.75 samples/sec Loss 2.3078 LearningRate 0.0005 Epoch: 14 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:55:18,538-Speed 13804.09 samples/sec Loss 2.3226 LearningRate 0.0005 Epoch: 14 Global Step: 25570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 
20:55:36,297-Speed 13839.36 samples/sec Loss 2.3119 LearningRate 0.0005 Epoch: 14 Global Step: 25580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:55:54,026-Speed 13863.28 samples/sec Loss 2.3164 LearningRate 0.0005 Epoch: 14 Global Step: 25590 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:56:11,853-Speed 13786.71 samples/sec Loss 2.3144 LearningRate 0.0005 Epoch: 14 Global Step: 25600 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:56:29,665-Speed 13798.15 samples/sec Loss 2.2879 LearningRate 0.0005 Epoch: 14 Global Step: 25610 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:56:47,411-Speed 13849.83 samples/sec Loss 2.3267 LearningRate 0.0005 Epoch: 14 Global Step: 25620 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:57:05,199-Speed 13816.96 samples/sec Loss 2.3140 LearningRate 0.0005 Epoch: 14 Global Step: 25630 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:57:22,932-Speed 13860.18 samples/sec Loss 2.3114 LearningRate 0.0005 Epoch: 14 Global Step: 25640 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:57:40,679-Speed 13848.53 samples/sec Loss 2.2961 LearningRate 0.0005 Epoch: 14 Global Step: 25650 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:57:58,518-Speed 13777.35 samples/sec Loss 2.2979 LearningRate 0.0005 Epoch: 14 Global Step: 25660 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:58:16,206-Speed 13894.75 samples/sec Loss 2.3021 LearningRate 0.0005 Epoch: 14 Global Step: 25670 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:58:34,000-Speed 13812.19 samples/sec Loss 2.2887 LearningRate 0.0005 Epoch: 14 Global Step: 25680 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 20:58:51,741-Speed 13854.37 samples/sec Loss 2.3128 LearningRate 0.0005 Epoch: 14 Global Step: 25690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:59:09,428-Speed 13895.34 
samples/sec Loss 2.3155 LearningRate 0.0005 Epoch: 14 Global Step: 25700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:59:27,175-Speed 13849.03 samples/sec Loss 2.3152 LearningRate 0.0005 Epoch: 14 Global Step: 25710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 20:59:44,893-Speed 13872.26 samples/sec Loss 2.3471 LearningRate 0.0005 Epoch: 14 Global Step: 25720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:00:02,765-Speed 13752.28 samples/sec Loss 2.3217 LearningRate 0.0005 Epoch: 14 Global Step: 25730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:00:20,493-Speed 13863.76 samples/sec Loss 2.3181 LearningRate 0.0005 Epoch: 14 Global Step: 25740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:00:38,189-Speed 13888.45 samples/sec Loss 2.3263 LearningRate 0.0005 Epoch: 14 Global Step: 25750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:00:55,968-Speed 13823.85 samples/sec Loss 2.3223 LearningRate 0.0005 Epoch: 14 Global Step: 25760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:01:13,790-Speed 13790.95 samples/sec Loss 2.3137 LearningRate 0.0005 Epoch: 14 Global Step: 25770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:01:31,467-Speed 13903.74 samples/sec Loss 2.3021 LearningRate 0.0005 Epoch: 14 Global Step: 25780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:01:49,215-Speed 13847.96 samples/sec Loss 2.2963 LearningRate 0.0005 Epoch: 14 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 21:02:06,921-Speed 13882.30 samples/sec Loss 2.2967 LearningRate 0.0005 Epoch: 14 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 21:02:24,633-Speed 13875.96 samples/sec Loss 2.3086 LearningRate 0.0005 Epoch: 14 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 21:02:42,308-Speed 13905.83 samples/sec Loss 2.3071 
LearningRate 0.0005 Epoch: 14 Global Step: 25820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:02:59,970-Speed 13915.21 samples/sec Loss 2.3177 LearningRate 0.0005 Epoch: 14 Global Step: 25830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:03:17,721-Speed 13845.13 samples/sec Loss 2.2981 LearningRate 0.0005 Epoch: 14 Global Step: 25840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:03:35,425-Speed 13883.38 samples/sec Loss 2.2881 LearningRate 0.0005 Epoch: 14 Global Step: 25850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:03:53,115-Speed 13893.62 samples/sec Loss 2.3031 LearningRate 0.0005 Epoch: 14 Global Step: 25860 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:04:10,854-Speed 13855.33 samples/sec Loss 2.3108 LearningRate 0.0005 Epoch: 14 Global Step: 25870 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:04:28,671-Speed 13794.87 samples/sec Loss 2.2811 LearningRate 0.0005 Epoch: 14 Global Step: 25880 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:04:46,350-Speed 13901.86 samples/sec Loss 2.3186 LearningRate 0.0005 Epoch: 14 Global Step: 25890 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:05:04,193-Speed 13774.90 samples/sec Loss 2.3087 LearningRate 0.0005 Epoch: 14 Global Step: 25900 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:05:21,924-Speed 13861.32 samples/sec Loss 2.3380 LearningRate 0.0005 Epoch: 14 Global Step: 25910 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:05:39,669-Speed 13850.09 samples/sec Loss 2.3149 LearningRate 0.0005 Epoch: 14 Global Step: 25920 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:06:48,913-Speed 3549.24 samples/sec Loss 2.2814 LearningRate 0.0005 Epoch: 15 Global Step: 25930 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:07:06,520-Speed 13959.10 samples/sec Loss 2.2586 LearningRate 0.0005 Epoch: 15 
Global Step: 25940 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:07:24,150-Speed 13941.48 samples/sec Loss 2.2808 LearningRate 0.0005 Epoch: 15 Global Step: 25950 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:07:41,816-Speed 13912.41 samples/sec Loss 2.2682 LearningRate 0.0005 Epoch: 15 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:07:59,600-Speed 13820.03 samples/sec Loss 2.2685 LearningRate 0.0005 Epoch: 15 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:08:17,317-Speed 13872.63 samples/sec Loss 2.2817 LearningRate 0.0005 Epoch: 15 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:08:35,007-Speed 13893.67 samples/sec Loss 2.2724 LearningRate 0.0005 Epoch: 15 Global Step: 25990 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:08:52,772-Speed 13834.64 samples/sec Loss 2.2608 LearningRate 0.0005 Epoch: 15 Global Step: 26000 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:09:10,565-Speed 13812.75 samples/sec Loss 2.2709 LearningRate 0.0005 Epoch: 15 Global Step: 26010 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:09:28,260-Speed 13889.71 samples/sec Loss 2.2577 LearningRate 0.0005 Epoch: 15 Global Step: 26020 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:09:45,945-Speed 13897.50 samples/sec Loss 2.2932 LearningRate 0.0005 Epoch: 15 Global Step: 26030 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:10:03,622-Speed 13905.19 samples/sec Loss 2.2642 LearningRate 0.0005 Epoch: 15 Global Step: 26040 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:10:21,332-Speed 13877.67 samples/sec Loss 2.2686 LearningRate 0.0005 Epoch: 15 Global Step: 26050 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:10:39,136-Speed 13804.55 samples/sec Loss 2.2575 LearningRate 0.0005 Epoch: 15 Global Step: 26060 Fp16 Grad 
Scale: 32768 Required: 22 hours Training: 2022-03-03 21:10:56,941-Speed 13803.69 samples/sec Loss 2.2618 LearningRate 0.0005 Epoch: 15 Global Step: 26070 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:11:14,681-Speed 13855.06 samples/sec Loss 2.2695 LearningRate 0.0005 Epoch: 15 Global Step: 26080 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:11:32,407-Speed 13864.92 samples/sec Loss 2.2732 LearningRate 0.0005 Epoch: 15 Global Step: 26090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:11:50,096-Speed 13894.29 samples/sec Loss 2.2832 LearningRate 0.0005 Epoch: 15 Global Step: 26100 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:12:07,801-Speed 13882.11 samples/sec Loss 2.2637 LearningRate 0.0005 Epoch: 15 Global Step: 26110 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:12:25,598-Speed 13810.28 samples/sec Loss 2.2688 LearningRate 0.0005 Epoch: 15 Global Step: 26120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:12:43,308-Speed 13878.15 samples/sec Loss 2.2835 LearningRate 0.0005 Epoch: 15 Global Step: 26130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:13:01,132-Speed 13788.62 samples/sec Loss 2.2808 LearningRate 0.0005 Epoch: 15 Global Step: 26140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:13:18,924-Speed 13814.17 samples/sec Loss 2.2439 LearningRate 0.0005 Epoch: 15 Global Step: 26150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:13:36,667-Speed 13852.32 samples/sec Loss 2.2708 LearningRate 0.0005 Epoch: 15 Global Step: 26160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:13:54,435-Speed 13832.35 samples/sec Loss 2.2790 LearningRate 0.0005 Epoch: 15 Global Step: 26170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:14:12,127-Speed 13891.21 samples/sec Loss 2.2611 LearningRate 0.0005 Epoch: 15 Global Step: 26180 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-03-03 21:14:29,819-Speed 13892.66 samples/sec Loss 2.2826 LearningRate 0.0005 Epoch: 15 Global Step: 26190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-03-03 21:14:47,494-Speed 13905.00 samples/sec Loss 2.2700 LearningRate 0.0005 Epoch: 15 Global Step: 26200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:15:05,247-Speed 13844.92 samples/sec Loss 2.2573 LearningRate 0.0005 Epoch: 15 Global Step: 26210 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:15:23,092-Speed 13772.71 samples/sec Loss 2.2860 LearningRate 0.0005 Epoch: 15 Global Step: 26220 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:15:40,866-Speed 13827.51 samples/sec Loss 2.2854 LearningRate 0.0005 Epoch: 15 Global Step: 26230 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:15:58,547-Speed 13901.03 samples/sec Loss 2.2564 LearningRate 0.0005 Epoch: 15 Global Step: 26240 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:16:16,301-Speed 13843.72 samples/sec Loss 2.2540 LearningRate 0.0005 Epoch: 15 Global Step: 26250 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:16:34,012-Speed 13877.06 samples/sec Loss 2.2624 LearningRate 0.0005 Epoch: 15 Global Step: 26260 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:16:51,812-Speed 13807.29 samples/sec Loss 2.2720 LearningRate 0.0005 Epoch: 15 Global Step: 26270 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:17:09,579-Speed 13833.39 samples/sec Loss 2.2646 LearningRate 0.0005 Epoch: 15 Global Step: 26280 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:17:27,321-Speed 13852.83 samples/sec Loss 2.2487 LearningRate 0.0005 Epoch: 15 Global Step: 26290 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 21:17:45,078-Speed 13841.70 samples/sec Loss 2.2542 LearningRate 0.0005 Epoch: 15 Global Step: 26300 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-03-03 
21:18:02,799-Speed 13868.83 samples/sec Loss 2.2872 LearningRate 0.0005 Epoch: 15 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:18:20,569-Speed 13830.99 samples/sec Loss 2.2630 LearningRate 0.0005 Epoch: 15 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-03-03 21:18:38,371-Speed 13806.61 samples/sec Loss 2.2659 LearningRate 0.0005 Epoch: 15 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:18:56,342-Speed 13676.79 samples/sec Loss 2.2576 LearningRate 0.0005 Epoch: 15 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:19:14,363-Speed 13638.19 samples/sec Loss 2.2755 LearningRate 0.0005 Epoch: 15 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:19:32,362-Speed 13656.62 samples/sec Loss 2.2615 LearningRate 0.0005 Epoch: 15 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:19:50,051-Speed 13893.90 samples/sec Loss 2.2614 LearningRate 0.0005 Epoch: 15 Global Step: 26370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:20:07,852-Speed 13806.89 samples/sec Loss 2.2738 LearningRate 0.0005 Epoch: 15 Global Step: 26380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:20:25,594-Speed 13852.66 samples/sec Loss 2.2594 LearningRate 0.0005 Epoch: 15 Global Step: 26390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:20:43,290-Speed 13888.49 samples/sec Loss 2.2330 LearningRate 0.0005 Epoch: 15 Global Step: 26400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:21:00,984-Speed 13890.63 samples/sec Loss 2.2782 LearningRate 0.0005 Epoch: 15 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 21:21:18,720-Speed 13857.43 samples/sec Loss 2.2600 LearningRate 0.0005 Epoch: 15 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 21:21:36,414-Speed 13890.83 
samples/sec Loss 2.2550 LearningRate 0.0005 Epoch: 15 Global Step: 26430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:21:54,107-Speed 13890.52 samples/sec Loss 2.2499 LearningRate 0.0005 Epoch: 15 Global Step: 26440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:22:11,844-Speed 13856.99 samples/sec Loss 2.2556 LearningRate 0.0005 Epoch: 15 Global Step: 26450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:22:29,560-Speed 13872.90 samples/sec Loss 2.2443 LearningRate 0.0005 Epoch: 15 Global Step: 26460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:22:47,368-Speed 13801.46 samples/sec Loss 2.2542 LearningRate 0.0005 Epoch: 15 Global Step: 26470 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:23:05,200-Speed 13782.84 samples/sec Loss 2.2732 LearningRate 0.0005 Epoch: 15 Global Step: 26480 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:23:22,957-Speed 13840.88 samples/sec Loss 2.2480 LearningRate 0.0005 Epoch: 15 Global Step: 26490 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:23:40,652-Speed 13890.20 samples/sec Loss 2.2529 LearningRate 0.0005 Epoch: 15 Global Step: 26500 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:23:58,351-Speed 13886.06 samples/sec Loss 2.2377 LearningRate 0.0005 Epoch: 15 Global Step: 26510 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:24:16,071-Speed 13869.48 samples/sec Loss 2.2417 LearningRate 0.0005 Epoch: 15 Global Step: 26520 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:24:33,861-Speed 13816.14 samples/sec Loss 2.2598 LearningRate 0.0005 Epoch: 15 Global Step: 26530 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:24:51,626-Speed 13834.31 samples/sec Loss 2.2586 LearningRate 0.0005 Epoch: 15 Global Step: 26540 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:25:09,339-Speed 13875.76 samples/sec Loss 2.2510 
LearningRate 0.0005 Epoch: 15 Global Step: 26550 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:25:27,083-Speed 13851.22 samples/sec Loss 2.2541 LearningRate 0.0005 Epoch: 15 Global Step: 26560 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:25:44,855-Speed 13829.62 samples/sec Loss 2.2194 LearningRate 0.0005 Epoch: 15 Global Step: 26570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:26:02,606-Speed 13846.41 samples/sec Loss 2.2452 LearningRate 0.0005 Epoch: 15 Global Step: 26580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:26:20,408-Speed 13805.32 samples/sec Loss 2.2334 LearningRate 0.0005 Epoch: 15 Global Step: 26590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:26:38,121-Speed 13875.49 samples/sec Loss 2.2547 LearningRate 0.0005 Epoch: 15 Global Step: 26600 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:26:55,864-Speed 13852.08 samples/sec Loss 2.2536 LearningRate 0.0005 Epoch: 15 Global Step: 26610 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:27:13,678-Speed 13796.96 samples/sec Loss 2.2387 LearningRate 0.0005 Epoch: 15 Global Step: 26620 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:27:31,406-Speed 13863.42 samples/sec Loss 2.2336 LearningRate 0.0005 Epoch: 15 Global Step: 26630 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:27:49,282-Speed 13749.32 samples/sec Loss 2.2466 LearningRate 0.0005 Epoch: 15 Global Step: 26640 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:28:07,092-Speed 13799.37 samples/sec Loss 2.2697 LearningRate 0.0005 Epoch: 15 Global Step: 26650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:28:24,890-Speed 13809.53 samples/sec Loss 2.2486 LearningRate 0.0005 Epoch: 15 Global Step: 26660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:28:42,684-Speed 13812.53 samples/sec Loss 2.2335 LearningRate 0.0005 Epoch: 15 
Global Step: 26670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:29:00,447-Speed 13836.08 samples/sec Loss 2.2383 LearningRate 0.0005 Epoch: 15 Global Step: 26680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:29:18,202-Speed 13843.91 samples/sec Loss 2.2530 LearningRate 0.0005 Epoch: 15 Global Step: 26690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:29:35,929-Speed 13864.21 samples/sec Loss 2.2257 LearningRate 0.0005 Epoch: 15 Global Step: 26700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:29:53,673-Speed 13851.29 samples/sec Loss 2.2326 LearningRate 0.0005 Epoch: 15 Global Step: 26710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:30:11,524-Speed 13768.34 samples/sec Loss 2.2278 LearningRate 0.0005 Epoch: 15 Global Step: 26720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:30:29,363-Speed 13776.97 samples/sec Loss 2.2307 LearningRate 0.0005 Epoch: 15 Global Step: 26730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:30:47,053-Speed 13893.62 samples/sec Loss 2.2251 LearningRate 0.0005 Epoch: 15 Global Step: 26740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:31:04,795-Speed 13853.09 samples/sec Loss 2.2476 LearningRate 0.0005 Epoch: 15 Global Step: 26750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:31:22,512-Speed 13872.38 samples/sec Loss 2.2663 LearningRate 0.0005 Epoch: 15 Global Step: 26760 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:31:40,287-Speed 13826.91 samples/sec Loss 2.2332 LearningRate 0.0005 Epoch: 15 Global Step: 26770 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:31:57,971-Speed 13898.35 samples/sec Loss 2.2301 LearningRate 0.0005 Epoch: 15 Global Step: 26780 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:32:15,734-Speed 13836.13 samples/sec Loss 2.2337 LearningRate 0.0005 Epoch: 15 Global Step: 26790 Fp16 Grad 
Scale: 32768 Required: 21 hours Training: 2022-03-03 21:32:33,487-Speed 13844.70 samples/sec Loss 2.2548 LearningRate 0.0005 Epoch: 15 Global Step: 26800 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:32:51,227-Speed 13854.63 samples/sec Loss 2.2525 LearningRate 0.0005 Epoch: 15 Global Step: 26810 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:33:09,048-Speed 13791.44 samples/sec Loss 2.2277 LearningRate 0.0005 Epoch: 15 Global Step: 26820 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:33:26,791-Speed 13852.15 samples/sec Loss 2.2164 LearningRate 0.0005 Epoch: 15 Global Step: 26830 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:33:44,604-Speed 13798.81 samples/sec Loss 2.2144 LearningRate 0.0005 Epoch: 15 Global Step: 26840 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:34:02,371-Speed 13833.49 samples/sec Loss 2.2457 LearningRate 0.0005 Epoch: 15 Global Step: 26850 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:34:20,106-Speed 13857.60 samples/sec Loss 2.2269 LearningRate 0.0005 Epoch: 15 Global Step: 26860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:34:37,946-Speed 13777.01 samples/sec Loss 2.2047 LearningRate 0.0005 Epoch: 15 Global Step: 26870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:34:55,784-Speed 13778.54 samples/sec Loss 2.2236 LearningRate 0.0005 Epoch: 15 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:35:13,514-Speed 13862.23 samples/sec Loss 2.2455 LearningRate 0.0005 Epoch: 15 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:35:31,340-Speed 13787.49 samples/sec Loss 2.2452 LearningRate 0.0005 Epoch: 15 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:35:49,125-Speed 13818.85 samples/sec Loss 2.2176 LearningRate 0.0005 Epoch: 15 Global Step: 26910 Fp16 Grad Scale: 32768 Required: 21 hours 
Training: 2022-03-03 21:36:06,913-Speed 13817.11 samples/sec Loss 2.2377 LearningRate 0.0005 Epoch: 15 Global Step: 26920 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:36:24,707-Speed 13812.17 samples/sec Loss 2.2427 LearningRate 0.0005 Epoch: 15 Global Step: 26930 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:36:42,603-Speed 13733.90 samples/sec Loss 2.2212 LearningRate 0.0005 Epoch: 15 Global Step: 26940 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:37:00,314-Speed 13877.65 samples/sec Loss 2.2178 LearningRate 0.0005 Epoch: 15 Global Step: 26950 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:37:18,077-Speed 13837.32 samples/sec Loss 2.2157 LearningRate 0.0005 Epoch: 15 Global Step: 26960 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:37:35,866-Speed 13815.71 samples/sec Loss 2.2103 LearningRate 0.0005 Epoch: 15 Global Step: 26970 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:37:53,616-Speed 13847.37 samples/sec Loss 2.2060 LearningRate 0.0005 Epoch: 15 Global Step: 26980 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:38:11,327-Speed 13876.83 samples/sec Loss 2.2122 LearningRate 0.0005 Epoch: 15 Global Step: 26990 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:38:29,055-Speed 13864.21 samples/sec Loss 2.2086 LearningRate 0.0005 Epoch: 15 Global Step: 27000 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:38:46,800-Speed 13850.60 samples/sec Loss 2.2083 LearningRate 0.0005 Epoch: 15 Global Step: 27010 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 21:39:04,576-Speed 13825.69 samples/sec Loss 2.1937 LearningRate 0.0005 Epoch: 15 Global Step: 27020 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:39:22,359-Speed 13820.92 samples/sec Loss 2.2043 LearningRate 0.0005 Epoch: 15 Global Step: 27030 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 
21:39:40,011-Speed 13923.45 samples/sec Loss 2.2229 LearningRate 0.0005 Epoch: 15 Global Step: 27040 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:39:57,802-Speed 13814.87 samples/sec Loss 2.2096 LearningRate 0.0005 Epoch: 15 Global Step: 27050 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:40:15,553-Speed 13845.78 samples/sec Loss 2.2217 LearningRate 0.0005 Epoch: 15 Global Step: 27060 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:40:33,275-Speed 13868.42 samples/sec Loss 2.2208 LearningRate 0.0005 Epoch: 15 Global Step: 27070 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:40:51,052-Speed 13825.64 samples/sec Loss 2.2057 LearningRate 0.0005 Epoch: 15 Global Step: 27080 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:41:08,803-Speed 13848.84 samples/sec Loss 2.2295 LearningRate 0.0005 Epoch: 15 Global Step: 27090 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:41:26,566-Speed 13836.27 samples/sec Loss 2.1998 LearningRate 0.0005 Epoch: 15 Global Step: 27100 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:41:44,390-Speed 13788.79 samples/sec Loss 2.1963 LearningRate 0.0005 Epoch: 15 Global Step: 27110 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:42:02,116-Speed 13865.60 samples/sec Loss 2.2146 LearningRate 0.0005 Epoch: 15 Global Step: 27120 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:42:19,852-Speed 13857.04 samples/sec Loss 2.2266 LearningRate 0.0005 Epoch: 15 Global Step: 27130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:42:37,723-Speed 13752.88 samples/sec Loss 2.2069 LearningRate 0.0005 Epoch: 15 Global Step: 27140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:42:55,461-Speed 13855.74 samples/sec Loss 2.2148 LearningRate 0.0005 Epoch: 15 Global Step: 27150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:43:13,205-Speed 13852.23 
samples/sec Loss 2.2094 LearningRate 0.0005 Epoch: 15 Global Step: 27160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:43:30,916-Speed 13877.13 samples/sec Loss 2.2054 LearningRate 0.0005 Epoch: 15 Global Step: 27170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:43:48,649-Speed 13859.83 samples/sec Loss 2.2057 LearningRate 0.0005 Epoch: 15 Global Step: 27180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:44:06,471-Speed 13790.54 samples/sec Loss 2.2259 LearningRate 0.0005 Epoch: 15 Global Step: 27190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:44:24,240-Speed 13831.71 samples/sec Loss 2.2169 LearningRate 0.0005 Epoch: 15 Global Step: 27200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:44:42,009-Speed 13831.55 samples/sec Loss 2.2037 LearningRate 0.0005 Epoch: 15 Global Step: 27210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:44:59,735-Speed 13865.40 samples/sec Loss 2.2098 LearningRate 0.0005 Epoch: 15 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 21:45:17,524-Speed 13816.17 samples/sec Loss 2.2193 LearningRate 0.0005 Epoch: 15 Global Step: 27230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:45:35,264-Speed 13854.39 samples/sec Loss 2.2031 LearningRate 0.0005 Epoch: 15 Global Step: 27240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:45:52,982-Speed 13871.36 samples/sec Loss 2.1951 LearningRate 0.0005 Epoch: 15 Global Step: 27250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:46:10,736-Speed 13843.54 samples/sec Loss 2.1931 LearningRate 0.0005 Epoch: 15 Global Step: 27260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:46:28,492-Speed 13841.63 samples/sec Loss 2.1752 LearningRate 0.0005 Epoch: 15 Global Step: 27270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:46:46,275-Speed 13821.84 samples/sec Loss 2.1712 
LearningRate 0.0005 Epoch: 15 Global Step: 27280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:47:04,076-Speed 13807.04 samples/sec Loss 2.2022 LearningRate 0.0005 Epoch: 15 Global Step: 27290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:47:21,820-Speed 13851.01 samples/sec Loss 2.2140 LearningRate 0.0005 Epoch: 15 Global Step: 27300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:47:39,551-Speed 13860.85 samples/sec Loss 2.2049 LearningRate 0.0005 Epoch: 15 Global Step: 27310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:47:57,334-Speed 13821.35 samples/sec Loss 2.1902 LearningRate 0.0005 Epoch: 15 Global Step: 27320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:48:15,334-Speed 13653.70 samples/sec Loss 2.2091 LearningRate 0.0005 Epoch: 15 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 21:48:33,083-Speed 13847.53 samples/sec Loss 2.1959 LearningRate 0.0005 Epoch: 15 Global Step: 27340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:48:50,896-Speed 13797.42 samples/sec Loss 2.1867 LearningRate 0.0005 Epoch: 15 Global Step: 27350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:49:08,658-Speed 13837.04 samples/sec Loss 2.1984 LearningRate 0.0005 Epoch: 15 Global Step: 27360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:49:26,392-Speed 13859.38 samples/sec Loss 2.1926 LearningRate 0.0005 Epoch: 15 Global Step: 27370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:49:44,113-Speed 13869.27 samples/sec Loss 2.1798 LearningRate 0.0005 Epoch: 15 Global Step: 27380 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:50:01,808-Speed 13889.27 samples/sec Loss 2.1876 LearningRate 0.0004 Epoch: 15 Global Step: 27390 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:50:19,639-Speed 13783.98 samples/sec Loss 2.2054 LearningRate 0.0004 Epoch: 15 
Global Step: 27400 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:50:37,386-Speed 13848.41 samples/sec Loss 2.2057 LearningRate 0.0004 Epoch: 15 Global Step: 27410 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:50:55,144-Speed 13840.84 samples/sec Loss 2.2118 LearningRate 0.0004 Epoch: 15 Global Step: 27420 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:51:12,904-Speed 13838.31 samples/sec Loss 2.2048 LearningRate 0.0004 Epoch: 15 Global Step: 27430 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:51:30,619-Speed 13874.54 samples/sec Loss 2.2027 LearningRate 0.0004 Epoch: 15 Global Step: 27440 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:51:48,311-Speed 13892.00 samples/sec Loss 2.1861 LearningRate 0.0004 Epoch: 15 Global Step: 27450 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:52:06,072-Speed 13838.20 samples/sec Loss 2.1889 LearningRate 0.0004 Epoch: 15 Global Step: 27460 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:52:23,866-Speed 13812.07 samples/sec Loss 2.1855 LearningRate 0.0004 Epoch: 15 Global Step: 27470 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:52:41,757-Speed 13737.16 samples/sec Loss 2.1925 LearningRate 0.0004 Epoch: 15 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:52:59,557-Speed 13808.00 samples/sec Loss 2.1676 LearningRate 0.0004 Epoch: 15 Global Step: 27490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:53:17,299-Speed 13852.71 samples/sec Loss 2.2039 LearningRate 0.0004 Epoch: 15 Global Step: 27500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:53:35,002-Speed 13883.20 samples/sec Loss 2.1829 LearningRate 0.0004 Epoch: 15 Global Step: 27510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:53:52,766-Speed 13835.78 samples/sec Loss 2.1985 LearningRate 0.0004 Epoch: 15 Global Step: 27520 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-03-03 21:54:10,506-Speed 13854.18 samples/sec Loss 2.2028 LearningRate 0.0004 Epoch: 15 Global Step: 27530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:54:28,222-Speed 13873.43 samples/sec Loss 2.1772 LearningRate 0.0004 Epoch: 15 Global Step: 27540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:54:45,917-Speed 13889.30 samples/sec Loss 2.1949 LearningRate 0.0004 Epoch: 15 Global Step: 27550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:55:03,646-Speed 13862.77 samples/sec Loss 2.1868 LearningRate 0.0004 Epoch: 15 Global Step: 27560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:55:21,309-Speed 13914.96 samples/sec Loss 2.1939 LearningRate 0.0004 Epoch: 15 Global Step: 27570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:55:39,046-Speed 13856.29 samples/sec Loss 2.2007 LearningRate 0.0004 Epoch: 15 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 21:55:56,854-Speed 13801.93 samples/sec Loss 2.2042 LearningRate 0.0004 Epoch: 15 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 21:56:14,512-Speed 13918.29 samples/sec Loss 2.1997 LearningRate 0.0004 Epoch: 15 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:56:32,242-Speed 13862.25 samples/sec Loss 2.2056 LearningRate 0.0004 Epoch: 15 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:56:49,916-Speed 13906.40 samples/sec Loss 2.2008 LearningRate 0.0004 Epoch: 15 Global Step: 27620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:57:07,626-Speed 13877.93 samples/sec Loss 2.1994 LearningRate 0.0004 Epoch: 15 Global Step: 27630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 21:57:25,327-Speed 13885.33 samples/sec Loss 2.2024 LearningRate 0.0004 Epoch: 15 Global Step: 27640 Fp16 Grad Scale: 32768 Required: 21 
hours Training: 2022-03-03 21:57:43,012-Speed 13896.96 samples/sec Loss 2.2014 LearningRate 0.0004 Epoch: 15 Global Step: 27650 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:58:52,213-Speed 3551.46 samples/sec Loss 2.1591 LearningRate 0.0004 Epoch: 16 Global Step: 27660 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:59:09,912-Speed 13886.78 samples/sec Loss 2.1601 LearningRate 0.0004 Epoch: 16 Global Step: 27670 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:59:27,637-Speed 13865.47 samples/sec Loss 2.1613 LearningRate 0.0004 Epoch: 16 Global Step: 27680 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 21:59:45,359-Speed 13868.98 samples/sec Loss 2.1445 LearningRate 0.0004 Epoch: 16 Global Step: 27690 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:00:03,152-Speed 13812.64 samples/sec Loss 2.1483 LearningRate 0.0004 Epoch: 16 Global Step: 27700 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:00:20,983-Speed 13783.74 samples/sec Loss 2.1789 LearningRate 0.0004 Epoch: 16 Global Step: 27710 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:00:38,797-Speed 13796.62 samples/sec Loss 2.1616 LearningRate 0.0004 Epoch: 16 Global Step: 27720 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:00:56,496-Speed 13886.29 samples/sec Loss 2.1540 LearningRate 0.0004 Epoch: 16 Global Step: 27730 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:01:14,245-Speed 13847.68 samples/sec Loss 2.1600 LearningRate 0.0004 Epoch: 16 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:01:31,981-Speed 13859.56 samples/sec Loss 2.1760 LearningRate 0.0004 Epoch: 16 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:01:49,681-Speed 13885.91 samples/sec Loss 2.1615 LearningRate 0.0004 Epoch: 16 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 
22:02:07,526-Speed 13772.41 samples/sec Loss 2.1704 LearningRate 0.0004 Epoch: 16 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:02:25,333-Speed 13802.62 samples/sec Loss 2.1620 LearningRate 0.0004 Epoch: 16 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:02:43,129-Speed 13810.19 samples/sec Loss 2.1487 LearningRate 0.0004 Epoch: 16 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:03:00,878-Speed 13847.18 samples/sec Loss 2.1607 LearningRate 0.0004 Epoch: 16 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:03:18,688-Speed 13800.33 samples/sec Loss 2.1693 LearningRate 0.0004 Epoch: 16 Global Step: 27810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:03:36,604-Speed 13718.48 samples/sec Loss 2.1549 LearningRate 0.0004 Epoch: 16 Global Step: 27820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:03:54,494-Speed 13737.69 samples/sec Loss 2.1573 LearningRate 0.0004 Epoch: 16 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:04:12,270-Speed 13826.59 samples/sec Loss 2.1837 LearningRate 0.0004 Epoch: 16 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:04:30,330-Speed 13608.50 samples/sec Loss 2.1758 LearningRate 0.0004 Epoch: 16 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:04:48,547-Speed 13491.36 samples/sec Loss 2.1577 LearningRate 0.0004 Epoch: 16 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:05:06,671-Speed 13561.05 samples/sec Loss 2.1531 LearningRate 0.0004 Epoch: 16 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:05:24,977-Speed 13426.23 samples/sec Loss 2.1605 LearningRate 0.0004 Epoch: 16 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:05:43,118-Speed 13547.64 
samples/sec Loss 2.1713 LearningRate 0.0004 Epoch: 16 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:06:01,231-Speed 13569.46 samples/sec Loss 2.1927 LearningRate 0.0004 Epoch: 16 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:06:19,337-Speed 13575.16 samples/sec Loss 2.1579 LearningRate 0.0004 Epoch: 16 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:06:37,478-Speed 13549.44 samples/sec Loss 2.1437 LearningRate 0.0004 Epoch: 16 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:06:55,647-Speed 13526.97 samples/sec Loss 2.1700 LearningRate 0.0004 Epoch: 16 Global Step: 27930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:07:13,669-Speed 13637.88 samples/sec Loss 2.2111 LearningRate 0.0004 Epoch: 16 Global Step: 27940 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:07:31,816-Speed 13543.82 samples/sec Loss 2.1564 LearningRate 0.0004 Epoch: 16 Global Step: 27950 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:07:49,943-Speed 13558.64 samples/sec Loss 2.1498 LearningRate 0.0004 Epoch: 16 Global Step: 27960 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:08:08,032-Speed 13587.41 samples/sec Loss 2.1496 LearningRate 0.0004 Epoch: 16 Global Step: 27970 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:08:26,162-Speed 13556.67 samples/sec Loss 2.1456 LearningRate 0.0004 Epoch: 16 Global Step: 27980 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:08:44,266-Speed 13576.53 samples/sec Loss 2.1577 LearningRate 0.0004 Epoch: 16 Global Step: 27990 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:09:02,444-Speed 13520.37 samples/sec Loss 2.1723 LearningRate 0.0004 Epoch: 16 Global Step: 28000 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:09:20,587-Speed 13546.90 samples/sec Loss 2.1730 
LearningRate 0.0004 Epoch: 16 Global Step: 28010 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:09:38,726-Speed 13549.19 samples/sec Loss 2.1675 LearningRate 0.0004 Epoch: 16 Global Step: 28020 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:09:56,808-Speed 13592.18 samples/sec Loss 2.1485 LearningRate 0.0004 Epoch: 16 Global Step: 28030 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:10:14,956-Speed 13543.08 samples/sec Loss 2.1378 LearningRate 0.0004 Epoch: 16 Global Step: 28040 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:10:33,065-Speed 13571.87 samples/sec Loss 2.1487 LearningRate 0.0004 Epoch: 16 Global Step: 28050 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:10:51,285-Speed 13489.41 samples/sec Loss 2.1592 LearningRate 0.0004 Epoch: 16 Global Step: 28060 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:11:09,432-Speed 13543.58 samples/sec Loss 2.1425 LearningRate 0.0004 Epoch: 16 Global Step: 28070 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:11:27,526-Speed 13583.49 samples/sec Loss 2.1262 LearningRate 0.0004 Epoch: 16 Global Step: 28080 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:11:45,699-Speed 13524.12 samples/sec Loss 2.1511 LearningRate 0.0004 Epoch: 16 Global Step: 28090 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:12:03,862-Speed 13531.76 samples/sec Loss 2.1548 LearningRate 0.0004 Epoch: 16 Global Step: 28100 Fp16 Grad Scale: 16384 Required: 21 hours Training: 2022-03-03 22:12:21,952-Speed 13585.92 samples/sec Loss 2.1307 LearningRate 0.0004 Epoch: 16 Global Step: 28110 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:12:40,049-Speed 13581.02 samples/sec Loss 2.1525 LearningRate 0.0004 Epoch: 16 Global Step: 28120 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:12:58,145-Speed 13581.68 samples/sec Loss 2.1537 LearningRate 0.0004 Epoch: 16 
Global Step: 28130 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:13:16,312-Speed 13529.09 samples/sec Loss 2.1466 LearningRate 0.0004 Epoch: 16 Global Step: 28140 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:13:34,437-Speed 13559.91 samples/sec Loss 2.1738 LearningRate 0.0004 Epoch: 16 Global Step: 28150 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:13:52,541-Speed 13575.79 samples/sec Loss 2.1670 LearningRate 0.0004 Epoch: 16 Global Step: 28160 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:14:10,663-Speed 13562.52 samples/sec Loss 2.1318 LearningRate 0.0004 Epoch: 16 Global Step: 28170 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:14:28,791-Speed 13557.44 samples/sec Loss 2.1464 LearningRate 0.0004 Epoch: 16 Global Step: 28180 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:14:46,874-Speed 13591.18 samples/sec Loss 2.1585 LearningRate 0.0004 Epoch: 16 Global Step: 28190 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:15:05,049-Speed 13523.42 samples/sec Loss 2.1399 LearningRate 0.0004 Epoch: 16 Global Step: 28200 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-03-03 22:15:23,145-Speed 13581.22 samples/sec Loss 2.1471 LearningRate 0.0004 Epoch: 16 Global Step: 28210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:15:41,254-Speed 13572.06 samples/sec Loss 2.1474 LearningRate 0.0004 Epoch: 16 Global Step: 28220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:15:59,362-Speed 13573.01 samples/sec Loss 2.1499 LearningRate 0.0004 Epoch: 16 Global Step: 28230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:16:17,429-Speed 13604.00 samples/sec Loss 2.1186 LearningRate 0.0004 Epoch: 16 Global Step: 28240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:16:35,540-Speed 13570.51 samples/sec Loss 2.1396 LearningRate 0.0004 Epoch: 16 Global Step: 28250 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-03-03 22:16:53,644-Speed 13575.59 samples/sec Loss 2.1428 LearningRate 0.0004 Epoch: 16 Global Step: 28260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:17:11,744-Speed 13579.50 samples/sec Loss 2.1509 LearningRate 0.0004 Epoch: 16 Global Step: 28270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:17:29,983-Speed 13474.68 samples/sec Loss 2.1284 LearningRate 0.0004 Epoch: 16 Global Step: 28280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:17:48,140-Speed 13536.09 samples/sec Loss 2.1428 LearningRate 0.0004 Epoch: 16 Global Step: 28290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:18:06,279-Speed 13549.91 samples/sec Loss 2.1371 LearningRate 0.0004 Epoch: 16 Global Step: 28300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:18:24,380-Speed 13577.93 samples/sec Loss 2.1412 LearningRate 0.0004 Epoch: 16 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-03-03 22:18:42,397-Speed 13643.73 samples/sec Loss 2.1202 LearningRate 0.0004 Epoch: 16 Global Step: 28320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-03-03 22:19:00,506-Speed 13571.56 samples/sec Loss 2.1348 LearningRate 0.0004 Epoch: 16 Global Step: 28330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:19:18,589-Speed 13591.57 samples/sec Loss 2.1664 LearningRate 0.0004 Epoch: 16 Global Step: 28340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:19:36,772-Speed 13516.98 samples/sec Loss 2.1343 LearningRate 0.0004 Epoch: 16 Global Step: 28350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:19:54,928-Speed 13536.46 samples/sec Loss 2.1306 LearningRate 0.0004 Epoch: 16 Global Step: 28360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:20:13,014-Speed 13589.79 samples/sec Loss 2.1267 LearningRate 0.0004 Epoch: 16 Global Step: 28370 Fp16 Grad Scale: 65536 Required: 20 
hours Training: 2022-03-03 22:20:31,073-Speed 13609.66 samples/sec Loss 2.1528 LearningRate 0.0004 Epoch: 16 Global Step: 28380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:20:49,165-Speed 13584.33 samples/sec Loss 2.1192 LearningRate 0.0004 Epoch: 16 Global Step: 28390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:21:07,338-Speed 13524.92 samples/sec Loss 2.1254 LearningRate 0.0004 Epoch: 16 Global Step: 28400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:21:25,406-Speed 13602.33 samples/sec Loss 2.1498 LearningRate 0.0004 Epoch: 16 Global Step: 28410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:21:43,504-Speed 13580.13 samples/sec Loss 2.1244 LearningRate 0.0004 Epoch: 16 Global Step: 28420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:22:01,650-Speed 13544.48 samples/sec Loss 2.1258 LearningRate 0.0004 Epoch: 16 Global Step: 28430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:22:19,744-Speed 13583.51 samples/sec Loss 2.1410 LearningRate 0.0004 Epoch: 16 Global Step: 28440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:22:37,818-Speed 13598.29 samples/sec Loss 2.1441 LearningRate 0.0004 Epoch: 16 Global Step: 28450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:22:55,855-Speed 13627.62 samples/sec Loss 2.1259 LearningRate 0.0004 Epoch: 16 Global Step: 28460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:23:13,954-Speed 13579.30 samples/sec Loss 2.1111 LearningRate 0.0004 Epoch: 16 Global Step: 28470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:23:32,086-Speed 13555.15 samples/sec Loss 2.1286 LearningRate 0.0004 Epoch: 16 Global Step: 28480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:23:50,131-Speed 13620.52 samples/sec Loss 2.1287 LearningRate 0.0004 Epoch: 16 Global Step: 28490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 
22:24:08,224-Speed 13583.45 samples/sec Loss 2.1261 LearningRate 0.0004 Epoch: 16 Global Step: 28500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:24:26,303-Speed 13595.02 samples/sec Loss 2.1159 LearningRate 0.0004 Epoch: 16 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:24:44,028-Speed 13865.96 samples/sec Loss 2.1300 LearningRate 0.0004 Epoch: 16 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:25:01,792-Speed 13835.75 samples/sec Loss 2.1236 LearningRate 0.0004 Epoch: 16 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:25:19,485-Speed 13891.32 samples/sec Loss 2.1095 LearningRate 0.0004 Epoch: 16 Global Step: 28540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:25:37,199-Speed 13873.89 samples/sec Loss 2.1463 LearningRate 0.0004 Epoch: 16 Global Step: 28550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:25:54,971-Speed 13830.16 samples/sec Loss 2.1169 LearningRate 0.0004 Epoch: 16 Global Step: 28560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:26:12,796-Speed 13787.69 samples/sec Loss 2.1080 LearningRate 0.0004 Epoch: 16 Global Step: 28570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:26:30,618-Speed 13790.56 samples/sec Loss 2.1132 LearningRate 0.0004 Epoch: 16 Global Step: 28580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:26:48,390-Speed 13829.24 samples/sec Loss 2.1210 LearningRate 0.0004 Epoch: 16 Global Step: 28590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:27:06,170-Speed 13823.13 samples/sec Loss 2.1141 LearningRate 0.0004 Epoch: 16 Global Step: 28600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:27:24,007-Speed 13779.64 samples/sec Loss 2.1291 LearningRate 0.0004 Epoch: 16 Global Step: 28610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:27:41,937-Speed 13707.37 
samples/sec Loss 2.1151 LearningRate 0.0004 Epoch: 16 Global Step: 28620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:27:59,744-Speed 13802.38 samples/sec Loss 2.1169 LearningRate 0.0004 Epoch: 16 Global Step: 28630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:28:17,609-Speed 13757.94 samples/sec Loss 2.1280 LearningRate 0.0004 Epoch: 16 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:28:35,469-Speed 13761.21 samples/sec Loss 2.1262 LearningRate 0.0004 Epoch: 16 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:28:53,275-Speed 13803.11 samples/sec Loss 2.1108 LearningRate 0.0004 Epoch: 16 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:29:11,143-Speed 13754.94 samples/sec Loss 2.1209 LearningRate 0.0004 Epoch: 16 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:29:28,975-Speed 13782.66 samples/sec Loss 2.1316 LearningRate 0.0004 Epoch: 16 Global Step: 28680 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:29:46,860-Speed 13742.67 samples/sec Loss 2.1196 LearningRate 0.0004 Epoch: 16 Global Step: 28690 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:30:04,669-Speed 13800.90 samples/sec Loss 2.1178 LearningRate 0.0004 Epoch: 16 Global Step: 28700 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:30:22,345-Speed 13905.52 samples/sec Loss 2.1331 LearningRate 0.0004 Epoch: 16 Global Step: 28710 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:30:40,089-Speed 13851.15 samples/sec Loss 2.1076 LearningRate 0.0004 Epoch: 16 Global Step: 28720 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:30:57,899-Speed 13801.70 samples/sec Loss 2.1102 LearningRate 0.0004 Epoch: 16 Global Step: 28730 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:31:15,692-Speed 13813.33 samples/sec Loss 2.1314 
LearningRate 0.0004 Epoch: 16 Global Step: 28740 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:31:33,434-Speed 13852.68 samples/sec Loss 2.1073 LearningRate 0.0004 Epoch: 16 Global Step: 28750 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:31:51,246-Speed 13798.03 samples/sec Loss 2.1177 LearningRate 0.0004 Epoch: 16 Global Step: 28760 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:32:08,950-Speed 13882.71 samples/sec Loss 2.1063 LearningRate 0.0004 Epoch: 16 Global Step: 28770 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:32:26,695-Speed 13850.63 samples/sec Loss 2.1142 LearningRate 0.0004 Epoch: 16 Global Step: 28780 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:32:44,345-Speed 13924.53 samples/sec Loss 2.1076 LearningRate 0.0004 Epoch: 16 Global Step: 28790 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:33:02,214-Speed 13754.67 samples/sec Loss 2.1214 LearningRate 0.0004 Epoch: 16 Global Step: 28800 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:33:19,942-Speed 13863.50 samples/sec Loss 2.0970 LearningRate 0.0004 Epoch: 16 Global Step: 28810 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:33:37,785-Speed 13775.23 samples/sec Loss 2.1030 LearningRate 0.0004 Epoch: 16 Global Step: 28820 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:33:55,608-Speed 13789.54 samples/sec Loss 2.1224 LearningRate 0.0004 Epoch: 16 Global Step: 28830 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:34:13,475-Speed 13756.20 samples/sec Loss 2.1185 LearningRate 0.0004 Epoch: 16 Global Step: 28840 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:34:31,289-Speed 13796.57 samples/sec Loss 2.1069 LearningRate 0.0004 Epoch: 16 Global Step: 28850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:34:48,966-Speed 13903.79 samples/sec Loss 2.0847 LearningRate 0.0004 Epoch: 16 
Global Step: 28860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:35:06,725-Speed 13840.00 samples/sec Loss 2.1037 LearningRate 0.0004 Epoch: 16 Global Step: 28870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:35:24,497-Speed 13828.82 samples/sec Loss 2.1164 LearningRate 0.0004 Epoch: 16 Global Step: 28880 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:35:42,191-Speed 13892.64 samples/sec Loss 2.1006 LearningRate 0.0004 Epoch: 16 Global Step: 28890 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:35:59,915-Speed 13867.03 samples/sec Loss 2.0932 LearningRate 0.0004 Epoch: 16 Global Step: 28900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:36:17,634-Speed 13870.64 samples/sec Loss 2.0946 LearningRate 0.0004 Epoch: 16 Global Step: 28910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:36:35,325-Speed 13893.02 samples/sec Loss 2.0867 LearningRate 0.0004 Epoch: 16 Global Step: 28920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:36:53,101-Speed 13825.63 samples/sec Loss 2.0976 LearningRate 0.0004 Epoch: 16 Global Step: 28930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:37:10,886-Speed 13819.12 samples/sec Loss 2.1070 LearningRate 0.0004 Epoch: 16 Global Step: 28940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:37:28,708-Speed 13791.48 samples/sec Loss 2.1063 LearningRate 0.0004 Epoch: 16 Global Step: 28950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:37:46,393-Speed 13897.36 samples/sec Loss 2.0944 LearningRate 0.0004 Epoch: 16 Global Step: 28960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:38:04,097-Speed 13881.82 samples/sec Loss 2.1050 LearningRate 0.0004 Epoch: 16 Global Step: 28970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:38:21,792-Speed 13889.94 samples/sec Loss 2.0873 LearningRate 0.0004 Epoch: 16 Global Step: 28980 Fp16 Grad 
Scale: 32768 Required: 20 hours Training: 2022-03-03 22:38:39,559-Speed 13833.06 samples/sec Loss 2.1015 LearningRate 0.0004 Epoch: 16 Global Step: 28990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:38:57,508-Speed 13692.90 samples/sec Loss 2.0792 LearningRate 0.0004 Epoch: 16 Global Step: 29000 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:39:15,195-Speed 13895.63 samples/sec Loss 2.0813 LearningRate 0.0004 Epoch: 16 Global Step: 29010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:39:32,900-Speed 13881.91 samples/sec Loss 2.0974 LearningRate 0.0004 Epoch: 16 Global Step: 29020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:39:50,636-Speed 13858.07 samples/sec Loss 2.0802 LearningRate 0.0004 Epoch: 16 Global Step: 29030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:40:08,389-Speed 13843.73 samples/sec Loss 2.1026 LearningRate 0.0004 Epoch: 16 Global Step: 29040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:40:26,097-Speed 13879.41 samples/sec Loss 2.0833 LearningRate 0.0004 Epoch: 16 Global Step: 29050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:40:43,822-Speed 13866.55 samples/sec Loss 2.0799 LearningRate 0.0004 Epoch: 16 Global Step: 29060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:41:01,632-Speed 13800.18 samples/sec Loss 2.0915 LearningRate 0.0004 Epoch: 16 Global Step: 29070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:41:19,370-Speed 13855.89 samples/sec Loss 2.0927 LearningRate 0.0004 Epoch: 16 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:41:37,054-Speed 13897.52 samples/sec Loss 2.0878 LearningRate 0.0004 Epoch: 16 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:41:54,748-Speed 13891.01 samples/sec Loss 2.1090 LearningRate 0.0004 Epoch: 16 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-03-03 22:42:12,662-Speed 13721.01 samples/sec Loss 2.0941 LearningRate 0.0004 Epoch: 16 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:42:30,416-Speed 13843.21 samples/sec Loss 2.0762 LearningRate 0.0004 Epoch: 16 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:42:48,139-Speed 13867.36 samples/sec Loss 2.0876 LearningRate 0.0004 Epoch: 16 Global Step: 29130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:43:05,888-Speed 13847.54 samples/sec Loss 2.0993 LearningRate 0.0004 Epoch: 16 Global Step: 29140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:43:23,649-Speed 13838.94 samples/sec Loss 2.1095 LearningRate 0.0004 Epoch: 16 Global Step: 29150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:43:41,388-Speed 13854.99 samples/sec Loss 2.0997 LearningRate 0.0004 Epoch: 16 Global Step: 29160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:43:59,091-Speed 13883.57 samples/sec Loss 2.0964 LearningRate 0.0004 Epoch: 16 Global Step: 29170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:44:16,923-Speed 13782.28 samples/sec Loss 2.0831 LearningRate 0.0004 Epoch: 16 Global Step: 29180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-03 22:44:34,625-Speed 13884.88 samples/sec Loss 2.0959 LearningRate 0.0004 Epoch: 16 Global Step: 29190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-03-03 22:44:52,355-Speed 13861.70 samples/sec Loss 2.0808 LearningRate 0.0004 Epoch: 16 Global Step: 29200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:45:10,133-Speed 13826.85 samples/sec Loss 2.0807 LearningRate 0.0004 Epoch: 16 Global Step: 29210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:45:27,902-Speed 13831.62 samples/sec Loss 2.0795 LearningRate 0.0004 Epoch: 16 Global Step: 29220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 
22:45:45,642-Speed 13854.29 samples/sec Loss 2.0801 LearningRate 0.0004 Epoch: 16 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:46:03,426-Speed 13819.96 samples/sec Loss 2.0807 LearningRate 0.0004 Epoch: 16 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:46:21,174-Speed 13848.36 samples/sec Loss 2.1162 LearningRate 0.0004 Epoch: 16 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:46:38,873-Speed 13886.87 samples/sec Loss 2.0870 LearningRate 0.0004 Epoch: 16 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:46:56,607-Speed 13858.75 samples/sec Loss 2.0854 LearningRate 0.0004 Epoch: 16 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:47:14,473-Speed 13757.02 samples/sec Loss 2.0787 LearningRate 0.0004 Epoch: 16 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:47:32,249-Speed 13826.02 samples/sec Loss 2.1017 LearningRate 0.0004 Epoch: 16 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:47:50,042-Speed 13813.55 samples/sec Loss 2.1121 LearningRate 0.0004 Epoch: 16 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:48:07,822-Speed 13822.97 samples/sec Loss 2.0864 LearningRate 0.0004 Epoch: 16 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:48:25,510-Speed 13894.88 samples/sec Loss 2.0718 LearningRate 0.0004 Epoch: 16 Global Step: 29320 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:48:43,272-Speed 13837.37 samples/sec Loss 2.1064 LearningRate 0.0004 Epoch: 16 Global Step: 29330 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:49:00,974-Speed 13884.44 samples/sec Loss 2.1116 LearningRate 0.0004 Epoch: 16 Global Step: 29340 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:49:18,736-Speed 13836.71 
samples/sec Loss 2.0969 LearningRate 0.0004 Epoch: 16 Global Step: 29350 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:49:36,424-Speed 13895.21 samples/sec Loss 2.0892 LearningRate 0.0004 Epoch: 16 Global Step: 29360 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:49:54,296-Speed 13752.15 samples/sec Loss 2.0945 LearningRate 0.0004 Epoch: 16 Global Step: 29370 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:50:12,068-Speed 13830.38 samples/sec Loss 2.1086 LearningRate 0.0004 Epoch: 16 Global Step: 29380 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:51:20,994-Speed 3565.67 samples/sec Loss 2.0606 LearningRate 0.0004 Epoch: 17 Global Step: 29390 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:51:38,609-Speed 13952.29 samples/sec Loss 2.0539 LearningRate 0.0004 Epoch: 17 Global Step: 29400 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:51:56,277-Speed 13911.06 samples/sec Loss 2.0486 LearningRate 0.0004 Epoch: 17 Global Step: 29410 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:52:14,098-Speed 13791.53 samples/sec Loss 2.0524 LearningRate 0.0004 Epoch: 17 Global Step: 29420 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:52:31,915-Speed 13794.64 samples/sec Loss 2.0353 LearningRate 0.0004 Epoch: 17 Global Step: 29430 Fp16 Grad Scale: 8192 Required: 20 hours Training: 2022-03-03 22:52:49,671-Speed 13841.26 samples/sec Loss 2.0704 LearningRate 0.0004 Epoch: 17 Global Step: 29440 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:53:07,447-Speed 13827.02 samples/sec Loss 2.0468 LearningRate 0.0004 Epoch: 17 Global Step: 29450 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:53:25,209-Speed 13837.54 samples/sec Loss 2.0715 LearningRate 0.0004 Epoch: 17 Global Step: 29460 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:53:42,937-Speed 13863.69 samples/sec Loss 2.0571 LearningRate 
0.0004 Epoch: 17 Global Step: 29470 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:54:00,575-Speed 13933.93 samples/sec Loss 2.0687 LearningRate 0.0004 Epoch: 17 Global Step: 29480 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:54:18,274-Speed 13886.93 samples/sec Loss 2.0511 LearningRate 0.0004 Epoch: 17 Global Step: 29490 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:54:36,047-Speed 13828.31 samples/sec Loss 2.0501 LearningRate 0.0004 Epoch: 17 Global Step: 29500 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:54:53,754-Speed 13880.10 samples/sec Loss 2.0489 LearningRate 0.0004 Epoch: 17 Global Step: 29510 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:55:11,457-Speed 13883.38 samples/sec Loss 2.0505 LearningRate 0.0004 Epoch: 17 Global Step: 29520 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:55:29,149-Speed 13892.37 samples/sec Loss 2.0659 LearningRate 0.0004 Epoch: 17 Global Step: 29530 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 22:55:46,867-Speed 13871.34 samples/sec Loss 2.0536 LearningRate 0.0004 Epoch: 17 Global Step: 29540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:56:04,599-Speed 13860.69 samples/sec Loss 2.0658 LearningRate 0.0004 Epoch: 17 Global Step: 29550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:56:22,309-Speed 13877.39 samples/sec Loss 2.0633 LearningRate 0.0004 Epoch: 17 Global Step: 29560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:56:39,980-Speed 13909.27 samples/sec Loss 2.0605 LearningRate 0.0004 Epoch: 17 Global Step: 29570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:56:57,727-Speed 13848.97 samples/sec Loss 2.0815 LearningRate 0.0004 Epoch: 17 Global Step: 29580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:57:15,393-Speed 13912.05 samples/sec Loss 2.0721 LearningRate 0.0004 Epoch: 17 Global Step: 
29590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:57:33,091-Speed 13887.12 samples/sec Loss 2.0691 LearningRate 0.0004 Epoch: 17 Global Step: 29600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:57:50,778-Speed 13896.54 samples/sec Loss 2.0497 LearningRate 0.0004 Epoch: 17 Global Step: 29610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:58:08,494-Speed 13873.13 samples/sec Loss 2.0406 LearningRate 0.0004 Epoch: 17 Global Step: 29620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:58:26,455-Speed 13683.57 samples/sec Loss 2.0635 LearningRate 0.0004 Epoch: 17 Global Step: 29630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 22:58:44,164-Speed 13878.28 samples/sec Loss 2.0603 LearningRate 0.0004 Epoch: 17 Global Step: 29640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:59:02,025-Speed 13760.83 samples/sec Loss 2.0702 LearningRate 0.0004 Epoch: 17 Global Step: 29650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:59:19,778-Speed 13844.00 samples/sec Loss 2.0569 LearningRate 0.0004 Epoch: 17 Global Step: 29660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:59:37,526-Speed 13848.19 samples/sec Loss 2.0683 LearningRate 0.0004 Epoch: 17 Global Step: 29670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 22:59:55,250-Speed 13866.91 samples/sec Loss 2.0609 LearningRate 0.0004 Epoch: 17 Global Step: 29680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:00:13,038-Speed 13816.97 samples/sec Loss 2.0740 LearningRate 0.0004 Epoch: 17 Global Step: 29690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:00:30,697-Speed 13918.42 samples/sec Loss 2.0839 LearningRate 0.0004 Epoch: 17 Global Step: 29700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:00:48,454-Speed 13840.67 samples/sec Loss 2.0532 LearningRate 0.0004 Epoch: 17 Global Step: 29710 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-03-03 23:01:06,303-Speed 13769.38 samples/sec Loss 2.0464 LearningRate 0.0004 Epoch: 17 Global Step: 29720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:01:24,001-Speed 13887.47 samples/sec Loss 2.0466 LearningRate 0.0004 Epoch: 17 Global Step: 29730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:01:41,776-Speed 13828.05 samples/sec Loss 2.0560 LearningRate 0.0004 Epoch: 17 Global Step: 29740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:01:59,464-Speed 13894.92 samples/sec Loss 2.0625 LearningRate 0.0004 Epoch: 17 Global Step: 29750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:02:17,263-Speed 13807.84 samples/sec Loss 2.0582 LearningRate 0.0004 Epoch: 17 Global Step: 29760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:02:34,995-Speed 13860.51 samples/sec Loss 2.0623 LearningRate 0.0004 Epoch: 17 Global Step: 29770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:02:52,716-Speed 13869.36 samples/sec Loss 2.0516 LearningRate 0.0004 Epoch: 17 Global Step: 29780 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:03:10,410-Speed 13890.85 samples/sec Loss 2.0534 LearningRate 0.0004 Epoch: 17 Global Step: 29790 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:03:28,176-Speed 13833.72 samples/sec Loss 2.0521 LearningRate 0.0004 Epoch: 17 Global Step: 29800 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:03:45,903-Speed 13864.69 samples/sec Loss 2.0596 LearningRate 0.0004 Epoch: 17 Global Step: 29810 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:04:03,740-Speed 13779.02 samples/sec Loss 2.0569 LearningRate 0.0004 Epoch: 17 Global Step: 29820 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:04:21,499-Speed 13839.97 samples/sec Loss 2.0601 LearningRate 0.0004 Epoch: 17 Global Step: 29830 Fp16 Grad Scale: 32768 Required: 20 hours Training: 
2022-03-03 23:04:39,265-Speed 13833.51 samples/sec Loss 2.0439 LearningRate 0.0004 Epoch: 17 Global Step: 29840 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:04:57,031-Speed 13833.88 samples/sec Loss 2.0510 LearningRate 0.0004 Epoch: 17 Global Step: 29850 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:05:14,774-Speed 13852.76 samples/sec Loss 2.0392 LearningRate 0.0004 Epoch: 17 Global Step: 29860 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:05:32,468-Speed 13890.07 samples/sec Loss 2.0439 LearningRate 0.0004 Epoch: 17 Global Step: 29870 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:05:50,211-Speed 13852.22 samples/sec Loss 2.0483 LearningRate 0.0004 Epoch: 17 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:06:07,991-Speed 13823.52 samples/sec Loss 2.0418 LearningRate 0.0004 Epoch: 17 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:06:25,721-Speed 13862.55 samples/sec Loss 2.0440 LearningRate 0.0004 Epoch: 17 Global Step: 29900 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:06:43,489-Speed 13832.44 samples/sec Loss 2.0451 LearningRate 0.0004 Epoch: 17 Global Step: 29910 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:07:01,178-Speed 13894.29 samples/sec Loss 2.0534 LearningRate 0.0004 Epoch: 17 Global Step: 29920 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:07:18,897-Speed 13870.45 samples/sec Loss 2.0326 LearningRate 0.0004 Epoch: 17 Global Step: 29930 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:07:36,596-Speed 13886.62 samples/sec Loss 2.0448 LearningRate 0.0004 Epoch: 17 Global Step: 29940 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:07:54,368-Speed 13829.82 samples/sec Loss 2.0482 LearningRate 0.0004 Epoch: 17 Global Step: 29950 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:08:12,145-Speed 
13827.24 samples/sec Loss 2.0377 LearningRate 0.0004 Epoch: 17 Global Step: 29960 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:08:29,935-Speed 13815.22 samples/sec Loss 2.0367 LearningRate 0.0004 Epoch: 17 Global Step: 29970 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:08:47,724-Speed 13816.44 samples/sec Loss 2.0453 LearningRate 0.0004 Epoch: 17 Global Step: 29980 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:09:05,471-Speed 13848.85 samples/sec Loss 2.0343 LearningRate 0.0004 Epoch: 17 Global Step: 29990 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:09:23,226-Speed 13843.14 samples/sec Loss 2.0421 LearningRate 0.0004 Epoch: 17 Global Step: 30000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:09:40,904-Speed 13903.22 samples/sec Loss 2.0325 LearningRate 0.0004 Epoch: 17 Global Step: 30010 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:09:58,623-Speed 13870.83 samples/sec Loss 2.0410 LearningRate 0.0004 Epoch: 17 Global Step: 30020 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:10:16,374-Speed 13845.89 samples/sec Loss 2.0267 LearningRate 0.0004 Epoch: 17 Global Step: 30030 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:10:34,101-Speed 13865.96 samples/sec Loss 2.0308 LearningRate 0.0004 Epoch: 17 Global Step: 30040 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:10:51,936-Speed 13780.78 samples/sec Loss 2.0434 LearningRate 0.0004 Epoch: 17 Global Step: 30050 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:11:09,659-Speed 13867.10 samples/sec Loss 2.0340 LearningRate 0.0004 Epoch: 17 Global Step: 30060 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:11:27,363-Speed 13882.57 samples/sec Loss 2.0414 LearningRate 0.0004 Epoch: 17 Global Step: 30070 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:11:45,139-Speed 13826.14 samples/sec Loss 
2.0265 LearningRate 0.0004 Epoch: 17 Global Step: 30080 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:12:03,011-Speed 13752.10 samples/sec Loss 2.0475 LearningRate 0.0004 Epoch: 17 Global Step: 30090 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:12:20,762-Speed 13845.82 samples/sec Loss 2.0414 LearningRate 0.0004 Epoch: 17 Global Step: 30100 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:12:38,549-Speed 13817.59 samples/sec Loss 2.0334 LearningRate 0.0004 Epoch: 17 Global Step: 30110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:12:56,290-Speed 13854.13 samples/sec Loss 2.0462 LearningRate 0.0004 Epoch: 17 Global Step: 30120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:13:14,028-Speed 13855.23 samples/sec Loss 2.0299 LearningRate 0.0004 Epoch: 17 Global Step: 30130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:13:31,846-Speed 13795.20 samples/sec Loss 2.0345 LearningRate 0.0004 Epoch: 17 Global Step: 30140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:13:49,652-Speed 13803.06 samples/sec Loss 2.0368 LearningRate 0.0004 Epoch: 17 Global Step: 30150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:14:07,444-Speed 13814.37 samples/sec Loss 2.0275 LearningRate 0.0004 Epoch: 17 Global Step: 30160 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:14:25,244-Speed 13807.58 samples/sec Loss 2.0399 LearningRate 0.0004 Epoch: 17 Global Step: 30170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-03-03 23:14:43,009-Speed 13835.71 samples/sec Loss 2.0259 LearningRate 0.0004 Epoch: 17 Global Step: 30180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:15:00,859-Speed 13768.75 samples/sec Loss 2.0458 LearningRate 0.0004 Epoch: 17 Global Step: 30190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:15:18,641-Speed 13821.88 samples/sec Loss 2.0400 LearningRate 0.0004 
Epoch: 17 Global Step: 30200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:15:36,427-Speed 13818.12 samples/sec Loss 2.0223 LearningRate 0.0004 Epoch: 17 Global Step: 30210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:15:54,231-Speed 13804.60 samples/sec Loss 2.0327 LearningRate 0.0004 Epoch: 17 Global Step: 30220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:16:11,970-Speed 13855.27 samples/sec Loss 2.0344 LearningRate 0.0004 Epoch: 17 Global Step: 30230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:16:29,720-Speed 13846.77 samples/sec Loss 2.0346 LearningRate 0.0004 Epoch: 17 Global Step: 30240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-03-03 23:16:47,403-Speed 13898.60 samples/sec Loss 2.0312 LearningRate 0.0004 Epoch: 17 Global Step: 30250 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 23:17:05,181-Speed 13824.73 samples/sec Loss 2.0373 LearningRate 0.0004 Epoch: 17 Global Step: 30260 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 23:17:22,887-Speed 13882.46 samples/sec Loss 2.0358 LearningRate 0.0004 Epoch: 17 Global Step: 30270 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 23:17:40,640-Speed 13845.51 samples/sec Loss 2.0256 LearningRate 0.0004 Epoch: 17 Global Step: 30280 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 23:17:58,398-Speed 13840.06 samples/sec Loss 2.0223 LearningRate 0.0004 Epoch: 17 Global Step: 30290 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 23:18:16,182-Speed 13821.25 samples/sec Loss 2.0166 LearningRate 0.0004 Epoch: 17 Global Step: 30300 Fp16 Grad Scale: 16384 Required: 20 hours Training: 2022-03-03 23:18:33,944-Speed 13837.19 samples/sec Loss 2.0080 LearningRate 0.0004 Epoch: 17 Global Step: 30310 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:18:51,713-Speed 13832.23 samples/sec Loss 2.0127 LearningRate 0.0004 Epoch: 17 Global Step: 30320 
Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:19:09,447-Speed 13858.96 samples/sec Loss 2.0081 LearningRate 0.0004 Epoch: 17 Global Step: 30330 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:19:27,194-Speed 13848.85 samples/sec Loss 2.0158 LearningRate 0.0004 Epoch: 17 Global Step: 30340 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:19:44,856-Speed 13915.81 samples/sec Loss 2.0140 LearningRate 0.0004 Epoch: 17 Global Step: 30350 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:20:02,635-Speed 13823.98 samples/sec Loss 2.0230 LearningRate 0.0004 Epoch: 17 Global Step: 30360 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:20:20,470-Speed 13783.82 samples/sec Loss 2.0269 LearningRate 0.0004 Epoch: 17 Global Step: 30370 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:20:38,331-Speed 13760.36 samples/sec Loss 2.0288 LearningRate 0.0004 Epoch: 17 Global Step: 30380 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:20:56,364-Speed 13628.61 samples/sec Loss 2.0257 LearningRate 0.0004 Epoch: 17 Global Step: 30390 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:21:14,492-Speed 13557.96 samples/sec Loss 2.0243 LearningRate 0.0004 Epoch: 17 Global Step: 30400 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:21:32,610-Speed 13566.50 samples/sec Loss 1.9984 LearningRate 0.0004 Epoch: 17 Global Step: 30410 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:21:50,725-Speed 13567.39 samples/sec Loss 2.0038 LearningRate 0.0004 Epoch: 17 Global Step: 30420 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:22:08,902-Speed 13520.87 samples/sec Loss 2.0148 LearningRate 0.0004 Epoch: 17 Global Step: 30430 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:22:27,018-Speed 13567.11 samples/sec Loss 2.0129 LearningRate 0.0004 Epoch: 17 Global Step: 30440 Fp16 Grad Scale: 32768 
Required: 19 hours Training: 2022-03-03 23:22:45,143-Speed 13560.38 samples/sec Loss 2.0300 LearningRate 0.0004 Epoch: 17 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:23:03,282-Speed 13549.37 samples/sec Loss 2.0112 LearningRate 0.0004 Epoch: 17 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:23:21,351-Speed 13601.67 samples/sec Loss 2.0019 LearningRate 0.0004 Epoch: 17 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:23:39,449-Speed 13580.44 samples/sec Loss 2.0199 LearningRate 0.0004 Epoch: 17 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:23:57,595-Speed 13544.53 samples/sec Loss 2.0253 LearningRate 0.0004 Epoch: 17 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:24:15,947-Speed 13392.68 samples/sec Loss 1.9933 LearningRate 0.0004 Epoch: 17 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:24:34,059-Speed 13569.55 samples/sec Loss 2.0125 LearningRate 0.0004 Epoch: 17 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:24:52,168-Speed 13571.62 samples/sec Loss 2.0132 LearningRate 0.0004 Epoch: 17 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:25:10,303-Speed 13553.71 samples/sec Loss 2.0029 LearningRate 0.0004 Epoch: 17 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:25:28,465-Speed 13532.29 samples/sec Loss 2.0182 LearningRate 0.0004 Epoch: 17 Global Step: 30540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:25:46,295-Speed 13784.30 samples/sec Loss 2.0113 LearningRate 0.0004 Epoch: 17 Global Step: 30550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:26:04,109-Speed 13796.42 samples/sec Loss 2.0113 LearningRate 0.0004 Epoch: 17 Global Step: 30560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-03-03 23:26:21,889-Speed 13823.58 samples/sec Loss 2.0058 LearningRate 0.0004 Epoch: 17 Global Step: 30570 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:26:39,575-Speed 13896.27 samples/sec Loss 2.0280 LearningRate 0.0004 Epoch: 17 Global Step: 30580 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:26:57,347-Speed 13829.98 samples/sec Loss 2.0114 LearningRate 0.0004 Epoch: 17 Global Step: 30590 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:27:15,143-Speed 13810.19 samples/sec Loss 1.9884 LearningRate 0.0004 Epoch: 17 Global Step: 30600 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:27:32,826-Speed 13899.07 samples/sec Loss 1.9905 LearningRate 0.0004 Epoch: 17 Global Step: 30610 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:27:50,547-Speed 13868.93 samples/sec Loss 2.0188 LearningRate 0.0004 Epoch: 17 Global Step: 30620 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:28:08,346-Speed 13808.51 samples/sec Loss 2.0040 LearningRate 0.0004 Epoch: 17 Global Step: 30630 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:28:26,062-Speed 13872.97 samples/sec Loss 1.9999 LearningRate 0.0004 Epoch: 17 Global Step: 30640 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:28:43,887-Speed 13788.59 samples/sec Loss 2.0021 LearningRate 0.0004 Epoch: 17 Global Step: 30650 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:29:01,587-Speed 13886.00 samples/sec Loss 2.0202 LearningRate 0.0004 Epoch: 17 Global Step: 30660 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:29:19,301-Speed 13874.02 samples/sec Loss 2.0090 LearningRate 0.0004 Epoch: 17 Global Step: 30670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:29:37,007-Speed 13880.68 samples/sec Loss 2.0080 LearningRate 0.0004 Epoch: 17 Global Step: 30680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:29:54,716-Speed 
13879.16 samples/sec Loss 1.9930 LearningRate 0.0004 Epoch: 17 Global Step: 30690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:30:12,628-Speed 13721.43 samples/sec Loss 2.0007 LearningRate 0.0004 Epoch: 17 Global Step: 30700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:30:30,411-Speed 13820.65 samples/sec Loss 1.9941 LearningRate 0.0004 Epoch: 17 Global Step: 30710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:30:48,131-Speed 13869.57 samples/sec Loss 2.0135 LearningRate 0.0004 Epoch: 17 Global Step: 30720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:31:05,863-Speed 13861.02 samples/sec Loss 1.9868 LearningRate 0.0004 Epoch: 17 Global Step: 30730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:31:23,615-Speed 13844.82 samples/sec Loss 2.0109 LearningRate 0.0004 Epoch: 17 Global Step: 30740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:31:41,335-Speed 13869.88 samples/sec Loss 1.9903 LearningRate 0.0004 Epoch: 17 Global Step: 30750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:31:59,009-Speed 13906.64 samples/sec Loss 1.9806 LearningRate 0.0004 Epoch: 17 Global Step: 30760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:32:16,785-Speed 13826.36 samples/sec Loss 1.9876 LearningRate 0.0004 Epoch: 17 Global Step: 30770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:32:34,528-Speed 13851.55 samples/sec Loss 1.9848 LearningRate 0.0004 Epoch: 17 Global Step: 30780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:32:52,337-Speed 13801.12 samples/sec Loss 1.9767 LearningRate 0.0004 Epoch: 17 Global Step: 30790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:33:10,047-Speed 13877.04 samples/sec Loss 2.0047 LearningRate 0.0004 Epoch: 17 Global Step: 30800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:33:27,791-Speed 13851.40 samples/sec Loss 
2.0122 LearningRate 0.0004 Epoch: 17 Global Step: 30810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:33:45,547-Speed 13843.18 samples/sec Loss 1.9812 LearningRate 0.0004 Epoch: 17 Global Step: 30820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:34:03,352-Speed 13804.37 samples/sec Loss 2.0058 LearningRate 0.0004 Epoch: 17 Global Step: 30830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:34:21,092-Speed 13854.07 samples/sec Loss 1.9895 LearningRate 0.0004 Epoch: 17 Global Step: 30840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:34:38,865-Speed 13828.66 samples/sec Loss 1.9968 LearningRate 0.0004 Epoch: 17 Global Step: 30850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:34:56,652-Speed 13818.12 samples/sec Loss 1.9871 LearningRate 0.0004 Epoch: 17 Global Step: 30860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:35:14,443-Speed 13814.42 samples/sec Loss 1.9935 LearningRate 0.0004 Epoch: 17 Global Step: 30870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:35:32,149-Speed 13880.56 samples/sec Loss 1.9924 LearningRate 0.0004 Epoch: 17 Global Step: 30880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:35:49,911-Speed 13836.91 samples/sec Loss 1.9972 LearningRate 0.0004 Epoch: 17 Global Step: 30890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:36:07,720-Speed 13801.06 samples/sec Loss 2.0041 LearningRate 0.0004 Epoch: 17 Global Step: 30900 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:36:25,491-Speed 13830.32 samples/sec Loss 1.9902 LearningRate 0.0004 Epoch: 17 Global Step: 30910 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:36:43,317-Speed 13787.45 samples/sec Loss 1.9946 LearningRate 0.0004 Epoch: 17 Global Step: 30920 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:37:01,015-Speed 13886.72 samples/sec Loss 1.9964 LearningRate 0.0004 
Epoch: 17 Global Step: 30930 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:37:18,746-Speed 13861.58 samples/sec Loss 2.0104 LearningRate 0.0004 Epoch: 17 Global Step: 30940 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:37:36,644-Speed 13732.03 samples/sec Loss 2.0088 LearningRate 0.0004 Epoch: 17 Global Step: 30950 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:37:54,414-Speed 13831.44 samples/sec Loss 1.9947 LearningRate 0.0004 Epoch: 17 Global Step: 30960 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:38:12,264-Speed 13768.68 samples/sec Loss 1.9749 LearningRate 0.0004 Epoch: 17 Global Step: 30970 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:38:29,937-Speed 13906.88 samples/sec Loss 1.9866 LearningRate 0.0004 Epoch: 17 Global Step: 30980 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:38:47,649-Speed 13877.90 samples/sec Loss 1.9926 LearningRate 0.0004 Epoch: 17 Global Step: 30990 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:39:05,419-Speed 13830.63 samples/sec Loss 2.0100 LearningRate 0.0004 Epoch: 17 Global Step: 31000 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:39:23,143-Speed 13867.18 samples/sec Loss 1.9980 LearningRate 0.0004 Epoch: 17 Global Step: 31010 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:39:40,893-Speed 13846.67 samples/sec Loss 2.0016 LearningRate 0.0004 Epoch: 17 Global Step: 31020 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:39:58,599-Speed 13880.95 samples/sec Loss 1.9983 LearningRate 0.0004 Epoch: 17 Global Step: 31030 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:40:16,331-Speed 13860.76 samples/sec Loss 1.9952 LearningRate 0.0004 Epoch: 17 Global Step: 31040 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:40:33,997-Speed 13912.29 samples/sec Loss 2.0023 LearningRate 0.0004 Epoch: 17 Global Step: 31050 
Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:40:51,719-Speed 13868.54 samples/sec Loss 2.0021 LearningRate 0.0004 Epoch: 17 Global Step: 31060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:41:09,578-Speed 13762.03 samples/sec Loss 2.0028 LearningRate 0.0004 Epoch: 17 Global Step: 31070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:41:27,316-Speed 13856.11 samples/sec Loss 2.0093 LearningRate 0.0004 Epoch: 17 Global Step: 31080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:41:45,061-Speed 13850.13 samples/sec Loss 2.0092 LearningRate 0.0004 Epoch: 17 Global Step: 31090 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:42:02,780-Speed 13870.52 samples/sec Loss 2.0100 LearningRate 0.0004 Epoch: 17 Global Step: 31100 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:42:20,634-Speed 13766.48 samples/sec Loss 2.0264 LearningRate 0.0004 Epoch: 17 Global Step: 31110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:43:29,381-Speed 3577.08 samples/sec Loss 1.9687 LearningRate 0.0004 Epoch: 18 Global Step: 31120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:43:47,017-Speed 13935.98 samples/sec Loss 1.9601 LearningRate 0.0004 Epoch: 18 Global Step: 31130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:44:04,673-Speed 13920.06 samples/sec Loss 1.9373 LearningRate 0.0004 Epoch: 18 Global Step: 31140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:44:22,340-Speed 13911.91 samples/sec Loss 1.9642 LearningRate 0.0004 Epoch: 18 Global Step: 31150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:44:40,044-Speed 13882.02 samples/sec Loss 1.9680 LearningRate 0.0004 Epoch: 18 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:44:57,694-Speed 13925.16 samples/sec Loss 1.9562 LearningRate 0.0004 Epoch: 18 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 
19 hours Training: 2022-03-03 23:45:15,538-Speed 13773.29 samples/sec Loss 1.9605 LearningRate 0.0004 Epoch: 18 Global Step: 31180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:45:33,430-Speed 13737.17 samples/sec Loss 1.9577 LearningRate 0.0004 Epoch: 18 Global Step: 31190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:45:51,291-Speed 13759.94 samples/sec Loss 1.9645 LearningRate 0.0004 Epoch: 18 Global Step: 31200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:46:09,095-Speed 13804.39 samples/sec Loss 1.9483 LearningRate 0.0004 Epoch: 18 Global Step: 31210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:46:26,902-Speed 13802.57 samples/sec Loss 1.9643 LearningRate 0.0004 Epoch: 18 Global Step: 31220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:46:44,686-Speed 13819.78 samples/sec Loss 1.9568 LearningRate 0.0004 Epoch: 18 Global Step: 31230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:47:02,516-Speed 13785.37 samples/sec Loss 1.9703 LearningRate 0.0004 Epoch: 18 Global Step: 31240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:47:20,345-Speed 13785.08 samples/sec Loss 1.9666 LearningRate 0.0004 Epoch: 18 Global Step: 31250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:47:38,113-Speed 13832.57 samples/sec Loss 1.9553 LearningRate 0.0004 Epoch: 18 Global Step: 31260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:47:55,932-Speed 13793.34 samples/sec Loss 1.9547 LearningRate 0.0004 Epoch: 18 Global Step: 31270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:48:13,730-Speed 13808.69 samples/sec Loss 1.9721 LearningRate 0.0004 Epoch: 18 Global Step: 31280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:48:31,485-Speed 13842.91 samples/sec Loss 1.9587 LearningRate 0.0004 Epoch: 18 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 
23:48:49,184-Speed 13885.94 samples/sec Loss 1.9625 LearningRate 0.0004 Epoch: 18 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:49:06,932-Speed 13848.51 samples/sec Loss 1.9597 LearningRate 0.0004 Epoch: 18 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:49:24,643-Speed 13876.77 samples/sec Loss 1.9433 LearningRate 0.0004 Epoch: 18 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:49:42,369-Speed 13865.19 samples/sec Loss 1.9524 LearningRate 0.0004 Epoch: 18 Global Step: 31330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:50:00,084-Speed 13874.16 samples/sec Loss 1.9665 LearningRate 0.0004 Epoch: 18 Global Step: 31340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:50:17,742-Speed 13918.65 samples/sec Loss 1.9646 LearningRate 0.0004 Epoch: 18 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:50:35,466-Speed 13867.15 samples/sec Loss 1.9604 LearningRate 0.0004 Epoch: 18 Global Step: 31360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-03 23:50:53,208-Speed 13852.56 samples/sec Loss 1.9569 LearningRate 0.0004 Epoch: 18 Global Step: 31370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-03-03 23:51:10,927-Speed 13870.31 samples/sec Loss 1.9651 LearningRate 0.0004 Epoch: 18 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:51:28,645-Speed 13873.92 samples/sec Loss 1.9664 LearningRate 0.0004 Epoch: 18 Global Step: 31390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:51:46,317-Speed 13907.28 samples/sec Loss 1.9735 LearningRate 0.0004 Epoch: 18 Global Step: 31400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:52:04,097-Speed 13823.30 samples/sec Loss 1.9559 LearningRate 0.0004 Epoch: 18 Global Step: 31410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:52:21,844-Speed 13848.96 
samples/sec Loss 1.9589 LearningRate 0.0004 Epoch: 18 Global Step: 31420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:52:39,610-Speed 13834.24 samples/sec Loss 1.9856 LearningRate 0.0004 Epoch: 18 Global Step: 31430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:52:57,368-Speed 13839.89 samples/sec Loss 1.9694 LearningRate 0.0004 Epoch: 18 Global Step: 31440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:53:15,074-Speed 13881.48 samples/sec Loss 1.9591 LearningRate 0.0004 Epoch: 18 Global Step: 31450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:53:32,751-Speed 13903.55 samples/sec Loss 1.9773 LearningRate 0.0004 Epoch: 18 Global Step: 31460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:53:50,465-Speed 13874.66 samples/sec Loss 1.9677 LearningRate 0.0004 Epoch: 18 Global Step: 31470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:54:08,227-Speed 13837.44 samples/sec Loss 1.9686 LearningRate 0.0004 Epoch: 18 Global Step: 31480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-03 23:54:25,929-Speed 13883.53 samples/sec Loss 1.9558 LearningRate 0.0004 Epoch: 18 Global Step: 31490 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:54:43,640-Speed 13877.09 samples/sec Loss 1.9443 LearningRate 0.0004 Epoch: 18 Global Step: 31500 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:55:01,384-Speed 13852.37 samples/sec Loss 1.9804 LearningRate 0.0004 Epoch: 18 Global Step: 31510 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:55:19,076-Speed 13892.37 samples/sec Loss 1.9732 LearningRate 0.0004 Epoch: 18 Global Step: 31520 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:55:36,824-Speed 13847.69 samples/sec Loss 1.9557 LearningRate 0.0004 Epoch: 18 Global Step: 31530 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:55:54,567-Speed 13852.20 samples/sec Loss 1.9536 
LearningRate 0.0004 Epoch: 18 Global Step: 31540 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:56:12,335-Speed 13832.15 samples/sec Loss 1.9578 LearningRate 0.0004 Epoch: 18 Global Step: 31550 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:56:30,160-Speed 13788.03 samples/sec Loss 1.9585 LearningRate 0.0004 Epoch: 18 Global Step: 31560 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-03 23:56:47,882-Speed 13868.51 samples/sec Loss 1.9501 LearningRate 0.0004 Epoch: 18 Global Step: 31570 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:57:05,689-Speed 13802.66 samples/sec Loss 1.9470 LearningRate 0.0004 Epoch: 18 Global Step: 31580 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:57:23,473-Speed 13819.62 samples/sec Loss 1.9495 LearningRate 0.0004 Epoch: 18 Global Step: 31590 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:57:41,283-Speed 13800.23 samples/sec Loss 1.9584 LearningRate 0.0004 Epoch: 18 Global Step: 31600 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:57:59,090-Speed 13802.10 samples/sec Loss 1.9483 LearningRate 0.0004 Epoch: 18 Global Step: 31610 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:58:17,046-Speed 13687.75 samples/sec Loss 1.9482 LearningRate 0.0004 Epoch: 18 Global Step: 31620 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:58:34,864-Speed 13793.18 samples/sec Loss 1.9546 LearningRate 0.0004 Epoch: 18 Global Step: 31630 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:58:52,708-Speed 13773.63 samples/sec Loss 1.9496 LearningRate 0.0004 Epoch: 18 Global Step: 31640 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:59:10,544-Speed 13780.35 samples/sec Loss 1.9556 LearningRate 0.0004 Epoch: 18 Global Step: 31650 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:59:28,530-Speed 13664.78 samples/sec Loss 1.9537 LearningRate 0.0004 Epoch: 18 
Global Step: 31660 Fp16 Grad Scale: 16384 Required: 19 hours Training: 2022-03-03 23:59:46,408-Speed 13747.20 samples/sec Loss 1.9577 LearningRate 0.0004 Epoch: 18 Global Step: 31670 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:00:04,246-Speed 13778.19 samples/sec Loss 1.9414 LearningRate 0.0004 Epoch: 18 Global Step: 31680 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:00:22,149-Speed 13730.38 samples/sec Loss 1.9565 LearningRate 0.0004 Epoch: 18 Global Step: 31690 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:00:39,982-Speed 13782.15 samples/sec Loss 1.9424 LearningRate 0.0004 Epoch: 18 Global Step: 31700 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:00:57,924-Speed 13698.14 samples/sec Loss 1.9558 LearningRate 0.0004 Epoch: 18 Global Step: 31710 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:01:15,802-Speed 13747.76 samples/sec Loss 1.9477 LearningRate 0.0004 Epoch: 18 Global Step: 31720 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:01:33,714-Speed 13721.09 samples/sec Loss 1.9353 LearningRate 0.0004 Epoch: 18 Global Step: 31730 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:01:51,595-Speed 13745.27 samples/sec Loss 1.9442 LearningRate 0.0004 Epoch: 18 Global Step: 31740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:02:09,419-Speed 13788.71 samples/sec Loss 1.9473 LearningRate 0.0004 Epoch: 18 Global Step: 31750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:02:27,396-Speed 13671.36 samples/sec Loss 1.9444 LearningRate 0.0004 Epoch: 18 Global Step: 31760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:02:45,457-Speed 13608.15 samples/sec Loss 1.9517 LearningRate 0.0004 Epoch: 18 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:03:03,609-Speed 13539.94 samples/sec Loss 1.9403 LearningRate 0.0004 Epoch: 18 Global Step: 31780 Fp16 Grad 
Scale: 65536 Required: 19 hours Training: 2022-03-04 00:03:21,795-Speed 13514.66 samples/sec Loss 1.9339 LearningRate 0.0004 Epoch: 18 Global Step: 31790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:03:40,104-Speed 13423.24 samples/sec Loss 1.9320 LearningRate 0.0004 Epoch: 18 Global Step: 31800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:03:58,268-Speed 13531.04 samples/sec Loss 1.9463 LearningRate 0.0004 Epoch: 18 Global Step: 31810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:04:16,395-Speed 13558.72 samples/sec Loss 1.9542 LearningRate 0.0004 Epoch: 18 Global Step: 31820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:04:34,528-Speed 13554.01 samples/sec Loss 1.9418 LearningRate 0.0004 Epoch: 18 Global Step: 31830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:04:52,648-Speed 13563.58 samples/sec Loss 1.9516 LearningRate 0.0004 Epoch: 18 Global Step: 31840 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:05:10,666-Speed 13641.27 samples/sec Loss 1.9485 LearningRate 0.0004 Epoch: 18 Global Step: 31850 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:05:28,445-Speed 13823.54 samples/sec Loss 1.9465 LearningRate 0.0004 Epoch: 18 Global Step: 31860 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:05:46,176-Speed 13861.71 samples/sec Loss 1.9190 LearningRate 0.0004 Epoch: 18 Global Step: 31870 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:06:04,150-Speed 13673.89 samples/sec Loss 1.9428 LearningRate 0.0004 Epoch: 18 Global Step: 31880 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:06:21,989-Speed 13777.74 samples/sec Loss 1.9442 LearningRate 0.0004 Epoch: 18 Global Step: 31890 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:06:39,942-Speed 13689.98 samples/sec Loss 1.9343 LearningRate 0.0004 Epoch: 18 Global Step: 31900 Fp16 Grad Scale: 32768 Required: 19 hours 
Training: 2022-03-04 00:06:57,825-Speed 13743.02 samples/sec Loss 1.9553 LearningRate 0.0004 Epoch: 18 Global Step: 31910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:07:15,680-Speed 13765.01 samples/sec Loss 1.9402 LearningRate 0.0004 Epoch: 18 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:07:33,575-Speed 13734.69 samples/sec Loss 1.9238 LearningRate 0.0004 Epoch: 18 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:07:51,313-Speed 13855.93 samples/sec Loss 1.9219 LearningRate 0.0004 Epoch: 18 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:08:09,205-Speed 13736.02 samples/sec Loss 1.9317 LearningRate 0.0004 Epoch: 18 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:08:26,951-Speed 13849.91 samples/sec Loss 1.9461 LearningRate 0.0004 Epoch: 18 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:08:44,736-Speed 13818.86 samples/sec Loss 1.9226 LearningRate 0.0004 Epoch: 18 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:09:02,507-Speed 13830.87 samples/sec Loss 1.9297 LearningRate 0.0004 Epoch: 18 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:09:20,259-Speed 13844.86 samples/sec Loss 1.9380 LearningRate 0.0004 Epoch: 18 Global Step: 31990 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:09:37,976-Speed 13872.22 samples/sec Loss 1.9439 LearningRate 0.0004 Epoch: 18 Global Step: 32000 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:09:55,674-Speed 13886.79 samples/sec Loss 1.9278 LearningRate 0.0004 Epoch: 18 Global Step: 32010 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:10:13,419-Speed 13850.89 samples/sec Loss 1.9397 LearningRate 0.0004 Epoch: 18 Global Step: 32020 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 
00:10:31,182-Speed 13835.94 samples/sec Loss 1.9331 LearningRate 0.0004 Epoch: 18 Global Step: 32030 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:10:49,030-Speed 13770.48 samples/sec Loss 1.9182 LearningRate 0.0004 Epoch: 18 Global Step: 32040 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:11:07,063-Speed 13629.32 samples/sec Loss 1.9167 LearningRate 0.0004 Epoch: 18 Global Step: 32050 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:11:25,099-Speed 13627.49 samples/sec Loss 1.9182 LearningRate 0.0004 Epoch: 18 Global Step: 32060 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:11:43,146-Speed 13618.99 samples/sec Loss 1.9210 LearningRate 0.0004 Epoch: 18 Global Step: 32070 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:12:01,149-Speed 13651.41 samples/sec Loss 1.9181 LearningRate 0.0004 Epoch: 18 Global Step: 32080 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:12:19,167-Speed 13640.47 samples/sec Loss 1.9500 LearningRate 0.0004 Epoch: 18 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:12:37,183-Speed 13642.07 samples/sec Loss 1.9210 LearningRate 0.0004 Epoch: 18 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:12:54,954-Speed 13830.38 samples/sec Loss 1.8951 LearningRate 0.0004 Epoch: 18 Global Step: 32110 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:13:12,594-Speed 13932.51 samples/sec Loss 1.9119 LearningRate 0.0004 Epoch: 18 Global Step: 32120 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:13:30,345-Speed 13846.05 samples/sec Loss 1.9215 LearningRate 0.0004 Epoch: 18 Global Step: 32130 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:13:48,078-Speed 13859.60 samples/sec Loss 1.9362 LearningRate 0.0004 Epoch: 18 Global Step: 32140 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:14:05,808-Speed 13862.30 
samples/sec Loss 1.9264 LearningRate 0.0004 Epoch: 18 Global Step: 32150 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:14:23,507-Speed 13887.52 samples/sec Loss 1.9182 LearningRate 0.0004 Epoch: 18 Global Step: 32160 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:14:41,190-Speed 13898.67 samples/sec Loss 1.9260 LearningRate 0.0004 Epoch: 18 Global Step: 32170 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:14:58,972-Speed 13821.37 samples/sec Loss 1.9182 LearningRate 0.0004 Epoch: 18 Global Step: 32180 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:15:16,676-Speed 13882.65 samples/sec Loss 1.9178 LearningRate 0.0004 Epoch: 18 Global Step: 32190 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:15:34,432-Speed 13841.56 samples/sec Loss 1.9236 LearningRate 0.0004 Epoch: 18 Global Step: 32200 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:15:52,102-Speed 13909.22 samples/sec Loss 1.9180 LearningRate 0.0004 Epoch: 18 Global Step: 32210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:16:09,866-Speed 13836.71 samples/sec Loss 1.9243 LearningRate 0.0004 Epoch: 18 Global Step: 32220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-03-04 00:16:27,616-Speed 13846.24 samples/sec Loss 1.9303 LearningRate 0.0004 Epoch: 18 Global Step: 32230 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:16:45,405-Speed 13816.62 samples/sec Loss 1.9220 LearningRate 0.0004 Epoch: 18 Global Step: 32240 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:17:03,152-Speed 13849.02 samples/sec Loss 1.9183 LearningRate 0.0004 Epoch: 18 Global Step: 32250 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:17:20,890-Speed 13855.76 samples/sec Loss 1.9220 LearningRate 0.0004 Epoch: 18 Global Step: 32260 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:17:38,694-Speed 13804.10 samples/sec Loss 1.9168 
LearningRate 0.0004 Epoch: 18 Global Step: 32270 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:17:56,407-Speed 13875.94 samples/sec Loss 1.9153 LearningRate 0.0004 Epoch: 18 Global Step: 32280 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:18:14,143-Speed 13857.09 samples/sec Loss 1.9215 LearningRate 0.0004 Epoch: 18 Global Step: 32290 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-03-04 00:18:31,918-Speed 13827.34 samples/sec Loss 1.9122 LearningRate 0.0004 Epoch: 18 Global Step: 32300 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:18:49,722-Speed 13803.87 samples/sec Loss 1.9258 LearningRate 0.0004 Epoch: 18 Global Step: 32310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:19:07,491-Speed 13832.00 samples/sec Loss 1.9308 LearningRate 0.0003 Epoch: 18 Global Step: 32320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:19:25,326-Speed 13780.53 samples/sec Loss 1.9211 LearningRate 0.0003 Epoch: 18 Global Step: 32330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:19:43,109-Speed 13821.60 samples/sec Loss 1.9047 LearningRate 0.0003 Epoch: 18 Global Step: 32340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:20:00,907-Speed 13808.52 samples/sec Loss 1.8900 LearningRate 0.0003 Epoch: 18 Global Step: 32350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:20:18,702-Speed 13811.93 samples/sec Loss 1.9139 LearningRate 0.0003 Epoch: 18 Global Step: 32360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:20:36,436-Speed 13858.86 samples/sec Loss 1.9088 LearningRate 0.0003 Epoch: 18 Global Step: 32370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:20:54,146-Speed 13878.27 samples/sec Loss 1.9213 LearningRate 0.0003 Epoch: 18 Global Step: 32380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:21:11,848-Speed 13884.61 samples/sec Loss 1.9034 LearningRate 0.0003 Epoch: 18 
Global Step: 32390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:21:29,698-Speed 13769.13 samples/sec Loss 1.9106 LearningRate 0.0003 Epoch: 18 Global Step: 32400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:21:47,401-Speed 13883.23 samples/sec Loss 1.9075 LearningRate 0.0003 Epoch: 18 Global Step: 32410 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:22:05,185-Speed 13819.59 samples/sec Loss 1.9049 LearningRate 0.0003 Epoch: 18 Global Step: 32420 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:22:22,955-Speed 13831.48 samples/sec Loss 1.9131 LearningRate 0.0003 Epoch: 18 Global Step: 32430 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:22:40,727-Speed 13828.71 samples/sec Loss 1.9173 LearningRate 0.0003 Epoch: 18 Global Step: 32440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:22:58,524-Speed 13810.28 samples/sec Loss 1.9067 LearningRate 0.0003 Epoch: 18 Global Step: 32450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:23:16,452-Speed 13709.17 samples/sec Loss 1.8960 LearningRate 0.0003 Epoch: 18 Global Step: 32460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:23:34,326-Speed 13750.63 samples/sec Loss 1.8994 LearningRate 0.0003 Epoch: 18 Global Step: 32470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:23:52,133-Speed 13802.20 samples/sec Loss 1.8976 LearningRate 0.0003 Epoch: 18 Global Step: 32480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:24:09,958-Speed 13788.08 samples/sec Loss 1.9102 LearningRate 0.0003 Epoch: 18 Global Step: 32490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:24:27,794-Speed 13779.90 samples/sec Loss 1.9057 LearningRate 0.0003 Epoch: 18 Global Step: 32500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:24:45,508-Speed 13876.07 samples/sec Loss 1.8975 LearningRate 0.0003 Epoch: 18 Global Step: 32510 Fp16 Grad 
Scale: 65536 Required: 18 hours Training: 2022-03-04 00:25:03,268-Speed 13839.09 samples/sec Loss 1.8977 LearningRate 0.0003 Epoch: 18 Global Step: 32520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:25:21,026-Speed 13840.55 samples/sec Loss 1.9142 LearningRate 0.0003 Epoch: 18 Global Step: 32530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:25:38,718-Speed 13891.64 samples/sec Loss 1.9079 LearningRate 0.0003 Epoch: 18 Global Step: 32540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:25:56,494-Speed 13827.29 samples/sec Loss 1.9042 LearningRate 0.0003 Epoch: 18 Global Step: 32550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:26:14,311-Speed 13794.74 samples/sec Loss 1.9000 LearningRate 0.0003 Epoch: 18 Global Step: 32560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:26:32,054-Speed 13851.88 samples/sec Loss 1.9050 LearningRate 0.0003 Epoch: 18 Global Step: 32570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:26:49,773-Speed 13871.31 samples/sec Loss 1.9061 LearningRate 0.0003 Epoch: 18 Global Step: 32580 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:27:07,515-Speed 13854.26 samples/sec Loss 1.9088 LearningRate 0.0003 Epoch: 18 Global Step: 32590 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:27:25,238-Speed 13867.54 samples/sec Loss 1.8933 LearningRate 0.0003 Epoch: 18 Global Step: 32600 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:27:43,131-Speed 13736.18 samples/sec Loss 1.9038 LearningRate 0.0003 Epoch: 18 Global Step: 32610 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:28:00,838-Speed 13880.48 samples/sec Loss 1.8924 LearningRate 0.0003 Epoch: 18 Global Step: 32620 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:28:18,586-Speed 13847.91 samples/sec Loss 1.9032 LearningRate 0.0003 Epoch: 18 Global Step: 32630 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-03-04 00:28:36,347-Speed 13837.47 samples/sec Loss 1.9109 LearningRate 0.0003 Epoch: 18 Global Step: 32640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:28:54,135-Speed 13817.23 samples/sec Loss 1.9036 LearningRate 0.0003 Epoch: 18 Global Step: 32650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:29:12,048-Speed 13721.05 samples/sec Loss 1.9039 LearningRate 0.0003 Epoch: 18 Global Step: 32660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:29:29,810-Speed 13836.60 samples/sec Loss 1.8921 LearningRate 0.0003 Epoch: 18 Global Step: 32670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:29:47,520-Speed 13877.63 samples/sec Loss 1.8916 LearningRate 0.0003 Epoch: 18 Global Step: 32680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:30:05,219-Speed 13886.80 samples/sec Loss 1.9057 LearningRate 0.0003 Epoch: 18 Global Step: 32690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:30:22,918-Speed 13886.34 samples/sec Loss 1.9200 LearningRate 0.0003 Epoch: 18 Global Step: 32700 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:30:40,678-Speed 13839.02 samples/sec Loss 1.9109 LearningRate 0.0003 Epoch: 18 Global Step: 32710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:30:58,463-Speed 13819.13 samples/sec Loss 1.9005 LearningRate 0.0003 Epoch: 18 Global Step: 32720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:31:16,252-Speed 13815.80 samples/sec Loss 1.8987 LearningRate 0.0003 Epoch: 18 Global Step: 32730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:31:34,132-Speed 13746.06 samples/sec Loss 1.8992 LearningRate 0.0003 Epoch: 18 Global Step: 32740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:31:51,929-Speed 13809.95 samples/sec Loss 1.9002 LearningRate 0.0003 Epoch: 18 Global Step: 32750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 
00:32:09,725-Speed 13810.52 samples/sec Loss 1.9205 LearningRate 0.0003 Epoch: 18 Global Step: 32760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:32:27,511-Speed 13818.16 samples/sec Loss 1.9131 LearningRate 0.0003 Epoch: 18 Global Step: 32770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:32:45,186-Speed 13905.45 samples/sec Loss 1.8965 LearningRate 0.0003 Epoch: 18 Global Step: 32780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:33:02,949-Speed 13837.36 samples/sec Loss 1.9216 LearningRate 0.0003 Epoch: 18 Global Step: 32790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:33:20,707-Speed 13840.02 samples/sec Loss 1.9139 LearningRate 0.0003 Epoch: 18 Global Step: 32800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:33:38,427-Speed 13869.52 samples/sec Loss 1.9124 LearningRate 0.0003 Epoch: 18 Global Step: 32810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:33:56,152-Speed 13865.91 samples/sec Loss 1.9301 LearningRate 0.0003 Epoch: 18 Global Step: 32820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:34:13,898-Speed 13850.39 samples/sec Loss 1.9267 LearningRate 0.0003 Epoch: 18 Global Step: 32830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:35:22,359-Speed 3589.83 samples/sec Loss 1.9103 LearningRate 0.0003 Epoch: 19 Global Step: 32840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:35:40,028-Speed 13910.50 samples/sec Loss 1.8679 LearningRate 0.0003 Epoch: 19 Global Step: 32850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:35:57,687-Speed 13918.26 samples/sec Loss 1.8714 LearningRate 0.0003 Epoch: 19 Global Step: 32860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:36:15,476-Speed 13816.00 samples/sec Loss 1.8766 LearningRate 0.0003 Epoch: 19 Global Step: 32870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:36:33,179-Speed 13883.36 
samples/sec Loss 1.8609 LearningRate 0.0003 Epoch: 19 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-03-04 00:36:50,860-Speed 13901.25 samples/sec Loss 1.8614 LearningRate 0.0003 Epoch: 19 Global Step: 32890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:37:08,583-Speed 13866.87 samples/sec Loss 1.8628 LearningRate 0.0003 Epoch: 19 Global Step: 32900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:37:26,356-Speed 13828.85 samples/sec Loss 1.8663 LearningRate 0.0003 Epoch: 19 Global Step: 32910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:37:44,095-Speed 13855.76 samples/sec Loss 1.8833 LearningRate 0.0003 Epoch: 19 Global Step: 32920 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:38:01,882-Speed 13817.49 samples/sec Loss 1.8796 LearningRate 0.0003 Epoch: 19 Global Step: 32930 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:38:19,702-Speed 13792.06 samples/sec Loss 1.8676 LearningRate 0.0003 Epoch: 19 Global Step: 32940 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:38:37,592-Speed 13738.36 samples/sec Loss 1.8721 LearningRate 0.0003 Epoch: 19 Global Step: 32950 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:38:55,603-Speed 13645.56 samples/sec Loss 1.8663 LearningRate 0.0003 Epoch: 19 Global Step: 32960 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:39:13,737-Speed 13553.33 samples/sec Loss 1.8649 LearningRate 0.0003 Epoch: 19 Global Step: 32970 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:39:31,853-Speed 13569.45 samples/sec Loss 1.8557 LearningRate 0.0003 Epoch: 19 Global Step: 32980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:39:49,975-Speed 13562.63 samples/sec Loss 1.8794 LearningRate 0.0003 Epoch: 19 Global Step: 32990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:40:07,907-Speed 13705.95 samples/sec Loss 1.8693 
LearningRate 0.0003 Epoch: 19 Global Step: 33000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:40:25,758-Speed 13768.23 samples/sec Loss 1.8653 LearningRate 0.0003 Epoch: 19 Global Step: 33010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:40:43,544-Speed 13818.45 samples/sec Loss 1.8679 LearningRate 0.0003 Epoch: 19 Global Step: 33020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:41:01,258-Speed 13874.67 samples/sec Loss 1.8772 LearningRate 0.0003 Epoch: 19 Global Step: 33030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:41:19,041-Speed 13821.34 samples/sec Loss 1.8608 LearningRate 0.0003 Epoch: 19 Global Step: 33040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:41:36,867-Speed 13788.02 samples/sec Loss 1.8768 LearningRate 0.0003 Epoch: 19 Global Step: 33050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:41:54,584-Speed 13872.89 samples/sec Loss 1.8674 LearningRate 0.0003 Epoch: 19 Global Step: 33060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:42:12,340-Speed 13842.45 samples/sec Loss 1.8892 LearningRate 0.0003 Epoch: 19 Global Step: 33070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:42:30,065-Speed 13866.09 samples/sec Loss 1.8822 LearningRate 0.0003 Epoch: 19 Global Step: 33080 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:42:47,790-Speed 13866.27 samples/sec Loss 1.8844 LearningRate 0.0003 Epoch: 19 Global Step: 33090 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:43:05,526-Speed 13856.88 samples/sec Loss 1.8819 LearningRate 0.0003 Epoch: 19 Global Step: 33100 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:43:23,250-Speed 13867.00 samples/sec Loss 1.8815 LearningRate 0.0003 Epoch: 19 Global Step: 33110 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:43:41,051-Speed 13807.42 samples/sec Loss 1.8792 LearningRate 0.0003 Epoch: 19 
Global Step: 33120 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:43:58,783-Speed 13860.06 samples/sec Loss 1.8828 LearningRate 0.0003 Epoch: 19 Global Step: 33130 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:44:16,569-Speed 13818.76 samples/sec Loss 1.8694 LearningRate 0.0003 Epoch: 19 Global Step: 33140 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:44:34,291-Speed 13868.54 samples/sec Loss 1.8714 LearningRate 0.0003 Epoch: 19 Global Step: 33150 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:44:52,039-Speed 13847.67 samples/sec Loss 1.8816 LearningRate 0.0003 Epoch: 19 Global Step: 33160 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:45:09,932-Speed 13736.03 samples/sec Loss 1.8874 LearningRate 0.0003 Epoch: 19 Global Step: 33170 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:45:27,718-Speed 13818.68 samples/sec Loss 1.8724 LearningRate 0.0003 Epoch: 19 Global Step: 33180 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:45:45,398-Speed 13901.80 samples/sec Loss 1.8836 LearningRate 0.0003 Epoch: 19 Global Step: 33190 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:46:03,205-Speed 13802.11 samples/sec Loss 1.8795 LearningRate 0.0003 Epoch: 19 Global Step: 33200 Fp16 Grad Scale: 8192 Required: 18 hours Training: 2022-03-04 00:46:21,094-Speed 13738.99 samples/sec Loss 1.8786 LearningRate 0.0003 Epoch: 19 Global Step: 33210 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:46:38,920-Speed 13787.45 samples/sec Loss 1.8728 LearningRate 0.0003 Epoch: 19 Global Step: 33220 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:46:56,693-Speed 13829.27 samples/sec Loss 1.8581 LearningRate 0.0003 Epoch: 19 Global Step: 33230 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:47:14,412-Speed 13870.85 samples/sec Loss 1.8772 LearningRate 0.0003 Epoch: 19 Global Step: 33240 Fp16 Grad Scale: 
16384 Required: 18 hours Training: 2022-03-04 00:47:32,146-Speed 13858.80 samples/sec Loss 1.8647 LearningRate 0.0003 Epoch: 19 Global Step: 33250 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:47:49,945-Speed 13808.63 samples/sec Loss 1.8634 LearningRate 0.0003 Epoch: 19 Global Step: 33260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:48:07,733-Speed 13816.80 samples/sec Loss 1.8680 LearningRate 0.0003 Epoch: 19 Global Step: 33270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:48:25,478-Speed 13851.12 samples/sec Loss 1.8785 LearningRate 0.0003 Epoch: 19 Global Step: 33280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:48:43,243-Speed 13834.47 samples/sec Loss 1.8785 LearningRate 0.0003 Epoch: 19 Global Step: 33290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:49:01,044-Speed 13806.36 samples/sec Loss 1.8572 LearningRate 0.0003 Epoch: 19 Global Step: 33300 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:49:18,767-Speed 13867.58 samples/sec Loss 1.8595 LearningRate 0.0003 Epoch: 19 Global Step: 33310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:49:36,528-Speed 13838.94 samples/sec Loss 1.8644 LearningRate 0.0003 Epoch: 19 Global Step: 33320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:49:54,272-Speed 13851.02 samples/sec Loss 1.8708 LearningRate 0.0003 Epoch: 19 Global Step: 33330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:50:12,081-Speed 13800.82 samples/sec Loss 1.8608 LearningRate 0.0003 Epoch: 19 Global Step: 33340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:50:29,828-Speed 13848.37 samples/sec Loss 1.8659 LearningRate 0.0003 Epoch: 19 Global Step: 33350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:50:47,511-Speed 13899.34 samples/sec Loss 1.8600 LearningRate 0.0003 Epoch: 19 Global Step: 33360 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-03-04 00:51:05,266-Speed 13843.15 samples/sec Loss 1.8557 LearningRate 0.0003 Epoch: 19 Global Step: 33370 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:51:23,100-Speed 13780.61 samples/sec Loss 1.8733 LearningRate 0.0003 Epoch: 19 Global Step: 33380 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:51:40,862-Speed 13837.20 samples/sec Loss 1.8518 LearningRate 0.0003 Epoch: 19 Global Step: 33390 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:51:58,592-Speed 13862.29 samples/sec Loss 1.8563 LearningRate 0.0003 Epoch: 19 Global Step: 33400 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:52:16,302-Speed 13878.22 samples/sec Loss 1.8500 LearningRate 0.0003 Epoch: 19 Global Step: 33410 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:52:34,160-Speed 13762.34 samples/sec Loss 1.8565 LearningRate 0.0003 Epoch: 19 Global Step: 33420 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:52:51,850-Speed 13893.05 samples/sec Loss 1.8602 LearningRate 0.0003 Epoch: 19 Global Step: 33430 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:53:09,520-Speed 13909.74 samples/sec Loss 1.8621 LearningRate 0.0003 Epoch: 19 Global Step: 33440 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:53:27,339-Speed 13792.76 samples/sec Loss 1.8662 LearningRate 0.0003 Epoch: 19 Global Step: 33450 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:53:45,036-Speed 13887.78 samples/sec Loss 1.8468 LearningRate 0.0003 Epoch: 19 Global Step: 33460 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:54:02,801-Speed 13834.56 samples/sec Loss 1.8533 LearningRate 0.0003 Epoch: 19 Global Step: 33470 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:54:20,645-Speed 13774.82 samples/sec Loss 1.8639 LearningRate 0.0003 Epoch: 19 Global Step: 33480 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 
00:54:38,454-Speed 13800.67 samples/sec Loss 1.8710 LearningRate 0.0003 Epoch: 19 Global Step: 33490 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:54:56,435-Speed 13668.23 samples/sec Loss 1.8715 LearningRate 0.0003 Epoch: 19 Global Step: 33500 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:55:14,607-Speed 13526.08 samples/sec Loss 1.8593 LearningRate 0.0003 Epoch: 19 Global Step: 33510 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:55:32,682-Speed 13597.54 samples/sec Loss 1.8634 LearningRate 0.0003 Epoch: 19 Global Step: 33520 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:55:50,760-Speed 13595.32 samples/sec Loss 1.8593 LearningRate 0.0003 Epoch: 19 Global Step: 33530 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:56:08,745-Speed 13666.16 samples/sec Loss 1.8618 LearningRate 0.0003 Epoch: 19 Global Step: 33540 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:56:26,819-Speed 13597.87 samples/sec Loss 1.8529 LearningRate 0.0003 Epoch: 19 Global Step: 33550 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:56:44,848-Speed 13632.38 samples/sec Loss 1.8600 LearningRate 0.0003 Epoch: 19 Global Step: 33560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 00:57:02,830-Speed 13668.09 samples/sec Loss 1.8505 LearningRate 0.0003 Epoch: 19 Global Step: 33570 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 00:57:20,786-Speed 13687.46 samples/sec Loss 1.8395 LearningRate 0.0003 Epoch: 19 Global Step: 33580 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:57:38,778-Speed 13660.58 samples/sec Loss 1.8390 LearningRate 0.0003 Epoch: 19 Global Step: 33590 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:57:56,731-Speed 13689.32 samples/sec Loss 1.8415 LearningRate 0.0003 Epoch: 19 Global Step: 33600 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:58:14,712-Speed 13668.65 
samples/sec Loss 1.8470 LearningRate 0.0003 Epoch: 19 Global Step: 33610 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:58:32,546-Speed 13781.41 samples/sec Loss 1.8430 LearningRate 0.0003 Epoch: 19 Global Step: 33620 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:58:50,261-Speed 13874.20 samples/sec Loss 1.8503 LearningRate 0.0003 Epoch: 19 Global Step: 33630 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:59:07,967-Speed 13881.17 samples/sec Loss 1.8587 LearningRate 0.0003 Epoch: 19 Global Step: 33640 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:59:25,709-Speed 13852.65 samples/sec Loss 1.8375 LearningRate 0.0003 Epoch: 19 Global Step: 33650 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 00:59:43,415-Speed 13881.23 samples/sec Loss 1.8322 LearningRate 0.0003 Epoch: 19 Global Step: 33660 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 01:00:01,168-Speed 13843.62 samples/sec Loss 1.8397 LearningRate 0.0003 Epoch: 19 Global Step: 33670 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-03-04 01:00:18,877-Speed 13878.70 samples/sec Loss 1.8447 LearningRate 0.0003 Epoch: 19 Global Step: 33680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:00:36,626-Speed 13847.61 samples/sec Loss 1.8494 LearningRate 0.0003 Epoch: 19 Global Step: 33690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:00:54,311-Speed 13897.03 samples/sec Loss 1.8606 LearningRate 0.0003 Epoch: 19 Global Step: 33700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:01:12,043-Speed 13860.85 samples/sec Loss 1.8495 LearningRate 0.0003 Epoch: 19 Global Step: 33710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:01:29,751-Speed 13879.19 samples/sec Loss 1.8422 LearningRate 0.0003 Epoch: 19 Global Step: 33720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:01:47,489-Speed 13856.04 samples/sec Loss 1.8496 
LearningRate 0.0003 Epoch: 19 Global Step: 33730 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:02:05,182-Speed 13891.69 samples/sec Loss 1.8543 LearningRate 0.0003 Epoch: 19 Global Step: 33740 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:02:22,933-Speed 13846.41 samples/sec Loss 1.8445 LearningRate 0.0003 Epoch: 19 Global Step: 33750 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:02:40,712-Speed 13823.62 samples/sec Loss 1.8320 LearningRate 0.0003 Epoch: 19 Global Step: 33760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:02:58,409-Speed 13888.13 samples/sec Loss 1.8364 LearningRate 0.0003 Epoch: 19 Global Step: 33770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:03:16,133-Speed 13867.40 samples/sec Loss 1.8498 LearningRate 0.0003 Epoch: 19 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:03:33,878-Speed 13851.33 samples/sec Loss 1.8413 LearningRate 0.0003 Epoch: 19 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:03:51,682-Speed 13804.68 samples/sec Loss 1.8411 LearningRate 0.0003 Epoch: 19 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:04:09,369-Speed 13896.11 samples/sec Loss 1.8264 LearningRate 0.0003 Epoch: 19 Global Step: 33810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:04:27,225-Speed 13764.51 samples/sec Loss 1.8278 LearningRate 0.0003 Epoch: 19 Global Step: 33820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:04:44,898-Speed 13906.76 samples/sec Loss 1.8316 LearningRate 0.0003 Epoch: 19 Global Step: 33830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:05:02,673-Speed 13827.32 samples/sec Loss 1.8385 LearningRate 0.0003 Epoch: 19 Global Step: 33840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:05:20,400-Speed 13863.93 samples/sec Loss 1.8449 LearningRate 0.0003 Epoch: 19 
Global Step: 33850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:05:38,152-Speed 13846.67 samples/sec Loss 1.8397 LearningRate 0.0003 Epoch: 19 Global Step: 33860 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:05:55,927-Speed 13827.27 samples/sec Loss 1.8419 LearningRate 0.0003 Epoch: 19 Global Step: 33870 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:06:13,653-Speed 13864.96 samples/sec Loss 1.8272 LearningRate 0.0003 Epoch: 19 Global Step: 33880 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:06:31,364-Speed 13877.26 samples/sec Loss 1.8474 LearningRate 0.0003 Epoch: 19 Global Step: 33890 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:06:49,133-Speed 13832.15 samples/sec Loss 1.8324 LearningRate 0.0003 Epoch: 19 Global Step: 33900 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:07:06,868-Speed 13858.67 samples/sec Loss 1.8309 LearningRate 0.0003 Epoch: 19 Global Step: 33910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:07:24,569-Speed 13884.73 samples/sec Loss 1.8332 LearningRate 0.0003 Epoch: 19 Global Step: 33920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:07:42,323-Speed 13843.05 samples/sec Loss 1.8273 LearningRate 0.0003 Epoch: 19 Global Step: 33930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:08:00,077-Speed 13843.77 samples/sec Loss 1.8406 LearningRate 0.0003 Epoch: 19 Global Step: 33940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:08:17,816-Speed 13855.04 samples/sec Loss 1.8517 LearningRate 0.0003 Epoch: 19 Global Step: 33950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:08:35,569-Speed 13844.51 samples/sec Loss 1.8422 LearningRate 0.0003 Epoch: 19 Global Step: 33960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:08:53,300-Speed 13860.86 samples/sec Loss 1.8457 LearningRate 0.0003 Epoch: 19 Global Step: 33970 Fp16 Grad 
Scale: 65536 Required: 18 hours Training: 2022-03-04 01:09:10,980-Speed 13901.65 samples/sec Loss 1.8255 LearningRate 0.0003 Epoch: 19 Global Step: 33980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:09:28,711-Speed 13861.82 samples/sec Loss 1.8367 LearningRate 0.0003 Epoch: 19 Global Step: 33990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:09:46,594-Speed 13743.07 samples/sec Loss 1.8455 LearningRate 0.0003 Epoch: 19 Global Step: 34000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:10:04,317-Speed 13868.64 samples/sec Loss 1.8321 LearningRate 0.0003 Epoch: 19 Global Step: 34010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:10:22,031-Speed 13874.09 samples/sec Loss 1.8261 LearningRate 0.0003 Epoch: 19 Global Step: 34020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:10:39,792-Speed 13838.36 samples/sec Loss 1.8333 LearningRate 0.0003 Epoch: 19 Global Step: 34030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:10:57,502-Speed 13878.14 samples/sec Loss 1.8402 LearningRate 0.0003 Epoch: 19 Global Step: 34040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:11:15,231-Speed 13862.32 samples/sec Loss 1.8427 LearningRate 0.0003 Epoch: 19 Global Step: 34050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:11:33,016-Speed 13819.60 samples/sec Loss 1.8304 LearningRate 0.0003 Epoch: 19 Global Step: 34060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:11:50,743-Speed 13864.40 samples/sec Loss 1.8237 LearningRate 0.0003 Epoch: 19 Global Step: 34070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:12:08,667-Speed 13712.21 samples/sec Loss 1.8196 LearningRate 0.0003 Epoch: 19 Global Step: 34080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:12:26,423-Speed 13842.24 samples/sec Loss 1.8163 LearningRate 0.0003 Epoch: 19 Global Step: 34090 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-03-04 01:12:44,127-Speed 13882.36 samples/sec Loss 1.8153 LearningRate 0.0003 Epoch: 19 Global Step: 34100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:13:01,810-Speed 13898.70 samples/sec Loss 1.8095 LearningRate 0.0003 Epoch: 19 Global Step: 34110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:13:19,535-Speed 13866.20 samples/sec Loss 1.8089 LearningRate 0.0003 Epoch: 19 Global Step: 34120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:13:37,229-Speed 13890.40 samples/sec Loss 1.8232 LearningRate 0.0003 Epoch: 19 Global Step: 34130 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:13:54,928-Speed 13886.58 samples/sec Loss 1.8264 LearningRate 0.0003 Epoch: 19 Global Step: 34140 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:14:12,651-Speed 13866.98 samples/sec Loss 1.8119 LearningRate 0.0003 Epoch: 19 Global Step: 34150 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:14:30,501-Speed 13769.47 samples/sec Loss 1.8282 LearningRate 0.0003 Epoch: 19 Global Step: 34160 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:14:48,219-Speed 13871.76 samples/sec Loss 1.8171 LearningRate 0.0003 Epoch: 19 Global Step: 34170 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:15:05,944-Speed 13865.80 samples/sec Loss 1.8158 LearningRate 0.0003 Epoch: 19 Global Step: 34180 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:15:23,700-Speed 13841.87 samples/sec Loss 1.8143 LearningRate 0.0003 Epoch: 19 Global Step: 34190 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:15:41,436-Speed 13857.65 samples/sec Loss 1.8182 LearningRate 0.0003 Epoch: 19 Global Step: 34200 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:15:59,169-Speed 13860.01 samples/sec Loss 1.8182 LearningRate 0.0003 Epoch: 19 Global Step: 34210 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 
01:16:16,911-Speed 13852.50 samples/sec Loss 1.8254 LearningRate 0.0003 Epoch: 19 Global Step: 34220 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:16:34,670-Speed 13839.15 samples/sec Loss 1.8117 LearningRate 0.0003 Epoch: 19 Global Step: 34230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-03-04 01:16:52,466-Speed 13810.99 samples/sec Loss 1.8128 LearningRate 0.0003 Epoch: 19 Global Step: 34240 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:17:10,550-Speed 13591.26 samples/sec Loss 1.8210 LearningRate 0.0003 Epoch: 19 Global Step: 34250 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:17:28,370-Speed 13791.44 samples/sec Loss 1.8255 LearningRate 0.0003 Epoch: 19 Global Step: 34260 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:17:46,196-Speed 13787.49 samples/sec Loss 1.8204 LearningRate 0.0003 Epoch: 19 Global Step: 34270 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:18:03,932-Speed 13857.90 samples/sec Loss 1.8126 LearningRate 0.0003 Epoch: 19 Global Step: 34280 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-03-04 01:18:21,656-Speed 13868.33 samples/sec Loss 1.8151 LearningRate 0.0003 Epoch: 19 Global Step: 34290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:18:39,452-Speed 13810.92 samples/sec Loss 1.8177 LearningRate 0.0003 Epoch: 19 Global Step: 34300 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:18:57,204-Speed 13844.45 samples/sec Loss 1.8134 LearningRate 0.0003 Epoch: 19 Global Step: 34310 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:19:15,083-Speed 13747.07 samples/sec Loss 1.8269 LearningRate 0.0003 Epoch: 19 Global Step: 34320 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:19:32,935-Speed 13767.70 samples/sec Loss 1.8289 LearningRate 0.0003 Epoch: 19 Global Step: 34330 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:19:50,697-Speed 13836.82 
samples/sec Loss 1.8152 LearningRate 0.0003 Epoch: 19 Global Step: 34340 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:20:08,414-Speed 13871.74 samples/sec Loss 1.8172 LearningRate 0.0003 Epoch: 19 Global Step: 34350 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:20:26,266-Speed 13767.58 samples/sec Loss 1.8182 LearningRate 0.0003 Epoch: 19 Global Step: 34360 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:20:44,004-Speed 13856.53 samples/sec Loss 1.8106 LearningRate 0.0003 Epoch: 19 Global Step: 34370 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:21:01,776-Speed 13829.43 samples/sec Loss 1.8258 LearningRate 0.0003 Epoch: 19 Global Step: 34380 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:21:19,548-Speed 13829.09 samples/sec Loss 1.8227 LearningRate 0.0003 Epoch: 19 Global Step: 34390 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:21:37,286-Speed 13855.64 samples/sec Loss 1.8133 LearningRate 0.0003 Epoch: 19 Global Step: 34400 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:21:55,013-Speed 13864.54 samples/sec Loss 1.8205 LearningRate 0.0003 Epoch: 19 Global Step: 34410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:22:12,806-Speed 13813.11 samples/sec Loss 1.8126 LearningRate 0.0003 Epoch: 19 Global Step: 34420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:22:30,563-Speed 13841.46 samples/sec Loss 1.8190 LearningRate 0.0003 Epoch: 19 Global Step: 34430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:22:48,247-Speed 13897.99 samples/sec Loss 1.8083 LearningRate 0.0003 Epoch: 19 Global Step: 34440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:23:06,008-Speed 13837.69 samples/sec Loss 1.8099 LearningRate 0.0003 Epoch: 19 Global Step: 34450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:23:23,769-Speed 13838.71 samples/sec Loss 1.8057 
LearningRate 0.0003 Epoch: 19 Global Step: 34460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:23:41,537-Speed 13832.50 samples/sec Loss 1.8101 LearningRate 0.0003 Epoch: 19 Global Step: 34470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:23:59,274-Speed 13856.19 samples/sec Loss 1.8173 LearningRate 0.0003 Epoch: 19 Global Step: 34480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:24:17,192-Speed 13716.60 samples/sec Loss 1.8276 LearningRate 0.0003 Epoch: 19 Global Step: 34490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:24:34,951-Speed 13840.14 samples/sec Loss 1.8391 LearningRate 0.0003 Epoch: 19 Global Step: 34500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:24:52,788-Speed 13778.83 samples/sec Loss 1.8232 LearningRate 0.0003 Epoch: 19 Global Step: 34510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:25:10,561-Speed 13828.52 samples/sec Loss 1.8260 LearningRate 0.0003 Epoch: 19 Global Step: 34520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:25:28,275-Speed 13874.55 samples/sec Loss 1.8242 LearningRate 0.0003 Epoch: 19 Global Step: 34530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:25:46,006-Speed 13861.47 samples/sec Loss 1.8146 LearningRate 0.0003 Epoch: 19 Global Step: 34540 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:26:03,784-Speed 13824.32 samples/sec Loss 1.8153 LearningRate 0.0003 Epoch: 19 Global Step: 34550 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:26:21,612-Speed 13785.99 samples/sec Loss 1.8345 LearningRate 0.0003 Epoch: 19 Global Step: 34560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:27:30,391-Speed 3573.26 samples/sec Loss 1.8166 LearningRate 0.0003 Epoch: 20 Global Step: 34570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:27:47,994-Speed 13962.37 samples/sec Loss 1.7775 LearningRate 0.0003 Epoch: 20 
Global Step: 34580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:28:05,795-Speed 13806.66 samples/sec Loss 1.7887 LearningRate 0.0003 Epoch: 20 Global Step: 34590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:28:23,452-Speed 13919.51 samples/sec Loss 1.7873 LearningRate 0.0003 Epoch: 20 Global Step: 34600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:28:41,206-Speed 13843.58 samples/sec Loss 1.7902 LearningRate 0.0003 Epoch: 20 Global Step: 34610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:28:58,942-Speed 13857.14 samples/sec Loss 1.7915 LearningRate 0.0003 Epoch: 20 Global Step: 34620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:29:16,721-Speed 13824.03 samples/sec Loss 1.7772 LearningRate 0.0003 Epoch: 20 Global Step: 34630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:29:34,439-Speed 13875.40 samples/sec Loss 1.7957 LearningRate 0.0003 Epoch: 20 Global Step: 34640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:29:52,145-Speed 13880.32 samples/sec Loss 1.7866 LearningRate 0.0003 Epoch: 20 Global Step: 34650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:30:09,912-Speed 13835.21 samples/sec Loss 1.8010 LearningRate 0.0003 Epoch: 20 Global Step: 34660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:30:27,697-Speed 13819.11 samples/sec Loss 1.7847 LearningRate 0.0003 Epoch: 20 Global Step: 34670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:30:45,434-Speed 13856.07 samples/sec Loss 1.7985 LearningRate 0.0003 Epoch: 20 Global Step: 34680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:31:03,125-Speed 13892.72 samples/sec Loss 1.7909 LearningRate 0.0003 Epoch: 20 Global Step: 34690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:31:20,870-Speed 13851.05 samples/sec Loss 1.7902 LearningRate 0.0003 Epoch: 20 Global Step: 34700 Fp16 Grad 
Scale: 32768 Required: 17 hours Training: 2022-03-04 01:31:38,652-Speed 13821.10 samples/sec Loss 1.7903 LearningRate 0.0003 Epoch: 20 Global Step: 34710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:31:56,388-Speed 13857.94 samples/sec Loss 1.8135 LearningRate 0.0003 Epoch: 20 Global Step: 34720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:32:14,122-Speed 13858.55 samples/sec Loss 1.7949 LearningRate 0.0003 Epoch: 20 Global Step: 34730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:32:31,887-Speed 13834.73 samples/sec Loss 1.7840 LearningRate 0.0003 Epoch: 20 Global Step: 34740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:32:49,583-Speed 13889.35 samples/sec Loss 1.7822 LearningRate 0.0003 Epoch: 20 Global Step: 34750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:33:07,358-Speed 13827.12 samples/sec Loss 1.7909 LearningRate 0.0003 Epoch: 20 Global Step: 34760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:33:25,137-Speed 13823.84 samples/sec Loss 1.7931 LearningRate 0.0003 Epoch: 20 Global Step: 34770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:33:42,863-Speed 13865.06 samples/sec Loss 1.8030 LearningRate 0.0003 Epoch: 20 Global Step: 34780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:34:00,574-Speed 13877.38 samples/sec Loss 1.7957 LearningRate 0.0003 Epoch: 20 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:34:18,336-Speed 13836.88 samples/sec Loss 1.7932 LearningRate 0.0003 Epoch: 20 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:34:36,055-Speed 13870.77 samples/sec Loss 1.8052 LearningRate 0.0003 Epoch: 20 Global Step: 34810 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:34:53,857-Speed 13806.48 samples/sec Loss 1.7986 LearningRate 0.0003 Epoch: 20 Global Step: 34820 Fp16 Grad Scale: 32768 Required: 17 hours 
Training: 2022-03-04 01:35:11,603-Speed 13849.18 samples/sec Loss 1.7933 LearningRate 0.0003 Epoch: 20 Global Step: 34830 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:35:29,344-Speed 13854.02 samples/sec Loss 1.7862 LearningRate 0.0003 Epoch: 20 Global Step: 34840 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:35:47,109-Speed 13834.27 samples/sec Loss 1.8023 LearningRate 0.0003 Epoch: 20 Global Step: 34850 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:36:04,841-Speed 13860.98 samples/sec Loss 1.7928 LearningRate 0.0003 Epoch: 20 Global Step: 34860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:36:22,596-Speed 13843.48 samples/sec Loss 1.7910 LearningRate 0.0003 Epoch: 20 Global Step: 34870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:36:40,308-Speed 13876.09 samples/sec Loss 1.7828 LearningRate 0.0003 Epoch: 20 Global Step: 34880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:36:58,027-Speed 13870.49 samples/sec Loss 1.7952 LearningRate 0.0003 Epoch: 20 Global Step: 34890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:37:15,741-Speed 13875.34 samples/sec Loss 1.7752 LearningRate 0.0003 Epoch: 20 Global Step: 34900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:37:33,517-Speed 13826.36 samples/sec Loss 1.7885 LearningRate 0.0003 Epoch: 20 Global Step: 34910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:37:51,280-Speed 13836.25 samples/sec Loss 1.7820 LearningRate 0.0003 Epoch: 20 Global Step: 34920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:38:09,020-Speed 13854.44 samples/sec Loss 1.7880 LearningRate 0.0003 Epoch: 20 Global Step: 34930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:38:26,788-Speed 13832.12 samples/sec Loss 1.7965 LearningRate 0.0003 Epoch: 20 Global Step: 34940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 
01:38:44,519-Speed 13861.41 samples/sec Loss 1.7799 LearningRate 0.0003 Epoch: 20 Global Step: 34950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:39:02,269-Speed 13846.99 samples/sec Loss 1.7939 LearningRate 0.0003 Epoch: 20 Global Step: 34960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:39:20,040-Speed 13829.99 samples/sec Loss 1.7906 LearningRate 0.0003 Epoch: 20 Global Step: 34970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:39:37,793-Speed 13843.58 samples/sec Loss 1.7842 LearningRate 0.0003 Epoch: 20 Global Step: 34980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:39:55,504-Speed 13877.36 samples/sec Loss 1.7861 LearningRate 0.0003 Epoch: 20 Global Step: 34990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:40:13,204-Speed 13886.03 samples/sec Loss 1.8052 LearningRate 0.0003 Epoch: 20 Global Step: 35000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:40:30,970-Speed 13834.04 samples/sec Loss 1.7824 LearningRate 0.0003 Epoch: 20 Global Step: 35010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:40:48,740-Speed 13830.18 samples/sec Loss 1.7872 LearningRate 0.0003 Epoch: 20 Global Step: 35020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:41:06,435-Speed 13889.38 samples/sec Loss 1.7803 LearningRate 0.0003 Epoch: 20 Global Step: 35030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:41:24,152-Speed 13873.05 samples/sec Loss 1.7810 LearningRate 0.0003 Epoch: 20 Global Step: 35040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:41:41,841-Speed 13894.48 samples/sec Loss 1.7726 LearningRate 0.0003 Epoch: 20 Global Step: 35050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:41:59,630-Speed 13815.48 samples/sec Loss 1.7864 LearningRate 0.0003 Epoch: 20 Global Step: 35060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:42:17,366-Speed 13857.27 
samples/sec Loss 1.7760 LearningRate 0.0003 Epoch: 20 Global Step: 35070 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:42:35,162-Speed 13811.21 samples/sec Loss 1.7737 LearningRate 0.0003 Epoch: 20 Global Step: 35080 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:42:52,857-Speed 13889.51 samples/sec Loss 1.7865 LearningRate 0.0003 Epoch: 20 Global Step: 35090 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:43:10,618-Speed 13838.36 samples/sec Loss 1.7772 LearningRate 0.0003 Epoch: 20 Global Step: 35100 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:43:28,306-Speed 13895.22 samples/sec Loss 1.7795 LearningRate 0.0003 Epoch: 20 Global Step: 35110 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:43:46,002-Speed 13888.68 samples/sec Loss 1.7664 LearningRate 0.0003 Epoch: 20 Global Step: 35120 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:44:03,781-Speed 13823.70 samples/sec Loss 1.7792 LearningRate 0.0003 Epoch: 20 Global Step: 35130 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:44:21,535-Speed 13843.66 samples/sec Loss 1.7625 LearningRate 0.0003 Epoch: 20 Global Step: 35140 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:44:39,279-Speed 13851.35 samples/sec Loss 1.7658 LearningRate 0.0003 Epoch: 20 Global Step: 35150 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:44:57,130-Speed 13768.44 samples/sec Loss 1.7836 LearningRate 0.0003 Epoch: 20 Global Step: 35160 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:45:14,985-Speed 13765.09 samples/sec Loss 1.7730 LearningRate 0.0003 Epoch: 20 Global Step: 35170 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:45:32,785-Speed 13807.74 samples/sec Loss 1.7653 LearningRate 0.0003 Epoch: 20 Global Step: 35180 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:45:50,628-Speed 13773.73 samples/sec Loss 1.7648 
LearningRate 0.0003 Epoch: 20 Global Step: 35190 Fp16 Grad Scale: 16384 Required: 17 hours Training: 2022-03-04 01:46:08,657-Speed 13632.78 samples/sec Loss 1.7692 LearningRate 0.0003 Epoch: 20 Global Step: 35200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:46:26,784-Speed 13558.68 samples/sec Loss 1.7763 LearningRate 0.0003 Epoch: 20 Global Step: 35210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:46:44,859-Speed 13597.80 samples/sec Loss 1.7788 LearningRate 0.0003 Epoch: 20 Global Step: 35220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:47:03,007-Speed 13543.03 samples/sec Loss 1.7850 LearningRate 0.0003 Epoch: 20 Global Step: 35230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:47:21,063-Speed 13611.82 samples/sec Loss 1.7803 LearningRate 0.0003 Epoch: 20 Global Step: 35240 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:47:38,757-Speed 13890.76 samples/sec Loss 1.7781 LearningRate 0.0003 Epoch: 20 Global Step: 35250 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:47:56,489-Speed 13861.33 samples/sec Loss 1.7654 LearningRate 0.0003 Epoch: 20 Global Step: 35260 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:48:14,318-Speed 13784.65 samples/sec Loss 1.7605 LearningRate 0.0003 Epoch: 20 Global Step: 35270 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:48:32,071-Speed 13844.23 samples/sec Loss 1.7742 LearningRate 0.0003 Epoch: 20 Global Step: 35280 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:48:49,812-Speed 13854.26 samples/sec Loss 1.7772 LearningRate 0.0003 Epoch: 20 Global Step: 35290 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:49:07,568-Speed 13841.84 samples/sec Loss 1.7747 LearningRate 0.0003 Epoch: 20 Global Step: 35300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:49:25,443-Speed 13750.00 samples/sec Loss 1.7675 LearningRate 0.0003 Epoch: 20 
Global Step: 35310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:49:43,369-Speed 13709.85 samples/sec Loss 1.7754 LearningRate 0.0003 Epoch: 20 Global Step: 35320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:50:01,127-Speed 13841.06 samples/sec Loss 1.7785 LearningRate 0.0003 Epoch: 20 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:50:18,881-Speed 13842.96 samples/sec Loss 1.7633 LearningRate 0.0003 Epoch: 20 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:50:36,582-Speed 13885.23 samples/sec Loss 1.7661 LearningRate 0.0003 Epoch: 20 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:50:54,374-Speed 13813.95 samples/sec Loss 1.7644 LearningRate 0.0003 Epoch: 20 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:51:12,146-Speed 13829.40 samples/sec Loss 1.7732 LearningRate 0.0003 Epoch: 20 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:51:29,988-Speed 13775.25 samples/sec Loss 1.7778 LearningRate 0.0003 Epoch: 20 Global Step: 35380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:51:47,675-Speed 13895.57 samples/sec Loss 1.7586 LearningRate 0.0003 Epoch: 20 Global Step: 35390 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:52:05,399-Speed 13866.92 samples/sec Loss 1.7707 LearningRate 0.0003 Epoch: 20 Global Step: 35400 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:52:23,067-Speed 13911.74 samples/sec Loss 1.7649 LearningRate 0.0003 Epoch: 20 Global Step: 35410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:52:40,914-Speed 13771.85 samples/sec Loss 1.7625 LearningRate 0.0003 Epoch: 20 Global Step: 35420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:52:58,711-Speed 13810.84 samples/sec Loss 1.7629 LearningRate 0.0003 Epoch: 20 Global Step: 35430 Fp16 Grad 
Scale: 32768 Required: 17 hours Training: 2022-03-04 01:53:16,479-Speed 13832.38 samples/sec Loss 1.7689 LearningRate 0.0003 Epoch: 20 Global Step: 35440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:53:34,319-Speed 13777.01 samples/sec Loss 1.7665 LearningRate 0.0003 Epoch: 20 Global Step: 35450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:53:52,144-Speed 13788.23 samples/sec Loss 1.7525 LearningRate 0.0003 Epoch: 20 Global Step: 35460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:54:09,980-Speed 13779.82 samples/sec Loss 1.7663 LearningRate 0.0003 Epoch: 20 Global Step: 35470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:54:28,075-Speed 13583.52 samples/sec Loss 1.7616 LearningRate 0.0003 Epoch: 20 Global Step: 35480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:54:46,091-Speed 13641.55 samples/sec Loss 1.7629 LearningRate 0.0003 Epoch: 20 Global Step: 35490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:55:04,226-Speed 13552.70 samples/sec Loss 1.7665 LearningRate 0.0003 Epoch: 20 Global Step: 35500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:55:22,262-Speed 13628.22 samples/sec Loss 1.7583 LearningRate 0.0003 Epoch: 20 Global Step: 35510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:55:40,022-Speed 13838.62 samples/sec Loss 1.7477 LearningRate 0.0003 Epoch: 20 Global Step: 35520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:55:57,944-Speed 13713.76 samples/sec Loss 1.7426 LearningRate 0.0003 Epoch: 20 Global Step: 35530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:56:15,752-Speed 13801.93 samples/sec Loss 1.7585 LearningRate 0.0003 Epoch: 20 Global Step: 35540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:56:33,656-Speed 13727.66 samples/sec Loss 1.7638 LearningRate 0.0003 Epoch: 20 Global Step: 35550 Fp16 Grad Scale: 32768 Required: 17 hours 
Training: 2022-03-04 01:56:51,512-Speed 13763.84 samples/sec Loss 1.7562 LearningRate 0.0003 Epoch: 20 Global Step: 35560 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:57:09,332-Speed 13792.19 samples/sec Loss 1.7701 LearningRate 0.0003 Epoch: 20 Global Step: 35570 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:57:27,229-Speed 13733.32 samples/sec Loss 1.7611 LearningRate 0.0003 Epoch: 20 Global Step: 35580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:57:45,021-Speed 13813.87 samples/sec Loss 1.7525 LearningRate 0.0003 Epoch: 20 Global Step: 35590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:58:02,769-Speed 13848.37 samples/sec Loss 1.7653 LearningRate 0.0003 Epoch: 20 Global Step: 35600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:58:20,469-Speed 13885.71 samples/sec Loss 1.7310 LearningRate 0.0003 Epoch: 20 Global Step: 35610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:58:38,231-Speed 13837.24 samples/sec Loss 1.7537 LearningRate 0.0003 Epoch: 20 Global Step: 35620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:58:55,969-Speed 13855.69 samples/sec Loss 1.7505 LearningRate 0.0003 Epoch: 20 Global Step: 35630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:59:13,735-Speed 13833.81 samples/sec Loss 1.7547 LearningRate 0.0003 Epoch: 20 Global Step: 35640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 01:59:31,552-Speed 13794.39 samples/sec Loss 1.7559 LearningRate 0.0003 Epoch: 20 Global Step: 35650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 01:59:49,311-Speed 13839.98 samples/sec Loss 1.7581 LearningRate 0.0003 Epoch: 20 Global Step: 35660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:00:07,010-Speed 13886.34 samples/sec Loss 1.7490 LearningRate 0.0003 Epoch: 20 Global Step: 35670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 
02:00:24,690-Speed 13901.49 samples/sec Loss 1.7372 LearningRate 0.0003 Epoch: 20 Global Step: 35680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:00:42,466-Speed 13825.59 samples/sec Loss 1.7554 LearningRate 0.0003 Epoch: 20 Global Step: 35690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:01:00,154-Speed 13895.18 samples/sec Loss 1.7456 LearningRate 0.0003 Epoch: 20 Global Step: 35700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:01:17,903-Speed 13848.95 samples/sec Loss 1.7437 LearningRate 0.0003 Epoch: 20 Global Step: 35710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:01:35,612-Speed 13878.45 samples/sec Loss 1.7521 LearningRate 0.0003 Epoch: 20 Global Step: 35720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:01:53,304-Speed 13892.73 samples/sec Loss 1.7471 LearningRate 0.0003 Epoch: 20 Global Step: 35730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:02:11,042-Speed 13856.23 samples/sec Loss 1.7623 LearningRate 0.0003 Epoch: 20 Global Step: 35740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:02:28,832-Speed 13815.52 samples/sec Loss 1.7460 LearningRate 0.0003 Epoch: 20 Global Step: 35750 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:02:46,654-Speed 13792.93 samples/sec Loss 1.7382 LearningRate 0.0003 Epoch: 20 Global Step: 35760 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:03:04,413-Speed 13841.62 samples/sec Loss 1.7462 LearningRate 0.0003 Epoch: 20 Global Step: 35770 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:03:22,095-Speed 13899.72 samples/sec Loss 1.7468 LearningRate 0.0003 Epoch: 20 Global Step: 35780 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:03:39,863-Speed 13832.24 samples/sec Loss 1.7432 LearningRate 0.0003 Epoch: 20 Global Step: 35790 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:03:57,579-Speed 13873.77 
samples/sec Loss 1.7480 LearningRate 0.0003 Epoch: 20 Global Step: 35800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:04:15,366-Speed 13817.15 samples/sec Loss 1.7410 LearningRate 0.0003 Epoch: 20 Global Step: 35810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:04:33,087-Speed 13869.03 samples/sec Loss 1.7351 LearningRate 0.0003 Epoch: 20 Global Step: 35820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:04:50,873-Speed 13818.45 samples/sec Loss 1.7502 LearningRate 0.0003 Epoch: 20 Global Step: 35830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:05:08,644-Speed 13831.76 samples/sec Loss 1.7428 LearningRate 0.0003 Epoch: 20 Global Step: 35840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:05:26,424-Speed 13824.01 samples/sec Loss 1.7469 LearningRate 0.0003 Epoch: 20 Global Step: 35850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:05:44,197-Speed 13828.91 samples/sec Loss 1.7361 LearningRate 0.0003 Epoch: 20 Global Step: 35860 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:06:01,986-Speed 13816.93 samples/sec Loss 1.7565 LearningRate 0.0003 Epoch: 20 Global Step: 35870 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:06:19,744-Speed 13840.70 samples/sec Loss 1.7395 LearningRate 0.0003 Epoch: 20 Global Step: 35880 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:06:37,539-Speed 13811.49 samples/sec Loss 1.7500 LearningRate 0.0003 Epoch: 20 Global Step: 35890 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:06:55,288-Speed 13847.40 samples/sec Loss 1.7495 LearningRate 0.0003 Epoch: 20 Global Step: 35900 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:07:13,019-Speed 13860.97 samples/sec Loss 1.7348 LearningRate 0.0003 Epoch: 20 Global Step: 35910 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:07:30,775-Speed 13842.34 samples/sec Loss 1.7395 
LearningRate 0.0003 Epoch: 20 Global Step: 35920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:07:48,541-Speed 13834.24 samples/sec Loss 1.7458 LearningRate 0.0003 Epoch: 20 Global Step: 35930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:08:06,370-Speed 13785.39 samples/sec Loss 1.7329 LearningRate 0.0003 Epoch: 20 Global Step: 35940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:08:24,155-Speed 13818.98 samples/sec Loss 1.7286 LearningRate 0.0003 Epoch: 20 Global Step: 35950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:08:41,884-Speed 13863.16 samples/sec Loss 1.7276 LearningRate 0.0003 Epoch: 20 Global Step: 35960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:08:59,740-Speed 13764.56 samples/sec Loss 1.7224 LearningRate 0.0003 Epoch: 20 Global Step: 35970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:09:17,600-Speed 13762.11 samples/sec Loss 1.7410 LearningRate 0.0003 Epoch: 20 Global Step: 35980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:09:35,369-Speed 13831.47 samples/sec Loss 1.7457 LearningRate 0.0003 Epoch: 20 Global Step: 35990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:09:53,144-Speed 13827.55 samples/sec Loss 1.7499 LearningRate 0.0003 Epoch: 20 Global Step: 36000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:10:10,861-Speed 13872.42 samples/sec Loss 1.7311 LearningRate 0.0003 Epoch: 20 Global Step: 36010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:10:28,668-Speed 13802.51 samples/sec Loss 1.7480 LearningRate 0.0003 Epoch: 20 Global Step: 36020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:10:46,416-Speed 13847.58 samples/sec Loss 1.7419 LearningRate 0.0003 Epoch: 20 Global Step: 36030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:11:04,266-Speed 13768.59 samples/sec Loss 1.7251 LearningRate 0.0003 Epoch: 20 
Global Step: 36040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:11:22,008-Speed 13853.48 samples/sec Loss 1.7266 LearningRate 0.0003 Epoch: 20 Global Step: 36050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:11:39,772-Speed 13835.32 samples/sec Loss 1.7348 LearningRate 0.0003 Epoch: 20 Global Step: 36060 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:11:57,469-Speed 13887.80 samples/sec Loss 1.7469 LearningRate 0.0003 Epoch: 20 Global Step: 36070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:12:15,228-Speed 13839.45 samples/sec Loss 1.7315 LearningRate 0.0003 Epoch: 20 Global Step: 36080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:12:33,000-Speed 13830.01 samples/sec Loss 1.7331 LearningRate 0.0003 Epoch: 20 Global Step: 36090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:12:50,859-Speed 13762.32 samples/sec Loss 1.7348 LearningRate 0.0003 Epoch: 20 Global Step: 36100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:13:08,669-Speed 13799.81 samples/sec Loss 1.7327 LearningRate 0.0003 Epoch: 20 Global Step: 36110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:13:26,393-Speed 13866.54 samples/sec Loss 1.7403 LearningRate 0.0003 Epoch: 20 Global Step: 36120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:13:44,108-Speed 13874.25 samples/sec Loss 1.7312 LearningRate 0.0003 Epoch: 20 Global Step: 36130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:14:01,830-Speed 13868.12 samples/sec Loss 1.7200 LearningRate 0.0003 Epoch: 20 Global Step: 36140 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:14:19,697-Speed 13756.15 samples/sec Loss 1.7195 LearningRate 0.0003 Epoch: 20 Global Step: 36150 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:14:37,503-Speed 13802.72 samples/sec Loss 1.7189 LearningRate 0.0003 Epoch: 20 Global Step: 36160 Fp16 Grad 
Scale: 32768 Required: 17 hours Training: 2022-03-04 02:14:55,336-Speed 13782.36 samples/sec Loss 1.7342 LearningRate 0.0003 Epoch: 20 Global Step: 36170 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:15:13,146-Speed 13800.26 samples/sec Loss 1.7266 LearningRate 0.0003 Epoch: 20 Global Step: 36180 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:15:30,987-Speed 13776.19 samples/sec Loss 1.7440 LearningRate 0.0003 Epoch: 20 Global Step: 36190 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:15:48,746-Speed 13839.40 samples/sec Loss 1.7323 LearningRate 0.0003 Epoch: 20 Global Step: 36200 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:16:06,536-Speed 13816.55 samples/sec Loss 1.7320 LearningRate 0.0003 Epoch: 20 Global Step: 36210 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:16:24,392-Speed 13764.24 samples/sec Loss 1.7440 LearningRate 0.0003 Epoch: 20 Global Step: 36220 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:16:42,355-Speed 13682.02 samples/sec Loss 1.7350 LearningRate 0.0003 Epoch: 20 Global Step: 36230 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-03-04 02:17:00,165-Speed 13800.84 samples/sec Loss 1.7408 LearningRate 0.0003 Epoch: 20 Global Step: 36240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:17:17,999-Speed 13781.15 samples/sec Loss 1.7416 LearningRate 0.0003 Epoch: 20 Global Step: 36250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:17:35,886-Speed 13740.28 samples/sec Loss 1.7473 LearningRate 0.0003 Epoch: 20 Global Step: 36260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-03-04 02:17:53,768-Speed 13744.63 samples/sec Loss 1.7442 LearningRate 0.0003 Epoch: 20 Global Step: 36270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:18:11,499-Speed 13861.47 samples/sec Loss 1.7421 LearningRate 0.0003 Epoch: 20 Global Step: 36280 Fp16 Grad Scale: 32768 Required: 16 hours 
Training: 2022-03-04 02:18:29,309-Speed 13801.37 samples/sec Loss 1.7503 LearningRate 0.0003 Epoch: 20 Global Step: 36290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:19:36,965-Speed 3632.54 samples/sec Loss 1.7399 LearningRate 0.0003 Epoch: 21 Global Step: 36300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:19:54,700-Speed 13858.53 samples/sec Loss 1.6956 LearningRate 0.0003 Epoch: 21 Global Step: 36310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:20:12,513-Speed 13797.22 samples/sec Loss 1.7091 LearningRate 0.0003 Epoch: 21 Global Step: 36320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:20:30,245-Speed 13860.89 samples/sec Loss 1.7294 LearningRate 0.0003 Epoch: 21 Global Step: 36330 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:20:48,077-Speed 13782.61 samples/sec Loss 1.7139 LearningRate 0.0003 Epoch: 21 Global Step: 36340 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:21:05,921-Speed 13774.00 samples/sec Loss 1.7020 LearningRate 0.0003 Epoch: 21 Global Step: 36350 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:21:23,802-Speed 13744.28 samples/sec Loss 1.7091 LearningRate 0.0003 Epoch: 21 Global Step: 36360 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:21:41,713-Speed 13722.42 samples/sec Loss 1.7156 LearningRate 0.0003 Epoch: 21 Global Step: 36370 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:21:59,647-Speed 13704.57 samples/sec Loss 1.7085 LearningRate 0.0003 Epoch: 21 Global Step: 36380 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:22:17,532-Speed 13742.03 samples/sec Loss 1.7040 LearningRate 0.0003 Epoch: 21 Global Step: 36390 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:22:35,514-Speed 13667.77 samples/sec Loss 1.7139 LearningRate 0.0003 Epoch: 21 Global Step: 36400 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 
02:22:53,414-Speed 13730.37 samples/sec Loss 1.7030 LearningRate 0.0003 Epoch: 21 Global Step: 36410 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:23:11,316-Speed 13729.67 samples/sec Loss 1.7165 LearningRate 0.0003 Epoch: 21 Global Step: 36420 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:23:29,493-Speed 13521.24 samples/sec Loss 1.7177 LearningRate 0.0003 Epoch: 21 Global Step: 36430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:23:47,669-Speed 13523.20 samples/sec Loss 1.7120 LearningRate 0.0003 Epoch: 21 Global Step: 36440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:24:05,877-Speed 13498.15 samples/sec Loss 1.7035 LearningRate 0.0003 Epoch: 21 Global Step: 36450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:24:23,823-Speed 13695.38 samples/sec Loss 1.7102 LearningRate 0.0003 Epoch: 21 Global Step: 36460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:24:41,585-Speed 13837.15 samples/sec Loss 1.7104 LearningRate 0.0003 Epoch: 21 Global Step: 36470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:24:59,328-Speed 13851.68 samples/sec Loss 1.7222 LearningRate 0.0003 Epoch: 21 Global Step: 36480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:25:17,077-Speed 13847.25 samples/sec Loss 1.7201 LearningRate 0.0003 Epoch: 21 Global Step: 36490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:25:34,994-Speed 13718.27 samples/sec Loss 1.7233 LearningRate 0.0003 Epoch: 21 Global Step: 36500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:25:52,762-Speed 13832.16 samples/sec Loss 1.7071 LearningRate 0.0003 Epoch: 21 Global Step: 36510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:26:10,515-Speed 13843.70 samples/sec Loss 1.7288 LearningRate 0.0003 Epoch: 21 Global Step: 36520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:26:28,491-Speed 13672.91 
samples/sec Loss 1.7147 LearningRate 0.0003 Epoch: 21 Global Step: 36530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:26:46,289-Speed 13809.50 samples/sec Loss 1.7206 LearningRate 0.0003 Epoch: 21 Global Step: 36540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:27:04,132-Speed 13774.02 samples/sec Loss 1.7211 LearningRate 0.0003 Epoch: 21 Global Step: 36550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:27:21,882-Speed 13846.95 samples/sec Loss 1.7150 LearningRate 0.0003 Epoch: 21 Global Step: 36560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:27:39,744-Speed 13759.35 samples/sec Loss 1.6997 LearningRate 0.0003 Epoch: 21 Global Step: 36570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:27:57,527-Speed 13821.92 samples/sec Loss 1.6962 LearningRate 0.0003 Epoch: 21 Global Step: 36580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:28:15,549-Speed 13637.77 samples/sec Loss 1.7053 LearningRate 0.0003 Epoch: 21 Global Step: 36590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:28:33,606-Speed 13611.11 samples/sec Loss 1.6995 LearningRate 0.0003 Epoch: 21 Global Step: 36600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:28:51,419-Speed 13797.46 samples/sec Loss 1.7121 LearningRate 0.0003 Epoch: 21 Global Step: 36610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:29:09,300-Speed 13744.89 samples/sec Loss 1.7205 LearningRate 0.0003 Epoch: 21 Global Step: 36620 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:29:27,144-Speed 13773.27 samples/sec Loss 1.7334 LearningRate 0.0003 Epoch: 21 Global Step: 36630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:29:44,944-Speed 13808.20 samples/sec Loss 1.7118 LearningRate 0.0003 Epoch: 21 Global Step: 36640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:30:02,757-Speed 13797.07 samples/sec Loss 1.7077 
LearningRate 0.0003 Epoch: 21 Global Step: 36650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:30:20,562-Speed 13804.15 samples/sec Loss 1.7160 LearningRate 0.0003 Epoch: 21 Global Step: 36660 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:30:38,399-Speed 13778.56 samples/sec Loss 1.7088 LearningRate 0.0003 Epoch: 21 Global Step: 36670 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:30:56,201-Speed 13806.34 samples/sec Loss 1.7109 LearningRate 0.0003 Epoch: 21 Global Step: 36680 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:31:14,104-Speed 13728.18 samples/sec Loss 1.7070 LearningRate 0.0003 Epoch: 21 Global Step: 36690 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:31:31,980-Speed 13748.92 samples/sec Loss 1.7137 LearningRate 0.0003 Epoch: 21 Global Step: 36700 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:31:49,873-Speed 13735.77 samples/sec Loss 1.7200 LearningRate 0.0003 Epoch: 21 Global Step: 36710 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:32:07,708-Speed 13780.49 samples/sec Loss 1.7049 LearningRate 0.0003 Epoch: 21 Global Step: 36720 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:32:25,663-Speed 13688.12 samples/sec Loss 1.7169 LearningRate 0.0003 Epoch: 21 Global Step: 36730 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:32:43,672-Speed 13647.53 samples/sec Loss 1.7137 LearningRate 0.0003 Epoch: 21 Global Step: 36740 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:33:01,728-Speed 13612.41 samples/sec Loss 1.7123 LearningRate 0.0003 Epoch: 21 Global Step: 36750 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:33:19,586-Speed 13762.36 samples/sec Loss 1.6982 LearningRate 0.0003 Epoch: 21 Global Step: 36760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:33:37,485-Speed 13731.56 samples/sec Loss 1.7069 LearningRate 0.0003 Epoch: 21 
Global Step: 36770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:33:55,356-Speed 13753.82 samples/sec Loss 1.7034 LearningRate 0.0003 Epoch: 21 Global Step: 36780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:34:13,349-Speed 13659.93 samples/sec Loss 1.6935 LearningRate 0.0003 Epoch: 21 Global Step: 36790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:34:31,215-Speed 13756.35 samples/sec Loss 1.6919 LearningRate 0.0003 Epoch: 21 Global Step: 36800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:34:49,155-Speed 13699.99 samples/sec Loss 1.7008 LearningRate 0.0003 Epoch: 21 Global Step: 36810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:35:07,078-Speed 13712.31 samples/sec Loss 1.6982 LearningRate 0.0003 Epoch: 21 Global Step: 36820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:35:25,025-Speed 13695.19 samples/sec Loss 1.7060 LearningRate 0.0003 Epoch: 21 Global Step: 36830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:35:42,936-Speed 13721.63 samples/sec Loss 1.6949 LearningRate 0.0003 Epoch: 21 Global Step: 36840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:36:00,670-Speed 13859.77 samples/sec Loss 1.6960 LearningRate 0.0003 Epoch: 21 Global Step: 36850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:36:18,633-Speed 13682.62 samples/sec Loss 1.7094 LearningRate 0.0003 Epoch: 21 Global Step: 36860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:36:36,413-Speed 13823.27 samples/sec Loss 1.7138 LearningRate 0.0003 Epoch: 21 Global Step: 36870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:36:54,304-Speed 13737.15 samples/sec Loss 1.7022 LearningRate 0.0003 Epoch: 21 Global Step: 36880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:37:12,259-Speed 13688.34 samples/sec Loss 1.6955 LearningRate 0.0003 Epoch: 21 Global Step: 36890 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-03-04 02:37:30,234-Speed 13673.25 samples/sec Loss 1.6924 LearningRate 0.0003 Epoch: 21 Global Step: 36900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:37:48,052-Speed 13793.75 samples/sec Loss 1.6870 LearningRate 0.0003 Epoch: 21 Global Step: 36910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:38:05,898-Speed 13773.45 samples/sec Loss 1.6870 LearningRate 0.0003 Epoch: 21 Global Step: 36920 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:38:23,702-Speed 13805.33 samples/sec Loss 1.7066 LearningRate 0.0003 Epoch: 21 Global Step: 36930 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:38:41,453-Speed 13846.64 samples/sec Loss 1.6956 LearningRate 0.0003 Epoch: 21 Global Step: 36940 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:38:59,245-Speed 13814.19 samples/sec Loss 1.7070 LearningRate 0.0003 Epoch: 21 Global Step: 36950 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:39:17,088-Speed 13774.37 samples/sec Loss 1.6944 LearningRate 0.0003 Epoch: 21 Global Step: 36960 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:39:34,995-Speed 13725.52 samples/sec Loss 1.6975 LearningRate 0.0003 Epoch: 21 Global Step: 36970 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:39:52,819-Speed 13788.76 samples/sec Loss 1.6944 LearningRate 0.0003 Epoch: 21 Global Step: 36980 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:40:10,658-Speed 13777.09 samples/sec Loss 1.6973 LearningRate 0.0003 Epoch: 21 Global Step: 36990 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:40:28,423-Speed 13835.50 samples/sec Loss 1.6893 LearningRate 0.0003 Epoch: 21 Global Step: 37000 Fp16 Grad Scale: 16384 Required: 16 hours Training: 2022-03-04 02:40:46,315-Speed 13736.42 samples/sec Loss 1.6927 LearningRate 0.0003 Epoch: 21 Global Step: 37010 Fp16 Grad Scale: 16384 Required: 16 hours 
Training: 2022-03-04 02:41:04,226-Speed 13721.69 samples/sec Loss 1.7018 LearningRate 0.0003 Epoch: 21 Global Step: 37020 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:41:22,038-Speed 13798.96 samples/sec Loss 1.6925 LearningRate 0.0003 Epoch: 21 Global Step: 37030 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:41:39,923-Speed 13741.87 samples/sec Loss 1.6808 LearningRate 0.0003 Epoch: 21 Global Step: 37040 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:41:57,689-Speed 13833.97 samples/sec Loss 1.7022 LearningRate 0.0003 Epoch: 21 Global Step: 37050 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:42:15,458-Speed 13831.97 samples/sec Loss 1.6808 LearningRate 0.0003 Epoch: 21 Global Step: 37060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:42:33,221-Speed 13836.20 samples/sec Loss 1.6931 LearningRate 0.0003 Epoch: 21 Global Step: 37070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:42:51,139-Speed 13716.78 samples/sec Loss 1.6901 LearningRate 0.0003 Epoch: 21 Global Step: 37080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:43:09,093-Speed 13689.69 samples/sec Loss 1.6843 LearningRate 0.0003 Epoch: 21 Global Step: 37090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:43:26,788-Speed 13891.54 samples/sec Loss 1.6922 LearningRate 0.0003 Epoch: 21 Global Step: 37100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:43:44,518-Speed 13861.90 samples/sec Loss 1.6869 LearningRate 0.0003 Epoch: 21 Global Step: 37110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:44:02,361-Speed 13774.24 samples/sec Loss 1.6739 LearningRate 0.0003 Epoch: 21 Global Step: 37120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:44:20,186-Speed 13788.60 samples/sec Loss 1.6824 LearningRate 0.0003 Epoch: 21 Global Step: 37130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 
02:44:38,020-Speed 13781.08 samples/sec Loss 1.6820 LearningRate 0.0003 Epoch: 21 Global Step: 37140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:44:55,730-Speed 13878.82 samples/sec Loss 1.6725 LearningRate 0.0003 Epoch: 21 Global Step: 37150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:45:13,588-Speed 13761.92 samples/sec Loss 1.6972 LearningRate 0.0003 Epoch: 21 Global Step: 37160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:45:31,465-Speed 13748.42 samples/sec Loss 1.6883 LearningRate 0.0003 Epoch: 21 Global Step: 37170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:45:49,312-Speed 13771.56 samples/sec Loss 1.6794 LearningRate 0.0003 Epoch: 21 Global Step: 37180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:46:07,150-Speed 13778.21 samples/sec Loss 1.6792 LearningRate 0.0003 Epoch: 21 Global Step: 37190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:46:24,983-Speed 13782.09 samples/sec Loss 1.6949 LearningRate 0.0003 Epoch: 21 Global Step: 37200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:46:42,823-Speed 13776.15 samples/sec Loss 1.6847 LearningRate 0.0003 Epoch: 21 Global Step: 37210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:47:00,714-Speed 13737.78 samples/sec Loss 1.6899 LearningRate 0.0003 Epoch: 21 Global Step: 37220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:47:18,566-Speed 13766.94 samples/sec Loss 1.6825 LearningRate 0.0003 Epoch: 21 Global Step: 37230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:47:36,576-Speed 13647.11 samples/sec Loss 1.6700 LearningRate 0.0003 Epoch: 21 Global Step: 37240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:47:54,359-Speed 13820.86 samples/sec Loss 1.6821 LearningRate 0.0003 Epoch: 21 Global Step: 37250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:48:12,163-Speed 13804.87 
samples/sec Loss 1.6784 LearningRate 0.0003 Epoch: 21 Global Step: 37260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:48:29,940-Speed 13825.17 samples/sec Loss 1.6655 LearningRate 0.0003 Epoch: 21 Global Step: 37270 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:48:47,764-Speed 13788.70 samples/sec Loss 1.6739 LearningRate 0.0003 Epoch: 21 Global Step: 37280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:49:05,550-Speed 13819.65 samples/sec Loss 1.6802 LearningRate 0.0003 Epoch: 21 Global Step: 37290 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:49:23,400-Speed 13769.16 samples/sec Loss 1.6790 LearningRate 0.0003 Epoch: 21 Global Step: 37300 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:49:41,214-Speed 13796.58 samples/sec Loss 1.6724 LearningRate 0.0003 Epoch: 21 Global Step: 37310 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:49:58,965-Speed 13845.62 samples/sec Loss 1.6726 LearningRate 0.0003 Epoch: 21 Global Step: 37320 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:50:16,869-Speed 13727.11 samples/sec Loss 1.6857 LearningRate 0.0003 Epoch: 21 Global Step: 37330 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:50:34,603-Speed 13859.37 samples/sec Loss 1.6730 LearningRate 0.0003 Epoch: 21 Global Step: 37340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:50:52,408-Speed 13803.90 samples/sec Loss 1.6762 LearningRate 0.0003 Epoch: 21 Global Step: 37350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:51:10,220-Speed 13797.56 samples/sec Loss 1.6729 LearningRate 0.0003 Epoch: 21 Global Step: 37360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:51:28,040-Speed 13792.58 samples/sec Loss 1.6733 LearningRate 0.0003 Epoch: 21 Global Step: 37370 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:51:45,808-Speed 13832.83 samples/sec Loss 1.6709 
LearningRate 0.0003 Epoch: 21 Global Step: 37380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:52:03,600-Speed 13813.36 samples/sec Loss 1.6688 LearningRate 0.0003 Epoch: 21 Global Step: 37390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:52:21,474-Speed 13750.08 samples/sec Loss 1.6768 LearningRate 0.0003 Epoch: 21 Global Step: 37400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:52:39,475-Speed 13653.41 samples/sec Loss 1.6731 LearningRate 0.0003 Epoch: 21 Global Step: 37410 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:52:57,335-Speed 13761.83 samples/sec Loss 1.6822 LearningRate 0.0003 Epoch: 21 Global Step: 37420 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:53:15,066-Speed 13860.87 samples/sec Loss 1.6736 LearningRate 0.0003 Epoch: 21 Global Step: 37430 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:53:32,892-Speed 13787.52 samples/sec Loss 1.6584 LearningRate 0.0003 Epoch: 21 Global Step: 37440 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:53:50,804-Speed 13721.89 samples/sec Loss 1.6690 LearningRate 0.0003 Epoch: 21 Global Step: 37450 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:54:08,657-Speed 13766.70 samples/sec Loss 1.6818 LearningRate 0.0003 Epoch: 21 Global Step: 37460 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:54:26,438-Speed 13822.13 samples/sec Loss 1.6651 LearningRate 0.0003 Epoch: 21 Global Step: 37470 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:54:44,194-Speed 13841.90 samples/sec Loss 1.6613 LearningRate 0.0003 Epoch: 21 Global Step: 37480 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:55:02,125-Speed 13706.47 samples/sec Loss 1.6850 LearningRate 0.0003 Epoch: 21 Global Step: 37490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:55:20,225-Speed 13578.78 samples/sec Loss 1.6634 LearningRate 0.0003 Epoch: 21 
Global Step: 37500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:55:38,230-Speed 13650.55 samples/sec Loss 1.6629 LearningRate 0.0003 Epoch: 21 Global Step: 37510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:55:56,388-Speed 13535.28 samples/sec Loss 1.6723 LearningRate 0.0003 Epoch: 21 Global Step: 37520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:56:14,499-Speed 13570.24 samples/sec Loss 1.6614 LearningRate 0.0003 Epoch: 21 Global Step: 37530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:56:32,577-Speed 13596.05 samples/sec Loss 1.6623 LearningRate 0.0003 Epoch: 21 Global Step: 37540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:56:50,614-Speed 13625.78 samples/sec Loss 1.6700 LearningRate 0.0003 Epoch: 21 Global Step: 37550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:57:08,767-Speed 13539.18 samples/sec Loss 1.6696 LearningRate 0.0003 Epoch: 21 Global Step: 37560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 02:57:26,800-Speed 13628.91 samples/sec Loss 1.6712 LearningRate 0.0003 Epoch: 21 Global Step: 37570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:57:44,932-Speed 13554.91 samples/sec Loss 1.6728 LearningRate 0.0003 Epoch: 21 Global Step: 37580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:58:03,015-Speed 13592.36 samples/sec Loss 1.6724 LearningRate 0.0003 Epoch: 21 Global Step: 37590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:58:21,029-Speed 13643.52 samples/sec Loss 1.6705 LearningRate 0.0003 Epoch: 21 Global Step: 37600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:58:39,145-Speed 13566.48 samples/sec Loss 1.6665 LearningRate 0.0003 Epoch: 21 Global Step: 37610 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:58:57,240-Speed 13582.75 samples/sec Loss 1.6559 LearningRate 0.0003 Epoch: 21 Global Step: 37620 Fp16 Grad 
Scale: 32768 Required: 16 hours Training: 2022-03-04 02:59:15,045-Speed 13804.09 samples/sec Loss 1.6709 LearningRate 0.0003 Epoch: 21 Global Step: 37630 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:59:32,825-Speed 13822.71 samples/sec Loss 1.6650 LearningRate 0.0003 Epoch: 21 Global Step: 37640 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 02:59:50,689-Speed 13758.39 samples/sec Loss 1.6647 LearningRate 0.0003 Epoch: 21 Global Step: 37650 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:00:08,529-Speed 13776.38 samples/sec Loss 1.6611 LearningRate 0.0003 Epoch: 21 Global Step: 37660 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:00:26,365-Speed 13779.92 samples/sec Loss 1.6701 LearningRate 0.0003 Epoch: 21 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:00:44,149-Speed 13820.01 samples/sec Loss 1.6694 LearningRate 0.0003 Epoch: 21 Global Step: 37680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:01:01,957-Speed 13801.69 samples/sec Loss 1.6663 LearningRate 0.0003 Epoch: 21 Global Step: 37690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:01:19,737-Speed 13822.53 samples/sec Loss 1.6698 LearningRate 0.0003 Epoch: 21 Global Step: 37700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:01:37,503-Speed 13835.07 samples/sec Loss 1.6674 LearningRate 0.0003 Epoch: 21 Global Step: 37710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:01:55,319-Speed 13795.25 samples/sec Loss 1.6586 LearningRate 0.0003 Epoch: 21 Global Step: 37720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:02:13,170-Speed 13769.07 samples/sec Loss 1.6624 LearningRate 0.0003 Epoch: 21 Global Step: 37730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:02:31,018-Speed 13770.10 samples/sec Loss 1.6497 LearningRate 0.0003 Epoch: 21 Global Step: 37740 Fp16 Grad Scale: 32768 Required: 16 hours 
Training: 2022-03-04 03:02:48,921-Speed 13728.67 samples/sec Loss 1.6587 LearningRate 0.0003 Epoch: 21 Global Step: 37750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:03:06,739-Speed 13793.23 samples/sec Loss 1.6669 LearningRate 0.0003 Epoch: 21 Global Step: 37760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:03:24,531-Speed 13814.86 samples/sec Loss 1.6556 LearningRate 0.0003 Epoch: 21 Global Step: 37770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:03:42,423-Speed 13737.23 samples/sec Loss 1.6627 LearningRate 0.0003 Epoch: 21 Global Step: 37780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:04:00,214-Speed 13815.70 samples/sec Loss 1.6544 LearningRate 0.0003 Epoch: 21 Global Step: 37790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:04:18,074-Speed 13761.48 samples/sec Loss 1.6646 LearningRate 0.0003 Epoch: 21 Global Step: 37800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:04:35,972-Speed 13731.46 samples/sec Loss 1.6525 LearningRate 0.0003 Epoch: 21 Global Step: 37810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:04:53,797-Speed 13787.98 samples/sec Loss 1.6662 LearningRate 0.0003 Epoch: 21 Global Step: 37820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:05:11,587-Speed 13815.76 samples/sec Loss 1.6718 LearningRate 0.0003 Epoch: 21 Global Step: 37830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:05:29,410-Speed 13790.83 samples/sec Loss 1.6634 LearningRate 0.0003 Epoch: 21 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:05:47,218-Speed 13801.48 samples/sec Loss 1.6626 LearningRate 0.0003 Epoch: 21 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:06:05,025-Speed 13802.08 samples/sec Loss 1.6633 LearningRate 0.0003 Epoch: 21 Global Step: 37860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 
03:06:22,846-Speed 13791.42 samples/sec Loss 1.6484 LearningRate 0.0003 Epoch: 21 Global Step: 37870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:06:40,680-Speed 13781.70 samples/sec Loss 1.6497 LearningRate 0.0003 Epoch: 21 Global Step: 37880 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:06:58,507-Speed 13786.91 samples/sec Loss 1.6471 LearningRate 0.0003 Epoch: 21 Global Step: 37890 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:07:16,359-Speed 13766.98 samples/sec Loss 1.6505 LearningRate 0.0003 Epoch: 21 Global Step: 37900 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:07:34,216-Speed 13763.34 samples/sec Loss 1.6716 LearningRate 0.0003 Epoch: 21 Global Step: 37910 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:07:51,947-Speed 13861.86 samples/sec Loss 1.6679 LearningRate 0.0003 Epoch: 21 Global Step: 37920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:08:09,773-Speed 13787.86 samples/sec Loss 1.6613 LearningRate 0.0003 Epoch: 21 Global Step: 37930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:08:27,641-Speed 13754.84 samples/sec Loss 1.6584 LearningRate 0.0003 Epoch: 21 Global Step: 37940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:08:45,426-Speed 13819.60 samples/sec Loss 1.6587 LearningRate 0.0003 Epoch: 21 Global Step: 37950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:09:03,223-Speed 13809.65 samples/sec Loss 1.6601 LearningRate 0.0003 Epoch: 21 Global Step: 37960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:09:20,973-Speed 13846.67 samples/sec Loss 1.6658 LearningRate 0.0003 Epoch: 21 Global Step: 37970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:09:38,740-Speed 13833.24 samples/sec Loss 1.6583 LearningRate 0.0003 Epoch: 21 Global Step: 37980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:09:56,578-Speed 13778.33 
samples/sec Loss 1.6738 LearningRate 0.0003 Epoch: 21 Global Step: 37990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:10:14,422-Speed 13773.62 samples/sec Loss 1.6716 LearningRate 0.0003 Epoch: 21 Global Step: 38000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:10:32,277-Speed 13765.12 samples/sec Loss 1.6641 LearningRate 0.0003 Epoch: 21 Global Step: 38010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:10:50,058-Speed 13822.77 samples/sec Loss 1.6754 LearningRate 0.0002 Epoch: 21 Global Step: 38020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:11:57,981-Speed 3618.23 samples/sec Loss 1.6381 LearningRate 0.0002 Epoch: 22 Global Step: 38030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:12:15,706-Speed 13866.69 samples/sec Loss 1.6399 LearningRate 0.0002 Epoch: 22 Global Step: 38040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:12:33,435-Speed 13862.55 samples/sec Loss 1.6378 LearningRate 0.0002 Epoch: 22 Global Step: 38050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:12:51,127-Speed 13891.93 samples/sec Loss 1.6354 LearningRate 0.0002 Epoch: 22 Global Step: 38060 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:13:08,897-Speed 13830.59 samples/sec Loss 1.6259 LearningRate 0.0002 Epoch: 22 Global Step: 38070 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:13:26,736-Speed 13778.27 samples/sec Loss 1.6365 LearningRate 0.0002 Epoch: 22 Global Step: 38080 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:13:44,557-Speed 13791.20 samples/sec Loss 1.6336 LearningRate 0.0002 Epoch: 22 Global Step: 38090 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:14:02,284-Speed 13865.58 samples/sec Loss 1.6430 LearningRate 0.0002 Epoch: 22 Global Step: 38100 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:14:20,149-Speed 13756.89 samples/sec Loss 1.6376 
LearningRate 0.0002 Epoch: 22 Global Step: 38110 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:14:37,992-Speed 13774.68 samples/sec Loss 1.6374 LearningRate 0.0002 Epoch: 22 Global Step: 38120 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:14:55,738-Speed 13849.73 samples/sec Loss 1.6397 LearningRate 0.0002 Epoch: 22 Global Step: 38130 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:15:13,624-Speed 13741.47 samples/sec Loss 1.6188 LearningRate 0.0002 Epoch: 22 Global Step: 38140 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:15:31,497-Speed 13750.91 samples/sec Loss 1.6306 LearningRate 0.0002 Epoch: 22 Global Step: 38150 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:15:49,301-Speed 13805.07 samples/sec Loss 1.6351 LearningRate 0.0002 Epoch: 22 Global Step: 38160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:16:07,240-Speed 13700.44 samples/sec Loss 1.6356 LearningRate 0.0002 Epoch: 22 Global Step: 38170 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:16:25,133-Speed 13735.79 samples/sec Loss 1.6375 LearningRate 0.0002 Epoch: 22 Global Step: 38180 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:16:42,937-Speed 13804.58 samples/sec Loss 1.6500 LearningRate 0.0002 Epoch: 22 Global Step: 38190 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:17:00,766-Speed 13784.98 samples/sec Loss 1.6464 LearningRate 0.0002 Epoch: 22 Global Step: 38200 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:17:18,741-Speed 13673.36 samples/sec Loss 1.6413 LearningRate 0.0002 Epoch: 22 Global Step: 38210 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:17:36,489-Speed 13848.16 samples/sec Loss 1.6294 LearningRate 0.0002 Epoch: 22 Global Step: 38220 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:17:54,318-Speed 13785.25 samples/sec Loss 1.6447 LearningRate 0.0002 Epoch: 22 
Global Step: 38230 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:18:12,108-Speed 13815.20 samples/sec Loss 1.6479 LearningRate 0.0002 Epoch: 22 Global Step: 38240 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:18:30,009-Speed 13730.13 samples/sec Loss 1.6364 LearningRate 0.0002 Epoch: 22 Global Step: 38250 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:18:47,857-Speed 13769.98 samples/sec Loss 1.6495 LearningRate 0.0002 Epoch: 22 Global Step: 38260 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:19:05,775-Speed 13716.73 samples/sec Loss 1.6284 LearningRate 0.0002 Epoch: 22 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-03-04 03:19:23,584-Speed 13800.55 samples/sec Loss 1.6393 LearningRate 0.0002 Epoch: 22 Global Step: 38280 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-03-04 03:19:41,447-Speed 13759.69 samples/sec Loss 1.6240 LearningRate 0.0002 Epoch: 22 Global Step: 38290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:19:59,322-Speed 13751.03 samples/sec Loss 1.6475 LearningRate 0.0002 Epoch: 22 Global Step: 38300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:20:17,222-Speed 13731.04 samples/sec Loss 1.6347 LearningRate 0.0002 Epoch: 22 Global Step: 38310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:20:34,959-Speed 13856.53 samples/sec Loss 1.6378 LearningRate 0.0002 Epoch: 22 Global Step: 38320 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:20:52,779-Speed 13792.84 samples/sec Loss 1.6419 LearningRate 0.0002 Epoch: 22 Global Step: 38330 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:21:10,615-Speed 13779.33 samples/sec Loss 1.6456 LearningRate 0.0002 Epoch: 22 Global Step: 38340 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:21:28,509-Speed 13735.69 samples/sec Loss 1.6372 LearningRate 0.0002 Epoch: 22 Global Step: 38350 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-03-04 03:21:46,317-Speed 13800.74 samples/sec Loss 1.6308 LearningRate 0.0002 Epoch: 22 Global Step: 38360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:22:04,222-Speed 13727.99 samples/sec Loss 1.6373 LearningRate 0.0002 Epoch: 22 Global Step: 38370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:22:22,113-Speed 13737.09 samples/sec Loss 1.6426 LearningRate 0.0002 Epoch: 22 Global Step: 38380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:22:40,053-Speed 13699.80 samples/sec Loss 1.6388 LearningRate 0.0002 Epoch: 22 Global Step: 38390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:22:57,939-Speed 13741.24 samples/sec Loss 1.6255 LearningRate 0.0002 Epoch: 22 Global Step: 38400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:23:16,159-Speed 13489.55 samples/sec Loss 1.6369 LearningRate 0.0002 Epoch: 22 Global Step: 38410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:23:34,289-Speed 13556.60 samples/sec Loss 1.6275 LearningRate 0.0002 Epoch: 22 Global Step: 38420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:23:52,450-Speed 13532.64 samples/sec Loss 1.6452 LearningRate 0.0002 Epoch: 22 Global Step: 38430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:24:10,556-Speed 13575.86 samples/sec Loss 1.6288 LearningRate 0.0002 Epoch: 22 Global Step: 38440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:24:28,811-Speed 13463.56 samples/sec Loss 1.6340 LearningRate 0.0002 Epoch: 22 Global Step: 38450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:24:46,988-Speed 13521.77 samples/sec Loss 1.6421 LearningRate 0.0002 Epoch: 22 Global Step: 38460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:25:05,098-Speed 13570.72 samples/sec Loss 1.6294 LearningRate 0.0002 Epoch: 22 Global Step: 38470 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-03-04 03:25:23,217-Speed 13564.71 samples/sec Loss 1.6470 LearningRate 0.0002 Epoch: 22 Global Step: 38480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:25:41,480-Speed 13458.98 samples/sec Loss 1.6314 LearningRate 0.0002 Epoch: 22 Global Step: 38490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:25:59,724-Speed 13471.78 samples/sec Loss 1.6284 LearningRate 0.0002 Epoch: 22 Global Step: 38500 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:26:18,047-Speed 13413.67 samples/sec Loss 1.6164 LearningRate 0.0002 Epoch: 22 Global Step: 38510 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:26:36,241-Speed 13507.99 samples/sec Loss 1.6269 LearningRate 0.0002 Epoch: 22 Global Step: 38520 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:26:54,404-Speed 13531.44 samples/sec Loss 1.6238 LearningRate 0.0002 Epoch: 22 Global Step: 38530 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:27:12,542-Speed 13551.05 samples/sec Loss 1.6301 LearningRate 0.0002 Epoch: 22 Global Step: 38540 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:27:30,717-Speed 13522.95 samples/sec Loss 1.6241 LearningRate 0.0002 Epoch: 22 Global Step: 38550 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:27:48,873-Speed 13536.75 samples/sec Loss 1.6246 LearningRate 0.0002 Epoch: 22 Global Step: 38560 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:28:06,988-Speed 13567.11 samples/sec Loss 1.6287 LearningRate 0.0002 Epoch: 22 Global Step: 38570 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:28:25,185-Speed 13506.10 samples/sec Loss 1.6301 LearningRate 0.0002 Epoch: 22 Global Step: 38580 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:28:43,305-Speed 13564.00 samples/sec Loss 1.6227 LearningRate 0.0002 Epoch: 22 Global Step: 38590 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 
03:29:01,468-Speed 13531.85 samples/sec Loss 1.6267 LearningRate 0.0002 Epoch: 22 Global Step: 38600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:29:19,567-Speed 13579.04 samples/sec Loss 1.6286 LearningRate 0.0002 Epoch: 22 Global Step: 38610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:29:37,710-Speed 13546.92 samples/sec Loss 1.6187 LearningRate 0.0002 Epoch: 22 Global Step: 38620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:29:55,939-Speed 13483.26 samples/sec Loss 1.6364 LearningRate 0.0002 Epoch: 22 Global Step: 38630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:30:14,113-Speed 13523.31 samples/sec Loss 1.6249 LearningRate 0.0002 Epoch: 22 Global Step: 38640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:30:32,241-Speed 13557.52 samples/sec Loss 1.6284 LearningRate 0.0002 Epoch: 22 Global Step: 38650 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:30:50,385-Speed 13545.45 samples/sec Loss 1.6160 LearningRate 0.0002 Epoch: 22 Global Step: 38660 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:31:08,212-Speed 13786.84 samples/sec Loss 1.6256 LearningRate 0.0002 Epoch: 22 Global Step: 38670 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:31:26,100-Speed 13740.02 samples/sec Loss 1.6223 LearningRate 0.0002 Epoch: 22 Global Step: 38680 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:31:43,951-Speed 13768.09 samples/sec Loss 1.6247 LearningRate 0.0002 Epoch: 22 Global Step: 38690 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:32:01,790-Speed 13776.96 samples/sec Loss 1.6303 LearningRate 0.0002 Epoch: 22 Global Step: 38700 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:32:19,526-Speed 13857.77 samples/sec Loss 1.6315 LearningRate 0.0002 Epoch: 22 Global Step: 38710 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:32:37,293-Speed 13833.51 
samples/sec Loss 1.6122 LearningRate 0.0002 Epoch: 22 Global Step: 38720 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:32:55,073-Speed 13823.36 samples/sec Loss 1.6260 LearningRate 0.0002 Epoch: 22 Global Step: 38730 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:33:12,826-Speed 13844.06 samples/sec Loss 1.6183 LearningRate 0.0002 Epoch: 22 Global Step: 38740 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:33:30,619-Speed 13813.19 samples/sec Loss 1.6142 LearningRate 0.0002 Epoch: 22 Global Step: 38750 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:33:48,484-Speed 13757.22 samples/sec Loss 1.6221 LearningRate 0.0002 Epoch: 22 Global Step: 38760 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:34:06,204-Speed 13870.31 samples/sec Loss 1.6284 LearningRate 0.0002 Epoch: 22 Global Step: 38770 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:34:23,971-Speed 13832.48 samples/sec Loss 1.6087 LearningRate 0.0002 Epoch: 22 Global Step: 38780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:34:41,750-Speed 13824.38 samples/sec Loss 1.6190 LearningRate 0.0002 Epoch: 22 Global Step: 38790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:34:59,521-Speed 13830.23 samples/sec Loss 1.6108 LearningRate 0.0002 Epoch: 22 Global Step: 38800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:35:17,216-Speed 13889.32 samples/sec Loss 1.6099 LearningRate 0.0002 Epoch: 22 Global Step: 38810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:35:34,947-Speed 13862.58 samples/sec Loss 1.6136 LearningRate 0.0002 Epoch: 22 Global Step: 38820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:35:52,765-Speed 13793.73 samples/sec Loss 1.6173 LearningRate 0.0002 Epoch: 22 Global Step: 38830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:36:10,647-Speed 13743.84 samples/sec Loss 1.6042 
LearningRate 0.0002 Epoch: 22 Global Step: 38840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:36:28,352-Speed 13881.64 samples/sec Loss 1.6117 LearningRate 0.0002 Epoch: 22 Global Step: 38850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:36:46,112-Speed 13840.20 samples/sec Loss 1.6066 LearningRate 0.0002 Epoch: 22 Global Step: 38860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:37:03,879-Speed 13832.94 samples/sec Loss 1.6095 LearningRate 0.0002 Epoch: 22 Global Step: 38870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:37:21,606-Speed 13864.56 samples/sec Loss 1.6121 LearningRate 0.0002 Epoch: 22 Global Step: 38880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:37:39,392-Speed 13818.80 samples/sec Loss 1.6129 LearningRate 0.0002 Epoch: 22 Global Step: 38890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:37:57,195-Speed 13804.82 samples/sec Loss 1.6069 LearningRate 0.0002 Epoch: 22 Global Step: 38900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:38:15,092-Speed 13732.71 samples/sec Loss 1.6078 LearningRate 0.0002 Epoch: 22 Global Step: 38910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:38:32,925-Speed 13782.60 samples/sec Loss 1.6005 LearningRate 0.0002 Epoch: 22 Global Step: 38920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:38:50,714-Speed 13815.56 samples/sec Loss 1.6137 LearningRate 0.0002 Epoch: 22 Global Step: 38930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:39:08,556-Speed 13775.67 samples/sec Loss 1.5957 LearningRate 0.0002 Epoch: 22 Global Step: 38940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:39:26,329-Speed 13828.22 samples/sec Loss 1.6149 LearningRate 0.0002 Epoch: 22 Global Step: 38950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:39:44,132-Speed 13805.36 samples/sec Loss 1.6030 LearningRate 0.0002 Epoch: 22 
Global Step: 38960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:40:01,970-Speed 13778.84 samples/sec Loss 1.6048 LearningRate 0.0002 Epoch: 22 Global Step: 38970 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:40:19,881-Speed 13722.23 samples/sec Loss 1.5952 LearningRate 0.0002 Epoch: 22 Global Step: 38980 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:40:37,677-Speed 13810.73 samples/sec Loss 1.6073 LearningRate 0.0002 Epoch: 22 Global Step: 38990 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:40:55,460-Speed 13820.49 samples/sec Loss 1.6169 LearningRate 0.0002 Epoch: 22 Global Step: 39000 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:41:13,240-Speed 13823.35 samples/sec Loss 1.6232 LearningRate 0.0002 Epoch: 22 Global Step: 39010 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:41:30,944-Speed 13882.70 samples/sec Loss 1.6059 LearningRate 0.0002 Epoch: 22 Global Step: 39020 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:41:48,721-Speed 13825.79 samples/sec Loss 1.6050 LearningRate 0.0002 Epoch: 22 Global Step: 39030 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:42:06,456-Speed 13857.72 samples/sec Loss 1.6095 LearningRate 0.0002 Epoch: 22 Global Step: 39040 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:42:24,305-Speed 13770.78 samples/sec Loss 1.6078 LearningRate 0.0002 Epoch: 22 Global Step: 39050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:42:42,085-Speed 13822.75 samples/sec Loss 1.6113 LearningRate 0.0002 Epoch: 22 Global Step: 39060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:42:59,840-Speed 13843.30 samples/sec Loss 1.6160 LearningRate 0.0002 Epoch: 22 Global Step: 39070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:43:17,634-Speed 13811.93 samples/sec Loss 1.5957 LearningRate 0.0002 Epoch: 22 Global Step: 39080 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-03-04 03:43:35,367-Speed 13859.57 samples/sec Loss 1.6166 LearningRate 0.0002 Epoch: 22 Global Step: 39090 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:43:53,151-Speed 13820.53 samples/sec Loss 1.5948 LearningRate 0.0002 Epoch: 22 Global Step: 39100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:44:10,986-Speed 13779.93 samples/sec Loss 1.6037 LearningRate 0.0002 Epoch: 22 Global Step: 39110 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:44:28,791-Speed 13804.08 samples/sec Loss 1.6209 LearningRate 0.0002 Epoch: 22 Global Step: 39120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:44:46,514-Speed 13867.61 samples/sec Loss 1.6061 LearningRate 0.0002 Epoch: 22 Global Step: 39130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:45:04,343-Speed 13785.06 samples/sec Loss 1.6027 LearningRate 0.0002 Epoch: 22 Global Step: 39140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:45:22,059-Speed 13873.79 samples/sec Loss 1.5986 LearningRate 0.0002 Epoch: 22 Global Step: 39150 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:45:39,824-Speed 13835.24 samples/sec Loss 1.6040 LearningRate 0.0002 Epoch: 22 Global Step: 39160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:45:57,621-Speed 13809.94 samples/sec Loss 1.5959 LearningRate 0.0002 Epoch: 22 Global Step: 39170 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:46:15,417-Speed 13810.78 samples/sec Loss 1.5944 LearningRate 0.0002 Epoch: 22 Global Step: 39180 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:46:33,129-Speed 13876.48 samples/sec Loss 1.6083 LearningRate 0.0002 Epoch: 22 Global Step: 39190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:46:50,877-Speed 13847.74 samples/sec Loss 1.6001 LearningRate 0.0002 Epoch: 22 Global Step: 39200 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-03-04 03:47:08,727-Speed 13768.89 samples/sec Loss 1.5979 LearningRate 0.0002 Epoch: 22 Global Step: 39210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:47:26,472-Speed 13851.03 samples/sec Loss 1.5995 LearningRate 0.0002 Epoch: 22 Global Step: 39220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:47:44,207-Speed 13858.34 samples/sec Loss 1.6020 LearningRate 0.0002 Epoch: 22 Global Step: 39230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:48:02,236-Speed 13631.77 samples/sec Loss 1.6072 LearningRate 0.0002 Epoch: 22 Global Step: 39240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:48:19,995-Speed 13838.95 samples/sec Loss 1.5925 LearningRate 0.0002 Epoch: 22 Global Step: 39250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:48:37,729-Speed 13859.75 samples/sec Loss 1.5910 LearningRate 0.0002 Epoch: 22 Global Step: 39260 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:48:55,567-Speed 13777.97 samples/sec Loss 1.5929 LearningRate 0.0002 Epoch: 22 Global Step: 39270 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:49:13,322-Speed 13842.92 samples/sec Loss 1.5968 LearningRate 0.0002 Epoch: 22 Global Step: 39280 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:49:31,143-Speed 13790.98 samples/sec Loss 1.5951 LearningRate 0.0002 Epoch: 22 Global Step: 39290 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:49:48,900-Speed 13841.42 samples/sec Loss 1.5898 LearningRate 0.0002 Epoch: 22 Global Step: 39300 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:50:06,742-Speed 13775.00 samples/sec Loss 1.5930 LearningRate 0.0002 Epoch: 22 Global Step: 39310 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:50:24,521-Speed 13824.06 samples/sec Loss 1.5906 LearningRate 0.0002 Epoch: 22 Global Step: 39320 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 
03:50:42,376-Speed 13764.78 samples/sec Loss 1.5893 LearningRate 0.0002 Epoch: 22 Global Step: 39330 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:51:00,098-Speed 13868.67 samples/sec Loss 1.6045 LearningRate 0.0002 Epoch: 22 Global Step: 39340 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:51:17,870-Speed 13829.40 samples/sec Loss 1.6059 LearningRate 0.0002 Epoch: 22 Global Step: 39350 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 03:51:35,705-Speed 13780.43 samples/sec Loss 1.6111 LearningRate 0.0002 Epoch: 22 Global Step: 39360 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:51:53,417-Speed 13876.65 samples/sec Loss 1.6001 LearningRate 0.0002 Epoch: 22 Global Step: 39370 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:52:11,145-Speed 13863.23 samples/sec Loss 1.5856 LearningRate 0.0002 Epoch: 22 Global Step: 39380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:52:28,857-Speed 13876.29 samples/sec Loss 1.5804 LearningRate 0.0002 Epoch: 22 Global Step: 39390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:52:46,649-Speed 13813.67 samples/sec Loss 1.5844 LearningRate 0.0002 Epoch: 22 Global Step: 39400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:53:04,466-Speed 13795.34 samples/sec Loss 1.5879 LearningRate 0.0002 Epoch: 22 Global Step: 39410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:53:22,270-Speed 13804.08 samples/sec Loss 1.5954 LearningRate 0.0002 Epoch: 22 Global Step: 39420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:53:40,055-Speed 13819.40 samples/sec Loss 1.5932 LearningRate 0.0002 Epoch: 22 Global Step: 39430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:53:57,861-Speed 13802.76 samples/sec Loss 1.5997 LearningRate 0.0002 Epoch: 22 Global Step: 39440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:54:15,749-Speed 13740.19 
samples/sec Loss 1.5893 LearningRate 0.0002 Epoch: 22 Global Step: 39450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:54:33,649-Speed 13730.09 samples/sec Loss 1.5906 LearningRate 0.0002 Epoch: 22 Global Step: 39460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:54:51,544-Speed 13734.01 samples/sec Loss 1.5745 LearningRate 0.0002 Epoch: 22 Global Step: 39470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:55:09,609-Speed 13605.59 samples/sec Loss 1.5849 LearningRate 0.0002 Epoch: 22 Global Step: 39480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:55:27,491-Speed 13744.76 samples/sec Loss 1.5819 LearningRate 0.0002 Epoch: 22 Global Step: 39490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:55:45,182-Speed 13892.04 samples/sec Loss 1.5921 LearningRate 0.0002 Epoch: 22 Global Step: 39500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:56:02,971-Speed 13815.87 samples/sec Loss 1.5871 LearningRate 0.0002 Epoch: 22 Global Step: 39510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:56:20,669-Speed 13887.91 samples/sec Loss 1.5843 LearningRate 0.0002 Epoch: 22 Global Step: 39520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:56:38,480-Speed 13799.04 samples/sec Loss 1.5897 LearningRate 0.0002 Epoch: 22 Global Step: 39530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:56:56,397-Speed 13717.10 samples/sec Loss 1.5837 LearningRate 0.0002 Epoch: 22 Global Step: 39540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:57:14,133-Speed 13857.26 samples/sec Loss 1.5835 LearningRate 0.0002 Epoch: 22 Global Step: 39550 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:57:31,912-Speed 13824.73 samples/sec Loss 1.5969 LearningRate 0.0002 Epoch: 22 Global Step: 39560 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:57:49,668-Speed 13841.56 samples/sec Loss 1.5855 
LearningRate 0.0002 Epoch: 22 Global Step: 39570 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:58:07,437-Speed 13831.57 samples/sec Loss 1.6005 LearningRate 0.0002 Epoch: 22 Global Step: 39580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 03:58:25,201-Speed 13835.96 samples/sec Loss 1.5914 LearningRate 0.0002 Epoch: 22 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:58:43,034-Speed 13781.73 samples/sec Loss 1.5790 LearningRate 0.0002 Epoch: 22 Global Step: 39600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:59:00,858-Speed 13788.97 samples/sec Loss 1.6013 LearningRate 0.0002 Epoch: 22 Global Step: 39610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:59:18,675-Speed 13794.67 samples/sec Loss 1.5945 LearningRate 0.0002 Epoch: 22 Global Step: 39620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:59:36,481-Speed 13802.67 samples/sec Loss 1.5815 LearningRate 0.0002 Epoch: 22 Global Step: 39630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 03:59:54,244-Speed 13836.72 samples/sec Loss 1.5786 LearningRate 0.0002 Epoch: 22 Global Step: 39640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:00:12,098-Speed 13766.04 samples/sec Loss 1.5694 LearningRate 0.0002 Epoch: 22 Global Step: 39650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:00:29,942-Speed 13773.60 samples/sec Loss 1.5781 LearningRate 0.0002 Epoch: 22 Global Step: 39660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:00:47,745-Speed 13804.83 samples/sec Loss 1.5880 LearningRate 0.0002 Epoch: 22 Global Step: 39670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:01:05,564-Speed 13795.68 samples/sec Loss 1.5897 LearningRate 0.0002 Epoch: 22 Global Step: 39680 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:01:23,378-Speed 13796.96 samples/sec Loss 1.5889 LearningRate 0.0002 Epoch: 22 
Global Step: 39690 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:01:41,191-Speed 13797.13 samples/sec Loss 1.5983 LearningRate 0.0002 Epoch: 22 Global Step: 39700 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:01:59,023-Speed 13782.42 samples/sec Loss 1.5868 LearningRate 0.0002 Epoch: 22 Global Step: 39710 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:02:16,800-Speed 13825.74 samples/sec Loss 1.5963 LearningRate 0.0002 Epoch: 22 Global Step: 39720 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:02:34,588-Speed 13817.32 samples/sec Loss 1.5983 LearningRate 0.0002 Epoch: 22 Global Step: 39730 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:02:52,301-Speed 13875.16 samples/sec Loss 1.5874 LearningRate 0.0002 Epoch: 22 Global Step: 39740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 04:03:10,126-Speed 13787.95 samples/sec Loss 1.5920 LearningRate 0.0002 Epoch: 22 Global Step: 39750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 04:04:18,626-Speed 3587.85 samples/sec Loss 1.5876 LearningRate 0.0002 Epoch: 23 Global Step: 39760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 04:04:36,309-Speed 13899.76 samples/sec Loss 1.5723 LearningRate 0.0002 Epoch: 23 Global Step: 39770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 04:04:53,993-Speed 13897.54 samples/sec Loss 1.5769 LearningRate 0.0002 Epoch: 23 Global Step: 39780 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:05:11,899-Speed 13725.98 samples/sec Loss 1.5585 LearningRate 0.0002 Epoch: 23 Global Step: 39790 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:05:29,687-Speed 13816.99 samples/sec Loss 1.5692 LearningRate 0.0002 Epoch: 23 Global Step: 39800 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:05:47,559-Speed 13752.62 samples/sec Loss 1.5561 LearningRate 0.0002 Epoch: 23 Global Step: 39810 Fp16 Grad 
Scale: 32768 Required: 15 hours Training: 2022-03-04 04:06:05,328-Speed 13831.53 samples/sec Loss 1.5682 LearningRate 0.0002 Epoch: 23 Global Step: 39820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:06:23,152-Speed 13788.66 samples/sec Loss 1.5607 LearningRate 0.0002 Epoch: 23 Global Step: 39830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:06:40,982-Speed 13784.61 samples/sec Loss 1.5656 LearningRate 0.0002 Epoch: 23 Global Step: 39840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:06:58,762-Speed 13823.55 samples/sec Loss 1.5648 LearningRate 0.0002 Epoch: 23 Global Step: 39850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:07:16,660-Speed 13731.80 samples/sec Loss 1.5700 LearningRate 0.0002 Epoch: 23 Global Step: 39860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:07:34,394-Speed 13858.77 samples/sec Loss 1.5666 LearningRate 0.0002 Epoch: 23 Global Step: 39870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:07:52,174-Speed 13823.60 samples/sec Loss 1.5642 LearningRate 0.0002 Epoch: 23 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 04:08:10,005-Speed 13784.85 samples/sec Loss 1.5652 LearningRate 0.0002 Epoch: 23 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-03-04 04:08:27,734-Speed 13862.97 samples/sec Loss 1.5793 LearningRate 0.0002 Epoch: 23 Global Step: 39900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:08:45,598-Speed 13758.88 samples/sec Loss 1.5747 LearningRate 0.0002 Epoch: 23 Global Step: 39910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:09:03,460-Speed 13760.11 samples/sec Loss 1.5659 LearningRate 0.0002 Epoch: 23 Global Step: 39920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:09:21,301-Speed 13776.11 samples/sec Loss 1.5636 LearningRate 0.0002 Epoch: 23 Global Step: 39930 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-03-04 04:09:39,061-Speed 13838.70 samples/sec Loss 1.5773 LearningRate 0.0002 Epoch: 23 Global Step: 39940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:09:56,836-Speed 13827.09 samples/sec Loss 1.5752 LearningRate 0.0002 Epoch: 23 Global Step: 39950 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:10:14,706-Speed 13753.01 samples/sec Loss 1.5644 LearningRate 0.0002 Epoch: 23 Global Step: 39960 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:10:32,466-Speed 13838.56 samples/sec Loss 1.5583 LearningRate 0.0002 Epoch: 23 Global Step: 39970 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:10:50,349-Speed 13744.59 samples/sec Loss 1.5758 LearningRate 0.0002 Epoch: 23 Global Step: 39980 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:11:08,103-Speed 13842.99 samples/sec Loss 1.5723 LearningRate 0.0002 Epoch: 23 Global Step: 39990 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:11:25,863-Speed 13839.04 samples/sec Loss 1.5682 LearningRate 0.0002 Epoch: 23 Global Step: 40000 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:11:43,633-Speed 13830.49 samples/sec Loss 1.5758 LearningRate 0.0002 Epoch: 23 Global Step: 40010 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:12:01,439-Speed 13803.06 samples/sec Loss 1.5654 LearningRate 0.0002 Epoch: 23 Global Step: 40020 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:12:19,286-Speed 13773.90 samples/sec Loss 1.5601 LearningRate 0.0002 Epoch: 23 Global Step: 40030 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:12:37,140-Speed 13766.03 samples/sec Loss 1.5613 LearningRate 0.0002 Epoch: 23 Global Step: 40040 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:12:54,983-Speed 13774.17 samples/sec Loss 1.5605 LearningRate 0.0002 Epoch: 23 Global Step: 40050 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 
04:13:12,817-Speed 13781.21 samples/sec Loss 1.5577 LearningRate 0.0002 Epoch: 23 Global Step: 40060 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:13:30,600-Speed 13821.86 samples/sec Loss 1.5690 LearningRate 0.0002 Epoch: 23 Global Step: 40070 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:13:48,338-Speed 13856.16 samples/sec Loss 1.5690 LearningRate 0.0002 Epoch: 23 Global Step: 40080 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:14:06,159-Speed 13790.91 samples/sec Loss 1.5727 LearningRate 0.0002 Epoch: 23 Global Step: 40090 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:14:24,168-Speed 13647.48 samples/sec Loss 1.5587 LearningRate 0.0002 Epoch: 23 Global Step: 40100 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:14:41,974-Speed 13803.11 samples/sec Loss 1.5627 LearningRate 0.0002 Epoch: 23 Global Step: 40110 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:14:59,728-Speed 13843.55 samples/sec Loss 1.5659 LearningRate 0.0002 Epoch: 23 Global Step: 40120 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:15:17,405-Speed 13903.53 samples/sec Loss 1.5683 LearningRate 0.0002 Epoch: 23 Global Step: 40130 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:15:35,168-Speed 13836.32 samples/sec Loss 1.5706 LearningRate 0.0002 Epoch: 23 Global Step: 40140 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:15:53,002-Speed 13781.32 samples/sec Loss 1.5798 LearningRate 0.0002 Epoch: 23 Global Step: 40150 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:16:10,865-Speed 13760.03 samples/sec Loss 1.5724 LearningRate 0.0002 Epoch: 23 Global Step: 40160 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:16:28,603-Speed 13855.70 samples/sec Loss 1.5661 LearningRate 0.0002 Epoch: 23 Global Step: 40170 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:16:46,360-Speed 13841.09 
samples/sec Loss 1.5612 LearningRate 0.0002 Epoch: 23 Global Step: 40180 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:17:04,146-Speed 13818.16 samples/sec Loss 1.5550 LearningRate 0.0002 Epoch: 23 Global Step: 40190 Fp16 Grad Scale: 16384 Required: 15 hours Training: 2022-03-04 04:17:21,932-Speed 13818.55 samples/sec Loss 1.5624 LearningRate 0.0002 Epoch: 23 Global Step: 40200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:17:39,681-Speed 13847.47 samples/sec Loss 1.5580 LearningRate 0.0002 Epoch: 23 Global Step: 40210 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:17:57,449-Speed 13832.43 samples/sec Loss 1.5598 LearningRate 0.0002 Epoch: 23 Global Step: 40220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:18:15,387-Speed 13702.28 samples/sec Loss 1.5557 LearningRate 0.0002 Epoch: 23 Global Step: 40230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:18:33,204-Speed 13793.90 samples/sec Loss 1.5577 LearningRate 0.0002 Epoch: 23 Global Step: 40240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:18:50,979-Speed 13827.26 samples/sec Loss 1.5547 LearningRate 0.0002 Epoch: 23 Global Step: 40250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:19:08,814-Speed 13781.22 samples/sec Loss 1.5580 LearningRate 0.0002 Epoch: 23 Global Step: 40260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:19:26,608-Speed 13812.19 samples/sec Loss 1.5664 LearningRate 0.0002 Epoch: 23 Global Step: 40270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-03-04 04:19:44,354-Speed 13850.08 samples/sec Loss 1.5470 LearningRate 0.0002 Epoch: 23 Global Step: 40280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:20:02,224-Speed 13753.80 samples/sec Loss 1.5591 LearningRate 0.0002 Epoch: 23 Global Step: 40290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:20:20,014-Speed 13815.30 samples/sec Loss 1.5646 
LearningRate 0.0002 Epoch: 23 Global Step: 40300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:20:37,789-Speed 13826.57 samples/sec Loss 1.5539 LearningRate 0.0002 Epoch: 23 Global Step: 40310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:20:55,556-Speed 13833.79 samples/sec Loss 1.5533 LearningRate 0.0002 Epoch: 23 Global Step: 40320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:21:13,294-Speed 13856.70 samples/sec Loss 1.5585 LearningRate 0.0002 Epoch: 23 Global Step: 40330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:21:31,058-Speed 13835.61 samples/sec Loss 1.5403 LearningRate 0.0002 Epoch: 23 Global Step: 40340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:21:48,885-Speed 13786.23 samples/sec Loss 1.5554 LearningRate 0.0002 Epoch: 23 Global Step: 40350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:22:06,681-Speed 13811.63 samples/sec Loss 1.5568 LearningRate 0.0002 Epoch: 23 Global Step: 40360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:22:24,547-Speed 13755.95 samples/sec Loss 1.5429 LearningRate 0.0002 Epoch: 23 Global Step: 40370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:22:42,527-Speed 13670.69 samples/sec Loss 1.5478 LearningRate 0.0002 Epoch: 23 Global Step: 40380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:23:00,451-Speed 13712.39 samples/sec Loss 1.5633 LearningRate 0.0002 Epoch: 23 Global Step: 40390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:23:18,310-Speed 13762.14 samples/sec Loss 1.5442 LearningRate 0.0002 Epoch: 23 Global Step: 40400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:23:36,151-Speed 13776.01 samples/sec Loss 1.5472 LearningRate 0.0002 Epoch: 23 Global Step: 40410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:23:53,989-Speed 13778.40 samples/sec Loss 1.5458 LearningRate 0.0002 Epoch: 23 
Global Step: 40420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:24:11,872-Speed 13743.39 samples/sec Loss 1.5493 LearningRate 0.0002 Epoch: 23 Global Step: 40430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:24:29,673-Speed 13806.86 samples/sec Loss 1.5451 LearningRate 0.0002 Epoch: 23 Global Step: 40440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:24:47,433-Speed 13838.70 samples/sec Loss 1.5473 LearningRate 0.0002 Epoch: 23 Global Step: 40450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:25:05,208-Speed 13827.08 samples/sec Loss 1.5544 LearningRate 0.0002 Epoch: 23 Global Step: 40460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:25:23,035-Speed 13786.52 samples/sec Loss 1.5403 LearningRate 0.0002 Epoch: 23 Global Step: 40470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:25:40,819-Speed 13820.47 samples/sec Loss 1.5531 LearningRate 0.0002 Epoch: 23 Global Step: 40480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:25:58,669-Speed 13768.34 samples/sec Loss 1.5511 LearningRate 0.0002 Epoch: 23 Global Step: 40490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:26:16,511-Speed 13775.04 samples/sec Loss 1.5456 LearningRate 0.0002 Epoch: 23 Global Step: 40500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:26:34,259-Speed 13848.42 samples/sec Loss 1.5581 LearningRate 0.0002 Epoch: 23 Global Step: 40510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:26:51,997-Speed 13856.50 samples/sec Loss 1.5509 LearningRate 0.0002 Epoch: 23 Global Step: 40520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:27:09,759-Speed 13836.58 samples/sec Loss 1.5393 LearningRate 0.0002 Epoch: 23 Global Step: 40530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:27:27,453-Speed 13890.72 samples/sec Loss 1.5531 LearningRate 0.0002 Epoch: 23 Global Step: 40540 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-03-04 04:27:45,231-Speed 13824.33 samples/sec Loss 1.5352 LearningRate 0.0002 Epoch: 23 Global Step: 40550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:28:03,022-Speed 13814.85 samples/sec Loss 1.5380 LearningRate 0.0002 Epoch: 23 Global Step: 40560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:28:20,782-Speed 13838.78 samples/sec Loss 1.5352 LearningRate 0.0002 Epoch: 23 Global Step: 40570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:28:38,502-Speed 13869.68 samples/sec Loss 1.5480 LearningRate 0.0002 Epoch: 23 Global Step: 40580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:28:56,342-Speed 13776.50 samples/sec Loss 1.5492 LearningRate 0.0002 Epoch: 23 Global Step: 40590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:29:14,069-Speed 13865.03 samples/sec Loss 1.5550 LearningRate 0.0002 Epoch: 23 Global Step: 40600 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:29:31,894-Speed 13788.33 samples/sec Loss 1.5528 LearningRate 0.0002 Epoch: 23 Global Step: 40610 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:29:49,611-Speed 13872.29 samples/sec Loss 1.5418 LearningRate 0.0002 Epoch: 23 Global Step: 40620 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:30:07,389-Speed 13824.45 samples/sec Loss 1.5449 LearningRate 0.0002 Epoch: 23 Global Step: 40630 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:30:25,176-Speed 13817.90 samples/sec Loss 1.5429 LearningRate 0.0002 Epoch: 23 Global Step: 40640 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:30:43,111-Speed 13703.95 samples/sec Loss 1.5331 LearningRate 0.0002 Epoch: 23 Global Step: 40650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:31:00,935-Speed 13789.03 samples/sec Loss 1.5392 LearningRate 0.0002 Epoch: 23 Global Step: 40660 Fp16 Grad Scale: 16384 Required: 14 hours 
Training: 2022-03-04 04:31:18,696-Speed 13837.52 samples/sec Loss 1.5412 LearningRate 0.0002 Epoch: 23 Global Step: 40670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:31:36,405-Speed 13878.81 samples/sec Loss 1.5354 LearningRate 0.0002 Epoch: 23 Global Step: 40680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:31:54,144-Speed 13855.06 samples/sec Loss 1.5287 LearningRate 0.0002 Epoch: 23 Global Step: 40690 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:32:11,900-Speed 13842.32 samples/sec Loss 1.5490 LearningRate 0.0002 Epoch: 23 Global Step: 40700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:32:29,777-Speed 13748.87 samples/sec Loss 1.5413 LearningRate 0.0002 Epoch: 23 Global Step: 40710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:32:47,561-Speed 13819.97 samples/sec Loss 1.5417 LearningRate 0.0002 Epoch: 23 Global Step: 40720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:33:05,403-Speed 13775.29 samples/sec Loss 1.5448 LearningRate 0.0002 Epoch: 23 Global Step: 40730 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:33:23,271-Speed 13754.71 samples/sec Loss 1.5347 LearningRate 0.0002 Epoch: 23 Global Step: 40740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:33:41,128-Speed 13763.49 samples/sec Loss 1.5339 LearningRate 0.0002 Epoch: 23 Global Step: 40750 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:33:58,953-Speed 13788.09 samples/sec Loss 1.5470 LearningRate 0.0002 Epoch: 23 Global Step: 40760 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:34:16,742-Speed 13816.72 samples/sec Loss 1.5378 LearningRate 0.0002 Epoch: 23 Global Step: 40770 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:34:34,556-Speed 13796.95 samples/sec Loss 1.5292 LearningRate 0.0002 Epoch: 23 Global Step: 40780 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 
04:34:52,347-Speed 13814.26 samples/sec Loss 1.5434 LearningRate 0.0002 Epoch: 23 Global Step: 40790 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:35:10,146-Speed 13808.25 samples/sec Loss 1.5421 LearningRate 0.0002 Epoch: 23 Global Step: 40800 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:35:27,974-Speed 13785.77 samples/sec Loss 1.5437 LearningRate 0.0002 Epoch: 23 Global Step: 40810 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:35:45,844-Speed 13754.00 samples/sec Loss 1.5345 LearningRate 0.0002 Epoch: 23 Global Step: 40820 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:36:03,688-Speed 13773.65 samples/sec Loss 1.5199 LearningRate 0.0002 Epoch: 23 Global Step: 40830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:36:21,545-Speed 13763.67 samples/sec Loss 1.5312 LearningRate 0.0002 Epoch: 23 Global Step: 40840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:36:39,282-Speed 13856.56 samples/sec Loss 1.5398 LearningRate 0.0002 Epoch: 23 Global Step: 40850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:36:57,033-Speed 13846.02 samples/sec Loss 1.5311 LearningRate 0.0002 Epoch: 23 Global Step: 40860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:37:14,838-Speed 13803.80 samples/sec Loss 1.5382 LearningRate 0.0002 Epoch: 23 Global Step: 40870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:37:32,634-Speed 13810.75 samples/sec Loss 1.5301 LearningRate 0.0002 Epoch: 23 Global Step: 40880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:37:50,415-Speed 13822.25 samples/sec Loss 1.5346 LearningRate 0.0002 Epoch: 23 Global Step: 40890 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:38:08,209-Speed 13812.54 samples/sec Loss 1.5293 LearningRate 0.0002 Epoch: 23 Global Step: 40900 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:38:26,031-Speed 13790.35 
samples/sec Loss 1.5461 LearningRate 0.0002 Epoch: 23 Global Step: 40910 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:38:43,779-Speed 13848.19 samples/sec Loss 1.5399 LearningRate 0.0002 Epoch: 23 Global Step: 40920 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:39:01,586-Speed 13801.98 samples/sec Loss 1.5393 LearningRate 0.0002 Epoch: 23 Global Step: 40930 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:39:19,387-Speed 13807.39 samples/sec Loss 1.5313 LearningRate 0.0002 Epoch: 23 Global Step: 40940 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:39:37,234-Speed 13771.15 samples/sec Loss 1.5172 LearningRate 0.0002 Epoch: 23 Global Step: 40950 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:39:55,002-Speed 13833.54 samples/sec Loss 1.5258 LearningRate 0.0002 Epoch: 23 Global Step: 40960 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:40:12,757-Speed 13842.42 samples/sec Loss 1.5264 LearningRate 0.0002 Epoch: 23 Global Step: 40970 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:40:30,618-Speed 13760.21 samples/sec Loss 1.5233 LearningRate 0.0002 Epoch: 23 Global Step: 40980 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:40:48,369-Speed 13846.01 samples/sec Loss 1.5247 LearningRate 0.0002 Epoch: 23 Global Step: 40990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:41:06,164-Speed 13812.01 samples/sec Loss 1.5332 LearningRate 0.0002 Epoch: 23 Global Step: 41000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:41:23,938-Speed 13827.19 samples/sec Loss 1.5305 LearningRate 0.0002 Epoch: 23 Global Step: 41010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:41:41,739-Speed 13806.62 samples/sec Loss 1.5245 LearningRate 0.0002 Epoch: 23 Global Step: 41020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:41:59,428-Speed 13895.03 samples/sec Loss 1.5246 
LearningRate 0.0002 Epoch: 23 Global Step: 41030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:42:17,199-Speed 13829.75 samples/sec Loss 1.5173 LearningRate 0.0002 Epoch: 23 Global Step: 41040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:42:34,938-Speed 13855.71 samples/sec Loss 1.5282 LearningRate 0.0002 Epoch: 23 Global Step: 41050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:42:52,697-Speed 13839.14 samples/sec Loss 1.5232 LearningRate 0.0002 Epoch: 23 Global Step: 41060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:43:10,435-Speed 13856.02 samples/sec Loss 1.5209 LearningRate 0.0002 Epoch: 23 Global Step: 41070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:43:28,172-Speed 13857.46 samples/sec Loss 1.5230 LearningRate 0.0002 Epoch: 23 Global Step: 41080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:43:45,897-Speed 13865.49 samples/sec Loss 1.5465 LearningRate 0.0002 Epoch: 23 Global Step: 41090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:44:03,667-Speed 13831.09 samples/sec Loss 1.5172 LearningRate 0.0002 Epoch: 23 Global Step: 41100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:44:21,383-Speed 13873.14 samples/sec Loss 1.5293 LearningRate 0.0002 Epoch: 23 Global Step: 41110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:44:39,221-Speed 13777.84 samples/sec Loss 1.5179 LearningRate 0.0002 Epoch: 23 Global Step: 41120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:44:56,970-Speed 13847.41 samples/sec Loss 1.5268 LearningRate 0.0002 Epoch: 23 Global Step: 41130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:45:14,794-Speed 13788.96 samples/sec Loss 1.5222 LearningRate 0.0002 Epoch: 23 Global Step: 41140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:45:32,631-Speed 13779.14 samples/sec Loss 1.5223 LearningRate 0.0002 Epoch: 23 
Global Step: 41150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:45:50,403-Speed 13829.68 samples/sec Loss 1.5258 LearningRate 0.0002 Epoch: 23 Global Step: 41160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:46:08,082-Speed 13902.35 samples/sec Loss 1.5268 LearningRate 0.0002 Epoch: 23 Global Step: 41170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:46:25,843-Speed 13837.91 samples/sec Loss 1.5099 LearningRate 0.0002 Epoch: 23 Global Step: 41180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:46:43,651-Speed 13801.03 samples/sec Loss 1.5382 LearningRate 0.0002 Epoch: 23 Global Step: 41190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:47:01,429-Speed 13824.73 samples/sec Loss 1.5216 LearningRate 0.0002 Epoch: 23 Global Step: 41200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:47:19,210-Speed 13822.39 samples/sec Loss 1.5084 LearningRate 0.0002 Epoch: 23 Global Step: 41210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:47:36,988-Speed 13825.09 samples/sec Loss 1.5241 LearningRate 0.0002 Epoch: 23 Global Step: 41220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:47:54,738-Speed 13846.49 samples/sec Loss 1.5201 LearningRate 0.0002 Epoch: 23 Global Step: 41230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:48:12,608-Speed 13753.51 samples/sec Loss 1.5234 LearningRate 0.0002 Epoch: 23 Global Step: 41240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:48:30,410-Speed 13805.92 samples/sec Loss 1.5208 LearningRate 0.0002 Epoch: 23 Global Step: 41250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:48:48,209-Speed 13808.17 samples/sec Loss 1.5212 LearningRate 0.0002 Epoch: 23 Global Step: 41260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:49:05,947-Speed 13856.22 samples/sec Loss 1.5263 LearningRate 0.0002 Epoch: 23 Global Step: 41270 Fp16 Grad 
Scale: 65536 Required: 14 hours Training: 2022-03-04 04:49:23,704-Speed 13840.56 samples/sec Loss 1.5216 LearningRate 0.0002 Epoch: 23 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:49:41,452-Speed 13848.57 samples/sec Loss 1.5097 LearningRate 0.0002 Epoch: 23 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:49:59,242-Speed 13815.48 samples/sec Loss 1.5154 LearningRate 0.0002 Epoch: 23 Global Step: 41300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:50:17,053-Speed 13799.20 samples/sec Loss 1.5194 LearningRate 0.0002 Epoch: 23 Global Step: 41310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:50:34,904-Speed 13768.03 samples/sec Loss 1.5218 LearningRate 0.0002 Epoch: 23 Global Step: 41320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:50:52,648-Speed 13851.11 samples/sec Loss 1.5199 LearningRate 0.0002 Epoch: 23 Global Step: 41330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:51:10,394-Speed 13849.80 samples/sec Loss 1.5178 LearningRate 0.0002 Epoch: 23 Global Step: 41340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:51:28,195-Speed 13806.75 samples/sec Loss 1.5280 LearningRate 0.0002 Epoch: 23 Global Step: 41350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:51:45,912-Speed 13871.79 samples/sec Loss 1.5206 LearningRate 0.0002 Epoch: 23 Global Step: 41360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:52:03,636-Speed 13866.59 samples/sec Loss 1.5294 LearningRate 0.0002 Epoch: 23 Global Step: 41370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:52:21,539-Speed 13728.45 samples/sec Loss 1.5120 LearningRate 0.0002 Epoch: 23 Global Step: 41380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:52:39,333-Speed 13812.92 samples/sec Loss 1.5307 LearningRate 0.0002 Epoch: 23 Global Step: 41390 Fp16 Grad Scale: 32768 Required: 14 hours 
Training: 2022-03-04 04:52:57,086-Speed 13843.49 samples/sec Loss 1.5096 LearningRate 0.0002 Epoch: 23 Global Step: 41400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:53:14,937-Speed 13767.95 samples/sec Loss 1.5184 LearningRate 0.0002 Epoch: 23 Global Step: 41410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:53:32,731-Speed 13812.62 samples/sec Loss 1.5185 LearningRate 0.0002 Epoch: 23 Global Step: 41420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:53:50,521-Speed 13815.82 samples/sec Loss 1.5287 LearningRate 0.0002 Epoch: 23 Global Step: 41430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:54:08,176-Speed 13920.28 samples/sec Loss 1.5214 LearningRate 0.0002 Epoch: 23 Global Step: 41440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:54:25,930-Speed 13843.99 samples/sec Loss 1.5171 LearningRate 0.0002 Epoch: 23 Global Step: 41450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:54:43,743-Speed 13797.32 samples/sec Loss 1.5238 LearningRate 0.0002 Epoch: 23 Global Step: 41460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:55:01,730-Speed 13663.83 samples/sec Loss 1.5203 LearningRate 0.0002 Epoch: 23 Global Step: 41470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:55:19,660-Speed 13708.87 samples/sec Loss 1.5264 LearningRate 0.0002 Epoch: 23 Global Step: 41480 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:56:28,753-Speed 3556.99 samples/sec Loss 1.4998 LearningRate 0.0002 Epoch: 24 Global Step: 41490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:56:46,599-Speed 13772.14 samples/sec Loss 1.5048 LearningRate 0.0002 Epoch: 24 Global Step: 41500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:57:04,464-Speed 13757.45 samples/sec Loss 1.5084 LearningRate 0.0002 Epoch: 24 Global Step: 41510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 
04:57:22,256-Speed 13814.03 samples/sec Loss 1.4967 LearningRate 0.0002 Epoch: 24 Global Step: 41520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:57:40,049-Speed 13812.33 samples/sec Loss 1.5002 LearningRate 0.0002 Epoch: 24 Global Step: 41530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:57:57,890-Speed 13775.74 samples/sec Loss 1.5011 LearningRate 0.0002 Epoch: 24 Global Step: 41540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 04:58:15,894-Speed 13651.99 samples/sec Loss 1.5059 LearningRate 0.0002 Epoch: 24 Global Step: 41550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:58:33,755-Speed 13760.03 samples/sec Loss 1.5079 LearningRate 0.0002 Epoch: 24 Global Step: 41560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:58:51,555-Speed 13808.01 samples/sec Loss 1.4882 LearningRate 0.0002 Epoch: 24 Global Step: 41570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 04:59:09,401-Speed 13771.98 samples/sec Loss 1.5062 LearningRate 0.0002 Epoch: 24 Global Step: 41580 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:59:27,228-Speed 13786.85 samples/sec Loss 1.4961 LearningRate 0.0002 Epoch: 24 Global Step: 41590 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 04:59:45,102-Speed 13749.91 samples/sec Loss 1.5092 LearningRate 0.0002 Epoch: 24 Global Step: 41600 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:00:02,938-Speed 13780.37 samples/sec Loss 1.4950 LearningRate 0.0002 Epoch: 24 Global Step: 41610 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:00:20,777-Speed 13777.78 samples/sec Loss 1.5115 LearningRate 0.0002 Epoch: 24 Global Step: 41620 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:00:38,647-Speed 13753.78 samples/sec Loss 1.5061 LearningRate 0.0002 Epoch: 24 Global Step: 41630 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:00:56,446-Speed 13809.10 
samples/sec Loss 1.4994 LearningRate 0.0002 Epoch: 24 Global Step: 41640 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:01:14,311-Speed 13757.16 samples/sec Loss 1.4975 LearningRate 0.0002 Epoch: 24 Global Step: 41650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:01:32,205-Speed 13735.05 samples/sec Loss 1.5040 LearningRate 0.0002 Epoch: 24 Global Step: 41660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:01:50,017-Speed 13798.32 samples/sec Loss 1.5026 LearningRate 0.0002 Epoch: 24 Global Step: 41670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:02:07,931-Speed 13719.45 samples/sec Loss 1.4947 LearningRate 0.0002 Epoch: 24 Global Step: 41680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:02:25,767-Speed 13779.98 samples/sec Loss 1.5012 LearningRate 0.0002 Epoch: 24 Global Step: 41690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:02:43,663-Speed 13733.36 samples/sec Loss 1.5078 LearningRate 0.0002 Epoch: 24 Global Step: 41700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:03:01,602-Speed 13700.86 samples/sec Loss 1.4975 LearningRate 0.0002 Epoch: 24 Global Step: 41710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:03:19,397-Speed 13811.56 samples/sec Loss 1.4996 LearningRate 0.0002 Epoch: 24 Global Step: 41720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:03:37,102-Speed 13881.93 samples/sec Loss 1.4978 LearningRate 0.0002 Epoch: 24 Global Step: 41730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:03:54,961-Speed 13761.87 samples/sec Loss 1.5059 LearningRate 0.0002 Epoch: 24 Global Step: 41740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:04:12,750-Speed 13816.28 samples/sec Loss 1.5078 LearningRate 0.0002 Epoch: 24 Global Step: 41750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:04:30,691-Speed 13699.47 samples/sec Loss 1.4910 
LearningRate 0.0002 Epoch: 24 Global Step: 41760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:04:48,546-Speed 13765.12 samples/sec Loss 1.5004 LearningRate 0.0002 Epoch: 24 Global Step: 41770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:05:06,450-Speed 13728.01 samples/sec Loss 1.5065 LearningRate 0.0002 Epoch: 24 Global Step: 41780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:05:24,278-Speed 13786.03 samples/sec Loss 1.5004 LearningRate 0.0002 Epoch: 24 Global Step: 41790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:05:42,159-Speed 13745.90 samples/sec Loss 1.5038 LearningRate 0.0002 Epoch: 24 Global Step: 41800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:05:59,979-Speed 13792.03 samples/sec Loss 1.5008 LearningRate 0.0002 Epoch: 24 Global Step: 41810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:06:17,825-Speed 13772.41 samples/sec Loss 1.5029 LearningRate 0.0002 Epoch: 24 Global Step: 41820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:06:35,681-Speed 13763.90 samples/sec Loss 1.5142 LearningRate 0.0002 Epoch: 24 Global Step: 41830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:06:53,610-Speed 13708.48 samples/sec Loss 1.5197 LearningRate 0.0002 Epoch: 24 Global Step: 41840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:07:11,466-Speed 13764.37 samples/sec Loss 1.5058 LearningRate 0.0002 Epoch: 24 Global Step: 41850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:07:29,266-Speed 13807.53 samples/sec Loss 1.5060 LearningRate 0.0002 Epoch: 24 Global Step: 41860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:07:46,986-Speed 13870.08 samples/sec Loss 1.5065 LearningRate 0.0002 Epoch: 24 Global Step: 41870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:08:04,852-Speed 13757.14 samples/sec Loss 1.4984 LearningRate 0.0002 Epoch: 24 
Global Step: 41880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:08:22,629-Speed 13827.31 samples/sec Loss 1.5004 LearningRate 0.0002 Epoch: 24 Global Step: 41890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:08:40,449-Speed 13792.04 samples/sec Loss 1.4971 LearningRate 0.0002 Epoch: 24 Global Step: 41900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:08:58,231-Speed 13821.14 samples/sec Loss 1.4960 LearningRate 0.0002 Epoch: 24 Global Step: 41910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:09:16,075-Speed 13774.26 samples/sec Loss 1.4966 LearningRate 0.0002 Epoch: 24 Global Step: 41920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:09:33,906-Speed 13783.48 samples/sec Loss 1.4807 LearningRate 0.0002 Epoch: 24 Global Step: 41930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:09:51,811-Speed 13726.48 samples/sec Loss 1.4893 LearningRate 0.0002 Epoch: 24 Global Step: 41940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:10:09,756-Speed 13697.38 samples/sec Loss 1.4895 LearningRate 0.0002 Epoch: 24 Global Step: 41950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:10:27,675-Speed 13715.60 samples/sec Loss 1.5034 LearningRate 0.0002 Epoch: 24 Global Step: 41960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:10:45,448-Speed 13828.58 samples/sec Loss 1.5029 LearningRate 0.0002 Epoch: 24 Global Step: 41970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:11:03,229-Speed 13822.96 samples/sec Loss 1.4854 LearningRate 0.0002 Epoch: 24 Global Step: 41980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:11:21,058-Speed 13784.72 samples/sec Loss 1.4922 LearningRate 0.0002 Epoch: 24 Global Step: 41990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:11:38,885-Speed 13786.82 samples/sec Loss 1.4946 LearningRate 0.0002 Epoch: 24 Global Step: 42000 Fp16 Grad 
Scale: 32768 Required: 14 hours Training: 2022-03-04 05:11:56,707-Speed 13790.46 samples/sec Loss 1.4827 LearningRate 0.0002 Epoch: 24 Global Step: 42010 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:12:14,675-Speed 13678.48 samples/sec Loss 1.4959 LearningRate 0.0002 Epoch: 24 Global Step: 42020 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:12:32,468-Speed 13813.32 samples/sec Loss 1.4798 LearningRate 0.0002 Epoch: 24 Global Step: 42030 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:12:50,290-Speed 13790.16 samples/sec Loss 1.4810 LearningRate 0.0002 Epoch: 24 Global Step: 42040 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:13:08,228-Speed 13701.90 samples/sec Loss 1.4869 LearningRate 0.0002 Epoch: 24 Global Step: 42050 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:13:26,058-Speed 13785.09 samples/sec Loss 1.4846 LearningRate 0.0002 Epoch: 24 Global Step: 42060 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:13:43,897-Speed 13777.81 samples/sec Loss 1.4882 LearningRate 0.0002 Epoch: 24 Global Step: 42070 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:14:01,696-Speed 13808.39 samples/sec Loss 1.4940 LearningRate 0.0002 Epoch: 24 Global Step: 42080 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:14:19,511-Speed 13797.37 samples/sec Loss 1.4832 LearningRate 0.0002 Epoch: 24 Global Step: 42090 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:14:37,269-Speed 13840.82 samples/sec Loss 1.4764 LearningRate 0.0002 Epoch: 24 Global Step: 42100 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-03-04 05:14:55,170-Speed 13729.34 samples/sec Loss 1.4887 LearningRate 0.0002 Epoch: 24 Global Step: 42110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:15:12,972-Speed 13806.13 samples/sec Loss 1.4848 LearningRate 0.0002 Epoch: 24 Global Step: 42120 Fp16 Grad Scale: 32768 Required: 14 hours 
Training: 2022-03-04 05:15:30,894-Speed 13713.82 samples/sec Loss 1.4864 LearningRate 0.0002 Epoch: 24 Global Step: 42130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:15:48,646-Speed 13845.07 samples/sec Loss 1.4913 LearningRate 0.0002 Epoch: 24 Global Step: 42140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:16:06,563-Speed 13716.85 samples/sec Loss 1.4876 LearningRate 0.0002 Epoch: 24 Global Step: 42150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:16:24,316-Speed 13844.63 samples/sec Loss 1.4805 LearningRate 0.0002 Epoch: 24 Global Step: 42160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:16:42,224-Speed 13724.41 samples/sec Loss 1.4910 LearningRate 0.0002 Epoch: 24 Global Step: 42170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:17:00,053-Speed 13784.81 samples/sec Loss 1.4859 LearningRate 0.0002 Epoch: 24 Global Step: 42180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:17:17,983-Speed 13707.93 samples/sec Loss 1.4733 LearningRate 0.0002 Epoch: 24 Global Step: 42190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:17:35,914-Speed 13706.34 samples/sec Loss 1.4818 LearningRate 0.0002 Epoch: 24 Global Step: 42200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:17:53,846-Speed 13706.09 samples/sec Loss 1.4788 LearningRate 0.0002 Epoch: 24 Global Step: 42210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:18:11,664-Speed 13796.28 samples/sec Loss 1.4746 LearningRate 0.0002 Epoch: 24 Global Step: 42220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:18:29,624-Speed 13686.04 samples/sec Loss 1.4753 LearningRate 0.0002 Epoch: 24 Global Step: 42230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-03-04 05:18:47,471-Speed 13770.57 samples/sec Loss 1.4797 LearningRate 0.0002 Epoch: 24 Global Step: 42240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 
05:19:05,348-Speed 13748.37 samples/sec Loss 1.4834 LearningRate 0.0002 Epoch: 24 Global Step: 42250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:19:23,158-Speed 13800.26 samples/sec Loss 1.4780 LearningRate 0.0002 Epoch: 24 Global Step: 42260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-03-04 05:19:41,057-Speed 13730.97 samples/sec Loss 1.4840 LearningRate 0.0002 Epoch: 24 Global Step: 42270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:19:58,999-Speed 13698.44 samples/sec Loss 1.4845 LearningRate 0.0002 Epoch: 24 Global Step: 42280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:20:16,787-Speed 13816.85 samples/sec Loss 1.4795 LearningRate 0.0002 Epoch: 24 Global Step: 42290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:20:34,567-Speed 13823.28 samples/sec Loss 1.4785 LearningRate 0.0002 Epoch: 24 Global Step: 42300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:20:52,248-Speed 13901.05 samples/sec Loss 1.4809 LearningRate 0.0002 Epoch: 24 Global Step: 42310 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:21:10,139-Speed 13739.11 samples/sec Loss 1.4849 LearningRate 0.0002 Epoch: 24 Global Step: 42320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:21:27,983-Speed 13773.57 samples/sec Loss 1.4868 LearningRate 0.0002 Epoch: 24 Global Step: 42330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:21:46,069-Speed 13589.40 samples/sec Loss 1.4762 LearningRate 0.0002 Epoch: 24 Global Step: 42340 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:22:04,190-Speed 13563.13 samples/sec Loss 1.4848 LearningRate 0.0002 Epoch: 24 Global Step: 42350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:22:22,302-Speed 13570.95 samples/sec Loss 1.4720 LearningRate 0.0002 Epoch: 24 Global Step: 42360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:22:40,394-Speed 13584.57 
samples/sec Loss 1.4681 LearningRate 0.0002 Epoch: 24 Global Step: 42370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:22:58,488-Speed 13583.71 samples/sec Loss 1.4837 LearningRate 0.0002 Epoch: 24 Global Step: 42380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:23:16,505-Speed 13641.15 samples/sec Loss 1.4659 LearningRate 0.0002 Epoch: 24 Global Step: 42390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:23:34,588-Speed 13591.38 samples/sec Loss 1.4788 LearningRate 0.0002 Epoch: 24 Global Step: 42400 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:23:52,614-Speed 13634.56 samples/sec Loss 1.4777 LearningRate 0.0002 Epoch: 24 Global Step: 42410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:24:10,735-Speed 13562.93 samples/sec Loss 1.4777 LearningRate 0.0002 Epoch: 24 Global Step: 42420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:24:28,612-Speed 13748.90 samples/sec Loss 1.4817 LearningRate 0.0002 Epoch: 24 Global Step: 42430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:24:46,592-Speed 13668.61 samples/sec Loss 1.4830 LearningRate 0.0002 Epoch: 24 Global Step: 42440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:25:04,566-Speed 13673.84 samples/sec Loss 1.4796 LearningRate 0.0002 Epoch: 24 Global Step: 42450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:25:22,450-Speed 13744.94 samples/sec Loss 1.4652 LearningRate 0.0002 Epoch: 24 Global Step: 42460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:25:40,282-Speed 13783.48 samples/sec Loss 1.4772 LearningRate 0.0002 Epoch: 24 Global Step: 42470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:25:58,187-Speed 13726.28 samples/sec Loss 1.4765 LearningRate 0.0002 Epoch: 24 Global Step: 42480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:26:16,079-Speed 13736.62 samples/sec Loss 1.4816 
LearningRate 0.0002 Epoch: 24 Global Step: 42490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:26:33,970-Speed 13737.46 samples/sec Loss 1.4719 LearningRate 0.0002 Epoch: 24 Global Step: 42500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:26:51,821-Speed 13767.75 samples/sec Loss 1.4666 LearningRate 0.0002 Epoch: 24 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:27:09,619-Speed 13809.97 samples/sec Loss 1.4765 LearningRate 0.0002 Epoch: 24 Global Step: 42520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:27:27,415-Speed 13810.08 samples/sec Loss 1.4752 LearningRate 0.0002 Epoch: 24 Global Step: 42530 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:27:45,166-Speed 13846.20 samples/sec Loss 1.4647 LearningRate 0.0002 Epoch: 24 Global Step: 42540 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:28:02,960-Speed 13812.17 samples/sec Loss 1.4731 LearningRate 0.0002 Epoch: 24 Global Step: 42550 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:28:20,744-Speed 13819.84 samples/sec Loss 1.4606 LearningRate 0.0002 Epoch: 24 Global Step: 42560 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:28:38,473-Speed 13862.94 samples/sec Loss 1.4691 LearningRate 0.0002 Epoch: 24 Global Step: 42570 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:28:56,196-Speed 13867.55 samples/sec Loss 1.4760 LearningRate 0.0002 Epoch: 24 Global Step: 42580 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:29:14,050-Speed 13766.86 samples/sec Loss 1.4713 LearningRate 0.0002 Epoch: 24 Global Step: 42590 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:29:31,873-Speed 13789.72 samples/sec Loss 1.4678 LearningRate 0.0002 Epoch: 24 Global Step: 42600 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:29:49,709-Speed 13780.03 samples/sec Loss 1.4731 LearningRate 0.0002 Epoch: 24 
Global Step: 42610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:30:07,493-Speed 13819.75 samples/sec Loss 1.4654 LearningRate 0.0002 Epoch: 24 Global Step: 42620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:30:25,234-Speed 13854.16 samples/sec Loss 1.4714 LearningRate 0.0002 Epoch: 24 Global Step: 42630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:30:43,036-Speed 13806.48 samples/sec Loss 1.4694 LearningRate 0.0002 Epoch: 24 Global Step: 42640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:31:00,864-Speed 13785.49 samples/sec Loss 1.4683 LearningRate 0.0002 Epoch: 24 Global Step: 42650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:31:18,666-Speed 13806.14 samples/sec Loss 1.4624 LearningRate 0.0002 Epoch: 24 Global Step: 42660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:31:36,623-Speed 13687.29 samples/sec Loss 1.4696 LearningRate 0.0002 Epoch: 24 Global Step: 42670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:31:54,431-Speed 13801.29 samples/sec Loss 1.4610 LearningRate 0.0002 Epoch: 24 Global Step: 42680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:32:12,342-Speed 13722.16 samples/sec Loss 1.4563 LearningRate 0.0002 Epoch: 24 Global Step: 42690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:32:30,038-Speed 13889.02 samples/sec Loss 1.4649 LearningRate 0.0002 Epoch: 24 Global Step: 42700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:32:47,853-Speed 13795.95 samples/sec Loss 1.4633 LearningRate 0.0002 Epoch: 24 Global Step: 42710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:33:05,888-Speed 13627.67 samples/sec Loss 1.4720 LearningRate 0.0002 Epoch: 24 Global Step: 42720 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:33:23,783-Speed 13734.45 samples/sec Loss 1.4551 LearningRate 0.0002 Epoch: 24 Global Step: 42730 Fp16 Grad 
Scale: 16384 Required: 13 hours Training: 2022-03-04 05:33:41,641-Speed 13762.75 samples/sec Loss 1.4623 LearningRate 0.0002 Epoch: 24 Global Step: 42740 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:33:59,554-Speed 13720.31 samples/sec Loss 1.4662 LearningRate 0.0002 Epoch: 24 Global Step: 42750 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:34:17,379-Speed 13788.68 samples/sec Loss 1.4536 LearningRate 0.0002 Epoch: 24 Global Step: 42760 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:34:35,292-Speed 13720.44 samples/sec Loss 1.4728 LearningRate 0.0002 Epoch: 24 Global Step: 42770 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:34:53,191-Speed 13731.52 samples/sec Loss 1.4636 LearningRate 0.0002 Epoch: 24 Global Step: 42780 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:35:11,034-Speed 13775.26 samples/sec Loss 1.4606 LearningRate 0.0002 Epoch: 24 Global Step: 42790 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:35:28,949-Speed 13719.04 samples/sec Loss 1.4609 LearningRate 0.0002 Epoch: 24 Global Step: 42800 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:35:46,762-Speed 13797.64 samples/sec Loss 1.4664 LearningRate 0.0002 Epoch: 24 Global Step: 42810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:36:04,644-Speed 13744.27 samples/sec Loss 1.4699 LearningRate 0.0002 Epoch: 24 Global Step: 42820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:36:22,424-Speed 13822.98 samples/sec Loss 1.4623 LearningRate 0.0002 Epoch: 24 Global Step: 42830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:36:40,226-Speed 13805.85 samples/sec Loss 1.4569 LearningRate 0.0002 Epoch: 24 Global Step: 42840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:36:58,070-Speed 13774.11 samples/sec Loss 1.4561 LearningRate 0.0002 Epoch: 24 Global Step: 42850 Fp16 Grad Scale: 32768 Required: 13 hours 
Training: 2022-03-04 05:37:15,884-Speed 13796.19 samples/sec Loss 1.4515 LearningRate 0.0002 Epoch: 24 Global Step: 42860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:37:33,702-Speed 13794.19 samples/sec Loss 1.4585 LearningRate 0.0002 Epoch: 24 Global Step: 42870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:37:51,563-Speed 13760.06 samples/sec Loss 1.4527 LearningRate 0.0002 Epoch: 24 Global Step: 42880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:38:09,435-Speed 13752.01 samples/sec Loss 1.4476 LearningRate 0.0002 Epoch: 24 Global Step: 42890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:38:27,217-Speed 13821.39 samples/sec Loss 1.4493 LearningRate 0.0002 Epoch: 24 Global Step: 42900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:38:45,041-Speed 13788.96 samples/sec Loss 1.4621 LearningRate 0.0002 Epoch: 24 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:39:02,851-Speed 13800.24 samples/sec Loss 1.4589 LearningRate 0.0002 Epoch: 24 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:39:20,634-Speed 13821.08 samples/sec Loss 1.4520 LearningRate 0.0002 Epoch: 24 Global Step: 42930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:39:38,472-Speed 13778.26 samples/sec Loss 1.4525 LearningRate 0.0002 Epoch: 24 Global Step: 42940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:39:56,298-Speed 13787.01 samples/sec Loss 1.4522 LearningRate 0.0002 Epoch: 24 Global Step: 42950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:40:14,225-Speed 13709.29 samples/sec Loss 1.4538 LearningRate 0.0002 Epoch: 24 Global Step: 42960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:40:32,005-Speed 13824.77 samples/sec Loss 1.4537 LearningRate 0.0002 Epoch: 24 Global Step: 42970 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 
05:40:49,883-Speed 13747.54 samples/sec Loss 1.4513 LearningRate 0.0002 Epoch: 24 Global Step: 42980 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:41:07,726-Speed 13774.20 samples/sec Loss 1.4603 LearningRate 0.0002 Epoch: 24 Global Step: 42990 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:41:25,506-Speed 13823.38 samples/sec Loss 1.4577 LearningRate 0.0002 Epoch: 24 Global Step: 43000 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:41:43,290-Speed 13819.80 samples/sec Loss 1.4630 LearningRate 0.0002 Epoch: 24 Global Step: 43010 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:42:01,432-Speed 13547.90 samples/sec Loss 1.4459 LearningRate 0.0002 Epoch: 24 Global Step: 43020 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:42:19,439-Speed 13648.69 samples/sec Loss 1.4486 LearningRate 0.0002 Epoch: 24 Global Step: 43030 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:42:37,292-Speed 13768.87 samples/sec Loss 1.4592 LearningRate 0.0002 Epoch: 24 Global Step: 43040 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:42:55,201-Speed 13723.31 samples/sec Loss 1.4557 LearningRate 0.0002 Epoch: 24 Global Step: 43050 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:43:13,145-Speed 13696.90 samples/sec Loss 1.4540 LearningRate 0.0002 Epoch: 24 Global Step: 43060 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:43:30,938-Speed 13813.89 samples/sec Loss 1.4594 LearningRate 0.0002 Epoch: 24 Global Step: 43070 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:43:48,829-Speed 13736.85 samples/sec Loss 1.4505 LearningRate 0.0002 Epoch: 24 Global Step: 43080 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:44:06,710-Speed 13745.34 samples/sec Loss 1.4479 LearningRate 0.0002 Epoch: 24 Global Step: 43090 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:44:24,707-Speed 13656.17 samples/sec 
Loss 1.4468 LearningRate 0.0002 Epoch: 24 Global Step: 43100 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 05:44:42,492-Speed 13819.27 samples/sec Loss 1.4550 LearningRate 0.0002 Epoch: 24 Global Step: 43110 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:45:00,253-Speed 13839.65 samples/sec Loss 1.4474 LearningRate 0.0002 Epoch: 24 Global Step: 43120 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:45:18,116-Speed 13758.58 samples/sec Loss 1.4614 LearningRate 0.0002 Epoch: 24 Global Step: 43130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:45:35,899-Speed 13821.30 samples/sec Loss 1.4527 LearningRate 0.0002 Epoch: 24 Global Step: 43140 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:45:53,722-Speed 13790.84 samples/sec Loss 1.4651 LearningRate 0.0002 Epoch: 24 Global Step: 43150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:46:11,587-Speed 13757.96 samples/sec Loss 1.4628 LearningRate 0.0002 Epoch: 24 Global Step: 43160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:46:29,397-Speed 13799.68 samples/sec Loss 1.4595 LearningRate 0.0002 Epoch: 24 Global Step: 43170 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:46:47,320-Speed 13712.55 samples/sec Loss 1.4679 LearningRate 0.0002 Epoch: 24 Global Step: 43180 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:47:05,149-Speed 13785.25 samples/sec Loss 1.4639 LearningRate 0.0002 Epoch: 24 Global Step: 43190 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:47:22,890-Speed 13853.54 samples/sec Loss 1.4693 LearningRate 0.0002 Epoch: 24 Global Step: 43200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:48:31,312-Speed 3591.88 samples/sec Loss 1.4579 LearningRate 0.0002 Epoch: 25 Global Step: 43210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:48:49,260-Speed 13694.50 samples/sec Loss 1.4346 LearningRate 0.0002 
Epoch: 25 Global Step: 43220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:49:07,046-Speed 13817.98 samples/sec Loss 1.4480 LearningRate 0.0002 Epoch: 25 Global Step: 43230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:49:24,850-Speed 13804.58 samples/sec Loss 1.4392 LearningRate 0.0002 Epoch: 25 Global Step: 43240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:49:42,700-Speed 13769.23 samples/sec Loss 1.4436 LearningRate 0.0002 Epoch: 25 Global Step: 43250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:50:00,473-Speed 13828.84 samples/sec Loss 1.4433 LearningRate 0.0002 Epoch: 25 Global Step: 43260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:50:18,336-Speed 13759.64 samples/sec Loss 1.4391 LearningRate 0.0002 Epoch: 25 Global Step: 43270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:50:36,205-Speed 13755.59 samples/sec Loss 1.4390 LearningRate 0.0002 Epoch: 25 Global Step: 43280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:50:54,124-Speed 13715.65 samples/sec Loss 1.4372 LearningRate 0.0002 Epoch: 25 Global Step: 43290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:51:11,953-Speed 13785.82 samples/sec Loss 1.4405 LearningRate 0.0002 Epoch: 25 Global Step: 43300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:51:29,822-Speed 13753.83 samples/sec Loss 1.4396 LearningRate 0.0002 Epoch: 25 Global Step: 43310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:51:47,732-Speed 13723.20 samples/sec Loss 1.4420 LearningRate 0.0002 Epoch: 25 Global Step: 43320 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:52:05,527-Speed 13811.02 samples/sec Loss 1.4359 LearningRate 0.0002 Epoch: 25 Global Step: 43330 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:52:23,323-Speed 13811.14 samples/sec Loss 1.4318 LearningRate 0.0002 Epoch: 25 Global Step: 43340 
Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:52:41,166-Speed 13774.51 samples/sec Loss 1.4402 LearningRate 0.0002 Epoch: 25 Global Step: 43350 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:52:59,009-Speed 13774.41 samples/sec Loss 1.4484 LearningRate 0.0002 Epoch: 25 Global Step: 43360 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:53:16,816-Speed 13802.01 samples/sec Loss 1.4400 LearningRate 0.0002 Epoch: 25 Global Step: 43370 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:53:34,682-Speed 13755.95 samples/sec Loss 1.4375 LearningRate 0.0002 Epoch: 25 Global Step: 43380 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:53:52,611-Speed 13709.29 samples/sec Loss 1.4389 LearningRate 0.0002 Epoch: 25 Global Step: 43390 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:54:10,479-Speed 13754.51 samples/sec Loss 1.4353 LearningRate 0.0002 Epoch: 25 Global Step: 43400 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:54:28,327-Speed 13770.34 samples/sec Loss 1.4368 LearningRate 0.0002 Epoch: 25 Global Step: 43410 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 05:54:46,083-Speed 13842.01 samples/sec Loss 1.4421 LearningRate 0.0002 Epoch: 25 Global Step: 43420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:55:03,916-Speed 13782.20 samples/sec Loss 1.4407 LearningRate 0.0002 Epoch: 25 Global Step: 43430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:55:21,845-Speed 13708.49 samples/sec Loss 1.4383 LearningRate 0.0002 Epoch: 25 Global Step: 43440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:55:39,795-Speed 13692.35 samples/sec Loss 1.4350 LearningRate 0.0002 Epoch: 25 Global Step: 43450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:55:57,557-Speed 13836.94 samples/sec Loss 1.4470 LearningRate 0.0002 Epoch: 25 Global Step: 43460 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-03-04 05:56:15,381-Speed 13789.08 samples/sec Loss 1.4516 LearningRate 0.0002 Epoch: 25 Global Step: 43470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:56:33,163-Speed 13821.82 samples/sec Loss 1.4325 LearningRate 0.0002 Epoch: 25 Global Step: 43480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:56:50,975-Speed 13798.34 samples/sec Loss 1.4365 LearningRate 0.0002 Epoch: 25 Global Step: 43490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:57:08,805-Speed 13784.25 samples/sec Loss 1.4387 LearningRate 0.0002 Epoch: 25 Global Step: 43500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:57:26,664-Speed 13762.07 samples/sec Loss 1.4474 LearningRate 0.0002 Epoch: 25 Global Step: 43510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:57:44,570-Speed 13725.78 samples/sec Loss 1.4392 LearningRate 0.0002 Epoch: 25 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:58:02,426-Speed 13764.12 samples/sec Loss 1.4288 LearningRate 0.0002 Epoch: 25 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:58:20,164-Speed 13856.13 samples/sec Loss 1.4285 LearningRate 0.0002 Epoch: 25 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:58:38,101-Speed 13703.42 samples/sec Loss 1.4461 LearningRate 0.0002 Epoch: 25 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 05:58:55,923-Speed 13790.43 samples/sec Loss 1.4426 LearningRate 0.0002 Epoch: 25 Global Step: 43560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:59:13,766-Speed 13773.90 samples/sec Loss 1.4300 LearningRate 0.0002 Epoch: 25 Global Step: 43570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 05:59:31,695-Speed 13708.78 samples/sec Loss 1.4279 LearningRate 0.0002 Epoch: 25 Global Step: 43580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 
2022-03-04 05:59:49,463-Speed 13832.29 samples/sec Loss 1.4433 LearningRate 0.0002 Epoch: 25 Global Step: 43590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:00:07,229-Speed 13834.04 samples/sec Loss 1.4327 LearningRate 0.0002 Epoch: 25 Global Step: 43600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:00:25,100-Speed 13752.86 samples/sec Loss 1.4323 LearningRate 0.0002 Epoch: 25 Global Step: 43610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:00:42,965-Speed 13757.00 samples/sec Loss 1.4398 LearningRate 0.0002 Epoch: 25 Global Step: 43620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:01:00,759-Speed 13812.55 samples/sec Loss 1.4275 LearningRate 0.0002 Epoch: 25 Global Step: 43630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:01:18,606-Speed 13771.66 samples/sec Loss 1.4311 LearningRate 0.0002 Epoch: 25 Global Step: 43640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:01:36,570-Speed 13681.14 samples/sec Loss 1.4214 LearningRate 0.0002 Epoch: 25 Global Step: 43650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:01:54,486-Speed 13718.71 samples/sec Loss 1.4353 LearningRate 0.0002 Epoch: 25 Global Step: 43660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:02:12,436-Speed 13692.85 samples/sec Loss 1.4378 LearningRate 0.0002 Epoch: 25 Global Step: 43670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:02:30,332-Speed 13733.47 samples/sec Loss 1.4416 LearningRate 0.0002 Epoch: 25 Global Step: 43680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:02:48,294-Speed 13682.74 samples/sec Loss 1.4195 LearningRate 0.0002 Epoch: 25 Global Step: 43690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:03:06,146-Speed 13769.64 samples/sec Loss 1.4267 LearningRate 0.0002 Epoch: 25 Global Step: 43700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:03:23,958-Speed 
13798.77 samples/sec Loss 1.4163 LearningRate 0.0002 Epoch: 25 Global Step: 43710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:03:41,792-Speed 13781.40 samples/sec Loss 1.4373 LearningRate 0.0002 Epoch: 25 Global Step: 43720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-03-04 06:03:59,726-Speed 13705.09 samples/sec Loss 1.4376 LearningRate 0.0002 Epoch: 25 Global Step: 43730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:04:17,667-Speed 13700.06 samples/sec Loss 1.4266 LearningRate 0.0002 Epoch: 25 Global Step: 43740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:04:35,636-Speed 13679.13 samples/sec Loss 1.4324 LearningRate 0.0002 Epoch: 25 Global Step: 43750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:04:53,479-Speed 13773.87 samples/sec Loss 1.4198 LearningRate 0.0002 Epoch: 25 Global Step: 43760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:05:11,341-Speed 13759.85 samples/sec Loss 1.4340 LearningRate 0.0002 Epoch: 25 Global Step: 43770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:05:29,153-Speed 13798.47 samples/sec Loss 1.4162 LearningRate 0.0002 Epoch: 25 Global Step: 43780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:05:47,227-Speed 13598.13 samples/sec Loss 1.4348 LearningRate 0.0002 Epoch: 25 Global Step: 43790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:06:04,933-Speed 13880.79 samples/sec Loss 1.4208 LearningRate 0.0002 Epoch: 25 Global Step: 43800 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:06:22,700-Speed 13833.60 samples/sec Loss 1.4251 LearningRate 0.0002 Epoch: 25 Global Step: 43810 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:06:40,538-Speed 13778.45 samples/sec Loss 1.4136 LearningRate 0.0002 Epoch: 25 Global Step: 43820 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:06:58,281-Speed 13852.33 samples/sec Loss 
1.4162 LearningRate 0.0002 Epoch: 25 Global Step: 43830 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:07:16,216-Speed 13703.68 samples/sec Loss 1.4329 LearningRate 0.0002 Epoch: 25 Global Step: 43840 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:07:33,959-Speed 13852.15 samples/sec Loss 1.4290 LearningRate 0.0002 Epoch: 25 Global Step: 43850 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:07:51,662-Speed 13882.83 samples/sec Loss 1.4323 LearningRate 0.0002 Epoch: 25 Global Step: 43860 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:08:09,471-Speed 13800.65 samples/sec Loss 1.4281 LearningRate 0.0002 Epoch: 25 Global Step: 43870 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:08:27,394-Speed 13713.04 samples/sec Loss 1.4271 LearningRate 0.0002 Epoch: 25 Global Step: 43880 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:08:45,196-Speed 13806.07 samples/sec Loss 1.4301 LearningRate 0.0002 Epoch: 25 Global Step: 43890 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:09:03,054-Speed 13763.69 samples/sec Loss 1.4240 LearningRate 0.0002 Epoch: 25 Global Step: 43900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:09:20,825-Speed 13830.86 samples/sec Loss 1.4186 LearningRate 0.0002 Epoch: 25 Global Step: 43910 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:09:38,662-Speed 13779.58 samples/sec Loss 1.4263 LearningRate 0.0002 Epoch: 25 Global Step: 43920 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:09:56,382-Speed 13870.36 samples/sec Loss 1.4286 LearningRate 0.0002 Epoch: 25 Global Step: 43930 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:10:14,176-Speed 13812.66 samples/sec Loss 1.4220 LearningRate 0.0002 Epoch: 25 Global Step: 43940 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:10:31,930-Speed 13842.94 samples/sec Loss 1.4165 LearningRate 0.0002 
Epoch: 25 Global Step: 43950 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:10:49,698-Speed 13832.95 samples/sec Loss 1.4136 LearningRate 0.0002 Epoch: 25 Global Step: 43960 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:11:07,468-Speed 13832.10 samples/sec Loss 1.4291 LearningRate 0.0002 Epoch: 25 Global Step: 43970 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:11:25,241-Speed 13828.62 samples/sec Loss 1.4140 LearningRate 0.0002 Epoch: 25 Global Step: 43980 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:11:43,040-Speed 13808.31 samples/sec Loss 1.4265 LearningRate 0.0002 Epoch: 25 Global Step: 43990 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:12:00,839-Speed 13808.51 samples/sec Loss 1.4241 LearningRate 0.0002 Epoch: 25 Global Step: 44000 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:12:18,627-Speed 13817.10 samples/sec Loss 1.4266 LearningRate 0.0002 Epoch: 25 Global Step: 44010 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:12:36,424-Speed 13809.54 samples/sec Loss 1.4081 LearningRate 0.0002 Epoch: 25 Global Step: 44020 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:12:54,193-Speed 13832.60 samples/sec Loss 1.4193 LearningRate 0.0002 Epoch: 25 Global Step: 44030 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:13:11,915-Speed 13868.89 samples/sec Loss 1.4072 LearningRate 0.0002 Epoch: 25 Global Step: 44040 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:13:29,631-Speed 13872.53 samples/sec Loss 1.4094 LearningRate 0.0002 Epoch: 25 Global Step: 44050 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:13:47,323-Speed 13892.31 samples/sec Loss 1.4129 LearningRate 0.0002 Epoch: 25 Global Step: 44060 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:14:05,070-Speed 13848.59 samples/sec Loss 1.4193 LearningRate 0.0002 Epoch: 25 Global Step: 44070 
Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:14:22,817-Speed 13848.69 samples/sec Loss 1.4132 LearningRate 0.0002 Epoch: 25 Global Step: 44080 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:14:40,614-Speed 13810.24 samples/sec Loss 1.4161 LearningRate 0.0002 Epoch: 25 Global Step: 44090 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:14:58,338-Speed 13866.85 samples/sec Loss 1.4178 LearningRate 0.0002 Epoch: 25 Global Step: 44100 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:15:16,039-Speed 13884.92 samples/sec Loss 1.4151 LearningRate 0.0002 Epoch: 25 Global Step: 44110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:15:33,818-Speed 13824.16 samples/sec Loss 1.4122 LearningRate 0.0002 Epoch: 25 Global Step: 44120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-03-04 06:15:51,568-Speed 13845.94 samples/sec Loss 1.4127 LearningRate 0.0002 Epoch: 25 Global Step: 44130 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:16:09,335-Speed 13833.68 samples/sec Loss 1.4074 LearningRate 0.0002 Epoch: 25 Global Step: 44140 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:16:27,199-Speed 13758.07 samples/sec Loss 1.4150 LearningRate 0.0002 Epoch: 25 Global Step: 44150 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:16:44,890-Speed 13892.74 samples/sec Loss 1.4083 LearningRate 0.0002 Epoch: 25 Global Step: 44160 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-03-04 06:17:02,603-Speed 13876.06 samples/sec Loss 1.4102 LearningRate 0.0002 Epoch: 25 Global Step: 44170 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:17:20,447-Speed 13773.16 samples/sec Loss 1.4088 LearningRate 0.0002 Epoch: 25 Global Step: 44180 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:17:38,404-Speed 13687.04 samples/sec Loss 1.4098 LearningRate 0.0002 Epoch: 25 Global Step: 44190 Fp16 Grad Scale: 8192 Required: 
13 hours Training: 2022-03-04 06:17:56,500-Speed 13581.92 samples/sec Loss 1.4149 LearningRate 0.0002 Epoch: 25 Global Step: 44200 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:18:14,485-Speed 13666.40 samples/sec Loss 1.4215 LearningRate 0.0002 Epoch: 25 Global Step: 44210 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:18:32,502-Speed 13641.34 samples/sec Loss 1.4228 LearningRate 0.0002 Epoch: 25 Global Step: 44220 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:18:50,536-Speed 13628.31 samples/sec Loss 1.4149 LearningRate 0.0002 Epoch: 25 Global Step: 44230 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:19:08,552-Speed 13641.73 samples/sec Loss 1.4049 LearningRate 0.0002 Epoch: 25 Global Step: 44240 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:19:26,655-Speed 13577.29 samples/sec Loss 1.4172 LearningRate 0.0002 Epoch: 25 Global Step: 44250 Fp16 Grad Scale: 8192 Required: 13 hours Training: 2022-03-04 06:19:44,781-Speed 13559.17 samples/sec Loss 1.4106 LearningRate 0.0002 Epoch: 25 Global Step: 44260 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:20:02,758-Speed 13671.34 samples/sec Loss 1.4067 LearningRate 0.0002 Epoch: 25 Global Step: 44270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:20:20,749-Speed 13660.90 samples/sec Loss 1.4118 LearningRate 0.0002 Epoch: 25 Global Step: 44280 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:20:38,798-Speed 13617.40 samples/sec Loss 1.4118 LearningRate 0.0002 Epoch: 25 Global Step: 44290 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:20:56,875-Speed 13596.52 samples/sec Loss 1.4098 LearningRate 0.0002 Epoch: 25 Global Step: 44300 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:21:14,878-Speed 13651.95 samples/sec Loss 1.4155 LearningRate 0.0002 Epoch: 25 Global Step: 44310 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 
06:21:32,928-Speed 13616.22 samples/sec Loss 1.4130 LearningRate 0.0002 Epoch: 25 Global Step: 44320 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:21:50,930-Speed 13652.69 samples/sec Loss 1.4178 LearningRate 0.0002 Epoch: 25 Global Step: 44330 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:22:08,970-Speed 13623.78 samples/sec Loss 1.4095 LearningRate 0.0002 Epoch: 25 Global Step: 44340 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:22:27,043-Speed 13599.09 samples/sec Loss 1.4181 LearningRate 0.0002 Epoch: 25 Global Step: 44350 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:22:45,050-Speed 13648.93 samples/sec Loss 1.4109 LearningRate 0.0002 Epoch: 25 Global Step: 44360 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:23:03,161-Speed 13570.24 samples/sec Loss 1.4110 LearningRate 0.0002 Epoch: 25 Global Step: 44370 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:23:21,079-Speed 13716.44 samples/sec Loss 1.4022 LearningRate 0.0002 Epoch: 25 Global Step: 44380 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:23:38,910-Speed 13783.92 samples/sec Loss 1.4056 LearningRate 0.0002 Epoch: 25 Global Step: 44390 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:23:56,640-Speed 13862.51 samples/sec Loss 1.4013 LearningRate 0.0002 Epoch: 25 Global Step: 44400 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:24:14,449-Speed 13799.92 samples/sec Loss 1.4036 LearningRate 0.0002 Epoch: 25 Global Step: 44410 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:24:32,245-Speed 13810.63 samples/sec Loss 1.3942 LearningRate 0.0002 Epoch: 25 Global Step: 44420 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:24:50,065-Speed 13793.12 samples/sec Loss 1.3973 LearningRate 0.0002 Epoch: 25 Global Step: 44430 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:25:07,839-Speed 13827.43 samples/sec 
Loss 1.4127 LearningRate 0.0002 Epoch: 25 Global Step: 44440 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:25:25,676-Speed 13778.80 samples/sec Loss 1.3970 LearningRate 0.0002 Epoch: 25 Global Step: 44450 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:25:43,530-Speed 13766.14 samples/sec Loss 1.3992 LearningRate 0.0002 Epoch: 25 Global Step: 44460 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:26:01,271-Speed 13853.66 samples/sec Loss 1.3842 LearningRate 0.0002 Epoch: 25 Global Step: 44470 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:26:19,066-Speed 13811.68 samples/sec Loss 1.3975 LearningRate 0.0002 Epoch: 25 Global Step: 44480 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:26:36,903-Speed 13779.26 samples/sec Loss 1.4008 LearningRate 0.0002 Epoch: 25 Global Step: 44490 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:26:54,704-Speed 13806.77 samples/sec Loss 1.4039 LearningRate 0.0002 Epoch: 25 Global Step: 44500 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:27:12,630-Speed 13711.57 samples/sec Loss 1.4082 LearningRate 0.0002 Epoch: 25 Global Step: 44510 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:27:30,390-Speed 13838.86 samples/sec Loss 1.3987 LearningRate 0.0002 Epoch: 25 Global Step: 44520 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:27:48,230-Speed 13776.59 samples/sec Loss 1.4020 LearningRate 0.0002 Epoch: 25 Global Step: 44530 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:28:05,980-Speed 13846.09 samples/sec Loss 1.4131 LearningRate 0.0002 Epoch: 25 Global Step: 44540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:28:23,809-Speed 13785.28 samples/sec Loss 1.3999 LearningRate 0.0002 Epoch: 25 Global Step: 44550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:28:41,691-Speed 13744.61 samples/sec Loss 1.3945 LearningRate 0.0002 
Epoch: 25 Global Step: 44560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:28:59,531-Speed 13776.38 samples/sec Loss 1.4047 LearningRate 0.0002 Epoch: 25 Global Step: 44570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:29:17,270-Speed 13855.41 samples/sec Loss 1.4012 LearningRate 0.0002 Epoch: 25 Global Step: 44580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:29:34,995-Speed 13865.65 samples/sec Loss 1.3978 LearningRate 0.0002 Epoch: 25 Global Step: 44590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:29:52,790-Speed 13811.67 samples/sec Loss 1.3829 LearningRate 0.0002 Epoch: 25 Global Step: 44600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:30:10,589-Speed 13808.55 samples/sec Loss 1.3944 LearningRate 0.0002 Epoch: 25 Global Step: 44610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:30:28,421-Speed 13783.01 samples/sec Loss 1.3861 LearningRate 0.0002 Epoch: 25 Global Step: 44620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:30:46,182-Speed 13837.19 samples/sec Loss 1.4113 LearningRate 0.0002 Epoch: 25 Global Step: 44630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:31:04,014-Speed 13783.89 samples/sec Loss 1.4049 LearningRate 0.0002 Epoch: 25 Global Step: 44640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:31:21,831-Speed 13794.40 samples/sec Loss 1.3963 LearningRate 0.0002 Epoch: 25 Global Step: 44650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:31:39,619-Speed 13817.17 samples/sec Loss 1.4018 LearningRate 0.0002 Epoch: 25 Global Step: 44660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:31:57,473-Speed 13765.85 samples/sec Loss 1.3994 LearningRate 0.0002 Epoch: 25 Global Step: 44670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:32:15,193-Speed 13869.78 samples/sec Loss 1.4012 LearningRate 0.0002 Epoch: 25 Global Step: 44680 
Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:32:32,900-Speed 13880.96 samples/sec Loss 1.4019 LearningRate 0.0002 Epoch: 25 Global Step: 44690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:32:50,792-Speed 13736.39 samples/sec Loss 1.3976 LearningRate 0.0002 Epoch: 25 Global Step: 44700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:33:08,547-Speed 13842.45 samples/sec Loss 1.3978 LearningRate 0.0002 Epoch: 25 Global Step: 44710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:33:26,334-Speed 13818.21 samples/sec Loss 1.3838 LearningRate 0.0002 Epoch: 25 Global Step: 44720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:33:44,051-Speed 13872.55 samples/sec Loss 1.4055 LearningRate 0.0002 Epoch: 25 Global Step: 44730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:34:01,780-Speed 13862.53 samples/sec Loss 1.3906 LearningRate 0.0002 Epoch: 25 Global Step: 44740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:34:19,526-Speed 13849.90 samples/sec Loss 1.3937 LearningRate 0.0002 Epoch: 25 Global Step: 44750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:34:37,260-Speed 13858.70 samples/sec Loss 1.4028 LearningRate 0.0002 Epoch: 25 Global Step: 44760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:34:55,002-Speed 13853.34 samples/sec Loss 1.3935 LearningRate 0.0002 Epoch: 25 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:35:12,839-Speed 13779.79 samples/sec Loss 1.4005 LearningRate 0.0002 Epoch: 25 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:35:30,590-Speed 13845.46 samples/sec Loss 1.3847 LearningRate 0.0002 Epoch: 25 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:35:48,322-Speed 13860.96 samples/sec Loss 1.3938 LearningRate 0.0002 Epoch: 25 Global Step: 44800 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-03-04 06:36:05,998-Speed 13904.85 samples/sec Loss 1.3898 LearningRate 0.0002 Epoch: 25 Global Step: 44810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:36:23,706-Speed 13878.71 samples/sec Loss 1.3966 LearningRate 0.0002 Epoch: 25 Global Step: 44820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:36:41,501-Speed 13811.50 samples/sec Loss 1.3879 LearningRate 0.0002 Epoch: 25 Global Step: 44830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:36:59,274-Speed 13829.14 samples/sec Loss 1.3990 LearningRate 0.0002 Epoch: 25 Global Step: 44840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:37:17,000-Speed 13865.05 samples/sec Loss 1.3991 LearningRate 0.0002 Epoch: 25 Global Step: 44850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:37:34,756-Speed 13842.31 samples/sec Loss 1.3852 LearningRate 0.0002 Epoch: 25 Global Step: 44860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:37:52,486-Speed 13861.88 samples/sec Loss 1.3937 LearningRate 0.0002 Epoch: 25 Global Step: 44870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:38:10,284-Speed 13809.00 samples/sec Loss 1.4029 LearningRate 0.0002 Epoch: 25 Global Step: 44880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:38:28,108-Speed 13789.56 samples/sec Loss 1.4052 LearningRate 0.0002 Epoch: 25 Global Step: 44890 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:38:45,889-Speed 13821.65 samples/sec Loss 1.4033 LearningRate 0.0002 Epoch: 25 Global Step: 44900 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:39:03,629-Speed 13854.78 samples/sec Loss 1.3997 LearningRate 0.0002 Epoch: 25 Global Step: 44910 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:39:21,446-Speed 13794.00 samples/sec Loss 1.3918 LearningRate 0.0002 Epoch: 25 Global Step: 44920 Fp16 Grad Scale: 16384 Required: 12 hours Training: 
2022-03-04 06:39:39,300-Speed 13767.27 samples/sec Loss 1.4098 LearningRate 0.0002 Epoch: 25 Global Step: 44930 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:40:47,802-Speed 3587.65 samples/sec Loss 1.3985 LearningRate 0.0002 Epoch: 26 Global Step: 44940 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:41:05,450-Speed 13926.83 samples/sec Loss 1.3711 LearningRate 0.0002 Epoch: 26 Global Step: 44950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:41:23,174-Speed 13867.64 samples/sec Loss 1.3850 LearningRate 0.0002 Epoch: 26 Global Step: 44960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:41:40,983-Speed 13800.67 samples/sec Loss 1.3775 LearningRate 0.0002 Epoch: 26 Global Step: 44970 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:41:58,972-Speed 13662.62 samples/sec Loss 1.3741 LearningRate 0.0002 Epoch: 26 Global Step: 44980 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:42:16,893-Speed 13714.27 samples/sec Loss 1.3814 LearningRate 0.0002 Epoch: 26 Global Step: 44990 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:42:34,714-Speed 13791.10 samples/sec Loss 1.3792 LearningRate 0.0002 Epoch: 26 Global Step: 45000 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:42:52,599-Speed 13742.19 samples/sec Loss 1.3815 LearningRate 0.0002 Epoch: 26 Global Step: 45010 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:43:10,458-Speed 13761.78 samples/sec Loss 1.3831 LearningRate 0.0002 Epoch: 26 Global Step: 45020 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:43:28,429-Speed 13676.54 samples/sec Loss 1.3935 LearningRate 0.0001 Epoch: 26 Global Step: 45030 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:43:46,196-Speed 13833.31 samples/sec Loss 1.3830 LearningRate 0.0001 Epoch: 26 Global Step: 45040 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:44:04,140-Speed 13696.66 
samples/sec Loss 1.3792 LearningRate 0.0001 Epoch: 26 Global Step: 45050 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:44:21,806-Speed 13912.69 samples/sec Loss 1.3871 LearningRate 0.0001 Epoch: 26 Global Step: 45060 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 06:44:39,517-Speed 13876.76 samples/sec Loss 1.3768 LearningRate 0.0001 Epoch: 26 Global Step: 45070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:44:57,367-Speed 13768.46 samples/sec Loss 1.3891 LearningRate 0.0001 Epoch: 26 Global Step: 45080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:45:15,270-Speed 13728.50 samples/sec Loss 1.3701 LearningRate 0.0001 Epoch: 26 Global Step: 45090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:45:33,122-Speed 13767.97 samples/sec Loss 1.3869 LearningRate 0.0001 Epoch: 26 Global Step: 45100 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:45:50,988-Speed 13757.01 samples/sec Loss 1.3838 LearningRate 0.0001 Epoch: 26 Global Step: 45110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:46:08,806-Speed 13794.71 samples/sec Loss 1.3730 LearningRate 0.0001 Epoch: 26 Global Step: 45120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:46:26,619-Speed 13798.02 samples/sec Loss 1.3819 LearningRate 0.0001 Epoch: 26 Global Step: 45130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:46:44,559-Speed 13700.12 samples/sec Loss 1.3690 LearningRate 0.0001 Epoch: 26 Global Step: 45140 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:47:02,461-Speed 13728.94 samples/sec Loss 1.3745 LearningRate 0.0001 Epoch: 26 Global Step: 45150 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:47:20,302-Speed 13776.20 samples/sec Loss 1.3796 LearningRate 0.0001 Epoch: 26 Global Step: 45160 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:47:38,291-Speed 13662.33 samples/sec Loss 1.3868 
LearningRate 0.0001 Epoch: 26 Global Step: 45170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:47:56,176-Speed 13743.67 samples/sec Loss 1.3903 LearningRate 0.0001 Epoch: 26 Global Step: 45180 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:48:14,065-Speed 13739.12 samples/sec Loss 1.3843 LearningRate 0.0001 Epoch: 26 Global Step: 45190 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:48:31,993-Speed 13709.47 samples/sec Loss 1.3805 LearningRate 0.0001 Epoch: 26 Global Step: 45200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:48:49,794-Speed 13806.45 samples/sec Loss 1.3799 LearningRate 0.0001 Epoch: 26 Global Step: 45210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:49:07,601-Speed 13802.13 samples/sec Loss 1.3822 LearningRate 0.0001 Epoch: 26 Global Step: 45220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:49:25,450-Speed 13770.08 samples/sec Loss 1.3833 LearningRate 0.0001 Epoch: 26 Global Step: 45230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:49:43,280-Speed 13783.80 samples/sec Loss 1.3833 LearningRate 0.0001 Epoch: 26 Global Step: 45240 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:50:01,231-Speed 13691.79 samples/sec Loss 1.3822 LearningRate 0.0001 Epoch: 26 Global Step: 45250 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:50:19,132-Speed 13729.78 samples/sec Loss 1.3793 LearningRate 0.0001 Epoch: 26 Global Step: 45260 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:50:36,998-Speed 13756.64 samples/sec Loss 1.3777 LearningRate 0.0001 Epoch: 26 Global Step: 45270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:50:54,845-Speed 13771.51 samples/sec Loss 1.3902 LearningRate 0.0001 Epoch: 26 Global Step: 45280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:51:12,636-Speed 13814.29 samples/sec Loss 1.3713 LearningRate 0.0001 Epoch: 26 
Global Step: 45290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:51:30,453-Speed 13796.38 samples/sec Loss 1.3776 LearningRate 0.0001 Epoch: 26 Global Step: 45300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:51:48,226-Speed 13828.70 samples/sec Loss 1.3840 LearningRate 0.0001 Epoch: 26 Global Step: 45310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:52:05,949-Speed 13867.07 samples/sec Loss 1.3688 LearningRate 0.0001 Epoch: 26 Global Step: 45320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:52:23,748-Speed 13808.88 samples/sec Loss 1.3682 LearningRate 0.0001 Epoch: 26 Global Step: 45330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:52:41,533-Speed 13820.61 samples/sec Loss 1.3809 LearningRate 0.0001 Epoch: 26 Global Step: 45340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:52:59,279-Speed 13850.07 samples/sec Loss 1.3776 LearningRate 0.0001 Epoch: 26 Global Step: 45350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:53:17,053-Speed 13827.64 samples/sec Loss 1.3747 LearningRate 0.0001 Epoch: 26 Global Step: 45360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:53:34,860-Speed 13801.89 samples/sec Loss 1.3757 LearningRate 0.0001 Epoch: 26 Global Step: 45370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:53:52,619-Speed 13840.35 samples/sec Loss 1.3763 LearningRate 0.0001 Epoch: 26 Global Step: 45380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:54:10,383-Speed 13835.35 samples/sec Loss 1.3843 LearningRate 0.0001 Epoch: 26 Global Step: 45390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:54:28,096-Speed 13875.63 samples/sec Loss 1.3658 LearningRate 0.0001 Epoch: 26 Global Step: 45400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:54:45,893-Speed 13809.77 samples/sec Loss 1.3739 LearningRate 0.0001 Epoch: 26 Global Step: 45410 Fp16 Grad 
Scale: 32768 Required: 12 hours Training: 2022-03-04 06:55:03,634-Speed 13853.50 samples/sec Loss 1.3711 LearningRate 0.0001 Epoch: 26 Global Step: 45420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:55:21,440-Speed 13803.05 samples/sec Loss 1.3568 LearningRate 0.0001 Epoch: 26 Global Step: 45430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:55:39,272-Speed 13783.66 samples/sec Loss 1.3658 LearningRate 0.0001 Epoch: 26 Global Step: 45440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:55:57,166-Speed 13735.18 samples/sec Loss 1.3668 LearningRate 0.0001 Epoch: 26 Global Step: 45450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:56:14,984-Speed 13793.66 samples/sec Loss 1.3798 LearningRate 0.0001 Epoch: 26 Global Step: 45460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:56:32,756-Speed 13829.24 samples/sec Loss 1.3645 LearningRate 0.0001 Epoch: 26 Global Step: 45470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:56:50,593-Speed 13778.87 samples/sec Loss 1.3667 LearningRate 0.0001 Epoch: 26 Global Step: 45480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:57:08,460-Speed 13755.99 samples/sec Loss 1.3711 LearningRate 0.0001 Epoch: 26 Global Step: 45490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 06:57:26,262-Speed 13806.60 samples/sec Loss 1.3692 LearningRate 0.0001 Epoch: 26 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:57:43,978-Speed 13872.62 samples/sec Loss 1.3769 LearningRate 0.0001 Epoch: 26 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:58:01,664-Speed 13896.85 samples/sec Loss 1.3664 LearningRate 0.0001 Epoch: 26 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 06:58:19,421-Speed 13840.76 samples/sec Loss 1.3752 LearningRate 0.0001 Epoch: 26 Global Step: 45530 Fp16 Grad Scale: 16384 Required: 12 hours 
Training: 2022-03-04 06:58:37,205-Speed 13819.58 samples/sec Loss 1.3754 LearningRate 0.0001 Epoch: 26 Global Step: 45540 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:58:55,039-Speed 13781.78 samples/sec Loss 1.3685 LearningRate 0.0001 Epoch: 26 Global Step: 45550 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:59:12,755-Speed 13873.04 samples/sec Loss 1.3648 LearningRate 0.0001 Epoch: 26 Global Step: 45560 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:59:30,546-Speed 13814.45 samples/sec Loss 1.3723 LearningRate 0.0001 Epoch: 26 Global Step: 45570 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 06:59:48,239-Speed 13891.23 samples/sec Loss 1.3617 LearningRate 0.0001 Epoch: 26 Global Step: 45580 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:00:05,987-Speed 13848.46 samples/sec Loss 1.3720 LearningRate 0.0001 Epoch: 26 Global Step: 45590 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:00:23,667-Speed 13901.35 samples/sec Loss 1.3626 LearningRate 0.0001 Epoch: 26 Global Step: 45600 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:00:41,441-Speed 13827.84 samples/sec Loss 1.3684 LearningRate 0.0001 Epoch: 26 Global Step: 45610 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:00:59,180-Speed 13855.07 samples/sec Loss 1.3610 LearningRate 0.0001 Epoch: 26 Global Step: 45620 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:01:16,870-Speed 13893.53 samples/sec Loss 1.3726 LearningRate 0.0001 Epoch: 26 Global Step: 45630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:01:34,657-Speed 13817.66 samples/sec Loss 1.3688 LearningRate 0.0001 Epoch: 26 Global Step: 45640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:01:52,447-Speed 13815.77 samples/sec Loss 1.3654 LearningRate 0.0001 Epoch: 26 Global Step: 45650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 
07:02:10,256-Speed 13800.67 samples/sec Loss 1.3674 LearningRate 0.0001 Epoch: 26 Global Step: 45660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:02:28,069-Speed 13797.65 samples/sec Loss 1.3662 LearningRate 0.0001 Epoch: 26 Global Step: 45670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:02:45,765-Speed 13888.46 samples/sec Loss 1.3708 LearningRate 0.0001 Epoch: 26 Global Step: 45680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:03:03,558-Speed 13813.04 samples/sec Loss 1.3699 LearningRate 0.0001 Epoch: 26 Global Step: 45690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:03:21,334-Speed 13826.49 samples/sec Loss 1.3624 LearningRate 0.0001 Epoch: 26 Global Step: 45700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:03:39,045-Speed 13877.84 samples/sec Loss 1.3625 LearningRate 0.0001 Epoch: 26 Global Step: 45710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:03:56,869-Speed 13789.34 samples/sec Loss 1.3665 LearningRate 0.0001 Epoch: 26 Global Step: 45720 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:04:14,898-Speed 13631.78 samples/sec Loss 1.3650 LearningRate 0.0001 Epoch: 26 Global Step: 45730 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:04:32,687-Speed 13816.16 samples/sec Loss 1.3586 LearningRate 0.0001 Epoch: 26 Global Step: 45740 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:04:50,443-Speed 13842.72 samples/sec Loss 1.3507 LearningRate 0.0001 Epoch: 26 Global Step: 45750 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:05:08,190-Speed 13848.60 samples/sec Loss 1.3546 LearningRate 0.0001 Epoch: 26 Global Step: 45760 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:05:25,921-Speed 13861.95 samples/sec Loss 1.3649 LearningRate 0.0001 Epoch: 26 Global Step: 45770 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:05:43,728-Speed 13801.93 
samples/sec Loss 1.3562 LearningRate 0.0001 Epoch: 26 Global Step: 45780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:06:01,528-Speed 13807.47 samples/sec Loss 1.3526 LearningRate 0.0001 Epoch: 26 Global Step: 45790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:06:19,389-Speed 13761.00 samples/sec Loss 1.3559 LearningRate 0.0001 Epoch: 26 Global Step: 45800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:06:37,232-Speed 13775.21 samples/sec Loss 1.3644 LearningRate 0.0001 Epoch: 26 Global Step: 45810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:06:55,080-Speed 13770.37 samples/sec Loss 1.3554 LearningRate 0.0001 Epoch: 26 Global Step: 45820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:07:12,943-Speed 13758.39 samples/sec Loss 1.3591 LearningRate 0.0001 Epoch: 26 Global Step: 45830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:07:30,738-Speed 13811.96 samples/sec Loss 1.3629 LearningRate 0.0001 Epoch: 26 Global Step: 45840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:07:48,605-Speed 13756.23 samples/sec Loss 1.3639 LearningRate 0.0001 Epoch: 26 Global Step: 45850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:08:06,345-Speed 13854.09 samples/sec Loss 1.3513 LearningRate 0.0001 Epoch: 26 Global Step: 45860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:08:24,143-Speed 13808.87 samples/sec Loss 1.3546 LearningRate 0.0001 Epoch: 26 Global Step: 45870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:08:41,908-Speed 13835.02 samples/sec Loss 1.3490 LearningRate 0.0001 Epoch: 26 Global Step: 45880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:08:59,692-Speed 13820.37 samples/sec Loss 1.3541 LearningRate 0.0001 Epoch: 26 Global Step: 45890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:09:17,433-Speed 13853.66 samples/sec Loss 1.3542 
LearningRate 0.0001 Epoch: 26 Global Step: 45900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:09:35,161-Speed 13862.94 samples/sec Loss 1.3610 LearningRate 0.0001 Epoch: 26 Global Step: 45910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:09:52,950-Speed 13818.25 samples/sec Loss 1.3590 LearningRate 0.0001 Epoch: 26 Global Step: 45920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 07:10:10,661-Speed 13876.63 samples/sec Loss 1.3516 LearningRate 0.0001 Epoch: 26 Global Step: 45930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 07:10:28,423-Speed 13837.35 samples/sec Loss 1.3579 LearningRate 0.0001 Epoch: 26 Global Step: 45940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 07:10:46,149-Speed 13866.52 samples/sec Loss 1.3466 LearningRate 0.0001 Epoch: 26 Global Step: 45950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 07:11:03,940-Speed 13813.94 samples/sec Loss 1.3563 LearningRate 0.0001 Epoch: 26 Global Step: 45960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-03-04 07:11:21,632-Speed 13892.27 samples/sec Loss 1.3631 LearningRate 0.0001 Epoch: 26 Global Step: 45970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:11:39,305-Speed 13907.04 samples/sec Loss 1.3521 LearningRate 0.0001 Epoch: 26 Global Step: 45980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:11:57,037-Speed 13860.64 samples/sec Loss 1.3508 LearningRate 0.0001 Epoch: 26 Global Step: 45990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:12:14,816-Speed 13824.49 samples/sec Loss 1.3509 LearningRate 0.0001 Epoch: 26 Global Step: 46000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:12:32,513-Speed 13888.13 samples/sec Loss 1.3613 LearningRate 0.0001 Epoch: 26 Global Step: 46010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:12:50,315-Speed 13806.06 samples/sec Loss 1.3586 LearningRate 0.0001 Epoch: 26 
Global Step: 46020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-03-04 07:13:08,132-Speed 13793.74 samples/sec Loss 1.3532 LearningRate 0.0001 Epoch: 26 Global Step: 46030 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:13:25,870-Speed 13855.93 samples/sec Loss 1.3575 LearningRate 0.0001 Epoch: 26 Global Step: 46040 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:13:43,685-Speed 13799.04 samples/sec Loss 1.3561 LearningRate 0.0001 Epoch: 26 Global Step: 46050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:14:01,410-Speed 13866.04 samples/sec Loss 1.3402 LearningRate 0.0001 Epoch: 26 Global Step: 46060 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:14:19,152-Speed 13853.11 samples/sec Loss 1.3499 LearningRate 0.0001 Epoch: 26 Global Step: 46070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:14:36,876-Speed 13866.77 samples/sec Loss 1.3506 LearningRate 0.0001 Epoch: 26 Global Step: 46080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:14:54,653-Speed 13825.92 samples/sec Loss 1.3518 LearningRate 0.0001 Epoch: 26 Global Step: 46090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:15:12,424-Speed 13830.15 samples/sec Loss 1.3478 LearningRate 0.0001 Epoch: 26 Global Step: 46100 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:15:30,314-Speed 13737.88 samples/sec Loss 1.3449 LearningRate 0.0001 Epoch: 26 Global Step: 46110 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:15:48,095-Speed 13821.85 samples/sec Loss 1.3539 LearningRate 0.0001 Epoch: 26 Global Step: 46120 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:16:06,044-Speed 13693.70 samples/sec Loss 1.3479 LearningRate 0.0001 Epoch: 26 Global Step: 46130 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:16:24,197-Speed 13539.77 samples/sec Loss 1.3466 LearningRate 0.0001 Epoch: 26 Global Step: 46140 Fp16 Grad Scale: 
8192 Required: 12 hours Training: 2022-03-04 07:16:42,273-Speed 13598.77 samples/sec Loss 1.3424 LearningRate 0.0001 Epoch: 26 Global Step: 46150 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:17:00,002-Speed 13862.36 samples/sec Loss 1.3499 LearningRate 0.0001 Epoch: 26 Global Step: 46160 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:17:17,740-Speed 13855.80 samples/sec Loss 1.3430 LearningRate 0.0001 Epoch: 26 Global Step: 46170 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:17:35,487-Speed 13849.07 samples/sec Loss 1.3363 LearningRate 0.0001 Epoch: 26 Global Step: 46180 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:17:53,191-Speed 13882.41 samples/sec Loss 1.3398 LearningRate 0.0001 Epoch: 26 Global Step: 46190 Fp16 Grad Scale: 8192 Required: 12 hours Training: 2022-03-04 07:18:10,956-Speed 13835.02 samples/sec Loss 1.3483 LearningRate 0.0001 Epoch: 26 Global Step: 46200 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:18:28,788-Speed 13783.57 samples/sec Loss 1.3468 LearningRate 0.0001 Epoch: 26 Global Step: 46210 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:18:46,592-Speed 13804.51 samples/sec Loss 1.3428 LearningRate 0.0001 Epoch: 26 Global Step: 46220 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:19:04,529-Speed 13702.35 samples/sec Loss 1.3409 LearningRate 0.0001 Epoch: 26 Global Step: 46230 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-03-04 07:19:22,420-Speed 13737.08 samples/sec Loss 1.3362 LearningRate 0.0001 Epoch: 26 Global Step: 46240 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:19:40,292-Speed 13752.47 samples/sec Loss 1.3377 LearningRate 0.0001 Epoch: 26 Global Step: 46250 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:19:58,105-Speed 13797.36 samples/sec Loss 1.3441 LearningRate 0.0001 Epoch: 26 Global Step: 46260 Fp16 Grad Scale: 16384 Required: 11 hours Training: 
2022-03-04 07:20:16,074-Speed 13677.62 samples/sec Loss 1.3369 LearningRate 0.0001 Epoch: 26 Global Step: 46270 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:20:33,912-Speed 13778.91 samples/sec Loss 1.3405 LearningRate 0.0001 Epoch: 26 Global Step: 46280 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:20:51,837-Speed 13710.76 samples/sec Loss 1.3405 LearningRate 0.0001 Epoch: 26 Global Step: 46290 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:21:09,665-Speed 13786.48 samples/sec Loss 1.3449 LearningRate 0.0001 Epoch: 26 Global Step: 46300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:21:27,566-Speed 13729.31 samples/sec Loss 1.3478 LearningRate 0.0001 Epoch: 26 Global Step: 46310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:21:45,399-Speed 13782.33 samples/sec Loss 1.3379 LearningRate 0.0001 Epoch: 26 Global Step: 46320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:22:03,176-Speed 13825.48 samples/sec Loss 1.3317 LearningRate 0.0001 Epoch: 26 Global Step: 46330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:22:20,998-Speed 13790.93 samples/sec Loss 1.3513 LearningRate 0.0001 Epoch: 26 Global Step: 46340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:22:38,763-Speed 13835.04 samples/sec Loss 1.3413 LearningRate 0.0001 Epoch: 26 Global Step: 46350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:22:56,455-Speed 13891.80 samples/sec Loss 1.3466 LearningRate 0.0001 Epoch: 26 Global Step: 46360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:23:14,159-Speed 13882.76 samples/sec Loss 1.3518 LearningRate 0.0001 Epoch: 26 Global Step: 46370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:23:31,884-Speed 13865.95 samples/sec Loss 1.3468 LearningRate 0.0001 Epoch: 26 Global Step: 46380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:23:49,726-Speed 
13775.40 samples/sec Loss 1.3407 LearningRate 0.0001 Epoch: 26 Global Step: 46390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:24:07,478-Speed 13844.72 samples/sec Loss 1.3389 LearningRate 0.0001 Epoch: 26 Global Step: 46400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:24:25,288-Speed 13800.46 samples/sec Loss 1.3381 LearningRate 0.0001 Epoch: 26 Global Step: 46410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:24:43,085-Speed 13810.63 samples/sec Loss 1.3384 LearningRate 0.0001 Epoch: 26 Global Step: 46420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:25:00,894-Speed 13800.25 samples/sec Loss 1.3327 LearningRate 0.0001 Epoch: 26 Global Step: 46430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:25:18,810-Speed 13718.61 samples/sec Loss 1.3336 LearningRate 0.0001 Epoch: 26 Global Step: 46440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:25:36,843-Speed 13628.88 samples/sec Loss 1.3415 LearningRate 0.0001 Epoch: 26 Global Step: 46450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:25:54,744-Speed 13730.14 samples/sec Loss 1.3278 LearningRate 0.0001 Epoch: 26 Global Step: 46460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:26:12,672-Speed 13708.70 samples/sec Loss 1.3342 LearningRate 0.0001 Epoch: 26 Global Step: 46470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:26:30,466-Speed 13812.79 samples/sec Loss 1.3357 LearningRate 0.0001 Epoch: 26 Global Step: 46480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:26:48,299-Speed 13781.42 samples/sec Loss 1.3328 LearningRate 0.0001 Epoch: 26 Global Step: 46490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:27:06,132-Speed 13783.21 samples/sec Loss 1.3390 LearningRate 0.0001 Epoch: 26 Global Step: 46500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:27:23,872-Speed 13854.37 samples/sec Loss 
1.3429 LearningRate 0.0001 Epoch: 26 Global Step: 46510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:27:41,627-Speed 13842.63 samples/sec Loss 1.3317 LearningRate 0.0001 Epoch: 26 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:27:59,332-Speed 13880.89 samples/sec Loss 1.3455 LearningRate 0.0001 Epoch: 26 Global Step: 46530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:28:17,095-Speed 13836.31 samples/sec Loss 1.3435 LearningRate 0.0001 Epoch: 26 Global Step: 46540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:28:34,888-Speed 13813.41 samples/sec Loss 1.3354 LearningRate 0.0001 Epoch: 26 Global Step: 46550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:28:52,710-Speed 13790.83 samples/sec Loss 1.3414 LearningRate 0.0001 Epoch: 26 Global Step: 46560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:29:10,438-Speed 13863.66 samples/sec Loss 1.3364 LearningRate 0.0001 Epoch: 26 Global Step: 46570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:29:28,146-Speed 13879.03 samples/sec Loss 1.3487 LearningRate 0.0001 Epoch: 26 Global Step: 46580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:29:46,112-Speed 13680.51 samples/sec Loss 1.3453 LearningRate 0.0001 Epoch: 26 Global Step: 46590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:30:04,206-Speed 13582.81 samples/sec Loss 1.3318 LearningRate 0.0001 Epoch: 26 Global Step: 46600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:30:22,039-Speed 13782.15 samples/sec Loss 1.3381 LearningRate 0.0001 Epoch: 26 Global Step: 46610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:30:39,801-Speed 13837.57 samples/sec Loss 1.3379 LearningRate 0.0001 Epoch: 26 Global Step: 46620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:30:57,520-Speed 13871.13 samples/sec Loss 1.3462 LearningRate 0.0001 
Epoch: 26 Global Step: 46630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:31:15,358-Speed 13778.28 samples/sec Loss 1.3370 LearningRate 0.0001 Epoch: 26 Global Step: 46640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:31:33,114-Speed 13842.26 samples/sec Loss 1.3391 LearningRate 0.0001 Epoch: 26 Global Step: 46650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:31:50,964-Speed 13768.48 samples/sec Loss 1.3561 LearningRate 0.0001 Epoch: 26 Global Step: 46660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:32:59,441-Speed 3589.03 samples/sec Loss 1.3326 LearningRate 0.0001 Epoch: 27 Global Step: 46670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:33:17,135-Speed 13892.29 samples/sec Loss 1.3333 LearningRate 0.0001 Epoch: 27 Global Step: 46680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:33:34,822-Speed 13896.37 samples/sec Loss 1.3217 LearningRate 0.0001 Epoch: 27 Global Step: 46690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:33:52,562-Speed 13853.58 samples/sec Loss 1.3333 LearningRate 0.0001 Epoch: 27 Global Step: 46700 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:34:10,535-Speed 13674.72 samples/sec Loss 1.3260 LearningRate 0.0001 Epoch: 27 Global Step: 46710 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:34:28,412-Speed 13748.70 samples/sec Loss 1.3229 LearningRate 0.0001 Epoch: 27 Global Step: 46720 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:34:46,188-Speed 13825.84 samples/sec Loss 1.3280 LearningRate 0.0001 Epoch: 27 Global Step: 46730 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:35:04,033-Speed 13773.16 samples/sec Loss 1.3249 LearningRate 0.0001 Epoch: 27 Global Step: 46740 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:35:21,750-Speed 13872.21 samples/sec Loss 1.3308 LearningRate 0.0001 Epoch: 27 Global Step: 46750 
Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:35:39,581-Speed 13784.01 samples/sec Loss 1.3251 LearningRate 0.0001 Epoch: 27 Global Step: 46760 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:35:57,362-Speed 13822.15 samples/sec Loss 1.3340 LearningRate 0.0001 Epoch: 27 Global Step: 46770 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:36:15,175-Speed 13797.40 samples/sec Loss 1.3222 LearningRate 0.0001 Epoch: 27 Global Step: 46780 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:36:32,981-Speed 13803.09 samples/sec Loss 1.3143 LearningRate 0.0001 Epoch: 27 Global Step: 46790 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:36:50,772-Speed 13814.54 samples/sec Loss 1.3245 LearningRate 0.0001 Epoch: 27 Global Step: 46800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:37:08,613-Speed 13776.51 samples/sec Loss 1.3305 LearningRate 0.0001 Epoch: 27 Global Step: 46810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:37:26,362-Speed 13847.97 samples/sec Loss 1.3234 LearningRate 0.0001 Epoch: 27 Global Step: 46820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:37:44,206-Speed 13773.09 samples/sec Loss 1.3205 LearningRate 0.0001 Epoch: 27 Global Step: 46830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:38:02,042-Speed 13779.90 samples/sec Loss 1.3240 LearningRate 0.0001 Epoch: 27 Global Step: 46840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:38:19,849-Speed 13802.52 samples/sec Loss 1.3264 LearningRate 0.0001 Epoch: 27 Global Step: 46850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:38:37,600-Speed 13845.54 samples/sec Loss 1.3225 LearningRate 0.0001 Epoch: 27 Global Step: 46860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:38:55,382-Speed 13821.78 samples/sec Loss 1.3232 LearningRate 0.0001 Epoch: 27 Global Step: 46870 Fp16 Grad Scale: 32768 
Required: 11 hours Training: 2022-03-04 07:39:13,243-Speed 13760.22 samples/sec Loss 1.3209 LearningRate 0.0001 Epoch: 27 Global Step: 46880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:39:31,011-Speed 13832.93 samples/sec Loss 1.3254 LearningRate 0.0001 Epoch: 27 Global Step: 46890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:39:48,768-Speed 13840.91 samples/sec Loss 1.3243 LearningRate 0.0001 Epoch: 27 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:40:06,543-Speed 13826.81 samples/sec Loss 1.3225 LearningRate 0.0001 Epoch: 27 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:40:24,281-Speed 13855.94 samples/sec Loss 1.3289 LearningRate 0.0001 Epoch: 27 Global Step: 46920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:40:42,026-Speed 13850.74 samples/sec Loss 1.3153 LearningRate 0.0001 Epoch: 27 Global Step: 46930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:40:59,823-Speed 13810.06 samples/sec Loss 1.3230 LearningRate 0.0001 Epoch: 27 Global Step: 46940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:41:17,656-Speed 13781.97 samples/sec Loss 1.3277 LearningRate 0.0001 Epoch: 27 Global Step: 46950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:41:35,420-Speed 13835.76 samples/sec Loss 1.3348 LearningRate 0.0001 Epoch: 27 Global Step: 46960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:41:53,195-Speed 13827.40 samples/sec Loss 1.3228 LearningRate 0.0001 Epoch: 27 Global Step: 46970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:42:10,890-Speed 13889.39 samples/sec Loss 1.3201 LearningRate 0.0001 Epoch: 27 Global Step: 46980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:42:28,646-Speed 13841.80 samples/sec Loss 1.3186 LearningRate 0.0001 Epoch: 27 Global Step: 46990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 
2022-03-04 07:42:46,368-Speed 13868.23 samples/sec Loss 1.3222 LearningRate 0.0001 Epoch: 27 Global Step: 47000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:43:04,115-Speed 13849.69 samples/sec Loss 1.3317 LearningRate 0.0001 Epoch: 27 Global Step: 47010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:43:21,834-Speed 13870.77 samples/sec Loss 1.3200 LearningRate 0.0001 Epoch: 27 Global Step: 47020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:43:39,588-Speed 13842.85 samples/sec Loss 1.3258 LearningRate 0.0001 Epoch: 27 Global Step: 47030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:43:57,291-Speed 13883.50 samples/sec Loss 1.3277 LearningRate 0.0001 Epoch: 27 Global Step: 47040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:44:15,177-Speed 13741.28 samples/sec Loss 1.3290 LearningRate 0.0001 Epoch: 27 Global Step: 47050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:44:33,037-Speed 13761.32 samples/sec Loss 1.3249 LearningRate 0.0001 Epoch: 27 Global Step: 47060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:44:50,737-Speed 13885.09 samples/sec Loss 1.3308 LearningRate 0.0001 Epoch: 27 Global Step: 47070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:45:08,519-Speed 13822.16 samples/sec Loss 1.3114 LearningRate 0.0001 Epoch: 27 Global Step: 47080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:45:26,232-Speed 13875.05 samples/sec Loss 1.3131 LearningRate 0.0001 Epoch: 27 Global Step: 47090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:45:43,938-Speed 13880.78 samples/sec Loss 1.3136 LearningRate 0.0001 Epoch: 27 Global Step: 47100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:46:01,678-Speed 13854.79 samples/sec Loss 1.3220 LearningRate 0.0001 Epoch: 27 Global Step: 47110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:46:19,417-Speed 
13855.15 samples/sec Loss 1.3248 LearningRate 0.0001 Epoch: 27 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:46:37,176-Speed 13839.23 samples/sec Loss 1.3093 LearningRate 0.0001 Epoch: 27 Global Step: 47130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:46:54,885-Speed 13878.31 samples/sec Loss 1.3149 LearningRate 0.0001 Epoch: 27 Global Step: 47140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:47:12,593-Speed 13879.39 samples/sec Loss 1.3265 LearningRate 0.0001 Epoch: 27 Global Step: 47150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:47:30,358-Speed 13834.86 samples/sec Loss 1.3169 LearningRate 0.0001 Epoch: 27 Global Step: 47160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:47:48,127-Speed 13831.98 samples/sec Loss 1.3102 LearningRate 0.0001 Epoch: 27 Global Step: 47170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:48:05,864-Speed 13857.59 samples/sec Loss 1.3173 LearningRate 0.0001 Epoch: 27 Global Step: 47180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:48:23,524-Speed 13916.46 samples/sec Loss 1.3151 LearningRate 0.0001 Epoch: 27 Global Step: 47190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:48:41,277-Speed 13844.22 samples/sec Loss 1.3135 LearningRate 0.0001 Epoch: 27 Global Step: 47200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:48:58,982-Speed 13881.95 samples/sec Loss 1.3157 LearningRate 0.0001 Epoch: 27 Global Step: 47210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:49:16,742-Speed 13840.24 samples/sec Loss 1.3173 LearningRate 0.0001 Epoch: 27 Global Step: 47220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:49:34,472-Speed 13863.29 samples/sec Loss 1.3044 LearningRate 0.0001 Epoch: 27 Global Step: 47230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:49:52,362-Speed 13738.45 samples/sec Loss 
1.3172 LearningRate 0.0001 Epoch: 27 Global Step: 47240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:50:10,037-Speed 13904.66 samples/sec Loss 1.3193 LearningRate 0.0001 Epoch: 27 Global Step: 47250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 07:50:27,726-Speed 13894.73 samples/sec Loss 1.3160 LearningRate 0.0001 Epoch: 27 Global Step: 47260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:50:45,488-Speed 13839.44 samples/sec Loss 1.3162 LearningRate 0.0001 Epoch: 27 Global Step: 47270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:51:03,274-Speed 13818.37 samples/sec Loss 1.3072 LearningRate 0.0001 Epoch: 27 Global Step: 47280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:51:21,092-Speed 13793.80 samples/sec Loss 1.3121 LearningRate 0.0001 Epoch: 27 Global Step: 47290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 07:51:38,899-Speed 13802.03 samples/sec Loss 1.3156 LearningRate 0.0001 Epoch: 27 Global Step: 47300 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:51:57,065-Speed 13529.40 samples/sec Loss 1.3167 LearningRate 0.0001 Epoch: 27 Global Step: 47310 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:52:15,364-Speed 13431.45 samples/sec Loss 1.3268 LearningRate 0.0001 Epoch: 27 Global Step: 47320 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:52:33,479-Speed 13567.10 samples/sec Loss 1.3092 LearningRate 0.0001 Epoch: 27 Global Step: 47330 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:52:51,644-Speed 13530.02 samples/sec Loss 1.3108 LearningRate 0.0001 Epoch: 27 Global Step: 47340 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:53:09,718-Speed 13598.66 samples/sec Loss 1.3140 LearningRate 0.0001 Epoch: 27 Global Step: 47350 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:53:27,537-Speed 13792.60 samples/sec Loss 1.3123 LearningRate 0.0001 
Epoch: 27 Global Step: 47360 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:53:45,316-Speed 13823.68 samples/sec Loss 1.3072 LearningRate 0.0001 Epoch: 27 Global Step: 47370 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:54:03,104-Speed 13816.97 samples/sec Loss 1.3049 LearningRate 0.0001 Epoch: 27 Global Step: 47380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:54:20,861-Speed 13841.30 samples/sec Loss 1.3140 LearningRate 0.0001 Epoch: 27 Global Step: 47390 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:54:38,632-Speed 13830.18 samples/sec Loss 1.3036 LearningRate 0.0001 Epoch: 27 Global Step: 47400 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:54:56,365-Speed 13859.92 samples/sec Loss 1.3085 LearningRate 0.0001 Epoch: 27 Global Step: 47410 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:55:14,133-Speed 13832.31 samples/sec Loss 1.3052 LearningRate 0.0001 Epoch: 27 Global Step: 47420 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:55:31,936-Speed 13805.84 samples/sec Loss 1.3029 LearningRate 0.0001 Epoch: 27 Global Step: 47430 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:55:49,736-Speed 13807.83 samples/sec Loss 1.3140 LearningRate 0.0001 Epoch: 27 Global Step: 47440 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:56:07,530-Speed 13811.89 samples/sec Loss 1.3041 LearningRate 0.0001 Epoch: 27 Global Step: 47450 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:56:25,359-Speed 13785.32 samples/sec Loss 1.2984 LearningRate 0.0001 Epoch: 27 Global Step: 47460 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:56:43,178-Speed 13793.22 samples/sec Loss 1.3088 LearningRate 0.0001 Epoch: 27 Global Step: 47470 Fp16 Grad Scale: 8192 Required: 11 hours Training: 2022-03-04 07:57:00,982-Speed 13804.38 samples/sec Loss 1.2958 LearningRate 0.0001 Epoch: 27 Global Step: 47480 Fp16 Grad 
Scale: 8192 Required: 11 hours Training: 2022-03-04 07:57:18,777-Speed 13813.69 samples/sec Loss 1.3067 LearningRate 0.0001 Epoch: 27 Global Step: 47490 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:57:36,592-Speed 13795.17 samples/sec Loss 1.3076 LearningRate 0.0001 Epoch: 27 Global Step: 47500 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:57:54,401-Speed 13800.83 samples/sec Loss 1.3100 LearningRate 0.0001 Epoch: 27 Global Step: 47510 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:58:12,241-Speed 13777.53 samples/sec Loss 1.3092 LearningRate 0.0001 Epoch: 27 Global Step: 47520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:58:30,053-Speed 13798.91 samples/sec Loss 1.3004 LearningRate 0.0001 Epoch: 27 Global Step: 47530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:58:47,878-Speed 13787.90 samples/sec Loss 1.3096 LearningRate 0.0001 Epoch: 27 Global Step: 47540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:59:05,713-Speed 13780.71 samples/sec Loss 1.3114 LearningRate 0.0001 Epoch: 27 Global Step: 47550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:59:23,536-Speed 13789.95 samples/sec Loss 1.3071 LearningRate 0.0001 Epoch: 27 Global Step: 47560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:59:41,331-Speed 13811.49 samples/sec Loss 1.3070 LearningRate 0.0001 Epoch: 27 Global Step: 47570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 07:59:59,115-Speed 13820.12 samples/sec Loss 1.2907 LearningRate 0.0001 Epoch: 27 Global Step: 47580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-03-04 08:00:16,919-Speed 13804.57 samples/sec Loss 1.3034 LearningRate 0.0001 Epoch: 27 Global Step: 47590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:00:34,691-Speed 13829.74 samples/sec Loss 1.2927 LearningRate 0.0001 Epoch: 27 Global Step: 47600 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-03-04 08:00:52,487-Speed 13810.60 samples/sec Loss 1.3065 LearningRate 0.0001 Epoch: 27 Global Step: 47610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:01:10,283-Speed 13811.40 samples/sec Loss 1.3101 LearningRate 0.0001 Epoch: 27 Global Step: 47620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:01:28,150-Speed 13755.43 samples/sec Loss 1.2978 LearningRate 0.0001 Epoch: 27 Global Step: 47630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:01:45,856-Speed 13881.40 samples/sec Loss 1.3158 LearningRate 0.0001 Epoch: 27 Global Step: 47640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:02:03,718-Speed 13759.65 samples/sec Loss 1.2932 LearningRate 0.0001 Epoch: 27 Global Step: 47650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:02:21,425-Speed 13879.68 samples/sec Loss 1.2984 LearningRate 0.0001 Epoch: 27 Global Step: 47660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:02:39,174-Speed 13847.59 samples/sec Loss 1.3065 LearningRate 0.0001 Epoch: 27 Global Step: 47670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:02:56,974-Speed 13808.06 samples/sec Loss 1.3087 LearningRate 0.0001 Epoch: 27 Global Step: 47680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:03:14,758-Speed 13819.91 samples/sec Loss 1.3011 LearningRate 0.0001 Epoch: 27 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:03:32,483-Speed 13865.98 samples/sec Loss 1.3020 LearningRate 0.0001 Epoch: 27 Global Step: 47700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:03:50,209-Speed 13865.43 samples/sec Loss 1.3018 LearningRate 0.0001 Epoch: 27 Global Step: 47710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:04:07,927-Speed 13871.00 samples/sec Loss 1.3009 LearningRate 0.0001 Epoch: 27 Global Step: 47720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 
08:04:25,788-Speed 13760.89 samples/sec Loss 1.3062 LearningRate 0.0001 Epoch: 27 Global Step: 47730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:04:43,550-Speed 13838.39 samples/sec Loss 1.3065 LearningRate 0.0001 Epoch: 27 Global Step: 47740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:05:01,276-Speed 13865.81 samples/sec Loss 1.3057 LearningRate 0.0001 Epoch: 27 Global Step: 47750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:05:19,181-Speed 13726.50 samples/sec Loss 1.2973 LearningRate 0.0001 Epoch: 27 Global Step: 47760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:05:37,076-Speed 13734.48 samples/sec Loss 1.3052 LearningRate 0.0001 Epoch: 27 Global Step: 47770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:05:55,008-Speed 13705.81 samples/sec Loss 1.3091 LearningRate 0.0001 Epoch: 27 Global Step: 47780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:06:13,041-Speed 13629.50 samples/sec Loss 1.2966 LearningRate 0.0001 Epoch: 27 Global Step: 47790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:06:31,089-Speed 13617.46 samples/sec Loss 1.2922 LearningRate 0.0001 Epoch: 27 Global Step: 47800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:06:49,137-Speed 13618.38 samples/sec Loss 1.2933 LearningRate 0.0001 Epoch: 27 Global Step: 47810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:07:07,198-Speed 13607.53 samples/sec Loss 1.2953 LearningRate 0.0001 Epoch: 27 Global Step: 47820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:07:25,477-Speed 13446.36 samples/sec Loss 1.3008 LearningRate 0.0001 Epoch: 27 Global Step: 47830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:07:43,432-Speed 13688.30 samples/sec Loss 1.3030 LearningRate 0.0001 Epoch: 27 Global Step: 47840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:08:01,182-Speed 13848.60 
samples/sec Loss 1.2940 LearningRate 0.0001 Epoch: 27 Global Step: 47850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:08:18,916-Speed 13860.05 samples/sec Loss 1.2917 LearningRate 0.0001 Epoch: 27 Global Step: 47860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:08:36,617-Speed 13885.00 samples/sec Loss 1.2982 LearningRate 0.0001 Epoch: 27 Global Step: 47870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:08:54,384-Speed 13832.76 samples/sec Loss 1.2856 LearningRate 0.0001 Epoch: 27 Global Step: 47880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:09:12,164-Speed 13823.71 samples/sec Loss 1.2956 LearningRate 0.0001 Epoch: 27 Global Step: 47890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:09:29,920-Speed 13841.66 samples/sec Loss 1.2945 LearningRate 0.0001 Epoch: 27 Global Step: 47900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:09:47,738-Speed 13793.35 samples/sec Loss 1.2938 LearningRate 0.0001 Epoch: 27 Global Step: 47910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:10:05,514-Speed 13826.60 samples/sec Loss 1.2893 LearningRate 0.0001 Epoch: 27 Global Step: 47920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:10:23,305-Speed 13814.13 samples/sec Loss 1.2924 LearningRate 0.0001 Epoch: 27 Global Step: 47930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:10:41,026-Speed 13869.60 samples/sec Loss 1.2906 LearningRate 0.0001 Epoch: 27 Global Step: 47940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:10:58,777-Speed 13845.91 samples/sec Loss 1.2942 LearningRate 0.0001 Epoch: 27 Global Step: 47950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:11:16,552-Speed 13827.04 samples/sec Loss 1.2949 LearningRate 0.0001 Epoch: 27 Global Step: 47960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:11:34,286-Speed 13859.05 samples/sec Loss 1.2901 
LearningRate 0.0001 Epoch: 27 Global Step: 47970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:11:52,053-Speed 13833.33 samples/sec Loss 1.2874 LearningRate 0.0001 Epoch: 27 Global Step: 47980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:12:09,784-Speed 13862.41 samples/sec Loss 1.2857 LearningRate 0.0001 Epoch: 27 Global Step: 47990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:12:27,481-Speed 13888.29 samples/sec Loss 1.2883 LearningRate 0.0001 Epoch: 27 Global Step: 48000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:12:45,243-Speed 13836.40 samples/sec Loss 1.2827 LearningRate 0.0001 Epoch: 27 Global Step: 48010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:13:02,947-Speed 13883.13 samples/sec Loss 1.2889 LearningRate 0.0001 Epoch: 27 Global Step: 48020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:13:20,663-Speed 13873.32 samples/sec Loss 1.2794 LearningRate 0.0001 Epoch: 27 Global Step: 48030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:13:38,404-Speed 13852.82 samples/sec Loss 1.2863 LearningRate 0.0001 Epoch: 27 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:13:56,119-Speed 13873.93 samples/sec Loss 1.2840 LearningRate 0.0001 Epoch: 27 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:14:13,827-Speed 13880.10 samples/sec Loss 1.2892 LearningRate 0.0001 Epoch: 27 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:14:31,612-Speed 13819.45 samples/sec Loss 1.2765 LearningRate 0.0001 Epoch: 27 Global Step: 48070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:14:49,413-Speed 13807.13 samples/sec Loss 1.2820 LearningRate 0.0001 Epoch: 27 Global Step: 48080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-03-04 08:15:07,190-Speed 13824.93 samples/sec Loss 1.2930 LearningRate 0.0001 Epoch: 27 
Global Step: 48090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:15:24,887-Speed 13888.33 samples/sec Loss 1.2826 LearningRate 0.0001 Epoch: 27 Global Step: 48100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:15:42,667-Speed 13823.97 samples/sec Loss 1.2958 LearningRate 0.0001 Epoch: 27 Global Step: 48110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:16:00,388-Speed 13869.41 samples/sec Loss 1.2935 LearningRate 0.0001 Epoch: 27 Global Step: 48120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:16:18,160-Speed 13828.75 samples/sec Loss 1.2916 LearningRate 0.0001 Epoch: 27 Global Step: 48130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:16:35,941-Speed 13822.97 samples/sec Loss 1.2886 LearningRate 0.0001 Epoch: 27 Global Step: 48140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:16:53,695-Speed 13842.91 samples/sec Loss 1.2878 LearningRate 0.0001 Epoch: 27 Global Step: 48150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:17:11,499-Speed 13805.16 samples/sec Loss 1.2841 LearningRate 0.0001 Epoch: 27 Global Step: 48160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:17:29,416-Speed 13717.13 samples/sec Loss 1.2879 LearningRate 0.0001 Epoch: 27 Global Step: 48170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:17:47,204-Speed 13816.62 samples/sec Loss 1.2849 LearningRate 0.0001 Epoch: 27 Global Step: 48180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:18:04,924-Speed 13870.43 samples/sec Loss 1.2896 LearningRate 0.0001 Epoch: 27 Global Step: 48190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:18:22,653-Speed 13862.63 samples/sec Loss 1.2889 LearningRate 0.0001 Epoch: 27 Global Step: 48200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:18:40,390-Speed 13857.11 samples/sec Loss 1.2973 LearningRate 0.0001 Epoch: 27 Global Step: 48210 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-03-04 08:18:58,100-Speed 13877.59 samples/sec Loss 1.2895 LearningRate 0.0001 Epoch: 27 Global Step: 48220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-03-04 08:19:15,894-Speed 13812.17 samples/sec Loss 1.2820 LearningRate 0.0001 Epoch: 27 Global Step: 48230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:19:33,643-Speed 13847.65 samples/sec Loss 1.2920 LearningRate 0.0001 Epoch: 27 Global Step: 48240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:19:51,459-Speed 13794.57 samples/sec Loss 1.2885 LearningRate 0.0001 Epoch: 27 Global Step: 48250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:20:09,170-Speed 13877.48 samples/sec Loss 1.2824 LearningRate 0.0001 Epoch: 27 Global Step: 48260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:20:26,899-Speed 13863.02 samples/sec Loss 1.2914 LearningRate 0.0001 Epoch: 27 Global Step: 48270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:20:44,611-Speed 13878.61 samples/sec Loss 1.2857 LearningRate 0.0001 Epoch: 27 Global Step: 48280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:21:02,379-Speed 13833.77 samples/sec Loss 1.2889 LearningRate 0.0001 Epoch: 27 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:21:20,091-Speed 13875.88 samples/sec Loss 1.2802 LearningRate 0.0001 Epoch: 27 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:21:37,840-Speed 13847.07 samples/sec Loss 1.2820 LearningRate 0.0001 Epoch: 27 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:21:55,561-Speed 13871.03 samples/sec Loss 1.2912 LearningRate 0.0001 Epoch: 27 Global Step: 48320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:22:13,296-Speed 13858.79 samples/sec Loss 1.2820 LearningRate 0.0001 Epoch: 27 Global Step: 48330 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-03-04 08:22:31,113-Speed 13794.03 samples/sec Loss 1.2827 LearningRate 0.0001 Epoch: 27 Global Step: 48340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:22:48,940-Speed 13786.27 samples/sec Loss 1.2899 LearningRate 0.0001 Epoch: 27 Global Step: 48350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:23:06,738-Speed 13809.48 samples/sec Loss 1.2928 LearningRate 0.0001 Epoch: 27 Global Step: 48360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:23:24,525-Speed 13817.74 samples/sec Loss 1.2894 LearningRate 0.0001 Epoch: 27 Global Step: 48370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:23:42,349-Speed 13789.25 samples/sec Loss 1.2893 LearningRate 0.0001 Epoch: 27 Global Step: 48380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:24:00,174-Speed 13788.49 samples/sec Loss 1.2891 LearningRate 0.0001 Epoch: 27 Global Step: 48390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:25:08,215-Speed 3611.97 samples/sec Loss 1.2847 LearningRate 0.0001 Epoch: 28 Global Step: 48400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:25:25,913-Speed 13887.51 samples/sec Loss 1.2774 LearningRate 0.0001 Epoch: 28 Global Step: 48410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:25:43,589-Speed 13904.51 samples/sec Loss 1.2691 LearningRate 0.0001 Epoch: 28 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:26:01,347-Speed 13840.19 samples/sec Loss 1.2733 LearningRate 0.0001 Epoch: 28 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:26:18,944-Speed 13967.55 samples/sec Loss 1.2772 LearningRate 0.0001 Epoch: 28 Global Step: 48440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:26:36,628-Speed 13898.11 samples/sec Loss 1.2714 LearningRate 0.0001 Epoch: 28 Global Step: 48450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 
08:26:54,253-Speed 13945.06 samples/sec Loss 1.2632 LearningRate 0.0001 Epoch: 28 Global Step: 48460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:27:11,963-Speed 13877.52 samples/sec Loss 1.2774 LearningRate 0.0001 Epoch: 28 Global Step: 48470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:27:29,691-Speed 13864.10 samples/sec Loss 1.2710 LearningRate 0.0001 Epoch: 28 Global Step: 48480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:27:47,453-Speed 13837.30 samples/sec Loss 1.2728 LearningRate 0.0001 Epoch: 28 Global Step: 48490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:28:05,218-Speed 13835.13 samples/sec Loss 1.2759 LearningRate 0.0001 Epoch: 28 Global Step: 48500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:28:22,962-Speed 13852.23 samples/sec Loss 1.2760 LearningRate 0.0001 Epoch: 28 Global Step: 48510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:28:40,611-Speed 13925.42 samples/sec Loss 1.2630 LearningRate 0.0001 Epoch: 28 Global Step: 48520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:28:58,341-Speed 13862.16 samples/sec Loss 1.2715 LearningRate 0.0001 Epoch: 28 Global Step: 48530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:29:15,989-Speed 13926.64 samples/sec Loss 1.2744 LearningRate 0.0001 Epoch: 28 Global Step: 48540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:29:33,675-Speed 13896.60 samples/sec Loss 1.2714 LearningRate 0.0001 Epoch: 28 Global Step: 48550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:29:51,422-Speed 13848.71 samples/sec Loss 1.2651 LearningRate 0.0001 Epoch: 28 Global Step: 48560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:30:09,110-Speed 13895.74 samples/sec Loss 1.2777 LearningRate 0.0001 Epoch: 28 Global Step: 48570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:30:26,816-Speed 13880.49 
samples/sec Loss 1.2762 LearningRate 0.0001 Epoch: 28 Global Step: 48580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:30:44,542-Speed 13865.55 samples/sec Loss 1.2667 LearningRate 0.0001 Epoch: 28 Global Step: 48590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:31:02,312-Speed 13830.67 samples/sec Loss 1.2691 LearningRate 0.0001 Epoch: 28 Global Step: 48600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:31:20,057-Speed 13850.60 samples/sec Loss 1.2753 LearningRate 0.0001 Epoch: 28 Global Step: 48610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:31:37,729-Speed 13907.86 samples/sec Loss 1.2772 LearningRate 0.0001 Epoch: 28 Global Step: 48620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:31:55,442-Speed 13874.94 samples/sec Loss 1.2865 LearningRate 0.0001 Epoch: 28 Global Step: 48630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:32:13,207-Speed 13834.66 samples/sec Loss 1.2797 LearningRate 0.0001 Epoch: 28 Global Step: 48640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:32:31,181-Speed 13673.98 samples/sec Loss 1.2844 LearningRate 0.0001 Epoch: 28 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:32:49,264-Speed 13591.58 samples/sec Loss 1.2719 LearningRate 0.0001 Epoch: 28 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:33:07,434-Speed 13526.32 samples/sec Loss 1.2697 LearningRate 0.0001 Epoch: 28 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:33:25,518-Speed 13591.09 samples/sec Loss 1.2706 LearningRate 0.0001 Epoch: 28 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:33:43,719-Speed 13502.90 samples/sec Loss 1.2699 LearningRate 0.0001 Epoch: 28 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:34:01,613-Speed 13735.82 samples/sec Loss 1.2788 
LearningRate 0.0001 Epoch: 28 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:34:19,442-Speed 13785.45 samples/sec Loss 1.2627 LearningRate 0.0001 Epoch: 28 Global Step: 48710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:34:37,158-Speed 13872.75 samples/sec Loss 1.2654 LearningRate 0.0001 Epoch: 28 Global Step: 48720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:34:54,837-Speed 13901.98 samples/sec Loss 1.2656 LearningRate 0.0001 Epoch: 28 Global Step: 48730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:35:12,484-Speed 13926.99 samples/sec Loss 1.2794 LearningRate 0.0001 Epoch: 28 Global Step: 48740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:35:30,255-Speed 13830.77 samples/sec Loss 1.2619 LearningRate 0.0001 Epoch: 28 Global Step: 48750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:35:47,941-Speed 13896.61 samples/sec Loss 1.2640 LearningRate 0.0001 Epoch: 28 Global Step: 48760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:36:05,716-Speed 13827.12 samples/sec Loss 1.2732 LearningRate 0.0001 Epoch: 28 Global Step: 48770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:36:23,495-Speed 13823.06 samples/sec Loss 1.2721 LearningRate 0.0001 Epoch: 28 Global Step: 48780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:36:41,236-Speed 13853.90 samples/sec Loss 1.2636 LearningRate 0.0001 Epoch: 28 Global Step: 48790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:36:59,075-Speed 13777.66 samples/sec Loss 1.2691 LearningRate 0.0001 Epoch: 28 Global Step: 48800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:37:16,893-Speed 13793.25 samples/sec Loss 1.2736 LearningRate 0.0001 Epoch: 28 Global Step: 48810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:37:34,814-Speed 13714.63 samples/sec Loss 1.2678 LearningRate 0.0001 Epoch: 28 
Global Step: 48820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:37:52,645-Speed 13783.84 samples/sec Loss 1.2558 LearningRate 0.0001 Epoch: 28 Global Step: 48830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:38:10,470-Speed 13788.76 samples/sec Loss 1.2700 LearningRate 0.0001 Epoch: 28 Global Step: 48840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:38:28,333-Speed 13759.05 samples/sec Loss 1.2611 LearningRate 0.0001 Epoch: 28 Global Step: 48850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:38:46,126-Speed 13812.48 samples/sec Loss 1.2635 LearningRate 0.0001 Epoch: 28 Global Step: 48860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:39:04,085-Speed 13685.78 samples/sec Loss 1.2593 LearningRate 0.0001 Epoch: 28 Global Step: 48870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:39:21,898-Speed 13797.39 samples/sec Loss 1.2699 LearningRate 0.0001 Epoch: 28 Global Step: 48880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:39:39,780-Speed 13744.29 samples/sec Loss 1.2597 LearningRate 0.0001 Epoch: 28 Global Step: 48890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:39:57,570-Speed 13816.89 samples/sec Loss 1.2690 LearningRate 0.0001 Epoch: 28 Global Step: 48900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:40:15,315-Speed 13851.36 samples/sec Loss 1.2751 LearningRate 0.0001 Epoch: 28 Global Step: 48910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:40:33,058-Speed 13851.51 samples/sec Loss 1.2715 LearningRate 0.0001 Epoch: 28 Global Step: 48920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:40:50,830-Speed 13829.39 samples/sec Loss 1.2680 LearningRate 0.0001 Epoch: 28 Global Step: 48930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:41:08,645-Speed 13797.19 samples/sec Loss 1.2641 LearningRate 0.0001 Epoch: 28 Global Step: 48940 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-03-04 08:41:26,506-Speed 13760.13 samples/sec Loss 1.2560 LearningRate 0.0001 Epoch: 28 Global Step: 48950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:41:44,385-Speed 13746.58 samples/sec Loss 1.2610 LearningRate 0.0001 Epoch: 28 Global Step: 48960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:42:02,163-Speed 13825.00 samples/sec Loss 1.2667 LearningRate 0.0001 Epoch: 28 Global Step: 48970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:42:19,966-Speed 13805.33 samples/sec Loss 1.2561 LearningRate 0.0001 Epoch: 28 Global Step: 48980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:42:37,711-Speed 13850.24 samples/sec Loss 1.2613 LearningRate 0.0001 Epoch: 28 Global Step: 48990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:42:55,504-Speed 13812.92 samples/sec Loss 1.2673 LearningRate 0.0001 Epoch: 28 Global Step: 49000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:43:13,308-Speed 13805.83 samples/sec Loss 1.2763 LearningRate 0.0001 Epoch: 28 Global Step: 49010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:43:31,028-Speed 13869.31 samples/sec Loss 1.2628 LearningRate 0.0001 Epoch: 28 Global Step: 49020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:43:48,821-Speed 13813.29 samples/sec Loss 1.2684 LearningRate 0.0001 Epoch: 28 Global Step: 49030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:44:06,610-Speed 13816.81 samples/sec Loss 1.2712 LearningRate 0.0001 Epoch: 28 Global Step: 49040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:44:24,363-Speed 13844.50 samples/sec Loss 1.2592 LearningRate 0.0001 Epoch: 28 Global Step: 49050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:44:42,104-Speed 13853.26 samples/sec Loss 1.2703 LearningRate 0.0001 Epoch: 28 Global Step: 49060 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-03-04 08:44:59,795-Speed 13892.85 samples/sec Loss 1.2651 LearningRate 0.0001 Epoch: 28 Global Step: 49070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:45:17,577-Speed 13821.83 samples/sec Loss 1.2575 LearningRate 0.0001 Epoch: 28 Global Step: 49080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:45:35,318-Speed 13853.79 samples/sec Loss 1.2661 LearningRate 0.0001 Epoch: 28 Global Step: 49090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:45:53,066-Speed 13848.00 samples/sec Loss 1.2632 LearningRate 0.0001 Epoch: 28 Global Step: 49100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:46:10,789-Speed 13868.01 samples/sec Loss 1.2583 LearningRate 0.0001 Epoch: 28 Global Step: 49110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:46:28,485-Speed 13888.92 samples/sec Loss 1.2626 LearningRate 0.0001 Epoch: 28 Global Step: 49120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:46:46,212-Speed 13864.72 samples/sec Loss 1.2530 LearningRate 0.0001 Epoch: 28 Global Step: 49130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:47:04,118-Speed 13725.28 samples/sec Loss 1.2615 LearningRate 0.0001 Epoch: 28 Global Step: 49140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:47:21,862-Speed 13851.25 samples/sec Loss 1.2569 LearningRate 0.0001 Epoch: 28 Global Step: 49150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:47:39,600-Speed 13856.13 samples/sec Loss 1.2593 LearningRate 0.0001 Epoch: 28 Global Step: 49160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:47:57,470-Speed 13753.68 samples/sec Loss 1.2659 LearningRate 0.0001 Epoch: 28 Global Step: 49170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:48:15,241-Speed 13829.83 samples/sec Loss 1.2567 LearningRate 0.0001 Epoch: 28 Global Step: 49180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 
08:48:32,983-Speed 13852.96 samples/sec Loss 1.2587 LearningRate 0.0001 Epoch: 28 Global Step: 49190 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:48:50,712-Speed 13863.33 samples/sec Loss 1.2549 LearningRate 0.0001 Epoch: 28 Global Step: 49200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:49:08,426-Speed 13874.51 samples/sec Loss 1.2561 LearningRate 0.0001 Epoch: 28 Global Step: 49210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:49:26,188-Speed 13836.62 samples/sec Loss 1.2495 LearningRate 0.0001 Epoch: 28 Global Step: 49220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:49:43,982-Speed 13812.70 samples/sec Loss 1.2524 LearningRate 0.0001 Epoch: 28 Global Step: 49230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:50:01,725-Speed 13852.19 samples/sec Loss 1.2432 LearningRate 0.0001 Epoch: 28 Global Step: 49240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:50:19,438-Speed 13875.39 samples/sec Loss 1.2487 LearningRate 0.0001 Epoch: 28 Global Step: 49250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:50:37,170-Speed 13860.57 samples/sec Loss 1.2469 LearningRate 0.0001 Epoch: 28 Global Step: 49260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:50:54,846-Speed 13904.30 samples/sec Loss 1.2528 LearningRate 0.0001 Epoch: 28 Global Step: 49270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:51:12,705-Speed 13761.75 samples/sec Loss 1.2442 LearningRate 0.0001 Epoch: 28 Global Step: 49280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:51:30,522-Speed 13795.25 samples/sec Loss 1.2512 LearningRate 0.0001 Epoch: 28 Global Step: 49290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:51:48,273-Speed 13845.75 samples/sec Loss 1.2433 LearningRate 0.0001 Epoch: 28 Global Step: 49300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:52:06,028-Speed 13842.45 
samples/sec Loss 1.2446 LearningRate 0.0001 Epoch: 28 Global Step: 49310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:52:23,660-Speed 13939.20 samples/sec Loss 1.2520 LearningRate 0.0001 Epoch: 28 Global Step: 49320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:52:41,511-Speed 13768.10 samples/sec Loss 1.2465 LearningRate 0.0001 Epoch: 28 Global Step: 49330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:52:59,184-Speed 13907.10 samples/sec Loss 1.2549 LearningRate 0.0001 Epoch: 28 Global Step: 49340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:53:16,824-Speed 13932.78 samples/sec Loss 1.2553 LearningRate 0.0001 Epoch: 28 Global Step: 49350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:53:34,522-Speed 13887.00 samples/sec Loss 1.2452 LearningRate 0.0001 Epoch: 28 Global Step: 49360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:53:52,256-Speed 13859.42 samples/sec Loss 1.2550 LearningRate 0.0001 Epoch: 28 Global Step: 49370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:54:09,987-Speed 13861.55 samples/sec Loss 1.2502 LearningRate 0.0001 Epoch: 28 Global Step: 49380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:54:27,729-Speed 13852.24 samples/sec Loss 1.2469 LearningRate 0.0001 Epoch: 28 Global Step: 49390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:54:45,409-Speed 13901.41 samples/sec Loss 1.2568 LearningRate 0.0001 Epoch: 28 Global Step: 49400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:55:03,185-Speed 13827.64 samples/sec Loss 1.2509 LearningRate 0.0001 Epoch: 28 Global Step: 49410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:55:20,888-Speed 13883.41 samples/sec Loss 1.2498 LearningRate 0.0001 Epoch: 28 Global Step: 49420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:55:38,682-Speed 13812.50 samples/sec Loss 1.2550 
LearningRate 0.0001 Epoch: 28 Global Step: 49430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 08:55:56,424-Speed 13853.08 samples/sec Loss 1.2452 LearningRate 0.0001 Epoch: 28 Global Step: 49440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:56:14,159-Speed 13858.71 samples/sec Loss 1.2565 LearningRate 0.0001 Epoch: 28 Global Step: 49450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:56:31,948-Speed 13816.20 samples/sec Loss 1.2408 LearningRate 0.0001 Epoch: 28 Global Step: 49460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:56:49,723-Speed 13827.28 samples/sec Loss 1.2583 LearningRate 0.0001 Epoch: 28 Global Step: 49470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:57:07,441-Speed 13870.78 samples/sec Loss 1.2484 LearningRate 0.0001 Epoch: 28 Global Step: 49480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:57:25,143-Speed 13884.32 samples/sec Loss 1.2398 LearningRate 0.0001 Epoch: 28 Global Step: 49490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:57:42,838-Speed 13891.22 samples/sec Loss 1.2425 LearningRate 0.0001 Epoch: 28 Global Step: 49500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:58:00,490-Speed 13922.96 samples/sec Loss 1.2498 LearningRate 0.0001 Epoch: 28 Global Step: 49510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:58:18,187-Speed 13888.34 samples/sec Loss 1.2478 LearningRate 0.0001 Epoch: 28 Global Step: 49520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 08:58:35,926-Speed 13854.78 samples/sec Loss 1.2466 LearningRate 0.0001 Epoch: 28 Global Step: 49530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:58:53,744-Speed 13794.25 samples/sec Loss 1.2396 LearningRate 0.0001 Epoch: 28 Global Step: 49540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:59:11,599-Speed 13764.68 samples/sec Loss 1.2459 LearningRate 0.0001 Epoch: 28 
Global Step: 49550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:59:29,338-Speed 13854.86 samples/sec Loss 1.2484 LearningRate 0.0001 Epoch: 28 Global Step: 49560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 08:59:47,125-Speed 13817.84 samples/sec Loss 1.2501 LearningRate 0.0001 Epoch: 28 Global Step: 49570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:00:04,917-Speed 13813.96 samples/sec Loss 1.2429 LearningRate 0.0001 Epoch: 28 Global Step: 49580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:00:22,611-Speed 13890.86 samples/sec Loss 1.2473 LearningRate 0.0001 Epoch: 28 Global Step: 49590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:00:40,301-Speed 13893.19 samples/sec Loss 1.2365 LearningRate 0.0001 Epoch: 28 Global Step: 49600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:00:58,013-Speed 13876.46 samples/sec Loss 1.2529 LearningRate 0.0001 Epoch: 28 Global Step: 49610 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:01:15,831-Speed 13793.39 samples/sec Loss 1.2458 LearningRate 0.0001 Epoch: 28 Global Step: 49620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:01:33,670-Speed 13777.34 samples/sec Loss 1.2342 LearningRate 0.0001 Epoch: 28 Global Step: 49630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:01:51,395-Speed 13866.33 samples/sec Loss 1.2329 LearningRate 0.0001 Epoch: 28 Global Step: 49640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:02:09,209-Speed 13796.25 samples/sec Loss 1.2375 LearningRate 0.0001 Epoch: 28 Global Step: 49650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:02:26,981-Speed 13829.81 samples/sec Loss 1.2379 LearningRate 0.0001 Epoch: 28 Global Step: 49660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:02:44,706-Speed 13865.96 samples/sec Loss 1.2441 LearningRate 0.0001 Epoch: 28 Global Step: 49670 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-03-04 09:03:02,424-Speed 13871.49 samples/sec Loss 1.2387 LearningRate 0.0001 Epoch: 28 Global Step: 49680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:03:20,148-Speed 13866.68 samples/sec Loss 1.2383 LearningRate 0.0001 Epoch: 28 Global Step: 49690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:03:37,965-Speed 13794.59 samples/sec Loss 1.2401 LearningRate 0.0001 Epoch: 28 Global Step: 49700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:03:55,773-Speed 13801.68 samples/sec Loss 1.2385 LearningRate 0.0001 Epoch: 28 Global Step: 49710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:04:13,605-Speed 13782.66 samples/sec Loss 1.2380 LearningRate 0.0001 Epoch: 28 Global Step: 49720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:04:31,402-Speed 13810.16 samples/sec Loss 1.2451 LearningRate 0.0001 Epoch: 28 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-03-04 09:04:49,077-Speed 13905.53 samples/sec Loss 1.2529 LearningRate 0.0001 Epoch: 28 Global Step: 49740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:05:06,779-Speed 13884.11 samples/sec Loss 1.2408 LearningRate 0.0001 Epoch: 28 Global Step: 49750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:05:24,515-Speed 13857.60 samples/sec Loss 1.2402 LearningRate 0.0001 Epoch: 28 Global Step: 49760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:05:42,244-Speed 13862.77 samples/sec Loss 1.2442 LearningRate 0.0001 Epoch: 28 Global Step: 49770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:05:59,950-Speed 13880.71 samples/sec Loss 1.2418 LearningRate 0.0001 Epoch: 28 Global Step: 49780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:06:17,711-Speed 13838.33 samples/sec Loss 1.2369 LearningRate 0.0001 Epoch: 28 Global Step: 49790 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-03-04 09:06:35,463-Speed 13844.43 samples/sec Loss 1.2310 LearningRate 0.0001 Epoch: 28 Global Step: 49800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:06:53,239-Speed 13826.41 samples/sec Loss 1.2396 LearningRate 0.0001 Epoch: 28 Global Step: 49810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:07:10,954-Speed 13874.24 samples/sec Loss 1.2517 LearningRate 0.0001 Epoch: 28 Global Step: 49820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:07:28,805-Speed 13768.47 samples/sec Loss 1.2390 LearningRate 0.0001 Epoch: 28 Global Step: 49830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:07:46,523-Speed 13871.72 samples/sec Loss 1.2395 LearningRate 0.0001 Epoch: 28 Global Step: 49840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:08:04,376-Speed 13766.26 samples/sec Loss 1.2343 LearningRate 0.0001 Epoch: 28 Global Step: 49850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:08:22,200-Speed 13788.41 samples/sec Loss 1.2385 LearningRate 0.0001 Epoch: 28 Global Step: 49860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:08:39,991-Speed 13815.14 samples/sec Loss 1.2451 LearningRate 0.0001 Epoch: 28 Global Step: 49870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:08:57,717-Speed 13867.13 samples/sec Loss 1.2408 LearningRate 0.0001 Epoch: 28 Global Step: 49880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:09:15,474-Speed 13841.62 samples/sec Loss 1.2300 LearningRate 0.0001 Epoch: 28 Global Step: 49890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:09:33,264-Speed 13815.39 samples/sec Loss 1.2450 LearningRate 0.0001 Epoch: 28 Global Step: 49900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:09:51,024-Speed 13838.36 samples/sec Loss 1.2361 LearningRate 0.0001 Epoch: 28 Global Step: 49910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 
09:10:08,826-Speed 13806.76 samples/sec Loss 1.2439 LearningRate 0.0001 Epoch: 28 Global Step: 49920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:10:26,676-Speed 13768.86 samples/sec Loss 1.2325 LearningRate 0.0001 Epoch: 28 Global Step: 49930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:10:44,736-Speed 13608.56 samples/sec Loss 1.2297 LearningRate 0.0001 Epoch: 28 Global Step: 49940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:11:02,809-Speed 13599.46 samples/sec Loss 1.2309 LearningRate 0.0001 Epoch: 28 Global Step: 49950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:11:20,895-Speed 13588.87 samples/sec Loss 1.2326 LearningRate 0.0001 Epoch: 28 Global Step: 49960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:11:38,958-Speed 13606.54 samples/sec Loss 1.2360 LearningRate 0.0001 Epoch: 28 Global Step: 49970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:11:57,065-Speed 13573.84 samples/sec Loss 1.2404 LearningRate 0.0001 Epoch: 28 Global Step: 49980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:12:15,170-Speed 13575.08 samples/sec Loss 1.2311 LearningRate 0.0001 Epoch: 28 Global Step: 49990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:12:33,304-Speed 13553.86 samples/sec Loss 1.2303 LearningRate 0.0001 Epoch: 28 Global Step: 50000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:12:51,446-Speed 13547.25 samples/sec Loss 1.2323 LearningRate 0.0001 Epoch: 28 Global Step: 50010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-03-04 09:13:09,534-Speed 13587.17 samples/sec Loss 1.2311 LearningRate 0.0001 Epoch: 28 Global Step: 50020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:13:27,639-Speed 13574.88 samples/sec Loss 1.2345 LearningRate 0.0001 Epoch: 28 Global Step: 50030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:13:45,667-Speed 13633.51 
samples/sec Loss 1.2346 LearningRate 0.0001 Epoch: 28 Global Step: 50040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:14:03,717-Speed 13616.68 samples/sec Loss 1.2416 LearningRate 0.0001 Epoch: 28 Global Step: 50050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:14:21,825-Speed 13572.06 samples/sec Loss 1.2349 LearningRate 0.0001 Epoch: 28 Global Step: 50060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:14:39,909-Speed 13591.26 samples/sec Loss 1.2424 LearningRate 0.0001 Epoch: 28 Global Step: 50070 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:14:57,679-Speed 13830.43 samples/sec Loss 1.2398 LearningRate 0.0001 Epoch: 28 Global Step: 50080 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:15:15,608-Speed 13708.59 samples/sec Loss 1.2396 LearningRate 0.0001 Epoch: 28 Global Step: 50090 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:15:33,436-Speed 13785.40 samples/sec Loss 1.2355 LearningRate 0.0001 Epoch: 28 Global Step: 50100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:15:51,233-Speed 13809.91 samples/sec Loss 1.2494 LearningRate 0.0001 Epoch: 28 Global Step: 50110 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:16:09,065-Speed 13783.14 samples/sec Loss 1.2481 LearningRate 0.0001 Epoch: 28 Global Step: 50120 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:17:16,119-Speed 3665.21 samples/sec Loss 1.2336 LearningRate 0.0001 Epoch: 29 Global Step: 50130 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:17:33,827-Speed 13879.38 samples/sec Loss 1.2270 LearningRate 0.0001 Epoch: 29 Global Step: 50140 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:17:51,691-Speed 13757.83 samples/sec Loss 1.2248 LearningRate 0.0001 Epoch: 29 Global Step: 50150 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:18:09,614-Speed 13713.08 samples/sec Loss 1.2317 LearningRate 
0.0001 Epoch: 29 Global Step: 50160 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:18:27,489-Speed 13750.02 samples/sec Loss 1.2304 LearningRate 0.0001 Epoch: 29 Global Step: 50170 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:18:45,419-Speed 13706.97 samples/sec Loss 1.2281 LearningRate 0.0001 Epoch: 29 Global Step: 50180 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:19:03,196-Speed 13825.55 samples/sec Loss 1.2160 LearningRate 0.0001 Epoch: 29 Global Step: 50190 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:19:21,020-Speed 13789.35 samples/sec Loss 1.2256 LearningRate 0.0001 Epoch: 29 Global Step: 50200 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:19:38,715-Speed 13889.99 samples/sec Loss 1.2193 LearningRate 0.0001 Epoch: 29 Global Step: 50210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-03-04 09:19:56,474-Speed 13839.19 samples/sec Loss 1.2249 LearningRate 0.0001 Epoch: 29 Global Step: 50220 Fp16 Grad Scale: 8192 Required: 10 hours Training: 2022-03-04 09:20:14,199-Speed 13866.29 samples/sec Loss 1.2261 LearningRate 0.0001 Epoch: 29 Global Step: 50230 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:20:31,936-Speed 13855.97 samples/sec Loss 1.2292 LearningRate 0.0001 Epoch: 29 Global Step: 50240 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:20:49,661-Speed 13866.90 samples/sec Loss 1.2329 LearningRate 0.0001 Epoch: 29 Global Step: 50250 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:21:07,369-Speed 13879.75 samples/sec Loss 1.2163 LearningRate 0.0001 Epoch: 29 Global Step: 50260 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:21:25,156-Speed 13817.56 samples/sec Loss 1.2113 LearningRate 0.0001 Epoch: 29 Global Step: 50270 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:21:42,857-Speed 13884.39 samples/sec Loss 1.2248 LearningRate 0.0001 Epoch: 29 Global Step: 50280 Fp16 Grad 
Scale: 8192 Required: 9 hours Training: 2022-03-04 09:22:00,618-Speed 13838.40 samples/sec Loss 1.2162 LearningRate 0.0001 Epoch: 29 Global Step: 50290 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:22:18,360-Speed 13853.43 samples/sec Loss 1.2189 LearningRate 0.0001 Epoch: 29 Global Step: 50300 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:22:36,057-Speed 13887.75 samples/sec Loss 1.2270 LearningRate 0.0001 Epoch: 29 Global Step: 50310 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:22:53,810-Speed 13845.61 samples/sec Loss 1.2181 LearningRate 0.0001 Epoch: 29 Global Step: 50320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:23:11,515-Speed 13882.15 samples/sec Loss 1.2305 LearningRate 0.0001 Epoch: 29 Global Step: 50330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:23:29,253-Speed 13855.45 samples/sec Loss 1.2253 LearningRate 0.0001 Epoch: 29 Global Step: 50340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:23:47,091-Speed 13778.49 samples/sec Loss 1.2237 LearningRate 0.0001 Epoch: 29 Global Step: 50350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:24:04,841-Speed 13846.39 samples/sec Loss 1.2115 LearningRate 0.0001 Epoch: 29 Global Step: 50360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:24:22,625-Speed 13819.70 samples/sec Loss 1.2257 LearningRate 0.0001 Epoch: 29 Global Step: 50370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:24:40,500-Speed 13750.08 samples/sec Loss 1.2288 LearningRate 0.0001 Epoch: 29 Global Step: 50380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:24:58,174-Speed 13906.20 samples/sec Loss 1.2229 LearningRate 0.0001 Epoch: 29 Global Step: 50390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:25:15,918-Speed 13850.43 samples/sec Loss 1.2287 LearningRate 0.0001 Epoch: 29 Global Step: 50400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-03-04 09:25:33,672-Speed 13844.35 samples/sec Loss 1.2269 LearningRate 0.0001 Epoch: 29 Global Step: 50410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:25:51,433-Speed 13838.42 samples/sec Loss 1.2202 LearningRate 0.0001 Epoch: 29 Global Step: 50420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:26:09,404-Speed 13676.05 samples/sec Loss 1.2276 LearningRate 0.0001 Epoch: 29 Global Step: 50430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:26:27,535-Speed 13555.70 samples/sec Loss 1.2229 LearningRate 0.0001 Epoch: 29 Global Step: 50440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:26:45,525-Speed 13661.76 samples/sec Loss 1.2248 LearningRate 0.0001 Epoch: 29 Global Step: 50450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:27:03,541-Speed 13641.88 samples/sec Loss 1.2178 LearningRate 0.0001 Epoch: 29 Global Step: 50460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:27:21,499-Speed 13686.53 samples/sec Loss 1.2235 LearningRate 0.0001 Epoch: 29 Global Step: 50470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:27:39,223-Speed 13866.81 samples/sec Loss 1.2233 LearningRate 0.0001 Epoch: 29 Global Step: 50480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:27:56,969-Speed 13849.19 samples/sec Loss 1.2254 LearningRate 0.0001 Epoch: 29 Global Step: 50490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:28:14,707-Speed 13856.09 samples/sec Loss 1.2171 LearningRate 0.0001 Epoch: 29 Global Step: 50500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:28:32,447-Speed 13854.41 samples/sec Loss 1.2258 LearningRate 0.0001 Epoch: 29 Global Step: 50510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:28:50,180-Speed 13860.20 samples/sec Loss 1.2340 LearningRate 0.0001 Epoch: 29 Global Step: 50520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:29:07,940-Speed 13838.56 
samples/sec Loss 1.2141 LearningRate 0.0001 Epoch: 29 Global Step: 50530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:29:25,734-Speed 13812.46 samples/sec Loss 1.2253 LearningRate 0.0001 Epoch: 29 Global Step: 50540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:29:43,409-Speed 13905.31 samples/sec Loss 1.2214 LearningRate 0.0001 Epoch: 29 Global Step: 50550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:30:01,117-Speed 13879.56 samples/sec Loss 1.2159 LearningRate 0.0001 Epoch: 29 Global Step: 50560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:30:19,061-Speed 13696.61 samples/sec Loss 1.2266 LearningRate 0.0001 Epoch: 29 Global Step: 50570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:30:36,835-Speed 13827.82 samples/sec Loss 1.2202 LearningRate 0.0001 Epoch: 29 Global Step: 50580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:30:54,665-Speed 13785.17 samples/sec Loss 1.2201 LearningRate 0.0001 Epoch: 29 Global Step: 50590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:31:12,394-Speed 13862.50 samples/sec Loss 1.2133 LearningRate 0.0001 Epoch: 29 Global Step: 50600 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:31:30,208-Speed 13796.59 samples/sec Loss 1.2227 LearningRate 0.0001 Epoch: 29 Global Step: 50610 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:31:48,041-Speed 13782.11 samples/sec Loss 1.2167 LearningRate 0.0001 Epoch: 29 Global Step: 50620 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:32:05,788-Speed 13849.69 samples/sec Loss 1.2228 LearningRate 0.0001 Epoch: 29 Global Step: 50630 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:32:23,525-Speed 13856.13 samples/sec Loss 1.2177 LearningRate 0.0001 Epoch: 29 Global Step: 50640 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:32:41,223-Speed 13888.94 samples/sec Loss 1.2164 LearningRate 0.0001 
Epoch: 29 Global Step: 50650 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:32:59,031-Speed 13837.88 samples/sec Loss 1.2155 LearningRate 0.0001 Epoch: 29 Global Step: 50660 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:33:16,775-Speed 13921.89 samples/sec Loss 1.2132 LearningRate 0.0001 Epoch: 29 Global Step: 50670 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:33:34,539-Speed 13910.14 samples/sec Loss 1.2151 LearningRate 0.0001 Epoch: 29 Global Step: 50680 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:33:52,286-Speed 13849.04 samples/sec Loss 1.2164 LearningRate 0.0001 Epoch: 29 Global Step: 50690 Fp16 Grad Scale: 8192 Required: 9 hours Training: 2022-03-04 09:34:10,014-Speed 13887.25 samples/sec Loss 1.2161 LearningRate 0.0001 Epoch: 29 Global Step: 50700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:34:27,755-Speed 13900.53 samples/sec Loss 1.2077 LearningRate 0.0001 Epoch: 29 Global Step: 50710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:34:45,484-Speed 13911.92 samples/sec Loss 1.2107 LearningRate 0.0001 Epoch: 29 Global Step: 50720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:35:03,176-Speed 13892.26 samples/sec Loss 1.2155 LearningRate 0.0001 Epoch: 29 Global Step: 50730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:35:20,926-Speed 13907.99 samples/sec Loss 1.2160 LearningRate 0.0001 Epoch: 29 Global Step: 50740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:35:38,718-Speed 13816.01 samples/sec Loss 1.2045 LearningRate 0.0001 Epoch: 29 Global Step: 50750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:35:56,780-Speed 13899.64 samples/sec Loss 1.2133 LearningRate 0.0001 Epoch: 29 Global Step: 50760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:36:14,748-Speed 13775.92 samples/sec Loss 1.2162 LearningRate 0.0001 Epoch: 29 Global Step: 50770 Fp16 Grad Scale: 
16384 Required: 9 hours Training: 2022-03-04 09:36:32,481-Speed 13876.72 samples/sec Loss 1.2164 LearningRate 0.0001 Epoch: 29 Global Step: 50780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:36:50,227-Speed 13899.38 samples/sec Loss 1.2174 LearningRate 0.0001 Epoch: 29 Global Step: 50790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:37:07,955-Speed 13883.01 samples/sec Loss 1.2193 LearningRate 0.0001 Epoch: 29 Global Step: 50800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:37:25,710-Speed 13843.20 samples/sec Loss 1.2162 LearningRate 0.0001 Epoch: 29 Global Step: 50810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:37:43,462-Speed 13924.40 samples/sec Loss 1.2179 LearningRate 0.0001 Epoch: 29 Global Step: 50820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:38:01,369-Speed 13771.49 samples/sec Loss 1.2059 LearningRate 0.0001 Epoch: 29 Global Step: 50830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:38:19,004-Speed 13940.30 samples/sec Loss 1.2191 LearningRate 0.0001 Epoch: 29 Global Step: 50840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:38:36,837-Speed 13874.41 samples/sec Loss 1.2097 LearningRate 0.0001 Epoch: 29 Global Step: 50850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:38:54,686-Speed 13822.37 samples/sec Loss 1.2052 LearningRate 0.0001 Epoch: 29 Global Step: 50860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:39:12,634-Speed 13923.93 samples/sec Loss 1.2066 LearningRate 0.0001 Epoch: 29 Global Step: 50870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:39:30,462-Speed 13879.44 samples/sec Loss 1.2112 LearningRate 0.0001 Epoch: 29 Global Step: 50880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:39:48,296-Speed 13812.57 samples/sec Loss 1.2062 LearningRate 0.0001 Epoch: 29 Global Step: 50890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-03-04 09:40:06,114-Speed 13794.26 samples/sec Loss 1.2174 LearningRate 0.0001 Epoch: 29 Global Step: 50900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:40:23,892-Speed 13845.42 samples/sec Loss 1.2113 LearningRate 0.0001 Epoch: 29 Global Step: 50910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:40:41,664-Speed 13829.10 samples/sec Loss 1.2128 LearningRate 0.0001 Epoch: 29 Global Step: 50920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:40:59,529-Speed 13860.35 samples/sec Loss 1.2106 LearningRate 0.0001 Epoch: 29 Global Step: 50930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:41:17,350-Speed 13823.57 samples/sec Loss 1.2046 LearningRate 0.0001 Epoch: 29 Global Step: 50940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:41:35,031-Speed 13917.85 samples/sec Loss 1.2021 LearningRate 0.0001 Epoch: 29 Global Step: 50950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:41:52,870-Speed 13857.71 samples/sec Loss 1.2142 LearningRate 0.0001 Epoch: 29 Global Step: 50960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:42:10,570-Speed 13913.57 samples/sec Loss 1.1990 LearningRate 0.0001 Epoch: 29 Global Step: 50970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:42:28,234-Speed 13931.59 samples/sec Loss 1.2051 LearningRate 0.0001 Epoch: 29 Global Step: 50980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:42:46,053-Speed 13815.75 samples/sec Loss 1.1991 LearningRate 0.0001 Epoch: 29 Global Step: 50990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:43:03,798-Speed 13850.19 samples/sec Loss 1.1974 LearningRate 0.0001 Epoch: 29 Global Step: 51000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:43:21,539-Speed 13854.40 samples/sec Loss 1.2038 LearningRate 0.0001 Epoch: 29 Global Step: 51010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:43:39,429-Speed 13819.31 
samples/sec Loss 1.2054 LearningRate 0.0001 Epoch: 29 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:43:57,167-Speed 13898.69 samples/sec Loss 1.2027 LearningRate 0.0001 Epoch: 29 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:44:14,972-Speed 13804.00 samples/sec Loss 1.2078 LearningRate 0.0001 Epoch: 29 Global Step: 51040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:44:32,836-Speed 13771.45 samples/sec Loss 1.2025 LearningRate 0.0001 Epoch: 29 Global Step: 51050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:44:50,851-Speed 13696.37 samples/sec Loss 1.1955 LearningRate 0.0001 Epoch: 29 Global Step: 51060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:45:08,818-Speed 13735.41 samples/sec Loss 1.2105 LearningRate 0.0001 Epoch: 29 Global Step: 51070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:45:26,599-Speed 13834.15 samples/sec Loss 1.2075 LearningRate 0.0001 Epoch: 29 Global Step: 51080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:45:44,408-Speed 13863.98 samples/sec Loss 1.1961 LearningRate 0.0001 Epoch: 29 Global Step: 51090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:46:02,166-Speed 13840.19 samples/sec Loss 1.2012 LearningRate 0.0001 Epoch: 29 Global Step: 51100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:46:19,903-Speed 13875.16 samples/sec Loss 1.2124 LearningRate 0.0001 Epoch: 29 Global Step: 51110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:46:37,664-Speed 13841.51 samples/sec Loss 1.2017 LearningRate 0.0001 Epoch: 29 Global Step: 51120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:46:55,467-Speed 13840.27 samples/sec Loss 1.2044 LearningRate 0.0001 Epoch: 29 Global Step: 51130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:47:13,306-Speed 13840.93 samples/sec Loss 1.2089 LearningRate 0.0001 
Epoch: 29 Global Step: 51140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:47:31,229-Speed 13756.19 samples/sec Loss 1.2074 LearningRate 0.0001 Epoch: 29 Global Step: 51150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:47:49,101-Speed 13799.36 samples/sec Loss 1.2010 LearningRate 0.0001 Epoch: 29 Global Step: 51160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:48:06,962-Speed 13831.65 samples/sec Loss 1.1974 LearningRate 0.0001 Epoch: 29 Global Step: 51170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:48:24,858-Speed 13778.81 samples/sec Loss 1.2043 LearningRate 0.0001 Epoch: 29 Global Step: 51180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:48:42,793-Speed 13745.17 samples/sec Loss 1.2114 LearningRate 0.0001 Epoch: 29 Global Step: 51190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:49:00,666-Speed 13813.44 samples/sec Loss 1.1995 LearningRate 0.0001 Epoch: 29 Global Step: 51200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:49:18,492-Speed 13789.11 samples/sec Loss 1.2022 LearningRate 0.0001 Epoch: 29 Global Step: 51210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:49:36,372-Speed 13763.30 samples/sec Loss 1.1937 LearningRate 0.0001 Epoch: 29 Global Step: 51220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:49:54,283-Speed 13770.30 samples/sec Loss 1.1992 LearningRate 0.0001 Epoch: 29 Global Step: 51230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:50:12,093-Speed 13810.25 samples/sec Loss 1.2040 LearningRate 0.0001 Epoch: 29 Global Step: 51240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:50:29,976-Speed 13743.43 samples/sec Loss 1.2042 LearningRate 0.0001 Epoch: 29 Global Step: 51250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:50:47,918-Speed 13712.38 samples/sec Loss 1.1970 LearningRate 0.0001 Epoch: 29 Global Step: 51260 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-03-04 09:51:05,895-Speed 13672.24 samples/sec Loss 1.2041 LearningRate 0.0001 Epoch: 29 Global Step: 51270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:51:23,725-Speed 13815.59 samples/sec Loss 1.2036 LearningRate 0.0001 Epoch: 29 Global Step: 51280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:51:41,545-Speed 13793.14 samples/sec Loss 1.2029 LearningRate 0.0001 Epoch: 29 Global Step: 51290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:51:59,428-Speed 13769.58 samples/sec Loss 1.1980 LearningRate 0.0001 Epoch: 29 Global Step: 51300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:52:17,386-Speed 13741.70 samples/sec Loss 1.1938 LearningRate 0.0001 Epoch: 29 Global Step: 51310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:52:35,314-Speed 13710.23 samples/sec Loss 1.2005 LearningRate 0.0001 Epoch: 29 Global Step: 51320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:52:53,147-Speed 13782.41 samples/sec Loss 1.2000 LearningRate 0.0001 Epoch: 29 Global Step: 51330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:53:10,943-Speed 13810.30 samples/sec Loss 1.1943 LearningRate 0.0001 Epoch: 29 Global Step: 51340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:53:28,709-Speed 13833.78 samples/sec Loss 1.1934 LearningRate 0.0001 Epoch: 29 Global Step: 51350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:53:46,438-Speed 13864.19 samples/sec Loss 1.2017 LearningRate 0.0001 Epoch: 29 Global Step: 51360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:54:04,229-Speed 13814.86 samples/sec Loss 1.1967 LearningRate 0.0001 Epoch: 29 Global Step: 51370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:54:21,989-Speed 13838.91 samples/sec Loss 1.1840 LearningRate 0.0001 Epoch: 29 Global Step: 51380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-03-04 09:54:39,789-Speed 13807.17 samples/sec Loss 1.1905 LearningRate 0.0001 Epoch: 29 Global Step: 51390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:54:57,629-Speed 13777.15 samples/sec Loss 1.1952 LearningRate 0.0001 Epoch: 29 Global Step: 51400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:55:15,324-Speed 13889.83 samples/sec Loss 1.1949 LearningRate 0.0001 Epoch: 29 Global Step: 51410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:55:33,024-Speed 13885.68 samples/sec Loss 1.1939 LearningRate 0.0001 Epoch: 29 Global Step: 51420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:55:50,859-Speed 13779.91 samples/sec Loss 1.1829 LearningRate 0.0001 Epoch: 29 Global Step: 51430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:56:08,897-Speed 13625.88 samples/sec Loss 1.1921 LearningRate 0.0001 Epoch: 29 Global Step: 51440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:56:26,968-Speed 13600.36 samples/sec Loss 1.1928 LearningRate 0.0001 Epoch: 29 Global Step: 51450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:56:44,994-Speed 13635.04 samples/sec Loss 1.1958 LearningRate 0.0001 Epoch: 29 Global Step: 51460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:57:03,042-Speed 13617.54 samples/sec Loss 1.1933 LearningRate 0.0001 Epoch: 29 Global Step: 51470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-03-04 09:57:21,040-Speed 13655.91 samples/sec Loss 1.1880 LearningRate 0.0001 Epoch: 29 Global Step: 51480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:57:39,089-Speed 13617.29 samples/sec Loss 1.1975 LearningRate 0.0001 Epoch: 29 Global Step: 51490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 09:57:57,057-Speed 13678.26 samples/sec Loss 1.1849 LearningRate 0.0001 Epoch: 29 Global Step: 51500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:58:15,073-Speed 13642.15 
samples/sec Loss 1.1985 LearningRate 0.0001 Epoch: 29 Global Step: 51510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:58:33,071-Speed 13655.75 samples/sec Loss 1.1899 LearningRate 0.0001 Epoch: 29 Global Step: 51520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:58:50,821-Speed 13846.42 samples/sec Loss 1.1902 LearningRate 0.0001 Epoch: 29 Global Step: 51530 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:59:08,619-Speed 13809.06 samples/sec Loss 1.1910 LearningRate 0.0001 Epoch: 29 Global Step: 51540 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:59:26,394-Speed 13826.92 samples/sec Loss 1.1885 LearningRate 0.0001 Epoch: 29 Global Step: 51550 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 09:59:44,099-Speed 13881.85 samples/sec Loss 1.1923 LearningRate 0.0001 Epoch: 29 Global Step: 51560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:00:01,780-Speed 13900.82 samples/sec Loss 1.1928 LearningRate 0.0001 Epoch: 29 Global Step: 51570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:00:19,567-Speed 13817.47 samples/sec Loss 1.1917 LearningRate 0.0001 Epoch: 29 Global Step: 51580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:00:37,292-Speed 13865.81 samples/sec Loss 1.2020 LearningRate 0.0001 Epoch: 29 Global Step: 51590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:00:55,061-Speed 13831.92 samples/sec Loss 1.1979 LearningRate 0.0001 Epoch: 29 Global Step: 51600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:01:12,764-Speed 13883.66 samples/sec Loss 1.1953 LearningRate 0.0001 Epoch: 29 Global Step: 51610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:01:30,567-Speed 13804.95 samples/sec Loss 1.1963 LearningRate 0.0001 Epoch: 29 Global Step: 51620 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:01:48,274-Speed 13880.62 samples/sec Loss 1.1964 LearningRate 0.0001 
Epoch: 29 Global Step: 51630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:02:06,049-Speed 13826.37 samples/sec Loss 1.2030 LearningRate 0.0001 Epoch: 29 Global Step: 51640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:02:23,819-Speed 13831.29 samples/sec Loss 1.1975 LearningRate 0.0001 Epoch: 29 Global Step: 51650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:02:41,570-Speed 13846.04 samples/sec Loss 1.1912 LearningRate 0.0001 Epoch: 29 Global Step: 51660 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:02:59,355-Speed 13819.63 samples/sec Loss 1.1924 LearningRate 0.0001 Epoch: 29 Global Step: 51670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:03:17,113-Speed 13840.13 samples/sec Loss 1.1923 LearningRate 0.0001 Epoch: 29 Global Step: 51680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:03:34,844-Speed 13861.12 samples/sec Loss 1.1924 LearningRate 0.0001 Epoch: 29 Global Step: 51690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:03:52,603-Speed 13839.86 samples/sec Loss 1.1972 LearningRate 0.0001 Epoch: 29 Global Step: 51700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:04:10,315-Speed 13876.21 samples/sec Loss 1.1949 LearningRate 0.0001 Epoch: 29 Global Step: 51710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:04:28,027-Speed 13876.30 samples/sec Loss 1.1924 LearningRate 0.0001 Epoch: 29 Global Step: 51720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:04:45,723-Speed 13888.64 samples/sec Loss 1.1815 LearningRate 0.0001 Epoch: 29 Global Step: 51730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:05:03,432-Speed 13878.46 samples/sec Loss 1.1726 LearningRate 0.0001 Epoch: 29 Global Step: 51740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:05:21,184-Speed 13844.81 samples/sec Loss 1.1896 LearningRate 0.0001 Epoch: 29 Global Step: 51750 Fp16 Grad 
Scale: 16384 Required: 9 hours Training: 2022-03-04 10:05:38,910-Speed 13865.28 samples/sec Loss 1.1955 LearningRate 0.0001 Epoch: 29 Global Step: 51760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:05:56,646-Speed 13857.70 samples/sec Loss 1.1911 LearningRate 0.0001 Epoch: 29 Global Step: 51770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:06:14,396-Speed 13846.91 samples/sec Loss 1.1935 LearningRate 0.0001 Epoch: 29 Global Step: 51780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:06:32,105-Speed 13878.37 samples/sec Loss 1.1889 LearningRate 0.0001 Epoch: 29 Global Step: 51790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:06:49,947-Speed 13774.57 samples/sec Loss 1.1960 LearningRate 0.0001 Epoch: 29 Global Step: 51800 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:07:07,682-Speed 13858.70 samples/sec Loss 1.1917 LearningRate 0.0001 Epoch: 29 Global Step: 51810 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:07:25,411-Speed 13863.78 samples/sec Loss 1.1958 LearningRate 0.0001 Epoch: 29 Global Step: 51820 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:07:43,150-Speed 13855.41 samples/sec Loss 1.1938 LearningRate 0.0001 Epoch: 29 Global Step: 51830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:08:00,875-Speed 13865.95 samples/sec Loss 1.1924 LearningRate 0.0001 Epoch: 29 Global Step: 51840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:09:08,129-Speed 3654.22 samples/sec Loss 1.1978 LearningRate 0.0001 Epoch: 30 Global Step: 51850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:09:25,836-Speed 13880.65 samples/sec Loss 1.1830 LearningRate 0.0001 Epoch: 30 Global Step: 51860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:09:43,480-Speed 13930.13 samples/sec Loss 1.1869 LearningRate 0.0001 Epoch: 30 Global Step: 51870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-03-04 10:10:01,212-Speed 13860.40 samples/sec Loss 1.1782 LearningRate 0.0001 Epoch: 30 Global Step: 51880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:10:18,941-Speed 13863.06 samples/sec Loss 1.1856 LearningRate 0.0001 Epoch: 30 Global Step: 51890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:10:36,567-Speed 13944.31 samples/sec Loss 1.1851 LearningRate 0.0001 Epoch: 30 Global Step: 51900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:10:54,365-Speed 13809.00 samples/sec Loss 1.1743 LearningRate 0.0001 Epoch: 30 Global Step: 51910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:11:12,216-Speed 13768.54 samples/sec Loss 1.1897 LearningRate 0.0001 Epoch: 30 Global Step: 51920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:11:29,908-Speed 13891.30 samples/sec Loss 1.1713 LearningRate 0.0001 Epoch: 30 Global Step: 51930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:11:47,546-Speed 13934.70 samples/sec Loss 1.1845 LearningRate 0.0001 Epoch: 30 Global Step: 51940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:12:05,310-Speed 13836.02 samples/sec Loss 1.1823 LearningRate 0.0001 Epoch: 30 Global Step: 51950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:12:23,016-Speed 13881.04 samples/sec Loss 1.1790 LearningRate 0.0001 Epoch: 30 Global Step: 51960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:12:40,734-Speed 13870.85 samples/sec Loss 1.1822 LearningRate 0.0001 Epoch: 30 Global Step: 51970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:12:58,384-Speed 13925.47 samples/sec Loss 1.1755 LearningRate 0.0001 Epoch: 30 Global Step: 51980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:13:16,052-Speed 13910.55 samples/sec Loss 1.1826 LearningRate 0.0001 Epoch: 30 Global Step: 51990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:13:33,813-Speed 13838.65 
samples/sec Loss 1.1784 LearningRate 0.0001 Epoch: 30 Global Step: 52000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:13:51,564-Speed 13846.83 samples/sec Loss 1.1772 LearningRate 0.0001 Epoch: 30 Global Step: 52010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:14:09,278-Speed 13874.79 samples/sec Loss 1.1775 LearningRate 0.0001 Epoch: 30 Global Step: 52020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:14:27,046-Speed 13832.41 samples/sec Loss 1.1749 LearningRate 0.0001 Epoch: 30 Global Step: 52030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:14:44,733-Speed 13895.84 samples/sec Loss 1.1772 LearningRate 0.0001 Epoch: 30 Global Step: 52040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:15:02,536-Speed 13805.90 samples/sec Loss 1.1660 LearningRate 0.0001 Epoch: 30 Global Step: 52050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:15:20,260-Speed 13866.17 samples/sec Loss 1.1751 LearningRate 0.0001 Epoch: 30 Global Step: 52060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:15:38,017-Speed 13841.52 samples/sec Loss 1.1817 LearningRate 0.0001 Epoch: 30 Global Step: 52070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:15:55,740-Speed 13867.47 samples/sec Loss 1.1838 LearningRate 0.0001 Epoch: 30 Global Step: 52080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:16:13,482-Speed 13853.20 samples/sec Loss 1.1826 LearningRate 0.0001 Epoch: 30 Global Step: 52090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:16:31,224-Speed 13852.56 samples/sec Loss 1.1793 LearningRate 0.0001 Epoch: 30 Global Step: 52100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:16:49,022-Speed 13808.86 samples/sec Loss 1.1810 LearningRate 0.0001 Epoch: 30 Global Step: 52110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:17:06,767-Speed 13851.08 samples/sec Loss 1.1782 LearningRate 0.0001 
Epoch: 30 Global Step: 52120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:17:24,497-Speed 13862.06 samples/sec Loss 1.1746 LearningRate 0.0001 Epoch: 30 Global Step: 52130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-03-04 10:17:42,281-Speed 13880.68 samples/sec Loss 1.1721 LearningRate 0.0001 Epoch: 30 Global Step: 52140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:18:00,100-Speed 13883.98 samples/sec Loss 1.1790 LearningRate 0.0001 Epoch: 30 Global Step: 52150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:18:17,956-Speed 13764.14 samples/sec Loss 1.1811 LearningRate 0.0001 Epoch: 30 Global Step: 52160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:18:35,669-Speed 13875.21 samples/sec Loss 1.1840 LearningRate 0.0001 Epoch: 30 Global Step: 52170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:18:53,374-Speed 13882.17 samples/sec Loss 1.1772 LearningRate 0.0001 Epoch: 30 Global Step: 52180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:19:11,066-Speed 13892.01 samples/sec Loss 1.1874 LearningRate 0.0001 Epoch: 30 Global Step: 52190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:19:28,782-Speed 13873.05 samples/sec Loss 1.1837 LearningRate 0.0001 Epoch: 30 Global Step: 52200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:19:46,481-Speed 13886.57 samples/sec Loss 1.1823 LearningRate 0.0001 Epoch: 30 Global Step: 52210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-03-04 10:20:04,207-Speed 13865.78 samples/sec Loss 1.1848 LearningRate 0.0001 Epoch: 30 Global Step: 52220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:20:22,010-Speed 13805.21 samples/sec Loss 1.1640 LearningRate 0.0001 Epoch: 30 Global Step: 52230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:20:39,698-Speed 13895.15 samples/sec Loss 1.1730 LearningRate 0.0001 Epoch: 30 Global Step: 52240 Fp16 Grad 
Scale: 32768 Required: 8 hours Training: 2022-03-04 10:20:57,391-Speed 13890.65 samples/sec Loss 1.1799 LearningRate 0.0001 Epoch: 30 Global Step: 52250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:21:15,094-Speed 13883.44 samples/sec Loss 1.1661 LearningRate 0.0001 Epoch: 30 Global Step: 52260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:21:32,801-Speed 13879.62 samples/sec Loss 1.1779 LearningRate 0.0001 Epoch: 30 Global Step: 52270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:21:50,577-Speed 13827.85 samples/sec Loss 1.1727 LearningRate 0.0001 Epoch: 30 Global Step: 52280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:22:08,292-Speed 13873.44 samples/sec Loss 1.1775 LearningRate 0.0001 Epoch: 30 Global Step: 52290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:22:26,100-Speed 13801.38 samples/sec Loss 1.1742 LearningRate 0.0001 Epoch: 30 Global Step: 52300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:22:43,867-Speed 13833.24 samples/sec Loss 1.1766 LearningRate 0.0001 Epoch: 30 Global Step: 52310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:23:01,516-Speed 13926.34 samples/sec Loss 1.1753 LearningRate 0.0001 Epoch: 30 Global Step: 52320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:23:19,327-Speed 13799.01 samples/sec Loss 1.1790 LearningRate 0.0001 Epoch: 30 Global Step: 52330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:23:37,176-Speed 13769.89 samples/sec Loss 1.1780 LearningRate 0.0001 Epoch: 30 Global Step: 52340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:23:55,043-Speed 13755.24 samples/sec Loss 1.1676 LearningRate 0.0001 Epoch: 30 Global Step: 52350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:24:12,852-Speed 13801.47 samples/sec Loss 1.1918 LearningRate 0.0001 Epoch: 30 Global Step: 52360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-03-04 10:24:30,644-Speed 13813.59 samples/sec Loss 1.1777 LearningRate 0.0001 Epoch: 30 Global Step: 52370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:24:48,533-Speed 13738.81 samples/sec Loss 1.1719 LearningRate 0.0001 Epoch: 30 Global Step: 52380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:25:06,366-Speed 13782.59 samples/sec Loss 1.1677 LearningRate 0.0001 Epoch: 30 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:25:24,119-Speed 13844.32 samples/sec Loss 1.1695 LearningRate 0.0001 Epoch: 30 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:25:42,052-Speed 13705.62 samples/sec Loss 1.1788 LearningRate 0.0001 Epoch: 30 Global Step: 52410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:25:59,953-Speed 13729.17 samples/sec Loss 1.1680 LearningRate 0.0001 Epoch: 30 Global Step: 52420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:26:17,714-Speed 13838.22 samples/sec Loss 1.1763 LearningRate 0.0001 Epoch: 30 Global Step: 52430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:26:35,561-Speed 13771.31 samples/sec Loss 1.1750 LearningRate 0.0001 Epoch: 30 Global Step: 52440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:26:53,381-Speed 13792.00 samples/sec Loss 1.1729 LearningRate 0.0001 Epoch: 30 Global Step: 52450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:27:11,206-Speed 13788.34 samples/sec Loss 1.1701 LearningRate 0.0001 Epoch: 30 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:27:29,201-Speed 13657.73 samples/sec Loss 1.1633 LearningRate 0.0001 Epoch: 30 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:27:47,314-Speed 13569.20 samples/sec Loss 1.1802 LearningRate 0.0001 Epoch: 30 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:28:05,353-Speed 13624.97 
samples/sec Loss 1.1681 LearningRate 0.0001 Epoch: 30 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:28:23,478-Speed 13559.92 samples/sec Loss 1.1683 LearningRate 0.0001 Epoch: 30 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:28:41,613-Speed 13552.64 samples/sec Loss 1.1661 LearningRate 0.0001 Epoch: 30 Global Step: 52510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:28:59,680-Speed 13603.94 samples/sec Loss 1.1636 LearningRate 0.0001 Epoch: 30 Global Step: 52520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:29:17,795-Speed 13567.31 samples/sec Loss 1.1628 LearningRate 0.0001 Epoch: 30 Global Step: 52530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:29:35,936-Speed 13548.46 samples/sec Loss 1.1683 LearningRate 0.0001 Epoch: 30 Global Step: 52540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:29:54,057-Speed 13562.46 samples/sec Loss 1.1648 LearningRate 0.0001 Epoch: 30 Global Step: 52550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:30:12,201-Speed 13546.16 samples/sec Loss 1.1637 LearningRate 0.0001 Epoch: 30 Global Step: 52560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:30:30,319-Speed 13565.00 samples/sec Loss 1.1658 LearningRate 0.0001 Epoch: 30 Global Step: 52570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:30:48,439-Speed 13564.46 samples/sec Loss 1.1700 LearningRate 0.0001 Epoch: 30 Global Step: 52580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:31:06,535-Speed 13581.09 samples/sec Loss 1.1697 LearningRate 0.0001 Epoch: 30 Global Step: 52590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:31:24,623-Speed 13588.16 samples/sec Loss 1.1768 LearningRate 0.0001 Epoch: 30 Global Step: 52600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:31:42,685-Speed 13607.62 samples/sec Loss 1.1670 LearningRate 0.0001 
Epoch: 30 Global Step: 52610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:32:00,739-Speed 13614.49 samples/sec Loss 1.1589 LearningRate 0.0001 Epoch: 30 Global Step: 52620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:32:18,887-Speed 13542.87 samples/sec Loss 1.1808 LearningRate 0.0001 Epoch: 30 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:32:36,937-Speed 13616.01 samples/sec Loss 1.1696 LearningRate 0.0001 Epoch: 30 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:32:54,721-Speed 13820.24 samples/sec Loss 1.1684 LearningRate 0.0001 Epoch: 30 Global Step: 52650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:33:12,637-Speed 13718.05 samples/sec Loss 1.1674 LearningRate 0.0001 Epoch: 30 Global Step: 52660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:33:30,454-Speed 13794.79 samples/sec Loss 1.1616 LearningRate 0.0001 Epoch: 30 Global Step: 52670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:33:48,180-Speed 13865.28 samples/sec Loss 1.1580 LearningRate 0.0001 Epoch: 30 Global Step: 52680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:34:05,918-Speed 13855.43 samples/sec Loss 1.1629 LearningRate 0.0001 Epoch: 30 Global Step: 52690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:34:23,936-Speed 13641.10 samples/sec Loss 1.1650 LearningRate 0.0001 Epoch: 30 Global Step: 52700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:34:41,817-Speed 13745.22 samples/sec Loss 1.1571 LearningRate 0.0001 Epoch: 30 Global Step: 52710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:34:59,673-Speed 13763.78 samples/sec Loss 1.1700 LearningRate 0.0001 Epoch: 30 Global Step: 52720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:35:17,476-Speed 13806.32 samples/sec Loss 1.1658 LearningRate 0.0001 Epoch: 30 Global Step: 52730 Fp16 Grad 
Scale: 16384 Required: 8 hours Training: 2022-03-04 10:35:35,214-Speed 13856.15 samples/sec Loss 1.1630 LearningRate 0.0001 Epoch: 30 Global Step: 52740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:35:52,987-Speed 13828.29 samples/sec Loss 1.1534 LearningRate 0.0001 Epoch: 30 Global Step: 52750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:36:10,702-Speed 13874.13 samples/sec Loss 1.1603 LearningRate 0.0001 Epoch: 30 Global Step: 52760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:36:28,434-Speed 13860.44 samples/sec Loss 1.1539 LearningRate 0.0001 Epoch: 30 Global Step: 52770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:36:46,186-Speed 13845.51 samples/sec Loss 1.1637 LearningRate 0.0001 Epoch: 30 Global Step: 52780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:37:04,014-Speed 13785.53 samples/sec Loss 1.1608 LearningRate 0.0001 Epoch: 30 Global Step: 52790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:37:21,826-Speed 13798.56 samples/sec Loss 1.1670 LearningRate 0.0001 Epoch: 30 Global Step: 52800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:37:39,602-Speed 13825.87 samples/sec Loss 1.1571 LearningRate 0.0001 Epoch: 30 Global Step: 52810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:37:57,309-Speed 13880.39 samples/sec Loss 1.1531 LearningRate 0.0001 Epoch: 30 Global Step: 52820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:38:15,055-Speed 13849.71 samples/sec Loss 1.1610 LearningRate 0.0001 Epoch: 30 Global Step: 52830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:38:32,771-Speed 13873.00 samples/sec Loss 1.1670 LearningRate 0.0001 Epoch: 30 Global Step: 52840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:38:50,575-Speed 13804.50 samples/sec Loss 1.1638 LearningRate 0.0001 Epoch: 30 Global Step: 52850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-03-04 10:39:08,419-Speed 13774.01 samples/sec Loss 1.1600 LearningRate 0.0001 Epoch: 30 Global Step: 52860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:39:26,280-Speed 13760.09 samples/sec Loss 1.1608 LearningRate 0.0001 Epoch: 30 Global Step: 52870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:39:44,024-Speed 13851.58 samples/sec Loss 1.1605 LearningRate 0.0001 Epoch: 30 Global Step: 52880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:40:01,825-Speed 13806.84 samples/sec Loss 1.1549 LearningRate 0.0001 Epoch: 30 Global Step: 52890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:40:19,643-Speed 13794.20 samples/sec Loss 1.1496 LearningRate 0.0001 Epoch: 30 Global Step: 52900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:40:37,355-Speed 13875.79 samples/sec Loss 1.1656 LearningRate 0.0001 Epoch: 30 Global Step: 52910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:40:55,061-Speed 13880.88 samples/sec Loss 1.1604 LearningRate 0.0001 Epoch: 30 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:41:12,870-Speed 13800.68 samples/sec Loss 1.1582 LearningRate 0.0001 Epoch: 30 Global Step: 52930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:41:30,577-Speed 13880.45 samples/sec Loss 1.1629 LearningRate 0.0001 Epoch: 30 Global Step: 52940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 10:41:48,294-Speed 13874.11 samples/sec Loss 1.1566 LearningRate 0.0001 Epoch: 30 Global Step: 52950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:42:06,093-Speed 13808.52 samples/sec Loss 1.1506 LearningRate 0.0001 Epoch: 30 Global Step: 52960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:42:23,847-Speed 13843.24 samples/sec Loss 1.1567 LearningRate 0.0001 Epoch: 30 Global Step: 52970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:42:41,643-Speed 13810.91 
samples/sec Loss 1.1670 LearningRate 0.0001 Epoch: 30 Global Step: 52980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:42:59,323-Speed 13900.97 samples/sec Loss 1.1554 LearningRate 0.0001 Epoch: 30 Global Step: 52990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:43:17,093-Speed 13831.15 samples/sec Loss 1.1597 LearningRate 0.0001 Epoch: 30 Global Step: 53000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:43:34,942-Speed 13769.63 samples/sec Loss 1.1554 LearningRate 0.0001 Epoch: 30 Global Step: 53010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:43:52,700-Speed 13840.38 samples/sec Loss 1.1579 LearningRate 0.0001 Epoch: 30 Global Step: 53020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:44:10,455-Speed 13842.80 samples/sec Loss 1.1618 LearningRate 0.0001 Epoch: 30 Global Step: 53030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:44:28,202-Speed 13848.99 samples/sec Loss 1.1552 LearningRate 0.0001 Epoch: 30 Global Step: 53040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:44:45,925-Speed 13867.41 samples/sec Loss 1.1601 LearningRate 0.0001 Epoch: 30 Global Step: 53050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:45:03,721-Speed 13810.61 samples/sec Loss 1.1552 LearningRate 0.0001 Epoch: 30 Global Step: 53060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:45:21,421-Speed 13886.12 samples/sec Loss 1.1454 LearningRate 0.0001 Epoch: 30 Global Step: 53070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:45:39,232-Speed 13799.10 samples/sec Loss 1.1528 LearningRate 0.0001 Epoch: 30 Global Step: 53080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:45:57,091-Speed 13762.34 samples/sec Loss 1.1630 LearningRate 0.0001 Epoch: 30 Global Step: 53090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:46:14,874-Speed 13820.49 samples/sec Loss 1.1507 LearningRate 0.0001 
Epoch: 30 Global Step: 53100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:46:32,640-Speed 13834.05 samples/sec Loss 1.1552 LearningRate 0.0001 Epoch: 30 Global Step: 53110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:46:50,342-Speed 13884.06 samples/sec Loss 1.1535 LearningRate 0.0001 Epoch: 30 Global Step: 53120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:47:08,118-Speed 13827.38 samples/sec Loss 1.1431 LearningRate 0.0001 Epoch: 30 Global Step: 53130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:47:25,984-Speed 13756.60 samples/sec Loss 1.1586 LearningRate 0.0001 Epoch: 30 Global Step: 53140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:47:43,748-Speed 13835.63 samples/sec Loss 1.1460 LearningRate 0.0001 Epoch: 30 Global Step: 53150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:48:01,459-Speed 13878.25 samples/sec Loss 1.1526 LearningRate 0.0001 Epoch: 30 Global Step: 53160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:48:19,186-Speed 13864.39 samples/sec Loss 1.1586 LearningRate 0.0001 Epoch: 30 Global Step: 53170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:48:36,981-Speed 13811.50 samples/sec Loss 1.1610 LearningRate 0.0001 Epoch: 30 Global Step: 53180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:48:54,672-Speed 13892.60 samples/sec Loss 1.1468 LearningRate 0.0001 Epoch: 30 Global Step: 53190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:49:12,364-Speed 13891.91 samples/sec Loss 1.1524 LearningRate 0.0001 Epoch: 30 Global Step: 53200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:49:30,100-Speed 13857.56 samples/sec Loss 1.1469 LearningRate 0.0001 Epoch: 30 Global Step: 53210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:49:47,831-Speed 13860.84 samples/sec Loss 1.1538 LearningRate 0.0001 Epoch: 30 Global Step: 53220 Fp16 Grad 
Scale: 16384 Required: 8 hours Training: 2022-03-04 10:50:05,577-Speed 13849.67 samples/sec Loss 1.1557 LearningRate 0.0001 Epoch: 30 Global Step: 53230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:50:23,329-Speed 13845.07 samples/sec Loss 1.1462 LearningRate 0.0001 Epoch: 30 Global Step: 53240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:50:41,029-Speed 13885.62 samples/sec Loss 1.1445 LearningRate 0.0001 Epoch: 30 Global Step: 53250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:50:58,755-Speed 13865.21 samples/sec Loss 1.1593 LearningRate 0.0001 Epoch: 30 Global Step: 53260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:51:16,475-Speed 13870.00 samples/sec Loss 1.1537 LearningRate 0.0001 Epoch: 30 Global Step: 53270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:51:34,215-Speed 13854.73 samples/sec Loss 1.1491 LearningRate 0.0001 Epoch: 30 Global Step: 53280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:51:51,972-Speed 13841.48 samples/sec Loss 1.1529 LearningRate 0.0001 Epoch: 30 Global Step: 53290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:52:09,770-Speed 13808.73 samples/sec Loss 1.1552 LearningRate 0.0001 Epoch: 30 Global Step: 53300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:52:27,471-Speed 13884.99 samples/sec Loss 1.1490 LearningRate 0.0001 Epoch: 30 Global Step: 53310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:52:45,152-Speed 13900.05 samples/sec Loss 1.1502 LearningRate 0.0001 Epoch: 30 Global Step: 53320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:53:02,887-Speed 13858.57 samples/sec Loss 1.1544 LearningRate 0.0001 Epoch: 30 Global Step: 53330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:53:20,614-Speed 13864.78 samples/sec Loss 1.1477 LearningRate 0.0001 Epoch: 30 Global Step: 53340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-03-04 10:53:38,354-Speed 13854.08 samples/sec Loss 1.1599 LearningRate 0.0001 Epoch: 30 Global Step: 53350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:53:56,099-Speed 13849.94 samples/sec Loss 1.1562 LearningRate 0.0001 Epoch: 30 Global Step: 53360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:54:13,846-Speed 13849.77 samples/sec Loss 1.1539 LearningRate 0.0001 Epoch: 30 Global Step: 53370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:54:31,546-Speed 13885.32 samples/sec Loss 1.1455 LearningRate 0.0001 Epoch: 30 Global Step: 53380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:54:49,287-Speed 13854.24 samples/sec Loss 1.1471 LearningRate 0.0001 Epoch: 30 Global Step: 53390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:55:07,027-Speed 13854.95 samples/sec Loss 1.1417 LearningRate 0.0001 Epoch: 30 Global Step: 53400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:55:24,773-Speed 13850.86 samples/sec Loss 1.1462 LearningRate 0.0001 Epoch: 30 Global Step: 53410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:55:42,560-Speed 13819.15 samples/sec Loss 1.1481 LearningRate 0.0001 Epoch: 30 Global Step: 53420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:56:00,286-Speed 13866.06 samples/sec Loss 1.1365 LearningRate 0.0001 Epoch: 30 Global Step: 53430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:56:18,006-Speed 13869.26 samples/sec Loss 1.1498 LearningRate 0.0001 Epoch: 30 Global Step: 53440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:56:35,830-Speed 13789.57 samples/sec Loss 1.1501 LearningRate 0.0001 Epoch: 30 Global Step: 53450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:56:53,519-Speed 13894.53 samples/sec Loss 1.1481 LearningRate 0.0001 Epoch: 30 Global Step: 53460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:57:11,231-Speed 13876.43 
samples/sec Loss 1.1536 LearningRate 0.0001 Epoch: 30 Global Step: 53470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 10:57:28,947-Speed 13872.36 samples/sec Loss 1.1519 LearningRate 0.0001 Epoch: 30 Global Step: 53480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:57:46,658-Speed 13877.65 samples/sec Loss 1.1535 LearningRate 0.0001 Epoch: 30 Global Step: 53490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:58:04,359-Speed 13884.90 samples/sec Loss 1.1561 LearningRate 0.0001 Epoch: 30 Global Step: 53500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:58:22,061-Speed 13884.14 samples/sec Loss 1.1413 LearningRate 0.0001 Epoch: 30 Global Step: 53510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:58:39,793-Speed 13860.38 samples/sec Loss 1.1556 LearningRate 0.0001 Epoch: 30 Global Step: 53520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:58:57,542-Speed 13847.06 samples/sec Loss 1.1490 LearningRate 0.0001 Epoch: 30 Global Step: 53530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:59:15,267-Speed 13866.65 samples/sec Loss 1.1486 LearningRate 0.0001 Epoch: 30 Global Step: 53540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:59:33,043-Speed 13826.31 samples/sec Loss 1.1454 LearningRate 0.0001 Epoch: 30 Global Step: 53550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 10:59:50,750-Speed 13879.82 samples/sec Loss 1.1470 LearningRate 0.0001 Epoch: 30 Global Step: 53560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:00:08,477-Speed 13864.44 samples/sec Loss 1.1474 LearningRate 0.0001 Epoch: 30 Global Step: 53570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:01:16,099-Speed 3634.36 samples/sec Loss 1.1487 LearningRate 0.0001 Epoch: 31 Global Step: 53580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 11:01:33,671-Speed 13989.27 samples/sec Loss 1.1449 LearningRate 0.0001 
Epoch: 31 Global Step: 53590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:01:51,325-Speed 13922.03 samples/sec Loss 1.1488 LearningRate 0.0001 Epoch: 31 Global Step: 53600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:02:09,028-Speed 13883.22 samples/sec Loss 1.1331 LearningRate 0.0001 Epoch: 31 Global Step: 53610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:02:26,702-Speed 13906.75 samples/sec Loss 1.1434 LearningRate 0.0001 Epoch: 31 Global Step: 53620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:02:44,425-Speed 13866.99 samples/sec Loss 1.1379 LearningRate 0.0001 Epoch: 31 Global Step: 53630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:03:02,021-Speed 13967.74 samples/sec Loss 1.1384 LearningRate 0.0001 Epoch: 31 Global Step: 53640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:03:19,729-Speed 13879.40 samples/sec Loss 1.1456 LearningRate 0.0001 Epoch: 31 Global Step: 53650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:03:37,391-Speed 13915.74 samples/sec Loss 1.1427 LearningRate 0.0001 Epoch: 31 Global Step: 53660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:03:55,035-Speed 13929.69 samples/sec Loss 1.1372 LearningRate 0.0001 Epoch: 31 Global Step: 53670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:04:12,654-Speed 13949.86 samples/sec Loss 1.1361 LearningRate 0.0001 Epoch: 31 Global Step: 53680 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:04:30,454-Speed 13807.50 samples/sec Loss 1.1332 LearningRate 0.0001 Epoch: 31 Global Step: 53690 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:04:48,161-Speed 13880.15 samples/sec Loss 1.1398 LearningRate 0.0001 Epoch: 31 Global Step: 53700 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:05:05,990-Speed 13786.59 samples/sec Loss 1.1411 LearningRate 0.0001 Epoch: 31 Global Step: 53710 Fp16 Grad Scale: 
8192 Required: 8 hours Training: 2022-03-04 11:05:23,814-Speed 13789.39 samples/sec Loss 1.1485 LearningRate 0.0001 Epoch: 31 Global Step: 53720 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:05:41,601-Speed 13817.46 samples/sec Loss 1.1319 LearningRate 0.0001 Epoch: 31 Global Step: 53730 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:05:59,269-Speed 13910.68 samples/sec Loss 1.1323 LearningRate 0.0001 Epoch: 31 Global Step: 53740 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:06:17,066-Speed 13810.13 samples/sec Loss 1.1391 LearningRate 0.0001 Epoch: 31 Global Step: 53750 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:06:34,801-Speed 13859.09 samples/sec Loss 1.1418 LearningRate 0.0001 Epoch: 31 Global Step: 53760 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:06:52,539-Speed 13856.25 samples/sec Loss 1.1393 LearningRate 0.0001 Epoch: 31 Global Step: 53770 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:07:10,343-Speed 13804.46 samples/sec Loss 1.1405 LearningRate 0.0001 Epoch: 31 Global Step: 53780 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-03-04 11:07:28,233-Speed 13738.50 samples/sec Loss 1.1339 LearningRate 0.0001 Epoch: 31 Global Step: 53790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:07:46,327-Speed 13583.28 samples/sec Loss 1.1441 LearningRate 0.0001 Epoch: 31 Global Step: 53800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:08:04,350-Speed 13636.92 samples/sec Loss 1.1358 LearningRate 0.0001 Epoch: 31 Global Step: 53810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:08:22,451-Speed 13577.92 samples/sec Loss 1.1304 LearningRate 0.0001 Epoch: 31 Global Step: 53820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:08:40,497-Speed 13619.24 samples/sec Loss 1.1382 LearningRate 0.0001 Epoch: 31 Global Step: 53830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 
11:08:58,189-Speed 13892.66 samples/sec Loss 1.1406 LearningRate 0.0001 Epoch: 31 Global Step: 53840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:09:15,888-Speed 13885.88 samples/sec Loss 1.1397 LearningRate 0.0001 Epoch: 31 Global Step: 53850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:09:33,617-Speed 13866.47 samples/sec Loss 1.1426 LearningRate 0.0001 Epoch: 31 Global Step: 53860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:09:51,294-Speed 13903.59 samples/sec Loss 1.1321 LearningRate 0.0001 Epoch: 31 Global Step: 53870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:10:09,016-Speed 13868.35 samples/sec Loss 1.1438 LearningRate 0.0001 Epoch: 31 Global Step: 53880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-03-04 11:10:26,752-Speed 13857.98 samples/sec Loss 1.1381 LearningRate 0.0001 Epoch: 31 Global Step: 53890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:10:44,495-Speed 13851.64 samples/sec Loss 1.1363 LearningRate 0.0001 Epoch: 31 Global Step: 53900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:11:02,313-Speed 13794.08 samples/sec Loss 1.1376 LearningRate 0.0001 Epoch: 31 Global Step: 53910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:11:20,140-Speed 13786.38 samples/sec Loss 1.1400 LearningRate 0.0001 Epoch: 31 Global Step: 53920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:11:37,917-Speed 13825.93 samples/sec Loss 1.1383 LearningRate 0.0001 Epoch: 31 Global Step: 53930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:11:55,789-Speed 13751.39 samples/sec Loss 1.1371 LearningRate 0.0001 Epoch: 31 Global Step: 53940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:12:13,606-Speed 13794.96 samples/sec Loss 1.1369 LearningRate 0.0001 Epoch: 31 Global Step: 53950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:12:31,442-Speed 13779.85 samples/sec 
Loss 1.1371 LearningRate 0.0001 Epoch: 31 Global Step: 53960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:12:49,336-Speed 13735.77 samples/sec Loss 1.1358 LearningRate 0.0001 Epoch: 31 Global Step: 53970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:13:07,241-Speed 13726.53 samples/sec Loss 1.1350 LearningRate 0.0001 Epoch: 31 Global Step: 53980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:13:24,985-Speed 13851.92 samples/sec Loss 1.1304 LearningRate 0.0001 Epoch: 31 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 11:13:42,804-Speed 13792.84 samples/sec Loss 1.1255 LearningRate 0.0001 Epoch: 31 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-03-04 11:14:00,427-Speed 13946.18 samples/sec Loss 1.1413 LearningRate 0.0001 Epoch: 31 Global Step: 54010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:14:18,215-Speed 13816.71 samples/sec Loss 1.1387 LearningRate 0.0001 Epoch: 31 Global Step: 54020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:14:35,967-Speed 13845.11 samples/sec Loss 1.1381 LearningRate 0.0001 Epoch: 31 Global Step: 54030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:14:53,725-Speed 13839.87 samples/sec Loss 1.1214 LearningRate 0.0001 Epoch: 31 Global Step: 54040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:15:11,443-Speed 13872.09 samples/sec Loss 1.1355 LearningRate 0.0001 Epoch: 31 Global Step: 54050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:15:29,267-Speed 13788.66 samples/sec Loss 1.1363 LearningRate 0.0001 Epoch: 31 Global Step: 54060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:15:46,942-Speed 13905.48 samples/sec Loss 1.1384 LearningRate 0.0001 Epoch: 31 Global Step: 54070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:16:04,756-Speed 13796.44 samples/sec Loss 1.1394 LearningRate 0.0001 Epoch: 31 
Global Step: 54080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:16:22,442-Speed 13896.99 samples/sec Loss 1.1338 LearningRate 0.0001 Epoch: 31 Global Step: 54090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:16:40,202-Speed 13838.53 samples/sec Loss 1.1315 LearningRate 0.0001 Epoch: 31 Global Step: 54100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:16:57,874-Speed 13907.52 samples/sec Loss 1.1331 LearningRate 0.0001 Epoch: 31 Global Step: 54110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:17:15,677-Speed 13805.89 samples/sec Loss 1.1410 LearningRate 0.0001 Epoch: 31 Global Step: 54120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:17:33,418-Speed 13853.36 samples/sec Loss 1.1386 LearningRate 0.0001 Epoch: 31 Global Step: 54130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:17:51,234-Speed 13795.36 samples/sec Loss 1.1263 LearningRate 0.0001 Epoch: 31 Global Step: 54140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:18:09,057-Speed 13789.15 samples/sec Loss 1.1273 LearningRate 0.0001 Epoch: 31 Global Step: 54150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:18:26,776-Speed 13871.43 samples/sec Loss 1.1347 LearningRate 0.0001 Epoch: 31 Global Step: 54160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:18:44,500-Speed 13867.03 samples/sec Loss 1.1283 LearningRate 0.0001 Epoch: 31 Global Step: 54170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:19:02,194-Speed 13889.52 samples/sec Loss 1.1286 LearningRate 0.0001 Epoch: 31 Global Step: 54180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:19:19,943-Speed 13847.30 samples/sec Loss 1.1281 LearningRate 0.0001 Epoch: 31 Global Step: 54190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-03-04 11:19:37,808-Speed 13758.02 samples/sec Loss 1.1240 LearningRate 0.0001 Epoch: 31 Global Step: 54200 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-04 11:19:55,645-Speed 13780.80 samples/sec Loss 1.1289 LearningRate 0.0001 Epoch: 31 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-04 11:20:13,443-Speed 13810.95 samples/sec Loss 1.1301 LearningRate 0.0001 Epoch: 31 Global Step: 54220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:20:31,254-Speed 13798.98 samples/sec Loss 1.1291 LearningRate 0.0001 Epoch: 31 Global Step: 54230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:20:49,069-Speed 13796.11 samples/sec Loss 1.1322 LearningRate 0.0001 Epoch: 31 Global Step: 54240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:21:06,890-Speed 13792.04 samples/sec Loss 1.1365 LearningRate 0.0001 Epoch: 31 Global Step: 54250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:21:24,695-Speed 13803.54 samples/sec Loss 1.1328 LearningRate 0.0001 Epoch: 31 Global Step: 54260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:21:42,512-Speed 13794.03 samples/sec Loss 1.1335 LearningRate 0.0001 Epoch: 31 Global Step: 54270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:22:00,307-Speed 13811.34 samples/sec Loss 1.1269 LearningRate 0.0001 Epoch: 31 Global Step: 54280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:22:18,089-Speed 13822.03 samples/sec Loss 1.1241 LearningRate 0.0001 Epoch: 31 Global Step: 54290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:22:35,889-Speed 13807.75 samples/sec Loss 1.1298 LearningRate 0.0001 Epoch: 31 Global Step: 54300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:22:53,678-Speed 13817.94 samples/sec Loss 1.1318 LearningRate 0.0001 Epoch: 31 Global Step: 54310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:23:11,542-Speed 13757.79 samples/sec Loss 1.1237 LearningRate 0.0001 Epoch: 31 Global Step: 54320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 
11:23:29,342-Speed 13807.69 samples/sec Loss 1.1320 LearningRate 0.0001 Epoch: 31 Global Step: 54330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:23:47,117-Speed 13827.01 samples/sec Loss 1.1239 LearningRate 0.0001 Epoch: 31 Global Step: 54340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:24:04,847-Speed 13862.21 samples/sec Loss 1.1238 LearningRate 0.0001 Epoch: 31 Global Step: 54350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:24:22,629-Speed 13821.93 samples/sec Loss 1.1302 LearningRate 0.0001 Epoch: 31 Global Step: 54360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:24:40,466-Speed 13779.40 samples/sec Loss 1.1351 LearningRate 0.0001 Epoch: 31 Global Step: 54370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:24:58,235-Speed 13831.55 samples/sec Loss 1.1249 LearningRate 0.0001 Epoch: 31 Global Step: 54380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:25:16,037-Speed 13806.30 samples/sec Loss 1.1265 LearningRate 0.0001 Epoch: 31 Global Step: 54390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:25:33,749-Speed 13875.96 samples/sec Loss 1.1231 LearningRate 0.0001 Epoch: 31 Global Step: 54400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:25:51,518-Speed 13831.84 samples/sec Loss 1.1223 LearningRate 0.0001 Epoch: 31 Global Step: 54410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:26:09,250-Speed 13860.59 samples/sec Loss 1.1247 LearningRate 0.0001 Epoch: 31 Global Step: 54420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:26:27,095-Speed 13772.79 samples/sec Loss 1.1156 LearningRate 0.0001 Epoch: 31 Global Step: 54430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:26:44,866-Speed 13830.43 samples/sec Loss 1.1112 LearningRate 0.0001 Epoch: 31 Global Step: 54440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:27:02,631-Speed 13835.14 samples/sec 
Loss 1.1225 LearningRate 0.0001 Epoch: 31 Global Step: 54450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:27:20,382-Speed 13845.71 samples/sec Loss 1.1256 LearningRate 0.0001 Epoch: 31 Global Step: 54460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:27:38,154-Speed 13829.53 samples/sec Loss 1.1213 LearningRate 0.0001 Epoch: 31 Global Step: 54470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:27:55,865-Speed 13876.74 samples/sec Loss 1.1128 LearningRate 0.0001 Epoch: 31 Global Step: 54480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:28:13,596-Speed 13861.38 samples/sec Loss 1.1173 LearningRate 0.0001 Epoch: 31 Global Step: 54490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:28:31,360-Speed 13835.46 samples/sec Loss 1.1191 LearningRate 0.0001 Epoch: 31 Global Step: 54500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:28:49,135-Speed 13827.20 samples/sec Loss 1.1259 LearningRate 0.0001 Epoch: 31 Global Step: 54510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:29:06,885-Speed 13847.07 samples/sec Loss 1.1152 LearningRate 0.0001 Epoch: 31 Global Step: 54520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:29:24,692-Speed 13802.36 samples/sec Loss 1.1166 LearningRate 0.0001 Epoch: 31 Global Step: 54530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:29:42,406-Speed 13875.07 samples/sec Loss 1.1258 LearningRate 0.0001 Epoch: 31 Global Step: 54540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:30:00,065-Speed 13917.74 samples/sec Loss 1.1177 LearningRate 0.0001 Epoch: 31 Global Step: 54550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:30:17,888-Speed 13789.41 samples/sec Loss 1.1151 LearningRate 0.0001 Epoch: 31 Global Step: 54560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:30:35,610-Speed 13868.17 samples/sec Loss 1.1233 LearningRate 0.0001 Epoch: 31 
Global Step: 54570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:30:53,354-Speed 13851.56 samples/sec Loss 1.1120 LearningRate 0.0001 Epoch: 31 Global Step: 54580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:31:11,079-Speed 13866.44 samples/sec Loss 1.1283 LearningRate 0.0001 Epoch: 31 Global Step: 54590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:31:28,880-Speed 13806.84 samples/sec Loss 1.1226 LearningRate 0.0001 Epoch: 31 Global Step: 54600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:31:46,682-Speed 13805.56 samples/sec Loss 1.1256 LearningRate 0.0001 Epoch: 31 Global Step: 54610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:32:04,445-Speed 13836.23 samples/sec Loss 1.1265 LearningRate 0.0001 Epoch: 31 Global Step: 54620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:32:22,386-Speed 13699.88 samples/sec Loss 1.1176 LearningRate 0.0001 Epoch: 31 Global Step: 54630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:32:40,220-Speed 13781.17 samples/sec Loss 1.1282 LearningRate 0.0001 Epoch: 31 Global Step: 54640 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:32:57,905-Speed 13896.90 samples/sec Loss 1.1268 LearningRate 0.0001 Epoch: 31 Global Step: 54650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:33:15,710-Speed 13804.18 samples/sec Loss 1.1157 LearningRate 0.0001 Epoch: 31 Global Step: 54660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:33:33,616-Speed 13725.72 samples/sec Loss 1.1269 LearningRate 0.0001 Epoch: 31 Global Step: 54670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:33:51,350-Speed 13859.55 samples/sec Loss 1.1249 LearningRate 0.0001 Epoch: 31 Global Step: 54680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:34:09,051-Speed 13884.32 samples/sec Loss 1.1182 LearningRate 0.0001 Epoch: 31 Global Step: 54690 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-04 11:34:26,856-Speed 13803.97 samples/sec Loss 1.1125 LearningRate 0.0001 Epoch: 31 Global Step: 54700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:34:44,612-Speed 13842.19 samples/sec Loss 1.1110 LearningRate 0.0001 Epoch: 31 Global Step: 54710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:35:02,511-Speed 13731.28 samples/sec Loss 1.1203 LearningRate 0.0001 Epoch: 31 Global Step: 54720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:35:20,331-Speed 13791.81 samples/sec Loss 1.1181 LearningRate 0.0001 Epoch: 31 Global Step: 54730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:35:38,118-Speed 13817.64 samples/sec Loss 1.1211 LearningRate 0.0001 Epoch: 31 Global Step: 54740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:35:55,868-Speed 13846.65 samples/sec Loss 1.1183 LearningRate 0.0001 Epoch: 31 Global Step: 54750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:36:13,552-Speed 13898.56 samples/sec Loss 1.1277 LearningRate 0.0001 Epoch: 31 Global Step: 54760 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:36:31,289-Speed 13856.52 samples/sec Loss 1.1095 LearningRate 0.0001 Epoch: 31 Global Step: 54770 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:36:49,013-Speed 13866.66 samples/sec Loss 1.1220 LearningRate 0.0001 Epoch: 31 Global Step: 54780 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:37:06,799-Speed 13818.77 samples/sec Loss 1.1083 LearningRate 0.0001 Epoch: 31 Global Step: 54790 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:37:24,497-Speed 13887.28 samples/sec Loss 1.1119 LearningRate 0.0001 Epoch: 31 Global Step: 54800 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:37:42,296-Speed 13808.30 samples/sec Loss 1.1221 LearningRate 0.0001 Epoch: 31 Global Step: 54810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 
11:38:00,041-Speed 13850.53 samples/sec Loss 1.1122 LearningRate 0.0001 Epoch: 31 Global Step: 54820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:38:17,780-Speed 13854.59 samples/sec Loss 1.1207 LearningRate 0.0001 Epoch: 31 Global Step: 54830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:38:35,489-Speed 13879.01 samples/sec Loss 1.1080 LearningRate 0.0001 Epoch: 31 Global Step: 54840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:38:53,315-Speed 13788.01 samples/sec Loss 1.1104 LearningRate 0.0001 Epoch: 31 Global Step: 54850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:39:11,053-Speed 13855.29 samples/sec Loss 1.1152 LearningRate 0.0001 Epoch: 31 Global Step: 54860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:39:28,777-Speed 13867.02 samples/sec Loss 1.1137 LearningRate 0.0001 Epoch: 31 Global Step: 54870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:39:46,550-Speed 13828.88 samples/sec Loss 1.1128 LearningRate 0.0001 Epoch: 31 Global Step: 54880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:40:04,349-Speed 13810.55 samples/sec Loss 1.1167 LearningRate 0.0001 Epoch: 31 Global Step: 54890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:40:22,106-Speed 13840.90 samples/sec Loss 1.1063 LearningRate 0.0001 Epoch: 31 Global Step: 54900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:40:39,891-Speed 13818.89 samples/sec Loss 1.1151 LearningRate 0.0001 Epoch: 31 Global Step: 54910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:40:57,657-Speed 13834.53 samples/sec Loss 1.1186 LearningRate 0.0001 Epoch: 31 Global Step: 54920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:41:15,418-Speed 13838.22 samples/sec Loss 1.1098 LearningRate 0.0001 Epoch: 31 Global Step: 54930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:41:33,208-Speed 13815.22 samples/sec 
Loss 1.1125 LearningRate 0.0001 Epoch: 31 Global Step: 54940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:41:50,913-Speed 13881.88 samples/sec Loss 1.1069 LearningRate 0.0001 Epoch: 31 Global Step: 54950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:42:08,642-Speed 13862.58 samples/sec Loss 1.1128 LearningRate 0.0001 Epoch: 31 Global Step: 54960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:42:26,335-Speed 13891.48 samples/sec Loss 1.1188 LearningRate 0.0001 Epoch: 31 Global Step: 54970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:42:44,105-Speed 13830.37 samples/sec Loss 1.1095 LearningRate 0.0001 Epoch: 31 Global Step: 54980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:43:01,919-Speed 13796.50 samples/sec Loss 1.1129 LearningRate 0.0001 Epoch: 31 Global Step: 54990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:43:19,652-Speed 13861.11 samples/sec Loss 1.1026 LearningRate 0.0001 Epoch: 31 Global Step: 55000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:43:37,378-Speed 13864.87 samples/sec Loss 1.1185 LearningRate 0.0001 Epoch: 31 Global Step: 55010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:43:55,048-Speed 13909.24 samples/sec Loss 1.1157 LearningRate 0.0001 Epoch: 31 Global Step: 55020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:44:12,840-Speed 13814.43 samples/sec Loss 1.1146 LearningRate 0.0001 Epoch: 31 Global Step: 55030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:44:30,548-Speed 13878.91 samples/sec Loss 1.1179 LearningRate 0.0001 Epoch: 31 Global Step: 55040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:44:48,301-Speed 13844.01 samples/sec Loss 1.1108 LearningRate 0.0001 Epoch: 31 Global Step: 55050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:45:06,115-Speed 13796.90 samples/sec Loss 1.1150 LearningRate 0.0001 Epoch: 31 
Global Step: 55060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:45:23,941-Speed 13787.33 samples/sec Loss 1.1155 LearningRate 0.0001 Epoch: 31 Global Step: 55070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:45:41,771-Speed 13784.91 samples/sec Loss 1.1118 LearningRate 0.0001 Epoch: 31 Global Step: 55080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-04 11:45:59,625-Speed 13765.46 samples/sec Loss 1.1091 LearningRate 0.0001 Epoch: 31 Global Step: 55090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-04 11:46:17,496-Speed 13753.27 samples/sec Loss 1.1094 LearningRate 0.0001 Epoch: 31 Global Step: 55100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:46:35,220-Speed 13868.85 samples/sec Loss 1.1135 LearningRate 0.0001 Epoch: 31 Global Step: 55110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:46:53,097-Speed 13747.69 samples/sec Loss 1.1089 LearningRate 0.0001 Epoch: 31 Global Step: 55120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:47:10,809-Speed 13876.56 samples/sec Loss 1.1073 LearningRate 0.0001 Epoch: 31 Global Step: 55130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:47:28,733-Speed 13712.38 samples/sec Loss 1.1134 LearningRate 0.0001 Epoch: 31 Global Step: 55140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:47:46,433-Speed 13885.71 samples/sec Loss 1.1063 LearningRate 0.0001 Epoch: 31 Global Step: 55150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:48:04,253-Speed 13791.59 samples/sec Loss 1.1188 LearningRate 0.0001 Epoch: 31 Global Step: 55160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:48:21,953-Speed 13885.76 samples/sec Loss 1.1137 LearningRate 0.0001 Epoch: 31 Global Step: 55170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:48:39,650-Speed 13888.06 samples/sec Loss 1.1224 LearningRate 0.0001 Epoch: 31 Global Step: 55180 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-03-04 11:48:57,463-Speed 13797.71 samples/sec Loss 1.1111 LearningRate 0.0001 Epoch: 31 Global Step: 55190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:49:15,229-Speed 13834.01 samples/sec Loss 1.1067 LearningRate 0.0001 Epoch: 31 Global Step: 55200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:49:33,002-Speed 13829.89 samples/sec Loss 1.1103 LearningRate 0.0000 Epoch: 31 Global Step: 55210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:49:50,771-Speed 13833.11 samples/sec Loss 1.1174 LearningRate 0.0000 Epoch: 31 Global Step: 55220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:50:08,582-Speed 13799.14 samples/sec Loss 1.1110 LearningRate 0.0000 Epoch: 31 Global Step: 55230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:50:26,485-Speed 13728.33 samples/sec Loss 1.1103 LearningRate 0.0000 Epoch: 31 Global Step: 55240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:50:44,243-Speed 13839.80 samples/sec Loss 1.0988 LearningRate 0.0000 Epoch: 31 Global Step: 55250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:51:02,043-Speed 13808.24 samples/sec Loss 1.1055 LearningRate 0.0000 Epoch: 31 Global Step: 55260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:51:19,766-Speed 13867.17 samples/sec Loss 1.1113 LearningRate 0.0000 Epoch: 31 Global Step: 55270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:51:37,583-Speed 13794.26 samples/sec Loss 1.1051 LearningRate 0.0000 Epoch: 31 Global Step: 55280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:51:55,375-Speed 13814.21 samples/sec Loss 1.1082 LearningRate 0.0000 Epoch: 31 Global Step: 55290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:52:13,166-Speed 13814.58 samples/sec Loss 1.1158 LearningRate 0.0000 Epoch: 31 Global Step: 55300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 
11:53:21,554-Speed 3593.71 samples/sec Loss 1.1039 LearningRate 0.0000 Epoch: 32 Global Step: 55310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:53:39,325-Speed 13829.55 samples/sec Loss 1.1060 LearningRate 0.0000 Epoch: 32 Global Step: 55320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:53:57,093-Speed 13832.90 samples/sec Loss 1.1053 LearningRate 0.0000 Epoch: 32 Global Step: 55330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:54:14,764-Speed 13908.58 samples/sec Loss 1.1045 LearningRate 0.0000 Epoch: 32 Global Step: 55340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:54:32,460-Speed 13888.41 samples/sec Loss 1.1060 LearningRate 0.0000 Epoch: 32 Global Step: 55350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:54:50,190-Speed 13862.20 samples/sec Loss 1.1098 LearningRate 0.0000 Epoch: 32 Global Step: 55360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:55:08,181-Speed 13661.16 samples/sec Loss 1.1003 LearningRate 0.0000 Epoch: 32 Global Step: 55370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:55:26,159-Speed 13672.15 samples/sec Loss 1.1036 LearningRate 0.0000 Epoch: 32 Global Step: 55380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:55:44,130-Speed 13676.21 samples/sec Loss 1.1031 LearningRate 0.0000 Epoch: 32 Global Step: 55390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:56:02,120-Speed 13662.28 samples/sec Loss 1.1047 LearningRate 0.0000 Epoch: 32 Global Step: 55400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:56:20,137-Speed 13641.17 samples/sec Loss 1.0946 LearningRate 0.0000 Epoch: 32 Global Step: 55410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:56:38,149-Speed 13645.21 samples/sec Loss 1.1122 LearningRate 0.0000 Epoch: 32 Global Step: 55420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:56:56,219-Speed 13601.45 samples/sec Loss 
1.1032 LearningRate 0.0000 Epoch: 32 Global Step: 55430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:57:14,350-Speed 13555.04 samples/sec Loss 1.0976 LearningRate 0.0000 Epoch: 32 Global Step: 55440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 11:57:32,355-Speed 13651.06 samples/sec Loss 1.1032 LearningRate 0.0000 Epoch: 32 Global Step: 55450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:57:50,326-Speed 13675.97 samples/sec Loss 1.0993 LearningRate 0.0000 Epoch: 32 Global Step: 55460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:58:08,288-Speed 13683.21 samples/sec Loss 1.1028 LearningRate 0.0000 Epoch: 32 Global Step: 55470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:58:25,952-Speed 13913.78 samples/sec Loss 1.1060 LearningRate 0.0000 Epoch: 32 Global Step: 55480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:58:43,630-Speed 13902.82 samples/sec Loss 1.1024 LearningRate 0.0000 Epoch: 32 Global Step: 55490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:59:01,337-Speed 13880.61 samples/sec Loss 1.0981 LearningRate 0.0000 Epoch: 32 Global Step: 55500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:59:19,034-Speed 13887.65 samples/sec Loss 1.1100 LearningRate 0.0000 Epoch: 32 Global Step: 55510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:59:36,754-Speed 13869.99 samples/sec Loss 1.0969 LearningRate 0.0000 Epoch: 32 Global Step: 55520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 11:59:54,537-Speed 13821.18 samples/sec Loss 1.0990 LearningRate 0.0000 Epoch: 32 Global Step: 55530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:00:12,315-Speed 13824.60 samples/sec Loss 1.0974 LearningRate 0.0000 Epoch: 32 Global Step: 55540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:00:29,980-Speed 13913.01 samples/sec Loss 1.1023 LearningRate 0.0000 Epoch: 32 Global 
Step: 55550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:00:47,759-Speed 13823.84 samples/sec Loss 1.1013 LearningRate 0.0000 Epoch: 32 Global Step: 55560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:01:05,627-Speed 13756.56 samples/sec Loss 1.1046 LearningRate 0.0000 Epoch: 32 Global Step: 55570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:01:23,414-Speed 13817.66 samples/sec Loss 1.1014 LearningRate 0.0000 Epoch: 32 Global Step: 55580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:01:41,170-Speed 13841.73 samples/sec Loss 1.0996 LearningRate 0.0000 Epoch: 32 Global Step: 55590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:01:58,883-Speed 13875.92 samples/sec Loss 1.0975 LearningRate 0.0000 Epoch: 32 Global Step: 55600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:02:16,606-Speed 13867.27 samples/sec Loss 1.0972 LearningRate 0.0000 Epoch: 32 Global Step: 55610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:02:34,313-Speed 13880.03 samples/sec Loss 1.1022 LearningRate 0.0000 Epoch: 32 Global Step: 55620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:02:52,153-Speed 13776.63 samples/sec Loss 1.1020 LearningRate 0.0000 Epoch: 32 Global Step: 55630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:03:09,866-Speed 13876.15 samples/sec Loss 1.1043 LearningRate 0.0000 Epoch: 32 Global Step: 55640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:03:27,634-Speed 13832.68 samples/sec Loss 1.0946 LearningRate 0.0000 Epoch: 32 Global Step: 55650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:03:45,289-Speed 13920.28 samples/sec Loss 1.0916 LearningRate 0.0000 Epoch: 32 Global Step: 55660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:04:03,020-Speed 13861.70 samples/sec Loss 1.1029 LearningRate 0.0000 Epoch: 32 Global Step: 55670 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-04 12:04:20,744-Speed 13866.97 samples/sec Loss 1.0998 LearningRate 0.0000 Epoch: 32 Global Step: 55680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:04:38,468-Speed 13867.09 samples/sec Loss 1.1003 LearningRate 0.0000 Epoch: 32 Global Step: 55690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:04:56,170-Speed 13883.47 samples/sec Loss 1.0950 LearningRate 0.0000 Epoch: 32 Global Step: 55700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:05:13,871-Speed 13885.95 samples/sec Loss 1.0999 LearningRate 0.0000 Epoch: 32 Global Step: 55710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:05:31,570-Speed 13886.81 samples/sec Loss 1.1048 LearningRate 0.0000 Epoch: 32 Global Step: 55720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:05:49,419-Speed 13769.18 samples/sec Loss 1.0998 LearningRate 0.0000 Epoch: 32 Global Step: 55730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:06:07,194-Speed 13827.38 samples/sec Loss 1.0964 LearningRate 0.0000 Epoch: 32 Global Step: 55740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:06:24,965-Speed 13830.49 samples/sec Loss 1.1015 LearningRate 0.0000 Epoch: 32 Global Step: 55750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:06:42,624-Speed 13917.14 samples/sec Loss 1.0982 LearningRate 0.0000 Epoch: 32 Global Step: 55760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:07:00,329-Speed 13882.17 samples/sec Loss 1.0984 LearningRate 0.0000 Epoch: 32 Global Step: 55770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:07:18,104-Speed 13827.24 samples/sec Loss 1.0972 LearningRate 0.0000 Epoch: 32 Global Step: 55780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:07:35,819-Speed 13873.86 samples/sec Loss 1.1075 LearningRate 0.0000 Epoch: 32 Global Step: 55790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 
12:07:53,555-Speed 13857.38 samples/sec Loss 1.0988 LearningRate 0.0000 Epoch: 32 Global Step: 55800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:08:11,245-Speed 13893.53 samples/sec Loss 1.0961 LearningRate 0.0000 Epoch: 32 Global Step: 55810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:08:28,955-Speed 13877.74 samples/sec Loss 1.0962 LearningRate 0.0000 Epoch: 32 Global Step: 55820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:08:46,743-Speed 13816.53 samples/sec Loss 1.0970 LearningRate 0.0000 Epoch: 32 Global Step: 55830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:09:04,514-Speed 13830.43 samples/sec Loss 1.0992 LearningRate 0.0000 Epoch: 32 Global Step: 55840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:09:22,238-Speed 13866.52 samples/sec Loss 1.0892 LearningRate 0.0000 Epoch: 32 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-04 12:09:39,991-Speed 13844.63 samples/sec Loss 1.0981 LearningRate 0.0000 Epoch: 32 Global Step: 55860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:09:57,770-Speed 13824.23 samples/sec Loss 1.0954 LearningRate 0.0000 Epoch: 32 Global Step: 55870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:10:15,519-Speed 13847.19 samples/sec Loss 1.0937 LearningRate 0.0000 Epoch: 32 Global Step: 55880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:10:33,236-Speed 13872.27 samples/sec Loss 1.0940 LearningRate 0.0000 Epoch: 32 Global Step: 55890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:10:51,014-Speed 13825.40 samples/sec Loss 1.0976 LearningRate 0.0000 Epoch: 32 Global Step: 55900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:11:08,758-Speed 13851.04 samples/sec Loss 1.0922 LearningRate 0.0000 Epoch: 32 Global Step: 55910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:11:26,549-Speed 13814.43 samples/sec 
Loss 1.0986 LearningRate 0.0000 Epoch: 32 Global Step: 55920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:11:44,373-Speed 13788.99 samples/sec Loss 1.0990 LearningRate 0.0000 Epoch: 32 Global Step: 55930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:12:02,157-Speed 13821.39 samples/sec Loss 1.0983 LearningRate 0.0000 Epoch: 32 Global Step: 55940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:12:19,905-Speed 13847.87 samples/sec Loss 1.0944 LearningRate 0.0000 Epoch: 32 Global Step: 55950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:12:37,750-Speed 13772.69 samples/sec Loss 1.0906 LearningRate 0.0000 Epoch: 32 Global Step: 55960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-03-04 12:12:55,429-Speed 13902.20 samples/sec Loss 1.0930 LearningRate 0.0000 Epoch: 32 Global Step: 55970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:13:13,134-Speed 13881.78 samples/sec Loss 1.0894 LearningRate 0.0000 Epoch: 32 Global Step: 55980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:13:30,980-Speed 13771.91 samples/sec Loss 1.1007 LearningRate 0.0000 Epoch: 32 Global Step: 55990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:13:48,691-Speed 13877.42 samples/sec Loss 1.0914 LearningRate 0.0000 Epoch: 32 Global Step: 56000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:14:06,434-Speed 13851.67 samples/sec Loss 1.0881 LearningRate 0.0000 Epoch: 32 Global Step: 56010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:14:24,309-Speed 13750.56 samples/sec Loss 1.0946 LearningRate 0.0000 Epoch: 32 Global Step: 56020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:14:42,068-Speed 13839.53 samples/sec Loss 1.0906 LearningRate 0.0000 Epoch: 32 Global Step: 56030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:14:59,822-Speed 13844.39 samples/sec Loss 1.0927 LearningRate 0.0000 Epoch: 32 
Global Step: 56040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:15:17,669-Speed 13770.61 samples/sec Loss 1.0897 LearningRate 0.0000 Epoch: 32 Global Step: 56050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:15:35,447-Speed 13825.15 samples/sec Loss 1.0925 LearningRate 0.0000 Epoch: 32 Global Step: 56060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:15:53,192-Speed 13850.29 samples/sec Loss 1.0916 LearningRate 0.0000 Epoch: 32 Global Step: 56070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-03-04 12:16:10,900-Speed 13879.74 samples/sec Loss 1.0839 LearningRate 0.0000 Epoch: 32 Global Step: 56080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:16:28,594-Speed 13889.76 samples/sec Loss 1.0862 LearningRate 0.0000 Epoch: 32 Global Step: 56090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:16:46,329-Speed 13858.53 samples/sec Loss 1.0911 LearningRate 0.0000 Epoch: 32 Global Step: 56100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:17:04,068-Speed 13855.50 samples/sec Loss 1.0902 LearningRate 0.0000 Epoch: 32 Global Step: 56110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:17:21,826-Speed 13840.21 samples/sec Loss 1.0859 LearningRate 0.0000 Epoch: 32 Global Step: 56120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:17:39,578-Speed 13844.64 samples/sec Loss 1.1048 LearningRate 0.0000 Epoch: 32 Global Step: 56130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:17:57,353-Speed 13826.91 samples/sec Loss 1.0907 LearningRate 0.0000 Epoch: 32 Global Step: 56140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:18:15,093-Speed 13854.77 samples/sec Loss 1.0872 LearningRate 0.0000 Epoch: 32 Global Step: 56150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:18:32,850-Speed 13841.40 samples/sec Loss 1.0919 LearningRate 0.0000 Epoch: 32 Global Step: 56160 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-03-04 12:18:50,669-Speed 13792.43 samples/sec Loss 1.0964 LearningRate 0.0000 Epoch: 32 Global Step: 56170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:19:08,430-Speed 13837.88 samples/sec Loss 1.0926 LearningRate 0.0000 Epoch: 32 Global Step: 56180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-03-04 12:19:26,100-Speed 13909.23 samples/sec Loss 1.0893 LearningRate 0.0000 Epoch: 32 Global Step: 56190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:19:43,881-Speed 13822.07 samples/sec Loss 1.0941 LearningRate 0.0000 Epoch: 32 Global Step: 56200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:20:01,633-Speed 13845.08 samples/sec Loss 1.0886 LearningRate 0.0000 Epoch: 32 Global Step: 56210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:20:19,437-Speed 13805.02 samples/sec Loss 1.0872 LearningRate 0.0000 Epoch: 32 Global Step: 56220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:20:37,168-Speed 13860.95 samples/sec Loss 1.0823 LearningRate 0.0000 Epoch: 32 Global Step: 56230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:20:55,014-Speed 13772.13 samples/sec Loss 1.0894 LearningRate 0.0000 Epoch: 32 Global Step: 56240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:21:12,792-Speed 13826.19 samples/sec Loss 1.0795 LearningRate 0.0000 Epoch: 32 Global Step: 56250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:21:30,594-Speed 13806.07 samples/sec Loss 1.0850 LearningRate 0.0000 Epoch: 32 Global Step: 56260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:21:48,373-Speed 13824.02 samples/sec Loss 1.0853 LearningRate 0.0000 Epoch: 32 Global Step: 56270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:22:06,132-Speed 13840.37 samples/sec Loss 1.0854 LearningRate 0.0000 Epoch: 32 Global Step: 56280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 
12:22:23,912-Speed 13823.16 samples/sec Loss 1.0877 LearningRate 0.0000 Epoch: 32 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:22:41,671-Speed 13840.00 samples/sec Loss 1.0808 LearningRate 0.0000 Epoch: 32 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:22:59,380-Speed 13878.80 samples/sec Loss 1.0912 LearningRate 0.0000 Epoch: 32 Global Step: 56310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:23:17,065-Speed 13897.13 samples/sec Loss 1.0772 LearningRate 0.0000 Epoch: 32 Global Step: 56320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:23:34,877-Speed 13798.94 samples/sec Loss 1.0820 LearningRate 0.0000 Epoch: 32 Global Step: 56330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:23:52,705-Speed 13785.47 samples/sec Loss 1.0800 LearningRate 0.0000 Epoch: 32 Global Step: 56340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:24:10,460-Speed 13842.36 samples/sec Loss 1.0963 LearningRate 0.0000 Epoch: 32 Global Step: 56350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:24:28,137-Speed 13903.82 samples/sec Loss 1.0821 LearningRate 0.0000 Epoch: 32 Global Step: 56360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:24:45,828-Speed 13893.09 samples/sec Loss 1.0821 LearningRate 0.0000 Epoch: 32 Global Step: 56370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:25:03,573-Speed 13850.73 samples/sec Loss 1.0893 LearningRate 0.0000 Epoch: 32 Global Step: 56380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:25:21,333-Speed 13837.92 samples/sec Loss 1.0800 LearningRate 0.0000 Epoch: 32 Global Step: 56390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:25:39,147-Speed 13796.86 samples/sec Loss 1.0883 LearningRate 0.0000 Epoch: 32 Global Step: 56400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:25:56,873-Speed 13865.67 samples/sec 
Loss 1.0875 LearningRate 0.0000 Epoch: 32 Global Step: 56410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:26:14,591-Speed 13871.43 samples/sec Loss 1.0812 LearningRate 0.0000 Epoch: 32 Global Step: 56420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:26:32,293-Speed 13883.76 samples/sec Loss 1.0896 LearningRate 0.0000 Epoch: 32 Global Step: 56430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:26:50,039-Speed 13849.59 samples/sec Loss 1.0828 LearningRate 0.0000 Epoch: 32 Global Step: 56440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:27:07,829-Speed 13815.99 samples/sec Loss 1.0870 LearningRate 0.0000 Epoch: 32 Global Step: 56450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:27:25,792-Speed 13682.11 samples/sec Loss 1.0821 LearningRate 0.0000 Epoch: 32 Global Step: 56460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:27:43,499-Speed 13881.47 samples/sec Loss 1.0844 LearningRate 0.0000 Epoch: 32 Global Step: 56470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:28:01,412-Speed 13720.68 samples/sec Loss 1.0835 LearningRate 0.0000 Epoch: 32 Global Step: 56480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:28:19,213-Speed 13806.75 samples/sec Loss 1.0831 LearningRate 0.0000 Epoch: 32 Global Step: 56490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:28:36,950-Speed 13856.81 samples/sec Loss 1.0818 LearningRate 0.0000 Epoch: 32 Global Step: 56500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:28:54,777-Speed 13787.48 samples/sec Loss 1.0816 LearningRate 0.0000 Epoch: 32 Global Step: 56510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:29:12,603-Speed 13787.32 samples/sec Loss 1.0746 LearningRate 0.0000 Epoch: 32 Global Step: 56520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:29:30,500-Speed 13734.04 samples/sec Loss 1.0843 LearningRate 0.0000 Epoch: 32 
Global Step: 56530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:29:48,305-Speed 13803.68 samples/sec Loss 1.0740 LearningRate 0.0000 Epoch: 32 Global Step: 56540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:30:06,061-Speed 13842.05 samples/sec Loss 1.0797 LearningRate 0.0000 Epoch: 32 Global Step: 56550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:30:23,982-Speed 13715.59 samples/sec Loss 1.0876 LearningRate 0.0000 Epoch: 32 Global Step: 56560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:30:41,747-Speed 13837.56 samples/sec Loss 1.0794 LearningRate 0.0000 Epoch: 32 Global Step: 56570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:30:59,916-Speed 13528.73 samples/sec Loss 1.0777 LearningRate 0.0000 Epoch: 32 Global Step: 56580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:31:18,070-Speed 13538.04 samples/sec Loss 1.0818 LearningRate 0.0000 Epoch: 32 Global Step: 56590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:31:36,104-Speed 13628.46 samples/sec Loss 1.0805 LearningRate 0.0000 Epoch: 32 Global Step: 56600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:31:54,197-Speed 13584.16 samples/sec Loss 1.0757 LearningRate 0.0000 Epoch: 32 Global Step: 56610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:32:12,285-Speed 13587.55 samples/sec Loss 1.0809 LearningRate 0.0000 Epoch: 32 Global Step: 56620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:32:30,392-Speed 13573.61 samples/sec Loss 1.0821 LearningRate 0.0000 Epoch: 32 Global Step: 56630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:32:48,559-Speed 13529.15 samples/sec Loss 1.0724 LearningRate 0.0000 Epoch: 32 Global Step: 56640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:33:06,585-Speed 13634.55 samples/sec Loss 1.0873 LearningRate 0.0000 Epoch: 32 Global Step: 56650 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-03-04 12:33:24,649-Speed 13605.87 samples/sec Loss 1.0862 LearningRate 0.0000 Epoch: 32 Global Step: 56660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:33:42,756-Speed 13573.29 samples/sec Loss 1.0811 LearningRate 0.0000 Epoch: 32 Global Step: 56670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:34:00,854-Speed 13580.54 samples/sec Loss 1.0739 LearningRate 0.0000 Epoch: 32 Global Step: 56680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:34:18,930-Speed 13596.31 samples/sec Loss 1.0803 LearningRate 0.0000 Epoch: 32 Global Step: 56690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:34:37,021-Speed 13586.29 samples/sec Loss 1.0833 LearningRate 0.0000 Epoch: 32 Global Step: 56700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:34:55,109-Speed 13587.63 samples/sec Loss 1.0767 LearningRate 0.0000 Epoch: 32 Global Step: 56710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:35:13,159-Speed 13616.11 samples/sec Loss 1.0705 LearningRate 0.0000 Epoch: 32 Global Step: 56720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:35:31,255-Speed 13581.23 samples/sec Loss 1.0827 LearningRate 0.0000 Epoch: 32 Global Step: 56730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:35:48,934-Speed 13903.05 samples/sec Loss 1.0852 LearningRate 0.0000 Epoch: 32 Global Step: 56740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:36:06,671-Speed 13857.16 samples/sec Loss 1.0748 LearningRate 0.0000 Epoch: 32 Global Step: 56750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:36:24,402-Speed 13860.84 samples/sec Loss 1.0756 LearningRate 0.0000 Epoch: 32 Global Step: 56760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:36:42,184-Speed 13822.40 samples/sec Loss 1.0747 LearningRate 0.0000 Epoch: 32 Global Step: 56770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 
12:36:59,869-Speed 13898.37 samples/sec Loss 1.0759 LearningRate 0.0000 Epoch: 32 Global Step: 56780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:37:17,644-Speed 13826.73 samples/sec Loss 1.0809 LearningRate 0.0000 Epoch: 32 Global Step: 56790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:37:35,399-Speed 13842.09 samples/sec Loss 1.0789 LearningRate 0.0000 Epoch: 32 Global Step: 56800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:37:53,335-Speed 13703.47 samples/sec Loss 1.0748 LearningRate 0.0000 Epoch: 32 Global Step: 56810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:38:11,082-Speed 13848.47 samples/sec Loss 1.0846 LearningRate 0.0000 Epoch: 32 Global Step: 56820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:38:28,864-Speed 13822.04 samples/sec Loss 1.0855 LearningRate 0.0000 Epoch: 32 Global Step: 56830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:38:46,533-Speed 13909.75 samples/sec Loss 1.0618 LearningRate 0.0000 Epoch: 32 Global Step: 56840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:39:04,343-Speed 13799.78 samples/sec Loss 1.0787 LearningRate 0.0000 Epoch: 32 Global Step: 56850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:39:22,060-Speed 13872.65 samples/sec Loss 1.0727 LearningRate 0.0000 Epoch: 32 Global Step: 56860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:39:39,733-Speed 13907.15 samples/sec Loss 1.0814 LearningRate 0.0000 Epoch: 32 Global Step: 56870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:39:57,500-Speed 13833.10 samples/sec Loss 1.0749 LearningRate 0.0000 Epoch: 32 Global Step: 56880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:40:15,212-Speed 13876.89 samples/sec Loss 1.0834 LearningRate 0.0000 Epoch: 32 Global Step: 56890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:40:32,953-Speed 13853.27 samples/sec 
Loss 1.0770 LearningRate 0.0000 Epoch: 32 Global Step: 56900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:40:50,637-Speed 13899.11 samples/sec Loss 1.0824 LearningRate 0.0000 Epoch: 32 Global Step: 56910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:41:08,409-Speed 13829.37 samples/sec Loss 1.0732 LearningRate 0.0000 Epoch: 32 Global Step: 56920 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:41:26,112-Speed 13883.64 samples/sec Loss 1.0939 LearningRate 0.0000 Epoch: 32 Global Step: 56930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:41:43,906-Speed 13812.30 samples/sec Loss 1.0735 LearningRate 0.0000 Epoch: 32 Global Step: 56940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:42:01,676-Speed 13832.18 samples/sec Loss 1.0729 LearningRate 0.0000 Epoch: 32 Global Step: 56950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:42:19,351-Speed 13905.47 samples/sec Loss 1.0772 LearningRate 0.0000 Epoch: 32 Global Step: 56960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:42:37,179-Speed 13788.13 samples/sec Loss 1.0753 LearningRate 0.0000 Epoch: 32 Global Step: 56970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:42:54,894-Speed 13873.58 samples/sec Loss 1.0758 LearningRate 0.0000 Epoch: 32 Global Step: 56980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:43:12,649-Speed 13842.44 samples/sec Loss 1.0826 LearningRate 0.0000 Epoch: 32 Global Step: 56990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:43:30,417-Speed 13832.56 samples/sec Loss 1.0826 LearningRate 0.0000 Epoch: 32 Global Step: 57000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:43:48,135-Speed 13872.39 samples/sec Loss 1.0721 LearningRate 0.0000 Epoch: 32 Global Step: 57010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:44:05,913-Speed 13825.20 samples/sec Loss 1.0776 LearningRate 0.0000 Epoch: 32 
Global Step: 57020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:44:23,669-Speed 13842.07 samples/sec Loss 1.0768 LearningRate 0.0000 Epoch: 32 Global Step: 57030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 12:45:30,793-Speed 3661.33 samples/sec Loss 1.0612 LearningRate 0.0000 Epoch: 33 Global Step: 57040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:45:48,491-Speed 13888.05 samples/sec Loss 1.0777 LearningRate 0.0000 Epoch: 33 Global Step: 57050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:46:06,209-Speed 13874.47 samples/sec Loss 1.0693 LearningRate 0.0000 Epoch: 33 Global Step: 57060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:46:23,920-Speed 13876.91 samples/sec Loss 1.0669 LearningRate 0.0000 Epoch: 33 Global Step: 57070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:46:41,595-Speed 13905.33 samples/sec Loss 1.0694 LearningRate 0.0000 Epoch: 33 Global Step: 57080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:46:59,242-Speed 13927.32 samples/sec Loss 1.0695 LearningRate 0.0000 Epoch: 33 Global Step: 57090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:47:16,969-Speed 13865.52 samples/sec Loss 1.0700 LearningRate 0.0000 Epoch: 33 Global Step: 57100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:47:34,633-Speed 13914.36 samples/sec Loss 1.0716 LearningRate 0.0000 Epoch: 33 Global Step: 57110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:47:52,345-Speed 13876.65 samples/sec Loss 1.0674 LearningRate 0.0000 Epoch: 33 Global Step: 57120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:48:10,044-Speed 13886.99 samples/sec Loss 1.0798 LearningRate 0.0000 Epoch: 33 Global Step: 57130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:48:27,846-Speed 13806.45 samples/sec Loss 1.0718 LearningRate 0.0000 Epoch: 33 Global Step: 57140 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-03-04 12:48:45,672-Speed 13787.32 samples/sec Loss 1.0673 LearningRate 0.0000 Epoch: 33 Global Step: 57150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:49:03,403-Speed 13861.67 samples/sec Loss 1.0688 LearningRate 0.0000 Epoch: 33 Global Step: 57160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:49:21,032-Speed 13941.42 samples/sec Loss 1.0593 LearningRate 0.0000 Epoch: 33 Global Step: 57170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:49:38,782-Speed 13846.24 samples/sec Loss 1.0686 LearningRate 0.0000 Epoch: 33 Global Step: 57180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:49:56,490-Speed 13879.41 samples/sec Loss 1.0677 LearningRate 0.0000 Epoch: 33 Global Step: 57190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:50:14,161-Speed 13909.42 samples/sec Loss 1.0607 LearningRate 0.0000 Epoch: 33 Global Step: 57200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:50:31,828-Speed 13911.86 samples/sec Loss 1.0716 LearningRate 0.0000 Epoch: 33 Global Step: 57210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:50:49,569-Speed 13852.96 samples/sec Loss 1.0639 LearningRate 0.0000 Epoch: 33 Global Step: 57220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:51:07,238-Speed 13911.64 samples/sec Loss 1.0675 LearningRate 0.0000 Epoch: 33 Global Step: 57230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:51:24,978-Speed 13855.14 samples/sec Loss 1.0667 LearningRate 0.0000 Epoch: 33 Global Step: 57240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:51:42,692-Speed 13874.53 samples/sec Loss 1.0663 LearningRate 0.0000 Epoch: 33 Global Step: 57250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:52:00,387-Speed 13889.48 samples/sec Loss 1.0744 LearningRate 0.0000 Epoch: 33 Global Step: 57260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 
12:52:18,102-Speed 13873.97 samples/sec Loss 1.0655 LearningRate 0.0000 Epoch: 33 Global Step: 57270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:52:35,829-Speed 13865.52 samples/sec Loss 1.0671 LearningRate 0.0000 Epoch: 33 Global Step: 57280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:52:53,599-Speed 13830.59 samples/sec Loss 1.0709 LearningRate 0.0000 Epoch: 33 Global Step: 57290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:53:11,346-Speed 13848.65 samples/sec Loss 1.0681 LearningRate 0.0000 Epoch: 33 Global Step: 57300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:53:29,040-Speed 13890.30 samples/sec Loss 1.0679 LearningRate 0.0000 Epoch: 33 Global Step: 57310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:53:46,808-Speed 13833.03 samples/sec Loss 1.0756 LearningRate 0.0000 Epoch: 33 Global Step: 57320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:54:04,560-Speed 13845.25 samples/sec Loss 1.0599 LearningRate 0.0000 Epoch: 33 Global Step: 57330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:54:22,368-Speed 13801.76 samples/sec Loss 1.0702 LearningRate 0.0000 Epoch: 33 Global Step: 57340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:54:40,192-Speed 13788.85 samples/sec Loss 1.0709 LearningRate 0.0000 Epoch: 33 Global Step: 57350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:54:57,845-Speed 13921.79 samples/sec Loss 1.0759 LearningRate 0.0000 Epoch: 33 Global Step: 57360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:55:15,660-Speed 13796.74 samples/sec Loss 1.0698 LearningRate 0.0000 Epoch: 33 Global Step: 57370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:55:33,337-Speed 13903.74 samples/sec Loss 1.0679 LearningRate 0.0000 Epoch: 33 Global Step: 57380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:55:51,046-Speed 13878.22 samples/sec 
Loss 1.0645 LearningRate 0.0000 Epoch: 33 Global Step: 57390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:56:08,746-Speed 13885.58 samples/sec Loss 1.0695 LearningRate 0.0000 Epoch: 33 Global Step: 57400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:56:26,583-Speed 13778.95 samples/sec Loss 1.0742 LearningRate 0.0000 Epoch: 33 Global Step: 57410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:56:44,380-Speed 13810.44 samples/sec Loss 1.0729 LearningRate 0.0000 Epoch: 33 Global Step: 57420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:57:02,116-Speed 13856.76 samples/sec Loss 1.0657 LearningRate 0.0000 Epoch: 33 Global Step: 57430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:57:19,857-Speed 13853.87 samples/sec Loss 1.0677 LearningRate 0.0000 Epoch: 33 Global Step: 57440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:57:37,661-Speed 13804.70 samples/sec Loss 1.0615 LearningRate 0.0000 Epoch: 33 Global Step: 57450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:57:55,468-Speed 13802.62 samples/sec Loss 1.0598 LearningRate 0.0000 Epoch: 33 Global Step: 57460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 12:58:13,265-Speed 13810.45 samples/sec Loss 1.0684 LearningRate 0.0000 Epoch: 33 Global Step: 57470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:58:31,051-Speed 13818.44 samples/sec Loss 1.0645 LearningRate 0.0000 Epoch: 33 Global Step: 57480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:58:48,925-Speed 13750.20 samples/sec Loss 1.0634 LearningRate 0.0000 Epoch: 33 Global Step: 57490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:59:06,729-Speed 13804.62 samples/sec Loss 1.0587 LearningRate 0.0000 Epoch: 33 Global Step: 57500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:59:24,521-Speed 13813.49 samples/sec Loss 1.0667 LearningRate 0.0000 Epoch: 33 
Global Step: 57510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 12:59:42,243-Speed 13868.88 samples/sec Loss 1.0692 LearningRate 0.0000 Epoch: 33 Global Step: 57520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:00:00,072-Speed 13784.39 samples/sec Loss 1.0687 LearningRate 0.0000 Epoch: 33 Global Step: 57530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:00:17,870-Speed 13810.21 samples/sec Loss 1.0662 LearningRate 0.0000 Epoch: 33 Global Step: 57540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:00:35,689-Speed 13793.03 samples/sec Loss 1.0610 LearningRate 0.0000 Epoch: 33 Global Step: 57550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:00:53,439-Speed 13846.25 samples/sec Loss 1.0627 LearningRate 0.0000 Epoch: 33 Global Step: 57560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:01:11,138-Speed 13886.41 samples/sec Loss 1.0720 LearningRate 0.0000 Epoch: 33 Global Step: 57570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:01:28,873-Speed 13857.88 samples/sec Loss 1.0680 LearningRate 0.0000 Epoch: 33 Global Step: 57580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:01:46,567-Speed 13890.71 samples/sec Loss 1.0635 LearningRate 0.0000 Epoch: 33 Global Step: 57590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:02:04,368-Speed 13807.23 samples/sec Loss 1.0702 LearningRate 0.0000 Epoch: 33 Global Step: 57600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:02:22,077-Speed 13878.44 samples/sec Loss 1.0628 LearningRate 0.0000 Epoch: 33 Global Step: 57610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:02:39,860-Speed 13821.21 samples/sec Loss 1.0619 LearningRate 0.0000 Epoch: 33 Global Step: 57620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:02:57,571-Speed 13876.86 samples/sec Loss 1.0652 LearningRate 0.0000 Epoch: 33 Global Step: 57630 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-03-04 13:03:15,351-Speed 13823.17 samples/sec Loss 1.0563 LearningRate 0.0000 Epoch: 33 Global Step: 57640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:03:33,125-Speed 13827.46 samples/sec Loss 1.0616 LearningRate 0.0000 Epoch: 33 Global Step: 57650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:03:50,855-Speed 13862.44 samples/sec Loss 1.0645 LearningRate 0.0000 Epoch: 33 Global Step: 57660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:04:08,532-Speed 13903.65 samples/sec Loss 1.0549 LearningRate 0.0000 Epoch: 33 Global Step: 57670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:04:26,289-Speed 13841.61 samples/sec Loss 1.0683 LearningRate 0.0000 Epoch: 33 Global Step: 57680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:04:43,989-Speed 13885.82 samples/sec Loss 1.0634 LearningRate 0.0000 Epoch: 33 Global Step: 57690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:05:01,718-Speed 13863.97 samples/sec Loss 1.0595 LearningRate 0.0000 Epoch: 33 Global Step: 57700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:05:19,446-Speed 13866.00 samples/sec Loss 1.0611 LearningRate 0.0000 Epoch: 33 Global Step: 57710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:05:37,265-Speed 13792.73 samples/sec Loss 1.0594 LearningRate 0.0000 Epoch: 33 Global Step: 57720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:05:54,971-Speed 13880.54 samples/sec Loss 1.0598 LearningRate 0.0000 Epoch: 33 Global Step: 57730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:06:12,716-Speed 13850.42 samples/sec Loss 1.0671 LearningRate 0.0000 Epoch: 33 Global Step: 57740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:06:30,499-Speed 13821.23 samples/sec Loss 1.0728 LearningRate 0.0000 Epoch: 33 Global Step: 57750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 
13:06:48,306-Speed 13802.28 samples/sec Loss 1.0583 LearningRate 0.0000 Epoch: 33 Global Step: 57760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:07:06,088-Speed 13823.75 samples/sec Loss 1.0706 LearningRate 0.0000 Epoch: 33 Global Step: 57770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:07:23,771-Speed 13898.96 samples/sec Loss 1.0610 LearningRate 0.0000 Epoch: 33 Global Step: 57780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:07:41,478-Speed 13880.76 samples/sec Loss 1.0545 LearningRate 0.0000 Epoch: 33 Global Step: 57790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:07:59,277-Speed 13808.14 samples/sec Loss 1.0543 LearningRate 0.0000 Epoch: 33 Global Step: 57800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:08:17,017-Speed 13854.29 samples/sec Loss 1.0589 LearningRate 0.0000 Epoch: 33 Global Step: 57810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:08:34,875-Speed 13763.35 samples/sec Loss 1.0613 LearningRate 0.0000 Epoch: 33 Global Step: 57820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 13:08:52,557-Speed 13899.90 samples/sec Loss 1.0571 LearningRate 0.0000 Epoch: 33 Global Step: 57830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:09:10,325-Speed 13832.63 samples/sec Loss 1.0475 LearningRate 0.0000 Epoch: 33 Global Step: 57840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:09:28,104-Speed 13823.73 samples/sec Loss 1.0498 LearningRate 0.0000 Epoch: 33 Global Step: 57850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:09:45,844-Speed 13854.71 samples/sec Loss 1.0625 LearningRate 0.0000 Epoch: 33 Global Step: 57860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:10:03,615-Speed 13830.39 samples/sec Loss 1.0641 LearningRate 0.0000 Epoch: 33 Global Step: 57870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:10:21,457-Speed 13774.73 samples/sec 
Loss 1.0592 LearningRate 0.0000 Epoch: 33 Global Step: 57880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:10:39,190-Speed 13859.88 samples/sec Loss 1.0637 LearningRate 0.0000 Epoch: 33 Global Step: 57890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:10:56,944-Speed 13843.31 samples/sec Loss 1.0646 LearningRate 0.0000 Epoch: 33 Global Step: 57900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:11:15,009-Speed 13605.26 samples/sec Loss 1.0634 LearningRate 0.0000 Epoch: 33 Global Step: 57910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:11:33,018-Speed 13647.03 samples/sec Loss 1.0503 LearningRate 0.0000 Epoch: 33 Global Step: 57920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:11:51,037-Speed 13639.56 samples/sec Loss 1.0519 LearningRate 0.0000 Epoch: 33 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-03-04 13:12:09,120-Speed 13591.50 samples/sec Loss 1.0572 LearningRate 0.0000 Epoch: 33 Global Step: 57940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:12:27,075-Speed 13689.00 samples/sec Loss 1.0529 LearningRate 0.0000 Epoch: 33 Global Step: 57950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:12:45,085-Speed 13646.26 samples/sec Loss 1.0611 LearningRate 0.0000 Epoch: 33 Global Step: 57960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:13:03,131-Speed 13619.27 samples/sec Loss 1.0495 LearningRate 0.0000 Epoch: 33 Global Step: 57970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:13:21,146-Speed 13643.02 samples/sec Loss 1.0591 LearningRate 0.0000 Epoch: 33 Global Step: 57980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:13:39,165-Speed 13640.27 samples/sec Loss 1.0534 LearningRate 0.0000 Epoch: 33 Global Step: 57990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:13:57,203-Speed 13625.55 samples/sec Loss 1.0481 LearningRate 0.0000 Epoch: 33 
Global Step: 58000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:14:15,277-Speed 13598.69 samples/sec Loss 1.0504 LearningRate 0.0000 Epoch: 33 Global Step: 58010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:14:33,291-Speed 13643.30 samples/sec Loss 1.0536 LearningRate 0.0000 Epoch: 33 Global Step: 58020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:14:51,272-Speed 13668.70 samples/sec Loss 1.0543 LearningRate 0.0000 Epoch: 33 Global Step: 58030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:15:09,286-Speed 13643.81 samples/sec Loss 1.0573 LearningRate 0.0000 Epoch: 33 Global Step: 58040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:15:27,263-Speed 13671.73 samples/sec Loss 1.0576 LearningRate 0.0000 Epoch: 33 Global Step: 58050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:15:45,306-Speed 13622.19 samples/sec Loss 1.0552 LearningRate 0.0000 Epoch: 33 Global Step: 58060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:16:03,369-Speed 13606.54 samples/sec Loss 1.0563 LearningRate 0.0000 Epoch: 33 Global Step: 58070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:16:21,454-Speed 13590.12 samples/sec Loss 1.0509 LearningRate 0.0000 Epoch: 33 Global Step: 58080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-03-04 13:16:39,417-Speed 13681.98 samples/sec Loss 1.0574 LearningRate 0.0000 Epoch: 33 Global Step: 58090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:16:57,396-Speed 13669.95 samples/sec Loss 1.0525 LearningRate 0.0000 Epoch: 33 Global Step: 58100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:17:15,380-Speed 13666.75 samples/sec Loss 1.0602 LearningRate 0.0000 Epoch: 33 Global Step: 58110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:17:33,489-Speed 13572.37 samples/sec Loss 1.0501 LearningRate 0.0000 Epoch: 33 Global Step: 58120 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-03-04 13:17:51,498-Speed 13646.72 samples/sec Loss 1.0552 LearningRate 0.0000 Epoch: 33 Global Step: 58130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:18:09,487-Speed 13662.68 samples/sec Loss 1.0537 LearningRate 0.0000 Epoch: 33 Global Step: 58140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:18:27,502-Speed 13642.73 samples/sec Loss 1.0584 LearningRate 0.0000 Epoch: 33 Global Step: 58150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:18:45,490-Speed 13663.83 samples/sec Loss 1.0580 LearningRate 0.0000 Epoch: 33 Global Step: 58160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:19:03,504-Speed 13643.21 samples/sec Loss 1.0504 LearningRate 0.0000 Epoch: 33 Global Step: 58170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-03-04 13:19:21,453-Speed 13692.91 samples/sec Loss 1.0527 LearningRate 0.0000 Epoch: 33 Global Step: 58180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:19:39,458-Speed 13650.62 samples/sec Loss 1.0518 LearningRate 0.0000 Epoch: 33 Global Step: 58190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:19:57,446-Speed 13663.21 samples/sec Loss 1.0492 LearningRate 0.0000 Epoch: 33 Global Step: 58200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:20:15,469-Speed 13636.97 samples/sec Loss 1.0482 LearningRate 0.0000 Epoch: 33 Global Step: 58210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:20:33,445-Speed 13672.14 samples/sec Loss 1.0649 LearningRate 0.0000 Epoch: 33 Global Step: 58220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:20:51,508-Speed 13606.74 samples/sec Loss 1.0479 LearningRate 0.0000 Epoch: 33 Global Step: 58230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:21:09,531-Speed 13637.61 samples/sec Loss 1.0608 LearningRate 0.0000 Epoch: 33 Global Step: 58240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 
13:21:27,553-Speed 13637.05 samples/sec Loss 1.0555 LearningRate 0.0000 Epoch: 33 Global Step: 58250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:21:45,574-Speed 13638.29 samples/sec Loss 1.0461 LearningRate 0.0000 Epoch: 33 Global Step: 58260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:22:03,676-Speed 13577.39 samples/sec Loss 1.0553 LearningRate 0.0000 Epoch: 33 Global Step: 58270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:22:21,818-Speed 13547.17 samples/sec Loss 1.0541 LearningRate 0.0000 Epoch: 33 Global Step: 58280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:22:39,842-Speed 13636.39 samples/sec Loss 1.0444 LearningRate 0.0000 Epoch: 33 Global Step: 58290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:22:57,793-Speed 13691.24 samples/sec Loss 1.0572 LearningRate 0.0000 Epoch: 33 Global Step: 58300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:23:15,841-Speed 13617.91 samples/sec Loss 1.0545 LearningRate 0.0000 Epoch: 33 Global Step: 58310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:23:33,835-Speed 13660.97 samples/sec Loss 1.0452 LearningRate 0.0000 Epoch: 33 Global Step: 58320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:23:51,907-Speed 13599.46 samples/sec Loss 1.0480 LearningRate 0.0000 Epoch: 33 Global Step: 58330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:24:09,919-Speed 13645.26 samples/sec Loss 1.0538 LearningRate 0.0000 Epoch: 33 Global Step: 58340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:24:27,927-Speed 13647.83 samples/sec Loss 1.0433 LearningRate 0.0000 Epoch: 33 Global Step: 58350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:24:45,951-Speed 13636.44 samples/sec Loss 1.0463 LearningRate 0.0000 Epoch: 33 Global Step: 58360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:25:03,943-Speed 13659.71 samples/sec 
Loss 1.0489 LearningRate 0.0000 Epoch: 33 Global Step: 58370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:25:21,902-Speed 13685.37 samples/sec Loss 1.0459 LearningRate 0.0000 Epoch: 33 Global Step: 58380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:25:39,896-Speed 13659.25 samples/sec Loss 1.0436 LearningRate 0.0000 Epoch: 33 Global Step: 58390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:25:57,953-Speed 13611.33 samples/sec Loss 1.0411 LearningRate 0.0000 Epoch: 33 Global Step: 58400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:26:16,011-Speed 13610.37 samples/sec Loss 1.0551 LearningRate 0.0000 Epoch: 33 Global Step: 58410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:26:34,009-Speed 13655.28 samples/sec Loss 1.0497 LearningRate 0.0000 Epoch: 33 Global Step: 58420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:26:52,070-Speed 13608.26 samples/sec Loss 1.0440 LearningRate 0.0000 Epoch: 33 Global Step: 58430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:27:10,079-Speed 13647.36 samples/sec Loss 1.0513 LearningRate 0.0000 Epoch: 33 Global Step: 58440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:27:28,059-Speed 13669.27 samples/sec Loss 1.0509 LearningRate 0.0000 Epoch: 33 Global Step: 58450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:27:46,099-Speed 13623.98 samples/sec Loss 1.0478 LearningRate 0.0000 Epoch: 33 Global Step: 58460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:28:04,119-Speed 13639.04 samples/sec Loss 1.0565 LearningRate 0.0000 Epoch: 33 Global Step: 58470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:28:22,145-Speed 13634.33 samples/sec Loss 1.0538 LearningRate 0.0000 Epoch: 33 Global Step: 58480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:28:40,093-Speed 13693.92 samples/sec Loss 1.0480 LearningRate 0.0000 Epoch: 33 
Global Step: 58490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:28:58,167-Speed 13598.25 samples/sec Loss 1.0479 LearningRate 0.0000 Epoch: 33 Global Step: 58500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:29:16,200-Speed 13629.13 samples/sec Loss 1.0439 LearningRate 0.0000 Epoch: 33 Global Step: 58510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:29:34,259-Speed 13609.85 samples/sec Loss 1.0491 LearningRate 0.0000 Epoch: 33 Global Step: 58520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:29:52,249-Speed 13661.85 samples/sec Loss 1.0502 LearningRate 0.0000 Epoch: 33 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-04 13:30:10,234-Speed 13666.07 samples/sec Loss 1.0470 LearningRate 0.0000 Epoch: 33 Global Step: 58540 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-04 13:30:28,338-Speed 13575.19 samples/sec Loss 1.0491 LearningRate 0.0000 Epoch: 33 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-04 13:30:46,449-Speed 13570.55 samples/sec Loss 1.0429 LearningRate 0.0000 Epoch: 33 Global Step: 58560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:31:04,451-Speed 13652.66 samples/sec Loss 1.0482 LearningRate 0.0000 Epoch: 33 Global Step: 58570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:31:22,502-Speed 13617.18 samples/sec Loss 1.0437 LearningRate 0.0000 Epoch: 33 Global Step: 58580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:31:40,646-Speed 13546.47 samples/sec Loss 1.0424 LearningRate 0.0000 Epoch: 33 Global Step: 58590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:31:58,650-Speed 13650.94 samples/sec Loss 1.0378 LearningRate 0.0000 Epoch: 33 Global Step: 58600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:32:16,757-Speed 13573.49 samples/sec Loss 1.0547 LearningRate 0.0000 Epoch: 33 Global Step: 58610 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-03-04 13:32:34,794-Speed 13625.91 samples/sec Loss 1.0469 LearningRate 0.0000 Epoch: 33 Global Step: 58620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:32:52,893-Speed 13580.03 samples/sec Loss 1.0477 LearningRate 0.0000 Epoch: 33 Global Step: 58630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:33:10,920-Speed 13633.49 samples/sec Loss 1.0551 LearningRate 0.0000 Epoch: 33 Global Step: 58640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:33:28,982-Speed 13607.43 samples/sec Loss 1.0503 LearningRate 0.0000 Epoch: 33 Global Step: 58650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:33:46,960-Speed 13670.62 samples/sec Loss 1.0460 LearningRate 0.0000 Epoch: 33 Global Step: 58660 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-04 13:34:04,947-Speed 13663.97 samples/sec Loss 1.0495 LearningRate 0.0000 Epoch: 33 Global Step: 58670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:34:23,057-Speed 13571.84 samples/sec Loss 1.0509 LearningRate 0.0000 Epoch: 33 Global Step: 58680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:34:41,035-Speed 13670.64 samples/sec Loss 1.0442 LearningRate 0.0000 Epoch: 33 Global Step: 58690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:34:58,995-Speed 13684.59 samples/sec Loss 1.0455 LearningRate 0.0000 Epoch: 33 Global Step: 58700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:35:16,993-Speed 13655.85 samples/sec Loss 1.0423 LearningRate 0.0000 Epoch: 33 Global Step: 58710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:35:34,983-Speed 13662.61 samples/sec Loss 1.0443 LearningRate 0.0000 Epoch: 33 Global Step: 58720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:35:53,065-Speed 13592.89 samples/sec Loss 1.0451 LearningRate 0.0000 Epoch: 33 Global Step: 58730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 
13:36:10,987-Speed 13713.57 samples/sec Loss 1.0511 LearningRate 0.0000 Epoch: 33 Global Step: 58740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:36:29,008-Speed 13638.21 samples/sec Loss 1.0542 LearningRate 0.0000 Epoch: 33 Global Step: 58750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:36:47,008-Speed 13654.55 samples/sec Loss 1.0472 LearningRate 0.0000 Epoch: 33 Global Step: 58760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:37:56,234-Speed 3550.19 samples/sec Loss 1.0436 LearningRate 0.0000 Epoch: 34 Global Step: 58770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:38:14,173-Speed 13700.57 samples/sec Loss 1.0386 LearningRate 0.0000 Epoch: 34 Global Step: 58780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:38:32,133-Speed 13684.62 samples/sec Loss 1.0512 LearningRate 0.0000 Epoch: 34 Global Step: 58790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:38:50,122-Speed 13662.36 samples/sec Loss 1.0375 LearningRate 0.0000 Epoch: 34 Global Step: 58800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:39:08,099-Speed 13672.56 samples/sec Loss 1.0427 LearningRate 0.0000 Epoch: 34 Global Step: 58810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:39:26,107-Speed 13647.67 samples/sec Loss 1.0358 LearningRate 0.0000 Epoch: 34 Global Step: 58820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:39:44,138-Speed 13630.70 samples/sec Loss 1.0417 LearningRate 0.0000 Epoch: 34 Global Step: 58830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:40:02,002-Speed 13758.02 samples/sec Loss 1.0404 LearningRate 0.0000 Epoch: 34 Global Step: 58840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:40:19,935-Speed 13705.99 samples/sec Loss 1.0440 LearningRate 0.0000 Epoch: 34 Global Step: 58850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:40:37,901-Speed 13679.89 samples/sec Loss 
1.0353 LearningRate 0.0000 Epoch: 34 Global Step: 58860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:40:55,843-Speed 13698.92 samples/sec Loss 1.0381 LearningRate 0.0000 Epoch: 34 Global Step: 58870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:41:13,825-Speed 13667.56 samples/sec Loss 1.0368 LearningRate 0.0000 Epoch: 34 Global Step: 58880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:41:31,790-Speed 13680.71 samples/sec Loss 1.0351 LearningRate 0.0000 Epoch: 34 Global Step: 58890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:41:49,723-Speed 13705.47 samples/sec Loss 1.0421 LearningRate 0.0000 Epoch: 34 Global Step: 58900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:42:07,793-Speed 13601.76 samples/sec Loss 1.0455 LearningRate 0.0000 Epoch: 34 Global Step: 58910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:42:25,786-Speed 13659.14 samples/sec Loss 1.0406 LearningRate 0.0000 Epoch: 34 Global Step: 58920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:42:43,840-Speed 13613.77 samples/sec Loss 1.0376 LearningRate 0.0000 Epoch: 34 Global Step: 58930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:43:01,746-Speed 13725.15 samples/sec Loss 1.0346 LearningRate 0.0000 Epoch: 34 Global Step: 58940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:43:19,721-Speed 13674.32 samples/sec Loss 1.0438 LearningRate 0.0000 Epoch: 34 Global Step: 58950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:43:37,665-Speed 13696.26 samples/sec Loss 1.0385 LearningRate 0.0000 Epoch: 34 Global Step: 58960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:43:55,649-Speed 13666.38 samples/sec Loss 1.0399 LearningRate 0.0000 Epoch: 34 Global Step: 58970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:44:13,608-Speed 13685.72 samples/sec Loss 1.0368 LearningRate 0.0000 Epoch: 34 Global 
Step: 58980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:44:31,541-Speed 13705.35 samples/sec Loss 1.0381 LearningRate 0.0000 Epoch: 34 Global Step: 58990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:44:49,528-Speed 13664.00 samples/sec Loss 1.0384 LearningRate 0.0000 Epoch: 34 Global Step: 59000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:45:07,503-Speed 13673.18 samples/sec Loss 1.0446 LearningRate 0.0000 Epoch: 34 Global Step: 59010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:45:25,453-Speed 13692.86 samples/sec Loss 1.0388 LearningRate 0.0000 Epoch: 34 Global Step: 59020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:45:43,422-Speed 13677.41 samples/sec Loss 1.0429 LearningRate 0.0000 Epoch: 34 Global Step: 59030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:46:01,308-Speed 13741.14 samples/sec Loss 1.0434 LearningRate 0.0000 Epoch: 34 Global Step: 59040 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:46:19,272-Speed 13681.88 samples/sec Loss 1.0442 LearningRate 0.0000 Epoch: 34 Global Step: 59050 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:46:37,252-Speed 13669.24 samples/sec Loss 1.0374 LearningRate 0.0000 Epoch: 34 Global Step: 59060 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:46:55,197-Speed 13696.29 samples/sec Loss 1.0396 LearningRate 0.0000 Epoch: 34 Global Step: 59070 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:47:13,126-Speed 13708.65 samples/sec Loss 1.0481 LearningRate 0.0000 Epoch: 34 Global Step: 59080 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:47:31,120-Speed 13658.24 samples/sec Loss 1.0446 LearningRate 0.0000 Epoch: 34 Global Step: 59090 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:47:49,029-Speed 13724.15 samples/sec Loss 1.0376 LearningRate 0.0000 Epoch: 34 Global Step: 59100 Fp16 Grad Scale: 8192 Required: 5 
hours Training: 2022-03-04 13:48:07,036-Speed 13648.76 samples/sec Loss 1.0410 LearningRate 0.0000 Epoch: 34 Global Step: 59110 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:48:25,046-Speed 13646.81 samples/sec Loss 1.0382 LearningRate 0.0000 Epoch: 34 Global Step: 59120 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:48:43,053-Speed 13648.93 samples/sec Loss 1.0444 LearningRate 0.0000 Epoch: 34 Global Step: 59130 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:49:01,040-Speed 13664.12 samples/sec Loss 1.0401 LearningRate 0.0000 Epoch: 34 Global Step: 59140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:49:19,117-Speed 13595.52 samples/sec Loss 1.0403 LearningRate 0.0000 Epoch: 34 Global Step: 59150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:49:37,105-Speed 13664.00 samples/sec Loss 1.0336 LearningRate 0.0000 Epoch: 34 Global Step: 59160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:49:55,089-Speed 13666.01 samples/sec Loss 1.0425 LearningRate 0.0000 Epoch: 34 Global Step: 59170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:50:13,091-Speed 13652.81 samples/sec Loss 1.0395 LearningRate 0.0000 Epoch: 34 Global Step: 59180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:50:31,063-Speed 13675.73 samples/sec Loss 1.0402 LearningRate 0.0000 Epoch: 34 Global Step: 59190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:50:49,077-Speed 13643.20 samples/sec Loss 1.0388 LearningRate 0.0000 Epoch: 34 Global Step: 59200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:51:07,098-Speed 13638.38 samples/sec Loss 1.0428 LearningRate 0.0000 Epoch: 34 Global Step: 59210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:51:25,060-Speed 13683.23 samples/sec Loss 1.0337 LearningRate 0.0000 Epoch: 34 Global Step: 59220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:51:43,038-Speed 
13671.88 samples/sec Loss 1.0368 LearningRate 0.0000 Epoch: 34 Global Step: 59230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:52:01,102-Speed 13605.74 samples/sec Loss 1.0436 LearningRate 0.0000 Epoch: 34 Global Step: 59240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:52:19,183-Speed 13592.68 samples/sec Loss 1.0379 LearningRate 0.0000 Epoch: 34 Global Step: 59250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:52:37,195-Speed 13644.97 samples/sec Loss 1.0395 LearningRate 0.0000 Epoch: 34 Global Step: 59260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:52:55,285-Speed 13586.61 samples/sec Loss 1.0353 LearningRate 0.0000 Epoch: 34 Global Step: 59270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:53:13,309-Speed 13636.12 samples/sec Loss 1.0286 LearningRate 0.0000 Epoch: 34 Global Step: 59280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 13:53:31,334-Speed 13634.86 samples/sec Loss 1.0348 LearningRate 0.0000 Epoch: 34 Global Step: 59290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:53:49,374-Speed 13623.57 samples/sec Loss 1.0378 LearningRate 0.0000 Epoch: 34 Global Step: 59300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:54:07,401-Speed 13634.44 samples/sec Loss 1.0376 LearningRate 0.0000 Epoch: 34 Global Step: 59310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:54:25,411-Speed 13647.28 samples/sec Loss 1.0335 LearningRate 0.0000 Epoch: 34 Global Step: 59320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:54:43,499-Speed 13587.56 samples/sec Loss 1.0323 LearningRate 0.0000 Epoch: 34 Global Step: 59330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:55:01,533-Speed 13628.29 samples/sec Loss 1.0421 LearningRate 0.0000 Epoch: 34 Global Step: 59340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:55:19,338-Speed 13804.28 samples/sec Loss 1.0370 
LearningRate 0.0000 Epoch: 34 Global Step: 59350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:55:37,045-Speed 13880.14 samples/sec Loss 1.0325 LearningRate 0.0000 Epoch: 34 Global Step: 59360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:55:54,697-Speed 13923.43 samples/sec Loss 1.0317 LearningRate 0.0000 Epoch: 34 Global Step: 59370 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:56:12,397-Speed 13885.03 samples/sec Loss 1.0276 LearningRate 0.0000 Epoch: 34 Global Step: 59380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:56:30,082-Speed 13898.19 samples/sec Loss 1.0323 LearningRate 0.0000 Epoch: 34 Global Step: 59390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:56:47,730-Speed 13927.37 samples/sec Loss 1.0429 LearningRate 0.0000 Epoch: 34 Global Step: 59400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:57:05,433-Speed 13883.21 samples/sec Loss 1.0366 LearningRate 0.0000 Epoch: 34 Global Step: 59410 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:57:23,185-Speed 13844.35 samples/sec Loss 1.0362 LearningRate 0.0000 Epoch: 34 Global Step: 59420 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:57:40,866-Speed 13901.49 samples/sec Loss 1.0306 LearningRate 0.0000 Epoch: 34 Global Step: 59430 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:57:58,603-Speed 13855.97 samples/sec Loss 1.0323 LearningRate 0.0000 Epoch: 34 Global Step: 59440 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:58:16,316-Speed 13875.78 samples/sec Loss 1.0379 LearningRate 0.0000 Epoch: 34 Global Step: 59450 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:58:34,019-Speed 13884.26 samples/sec Loss 1.0342 LearningRate 0.0000 Epoch: 34 Global Step: 59460 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 13:58:51,714-Speed 13889.89 samples/sec Loss 1.0309 LearningRate 0.0000 Epoch: 34 Global Step: 59470 Fp16 
Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:59:09,453-Speed 13854.67 samples/sec Loss 1.0360 LearningRate 0.0000 Epoch: 34 Global Step: 59480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:59:27,195-Speed 13853.19 samples/sec Loss 1.0374 LearningRate 0.0000 Epoch: 34 Global Step: 59490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 13:59:44,891-Speed 13888.36 samples/sec Loss 1.0277 LearningRate 0.0000 Epoch: 34 Global Step: 59500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:00:02,603-Speed 13876.67 samples/sec Loss 1.0398 LearningRate 0.0000 Epoch: 34 Global Step: 59510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:00:20,314-Speed 13876.89 samples/sec Loss 1.0340 LearningRate 0.0000 Epoch: 34 Global Step: 59520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:00:38,098-Speed 13819.97 samples/sec Loss 1.0308 LearningRate 0.0000 Epoch: 34 Global Step: 59530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:00:55,799-Speed 13884.29 samples/sec Loss 1.0353 LearningRate 0.0000 Epoch: 34 Global Step: 59540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:01:13,514-Speed 13875.50 samples/sec Loss 1.0250 LearningRate 0.0000 Epoch: 34 Global Step: 59550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:01:31,192-Speed 13902.45 samples/sec Loss 1.0312 LearningRate 0.0000 Epoch: 34 Global Step: 59560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:01:48,900-Speed 13879.77 samples/sec Loss 1.0308 LearningRate 0.0000 Epoch: 34 Global Step: 59570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:02:06,612-Speed 13876.09 samples/sec Loss 1.0317 LearningRate 0.0000 Epoch: 34 Global Step: 59580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:02:24,283-Speed 13908.48 samples/sec Loss 1.0338 LearningRate 0.0000 Epoch: 34 Global Step: 59590 Fp16 Grad Scale: 32768 Required: 5 hours 
Training: 2022-03-04 14:02:41,967-Speed 13898.33 samples/sec Loss 1.0350 LearningRate 0.0000 Epoch: 34 Global Step: 59600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:02:59,661-Speed 13890.63 samples/sec Loss 1.0310 LearningRate 0.0000 Epoch: 34 Global Step: 59610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:03:17,331-Speed 13908.74 samples/sec Loss 1.0354 LearningRate 0.0000 Epoch: 34 Global Step: 59620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:03:35,051-Speed 13870.13 samples/sec Loss 1.0253 LearningRate 0.0000 Epoch: 34 Global Step: 59630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:03:52,764-Speed 13875.24 samples/sec Loss 1.0227 LearningRate 0.0000 Epoch: 34 Global Step: 59640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:04:10,498-Speed 13859.22 samples/sec Loss 1.0277 LearningRate 0.0000 Epoch: 34 Global Step: 59650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:04:28,147-Speed 13926.42 samples/sec Loss 1.0305 LearningRate 0.0000 Epoch: 34 Global Step: 59660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:04:45,919-Speed 13828.71 samples/sec Loss 1.0320 LearningRate 0.0000 Epoch: 34 Global Step: 59670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:05:03,592-Speed 13907.47 samples/sec Loss 1.0242 LearningRate 0.0000 Epoch: 34 Global Step: 59680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:05:21,243-Speed 13925.45 samples/sec Loss 1.0292 LearningRate 0.0000 Epoch: 34 Global Step: 59690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:05:39,019-Speed 13826.22 samples/sec Loss 1.0236 LearningRate 0.0000 Epoch: 34 Global Step: 59700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:05:56,689-Speed 13908.77 samples/sec Loss 1.0290 LearningRate 0.0000 Epoch: 34 Global Step: 59710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:06:14,431-Speed 
13853.45 samples/sec Loss 1.0330 LearningRate 0.0000 Epoch: 34 Global Step: 59720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:06:32,142-Speed 13876.63 samples/sec Loss 1.0286 LearningRate 0.0000 Epoch: 34 Global Step: 59730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:06:49,859-Speed 13872.31 samples/sec Loss 1.0287 LearningRate 0.0000 Epoch: 34 Global Step: 59740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:07:07,572-Speed 13875.79 samples/sec Loss 1.0385 LearningRate 0.0000 Epoch: 34 Global Step: 59750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:07:25,241-Speed 13910.63 samples/sec Loss 1.0328 LearningRate 0.0000 Epoch: 34 Global Step: 59760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:07:43,071-Speed 13783.99 samples/sec Loss 1.0349 LearningRate 0.0000 Epoch: 34 Global Step: 59770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:08:00,748-Speed 13903.60 samples/sec Loss 1.0287 LearningRate 0.0000 Epoch: 34 Global Step: 59780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:08:18,456-Speed 13881.69 samples/sec Loss 1.0342 LearningRate 0.0000 Epoch: 34 Global Step: 59790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:08:36,152-Speed 13888.67 samples/sec Loss 1.0355 LearningRate 0.0000 Epoch: 34 Global Step: 59800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:08:53,903-Speed 13845.61 samples/sec Loss 1.0258 LearningRate 0.0000 Epoch: 34 Global Step: 59810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:09:11,632-Speed 13863.44 samples/sec Loss 1.0310 LearningRate 0.0000 Epoch: 34 Global Step: 59820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:09:29,347-Speed 13873.73 samples/sec Loss 1.0272 LearningRate 0.0000 Epoch: 34 Global Step: 59830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:09:47,010-Speed 13914.24 samples/sec Loss 1.0272 
LearningRate 0.0000 Epoch: 34 Global Step: 59840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:10:04,745-Speed 13858.94 samples/sec Loss 1.0276 LearningRate 0.0000 Epoch: 34 Global Step: 59850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:10:22,364-Speed 13949.73 samples/sec Loss 1.0247 LearningRate 0.0000 Epoch: 34 Global Step: 59860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:10:40,002-Speed 13934.19 samples/sec Loss 1.0179 LearningRate 0.0000 Epoch: 34 Global Step: 59870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:10:57,660-Speed 13918.28 samples/sec Loss 1.0270 LearningRate 0.0000 Epoch: 34 Global Step: 59880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:11:15,382-Speed 13868.76 samples/sec Loss 1.0283 LearningRate 0.0000 Epoch: 34 Global Step: 59890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:11:33,055-Speed 13907.05 samples/sec Loss 1.0298 LearningRate 0.0000 Epoch: 34 Global Step: 59900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:11:50,782-Speed 13864.42 samples/sec Loss 1.0262 LearningRate 0.0000 Epoch: 34 Global Step: 59910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:12:08,441-Speed 13917.47 samples/sec Loss 1.0285 LearningRate 0.0000 Epoch: 34 Global Step: 59920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:12:26,211-Speed 13831.40 samples/sec Loss 1.0309 LearningRate 0.0000 Epoch: 34 Global Step: 59930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:12:43,906-Speed 13889.55 samples/sec Loss 1.0237 LearningRate 0.0000 Epoch: 34 Global Step: 59940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:13:01,597-Speed 13892.50 samples/sec Loss 1.0261 LearningRate 0.0000 Epoch: 34 Global Step: 59950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:13:19,323-Speed 13864.70 samples/sec Loss 1.0256 LearningRate 0.0000 Epoch: 34 Global Step: 
59960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:13:37,095-Speed 13829.82 samples/sec Loss 1.0215 LearningRate 0.0000 Epoch: 34 Global Step: 59970 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-04 14:13:54,758-Speed 13915.06 samples/sec Loss 1.0302 LearningRate 0.0000 Epoch: 34 Global Step: 59980 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-03-04 14:14:12,451-Speed 13891.31 samples/sec Loss 1.0170 LearningRate 0.0000 Epoch: 34 Global Step: 59990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:14:30,181-Speed 13862.10 samples/sec Loss 1.0284 LearningRate 0.0000 Epoch: 34 Global Step: 60000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:14:47,982-Speed 13806.28 samples/sec Loss 1.0168 LearningRate 0.0000 Epoch: 34 Global Step: 60010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:15:05,659-Speed 13904.41 samples/sec Loss 1.0277 LearningRate 0.0000 Epoch: 34 Global Step: 60020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-03-04 14:15:23,365-Speed 13880.41 samples/sec Loss 1.0257 LearningRate 0.0000 Epoch: 34 Global Step: 60030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:15:41,040-Speed 13905.37 samples/sec Loss 1.0245 LearningRate 0.0000 Epoch: 34 Global Step: 60040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:15:58,715-Speed 13905.40 samples/sec Loss 1.0262 LearningRate 0.0000 Epoch: 34 Global Step: 60050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:16:16,425-Speed 13878.36 samples/sec Loss 1.0195 LearningRate 0.0000 Epoch: 34 Global Step: 60060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:16:34,180-Speed 13841.76 samples/sec Loss 1.0266 LearningRate 0.0000 Epoch: 34 Global Step: 60070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-03-04 14:16:51,937-Speed 13840.94 samples/sec Loss 1.0208 LearningRate 0.0000 Epoch: 34 Global Step: 60080 Fp16 Grad Scale: 16384 Required: 5 
hours Training: 2022-03-04 14:17:09,671-Speed 13859.24 samples/sec Loss 1.0190 LearningRate 0.0000 Epoch: 34 Global Step: 60090 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:17:27,398-Speed 13864.60 samples/sec Loss 1.0394 LearningRate 0.0000 Epoch: 34 Global Step: 60100 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:17:45,064-Speed 13913.05 samples/sec Loss 1.0248 LearningRate 0.0000 Epoch: 34 Global Step: 60110 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:18:02,740-Speed 13903.94 samples/sec Loss 1.0212 LearningRate 0.0000 Epoch: 34 Global Step: 60120 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:18:20,524-Speed 13830.27 samples/sec Loss 1.0294 LearningRate 0.0000 Epoch: 34 Global Step: 60130 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:18:38,275-Speed 13845.61 samples/sec Loss 1.0232 LearningRate 0.0000 Epoch: 34 Global Step: 60140 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:18:56,007-Speed 13860.92 samples/sec Loss 1.0262 LearningRate 0.0000 Epoch: 34 Global Step: 60150 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:19:13,696-Speed 13894.61 samples/sec Loss 1.0182 LearningRate 0.0000 Epoch: 34 Global Step: 60160 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-03-04 14:19:31,400-Speed 13882.31 samples/sec Loss 1.0177 LearningRate 0.0000 Epoch: 34 Global Step: 60170 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 14:19:49,144-Speed 13851.24 samples/sec Loss 1.0237 LearningRate 0.0000 Epoch: 34 Global Step: 60180 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 14:20:06,873-Speed 13862.73 samples/sec Loss 1.0152 LearningRate 0.0000 Epoch: 34 Global Step: 60190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:20:24,550-Speed 13903.72 samples/sec Loss 1.0177 LearningRate 0.0000 Epoch: 34 Global Step: 60200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:20:42,284-Speed 
13859.41 samples/sec Loss 1.0223 LearningRate 0.0000 Epoch: 34 Global Step: 60210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:20:59,950-Speed 13914.21 samples/sec Loss 1.0170 LearningRate 0.0000 Epoch: 34 Global Step: 60220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:21:17,670-Speed 13870.01 samples/sec Loss 1.0211 LearningRate 0.0000 Epoch: 34 Global Step: 60230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:21:35,394-Speed 13866.83 samples/sec Loss 1.0241 LearningRate 0.0000 Epoch: 34 Global Step: 60240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:21:53,153-Speed 13839.28 samples/sec Loss 1.0271 LearningRate 0.0000 Epoch: 34 Global Step: 60250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:22:10,831-Speed 13902.54 samples/sec Loss 1.0271 LearningRate 0.0000 Epoch: 34 Global Step: 60260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:22:28,550-Speed 13871.24 samples/sec Loss 1.0159 LearningRate 0.0000 Epoch: 34 Global Step: 60270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:22:46,216-Speed 13912.18 samples/sec Loss 1.0276 LearningRate 0.0000 Epoch: 34 Global Step: 60280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:23:03,939-Speed 13867.94 samples/sec Loss 1.0274 LearningRate 0.0000 Epoch: 34 Global Step: 60290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:23:21,703-Speed 13835.58 samples/sec Loss 1.0273 LearningRate 0.0000 Epoch: 34 Global Step: 60300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:23:39,462-Speed 13839.63 samples/sec Loss 1.0302 LearningRate 0.0000 Epoch: 34 Global Step: 60310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:23:57,274-Speed 13798.34 samples/sec Loss 1.0212 LearningRate 0.0000 Epoch: 34 Global Step: 60320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:24:14,987-Speed 13876.01 samples/sec Loss 1.0170 
LearningRate 0.0000 Epoch: 34 Global Step: 60330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:24:32,743-Speed 13841.63 samples/sec Loss 1.0237 LearningRate 0.0000 Epoch: 34 Global Step: 60340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:24:50,463-Speed 13869.60 samples/sec Loss 1.0190 LearningRate 0.0000 Epoch: 34 Global Step: 60350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:25:08,226-Speed 13836.55 samples/sec Loss 1.0199 LearningRate 0.0000 Epoch: 34 Global Step: 60360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:25:26,043-Speed 13794.80 samples/sec Loss 1.0209 LearningRate 0.0000 Epoch: 34 Global Step: 60370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:25:43,776-Speed 13861.20 samples/sec Loss 1.0233 LearningRate 0.0000 Epoch: 34 Global Step: 60380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:26:01,469-Speed 13890.41 samples/sec Loss 1.0282 LearningRate 0.0000 Epoch: 34 Global Step: 60390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:26:19,273-Speed 13805.06 samples/sec Loss 1.0307 LearningRate 0.0000 Epoch: 34 Global Step: 60400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:26:36,972-Speed 13886.38 samples/sec Loss 1.0221 LearningRate 0.0000 Epoch: 34 Global Step: 60410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:26:54,674-Speed 13883.80 samples/sec Loss 1.0205 LearningRate 0.0000 Epoch: 34 Global Step: 60420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:27:12,333-Speed 13918.06 samples/sec Loss 1.0339 LearningRate 0.0000 Epoch: 34 Global Step: 60430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:27:30,144-Speed 13798.80 samples/sec Loss 1.0184 LearningRate 0.0000 Epoch: 34 Global Step: 60440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:27:47,887-Speed 13852.01 samples/sec Loss 1.0221 LearningRate 0.0000 Epoch: 34 Global Step: 
60450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:28:05,595-Speed 13879.76 samples/sec Loss 1.0253 LearningRate 0.0000 Epoch: 34 Global Step: 60460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:28:23,324-Speed 13863.21 samples/sec Loss 1.0207 LearningRate 0.0000 Epoch: 34 Global Step: 60470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:28:41,036-Speed 13876.07 samples/sec Loss 1.0163 LearningRate 0.0000 Epoch: 34 Global Step: 60480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:28:58,709-Speed 13906.79 samples/sec Loss 1.0186 LearningRate 0.0000 Epoch: 34 Global Step: 60490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:30:07,663-Speed 3564.17 samples/sec Loss 1.0206 LearningRate 0.0000 Epoch: 35 Global Step: 60500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:30:25,362-Speed 13886.56 samples/sec Loss 1.0293 LearningRate 0.0000 Epoch: 35 Global Step: 60510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:30:43,013-Speed 13923.59 samples/sec Loss 1.0210 LearningRate 0.0000 Epoch: 35 Global Step: 60520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:31:00,665-Speed 13923.86 samples/sec Loss 1.0191 LearningRate 0.0000 Epoch: 35 Global Step: 60530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:31:18,368-Speed 13883.27 samples/sec Loss 1.0216 LearningRate 0.0000 Epoch: 35 Global Step: 60540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:31:36,119-Speed 13845.43 samples/sec Loss 1.0222 LearningRate 0.0000 Epoch: 35 Global Step: 60550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:31:53,795-Speed 13904.71 samples/sec Loss 1.0115 LearningRate 0.0000 Epoch: 35 Global Step: 60560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:32:11,522-Speed 13864.72 samples/sec Loss 1.0121 LearningRate 0.0000 Epoch: 35 Global Step: 60570 Fp16 Grad Scale: 16384 Required: 4 
hours Training: 2022-03-04 14:32:29,240-Speed 13871.99 samples/sec Loss 1.0171 LearningRate 0.0000 Epoch: 35 Global Step: 60580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:32:46,938-Speed 13886.41 samples/sec Loss 1.0110 LearningRate 0.0000 Epoch: 35 Global Step: 60590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:33:04,626-Speed 13894.92 samples/sec Loss 1.0166 LearningRate 0.0000 Epoch: 35 Global Step: 60600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:33:22,319-Speed 13891.94 samples/sec Loss 1.0163 LearningRate 0.0000 Epoch: 35 Global Step: 60610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:33:40,017-Speed 13887.60 samples/sec Loss 1.0154 LearningRate 0.0000 Epoch: 35 Global Step: 60620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:33:57,740-Speed 13867.31 samples/sec Loss 1.0124 LearningRate 0.0000 Epoch: 35 Global Step: 60630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:34:15,484-Speed 13851.31 samples/sec Loss 1.0185 LearningRate 0.0000 Epoch: 35 Global Step: 60640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:34:33,291-Speed 13802.21 samples/sec Loss 1.0206 LearningRate 0.0000 Epoch: 35 Global Step: 60650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:34:50,954-Speed 13914.85 samples/sec Loss 1.0179 LearningRate 0.0000 Epoch: 35 Global Step: 60660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:35:08,667-Speed 13875.65 samples/sec Loss 1.0196 LearningRate 0.0000 Epoch: 35 Global Step: 60670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:35:26,521-Speed 13765.77 samples/sec Loss 1.0122 LearningRate 0.0000 Epoch: 35 Global Step: 60680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:35:44,209-Speed 13895.05 samples/sec Loss 1.0123 LearningRate 0.0000 Epoch: 35 Global Step: 60690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 
14:36:01,872-Speed 13914.70 samples/sec Loss 1.0121 LearningRate 0.0000 Epoch: 35 Global Step: 60700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:36:19,644-Speed 13829.43 samples/sec Loss 1.0127 LearningRate 0.0000 Epoch: 35 Global Step: 60710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:36:37,423-Speed 13825.07 samples/sec Loss 1.0152 LearningRate 0.0000 Epoch: 35 Global Step: 60720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:36:55,163-Speed 13853.60 samples/sec Loss 1.0199 LearningRate 0.0000 Epoch: 35 Global Step: 60730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:37:12,901-Speed 13856.23 samples/sec Loss 1.0179 LearningRate 0.0000 Epoch: 35 Global Step: 60740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:37:30,620-Speed 13870.99 samples/sec Loss 1.0161 LearningRate 0.0000 Epoch: 35 Global Step: 60750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:37:48,354-Speed 13859.08 samples/sec Loss 1.0150 LearningRate 0.0000 Epoch: 35 Global Step: 60760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:38:06,030-Speed 13904.40 samples/sec Loss 1.0259 LearningRate 0.0000 Epoch: 35 Global Step: 60770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:38:23,731-Speed 13885.25 samples/sec Loss 1.0140 LearningRate 0.0000 Epoch: 35 Global Step: 60780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:38:41,461-Speed 13862.01 samples/sec Loss 1.0074 LearningRate 0.0000 Epoch: 35 Global Step: 60790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:38:59,107-Speed 13927.65 samples/sec Loss 1.0138 LearningRate 0.0000 Epoch: 35 Global Step: 60800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:39:16,793-Speed 13896.83 samples/sec Loss 1.0191 LearningRate 0.0000 Epoch: 35 Global Step: 60810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:39:34,543-Speed 13849.58 samples/sec 
Loss 1.0180 LearningRate 0.0000 Epoch: 35 Global Step: 60820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:39:52,397-Speed 13766.68 samples/sec Loss 1.0110 LearningRate 0.0000 Epoch: 35 Global Step: 60830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:40:10,227-Speed 13784.35 samples/sec Loss 1.0153 LearningRate 0.0000 Epoch: 35 Global Step: 60840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:40:27,894-Speed 13910.84 samples/sec Loss 1.0167 LearningRate 0.0000 Epoch: 35 Global Step: 60850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:40:45,559-Speed 13913.49 samples/sec Loss 1.0212 LearningRate 0.0000 Epoch: 35 Global Step: 60860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:41:03,252-Speed 13891.66 samples/sec Loss 1.0160 LearningRate 0.0000 Epoch: 35 Global Step: 60870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:41:20,938-Speed 13896.60 samples/sec Loss 1.0235 LearningRate 0.0000 Epoch: 35 Global Step: 60880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:41:38,589-Speed 13924.29 samples/sec Loss 1.0222 LearningRate 0.0000 Epoch: 35 Global Step: 60890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:41:56,320-Speed 13860.85 samples/sec Loss 1.0109 LearningRate 0.0000 Epoch: 35 Global Step: 60900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:42:13,983-Speed 13915.27 samples/sec Loss 1.0147 LearningRate 0.0000 Epoch: 35 Global Step: 60910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:42:31,682-Speed 13886.61 samples/sec Loss 1.0117 LearningRate 0.0000 Epoch: 35 Global Step: 60920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:42:49,360-Speed 13904.06 samples/sec Loss 1.0145 LearningRate 0.0000 Epoch: 35 Global Step: 60930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:43:07,151-Speed 13813.95 samples/sec Loss 1.0144 LearningRate 0.0000 Epoch: 35 
Global Step: 60940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:43:24,834-Speed 13899.12 samples/sec Loss 1.0191 LearningRate 0.0000 Epoch: 35 Global Step: 60950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:43:42,544-Speed 13878.83 samples/sec Loss 1.0136 LearningRate 0.0000 Epoch: 35 Global Step: 60960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:44:00,271-Speed 13865.01 samples/sec Loss 1.0119 LearningRate 0.0000 Epoch: 35 Global Step: 60970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:44:17,960-Speed 13894.63 samples/sec Loss 1.0149 LearningRate 0.0000 Epoch: 35 Global Step: 60980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:44:35,650-Speed 13893.39 samples/sec Loss 1.0116 LearningRate 0.0000 Epoch: 35 Global Step: 60990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:44:53,442-Speed 13813.92 samples/sec Loss 1.0274 LearningRate 0.0000 Epoch: 35 Global Step: 61000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:45:11,334-Speed 13736.90 samples/sec Loss 1.0109 LearningRate 0.0000 Epoch: 35 Global Step: 61010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:45:29,158-Speed 13789.33 samples/sec Loss 1.0229 LearningRate 0.0000 Epoch: 35 Global Step: 61020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:45:46,872-Speed 13874.38 samples/sec Loss 1.0068 LearningRate 0.0000 Epoch: 35 Global Step: 61030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:46:04,627-Speed 13843.35 samples/sec Loss 1.0101 LearningRate 0.0000 Epoch: 35 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:46:22,364-Speed 13856.07 samples/sec Loss 1.0080 LearningRate 0.0000 Epoch: 35 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:46:40,067-Speed 13883.35 samples/sec Loss 1.0142 LearningRate 0.0000 Epoch: 35 Global Step: 61060 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-03-04 14:46:57,741-Speed 13906.69 samples/sec Loss 1.0131 LearningRate 0.0000 Epoch: 35 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:47:15,417-Speed 13904.83 samples/sec Loss 1.0080 LearningRate 0.0000 Epoch: 35 Global Step: 61080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:47:33,105-Speed 13894.89 samples/sec Loss 1.0068 LearningRate 0.0000 Epoch: 35 Global Step: 61090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:47:50,835-Speed 13862.26 samples/sec Loss 1.0098 LearningRate 0.0000 Epoch: 35 Global Step: 61100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:48:08,588-Speed 13844.16 samples/sec Loss 1.0078 LearningRate 0.0000 Epoch: 35 Global Step: 61110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:48:26,397-Speed 13800.75 samples/sec Loss 1.0151 LearningRate 0.0000 Epoch: 35 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-04 14:48:44,130-Speed 13860.32 samples/sec Loss 1.0156 LearningRate 0.0000 Epoch: 35 Global Step: 61130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:49:01,891-Speed 13837.73 samples/sec Loss 1.0130 LearningRate 0.0000 Epoch: 35 Global Step: 61140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:49:19,629-Speed 13855.21 samples/sec Loss 1.0112 LearningRate 0.0000 Epoch: 35 Global Step: 61150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:49:37,420-Speed 13814.88 samples/sec Loss 1.0139 LearningRate 0.0000 Epoch: 35 Global Step: 61160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:49:55,211-Speed 13814.90 samples/sec Loss 1.0174 LearningRate 0.0000 Epoch: 35 Global Step: 61170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:50:12,971-Speed 13838.67 samples/sec Loss 1.0116 LearningRate 0.0000 Epoch: 35 Global Step: 61180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 
14:50:30,846-Speed 13749.61 samples/sec Loss 1.0158 LearningRate 0.0000 Epoch: 35 Global Step: 61190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:50:48,652-Speed 13803.81 samples/sec Loss 1.0187 LearningRate 0.0000 Epoch: 35 Global Step: 61200 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:51:06,349-Speed 13888.13 samples/sec Loss 1.0108 LearningRate 0.0000 Epoch: 35 Global Step: 61210 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:51:24,106-Speed 13840.65 samples/sec Loss 1.0149 LearningRate 0.0000 Epoch: 35 Global Step: 61220 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:51:41,971-Speed 13757.34 samples/sec Loss 1.0042 LearningRate 0.0000 Epoch: 35 Global Step: 61230 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:51:59,735-Speed 13835.51 samples/sec Loss 1.0157 LearningRate 0.0000 Epoch: 35 Global Step: 61240 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:52:17,465-Speed 13862.53 samples/sec Loss 1.0098 LearningRate 0.0000 Epoch: 35 Global Step: 61250 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:52:35,188-Speed 13867.74 samples/sec Loss 1.0131 LearningRate 0.0000 Epoch: 35 Global Step: 61260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:52:52,945-Speed 13840.60 samples/sec Loss 1.0073 LearningRate 0.0000 Epoch: 35 Global Step: 61270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:53:10,715-Speed 13831.04 samples/sec Loss 1.0127 LearningRate 0.0000 Epoch: 35 Global Step: 61280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:53:28,478-Speed 13837.13 samples/sec Loss 1.0081 LearningRate 0.0000 Epoch: 35 Global Step: 61290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:53:46,210-Speed 13859.83 samples/sec Loss 1.0133 LearningRate 0.0000 Epoch: 35 Global Step: 61300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:54:04,110-Speed 13730.60 samples/sec 
Loss 1.0022 LearningRate 0.0000 Epoch: 35 Global Step: 61310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:54:21,967-Speed 13763.88 samples/sec Loss 1.0102 LearningRate 0.0000 Epoch: 35 Global Step: 61320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:54:39,673-Speed 13880.77 samples/sec Loss 1.0122 LearningRate 0.0000 Epoch: 35 Global Step: 61330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:54:57,394-Speed 13869.41 samples/sec Loss 1.0088 LearningRate 0.0000 Epoch: 35 Global Step: 61340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:55:15,268-Speed 13750.12 samples/sec Loss 1.0003 LearningRate 0.0000 Epoch: 35 Global Step: 61350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:55:32,979-Speed 13877.71 samples/sec Loss 1.0034 LearningRate 0.0000 Epoch: 35 Global Step: 61360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:55:50,759-Speed 13823.53 samples/sec Loss 0.9977 LearningRate 0.0000 Epoch: 35 Global Step: 61370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:56:08,454-Speed 13889.65 samples/sec Loss 0.9994 LearningRate 0.0000 Epoch: 35 Global Step: 61380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:56:26,217-Speed 13836.25 samples/sec Loss 1.0008 LearningRate 0.0000 Epoch: 35 Global Step: 61390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:56:43,940-Speed 13868.01 samples/sec Loss 1.0007 LearningRate 0.0000 Epoch: 35 Global Step: 61400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:57:01,673-Speed 13860.30 samples/sec Loss 1.0075 LearningRate 0.0000 Epoch: 35 Global Step: 61410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:57:19,455-Speed 13821.66 samples/sec Loss 1.0012 LearningRate 0.0000 Epoch: 35 Global Step: 61420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 14:57:37,163-Speed 13878.97 samples/sec Loss 1.0126 LearningRate 0.0000 Epoch: 35 
Global Step: 61430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:57:54,920-Speed 13840.95 samples/sec Loss 1.0178 LearningRate 0.0000 Epoch: 35 Global Step: 61440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:58:12,649-Speed 13863.10 samples/sec Loss 1.0124 LearningRate 0.0000 Epoch: 35 Global Step: 61450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:58:30,416-Speed 13833.44 samples/sec Loss 1.0130 LearningRate 0.0000 Epoch: 35 Global Step: 61460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:58:48,173-Speed 13841.22 samples/sec Loss 1.0072 LearningRate 0.0000 Epoch: 35 Global Step: 61470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:59:05,992-Speed 13792.49 samples/sec Loss 1.0060 LearningRate 0.0000 Epoch: 35 Global Step: 61480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:59:23,677-Speed 13897.26 samples/sec Loss 0.9988 LearningRate 0.0000 Epoch: 35 Global Step: 61490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:59:41,406-Speed 13863.26 samples/sec Loss 1.0087 LearningRate 0.0000 Epoch: 35 Global Step: 61500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 14:59:59,135-Speed 13863.24 samples/sec Loss 1.0049 LearningRate 0.0000 Epoch: 35 Global Step: 61510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:00:16,857-Speed 13868.36 samples/sec Loss 0.9945 LearningRate 0.0000 Epoch: 35 Global Step: 61520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:00:34,591-Speed 13858.42 samples/sec Loss 1.0158 LearningRate 0.0000 Epoch: 35 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-04 15:00:52,314-Speed 13868.04 samples/sec Loss 1.0045 LearningRate 0.0000 Epoch: 35 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-03-04 15:01:10,026-Speed 13876.40 samples/sec Loss 1.0084 LearningRate 0.0000 Epoch: 35 Global Step: 61550 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-03-04 15:01:27,748-Speed 13867.94 samples/sec Loss 0.9982 LearningRate 0.0000 Epoch: 35 Global Step: 61560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:01:45,482-Speed 13859.17 samples/sec Loss 1.0120 LearningRate 0.0000 Epoch: 35 Global Step: 61570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:02:03,278-Speed 13810.33 samples/sec Loss 1.0096 LearningRate 0.0000 Epoch: 35 Global Step: 61580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:02:21,024-Speed 13852.86 samples/sec Loss 1.0061 LearningRate 0.0000 Epoch: 35 Global Step: 61590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:02:38,757-Speed 13859.89 samples/sec Loss 1.0112 LearningRate 0.0000 Epoch: 35 Global Step: 61600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:02:56,520-Speed 13836.07 samples/sec Loss 0.9982 LearningRate 0.0000 Epoch: 35 Global Step: 61610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:03:14,169-Speed 13925.37 samples/sec Loss 1.0004 LearningRate 0.0000 Epoch: 35 Global Step: 61620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:03:31,806-Speed 13936.12 samples/sec Loss 1.0052 LearningRate 0.0000 Epoch: 35 Global Step: 61630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:03:49,526-Speed 13870.90 samples/sec Loss 1.0115 LearningRate 0.0000 Epoch: 35 Global Step: 61640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:04:07,282-Speed 13841.31 samples/sec Loss 1.0162 LearningRate 0.0000 Epoch: 35 Global Step: 61650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:04:24,977-Speed 13893.30 samples/sec Loss 1.0119 LearningRate 0.0000 Epoch: 35 Global Step: 61660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:04:42,674-Speed 13888.19 samples/sec Loss 1.0052 LearningRate 0.0000 Epoch: 35 Global Step: 61670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 
15:05:00,401-Speed 13864.04 samples/sec Loss 1.0030 LearningRate 0.0000 Epoch: 35 Global Step: 61680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:05:18,077-Speed 13905.56 samples/sec Loss 0.9968 LearningRate 0.0000 Epoch: 35 Global Step: 61690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:05:35,820-Speed 13851.83 samples/sec Loss 1.0036 LearningRate 0.0000 Epoch: 35 Global Step: 61700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:05:53,554-Speed 13858.63 samples/sec Loss 1.0036 LearningRate 0.0000 Epoch: 35 Global Step: 61710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:06:11,397-Speed 13774.34 samples/sec Loss 1.0073 LearningRate 0.0000 Epoch: 35 Global Step: 61720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:06:29,192-Speed 13811.43 samples/sec Loss 1.0104 LearningRate 0.0000 Epoch: 35 Global Step: 61730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:06:46,989-Speed 13810.47 samples/sec Loss 1.0025 LearningRate 0.0000 Epoch: 35 Global Step: 61740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:07:04,729-Speed 13853.99 samples/sec Loss 0.9985 LearningRate 0.0000 Epoch: 35 Global Step: 61750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:07:22,469-Speed 13854.73 samples/sec Loss 0.9989 LearningRate 0.0000 Epoch: 35 Global Step: 61760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:07:40,166-Speed 13887.57 samples/sec Loss 1.0014 LearningRate 0.0000 Epoch: 35 Global Step: 61770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:07:57,933-Speed 13833.32 samples/sec Loss 0.9960 LearningRate 0.0000 Epoch: 35 Global Step: 61780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:08:15,741-Speed 13801.32 samples/sec Loss 1.0067 LearningRate 0.0000 Epoch: 35 Global Step: 61790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:08:33,475-Speed 13859.27 samples/sec 
Loss 1.0043 LearningRate 0.0000 Epoch: 35 Global Step: 61800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:08:51,314-Speed 13777.57 samples/sec Loss 1.0043 LearningRate 0.0000 Epoch: 35 Global Step: 61810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:09:09,085-Speed 13829.67 samples/sec Loss 1.0079 LearningRate 0.0000 Epoch: 35 Global Step: 61820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:09:26,864-Speed 13824.05 samples/sec Loss 1.0048 LearningRate 0.0000 Epoch: 35 Global Step: 61830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:09:44,628-Speed 13835.66 samples/sec Loss 1.0030 LearningRate 0.0000 Epoch: 35 Global Step: 61840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:10:02,466-Speed 13778.55 samples/sec Loss 1.0022 LearningRate 0.0000 Epoch: 35 Global Step: 61850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:10:20,334-Speed 13754.31 samples/sec Loss 1.0110 LearningRate 0.0000 Epoch: 35 Global Step: 61860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:10:38,187-Speed 13767.11 samples/sec Loss 1.0031 LearningRate 0.0000 Epoch: 35 Global Step: 61870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:10:55,958-Speed 13829.67 samples/sec Loss 0.9954 LearningRate 0.0000 Epoch: 35 Global Step: 61880 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:11:13,823-Speed 13757.81 samples/sec Loss 1.0045 LearningRate 0.0000 Epoch: 35 Global Step: 61890 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:11:31,695-Speed 13752.02 samples/sec Loss 0.9974 LearningRate 0.0000 Epoch: 35 Global Step: 61900 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:11:49,502-Speed 13801.99 samples/sec Loss 1.0028 LearningRate 0.0000 Epoch: 35 Global Step: 61910 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:12:07,369-Speed 13755.72 samples/sec Loss 1.0012 LearningRate 0.0000 Epoch: 35 
Global Step: 61920 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:12:25,314-Speed 13696.15 samples/sec Loss 1.0043 LearningRate 0.0000 Epoch: 35 Global Step: 61930 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:12:43,238-Speed 13712.52 samples/sec Loss 1.0043 LearningRate 0.0000 Epoch: 35 Global Step: 61940 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:13:01,012-Speed 13828.40 samples/sec Loss 1.0028 LearningRate 0.0000 Epoch: 35 Global Step: 61950 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:13:18,782-Speed 13830.25 samples/sec Loss 1.0112 LearningRate 0.0000 Epoch: 35 Global Step: 61960 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:13:36,653-Speed 13753.31 samples/sec Loss 0.9980 LearningRate 0.0000 Epoch: 35 Global Step: 61970 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-03-04 15:13:54,544-Speed 13737.39 samples/sec Loss 1.0065 LearningRate 0.0000 Epoch: 35 Global Step: 61980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:14:12,402-Speed 13762.59 samples/sec Loss 1.0010 LearningRate 0.0000 Epoch: 35 Global Step: 61990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:14:30,228-Speed 13787.86 samples/sec Loss 1.0014 LearningRate 0.0000 Epoch: 35 Global Step: 62000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:14:48,095-Speed 13755.95 samples/sec Loss 0.9999 LearningRate 0.0000 Epoch: 35 Global Step: 62010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:15:05,949-Speed 13765.66 samples/sec Loss 1.0092 LearningRate 0.0000 Epoch: 35 Global Step: 62020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:15:23,818-Speed 13754.47 samples/sec Loss 0.9971 LearningRate 0.0000 Epoch: 35 Global Step: 62030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:15:41,682-Speed 13758.50 samples/sec Loss 1.0114 LearningRate 0.0000 Epoch: 35 Global Step: 62040 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-03-04 15:15:59,559-Speed 13748.50 samples/sec Loss 0.9939 LearningRate 0.0000 Epoch: 35 Global Step: 62050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:16:17,381-Speed 13790.76 samples/sec Loss 1.0027 LearningRate 0.0000 Epoch: 35 Global Step: 62060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:16:35,240-Speed 13762.19 samples/sec Loss 0.9978 LearningRate 0.0000 Epoch: 35 Global Step: 62070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:16:53,170-Speed 13707.00 samples/sec Loss 1.0072 LearningRate 0.0000 Epoch: 35 Global Step: 62080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:17:10,964-Speed 13812.13 samples/sec Loss 1.0089 LearningRate 0.0000 Epoch: 35 Global Step: 62090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:17:28,801-Speed 13779.63 samples/sec Loss 0.9988 LearningRate 0.0000 Epoch: 35 Global Step: 62100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:17:46,638-Speed 13778.95 samples/sec Loss 1.0025 LearningRate 0.0000 Epoch: 35 Global Step: 62110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:18:04,487-Speed 13771.07 samples/sec Loss 1.0013 LearningRate 0.0000 Epoch: 35 Global Step: 62120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:18:22,382-Speed 13734.19 samples/sec Loss 1.0005 LearningRate 0.0000 Epoch: 35 Global Step: 62130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:18:40,238-Speed 13764.97 samples/sec Loss 1.0023 LearningRate 0.0000 Epoch: 35 Global Step: 62140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-03-04 15:18:58,019-Speed 13821.91 samples/sec Loss 1.0011 LearningRate 0.0000 Epoch: 35 Global Step: 62150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-03-04 15:19:15,904-Speed 13742.49 samples/sec Loss 1.0065 LearningRate 0.0000 Epoch: 35 Global Step: 62160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 
15:19:33,720-Speed 13794.44 samples/sec Loss 1.0041 LearningRate 0.0000 Epoch: 35 Global Step: 62170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:19:51,515-Speed 13811.95 samples/sec Loss 1.0022 LearningRate 0.0000 Epoch: 35 Global Step: 62180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:20:09,346-Speed 13783.90 samples/sec Loss 1.0019 LearningRate 0.0000 Epoch: 35 Global Step: 62190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:20:27,277-Speed 13706.15 samples/sec Loss 1.0037 LearningRate 0.0000 Epoch: 35 Global Step: 62200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:20:45,171-Speed 13734.95 samples/sec Loss 0.9951 LearningRate 0.0000 Epoch: 35 Global Step: 62210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:21:53,981-Speed 3571.67 samples/sec Loss 1.0056 LearningRate 0.0000 Epoch: 36 Global Step: 62220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:22:11,757-Speed 13825.93 samples/sec Loss 0.9978 LearningRate 0.0000 Epoch: 36 Global Step: 62230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:22:29,514-Speed 13840.76 samples/sec Loss 0.9970 LearningRate 0.0000 Epoch: 36 Global Step: 62240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:22:47,343-Speed 13785.02 samples/sec Loss 1.0043 LearningRate 0.0000 Epoch: 36 Global Step: 62250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:23:05,172-Speed 13785.73 samples/sec Loss 1.0038 LearningRate 0.0000 Epoch: 36 Global Step: 62260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:23:22,955-Speed 13822.05 samples/sec Loss 0.9985 LearningRate 0.0000 Epoch: 36 Global Step: 62270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:23:40,741-Speed 13818.18 samples/sec Loss 0.9916 LearningRate 0.0000 Epoch: 36 Global Step: 62280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:23:58,576-Speed 13780.46 samples/sec Loss 
0.9994 LearningRate 0.0000 Epoch: 36 Global Step: 62290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:24:16,366-Speed 13815.26 samples/sec Loss 0.9995 LearningRate 0.0000 Epoch: 36 Global Step: 62300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:24:34,182-Speed 13795.06 samples/sec Loss 0.9983 LearningRate 0.0000 Epoch: 36 Global Step: 62310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:24:52,188-Speed 13650.13 samples/sec Loss 1.0022 LearningRate 0.0000 Epoch: 36 Global Step: 62320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:25:09,987-Speed 13808.30 samples/sec Loss 0.9900 LearningRate 0.0000 Epoch: 36 Global Step: 62330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:25:27,829-Speed 13775.02 samples/sec Loss 0.9957 LearningRate 0.0000 Epoch: 36 Global Step: 62340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:25:45,622-Speed 13813.44 samples/sec Loss 0.9891 LearningRate 0.0000 Epoch: 36 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 15:26:03,429-Speed 13802.65 samples/sec Loss 1.0042 LearningRate 0.0000 Epoch: 36 Global Step: 62360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:26:21,261-Speed 13782.39 samples/sec Loss 0.9983 LearningRate 0.0000 Epoch: 36 Global Step: 62370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:26:39,017-Speed 13841.56 samples/sec Loss 0.9962 LearningRate 0.0000 Epoch: 36 Global Step: 62380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:26:56,836-Speed 13793.28 samples/sec Loss 0.9940 LearningRate 0.0000 Epoch: 36 Global Step: 62390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:27:14,618-Speed 13821.50 samples/sec Loss 0.9996 LearningRate 0.0000 Epoch: 36 Global Step: 62400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:27:32,570-Speed 13690.99 samples/sec Loss 0.9955 LearningRate 0.0000 Epoch: 36 Global 
Step: 62410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:27:50,357-Speed 13817.78 samples/sec Loss 0.9929 LearningRate 0.0000 Epoch: 36 Global Step: 62420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:28:08,247-Speed 13738.33 samples/sec Loss 0.9993 LearningRate 0.0000 Epoch: 36 Global Step: 62430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:28:26,157-Speed 13722.52 samples/sec Loss 1.0016 LearningRate 0.0000 Epoch: 36 Global Step: 62440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:28:43,948-Speed 13814.90 samples/sec Loss 0.9898 LearningRate 0.0000 Epoch: 36 Global Step: 62450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:29:01,782-Speed 13781.05 samples/sec Loss 0.9995 LearningRate 0.0000 Epoch: 36 Global Step: 62460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:29:19,567-Speed 13819.07 samples/sec Loss 0.9920 LearningRate 0.0000 Epoch: 36 Global Step: 62470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:29:37,482-Speed 13719.28 samples/sec Loss 0.9973 LearningRate 0.0000 Epoch: 36 Global Step: 62480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:29:55,368-Speed 13741.05 samples/sec Loss 0.9952 LearningRate 0.0000 Epoch: 36 Global Step: 62490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:30:13,175-Speed 13801.88 samples/sec Loss 0.9951 LearningRate 0.0000 Epoch: 36 Global Step: 62500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:30:31,038-Speed 13759.48 samples/sec Loss 0.9968 LearningRate 0.0000 Epoch: 36 Global Step: 62510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:30:49,034-Speed 13657.57 samples/sec Loss 0.9958 LearningRate 0.0000 Epoch: 36 Global Step: 62520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:31:06,900-Speed 13756.34 samples/sec Loss 1.0033 LearningRate 0.0000 Epoch: 36 Global Step: 62530 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-03-04 15:31:24,678-Speed 13824.97 samples/sec Loss 0.9942 LearningRate 0.0000 Epoch: 36 Global Step: 62540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:31:42,611-Speed 13704.89 samples/sec Loss 1.0032 LearningRate 0.0000 Epoch: 36 Global Step: 62550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:32:00,492-Speed 13745.13 samples/sec Loss 0.9985 LearningRate 0.0000 Epoch: 36 Global Step: 62560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:32:18,320-Speed 13785.71 samples/sec Loss 0.9967 LearningRate 0.0000 Epoch: 36 Global Step: 62570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:32:36,212-Speed 13736.46 samples/sec Loss 0.9989 LearningRate 0.0000 Epoch: 36 Global Step: 62580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:32:54,061-Speed 13769.86 samples/sec Loss 0.9989 LearningRate 0.0000 Epoch: 36 Global Step: 62590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:33:11,945-Speed 13744.34 samples/sec Loss 1.0012 LearningRate 0.0000 Epoch: 36 Global Step: 62600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:33:29,862-Speed 13717.49 samples/sec Loss 0.9948 LearningRate 0.0000 Epoch: 36 Global Step: 62610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:33:47,813-Speed 13691.89 samples/sec Loss 0.9943 LearningRate 0.0000 Epoch: 36 Global Step: 62620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:34:05,766-Speed 13689.64 samples/sec Loss 0.9982 LearningRate 0.0000 Epoch: 36 Global Step: 62630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:34:23,567-Speed 13806.93 samples/sec Loss 0.9974 LearningRate 0.0000 Epoch: 36 Global Step: 62640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:34:41,403-Speed 13780.03 samples/sec Loss 0.9881 LearningRate 0.0000 Epoch: 36 Global Step: 62650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 
15:34:59,280-Speed 13747.72 samples/sec Loss 1.0004 LearningRate 0.0000 Epoch: 36 Global Step: 62660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:35:17,322-Speed 13622.80 samples/sec Loss 0.9981 LearningRate 0.0000 Epoch: 36 Global Step: 62670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:35:35,157-Speed 13780.77 samples/sec Loss 1.0001 LearningRate 0.0000 Epoch: 36 Global Step: 62680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:35:53,058-Speed 13729.16 samples/sec Loss 0.9995 LearningRate 0.0000 Epoch: 36 Global Step: 62690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:36:11,023-Speed 13683.32 samples/sec Loss 0.9932 LearningRate 0.0000 Epoch: 36 Global Step: 62700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:36:28,875-Speed 13766.58 samples/sec Loss 0.9853 LearningRate 0.0000 Epoch: 36 Global Step: 62710 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 15:36:46,671-Speed 13811.40 samples/sec Loss 0.9944 LearningRate 0.0000 Epoch: 36 Global Step: 62720 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 15:37:04,554-Speed 13743.06 samples/sec Loss 0.9929 LearningRate 0.0000 Epoch: 36 Global Step: 62730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:37:22,347-Speed 13813.64 samples/sec Loss 0.9956 LearningRate 0.0000 Epoch: 36 Global Step: 62740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:37:40,178-Speed 13783.10 samples/sec Loss 1.0034 LearningRate 0.0000 Epoch: 36 Global Step: 62750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:37:57,962-Speed 13819.88 samples/sec Loss 0.9917 LearningRate 0.0000 Epoch: 36 Global Step: 62760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:38:15,865-Speed 13728.62 samples/sec Loss 0.9950 LearningRate 0.0000 Epoch: 36 Global Step: 62770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:38:33,774-Speed 13723.05 samples/sec 
Loss 0.9872 LearningRate 0.0000 Epoch: 36 Global Step: 62780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:38:51,732-Speed 13686.17 samples/sec Loss 0.9965 LearningRate 0.0000 Epoch: 36 Global Step: 62790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:39:09,584-Speed 13769.64 samples/sec Loss 0.9950 LearningRate 0.0000 Epoch: 36 Global Step: 62800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:39:27,408-Speed 13789.55 samples/sec Loss 0.9920 LearningRate 0.0000 Epoch: 36 Global Step: 62810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:39:45,409-Speed 13653.51 samples/sec Loss 0.9926 LearningRate 0.0000 Epoch: 36 Global Step: 62820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:40:03,293-Speed 13742.88 samples/sec Loss 0.9998 LearningRate 0.0000 Epoch: 36 Global Step: 62830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:40:21,205-Speed 13721.18 samples/sec Loss 0.9911 LearningRate 0.0000 Epoch: 36 Global Step: 62840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:40:39,055-Speed 13768.71 samples/sec Loss 0.9880 LearningRate 0.0000 Epoch: 36 Global Step: 62850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:40:56,982-Speed 13710.04 samples/sec Loss 0.9941 LearningRate 0.0000 Epoch: 36 Global Step: 62860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:41:14,763-Speed 13821.98 samples/sec Loss 0.9900 LearningRate 0.0000 Epoch: 36 Global Step: 62870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:41:32,545-Speed 13822.14 samples/sec Loss 1.0052 LearningRate 0.0000 Epoch: 36 Global Step: 62880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:41:50,458-Speed 13720.42 samples/sec Loss 0.9980 LearningRate 0.0000 Epoch: 36 Global Step: 62890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:42:08,299-Speed 13775.75 samples/sec Loss 0.9898 LearningRate 0.0000 Epoch: 36 
Global Step: 62900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:42:26,263-Speed 13681.94 samples/sec Loss 0.9939 LearningRate 0.0000 Epoch: 36 Global Step: 62910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:42:44,140-Speed 13747.95 samples/sec Loss 0.9964 LearningRate 0.0000 Epoch: 36 Global Step: 62920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:43:02,027-Speed 13740.81 samples/sec Loss 1.0020 LearningRate 0.0000 Epoch: 36 Global Step: 62930 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 15:43:19,824-Speed 13809.91 samples/sec Loss 0.9926 LearningRate 0.0000 Epoch: 36 Global Step: 62940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:43:37,646-Speed 13790.33 samples/sec Loss 0.9944 LearningRate 0.0000 Epoch: 36 Global Step: 62950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:43:55,533-Speed 13740.22 samples/sec Loss 0.9919 LearningRate 0.0000 Epoch: 36 Global Step: 62960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:44:13,352-Speed 13793.16 samples/sec Loss 0.9901 LearningRate 0.0000 Epoch: 36 Global Step: 62970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:44:31,273-Speed 13714.25 samples/sec Loss 0.9936 LearningRate 0.0000 Epoch: 36 Global Step: 62980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:44:49,134-Speed 13760.79 samples/sec Loss 0.9934 LearningRate 0.0000 Epoch: 36 Global Step: 62990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:45:06,995-Speed 13759.72 samples/sec Loss 0.9920 LearningRate 0.0000 Epoch: 36 Global Step: 63000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:45:24,837-Speed 13775.16 samples/sec Loss 0.9947 LearningRate 0.0000 Epoch: 36 Global Step: 63010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:45:42,737-Speed 13731.16 samples/sec Loss 0.9908 LearningRate 0.0000 Epoch: 36 Global Step: 63020 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-03-04 15:46:00,571-Speed 13781.32 samples/sec Loss 0.9923 LearningRate 0.0000 Epoch: 36 Global Step: 63030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:46:18,398-Speed 13786.76 samples/sec Loss 0.9973 LearningRate 0.0000 Epoch: 36 Global Step: 63040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:46:36,334-Speed 13702.46 samples/sec Loss 0.9837 LearningRate 0.0000 Epoch: 36 Global Step: 63050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:46:54,191-Speed 13763.67 samples/sec Loss 0.9912 LearningRate 0.0000 Epoch: 36 Global Step: 63060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:47:11,971-Speed 13823.43 samples/sec Loss 0.9893 LearningRate 0.0000 Epoch: 36 Global Step: 63070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:47:29,796-Speed 13787.80 samples/sec Loss 0.9929 LearningRate 0.0000 Epoch: 36 Global Step: 63080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:47:47,645-Speed 13769.85 samples/sec Loss 0.9791 LearningRate 0.0000 Epoch: 36 Global Step: 63090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:48:05,536-Speed 13737.75 samples/sec Loss 0.9889 LearningRate 0.0000 Epoch: 36 Global Step: 63100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:48:23,352-Speed 13795.03 samples/sec Loss 0.9913 LearningRate 0.0000 Epoch: 36 Global Step: 63110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:48:41,213-Speed 13760.40 samples/sec Loss 0.9920 LearningRate 0.0000 Epoch: 36 Global Step: 63120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:48:59,027-Speed 13796.77 samples/sec Loss 0.9859 LearningRate 0.0000 Epoch: 36 Global Step: 63130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:49:16,856-Speed 13785.54 samples/sec Loss 0.9826 LearningRate 0.0000 Epoch: 36 Global Step: 63140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 
15:49:34,691-Speed 13780.83 samples/sec Loss 0.9830 LearningRate 0.0000 Epoch: 36 Global Step: 63150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:49:52,557-Speed 13756.57 samples/sec Loss 0.9872 LearningRate 0.0000 Epoch: 36 Global Step: 63160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:50:10,392-Speed 13781.18 samples/sec Loss 0.9828 LearningRate 0.0000 Epoch: 36 Global Step: 63170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:50:28,258-Speed 13756.70 samples/sec Loss 0.9937 LearningRate 0.0000 Epoch: 36 Global Step: 63180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:50:46,101-Speed 13774.67 samples/sec Loss 0.9856 LearningRate 0.0000 Epoch: 36 Global Step: 63190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:51:04,022-Speed 13713.82 samples/sec Loss 0.9891 LearningRate 0.0000 Epoch: 36 Global Step: 63200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:51:21,836-Speed 13796.82 samples/sec Loss 0.9862 LearningRate 0.0000 Epoch: 36 Global Step: 63210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:51:39,665-Speed 13785.87 samples/sec Loss 0.9902 LearningRate 0.0000 Epoch: 36 Global Step: 63220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:51:57,411-Speed 13849.09 samples/sec Loss 0.9872 LearningRate 0.0000 Epoch: 36 Global Step: 63230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:52:15,179-Speed 13832.80 samples/sec Loss 0.9920 LearningRate 0.0000 Epoch: 36 Global Step: 63240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:52:33,118-Speed 13700.31 samples/sec Loss 0.9864 LearningRate 0.0000 Epoch: 36 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 15:52:50,888-Speed 13830.93 samples/sec Loss 0.9956 LearningRate 0.0000 Epoch: 36 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 15:53:08,726-Speed 13778.24 samples/sec 
Loss 0.9925 LearningRate 0.0000 Epoch: 36 Global Step: 63270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:53:26,495-Speed 13831.58 samples/sec Loss 0.9882 LearningRate 0.0000 Epoch: 36 Global Step: 63280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:53:44,248-Speed 13844.82 samples/sec Loss 0.9970 LearningRate 0.0000 Epoch: 36 Global Step: 63290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:54:02,143-Speed 13734.42 samples/sec Loss 0.9903 LearningRate 0.0000 Epoch: 36 Global Step: 63300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:54:19,880-Speed 13856.35 samples/sec Loss 0.9939 LearningRate 0.0000 Epoch: 36 Global Step: 63310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:54:37,685-Speed 13803.42 samples/sec Loss 0.9891 LearningRate 0.0000 Epoch: 36 Global Step: 63320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:54:55,524-Speed 13777.63 samples/sec Loss 0.9968 LearningRate 0.0000 Epoch: 36 Global Step: 63330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:55:13,358-Speed 13781.63 samples/sec Loss 0.9900 LearningRate 0.0000 Epoch: 36 Global Step: 63340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:55:31,140-Speed 13821.63 samples/sec Loss 0.9900 LearningRate 0.0000 Epoch: 36 Global Step: 63350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:55:48,962-Speed 13790.76 samples/sec Loss 0.9866 LearningRate 0.0000 Epoch: 36 Global Step: 63360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:56:06,790-Speed 13786.10 samples/sec Loss 0.9960 LearningRate 0.0000 Epoch: 36 Global Step: 63370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:56:24,650-Speed 13761.80 samples/sec Loss 0.9916 LearningRate 0.0000 Epoch: 36 Global Step: 63380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:56:42,561-Speed 13721.93 samples/sec Loss 0.9896 LearningRate 0.0000 Epoch: 36 
Global Step: 63390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:57:00,366-Speed 13803.44 samples/sec Loss 0.9853 LearningRate 0.0000 Epoch: 36 Global Step: 63400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:57:18,136-Speed 13830.97 samples/sec Loss 0.9856 LearningRate 0.0000 Epoch: 36 Global Step: 63410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:57:35,908-Speed 13829.21 samples/sec Loss 0.9938 LearningRate 0.0000 Epoch: 36 Global Step: 63420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:57:53,680-Speed 13829.59 samples/sec Loss 0.9923 LearningRate 0.0000 Epoch: 36 Global Step: 63430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:58:11,531-Speed 13768.03 samples/sec Loss 0.9849 LearningRate 0.0000 Epoch: 36 Global Step: 63440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 15:58:29,384-Speed 13766.44 samples/sec Loss 0.9791 LearningRate 0.0000 Epoch: 36 Global Step: 63450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:58:47,193-Speed 13801.14 samples/sec Loss 0.9835 LearningRate 0.0000 Epoch: 36 Global Step: 63460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:59:05,054-Speed 13760.30 samples/sec Loss 0.9831 LearningRate 0.0000 Epoch: 36 Global Step: 63470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:59:22,852-Speed 13809.41 samples/sec Loss 0.9861 LearningRate 0.0000 Epoch: 36 Global Step: 63480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:59:40,686-Speed 13781.13 samples/sec Loss 0.9829 LearningRate 0.0000 Epoch: 36 Global Step: 63490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 15:59:58,707-Speed 13638.03 samples/sec Loss 0.9880 LearningRate 0.0000 Epoch: 36 Global Step: 63500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:00:17,004-Speed 13433.30 samples/sec Loss 0.9931 LearningRate 0.0000 Epoch: 36 Global Step: 63510 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-03-04 16:00:34,825-Speed 13790.84 samples/sec Loss 0.9758 LearningRate 0.0000 Epoch: 36 Global Step: 63520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:00:52,644-Speed 13792.76 samples/sec Loss 0.9905 LearningRate 0.0000 Epoch: 36 Global Step: 63530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:01:10,482-Speed 13779.21 samples/sec Loss 0.9851 LearningRate 0.0000 Epoch: 36 Global Step: 63540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:01:28,270-Speed 13817.38 samples/sec Loss 0.9873 LearningRate 0.0000 Epoch: 36 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 16:01:46,151-Speed 13744.56 samples/sec Loss 0.9884 LearningRate 0.0000 Epoch: 36 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 16:02:03,933-Speed 13821.63 samples/sec Loss 0.9799 LearningRate 0.0000 Epoch: 36 Global Step: 63570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:02:21,781-Speed 13771.20 samples/sec Loss 0.9864 LearningRate 0.0000 Epoch: 36 Global Step: 63580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:02:39,609-Speed 13786.61 samples/sec Loss 0.9913 LearningRate 0.0000 Epoch: 36 Global Step: 63590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:02:57,469-Speed 13761.74 samples/sec Loss 0.9858 LearningRate 0.0000 Epoch: 36 Global Step: 63600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:03:15,274-Speed 13803.32 samples/sec Loss 0.9827 LearningRate 0.0000 Epoch: 36 Global Step: 63610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:03:33,058-Speed 13819.67 samples/sec Loss 0.9892 LearningRate 0.0000 Epoch: 36 Global Step: 63620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:03:50,886-Speed 13786.66 samples/sec Loss 0.9940 LearningRate 0.0000 Epoch: 36 Global Step: 63630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 
16:04:08,793-Speed 13725.36 samples/sec Loss 0.9868 LearningRate 0.0000 Epoch: 36 Global Step: 63640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:04:26,638-Speed 13772.99 samples/sec Loss 0.9861 LearningRate 0.0000 Epoch: 36 Global Step: 63650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:04:44,439-Speed 13807.79 samples/sec Loss 0.9916 LearningRate 0.0000 Epoch: 36 Global Step: 63660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:05:02,225-Speed 13818.40 samples/sec Loss 0.9845 LearningRate 0.0000 Epoch: 36 Global Step: 63670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:05:19,988-Speed 13836.21 samples/sec Loss 0.9865 LearningRate 0.0000 Epoch: 36 Global Step: 63680 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:05:37,814-Speed 13787.06 samples/sec Loss 0.9798 LearningRate 0.0000 Epoch: 36 Global Step: 63690 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:05:55,675-Speed 13760.87 samples/sec Loss 0.9850 LearningRate 0.0000 Epoch: 36 Global Step: 63700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:06:13,509-Speed 13781.06 samples/sec Loss 0.9842 LearningRate 0.0000 Epoch: 36 Global Step: 63710 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:06:31,368-Speed 13761.69 samples/sec Loss 0.9892 LearningRate 0.0000 Epoch: 36 Global Step: 63720 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:06:49,138-Speed 13831.53 samples/sec Loss 0.9892 LearningRate 0.0000 Epoch: 36 Global Step: 63730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-03-04 16:07:07,047-Speed 13723.45 samples/sec Loss 0.9831 LearningRate 0.0000 Epoch: 36 Global Step: 63740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:07:25,280-Speed 13479.47 samples/sec Loss 0.9934 LearningRate 0.0000 Epoch: 36 Global Step: 63750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:07:43,545-Speed 13456.37 samples/sec Loss 0.9788 
LearningRate 0.0000 Epoch: 36 Global Step: 63760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:08:01,373-Speed 13787.10 samples/sec Loss 0.9979 LearningRate 0.0000 Epoch: 36 Global Step: 63770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:08:19,297-Speed 13712.34 samples/sec Loss 0.9839 LearningRate 0.0000 Epoch: 36 Global Step: 63780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:08:37,183-Speed 13741.01 samples/sec Loss 0.9860 LearningRate 0.0000 Epoch: 36 Global Step: 63790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:08:55,073-Speed 13738.11 samples/sec Loss 0.9829 LearningRate 0.0000 Epoch: 36 Global Step: 63800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:09:12,918-Speed 13773.33 samples/sec Loss 0.9857 LearningRate 0.0000 Epoch: 36 Global Step: 63810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:09:30,799-Speed 13744.58 samples/sec Loss 0.9826 LearningRate 0.0000 Epoch: 36 Global Step: 63820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:09:48,589-Speed 13815.61 samples/sec Loss 0.9834 LearningRate 0.0000 Epoch: 36 Global Step: 63830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:10:06,406-Speed 13795.69 samples/sec Loss 0.9896 LearningRate 0.0000 Epoch: 36 Global Step: 63840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:10:24,232-Speed 13787.45 samples/sec Loss 0.9855 LearningRate 0.0000 Epoch: 36 Global Step: 63850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:10:41,972-Speed 13854.30 samples/sec Loss 0.9919 LearningRate 0.0000 Epoch: 36 Global Step: 63860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:10:59,775-Speed 13805.82 samples/sec Loss 0.9830 LearningRate 0.0000 Epoch: 36 Global Step: 63870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:11:17,648-Speed 13751.06 samples/sec Loss 0.9884 LearningRate 0.0000 Epoch: 36 Global Step: 
63880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:11:35,489-Speed 13776.02 samples/sec Loss 0.9850 LearningRate 0.0000 Epoch: 36 Global Step: 63890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:11:53,352-Speed 13758.55 samples/sec Loss 0.9878 LearningRate 0.0000 Epoch: 36 Global Step: 63900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:12:11,235-Speed 13744.94 samples/sec Loss 0.9913 LearningRate 0.0000 Epoch: 36 Global Step: 63910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:12:29,233-Speed 13655.70 samples/sec Loss 0.9869 LearningRate 0.0000 Epoch: 36 Global Step: 63920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:12:47,174-Speed 13699.10 samples/sec Loss 0.9858 LearningRate 0.0000 Epoch: 36 Global Step: 63930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:13:05,012-Speed 13778.57 samples/sec Loss 0.9800 LearningRate 0.0000 Epoch: 36 Global Step: 63940 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-03-04 16:14:13,298-Speed 3599.06 samples/sec Loss 0.9862 LearningRate 0.0000 Epoch: 37 Global Step: 63950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:14:31,001-Speed 13883.08 samples/sec Loss 0.9854 LearningRate 0.0000 Epoch: 37 Global Step: 63960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:14:48,695-Speed 13890.11 samples/sec Loss 0.9822 LearningRate 0.0000 Epoch: 37 Global Step: 63970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:15:06,537-Speed 13775.27 samples/sec Loss 0.9888 LearningRate 0.0000 Epoch: 37 Global Step: 63980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:15:24,416-Speed 13746.43 samples/sec Loss 0.9872 LearningRate 0.0000 Epoch: 37 Global Step: 63990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:15:42,276-Speed 13763.42 samples/sec Loss 0.9749 LearningRate 0.0000 Epoch: 37 Global Step: 64000 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-03-04 16:16:00,054-Speed 13825.03 samples/sec Loss 0.9876 LearningRate 0.0000 Epoch: 37 Global Step: 64010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:16:17,808-Speed 13844.12 samples/sec Loss 0.9877 LearningRate 0.0000 Epoch: 37 Global Step: 64020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:16:35,644-Speed 13779.96 samples/sec Loss 0.9750 LearningRate 0.0000 Epoch: 37 Global Step: 64030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:16:54,222-Speed 13229.01 samples/sec Loss 0.9845 LearningRate 0.0000 Epoch: 37 Global Step: 64040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:17:11,986-Speed 13836.31 samples/sec Loss 0.9830 LearningRate 0.0000 Epoch: 37 Global Step: 64050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:17:29,826-Speed 13776.33 samples/sec Loss 0.9881 LearningRate 0.0000 Epoch: 37 Global Step: 64060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:17:48,356-Speed 13263.87 samples/sec Loss 0.9832 LearningRate 0.0000 Epoch: 37 Global Step: 64070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:18:06,196-Speed 13776.61 samples/sec Loss 0.9880 LearningRate 0.0000 Epoch: 37 Global Step: 64080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-03-04 16:18:24,025-Speed 13785.68 samples/sec Loss 0.9842 LearningRate 0.0000 Epoch: 37 Global Step: 64090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:18:41,824-Speed 13808.29 samples/sec Loss 0.9741 LearningRate 0.0000 Epoch: 37 Global Step: 64100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:18:59,612-Speed 13818.16 samples/sec Loss 0.9765 LearningRate 0.0000 Epoch: 37 Global Step: 64110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:19:17,562-Speed 13692.24 samples/sec Loss 0.9784 LearningRate 0.0000 Epoch: 37 Global Step: 64120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 
16:19:35,360-Speed 13809.53 samples/sec Loss 0.9814 LearningRate 0.0000 Epoch: 37 Global Step: 64130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:19:53,175-Speed 13796.37 samples/sec Loss 0.9799 LearningRate 0.0000 Epoch: 37 Global Step: 64140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-03-04 16:20:10,992-Speed 13794.02 samples/sec Loss 0.9811 LearningRate 0.0000 Epoch: 37 Global Step: 64150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:20:28,904-Speed 13721.35 samples/sec Loss 0.9782 LearningRate 0.0000 Epoch: 37 Global Step: 64160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:20:46,673-Speed 13831.98 samples/sec Loss 0.9825 LearningRate 0.0000 Epoch: 37 Global Step: 64170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:21:04,446-Speed 13828.91 samples/sec Loss 0.9722 LearningRate 0.0000 Epoch: 37 Global Step: 64180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:21:22,327-Speed 13744.93 samples/sec Loss 0.9732 LearningRate 0.0000 Epoch: 37 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-04 16:21:40,164-Speed 13779.12 samples/sec Loss 0.9855 LearningRate 0.0000 Epoch: 37 Global Step: 64200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:21:57,883-Speed 13871.03 samples/sec Loss 0.9809 LearningRate 0.0000 Epoch: 37 Global Step: 64210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:22:15,634-Speed 13845.80 samples/sec Loss 0.9790 LearningRate 0.0000 Epoch: 37 Global Step: 64220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:22:33,444-Speed 13799.55 samples/sec Loss 0.9840 LearningRate 0.0000 Epoch: 37 Global Step: 64230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:22:51,223-Speed 13823.76 samples/sec Loss 0.9737 LearningRate 0.0000 Epoch: 37 Global Step: 64240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:23:09,018-Speed 13811.02 samples/sec 
Loss 0.9826 LearningRate 0.0000 Epoch: 37 Global Step: 64250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:23:26,728-Speed 13878.37 samples/sec Loss 0.9921 LearningRate 0.0000 Epoch: 37 Global Step: 64260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:23:44,470-Speed 13852.44 samples/sec Loss 0.9799 LearningRate 0.0000 Epoch: 37 Global Step: 64270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:24:02,293-Speed 13790.43 samples/sec Loss 0.9836 LearningRate 0.0000 Epoch: 37 Global Step: 64280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:24:20,141-Speed 13771.95 samples/sec Loss 0.9905 LearningRate 0.0000 Epoch: 37 Global Step: 64290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:24:37,955-Speed 13797.47 samples/sec Loss 0.9902 LearningRate 0.0000 Epoch: 37 Global Step: 64300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:24:55,788-Speed 13781.62 samples/sec Loss 0.9840 LearningRate 0.0000 Epoch: 37 Global Step: 64310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:25:13,719-Speed 13706.86 samples/sec Loss 0.9908 LearningRate 0.0000 Epoch: 37 Global Step: 64320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:25:31,561-Speed 13775.39 samples/sec Loss 0.9792 LearningRate 0.0000 Epoch: 37 Global Step: 64330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:25:50,140-Speed 13235.30 samples/sec Loss 0.9845 LearningRate 0.0000 Epoch: 37 Global Step: 64340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:26:07,911-Speed 13830.22 samples/sec Loss 0.9828 LearningRate 0.0000 Epoch: 37 Global Step: 64350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:26:25,770-Speed 13762.12 samples/sec Loss 0.9751 LearningRate 0.0000 Epoch: 37 Global Step: 64360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:26:43,570-Speed 13809.10 samples/sec Loss 0.9856 LearningRate 0.0000 Epoch: 37 
Global Step: 64370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:27:01,325-Speed 13844.12 samples/sec Loss 0.9779 LearningRate 0.0000 Epoch: 37 Global Step: 64380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:27:19,864-Speed 13257.12 samples/sec Loss 0.9836 LearningRate 0.0000 Epoch: 37 Global Step: 64390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:27:37,666-Speed 13805.84 samples/sec Loss 0.9788 LearningRate 0.0000 Epoch: 37 Global Step: 64400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:27:55,474-Speed 13801.58 samples/sec Loss 0.9787 LearningRate 0.0000 Epoch: 37 Global Step: 64410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:28:13,396-Speed 13713.70 samples/sec Loss 0.9827 LearningRate 0.0000 Epoch: 37 Global Step: 64420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:28:31,209-Speed 13797.50 samples/sec Loss 0.9834 LearningRate 0.0000 Epoch: 37 Global Step: 64430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:28:48,973-Speed 13835.52 samples/sec Loss 0.9822 LearningRate 0.0000 Epoch: 37 Global Step: 64440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:29:06,761-Speed 13816.71 samples/sec Loss 0.9891 LearningRate 0.0000 Epoch: 37 Global Step: 64450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:29:24,620-Speed 13762.66 samples/sec Loss 0.9821 LearningRate 0.0000 Epoch: 37 Global Step: 64460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:29:42,409-Speed 13816.05 samples/sec Loss 0.9867 LearningRate 0.0000 Epoch: 37 Global Step: 64470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:30:00,240-Speed 13783.22 samples/sec Loss 0.9849 LearningRate 0.0000 Epoch: 37 Global Step: 64480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:30:18,049-Speed 13800.85 samples/sec Loss 0.9780 LearningRate 0.0000 Epoch: 37 Global Step: 64490 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-03-04 16:30:35,804-Speed 13842.94 samples/sec Loss 0.9846 LearningRate 0.0000 Epoch: 37 Global Step: 64500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:30:53,632-Speed 13785.44 samples/sec Loss 0.9800 LearningRate 0.0000 Epoch: 37 Global Step: 64510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:31:11,452-Speed 13792.17 samples/sec Loss 0.9796 LearningRate 0.0000 Epoch: 37 Global Step: 64520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:31:29,596-Speed 13545.85 samples/sec Loss 0.9815 LearningRate 0.0000 Epoch: 37 Global Step: 64530 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:31:47,939-Speed 13399.02 samples/sec Loss 0.9756 LearningRate 0.0000 Epoch: 37 Global Step: 64540 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:32:05,780-Speed 13776.08 samples/sec Loss 0.9846 LearningRate 0.0000 Epoch: 37 Global Step: 64550 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:32:23,549-Speed 13831.78 samples/sec Loss 0.9781 LearningRate 0.0000 Epoch: 37 Global Step: 64560 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:32:41,434-Speed 13741.73 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 37 Global Step: 64570 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:33:00,033-Speed 13214.75 samples/sec Loss 0.9890 LearningRate 0.0000 Epoch: 37 Global Step: 64580 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:33:18,658-Speed 13196.43 samples/sec Loss 0.9819 LearningRate 0.0000 Epoch: 37 Global Step: 64590 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:33:36,457-Speed 13807.73 samples/sec Loss 0.9837 LearningRate 0.0000 Epoch: 37 Global Step: 64600 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:33:54,719-Speed 13458.19 samples/sec Loss 0.9837 LearningRate 0.0000 Epoch: 37 Global Step: 64610 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 
16:34:12,824-Speed 13575.22 samples/sec Loss 0.9826 LearningRate 0.0000 Epoch: 37 Global Step: 64620 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 16:34:30,588-Speed 13835.81 samples/sec Loss 0.9809 LearningRate 0.0000 Epoch: 37 Global Step: 64630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:34:48,316-Speed 13863.77 samples/sec Loss 0.9881 LearningRate 0.0000 Epoch: 37 Global Step: 64640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:35:05,978-Speed 13915.35 samples/sec Loss 0.9871 LearningRate 0.0000 Epoch: 37 Global Step: 64650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:35:23,727-Speed 13847.47 samples/sec Loss 0.9793 LearningRate 0.0000 Epoch: 37 Global Step: 64660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:35:41,440-Speed 13875.78 samples/sec Loss 0.9760 LearningRate 0.0000 Epoch: 37 Global Step: 64670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:35:59,217-Speed 13825.95 samples/sec Loss 0.9685 LearningRate 0.0000 Epoch: 37 Global Step: 64680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:36:16,982-Speed 13834.46 samples/sec Loss 0.9832 LearningRate 0.0000 Epoch: 37 Global Step: 64690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:36:34,652-Speed 13909.72 samples/sec Loss 0.9719 LearningRate 0.0000 Epoch: 37 Global Step: 64700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:36:52,478-Speed 13787.23 samples/sec Loss 0.9798 LearningRate 0.0000 Epoch: 37 Global Step: 64710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:37:10,243-Speed 13834.69 samples/sec Loss 0.9724 LearningRate 0.0000 Epoch: 37 Global Step: 64720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:37:28,017-Speed 13828.40 samples/sec Loss 0.9857 LearningRate 0.0000 Epoch: 37 Global Step: 64730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:37:45,768-Speed 13849.54 samples/sec Loss 
0.9830 LearningRate 0.0000 Epoch: 37 Global Step: 64740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:38:03,585-Speed 13794.53 samples/sec Loss 0.9762 LearningRate 0.0000 Epoch: 37 Global Step: 64750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:38:21,380-Speed 13811.07 samples/sec Loss 0.9752 LearningRate 0.0000 Epoch: 37 Global Step: 64760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:38:39,107-Speed 13864.59 samples/sec Loss 0.9726 LearningRate 0.0000 Epoch: 37 Global Step: 64770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:38:56,775-Speed 13911.02 samples/sec Loss 0.9705 LearningRate 0.0000 Epoch: 37 Global Step: 64780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:39:14,676-Speed 13729.95 samples/sec Loss 0.9762 LearningRate 0.0000 Epoch: 37 Global Step: 64790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:39:32,523-Speed 13770.89 samples/sec Loss 0.9761 LearningRate 0.0000 Epoch: 37 Global Step: 64800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:39:50,676-Speed 13538.89 samples/sec Loss 0.9665 LearningRate 0.0000 Epoch: 37 Global Step: 64810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:40:08,902-Speed 13484.68 samples/sec Loss 0.9765 LearningRate 0.0000 Epoch: 37 Global Step: 64820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:40:26,625-Speed 13867.90 samples/sec Loss 0.9777 LearningRate 0.0000 Epoch: 37 Global Step: 64830 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-04 16:40:44,343-Speed 13872.17 samples/sec Loss 0.9814 LearningRate 0.0000 Epoch: 37 Global Step: 64840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:41:02,175-Speed 13782.79 samples/sec Loss 0.9742 LearningRate 0.0000 Epoch: 37 Global Step: 64850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:41:20,418-Speed 13472.45 samples/sec Loss 0.9840 LearningRate 0.0000 Epoch: 37 Global 
Step: 64860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:41:38,559-Speed 13547.59 samples/sec Loss 0.9736 LearningRate 0.0000 Epoch: 37 Global Step: 64870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:41:56,286-Speed 13864.34 samples/sec Loss 0.9780 LearningRate 0.0000 Epoch: 37 Global Step: 64880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:42:13,997-Speed 13877.36 samples/sec Loss 0.9710 LearningRate 0.0000 Epoch: 37 Global Step: 64890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:42:31,796-Speed 13808.14 samples/sec Loss 0.9794 LearningRate 0.0000 Epoch: 37 Global Step: 64900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:42:49,488-Speed 13893.06 samples/sec Loss 0.9801 LearningRate 0.0000 Epoch: 37 Global Step: 64910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:43:07,278-Speed 13815.34 samples/sec Loss 0.9801 LearningRate 0.0000 Epoch: 37 Global Step: 64920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:43:25,035-Speed 13841.20 samples/sec Loss 0.9739 LearningRate 0.0000 Epoch: 37 Global Step: 64930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:43:42,824-Speed 13816.66 samples/sec Loss 0.9840 LearningRate 0.0000 Epoch: 37 Global Step: 64940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:44:00,534-Speed 13877.34 samples/sec Loss 0.9785 LearningRate 0.0000 Epoch: 37 Global Step: 64950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:44:18,344-Speed 13800.22 samples/sec Loss 0.9705 LearningRate 0.0000 Epoch: 37 Global Step: 64960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:44:36,185-Speed 13776.28 samples/sec Loss 0.9701 LearningRate 0.0000 Epoch: 37 Global Step: 64970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:44:53,926-Speed 13853.19 samples/sec Loss 0.9831 LearningRate 0.0000 Epoch: 37 Global Step: 64980 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-03-04 16:45:11,700-Speed 13828.05 samples/sec Loss 0.9716 LearningRate 0.0000 Epoch: 37 Global Step: 64990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:45:29,406-Speed 13880.86 samples/sec Loss 0.9758 LearningRate 0.0000 Epoch: 37 Global Step: 65000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:45:47,325-Speed 13716.17 samples/sec Loss 0.9860 LearningRate 0.0000 Epoch: 37 Global Step: 65010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:46:05,060-Speed 13858.11 samples/sec Loss 0.9636 LearningRate 0.0000 Epoch: 37 Global Step: 65020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:46:22,823-Speed 13835.96 samples/sec Loss 0.9770 LearningRate 0.0000 Epoch: 37 Global Step: 65030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:46:40,528-Speed 13882.37 samples/sec Loss 0.9707 LearningRate 0.0000 Epoch: 37 Global Step: 65040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:46:58,202-Speed 13906.08 samples/sec Loss 0.9753 LearningRate 0.0000 Epoch: 37 Global Step: 65050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:47:15,887-Speed 13897.22 samples/sec Loss 0.9786 LearningRate 0.0000 Epoch: 37 Global Step: 65060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:47:33,695-Speed 13801.40 samples/sec Loss 0.9754 LearningRate 0.0000 Epoch: 37 Global Step: 65070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:47:51,450-Speed 13842.78 samples/sec Loss 0.9825 LearningRate 0.0000 Epoch: 37 Global Step: 65080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:48:09,223-Speed 13828.97 samples/sec Loss 0.9794 LearningRate 0.0000 Epoch: 37 Global Step: 65090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:48:26,957-Speed 13858.75 samples/sec Loss 0.9754 LearningRate 0.0000 Epoch: 37 Global Step: 65100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 
16:48:44,632-Speed 13905.56 samples/sec Loss 0.9747 LearningRate 0.0000 Epoch: 37 Global Step: 65110 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-04 16:49:02,317-Speed 13896.99 samples/sec Loss 0.9715 LearningRate 0.0000 Epoch: 37 Global Step: 65120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:49:20,116-Speed 13809.03 samples/sec Loss 0.9770 LearningRate 0.0000 Epoch: 37 Global Step: 65130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:49:37,777-Speed 13915.82 samples/sec Loss 0.9812 LearningRate 0.0000 Epoch: 37 Global Step: 65140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:49:55,480-Speed 13883.05 samples/sec Loss 0.9793 LearningRate 0.0000 Epoch: 37 Global Step: 65150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:50:13,194-Speed 13875.38 samples/sec Loss 0.9636 LearningRate 0.0000 Epoch: 37 Global Step: 65160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:50:30,857-Speed 13914.93 samples/sec Loss 0.9751 LearningRate 0.0000 Epoch: 37 Global Step: 65170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:50:48,541-Speed 13897.33 samples/sec Loss 0.9740 LearningRate 0.0000 Epoch: 37 Global Step: 65180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:51:06,330-Speed 13816.44 samples/sec Loss 0.9760 LearningRate 0.0000 Epoch: 37 Global Step: 65190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:51:24,010-Speed 13901.89 samples/sec Loss 0.9765 LearningRate 0.0000 Epoch: 37 Global Step: 65200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:51:41,707-Speed 13888.14 samples/sec Loss 0.9690 LearningRate 0.0000 Epoch: 37 Global Step: 65210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:51:59,510-Speed 13805.30 samples/sec Loss 0.9739 LearningRate 0.0000 Epoch: 37 Global Step: 65220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:52:17,311-Speed 13806.82 samples/sec 
Loss 0.9778 LearningRate 0.0000 Epoch: 37 Global Step: 65230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:52:34,973-Speed 13915.62 samples/sec Loss 0.9736 LearningRate 0.0000 Epoch: 37 Global Step: 65240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:52:52,672-Speed 13886.98 samples/sec Loss 0.9832 LearningRate 0.0000 Epoch: 37 Global Step: 65250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:53:10,378-Speed 13880.56 samples/sec Loss 0.9727 LearningRate 0.0000 Epoch: 37 Global Step: 65260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:53:28,152-Speed 13827.45 samples/sec Loss 0.9746 LearningRate 0.0000 Epoch: 37 Global Step: 65270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:53:45,905-Speed 13844.22 samples/sec Loss 0.9790 LearningRate 0.0000 Epoch: 37 Global Step: 65280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:54:03,635-Speed 13862.24 samples/sec Loss 0.9836 LearningRate 0.0000 Epoch: 37 Global Step: 65290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:54:21,313-Speed 13903.01 samples/sec Loss 0.9756 LearningRate 0.0000 Epoch: 37 Global Step: 65300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:54:39,148-Speed 13780.85 samples/sec Loss 0.9761 LearningRate 0.0000 Epoch: 37 Global Step: 65310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:54:56,940-Speed 13813.60 samples/sec Loss 0.9744 LearningRate 0.0000 Epoch: 37 Global Step: 65320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:55:14,643-Speed 13883.70 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 37 Global Step: 65330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:55:32,533-Speed 13737.86 samples/sec Loss 0.9680 LearningRate 0.0000 Epoch: 37 Global Step: 65340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:55:50,242-Speed 13879.12 samples/sec Loss 0.9731 LearningRate 0.0000 Epoch: 37 
Global Step: 65350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:56:07,898-Speed 13920.50 samples/sec Loss 0.9725 LearningRate 0.0000 Epoch: 37 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-04 16:56:25,590-Speed 13892.00 samples/sec Loss 0.9777 LearningRate 0.0000 Epoch: 37 Global Step: 65370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:56:43,343-Speed 13844.14 samples/sec Loss 0.9774 LearningRate 0.0000 Epoch: 37 Global Step: 65380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:57:01,066-Speed 13867.76 samples/sec Loss 0.9787 LearningRate 0.0000 Epoch: 37 Global Step: 65390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:57:18,806-Speed 13853.72 samples/sec Loss 0.9683 LearningRate 0.0000 Epoch: 37 Global Step: 65400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:57:36,540-Speed 13858.98 samples/sec Loss 0.9786 LearningRate 0.0000 Epoch: 37 Global Step: 65410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:57:54,255-Speed 13874.91 samples/sec Loss 0.9758 LearningRate 0.0000 Epoch: 37 Global Step: 65420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:58:12,029-Speed 13828.11 samples/sec Loss 0.9745 LearningRate 0.0000 Epoch: 37 Global Step: 65430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:58:29,733-Speed 13882.20 samples/sec Loss 0.9857 LearningRate 0.0000 Epoch: 37 Global Step: 65440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:58:47,557-Speed 13789.09 samples/sec Loss 0.9822 LearningRate 0.0000 Epoch: 37 Global Step: 65450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:59:05,322-Speed 13835.20 samples/sec Loss 0.9673 LearningRate 0.0000 Epoch: 37 Global Step: 65460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 16:59:23,017-Speed 13888.88 samples/sec Loss 0.9750 LearningRate 0.0000 Epoch: 37 Global Step: 65470 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-03-04 16:59:40,775-Speed 13840.44 samples/sec Loss 0.9812 LearningRate 0.0000 Epoch: 37 Global Step: 65480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 16:59:58,488-Speed 13875.31 samples/sec Loss 0.9701 LearningRate 0.0000 Epoch: 37 Global Step: 65490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:00:16,183-Speed 13889.92 samples/sec Loss 0.9732 LearningRate 0.0000 Epoch: 37 Global Step: 65500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:00:33,933-Speed 13846.60 samples/sec Loss 0.9685 LearningRate 0.0000 Epoch: 37 Global Step: 65510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:00:51,668-Speed 13858.52 samples/sec Loss 0.9761 LearningRate 0.0000 Epoch: 37 Global Step: 65520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:01:09,359-Speed 13892.97 samples/sec Loss 0.9676 LearningRate 0.0000 Epoch: 37 Global Step: 65530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:01:27,080-Speed 13869.09 samples/sec Loss 0.9725 LearningRate 0.0000 Epoch: 37 Global Step: 65540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:01:44,888-Speed 13801.27 samples/sec Loss 0.9751 LearningRate 0.0000 Epoch: 37 Global Step: 65550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:02:02,566-Speed 13902.72 samples/sec Loss 0.9746 LearningRate 0.0000 Epoch: 37 Global Step: 65560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:02:20,421-Speed 13765.81 samples/sec Loss 0.9765 LearningRate 0.0000 Epoch: 37 Global Step: 65570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:02:38,152-Speed 13861.68 samples/sec Loss 0.9736 LearningRate 0.0000 Epoch: 37 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-03-04 17:02:55,863-Speed 13876.95 samples/sec Loss 0.9790 LearningRate 0.0000 Epoch: 37 Global Step: 65590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 
17:03:13,562-Speed 13886.67 samples/sec Loss 0.9755 LearningRate 0.0000 Epoch: 37 Global Step: 65600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:03:31,403-Speed 13775.86 samples/sec Loss 0.9759 LearningRate 0.0000 Epoch: 37 Global Step: 65610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:03:49,087-Speed 13898.22 samples/sec Loss 0.9797 LearningRate 0.0000 Epoch: 37 Global Step: 65620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:04:06,765-Speed 13903.41 samples/sec Loss 0.9754 LearningRate 0.0000 Epoch: 37 Global Step: 65630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:04:24,536-Speed 13830.03 samples/sec Loss 0.9793 LearningRate 0.0000 Epoch: 37 Global Step: 65640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:04:42,391-Speed 13764.36 samples/sec Loss 0.9759 LearningRate 0.0000 Epoch: 37 Global Step: 65650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:05:00,079-Speed 13895.40 samples/sec Loss 0.9768 LearningRate 0.0000 Epoch: 37 Global Step: 65660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:05:17,797-Speed 13871.42 samples/sec Loss 0.9686 LearningRate 0.0000 Epoch: 37 Global Step: 65670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:06:25,735-Speed 3617.53 samples/sec Loss 0.9787 LearningRate 0.0000 Epoch: 38 Global Step: 65680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:06:43,411-Speed 13904.28 samples/sec Loss 0.9816 LearningRate 0.0000 Epoch: 38 Global Step: 65690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:07:01,177-Speed 13834.33 samples/sec Loss 0.9742 LearningRate 0.0000 Epoch: 38 Global Step: 65700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:07:18,819-Speed 13931.46 samples/sec Loss 0.9714 LearningRate 0.0000 Epoch: 38 Global Step: 65710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:07:36,636-Speed 13794.75 samples/sec Loss 
0.9648 LearningRate 0.0000 Epoch: 38 Global Step: 65720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:07:54,593-Speed 13687.08 samples/sec Loss 0.9822 LearningRate 0.0000 Epoch: 38 Global Step: 65730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:08:12,598-Speed 13650.27 samples/sec Loss 0.9683 LearningRate 0.0000 Epoch: 38 Global Step: 65740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:08:30,600-Speed 13652.78 samples/sec Loss 0.9721 LearningRate 0.0000 Epoch: 38 Global Step: 65750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:08:48,532-Speed 13705.97 samples/sec Loss 0.9613 LearningRate 0.0000 Epoch: 38 Global Step: 65760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:09:06,550-Speed 13640.80 samples/sec Loss 0.9741 LearningRate 0.0000 Epoch: 38 Global Step: 65770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:09:24,525-Speed 13673.18 samples/sec Loss 0.9673 LearningRate 0.0000 Epoch: 38 Global Step: 65780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:09:42,521-Speed 13657.16 samples/sec Loss 0.9785 LearningRate 0.0000 Epoch: 38 Global Step: 65790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:10:00,614-Speed 13584.20 samples/sec Loss 0.9674 LearningRate 0.0000 Epoch: 38 Global Step: 65800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:10:18,737-Speed 13561.26 samples/sec Loss 0.9731 LearningRate 0.0000 Epoch: 38 Global Step: 65810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:10:36,682-Speed 13696.37 samples/sec Loss 0.9629 LearningRate 0.0000 Epoch: 38 Global Step: 65820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:10:54,848-Speed 13529.49 samples/sec Loss 0.9768 LearningRate 0.0000 Epoch: 38 Global Step: 65830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:11:12,855-Speed 13648.49 samples/sec Loss 0.9732 LearningRate 0.0000 Epoch: 38 Global 
Step: 65840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:11:30,833-Speed 13671.24 samples/sec Loss 0.9685 LearningRate 0.0000 Epoch: 38 Global Step: 65850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:11:48,818-Speed 13666.32 samples/sec Loss 0.9695 LearningRate 0.0000 Epoch: 38 Global Step: 65860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:12:06,807-Speed 13662.60 samples/sec Loss 0.9771 LearningRate 0.0000 Epoch: 38 Global Step: 65870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:12:24,979-Speed 13525.05 samples/sec Loss 0.9663 LearningRate 0.0000 Epoch: 38 Global Step: 65880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:12:43,007-Speed 13632.76 samples/sec Loss 0.9705 LearningRate 0.0000 Epoch: 38 Global Step: 65890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-03-04 17:13:00,939-Speed 13706.09 samples/sec Loss 0.9670 LearningRate 0.0000 Epoch: 38 Global Step: 65900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:13:18,938-Speed 13654.95 samples/sec Loss 0.9859 LearningRate 0.0000 Epoch: 38 Global Step: 65910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:13:36,999-Speed 13608.27 samples/sec Loss 0.9707 LearningRate 0.0000 Epoch: 38 Global Step: 65920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:13:55,017-Speed 13640.51 samples/sec Loss 0.9707 LearningRate 0.0000 Epoch: 38 Global Step: 65930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:14:13,044-Speed 13633.62 samples/sec Loss 0.9703 LearningRate 0.0000 Epoch: 38 Global Step: 65940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:14:31,087-Speed 13622.16 samples/sec Loss 0.9770 LearningRate 0.0000 Epoch: 38 Global Step: 65950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:14:49,109-Speed 13636.88 samples/sec Loss 0.9735 LearningRate 0.0000 Epoch: 38 Global Step: 65960 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-03-04 17:15:07,099-Speed 13661.57 samples/sec Loss 0.9763 LearningRate 0.0000 Epoch: 38 Global Step: 65970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:15:25,077-Speed 13671.18 samples/sec Loss 0.9772 LearningRate 0.0000 Epoch: 38 Global Step: 65980 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:15:43,035-Speed 13686.94 samples/sec Loss 0.9747 LearningRate 0.0000 Epoch: 38 Global Step: 65990 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:16:01,060-Speed 13636.38 samples/sec Loss 0.9760 LearningRate 0.0000 Epoch: 38 Global Step: 66000 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:16:19,140-Speed 13593.35 samples/sec Loss 0.9785 LearningRate 0.0000 Epoch: 38 Global Step: 66010 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:16:37,133-Speed 13659.57 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 38 Global Step: 66020 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:16:55,167-Speed 13628.94 samples/sec Loss 0.9706 LearningRate 0.0000 Epoch: 38 Global Step: 66030 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:17:13,345-Speed 13520.53 samples/sec Loss 0.9762 LearningRate 0.0000 Epoch: 38 Global Step: 66040 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:17:31,428-Speed 13591.59 samples/sec Loss 0.9738 LearningRate 0.0000 Epoch: 38 Global Step: 66050 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:17:49,472-Speed 13620.67 samples/sec Loss 0.9697 LearningRate 0.0000 Epoch: 38 Global Step: 66060 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:18:07,488-Speed 13641.98 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 38 Global Step: 66070 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-03-04 17:18:25,502-Speed 13644.16 samples/sec Loss 0.9715 LearningRate 0.0000 Epoch: 38 Global Step: 66080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 
17:18:43,524-Speed 13636.93 samples/sec Loss 0.9786 LearningRate 0.0000 Epoch: 38 Global Step: 66090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:19:01,537-Speed 13644.69 samples/sec Loss 0.9648 LearningRate 0.0000 Epoch: 38 Global Step: 66100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:19:19,563-Speed 13636.40 samples/sec Loss 0.9685 LearningRate 0.0000 Epoch: 38 Global Step: 66110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:19:37,540-Speed 13671.61 samples/sec Loss 0.9720 LearningRate 0.0000 Epoch: 38 Global Step: 66120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:19:55,560-Speed 13639.11 samples/sec Loss 0.9772 LearningRate 0.0000 Epoch: 38 Global Step: 66130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-03-04 17:20:13,576-Speed 13641.33 samples/sec Loss 0.9773 LearningRate 0.0000 Epoch: 38 Global Step: 66140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:20:31,665-Speed 13587.97 samples/sec Loss 0.9745 LearningRate 0.0000 Epoch: 38 Global Step: 66150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:20:49,785-Speed 13563.28 samples/sec Loss 0.9756 LearningRate 0.0000 Epoch: 38 Global Step: 66160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:21:07,804-Speed 13639.84 samples/sec Loss 0.9686 LearningRate 0.0000 Epoch: 38 Global Step: 66170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:21:25,772-Speed 13678.69 samples/sec Loss 0.9720 LearningRate 0.0000 Epoch: 38 Global Step: 66180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:21:43,804-Speed 13629.76 samples/sec Loss 0.9723 LearningRate 0.0000 Epoch: 38 Global Step: 66190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:22:01,842-Speed 13625.27 samples/sec Loss 0.9706 LearningRate 0.0000 Epoch: 38 Global Step: 66200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:22:19,926-Speed 13590.82 samples/sec 
Loss 0.9663 LearningRate 0.0000 Epoch: 38 Global Step: 66210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:22:38,010-Speed 13590.74 samples/sec Loss 0.9711 LearningRate 0.0000 Epoch: 38 Global Step: 66220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:22:56,040-Speed 13631.43 samples/sec Loss 0.9761 LearningRate 0.0000 Epoch: 38 Global Step: 66230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:23:14,010-Speed 13677.42 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 38 Global Step: 66240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:23:32,032-Speed 13637.02 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 38 Global Step: 66250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:23:49,999-Speed 13679.56 samples/sec Loss 0.9808 LearningRate 0.0000 Epoch: 38 Global Step: 66260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:24:08,039-Speed 13623.62 samples/sec Loss 0.9715 LearningRate 0.0000 Epoch: 38 Global Step: 66270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:24:26,062-Speed 13636.76 samples/sec Loss 0.9678 LearningRate 0.0000 Epoch: 38 Global Step: 66280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:24:44,050-Speed 13663.61 samples/sec Loss 0.9803 LearningRate 0.0000 Epoch: 38 Global Step: 66290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:25:02,121-Speed 13600.75 samples/sec Loss 0.9674 LearningRate 0.0000 Epoch: 38 Global Step: 66300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:25:20,118-Speed 13656.49 samples/sec Loss 0.9710 LearningRate 0.0000 Epoch: 38 Global Step: 66310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:25:38,218-Speed 13578.25 samples/sec Loss 0.9687 LearningRate 0.0000 Epoch: 38 Global Step: 66320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:25:56,174-Speed 13688.46 samples/sec Loss 0.9700 LearningRate 0.0000 Epoch: 38 
Global Step: 66330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:26:14,146-Speed 13674.65 samples/sec Loss 0.9698 LearningRate 0.0000 Epoch: 38 Global Step: 66340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:26:32,164-Speed 13640.60 samples/sec Loss 0.9644 LearningRate 0.0000 Epoch: 38 Global Step: 66350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:26:50,287-Speed 13561.87 samples/sec Loss 0.9714 LearningRate 0.0000 Epoch: 38 Global Step: 66360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:27:08,295-Speed 13648.15 samples/sec Loss 0.9736 LearningRate 0.0000 Epoch: 38 Global Step: 66370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:27:26,297-Speed 13653.03 samples/sec Loss 0.9764 LearningRate 0.0000 Epoch: 38 Global Step: 66380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:27:44,344-Speed 13617.88 samples/sec Loss 0.9669 LearningRate 0.0000 Epoch: 38 Global Step: 66390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:28:02,358-Speed 13644.13 samples/sec Loss 0.9694 LearningRate 0.0000 Epoch: 38 Global Step: 66400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:28:20,431-Speed 13599.33 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 38 Global Step: 66410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:28:38,434-Speed 13651.21 samples/sec Loss 0.9762 LearningRate 0.0000 Epoch: 38 Global Step: 66420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:28:56,423-Speed 13663.04 samples/sec Loss 0.9700 LearningRate 0.0000 Epoch: 38 Global Step: 66430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:29:14,424-Speed 13652.82 samples/sec Loss 0.9763 LearningRate 0.0000 Epoch: 38 Global Step: 66440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:29:32,566-Speed 13547.82 samples/sec Loss 0.9670 LearningRate 0.0000 Epoch: 38 Global Step: 66450 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-03-04 17:29:50,617-Speed 13615.98 samples/sec Loss 0.9674 LearningRate 0.0000 Epoch: 38 Global Step: 66460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:30:08,655-Speed 13624.96 samples/sec Loss 0.9706 LearningRate 0.0000 Epoch: 38 Global Step: 66470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:30:26,708-Speed 13651.23 samples/sec Loss 0.9649 LearningRate 0.0000 Epoch: 38 Global Step: 66480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:30:44,803-Speed 13582.97 samples/sec Loss 0.9736 LearningRate 0.0000 Epoch: 38 Global Step: 66490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:31:02,893-Speed 13586.13 samples/sec Loss 0.9692 LearningRate 0.0000 Epoch: 38 Global Step: 66500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:31:20,923-Speed 13631.49 samples/sec Loss 0.9675 LearningRate 0.0000 Epoch: 38 Global Step: 66510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:31:38,898-Speed 13673.54 samples/sec Loss 0.9611 LearningRate 0.0000 Epoch: 38 Global Step: 66520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:31:56,891-Speed 13659.68 samples/sec Loss 0.9796 LearningRate 0.0000 Epoch: 38 Global Step: 66530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:32:14,920-Speed 13631.88 samples/sec Loss 0.9747 LearningRate 0.0000 Epoch: 38 Global Step: 66540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:32:32,934-Speed 13643.73 samples/sec Loss 0.9619 LearningRate 0.0000 Epoch: 38 Global Step: 66550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:32:51,010-Speed 13596.23 samples/sec Loss 0.9658 LearningRate 0.0000 Epoch: 38 Global Step: 66560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:33:09,059-Speed 13617.67 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 38 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-04 
17:33:27,056-Speed 13656.27 samples/sec Loss 0.9638 LearningRate 0.0000 Epoch: 38 Global Step: 66580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:33:45,063-Speed 13648.99 samples/sec Loss 0.9626 LearningRate 0.0000 Epoch: 38 Global Step: 66590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:34:03,059-Speed 13657.44 samples/sec Loss 0.9752 LearningRate 0.0000 Epoch: 38 Global Step: 66600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:34:21,104-Speed 13622.11 samples/sec Loss 0.9719 LearningRate 0.0000 Epoch: 38 Global Step: 66610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:34:39,111-Speed 13649.63 samples/sec Loss 0.9681 LearningRate 0.0000 Epoch: 38 Global Step: 66620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:34:57,156-Speed 13619.84 samples/sec Loss 0.9680 LearningRate 0.0000 Epoch: 38 Global Step: 66630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:35:15,178-Speed 13636.94 samples/sec Loss 0.9696 LearningRate 0.0000 Epoch: 38 Global Step: 66640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:35:33,215-Speed 13626.35 samples/sec Loss 0.9692 LearningRate 0.0000 Epoch: 38 Global Step: 66650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:35:51,237-Speed 13637.67 samples/sec Loss 0.9653 LearningRate 0.0000 Epoch: 38 Global Step: 66660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:36:09,329-Speed 13585.38 samples/sec Loss 0.9666 LearningRate 0.0000 Epoch: 38 Global Step: 66670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:36:27,412-Speed 13591.37 samples/sec Loss 0.9681 LearningRate 0.0000 Epoch: 38 Global Step: 66680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:36:45,466-Speed 13613.47 samples/sec Loss 0.9728 LearningRate 0.0000 Epoch: 38 Global Step: 66690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:37:03,526-Speed 13609.05 samples/sec 
Loss 0.9715 LearningRate 0.0000 Epoch: 38 Global Step: 66700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:37:21,592-Speed 13603.91 samples/sec Loss 0.9695 LearningRate 0.0000 Epoch: 38 Global Step: 66710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:37:39,606-Speed 13643.29 samples/sec Loss 0.9723 LearningRate 0.0000 Epoch: 38 Global Step: 66720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:37:57,788-Speed 13518.05 samples/sec Loss 0.9713 LearningRate 0.0000 Epoch: 38 Global Step: 66730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:38:15,775-Speed 13664.23 samples/sec Loss 0.9655 LearningRate 0.0000 Epoch: 38 Global Step: 66740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:38:33,780-Speed 13650.00 samples/sec Loss 0.9686 LearningRate 0.0000 Epoch: 38 Global Step: 66750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:38:51,912-Speed 13554.99 samples/sec Loss 0.9613 LearningRate 0.0000 Epoch: 38 Global Step: 66760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:39:10,070-Speed 13535.22 samples/sec Loss 0.9771 LearningRate 0.0000 Epoch: 38 Global Step: 66770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:39:28,180-Speed 13571.56 samples/sec Loss 0.9678 LearningRate 0.0000 Epoch: 38 Global Step: 66780 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-04 17:39:46,117-Speed 13703.88 samples/sec Loss 0.9703 LearningRate 0.0000 Epoch: 38 Global Step: 66790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:40:04,165-Speed 13618.17 samples/sec Loss 0.9687 LearningRate 0.0000 Epoch: 38 Global Step: 66800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:40:22,216-Speed 13615.66 samples/sec Loss 0.9731 LearningRate 0.0000 Epoch: 38 Global Step: 66810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:40:40,170-Speed 13688.94 samples/sec Loss 0.9682 LearningRate 0.0000 Epoch: 38 
Global Step: 66820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:40:58,228-Speed 13610.45 samples/sec Loss 0.9733 LearningRate 0.0000 Epoch: 38 Global Step: 66830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:41:16,251-Speed 13636.88 samples/sec Loss 0.9709 LearningRate 0.0000 Epoch: 38 Global Step: 66840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:41:34,205-Speed 13689.22 samples/sec Loss 0.9672 LearningRate 0.0000 Epoch: 38 Global Step: 66850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:41:52,329-Speed 13561.19 samples/sec Loss 0.9670 LearningRate 0.0000 Epoch: 38 Global Step: 66860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:42:10,457-Speed 13557.78 samples/sec Loss 0.9686 LearningRate 0.0000 Epoch: 38 Global Step: 66870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:42:28,513-Speed 13611.35 samples/sec Loss 0.9681 LearningRate 0.0000 Epoch: 38 Global Step: 66880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:42:46,551-Speed 13625.78 samples/sec Loss 0.9653 LearningRate 0.0000 Epoch: 38 Global Step: 66890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:43:04,610-Speed 13609.58 samples/sec Loss 0.9674 LearningRate 0.0000 Epoch: 38 Global Step: 66900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:43:22,578-Speed 13678.87 samples/sec Loss 0.9671 LearningRate 0.0000 Epoch: 38 Global Step: 66910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:43:40,621-Speed 13621.11 samples/sec Loss 0.9672 LearningRate 0.0000 Epoch: 38 Global Step: 66920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:43:58,602-Speed 13669.76 samples/sec Loss 0.9661 LearningRate 0.0000 Epoch: 38 Global Step: 66930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:44:16,642-Speed 13623.67 samples/sec Loss 0.9697 LearningRate 0.0000 Epoch: 38 Global Step: 66940 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-03-04 17:44:34,707-Speed 13605.61 samples/sec Loss 0.9642 LearningRate 0.0000 Epoch: 38 Global Step: 66950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:44:52,677-Speed 13676.62 samples/sec Loss 0.9675 LearningRate 0.0000 Epoch: 38 Global Step: 66960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:45:10,650-Speed 13674.55 samples/sec Loss 0.9632 LearningRate 0.0000 Epoch: 38 Global Step: 66970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:45:28,767-Speed 13567.14 samples/sec Loss 0.9571 LearningRate 0.0000 Epoch: 38 Global Step: 66980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:45:46,774-Speed 13648.61 samples/sec Loss 0.9624 LearningRate 0.0000 Epoch: 38 Global Step: 66990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:46:04,808-Speed 13628.46 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 38 Global Step: 67000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:46:22,890-Speed 13592.31 samples/sec Loss 0.9639 LearningRate 0.0000 Epoch: 38 Global Step: 67010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:46:40,883-Speed 13659.79 samples/sec Loss 0.9719 LearningRate 0.0000 Epoch: 38 Global Step: 67020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:46:58,896-Speed 13643.97 samples/sec Loss 0.9667 LearningRate 0.0000 Epoch: 38 Global Step: 67030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:47:16,923-Speed 13634.00 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 38 Global Step: 67040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:47:35,099-Speed 13685.74 samples/sec Loss 0.9621 LearningRate 0.0000 Epoch: 38 Global Step: 67050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:47:53,196-Speed 13581.29 samples/sec Loss 0.9799 LearningRate 0.0000 Epoch: 38 Global Step: 67060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 
17:48:11,195-Speed 13655.46 samples/sec Loss 0.9662 LearningRate 0.0000 Epoch: 38 Global Step: 67070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:48:29,209-Speed 13644.71 samples/sec Loss 0.9642 LearningRate 0.0000 Epoch: 38 Global Step: 67080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:48:47,251-Speed 13622.41 samples/sec Loss 0.9663 LearningRate 0.0000 Epoch: 38 Global Step: 67090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:49:05,235-Speed 13665.81 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 38 Global Step: 67100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:49:23,296-Speed 13608.45 samples/sec Loss 0.9682 LearningRate 0.0000 Epoch: 38 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-04 17:49:41,252-Speed 13687.78 samples/sec Loss 0.9703 LearningRate 0.0000 Epoch: 38 Global Step: 67120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:49:59,298-Speed 13619.42 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 38 Global Step: 67130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:50:17,310-Speed 13644.89 samples/sec Loss 0.9614 LearningRate 0.0000 Epoch: 38 Global Step: 67140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:50:35,437-Speed 13558.68 samples/sec Loss 0.9713 LearningRate 0.0000 Epoch: 38 Global Step: 67150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:50:53,532-Speed 13582.16 samples/sec Loss 0.9668 LearningRate 0.0000 Epoch: 38 Global Step: 67160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:51:11,664-Speed 13554.85 samples/sec Loss 0.9765 LearningRate 0.0000 Epoch: 38 Global Step: 67170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:51:29,707-Speed 13621.40 samples/sec Loss 0.9645 LearningRate 0.0000 Epoch: 38 Global Step: 67180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:51:47,729-Speed 13638.31 samples/sec 
Loss 0.9761 LearningRate 0.0000 Epoch: 38 Global Step: 67190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:52:05,781-Speed 13614.92 samples/sec Loss 0.9762 LearningRate 0.0000 Epoch: 38 Global Step: 67200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:52:23,556-Speed 13826.71 samples/sec Loss 0.9707 LearningRate 0.0000 Epoch: 38 Global Step: 67210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:52:41,247-Speed 13892.78 samples/sec Loss 0.9619 LearningRate 0.0000 Epoch: 38 Global Step: 67220 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-03-04 17:52:58,955-Speed 13879.48 samples/sec Loss 0.9778 LearningRate 0.0000 Epoch: 38 Global Step: 67230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:53:16,700-Speed 13850.43 samples/sec Loss 0.9607 LearningRate 0.0000 Epoch: 38 Global Step: 67240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:53:34,502-Speed 13805.75 samples/sec Loss 0.9584 LearningRate 0.0000 Epoch: 38 Global Step: 67250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:53:52,251-Speed 13847.07 samples/sec Loss 0.9640 LearningRate 0.0000 Epoch: 38 Global Step: 67260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:54:09,922-Speed 13909.00 samples/sec Loss 0.9640 LearningRate 0.0000 Epoch: 38 Global Step: 67270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:54:27,719-Speed 13809.95 samples/sec Loss 0.9745 LearningRate 0.0000 Epoch: 38 Global Step: 67280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:54:45,412-Speed 13891.01 samples/sec Loss 0.9628 LearningRate 0.0000 Epoch: 38 Global Step: 67290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:55:03,113-Speed 13884.41 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 38 Global Step: 67300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:55:20,811-Speed 13887.54 samples/sec Loss 0.9663 LearningRate 0.0000 Epoch: 38 
Global Step: 67310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:55:38,590-Speed 13824.76 samples/sec Loss 0.9615 LearningRate 0.0000 Epoch: 38 Global Step: 67320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:55:56,315-Speed 13865.63 samples/sec Loss 0.9709 LearningRate 0.0000 Epoch: 38 Global Step: 67330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:56:14,039-Speed 13866.75 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 38 Global Step: 67340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:56:31,772-Speed 13859.50 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 38 Global Step: 67350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:56:49,441-Speed 13910.26 samples/sec Loss 0.9667 LearningRate 0.0000 Epoch: 38 Global Step: 67360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:57:07,231-Speed 13815.21 samples/sec Loss 0.9776 LearningRate 0.0000 Epoch: 38 Global Step: 67370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:57:24,932-Speed 13885.08 samples/sec Loss 0.9722 LearningRate 0.0000 Epoch: 38 Global Step: 67380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:57:42,699-Speed 13833.28 samples/sec Loss 0.9723 LearningRate 0.0000 Epoch: 38 Global Step: 67390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 17:58:00,348-Speed 13925.91 samples/sec Loss 0.9700 LearningRate 0.0000 Epoch: 38 Global Step: 67400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:59:08,417-Speed 3610.54 samples/sec Loss 0.9729 LearningRate 0.0000 Epoch: 39 Global Step: 67410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:59:25,989-Speed 13986.41 samples/sec Loss 0.9703 LearningRate 0.0000 Epoch: 39 Global Step: 67420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 17:59:43,682-Speed 13890.92 samples/sec Loss 0.9661 LearningRate 0.0000 Epoch: 39 Global Step: 67430 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-03-04 18:00:01,389-Speed 13880.34 samples/sec Loss 0.9634 LearningRate 0.0000 Epoch: 39 Global Step: 67440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:00:19,121-Speed 13860.76 samples/sec Loss 0.9752 LearningRate 0.0000 Epoch: 39 Global Step: 67450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:00:36,825-Speed 13882.72 samples/sec Loss 0.9635 LearningRate 0.0000 Epoch: 39 Global Step: 67460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:00:54,650-Speed 13788.21 samples/sec Loss 0.9684 LearningRate 0.0000 Epoch: 39 Global Step: 67470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:01:12,394-Speed 13850.84 samples/sec Loss 0.9663 LearningRate 0.0000 Epoch: 39 Global Step: 67480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:01:30,086-Speed 13892.48 samples/sec Loss 0.9627 LearningRate 0.0000 Epoch: 39 Global Step: 67490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:01:47,743-Speed 13919.55 samples/sec Loss 0.9717 LearningRate 0.0000 Epoch: 39 Global Step: 67500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:02:05,397-Speed 13921.68 samples/sec Loss 0.9627 LearningRate 0.0000 Epoch: 39 Global Step: 67510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:02:23,056-Speed 13917.69 samples/sec Loss 0.9648 LearningRate 0.0000 Epoch: 39 Global Step: 67520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:02:40,831-Speed 13827.45 samples/sec Loss 0.9578 LearningRate 0.0000 Epoch: 39 Global Step: 67530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:02:58,550-Speed 13870.87 samples/sec Loss 0.9636 LearningRate 0.0000 Epoch: 39 Global Step: 67540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:03:16,321-Speed 13830.43 samples/sec Loss 0.9660 LearningRate 0.0000 Epoch: 39 Global Step: 67550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 
18:03:34,019-Speed 13886.59 samples/sec Loss 0.9522 LearningRate 0.0000 Epoch: 39 Global Step: 67560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:03:51,700-Speed 13900.91 samples/sec Loss 0.9689 LearningRate 0.0000 Epoch: 39 Global Step: 67570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:04:09,385-Speed 13898.27 samples/sec Loss 0.9748 LearningRate 0.0000 Epoch: 39 Global Step: 67580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:04:27,094-Speed 13878.11 samples/sec Loss 0.9698 LearningRate 0.0000 Epoch: 39 Global Step: 67590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:04:44,791-Speed 13888.19 samples/sec Loss 0.9692 LearningRate 0.0000 Epoch: 39 Global Step: 67600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:05:02,453-Speed 13915.45 samples/sec Loss 0.9609 LearningRate 0.0000 Epoch: 39 Global Step: 67610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:05:20,227-Speed 13827.99 samples/sec Loss 0.9713 LearningRate 0.0000 Epoch: 39 Global Step: 67620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:05:37,942-Speed 13873.47 samples/sec Loss 0.9642 LearningRate 0.0000 Epoch: 39 Global Step: 67630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:05:55,666-Speed 13867.40 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 39 Global Step: 67640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:06:13,358-Speed 13891.40 samples/sec Loss 0.9707 LearningRate 0.0000 Epoch: 39 Global Step: 67650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:06:31,121-Speed 13836.66 samples/sec Loss 0.9552 LearningRate 0.0000 Epoch: 39 Global Step: 67660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:06:48,838-Speed 13872.14 samples/sec Loss 0.9701 LearningRate 0.0000 Epoch: 39 Global Step: 67670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:07:06,596-Speed 13840.78 samples/sec 
Loss 0.9708 LearningRate 0.0000 Epoch: 39 Global Step: 67680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:07:24,361-Speed 13834.73 samples/sec Loss 0.9711 LearningRate 0.0000 Epoch: 39 Global Step: 67690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:07:42,049-Speed 13895.02 samples/sec Loss 0.9667 LearningRate 0.0000 Epoch: 39 Global Step: 67700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:07:59,745-Speed 13888.39 samples/sec Loss 0.9660 LearningRate 0.0000 Epoch: 39 Global Step: 67710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:08:17,447-Speed 13884.65 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 39 Global Step: 67720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:08:35,152-Speed 13881.14 samples/sec Loss 0.9660 LearningRate 0.0000 Epoch: 39 Global Step: 67730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:08:52,893-Speed 13853.92 samples/sec Loss 0.9638 LearningRate 0.0000 Epoch: 39 Global Step: 67740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:09:10,591-Speed 13888.37 samples/sec Loss 0.9645 LearningRate 0.0000 Epoch: 39 Global Step: 67750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:09:28,265-Speed 13906.37 samples/sec Loss 0.9695 LearningRate 0.0000 Epoch: 39 Global Step: 67760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:09:45,992-Speed 13864.13 samples/sec Loss 0.9637 LearningRate 0.0000 Epoch: 39 Global Step: 67770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:10:03,682-Speed 13893.95 samples/sec Loss 0.9687 LearningRate 0.0000 Epoch: 39 Global Step: 67780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:10:21,370-Speed 13894.81 samples/sec Loss 0.9718 LearningRate 0.0000 Epoch: 39 Global Step: 67790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:10:39,106-Speed 13857.34 samples/sec Loss 0.9722 LearningRate 0.0000 Epoch: 39 
Global Step: 67800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:10:56,780-Speed 13906.43 samples/sec Loss 0.9608 LearningRate 0.0000 Epoch: 39 Global Step: 67810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:11:14,496-Speed 13872.82 samples/sec Loss 0.9611 LearningRate 0.0000 Epoch: 39 Global Step: 67820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:11:32,210-Speed 13874.93 samples/sec Loss 0.9679 LearningRate 0.0000 Epoch: 39 Global Step: 67830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:11:49,896-Speed 13897.30 samples/sec Loss 0.9665 LearningRate 0.0000 Epoch: 39 Global Step: 67840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:12:07,574-Speed 13903.29 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 39 Global Step: 67850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:12:25,293-Speed 13870.23 samples/sec Loss 0.9597 LearningRate 0.0000 Epoch: 39 Global Step: 67860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:12:43,034-Speed 13854.66 samples/sec Loss 0.9696 LearningRate 0.0000 Epoch: 39 Global Step: 67870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:13:00,817-Speed 13820.83 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 39 Global Step: 67880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:13:18,462-Speed 13929.21 samples/sec Loss 0.9724 LearningRate 0.0000 Epoch: 39 Global Step: 67890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:13:36,092-Speed 13941.04 samples/sec Loss 0.9707 LearningRate 0.0000 Epoch: 39 Global Step: 67900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:13:53,807-Speed 13874.10 samples/sec Loss 0.9718 LearningRate 0.0000 Epoch: 39 Global Step: 67910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:14:11,556-Speed 13847.05 samples/sec Loss 0.9706 LearningRate 0.0000 Epoch: 39 Global Step: 67920 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-03-04 18:14:29,291-Speed 13859.24 samples/sec Loss 0.9625 LearningRate 0.0000 Epoch: 39 Global Step: 67930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:14:47,042-Speed 13845.39 samples/sec Loss 0.9619 LearningRate 0.0000 Epoch: 39 Global Step: 67940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:15:04,758-Speed 13873.14 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 39 Global Step: 67950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:15:22,432-Speed 13906.54 samples/sec Loss 0.9622 LearningRate 0.0000 Epoch: 39 Global Step: 67960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:15:40,157-Speed 13865.70 samples/sec Loss 0.9690 LearningRate 0.0000 Epoch: 39 Global Step: 67970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:15:57,829-Speed 13907.35 samples/sec Loss 0.9587 LearningRate 0.0000 Epoch: 39 Global Step: 67980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:16:15,640-Speed 13798.97 samples/sec Loss 0.9706 LearningRate 0.0000 Epoch: 39 Global Step: 67990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:16:33,424-Speed 13820.36 samples/sec Loss 0.9637 LearningRate 0.0000 Epoch: 39 Global Step: 68000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:16:51,078-Speed 13921.71 samples/sec Loss 0.9647 LearningRate 0.0000 Epoch: 39 Global Step: 68010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:17:08,857-Speed 13824.35 samples/sec Loss 0.9704 LearningRate 0.0000 Epoch: 39 Global Step: 68020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:17:26,556-Speed 13886.02 samples/sec Loss 0.9622 LearningRate 0.0000 Epoch: 39 Global Step: 68030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:17:44,310-Speed 13843.75 samples/sec Loss 0.9674 LearningRate 0.0000 Epoch: 39 Global Step: 68040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 
18:18:02,020-Speed 13878.51 samples/sec Loss 0.9687 LearningRate 0.0000 Epoch: 39 Global Step: 68050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:18:19,770-Speed 13846.38 samples/sec Loss 0.9725 LearningRate 0.0000 Epoch: 39 Global Step: 68060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:18:37,478-Speed 13879.43 samples/sec Loss 0.9732 LearningRate 0.0000 Epoch: 39 Global Step: 68070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-03-04 18:18:55,198-Speed 13869.90 samples/sec Loss 0.9663 LearningRate 0.0000 Epoch: 39 Global Step: 68080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:19:12,925-Speed 13864.89 samples/sec Loss 0.9636 LearningRate 0.0000 Epoch: 39 Global Step: 68090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:19:30,628-Speed 13883.65 samples/sec Loss 0.9650 LearningRate 0.0000 Epoch: 39 Global Step: 68100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:19:48,325-Speed 13887.66 samples/sec Loss 0.9774 LearningRate 0.0000 Epoch: 39 Global Step: 68110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:20:06,109-Speed 13820.21 samples/sec Loss 0.9575 LearningRate 0.0000 Epoch: 39 Global Step: 68120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-03-04 18:20:23,943-Speed 13780.57 samples/sec Loss 0.9685 LearningRate 0.0000 Epoch: 39 Global Step: 68130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:20:41,755-Speed 13798.57 samples/sec Loss 0.9695 LearningRate 0.0000 Epoch: 39 Global Step: 68140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:20:59,434-Speed 13901.71 samples/sec Loss 0.9665 LearningRate 0.0000 Epoch: 39 Global Step: 68150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:21:17,298-Speed 13757.90 samples/sec Loss 0.9605 LearningRate 0.0000 Epoch: 39 Global Step: 68160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:21:35,010-Speed 13876.55 samples/sec 
Loss 0.9767 LearningRate 0.0000 Epoch: 39 Global Step: 68170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:21:52,757-Speed 13848.27 samples/sec Loss 0.9617 LearningRate 0.0000 Epoch: 39 Global Step: 68180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:22:10,502-Speed 13850.07 samples/sec Loss 0.9694 LearningRate 0.0000 Epoch: 39 Global Step: 68190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:22:28,312-Speed 13799.52 samples/sec Loss 0.9720 LearningRate 0.0000 Epoch: 39 Global Step: 68200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:22:46,096-Speed 13820.06 samples/sec Loss 0.9592 LearningRate 0.0000 Epoch: 39 Global Step: 68210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:23:03,827-Speed 13860.94 samples/sec Loss 0.9626 LearningRate 0.0000 Epoch: 39 Global Step: 68220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:23:21,617-Speed 13815.36 samples/sec Loss 0.9706 LearningRate 0.0000 Epoch: 39 Global Step: 68230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:23:39,327-Speed 13877.34 samples/sec Loss 0.9580 LearningRate 0.0000 Epoch: 39 Global Step: 68240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:23:57,084-Speed 13840.89 samples/sec Loss 0.9603 LearningRate 0.0000 Epoch: 39 Global Step: 68250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:24:14,851-Speed 13833.34 samples/sec Loss 0.9657 LearningRate 0.0000 Epoch: 39 Global Step: 68260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:24:32,607-Speed 13841.95 samples/sec Loss 0.9546 LearningRate 0.0000 Epoch: 39 Global Step: 68270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-03-04 18:24:50,341-Speed 13858.21 samples/sec Loss 0.9622 LearningRate 0.0000 Epoch: 39 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-03-04 18:25:08,007-Speed 13912.16 samples/sec Loss 0.9646 LearningRate 0.0000 Epoch: 39 
Global Step: 68290 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:25:25,753-Speed 13849.27 samples/sec Loss 0.9624 LearningRate 0.0000 Epoch: 39 Global Step: 68300 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:25:43,483-Speed 13861.80 samples/sec Loss 0.9549 LearningRate 0.0000 Epoch: 39 Global Step: 68310 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:26:01,243-Speed 13838.97 samples/sec Loss 0.9574 LearningRate 0.0000 Epoch: 39 Global Step: 68320 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:26:18,999-Speed 13841.11 samples/sec Loss 0.9660 LearningRate 0.0000 Epoch: 39 Global Step: 68330 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:26:36,753-Speed 13843.11 samples/sec Loss 0.9621 LearningRate 0.0000 Epoch: 39 Global Step: 68340 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:26:54,525-Speed 13829.21 samples/sec Loss 0.9669 LearningRate 0.0000 Epoch: 39 Global Step: 68350 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:27:12,250-Speed 13866.32 samples/sec Loss 0.9597 LearningRate 0.0000 Epoch: 39 Global Step: 68360 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:27:30,086-Speed 13778.98 samples/sec Loss 0.9671 LearningRate 0.0000 Epoch: 39 Global Step: 68370 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:27:47,886-Speed 13807.12 samples/sec Loss 0.9596 LearningRate 0.0000 Epoch: 39 Global Step: 68380 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:28:05,777-Speed 13737.99 samples/sec Loss 0.9618 LearningRate 0.0000 Epoch: 39 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 0 hours
Training: 2022-03-04 18:28:23,519-Speed 13852.87 samples/sec Loss 0.9664 LearningRate 0.0000 Epoch: 39 Global Step: 68400 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:28:41,304-Speed 13818.61 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 39 Global Step: 68410 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:28:59,017-Speed 13875.23 samples/sec Loss 0.9672 LearningRate 0.0000 Epoch: 39 Global Step: 68420 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:29:16,780-Speed 13836.04 samples/sec Loss 0.9692 LearningRate 0.0000 Epoch: 39 Global Step: 68430 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:29:34,574-Speed 13812.13 samples/sec Loss 0.9661 LearningRate 0.0000 Epoch: 39 Global Step: 68440 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:29:52,216-Speed 13931.24 samples/sec Loss 0.9692 LearningRate 0.0000 Epoch: 39 Global Step: 68450 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:30:09,882-Speed 13912.08 samples/sec Loss 0.9631 LearningRate 0.0000 Epoch: 39 Global Step: 68460 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:30:27,591-Speed 13878.06 samples/sec Loss 0.9743 LearningRate 0.0000 Epoch: 39 Global Step: 68470 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:30:45,331-Speed 13854.20 samples/sec Loss 0.9655 LearningRate 0.0000 Epoch: 39 Global Step: 68480 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:31:03,101-Speed 13830.75 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 39 Global Step: 68490 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:31:20,850-Speed 13846.51 samples/sec Loss 0.9743 LearningRate 0.0000 Epoch: 39 Global Step: 68500 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:31:38,642-Speed 13813.93 samples/sec Loss 0.9689 LearningRate 0.0000 Epoch: 39 Global Step: 68510 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:31:56,350-Speed 13879.80 samples/sec Loss 0.9637 LearningRate 0.0000 Epoch: 39 Global Step: 68520 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:32:14,038-Speed 13894.59 samples/sec Loss 0.9642 LearningRate 0.0000 Epoch: 39 Global Step: 68530 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:32:31,760-Speed 13867.55 samples/sec Loss 0.9659 LearningRate 0.0000 Epoch: 39 Global Step: 68540 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:32:49,467-Speed 13880.47 samples/sec Loss 0.9647 LearningRate 0.0000 Epoch: 39 Global Step: 68550 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:33:07,220-Speed 13843.82 samples/sec Loss 0.9606 LearningRate 0.0000 Epoch: 39 Global Step: 68560 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:33:25,028-Speed 13801.44 samples/sec Loss 0.9627 LearningRate 0.0000 Epoch: 39 Global Step: 68570 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:33:42,788-Speed 13838.10 samples/sec Loss 0.9640 LearningRate 0.0000 Epoch: 39 Global Step: 68580 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:34:00,557-Speed 13831.66 samples/sec Loss 0.9582 LearningRate 0.0000 Epoch: 39 Global Step: 68590 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:34:18,283-Speed 13864.90 samples/sec Loss 0.9722 LearningRate 0.0000 Epoch: 39 Global Step: 68600 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:34:36,062-Speed 13823.94 samples/sec Loss 0.9634 LearningRate 0.0000 Epoch: 39 Global Step: 68610 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:34:53,809-Speed 13848.58 samples/sec Loss 0.9689 LearningRate 0.0000 Epoch: 39 Global Step: 68620 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:35:11,539-Speed 13861.18 samples/sec Loss 0.9563 LearningRate 0.0000 Epoch: 39 Global Step: 68630 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:35:29,227-Speed 13895.31 samples/sec Loss 0.9632 LearningRate 0.0000 Epoch: 39 Global Step: 68640 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:35:47,120-Speed 13907.40 samples/sec Loss 0.9682 LearningRate 0.0000 Epoch: 39 Global Step: 68650 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:36:04,855-Speed 13857.73 samples/sec Loss 0.9701 LearningRate 0.0000 Epoch: 39 Global Step: 68660 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:36:22,610-Speed 13842.35 samples/sec Loss 0.9648 LearningRate 0.0000 Epoch: 39 Global Step: 68670 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:36:40,476-Speed 13756.75 samples/sec Loss 0.9673 LearningRate 0.0000 Epoch: 39 Global Step: 68680 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:36:58,163-Speed 13895.58 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 39 Global Step: 68690 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:37:15,927-Speed 13834.98 samples/sec Loss 0.9664 LearningRate 0.0000 Epoch: 39 Global Step: 68700 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:37:33,774-Speed 13771.04 samples/sec Loss 0.9623 LearningRate 0.0000 Epoch: 39 Global Step: 68710 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:37:51,493-Speed 13870.90 samples/sec Loss 0.9649 LearningRate 0.0000 Epoch: 39 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:38:09,141-Speed 13926.49 samples/sec Loss 0.9579 LearningRate 0.0000 Epoch: 39 Global Step: 68730 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:38:27,023-Speed 13744.37 samples/sec Loss 0.9677 LearningRate 0.0000 Epoch: 39 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:38:44,761-Speed 13855.25 samples/sec Loss 0.9644 LearningRate 0.0000 Epoch: 39 Global Step: 68750 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:39:02,555-Speed 13812.46 samples/sec Loss 0.9699 LearningRate 0.0000 Epoch: 39 Global Step: 68760 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:39:20,303-Speed 13847.66 samples/sec Loss 0.9670 LearningRate 0.0000 Epoch: 39 Global Step: 68770 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:39:38,062-Speed 13839.13 samples/sec Loss 0.9694 LearningRate 0.0000 Epoch: 39 Global Step: 68780 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:39:55,792-Speed 13861.51 samples/sec Loss 0.9637 LearningRate 0.0000 Epoch: 39 Global Step: 68790 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:40:13,542-Speed 13846.20 samples/sec Loss 0.9594 LearningRate 0.0000 Epoch: 39 Global Step: 68800 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:40:31,352-Speed 13799.91 samples/sec Loss 0.9664 LearningRate 0.0000 Epoch: 39 Global Step: 68810 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:40:49,088-Speed 13857.76 samples/sec Loss 0.9665 LearningRate 0.0000 Epoch: 39 Global Step: 68820 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:41:06,873-Speed 13818.21 samples/sec Loss 0.9688 LearningRate 0.0000 Epoch: 39 Global Step: 68830 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:41:24,584-Speed 13876.75 samples/sec Loss 0.9705 LearningRate 0.0000 Epoch: 39 Global Step: 68840 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:41:42,324-Speed 13854.56 samples/sec Loss 0.9659 LearningRate 0.0000 Epoch: 39 Global Step: 68850 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:42:00,043-Speed 13870.43 samples/sec Loss 0.9600 LearningRate 0.0000 Epoch: 39 Global Step: 68860 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:42:17,788-Speed 13850.15 samples/sec Loss 0.9658 LearningRate 0.0000 Epoch: 39 Global Step: 68870 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:42:35,600-Speed 13797.85 samples/sec Loss 0.9655 LearningRate 0.0000 Epoch: 39 Global Step: 68880 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:42:53,297-Speed 13887.87 samples/sec Loss 0.9640 LearningRate 0.0000 Epoch: 39 Global Step: 68890 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:43:10,978-Speed 13901.18 samples/sec Loss 0.9672 LearningRate 0.0000 Epoch: 39 Global Step: 68900 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:43:28,762-Speed 13819.21 samples/sec Loss 0.9712 LearningRate 0.0000 Epoch: 39 Global Step: 68910 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:43:46,489-Speed 13864.22 samples/sec Loss 0.9617 LearningRate 0.0000 Epoch: 39 Global Step: 68920 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:44:04,155-Speed 13911.95 samples/sec Loss 0.9660 LearningRate 0.0000 Epoch: 39 Global Step: 68930 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:44:22,012-Speed 13763.80 samples/sec Loss 0.9705 LearningRate 0.0000 Epoch: 39 Global Step: 68940 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:44:39,735-Speed 13867.13 samples/sec Loss 0.9654 LearningRate 0.0000 Epoch: 39 Global Step: 68950 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:44:57,472-Speed 13856.19 samples/sec Loss 0.9712 LearningRate 0.0000 Epoch: 39 Global Step: 68960 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:45:15,154-Speed 13899.43 samples/sec Loss 0.9564 LearningRate 0.0000 Epoch: 39 Global Step: 68970 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:45:32,962-Speed 13801.77 samples/sec Loss 0.9683 LearningRate 0.0000 Epoch: 39 Global Step: 68980 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:45:50,770-Speed 13800.55 samples/sec Loss 0.9693 LearningRate 0.0000 Epoch: 39 Global Step: 68990 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:46:08,541-Speed 13829.83 samples/sec Loss 0.9703 LearningRate 0.0000 Epoch: 39 Global Step: 69000 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:46:26,266-Speed 13865.69 samples/sec Loss 0.9635 LearningRate 0.0000 Epoch: 39 Global Step: 69010 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:46:44,025-Speed 13840.06 samples/sec Loss 0.9702 LearningRate 0.0000 Epoch: 39 Global Step: 69020 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:47:01,871-Speed 13771.70 samples/sec Loss 0.9643 LearningRate 0.0000 Epoch: 39 Global Step: 69030 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:47:19,662-Speed 13813.94 samples/sec Loss 0.9678 LearningRate 0.0000 Epoch: 39 Global Step: 69040 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:47:37,427-Speed 13834.26 samples/sec Loss 0.9625 LearningRate 0.0000 Epoch: 39 Global Step: 69050 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:47:55,145-Speed 13871.61 samples/sec Loss 0.9700 LearningRate 0.0000 Epoch: 39 Global Step: 69060 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:48:12,865-Speed 13870.25 samples/sec Loss 0.9650 LearningRate 0.0000 Epoch: 39 Global Step: 69070 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:48:30,619-Speed 13842.76 samples/sec Loss 0.9644 LearningRate 0.0000 Epoch: 39 Global Step: 69080 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:48:48,315-Speed 13888.78 samples/sec Loss 0.9705 LearningRate 0.0000 Epoch: 39 Global Step: 69090 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:49:06,001-Speed 13896.58 samples/sec Loss 0.9623 LearningRate 0.0000 Epoch: 39 Global Step: 69100 Fp16 Grad Scale: 16384 Required: 0 hours
Training: 2022-03-04 18:49:23,706-Speed 13881.23 samples/sec Loss 0.9679 LearningRate 0.0000 Epoch: 39 Global Step: 69110 Fp16 Grad Scale: 32768 Required: 0 hours
Training: 2022-03-04 18:49:41,451-Speed 13850.28 samples/sec Loss 0.9683 LearningRate 0.0000 Epoch: 39 Global Step: 69120 Fp16 Grad Scale: 32768 Required: -0 hours
Training: 2022-03-04 18:49:59,190-Speed 13854.37 samples/sec Loss 0.9662 LearningRate 0.0000 Epoch: 39 Global Step: 69130 Fp16 Grad Scale: 32768 Required: -0 hours