Training: 2022-04-27 01:45:32,689-rank_id: 0
Training: 2022-04-27 01:45:59,045-: margin_list [1.0, 0.0, 0.4]
Training: 2022-04-27 01:45:59,046-: network r100
Training: 2022-04-27 01:45:59,046-: resume False
Training: 2022-04-27 01:45:59,046-: output work_dirs/wf12m_r100
Training: 2022-04-27 01:45:59,046-: embedding_size 512
Training: 2022-04-27 01:45:59,046-: sample_rate 1.0
Training: 2022-04-27 01:45:59,046-: interclass_filtering_threshold0
Training: 2022-04-27 01:45:59,046-: fp16 True
Training: 2022-04-27 01:45:59,046-: batch_size 128
Training: 2022-04-27 01:45:59,047-: optimizer sgd
Training: 2022-04-27 01:45:59,047-: lr 0.1
Training: 2022-04-27 01:45:59,047-: momentum 0.9
Training: 2022-04-27 01:45:59,047-: weight_decay 0.0005
Training: 2022-04-27 01:45:59,047-: verbose 2000
Training: 2022-04-27 01:45:59,047-: frequent 10
Training: 2022-04-27 01:45:59,047-: dali False
Training: 2022-04-27 01:45:59,047-: rec /train_tmp/WebFace12M
Training: 2022-04-27 01:45:59,047-: num_classes 617970
Training: 2022-04-27 01:45:59,047-: num_image 12720066
Training: 2022-04-27 01:45:59,047-: num_epoch 20
Training: 2022-04-27 01:45:59,047-: warmup_epoch 0
Training: 2022-04-27 01:45:59,047-: val_targets []
Training: 2022-04-27 01:45:59,047-: total_batch_size 1024
Training: 2022-04-27 01:45:59,047-: warmup_step 0
Training: 2022-04-27 01:45:59,047-: total_step 248420
Training: 2022-04-27 01:46:24,525-Reducer buckets have been rebuilt in this iteration.
Training: 2022-04-27 01:46:30,651-Speed 2986.67 samples/sec Loss 43.0786 LearningRate 0.1000 Epoch: 0 Global Step: 20 Fp16 Grad Scale: 8192 Required: 104 hours
Training: 2022-04-27 01:46:33,957-Speed 3098.78 samples/sec Loss 43.2348 LearningRate 0.1000 Epoch: 0 Global Step: 30 Fp16 Grad Scale: 8192 Required: 78 hours
Training: 2022-04-27 01:46:37,403-Speed 2972.81 samples/sec Loss 43.3577 LearningRate 0.1000 Epoch: 0 Global Step: 40 Fp16 Grad Scale: 8192 Required: 65 hours
Training: 2022-04-27 01:46:40,698-Speed 3108.54 samples/sec Loss 43.6135 LearningRate 0.1000 Epoch: 0 Global Step: 50 Fp16 Grad Scale: 8192 Required: 56 hours
Training: 2022-04-27 01:46:43,997-Speed 3104.51 samples/sec Loss 43.5974 LearningRate 0.1000 Epoch: 0 Global Step: 60 Fp16 Grad Scale: 8192 Required: 51 hours
Training: 2022-04-27 01:46:47,337-Speed 3067.85 samples/sec Loss 43.7322 LearningRate 0.0999 Epoch: 0 Global Step: 70 Fp16 Grad Scale: 8192 Required: 47 hours
Training: 2022-04-27 01:46:50,765-Speed 2987.36 samples/sec Loss 44.1046 LearningRate 0.0999 Epoch: 0 Global Step: 80 Fp16 Grad Scale: 8192 Required: 44 hours
Training: 2022-04-27 01:46:54,137-Speed 3038.32 samples/sec Loss 43.6538 LearningRate 0.0999 Epoch: 0 Global Step: 90 Fp16 Grad Scale: 8192 Required: 42 hours
Training: 2022-04-27 01:46:57,424-Speed 3116.33 samples/sec Loss 43.6409 LearningRate 0.0999 Epoch: 0 Global Step: 100 Fp16 Grad Scale: 8192 Required: 40 hours
Training: 2022-04-27 01:47:00,749-Speed 3080.66 samples/sec Loss 43.5258 LearningRate 0.0999 Epoch: 0 Global Step: 110 Fp16 Grad Scale: 16384 Required: 38 hours
Training: 2022-04-27 01:47:04,023-Speed 3128.40 samples/sec Loss 43.4639 LearningRate 0.0999 Epoch: 0 Global Step: 120 Fp16 Grad Scale: 16384 Required: 37 hours
Training: 2022-04-27 01:47:07,313-Speed 3113.44 samples/sec Loss 43.3916 LearningRate 0.0999 Epoch: 0 Global Step: 130 Fp16 Grad Scale: 16384 Required: 36 hours
Training: 2022-04-27 01:47:10,679-Speed 3043.48 samples/sec Loss 43.4332 LearningRate 0.0999 Epoch: 0 Global Step: 140 Fp16 Grad Scale: 16384 Required: 35 hours
Training: 2022-04-27 01:47:13,977-Speed 3105.46 samples/sec Loss 43.4014 LearningRate 0.0999 Epoch: 0 Global Step: 150 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-04-27 01:47:17,316-Speed 3068.59 samples/sec Loss 43.3452 LearningRate 0.0999 Epoch: 0 Global Step: 160 Fp16 Grad Scale: 16384 Required: 34 hours
Training: 2022-04-27 01:47:20,999-Speed 2780.84 samples/sec Loss 43.2646 LearningRate 0.0999 Epoch: 0 Global Step: 170 Fp16 Grad Scale: 16384 Required: 33 hours
Training: 2022-04-27 01:47:24,355-Speed 3051.92 samples/sec Loss 43.3127 LearningRate 0.0999 Epoch: 0 Global Step: 180 Fp16 Grad Scale: 16384 Required: 32 hours
Training: 2022-04-27 01:47:27,644-Speed 3114.35 samples/sec Loss 43.3388 LearningRate 0.0998 Epoch: 0 Global Step: 190 Fp16 Grad Scale: 16384 Required: 32 hours
Training: 2022-04-27 01:47:30,996-Speed 3055.68 samples/sec Loss 43.2834 LearningRate 0.0998 Epoch: 0 Global Step: 200 Fp16 Grad Scale: 16384 Required: 32 hours
Training: 2022-04-27 01:47:34,378-Speed 3028.81 samples/sec Loss 43.2613 LearningRate 0.0998 Epoch: 0 Global Step: 210 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-27 01:47:37,703-Speed 3080.01 samples/sec Loss 43.1186 LearningRate 0.0998 Epoch: 0 Global Step: 220 Fp16 Grad Scale: 32768 Required: 31 hours
Training: 2022-04-27 01:47:41,069-Speed 3043.19 samples/sec Loss 43.1315 LearningRate 0.0998 Epoch: 0 Global Step: 230 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-27 01:47:44,391-Speed 3083.97 samples/sec Loss 43.0595 LearningRate 0.0998 Epoch: 0 Global Step: 240 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-27 01:47:47,768-Speed 3032.73 samples/sec Loss 43.0336 LearningRate 0.0998 Epoch: 0 Global Step: 250 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-27 01:47:51,079-Speed 3094.09 samples/sec Loss 43.0296 LearningRate 0.0998 Epoch: 0 Global Step: 260 Fp16 Grad Scale: 32768 Required: 30 hours
Training: 2022-04-27 01:47:54,360-Speed 3121.88 samples/sec Loss 42.9747 LearningRate 0.0998 Epoch: 0 Global Step: 270 Fp16 Grad Scale: 32768 Required: 29 hours
Training: 2022-04-27 01:47:57,656-Speed 3107.89 samples/sec Loss 42.9624 LearningRate 0.0998 Epoch: 0 Global Step: 280 Fp16 Grad Scale: 32768 Required: 29 hours
Training: 2022-04-27 01:48:00,992-Speed 3070.60 samples/sec Loss 42.8749 LearningRate 0.0998 Epoch: 0 Global Step: 290 Fp16 Grad Scale: 32768 Required: 29 hours
Training: 2022-04-27 01:48:04,294-Speed 3102.11 samples/sec Loss 42.8309 LearningRate 0.0998 Epoch: 0 Global Step: 300 Fp16 Grad Scale: 32768 Required: 29 hours
Training: 2022-04-27 01:48:07,695-Speed 3012.97 samples/sec Loss 42.8219 LearningRate 0.0998 Epoch: 0 Global Step: 310 Fp16 Grad Scale: 65536 Required: 29 hours
Training: 2022-04-27 01:48:11,012-Speed 3088.03 samples/sec Loss 42.6919 LearningRate 0.0997 Epoch: 0 Global Step: 320 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-27 01:48:14,372-Speed 3048.82 samples/sec Loss 42.5979 LearningRate 0.0997 Epoch: 0 Global Step: 330 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-27 01:48:17,728-Speed 3051.54 samples/sec Loss 42.6776 LearningRate 0.0997 Epoch: 0 Global Step: 340 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-27 01:48:21,046-Speed 3087.55 samples/sec Loss 42.6395 LearningRate 0.0997 Epoch: 0 Global Step: 350 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-27 01:48:24,377-Speed 3074.84 samples/sec Loss 42.7009 LearningRate 0.0997 Epoch: 0 Global Step: 360 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-27 01:48:27,743-Speed 3042.88 samples/sec Loss 42.5829 LearningRate 0.0997 Epoch: 0 Global Step: 370 Fp16 Grad Scale: 65536 Required: 28 hours
Training: 2022-04-27 01:48:31,091-Speed 3059.74 samples/sec Loss 42.5689 LearningRate 0.0997 Epoch: 0 Global Step: 380 Fp16 Grad Scale: 65536 Required: 27 hours
Training: 2022-04-27 01:48:34,421-Speed 3076.07 samples/sec Loss 42.4795 LearningRate 0.0997 Epoch: 0 Global Step: 390 Fp16 Grad Scale: 65536 Required: 27 hours
Training: 2022-04-27 01:48:37,769-Speed 3059.14 samples/sec Loss 42.4920 LearningRate 0.0997 Epoch: 0 Global Step: 400 Fp16 Grad Scale: 65536 Required: 27 hours
Training: 2022-04-27 01:48:41,163-Speed 3017.62 samples/sec Loss 42.4969 LearningRate 0.0997 Epoch: 0 Global Step: 410 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:48:44,526-Speed 3046.66 samples/sec Loss 42.4195 LearningRate 0.0997 Epoch: 0 Global Step: 420 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:48:47,891-Speed 3043.67 samples/sec Loss 42.2942 LearningRate 0.0997 Epoch: 0 Global Step: 430 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:48:51,208-Speed 3088.37 samples/sec Loss 42.3060 LearningRate 0.0996 Epoch: 0 Global Step: 440 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:48:54,491-Speed 3120.08 samples/sec Loss 42.2664 LearningRate 0.0996 Epoch: 0 Global Step: 450 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:48:57,826-Speed 3071.42 samples/sec Loss 42.1994 LearningRate 0.0996 Epoch: 0 Global Step: 460 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:49:01,124-Speed 3105.27 samples/sec Loss 42.2452 LearningRate 0.0996 Epoch: 0 Global Step: 470 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:49:04,453-Speed 3077.62 samples/sec Loss 42.1489 LearningRate 0.0996 Epoch: 0 Global Step: 480 Fp16 Grad Scale: 131072 Required: 27 hours
Training: 2022-04-27 01:49:07,798-Speed 3061.76 samples/sec Loss 42.0032 LearningRate 0.0996 Epoch: 0 Global Step: 490 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:11,114-Speed 3089.90 samples/sec Loss 42.0067 LearningRate 0.0996 Epoch: 0 Global Step: 500 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:14,437-Speed 3081.69 samples/sec Loss 42.0281 LearningRate 0.0996 Epoch: 0 Global Step: 510 Fp16 Grad Scale: 262144 Required: 26 hours
Training: 2022-04-27 01:49:17,735-Speed 3106.24 samples/sec Loss 41.8687 LearningRate 0.0996 Epoch: 0 Global Step: 520 Fp16 Grad Scale: 262144 Required: 26 hours
Training: 2022-04-27 01:49:21,045-Speed 3095.01 samples/sec Loss 41.9638 LearningRate 0.0996 Epoch: 0 Global Step: 530 Fp16 Grad Scale: 262144 Required: 26 hours
Training: 2022-04-27 01:49:24,378-Speed 3073.10 samples/sec Loss 41.8544 LearningRate 0.0996 Epoch: 0 Global Step: 540 Fp16 Grad Scale: 262144 Required: 26 hours
Training: 2022-04-27 01:49:27,715-Speed 3069.64 samples/sec Loss 41.8189 LearningRate 0.0996 Epoch: 0 Global Step: 550 Fp16 Grad Scale: 262144 Required: 26 hours
Training: 2022-04-27 01:49:31,087-Speed 3037.53 samples/sec Loss 41.7614 LearningRate 0.0995 Epoch: 0 Global Step: 560 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:34,433-Speed 3061.69 samples/sec Loss 41.7508 LearningRate 0.0995 Epoch: 0 Global Step: 570 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:37,727-Speed 3109.42 samples/sec Loss 41.6815 LearningRate 0.0995 Epoch: 0 Global Step: 580 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:41,081-Speed 3053.74 samples/sec Loss 41.5836 LearningRate 0.0995 Epoch: 0 Global Step: 590 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:44,448-Speed 3042.40 samples/sec Loss 41.6075 LearningRate 0.0995 Epoch: 0 Global Step: 600 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:47,778-Speed 3076.29 samples/sec Loss 41.6037 LearningRate 0.0995 Epoch: 0 Global Step: 610 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:51,155-Speed 3033.38 samples/sec Loss 41.5102 LearningRate 0.0995 Epoch: 0 Global Step: 620 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:54,481-Speed 3079.87 samples/sec Loss 41.4921 LearningRate 0.0995 Epoch: 0 Global Step: 630 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:49:57,808-Speed 3079.07 samples/sec Loss 41.5434 LearningRate 0.0995 Epoch: 0 Global Step: 640 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:50:01,164-Speed 3051.86 samples/sec Loss 41.3123 LearningRate 0.0995 Epoch: 0 Global Step: 650 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:50:04,582-Speed 2996.47 samples/sec Loss 41.3155 LearningRate 0.0995 Epoch: 0 Global Step: 660 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:50:07,946-Speed 3045.54 samples/sec Loss 41.3413 LearningRate 0.0995 Epoch: 0 Global Step: 670 Fp16 Grad Scale: 131072 Required: 26 hours
Training: 2022-04-27 01:50:11,308-Speed 3046.67 samples/sec Loss 41.2215 LearningRate 0.0995 Epoch: 0 Global Step: 680 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:14,634-Speed 3079.30 samples/sec Loss 41.1014 LearningRate 0.0994 Epoch: 0 Global Step: 690 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:17,928-Speed 3109.75 samples/sec Loss 41.1524 LearningRate 0.0994 Epoch: 0 Global Step: 700 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:21,212-Speed 3119.22 samples/sec Loss 41.0278 LearningRate 0.0994 Epoch: 0 Global Step: 710 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:24,511-Speed 3104.50 samples/sec Loss 41.1053 LearningRate 0.0994 Epoch: 0 Global Step: 720 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:27,848-Speed 3069.83 samples/sec Loss 41.0712 LearningRate 0.0994 Epoch: 0 Global Step: 730 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:31,157-Speed 3096.25 samples/sec Loss 40.9076 LearningRate 0.0994 Epoch: 0 Global Step: 740 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:34,462-Speed 3099.56 samples/sec Loss 40.8406 LearningRate 0.0994 Epoch: 0 Global Step: 750 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:37,759-Speed 3106.54 samples/sec Loss 40.8167 LearningRate 0.0994 Epoch: 0 Global Step: 760 Fp16 Grad Scale: 262144 Required: 25 hours
Training: 2022-04-27 01:50:41,054-Speed 3108.92 samples/sec Loss 40.7383 LearningRate 0.0994 Epoch: 0 Global Step: 770 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:44,449-Speed 3017.09 samples/sec Loss 40.7036 LearningRate 0.0994 Epoch: 0 Global Step: 780 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:47,832-Speed 3028.68 samples/sec Loss 40.6246 LearningRate 0.0994 Epoch: 0 Global Step: 790 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:51,112-Speed 3122.88 samples/sec Loss 40.6331 LearningRate 0.0994 Epoch: 0 Global Step: 800 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:54,479-Speed 3042.17 samples/sec Loss 40.5333 LearningRate 0.0993 Epoch: 0 Global Step: 810 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:50:57,847-Speed 3041.70 samples/sec Loss 40.4484 LearningRate 0.0993 Epoch: 0 Global Step: 820 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:01,215-Speed 3041.12 samples/sec Loss 40.4011 LearningRate 0.0993 Epoch: 0 Global Step: 830 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:04,564-Speed 3058.55 samples/sec Loss 40.4281 LearningRate 0.0993 Epoch: 0 Global Step: 840 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:07,959-Speed 3016.96 samples/sec Loss 40.3924 LearningRate 0.0993 Epoch: 0 Global Step: 850 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:11,276-Speed 3087.89 samples/sec Loss 40.2791 LearningRate 0.0993 Epoch: 0 Global Step: 860 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:14,563-Speed 3116.51 samples/sec Loss 40.2284 LearningRate 0.0993 Epoch: 0 Global Step: 870 Fp16 Grad Scale: 262144 Required: 25 hours
Training: 2022-04-27 01:51:17,924-Speed 3048.50 samples/sec Loss 40.1248 LearningRate 0.0993 Epoch: 0 Global Step: 880 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:21,232-Speed 3096.65 samples/sec Loss 40.1591 LearningRate 0.0993 Epoch: 0 Global Step: 890 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:24,583-Speed 3055.80 samples/sec Loss 40.0711 LearningRate 0.0993 Epoch: 0 Global Step: 900 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:28,014-Speed 2985.76 samples/sec Loss 39.9709 LearningRate 0.0993 Epoch: 0 Global Step: 910 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:31,382-Speed 3041.74 samples/sec Loss 40.0381 LearningRate 0.0993 Epoch: 0 Global Step: 920 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:34,642-Speed 3141.91 samples/sec Loss 39.9180 LearningRate 0.0993 Epoch: 0 Global Step: 930 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:37,996-Speed 3053.89 samples/sec Loss 39.8496 LearningRate 0.0992 Epoch: 0 Global Step: 940 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:41,335-Speed 3068.10 samples/sec Loss 39.9461 LearningRate 0.0992 Epoch: 0 Global Step: 950 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:44,693-Speed 3050.08 samples/sec Loss 39.8462 LearningRate 0.0992 Epoch: 0 Global Step: 960 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:48,083-Speed 3021.12 samples/sec Loss 39.6826 LearningRate 0.0992 Epoch: 0 Global Step: 970 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:51,366-Speed 3119.88 samples/sec Loss 39.6276 LearningRate 0.0992 Epoch: 0 Global Step: 980 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:54,699-Speed 3072.71 samples/sec Loss 39.7032 LearningRate 0.0992 Epoch: 0 Global Step: 990 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:51:58,050-Speed 3057.15 samples/sec Loss 39.6781 LearningRate 0.0992 Epoch: 0 Global Step: 1000 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:01,375-Speed 3080.80 samples/sec Loss 39.5571 LearningRate 0.0992 Epoch: 0 Global Step: 1010 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:04,784-Speed 3004.63 samples/sec Loss 39.4922 LearningRate 0.0992 Epoch: 0 Global Step: 1020 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:08,086-Speed 3102.48 samples/sec Loss 39.4511 LearningRate 0.0992 Epoch: 0 Global Step: 1030 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:11,405-Speed 3086.49 samples/sec Loss 39.3832 LearningRate 0.0992 Epoch: 0 Global Step: 1040 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:14,737-Speed 3073.96 samples/sec Loss 39.3722 LearningRate 0.0992 Epoch: 0 Global Step: 1050 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:18,029-Speed 3111.61 samples/sec Loss 39.2739 LearningRate 0.0991 Epoch: 0 Global Step: 1060 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:21,366-Speed 3070.13 samples/sec Loss 39.1834 LearningRate 0.0991 Epoch: 0 Global Step: 1070 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:24,646-Speed 3122.35 samples/sec Loss 39.1309 LearningRate 0.0991 Epoch: 0 Global Step: 1080 Fp16 Grad Scale: 131072 Required: 25 hours
Training: 2022-04-27 01:52:27,958-Speed 3093.77 samples/sec Loss 39.1564 LearningRate 0.0991 Epoch: 0 Global Step: 1090 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:31,267-Speed 3094.97 samples/sec Loss 39.1441 LearningRate 0.0991 Epoch: 0 Global Step: 1100 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:34,574-Speed 3097.53 samples/sec Loss 39.0052 LearningRate 0.0991 Epoch: 0 Global Step: 1110 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:37,922-Speed 3059.32 samples/sec Loss 38.9490 LearningRate 0.0991 Epoch: 0 Global Step: 1120 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:41,270-Speed 3059.40 samples/sec Loss 38.9739 LearningRate 0.0991 Epoch: 0 Global Step: 1130 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:44,578-Speed 3096.48 samples/sec Loss 38.8139 LearningRate 0.0991 Epoch: 0 Global Step: 1140 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:47,881-Speed 3101.30 samples/sec Loss 38.8279 LearningRate 0.0991 Epoch: 0 Global Step: 1150 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:51,200-Speed 3086.07 samples/sec Loss 38.7623 LearningRate 0.0991 Epoch: 0 Global Step: 1160 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:54,523-Speed 3082.32 samples/sec Loss 38.8194 LearningRate 0.0991 Epoch: 0 Global Step: 1170 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:52:57,831-Speed 3097.03 samples/sec Loss 38.7396 LearningRate 0.0991 Epoch: 0 Global Step: 1180 Fp16 Grad Scale: 262144 Required: 24 hours
Training: 2022-04-27 01:53:01,179-Speed 3059.50 samples/sec Loss 38.7393 LearningRate 0.0990 Epoch: 0 Global Step: 1190 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:04,487-Speed 3096.23 samples/sec Loss 38.5133 LearningRate 0.0990 Epoch: 0 Global Step: 1200 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:07,911-Speed 2992.18 samples/sec Loss 38.6103 LearningRate 0.0990 Epoch: 0 Global Step: 1210 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:11,244-Speed 3072.87 samples/sec Loss 38.5089 LearningRate 0.0990 Epoch: 0 Global Step: 1220 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:14,555-Speed 3094.31 samples/sec Loss 38.4729 LearningRate 0.0990 Epoch: 0 Global Step: 1230 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:17,901-Speed 3060.93 samples/sec Loss 38.4560 LearningRate 0.0990 Epoch: 0 Global Step: 1240 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:21,199-Speed 3106.24 samples/sec Loss 38.4156 LearningRate 0.0990 Epoch: 0 Global Step: 1250 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:24,499-Speed 3103.54 samples/sec Loss 38.2790 LearningRate 0.0990 Epoch: 0 Global Step: 1260 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:27,804-Speed 3099.17 samples/sec Loss 38.2102 LearningRate 0.0990 Epoch: 0 Global Step: 1270 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:31,065-Speed 3140.81 samples/sec Loss 38.1405 LearningRate 0.0990 Epoch: 0 Global Step: 1280 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:34,358-Speed 3110.81 samples/sec Loss 38.1743 LearningRate 0.0990 Epoch: 0 Global Step: 1290 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:37,742-Speed 3026.56 samples/sec Loss 38.2015 LearningRate 0.0990 Epoch: 0 Global Step: 1300 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:41,053-Speed 3094.24 samples/sec Loss 38.1522 LearningRate 0.0989 Epoch: 0 Global Step: 1310 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:44,423-Speed 3038.95 samples/sec Loss 37.9972 LearningRate 0.0989 Epoch: 0 Global Step: 1320 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:47,811-Speed 3023.14 samples/sec Loss 38.0107 LearningRate 0.0989 Epoch: 0 Global Step: 1330 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:51,123-Speed 3093.47 samples/sec Loss 37.8591 LearningRate 0.0989 Epoch: 0 Global Step: 1340 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:54,489-Speed 3042.53 samples/sec Loss 37.9075 LearningRate 0.0989 Epoch: 0 Global Step: 1350 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:53:57,884-Speed 3017.75 samples/sec Loss 37.7580 LearningRate 0.0989 Epoch: 0 Global Step: 1360 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:01,194-Speed 3094.34 samples/sec Loss 37.7467 LearningRate 0.0989 Epoch: 0 Global Step: 1370 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:04,536-Speed 3064.39 samples/sec Loss 37.7431 LearningRate 0.0989 Epoch: 0 Global Step: 1380 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:07,855-Speed 3086.48 samples/sec Loss 37.6018 LearningRate 0.0989 Epoch: 0 Global Step: 1390 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:11,157-Speed 3101.94 samples/sec Loss 37.5251 LearningRate 0.0989 Epoch: 0 Global Step: 1400 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:14,440-Speed 3119.81 samples/sec Loss 37.6447 LearningRate 0.0989 Epoch: 0 Global Step: 1410 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:17,723-Speed 3120.57 samples/sec Loss 37.5073 LearningRate 0.0989 Epoch: 0 Global Step: 1420 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:21,025-Speed 3102.25 samples/sec Loss 37.5117 LearningRate 0.0989 Epoch: 0 Global Step: 1430 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:24,323-Speed 3105.64 samples/sec Loss 37.5293 LearningRate 0.0988 Epoch: 0 Global Step: 1440 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:27,662-Speed 3067.63 samples/sec Loss 37.3097 LearningRate 0.0988 Epoch: 0 Global Step: 1450 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:30,939-Speed 3125.53 samples/sec Loss 37.3272 LearningRate 0.0988 Epoch: 0 Global Step: 1460 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:34,271-Speed 3074.39 samples/sec Loss 37.1705 LearningRate 0.0988 Epoch: 0 Global Step: 1470 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:37,624-Speed 3054.76 samples/sec Loss 37.1658 LearningRate 0.0988 Epoch: 0 Global Step: 1480 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:40,890-Speed 3135.90 samples/sec Loss 37.1656 LearningRate 0.0988 Epoch: 0 Global Step: 1490 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:44,321-Speed 2985.99 samples/sec Loss 37.0639 LearningRate 0.0988 Epoch: 0 Global Step: 1500 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:47,703-Speed 3028.36 samples/sec Loss 37.0817 LearningRate 0.0988 Epoch: 0 Global Step: 1510 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:51,028-Speed 3080.99 samples/sec Loss 37.1419 LearningRate 0.0988 Epoch: 0 Global Step: 1520 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:54,376-Speed 3059.56 samples/sec Loss 36.8111 LearningRate 0.0988 Epoch: 0 Global Step: 1530 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:54:57,697-Speed 3084.36 samples/sec Loss 36.9212 LearningRate 0.0988 Epoch: 0 Global Step: 1540 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:01,030-Speed 3072.75 samples/sec Loss 36.8456 LearningRate 0.0988 Epoch: 0 Global Step: 1550 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:04,360-Speed 3075.86 samples/sec Loss 36.8577 LearningRate 0.0987 Epoch: 0 Global Step: 1560 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:07,663-Speed 3101.84 samples/sec Loss 36.8253 LearningRate 0.0987 Epoch: 0 Global Step: 1570 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:11,031-Speed 3040.96 samples/sec Loss 36.7466 LearningRate 0.0987 Epoch: 0 Global Step: 1580 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:14,329-Speed 3105.71 samples/sec Loss 36.5477 LearningRate 0.0987 Epoch: 0 Global Step: 1590 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:17,630-Speed 3103.39 samples/sec Loss 36.6174 LearningRate 0.0987 Epoch: 0 Global Step: 1600 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:20,951-Speed 3084.40 samples/sec Loss 36.6332 LearningRate 0.0987 Epoch: 0 Global Step: 1610 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:24,221-Speed 3131.78 samples/sec Loss 36.5143 LearningRate 0.0987 Epoch: 0 Global Step: 1620 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:27,549-Speed 3078.10 samples/sec Loss 36.3594 LearningRate 0.0987 Epoch: 0 Global Step: 1630 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:30,847-Speed 3105.93 samples/sec Loss 36.4516 LearningRate 0.0987 Epoch: 0 Global Step: 1640 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:34,212-Speed 3044.21 samples/sec Loss 36.3354 LearningRate 0.0987 Epoch: 0 Global Step: 1650 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:37,485-Speed 3129.34 samples/sec Loss 36.4039 LearningRate 0.0987 Epoch: 0 Global Step: 1660 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:40,789-Speed 3100.77 samples/sec Loss 36.3487 LearningRate 0.0987 Epoch: 0 Global Step: 1670 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:44,110-Speed 3083.84 samples/sec Loss 36.3298 LearningRate 0.0987 Epoch: 0 Global Step: 1680 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:47,382-Speed 3130.63 samples/sec Loss 36.1573 LearningRate 0.0986 Epoch: 0 Global Step: 1690 Fp16 Grad Scale: 262144 Required: 24 hours
Training: 2022-04-27 01:55:50,688-Speed 3098.31 samples/sec Loss 36.0739 LearningRate 0.0986 Epoch: 0 Global Step: 1700 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:54,023-Speed 3071.10 samples/sec Loss 35.9750 LearningRate 0.0986 Epoch: 0 Global Step: 1710 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:55:57,349-Speed 3079.89 samples/sec Loss 36.1059 LearningRate 0.0986 Epoch: 0 Global Step: 1720 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:00,686-Speed 3069.66 samples/sec Loss 35.9792 LearningRate 0.0986 Epoch: 0 Global Step: 1730 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:04,000-Speed 3091.19 samples/sec Loss 35.8534 LearningRate 0.0986 Epoch: 0 Global Step: 1740 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:07,286-Speed 3116.63 samples/sec Loss 35.8342 LearningRate 0.0986 Epoch: 0 Global Step: 1750 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:10,655-Speed 3041.03 samples/sec Loss 35.6063 LearningRate 0.0986 Epoch: 0 Global Step: 1760 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:14,037-Speed 3028.00 samples/sec Loss 35.6791 LearningRate 0.0986 Epoch: 0 Global Step: 1770 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:17,374-Speed 3069.78 samples/sec Loss 35.8136 LearningRate 0.0986 Epoch: 0 Global Step: 1780 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:20,658-Speed 3119.52 samples/sec Loss 35.5772 LearningRate 0.0986 Epoch: 0 Global Step: 1790 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:23,948-Speed 3113.09 samples/sec Loss 35.5098 LearningRate 0.0986 Epoch: 0 Global Step: 1800 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:27,247-Speed 3105.45 samples/sec Loss 35.4811 LearningRate 0.0985 Epoch: 0 Global Step: 1810 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:30,520-Speed 3129.10 samples/sec Loss 35.4422 LearningRate 0.0985 Epoch: 0 Global Step: 1820 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:33,812-Speed 3111.13 samples/sec Loss 35.3456 LearningRate 0.0985 Epoch: 0 Global Step: 1830 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:37,134-Speed 3083.99 samples/sec Loss 35.2445 LearningRate 0.0985 Epoch: 0 Global Step: 1840 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:40,412-Speed 3124.93 samples/sec Loss 35.2724 LearningRate 0.0985 Epoch: 0 Global Step: 1850 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:43,710-Speed 3106.12 samples/sec Loss 35.2958 LearningRate 0.0985 Epoch: 0 Global Step: 1860 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:47,102-Speed 3019.72 samples/sec Loss 35.0837 LearningRate 0.0985 Epoch: 0 Global Step: 1870 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:50,370-Speed 3134.13 samples/sec Loss 35.1286 LearningRate 0.0985 Epoch: 0 Global Step: 1880 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:53,727-Speed 3051.40 samples/sec Loss 35.2271 LearningRate 0.0985 Epoch: 0 Global Step: 1890 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:56:57,030-Speed 3101.36 samples/sec Loss 35.0702 LearningRate 0.0985 Epoch: 0 Global Step: 1900 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:00,299-Speed 3134.10 samples/sec Loss 34.8415 LearningRate 0.0985 Epoch: 0 Global Step: 1910 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:03,603-Speed 3099.25 samples/sec Loss 34.7339 LearningRate 0.0985 Epoch: 0 Global Step: 1920 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:06,919-Speed 3088.98 samples/sec Loss 34.9060 LearningRate 0.0985 Epoch: 0 Global Step: 1930 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:10,195-Speed 3127.24 samples/sec Loss 34.7186 LearningRate 0.0984 Epoch: 0 Global Step: 1940 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:13,520-Speed 3081.20 samples/sec Loss 34.7586 LearningRate 0.0984 Epoch: 0 Global Step: 1950 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:16,836-Speed 3088.69 samples/sec Loss 34.6686 LearningRate 0.0984 Epoch: 0 Global Step: 1960 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:20,143-Speed 3097.46 samples/sec Loss 34.5672 LearningRate 0.0984 Epoch: 0 Global Step: 1970 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:23,469-Speed 3079.91 samples/sec Loss 34.5693 LearningRate 0.0984 Epoch: 0 Global Step: 1980 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:26,871-Speed 3010.92 samples/sec Loss 34.5065 LearningRate 0.0984 Epoch: 0 Global Step: 1990 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:30,240-Speed 3040.45 samples/sec Loss 34.4929 LearningRate 0.0984 Epoch: 0 Global Step: 2000 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:33,584-Speed 3063.44 samples/sec Loss 34.3285 LearningRate 0.0984 Epoch: 0 Global Step: 2010 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:36,900-Speed 3088.98 samples/sec Loss 34.3726 LearningRate 0.0984 Epoch: 0 Global Step: 2020 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:40,272-Speed 3037.43 samples/sec Loss 34.3117 LearningRate 0.0984 Epoch: 0 Global Step: 2030 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:43,660-Speed 3023.75 samples/sec Loss 34.2747 LearningRate 0.0984 Epoch: 0 Global Step: 2040 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:46,995-Speed 3071.46 samples/sec Loss 34.1439 LearningRate 0.0984 Epoch: 0 Global Step: 2050 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:50,315-Speed 3085.59 samples/sec Loss 34.1771 LearningRate 0.0983 Epoch: 0 Global Step: 2060 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:53,615-Speed 3103.50 samples/sec Loss 33.9595 LearningRate 0.0983 Epoch: 0 Global Step: 2070 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:57:56,988-Speed 3037.49 samples/sec Loss 34.0752 LearningRate 0.0983 Epoch: 0 Global Step: 2080 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:00,298-Speed 3095.19 samples/sec Loss 33.8740 LearningRate 0.0983 Epoch: 0 Global Step: 2090 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:03,664-Speed 3042.90 samples/sec Loss 33.8566 LearningRate 0.0983 Epoch: 0 Global Step: 2100 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:07,033-Speed 3040.41 samples/sec Loss 33.9075 LearningRate 0.0983 Epoch: 0 Global Step: 2110 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:10,380-Speed 3060.64 samples/sec Loss 33.8332 LearningRate 0.0983 Epoch: 0 Global Step: 2120 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:13,688-Speed 3096.55 samples/sec Loss 33.7942 LearningRate 0.0983 Epoch: 0 Global Step: 2130 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:16,997-Speed 3094.86 samples/sec Loss 33.5240 LearningRate 0.0983 Epoch: 0 Global Step: 2140 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:20,289-Speed 3111.49 samples/sec Loss 33.6107 LearningRate 0.0983 Epoch: 0 Global Step: 2150 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:23,547-Speed 3144.26 samples/sec Loss 33.3887 LearningRate 0.0983 Epoch: 0 Global Step: 2160 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:26,842-Speed 3109.22 samples/sec Loss 33.4565 LearningRate 0.0983 Epoch: 0 Global Step: 2170 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:30,193-Speed 3056.25 samples/sec Loss 33.2438 LearningRate 0.0983 Epoch: 0 Global Step: 2180 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:33,573-Speed 3030.39 samples/sec Loss 33.3416 LearningRate 0.0982 Epoch: 0 Global Step: 2190 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:36,864-Speed 3112.92 samples/sec Loss 33.1728 LearningRate 0.0982 Epoch: 0 Global Step: 2200 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:40,175-Speed 3093.07 samples/sec Loss 33.1385 LearningRate 0.0982 Epoch: 0 Global Step: 2210 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 01:58:43,535-Speed 3048.97 samples/sec Loss 33.1339 LearningRate 0.0982 Epoch: 0 Global Step: 2220 Fp16 Grad Scale: 131072 Required: 24 hours
Training: 2022-04-27 
01:58:46,822-Speed 3115.80 samples/sec Loss 33.1817 LearningRate 0.0982 Epoch: 0 Global Step: 2230 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:58:50,093-Speed 3131.42 samples/sec Loss 32.9400 LearningRate 0.0982 Epoch: 0 Global Step: 2240 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:58:53,387-Speed 3109.66 samples/sec Loss 32.9043 LearningRate 0.0982 Epoch: 0 Global Step: 2250 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:58:56,740-Speed 3055.88 samples/sec Loss 32.9076 LearningRate 0.0982 Epoch: 0 Global Step: 2260 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:59:00,121-Speed 3029.22 samples/sec Loss 32.8108 LearningRate 0.0982 Epoch: 0 Global Step: 2270 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:59:03,397-Speed 3126.98 samples/sec Loss 32.8485 LearningRate 0.0982 Epoch: 0 Global Step: 2280 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:59:06,697-Speed 3104.00 samples/sec Loss 32.7927 LearningRate 0.0982 Epoch: 0 Global Step: 2290 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:59:10,096-Speed 3013.71 samples/sec Loss 32.7511 LearningRate 0.0982 Epoch: 0 Global Step: 2300 Fp16 Grad Scale: 131072 Required: 24 hours Training: 2022-04-27 01:59:13,423-Speed 3078.62 samples/sec Loss 32.8202 LearningRate 0.0981 Epoch: 0 Global Step: 2310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:16,763-Speed 3067.21 samples/sec Loss 32.6396 LearningRate 0.0981 Epoch: 0 Global Step: 2320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:20,056-Speed 3110.99 samples/sec Loss 32.5052 LearningRate 0.0981 Epoch: 0 Global Step: 2330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:23,442-Speed 3025.39 samples/sec Loss 32.7504 LearningRate 0.0981 Epoch: 0 Global Step: 2340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:26,695-Speed 3148.81 samples/sec Loss 
32.5501 LearningRate 0.0981 Epoch: 0 Global Step: 2350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:30,003-Speed 3095.59 samples/sec Loss 32.2903 LearningRate 0.0981 Epoch: 0 Global Step: 2360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:33,343-Speed 3067.09 samples/sec Loss 32.4217 LearningRate 0.0981 Epoch: 0 Global Step: 2370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:36,622-Speed 3123.53 samples/sec Loss 32.1599 LearningRate 0.0981 Epoch: 0 Global Step: 2380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:39,991-Speed 3041.10 samples/sec Loss 32.2417 LearningRate 0.0981 Epoch: 0 Global Step: 2390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:43,294-Speed 3100.62 samples/sec Loss 32.2240 LearningRate 0.0981 Epoch: 0 Global Step: 2400 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 01:59:46,571-Speed 3126.06 samples/sec Loss 32.0279 LearningRate 0.0981 Epoch: 0 Global Step: 2410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:49,899-Speed 3077.95 samples/sec Loss 32.0832 LearningRate 0.0981 Epoch: 0 Global Step: 2420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:53,266-Speed 3041.65 samples/sec Loss 32.0519 LearningRate 0.0981 Epoch: 0 Global Step: 2430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:56,547-Speed 3122.00 samples/sec Loss 31.8338 LearningRate 0.0980 Epoch: 0 Global Step: 2440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 01:59:59,836-Speed 3114.19 samples/sec Loss 31.8570 LearningRate 0.0980 Epoch: 0 Global Step: 2450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:03,192-Speed 3053.20 samples/sec Loss 31.9036 LearningRate 0.0980 Epoch: 0 Global Step: 2460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:06,564-Speed 3037.00 samples/sec Loss 31.7529 LearningRate 0.0980 Epoch: 0 Global 
Step: 2470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:09,926-Speed 3050.32 samples/sec Loss 31.5251 LearningRate 0.0980 Epoch: 0 Global Step: 2480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:13,218-Speed 3111.61 samples/sec Loss 31.7313 LearningRate 0.0980 Epoch: 0 Global Step: 2490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:16,513-Speed 3109.07 samples/sec Loss 31.3672 LearningRate 0.0980 Epoch: 0 Global Step: 2500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:19,836-Speed 3082.19 samples/sec Loss 31.6885 LearningRate 0.0980 Epoch: 0 Global Step: 2510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:23,176-Speed 3067.46 samples/sec Loss 31.4386 LearningRate 0.0980 Epoch: 0 Global Step: 2520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:26,498-Speed 3083.44 samples/sec Loss 31.3825 LearningRate 0.0980 Epoch: 0 Global Step: 2530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:29,798-Speed 3103.73 samples/sec Loss 31.5407 LearningRate 0.0980 Epoch: 0 Global Step: 2540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:33,126-Speed 3078.37 samples/sec Loss 31.3864 LearningRate 0.0980 Epoch: 0 Global Step: 2550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:36,450-Speed 3082.35 samples/sec Loss 31.3422 LearningRate 0.0979 Epoch: 0 Global Step: 2560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:39,820-Speed 3039.61 samples/sec Loss 31.0621 LearningRate 0.0979 Epoch: 0 Global Step: 2570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:43,167-Speed 3060.25 samples/sec Loss 31.1020 LearningRate 0.0979 Epoch: 0 Global Step: 2580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:46,453-Speed 3116.90 samples/sec Loss 31.2324 LearningRate 0.0979 Epoch: 0 Global Step: 2590 Fp16 Grad Scale: 131072 
Required: 23 hours Training: 2022-04-27 02:00:49,810-Speed 3050.66 samples/sec Loss 30.8797 LearningRate 0.0979 Epoch: 0 Global Step: 2600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:53,194-Speed 3027.59 samples/sec Loss 30.9220 LearningRate 0.0979 Epoch: 0 Global Step: 2610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:56,480-Speed 3116.55 samples/sec Loss 30.9270 LearningRate 0.0979 Epoch: 0 Global Step: 2620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:00:59,869-Speed 3022.49 samples/sec Loss 30.8036 LearningRate 0.0979 Epoch: 0 Global Step: 2630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:03,280-Speed 3003.05 samples/sec Loss 30.6418 LearningRate 0.0979 Epoch: 0 Global Step: 2640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:06,688-Speed 3005.32 samples/sec Loss 30.6986 LearningRate 0.0979 Epoch: 0 Global Step: 2650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:10,057-Speed 3040.50 samples/sec Loss 30.6611 LearningRate 0.0979 Epoch: 0 Global Step: 2660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:13,368-Speed 3093.94 samples/sec Loss 30.5932 LearningRate 0.0979 Epoch: 0 Global Step: 2670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:16,696-Speed 3077.29 samples/sec Loss 30.4561 LearningRate 0.0979 Epoch: 0 Global Step: 2680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:20,061-Speed 3044.15 samples/sec Loss 30.5832 LearningRate 0.0978 Epoch: 0 Global Step: 2690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:23,395-Speed 3072.41 samples/sec Loss 30.5083 LearningRate 0.0978 Epoch: 0 Global Step: 2700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:26,688-Speed 3110.47 samples/sec Loss 30.5738 LearningRate 0.0978 Epoch: 0 Global Step: 2710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:01:30,012-Speed 3081.57 samples/sec Loss 30.3490 LearningRate 0.0978 Epoch: 0 Global Step: 2720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:33,357-Speed 3062.25 samples/sec Loss 30.3122 LearningRate 0.0978 Epoch: 0 Global Step: 2730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:36,677-Speed 3085.26 samples/sec Loss 30.2002 LearningRate 0.0978 Epoch: 0 Global Step: 2740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:39,979-Speed 3102.41 samples/sec Loss 30.2329 LearningRate 0.0978 Epoch: 0 Global Step: 2750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:43,292-Speed 3091.22 samples/sec Loss 29.9321 LearningRate 0.0978 Epoch: 0 Global Step: 2760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:46,633-Speed 3066.28 samples/sec Loss 29.9931 LearningRate 0.0978 Epoch: 0 Global Step: 2770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:49,988-Speed 3052.69 samples/sec Loss 30.0118 LearningRate 0.0978 Epoch: 0 Global Step: 2780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:53,279-Speed 3112.75 samples/sec Loss 29.8648 LearningRate 0.0978 Epoch: 0 Global Step: 2790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:56,601-Speed 3083.82 samples/sec Loss 29.9753 LearningRate 0.0978 Epoch: 0 Global Step: 2800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:01:59,928-Speed 3078.23 samples/sec Loss 29.9228 LearningRate 0.0978 Epoch: 0 Global Step: 2810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:03,241-Speed 3091.83 samples/sec Loss 29.7768 LearningRate 0.0977 Epoch: 0 Global Step: 2820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:06,547-Speed 3098.35 samples/sec Loss 29.6698 LearningRate 0.0977 Epoch: 0 Global Step: 2830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:09,934-Speed 3024.39 samples/sec Loss 
29.6712 LearningRate 0.0977 Epoch: 0 Global Step: 2840 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:13,318-Speed 3027.35 samples/sec Loss 29.5657 LearningRate 0.0977 Epoch: 0 Global Step: 2850 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:16,671-Speed 3054.35 samples/sec Loss 29.6017 LearningRate 0.0977 Epoch: 0 Global Step: 2860 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:19,979-Speed 3096.53 samples/sec Loss 29.4563 LearningRate 0.0977 Epoch: 0 Global Step: 2870 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:23,268-Speed 3114.28 samples/sec Loss 29.4756 LearningRate 0.0977 Epoch: 0 Global Step: 2880 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:26,605-Speed 3069.63 samples/sec Loss 29.3770 LearningRate 0.0977 Epoch: 0 Global Step: 2890 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:29,880-Speed 3127.55 samples/sec Loss 29.4774 LearningRate 0.0977 Epoch: 0 Global Step: 2900 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:33,165-Speed 3118.03 samples/sec Loss 29.3943 LearningRate 0.0977 Epoch: 0 Global Step: 2910 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:36,469-Speed 3100.26 samples/sec Loss 29.2910 LearningRate 0.0977 Epoch: 0 Global Step: 2920 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:39,794-Speed 3081.32 samples/sec Loss 29.0759 LearningRate 0.0977 Epoch: 0 Global Step: 2930 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:02:43,118-Speed 3081.47 samples/sec Loss 29.1090 LearningRate 0.0976 Epoch: 0 Global Step: 2940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:46,430-Speed 3092.39 samples/sec Loss 28.7973 LearningRate 0.0976 Epoch: 0 Global Step: 2950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:49,729-Speed 3105.46 samples/sec Loss 28.9474 LearningRate 0.0976 Epoch: 0 Global Step: 
2960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:53,036-Speed 3097.74 samples/sec Loss 28.9387 LearningRate 0.0976 Epoch: 0 Global Step: 2970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:56,340-Speed 3099.94 samples/sec Loss 28.7894 LearningRate 0.0976 Epoch: 0 Global Step: 2980 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:02:59,649-Speed 3095.26 samples/sec Loss 28.7997 LearningRate 0.0976 Epoch: 0 Global Step: 2990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:02,978-Speed 3076.68 samples/sec Loss 28.7475 LearningRate 0.0976 Epoch: 0 Global Step: 3000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:06,383-Speed 3008.69 samples/sec Loss 28.5509 LearningRate 0.0976 Epoch: 0 Global Step: 3010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:09,741-Speed 3049.90 samples/sec Loss 28.7862 LearningRate 0.0976 Epoch: 0 Global Step: 3020 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:13,083-Speed 3064.77 samples/sec Loss 28.5644 LearningRate 0.0976 Epoch: 0 Global Step: 3030 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:16,420-Speed 3070.18 samples/sec Loss 28.4914 LearningRate 0.0976 Epoch: 0 Global Step: 3040 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:19,777-Speed 3050.27 samples/sec Loss 28.5841 LearningRate 0.0976 Epoch: 0 Global Step: 3050 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:23,044-Speed 3135.90 samples/sec Loss 28.3424 LearningRate 0.0976 Epoch: 0 Global Step: 3060 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:26,421-Speed 3032.93 samples/sec Loss 28.3289 LearningRate 0.0975 Epoch: 0 Global Step: 3070 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:29,674-Speed 3148.72 samples/sec Loss 28.3370 LearningRate 0.0975 Epoch: 0 Global Step: 3080 Fp16 Grad Scale: 131072 Required: 23 
hours Training: 2022-04-27 02:03:33,044-Speed 3039.66 samples/sec Loss 28.3780 LearningRate 0.0975 Epoch: 0 Global Step: 3090 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:36,324-Speed 3123.20 samples/sec Loss 28.1094 LearningRate 0.0975 Epoch: 0 Global Step: 3100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:39,694-Speed 3038.75 samples/sec Loss 28.2430 LearningRate 0.0975 Epoch: 0 Global Step: 3110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:43,045-Speed 3057.36 samples/sec Loss 28.1172 LearningRate 0.0975 Epoch: 0 Global Step: 3120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:46,349-Speed 3100.00 samples/sec Loss 28.0590 LearningRate 0.0975 Epoch: 0 Global Step: 3130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:49,719-Speed 3039.02 samples/sec Loss 28.0624 LearningRate 0.0975 Epoch: 0 Global Step: 3140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:53,036-Speed 3088.17 samples/sec Loss 27.9393 LearningRate 0.0975 Epoch: 0 Global Step: 3150 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:56,365-Speed 3077.49 samples/sec Loss 27.7438 LearningRate 0.0975 Epoch: 0 Global Step: 3160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:03:59,686-Speed 3083.18 samples/sec Loss 27.8596 LearningRate 0.0975 Epoch: 0 Global Step: 3170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:02,994-Speed 3096.48 samples/sec Loss 27.8855 LearningRate 0.0975 Epoch: 0 Global Step: 3180 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:06,286-Speed 3112.01 samples/sec Loss 27.6791 LearningRate 0.0974 Epoch: 0 Global Step: 3190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:09,566-Speed 3123.23 samples/sec Loss 27.6195 LearningRate 0.0974 Epoch: 0 Global Step: 3200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:04:12,850-Speed 3119.64 samples/sec Loss 27.7516 LearningRate 0.0974 Epoch: 0 Global Step: 3210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:16,166-Speed 3089.67 samples/sec Loss 27.4768 LearningRate 0.0974 Epoch: 0 Global Step: 3220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:19,440-Speed 3128.75 samples/sec Loss 27.5868 LearningRate 0.0974 Epoch: 0 Global Step: 3230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:22,758-Speed 3086.50 samples/sec Loss 27.4877 LearningRate 0.0974 Epoch: 0 Global Step: 3240 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:04:26,039-Speed 3122.04 samples/sec Loss 27.2445 LearningRate 0.0974 Epoch: 0 Global Step: 3250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:29,384-Speed 3062.35 samples/sec Loss 27.1793 LearningRate 0.0974 Epoch: 0 Global Step: 3260 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:32,744-Speed 3048.93 samples/sec Loss 27.3449 LearningRate 0.0974 Epoch: 0 Global Step: 3270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:36,042-Speed 3105.52 samples/sec Loss 27.4291 LearningRate 0.0974 Epoch: 0 Global Step: 3280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:39,350-Speed 3096.33 samples/sec Loss 27.3572 LearningRate 0.0974 Epoch: 0 Global Step: 3290 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:42,647-Speed 3107.27 samples/sec Loss 27.0959 LearningRate 0.0974 Epoch: 0 Global Step: 3300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:45,946-Speed 3105.18 samples/sec Loss 27.2882 LearningRate 0.0974 Epoch: 0 Global Step: 3310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:49,300-Speed 3053.44 samples/sec Loss 27.1064 LearningRate 0.0973 Epoch: 0 Global Step: 3320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:52,605-Speed 3099.93 samples/sec Loss 
26.9723 LearningRate 0.0973 Epoch: 0 Global Step: 3330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:55,923-Speed 3087.82 samples/sec Loss 26.8946 LearningRate 0.0973 Epoch: 0 Global Step: 3340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:04:59,256-Speed 3072.80 samples/sec Loss 26.9063 LearningRate 0.0973 Epoch: 0 Global Step: 3350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:02,580-Speed 3082.04 samples/sec Loss 27.0367 LearningRate 0.0973 Epoch: 0 Global Step: 3360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:06,011-Speed 2984.85 samples/sec Loss 26.6745 LearningRate 0.0973 Epoch: 0 Global Step: 3370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:09,332-Speed 3084.76 samples/sec Loss 26.5915 LearningRate 0.0973 Epoch: 0 Global Step: 3380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:12,665-Speed 3073.33 samples/sec Loss 26.6301 LearningRate 0.0973 Epoch: 0 Global Step: 3390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:16,024-Speed 3049.37 samples/sec Loss 26.6925 LearningRate 0.0973 Epoch: 0 Global Step: 3400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:19,390-Speed 3042.41 samples/sec Loss 26.5709 LearningRate 0.0973 Epoch: 0 Global Step: 3410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:22,685-Speed 3109.31 samples/sec Loss 26.6163 LearningRate 0.0973 Epoch: 0 Global Step: 3420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:26,001-Speed 3088.95 samples/sec Loss 26.6636 LearningRate 0.0973 Epoch: 0 Global Step: 3430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:29,345-Speed 3062.69 samples/sec Loss 26.3716 LearningRate 0.0972 Epoch: 0 Global Step: 3440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:32,666-Speed 3084.31 samples/sec Loss 26.3276 LearningRate 0.0972 Epoch: 0 Global 
Step: 3450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:36,032-Speed 3043.53 samples/sec Loss 26.4196 LearningRate 0.0972 Epoch: 0 Global Step: 3460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:39,360-Speed 3078.12 samples/sec Loss 26.4178 LearningRate 0.0972 Epoch: 0 Global Step: 3470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:42,735-Speed 3034.78 samples/sec Loss 26.0574 LearningRate 0.0972 Epoch: 0 Global Step: 3480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:46,057-Speed 3083.48 samples/sec Loss 26.0779 LearningRate 0.0972 Epoch: 0 Global Step: 3490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:49,381-Speed 3081.88 samples/sec Loss 26.4132 LearningRate 0.0972 Epoch: 0 Global Step: 3500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:52,683-Speed 3102.51 samples/sec Loss 26.0668 LearningRate 0.0972 Epoch: 0 Global Step: 3510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:55,954-Speed 3131.05 samples/sec Loss 25.9855 LearningRate 0.0972 Epoch: 0 Global Step: 3520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:05:59,292-Speed 3068.53 samples/sec Loss 25.9808 LearningRate 0.0972 Epoch: 0 Global Step: 3530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:02,641-Speed 3058.68 samples/sec Loss 26.0350 LearningRate 0.0972 Epoch: 0 Global Step: 3540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:05,969-Speed 3077.73 samples/sec Loss 25.9078 LearningRate 0.0972 Epoch: 0 Global Step: 3550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:09,255-Speed 3117.65 samples/sec Loss 26.0577 LearningRate 0.0972 Epoch: 0 Global Step: 3560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:12,523-Speed 3133.80 samples/sec Loss 25.8256 LearningRate 0.0971 Epoch: 0 Global Step: 3570 Fp16 Grad Scale: 131072 
Required: 23 hours Training: 2022-04-27 02:06:15,775-Speed 3149.86 samples/sec Loss 25.9546 LearningRate 0.0971 Epoch: 0 Global Step: 3580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:19,120-Speed 3062.52 samples/sec Loss 25.7256 LearningRate 0.0971 Epoch: 0 Global Step: 3590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:22,474-Speed 3054.50 samples/sec Loss 25.6417 LearningRate 0.0971 Epoch: 0 Global Step: 3600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:25,776-Speed 3102.08 samples/sec Loss 25.7033 LearningRate 0.0971 Epoch: 0 Global Step: 3610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:29,118-Speed 3065.24 samples/sec Loss 25.5563 LearningRate 0.0971 Epoch: 0 Global Step: 3620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:32,471-Speed 3054.32 samples/sec Loss 25.7536 LearningRate 0.0971 Epoch: 0 Global Step: 3630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:35,809-Speed 3068.67 samples/sec Loss 25.6258 LearningRate 0.0971 Epoch: 0 Global Step: 3640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:39,168-Speed 3049.47 samples/sec Loss 25.5538 LearningRate 0.0971 Epoch: 0 Global Step: 3650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:42,541-Speed 3037.22 samples/sec Loss 25.2441 LearningRate 0.0971 Epoch: 0 Global Step: 3660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:45,828-Speed 3115.52 samples/sec Loss 25.6325 LearningRate 0.0971 Epoch: 0 Global Step: 3670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:49,091-Speed 3139.46 samples/sec Loss 25.4819 LearningRate 0.0971 Epoch: 0 Global Step: 3680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:52,472-Speed 3029.93 samples/sec Loss 25.3649 LearningRate 0.0971 Epoch: 0 Global Step: 3690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:06:55,789-Speed 3088.14 samples/sec Loss 25.1092 LearningRate 0.0970 Epoch: 0 Global Step: 3700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:06:59,084-Speed 3108.91 samples/sec Loss 25.2284 LearningRate 0.0970 Epoch: 0 Global Step: 3710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:02,355-Speed 3131.13 samples/sec Loss 25.4571 LearningRate 0.0970 Epoch: 0 Global Step: 3720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:05,730-Speed 3034.71 samples/sec Loss 25.0015 LearningRate 0.0970 Epoch: 0 Global Step: 3730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:09,119-Speed 3022.61 samples/sec Loss 25.1340 LearningRate 0.0970 Epoch: 0 Global Step: 3740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:12,462-Speed 3065.19 samples/sec Loss 25.2063 LearningRate 0.0970 Epoch: 0 Global Step: 3750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:15,778-Speed 3089.06 samples/sec Loss 24.8189 LearningRate 0.0970 Epoch: 0 Global Step: 3760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:19,109-Speed 3074.73 samples/sec Loss 24.9523 LearningRate 0.0970 Epoch: 0 Global Step: 3770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:22,458-Speed 3058.55 samples/sec Loss 24.9829 LearningRate 0.0970 Epoch: 0 Global Step: 3780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:25,774-Speed 3089.22 samples/sec Loss 25.0947 LearningRate 0.0970 Epoch: 0 Global Step: 3790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:29,134-Speed 3048.58 samples/sec Loss 24.9585 LearningRate 0.0970 Epoch: 0 Global Step: 3800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:32,506-Speed 3037.25 samples/sec Loss 24.8407 LearningRate 0.0970 Epoch: 0 Global Step: 3810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:35,880-Speed 3036.13 samples/sec Loss 
24.8534 LearningRate 0.0969 Epoch: 0 Global Step: 3820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:39,245-Speed 3043.87 samples/sec Loss 24.7817 LearningRate 0.0969 Epoch: 0 Global Step: 3830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:42,541-Speed 3107.71 samples/sec Loss 24.7630 LearningRate 0.0969 Epoch: 0 Global Step: 3840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:45,864-Speed 3082.32 samples/sec Loss 24.7185 LearningRate 0.0969 Epoch: 0 Global Step: 3850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:49,248-Speed 3027.04 samples/sec Loss 24.7164 LearningRate 0.0969 Epoch: 0 Global Step: 3860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:52,577-Speed 3076.96 samples/sec Loss 24.7998 LearningRate 0.0969 Epoch: 0 Global Step: 3870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:55,942-Speed 3045.02 samples/sec Loss 24.5094 LearningRate 0.0969 Epoch: 0 Global Step: 3880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:07:59,238-Speed 3108.04 samples/sec Loss 24.5662 LearningRate 0.0969 Epoch: 0 Global Step: 3890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:02,570-Speed 3073.67 samples/sec Loss 24.3349 LearningRate 0.0969 Epoch: 0 Global Step: 3900 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:05,895-Speed 3080.89 samples/sec Loss 24.4935 LearningRate 0.0969 Epoch: 0 Global Step: 3910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:09,170-Speed 3127.83 samples/sec Loss 24.6849 LearningRate 0.0969 Epoch: 0 Global Step: 3920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:12,515-Speed 3061.52 samples/sec Loss 24.5446 LearningRate 0.0969 Epoch: 0 Global Step: 3930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:15,843-Speed 3078.06 samples/sec Loss 24.4875 LearningRate 0.0969 Epoch: 0 Global 
Step: 3940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:19,168-Speed 3081.07 samples/sec Loss 24.3728 LearningRate 0.0968 Epoch: 0 Global Step: 3950 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:08:22,453-Speed 3117.86 samples/sec Loss 24.4370 LearningRate 0.0968 Epoch: 0 Global Step: 3960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:25,787-Speed 3072.37 samples/sec Loss 24.2910 LearningRate 0.0968 Epoch: 0 Global Step: 3970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:29,116-Speed 3077.02 samples/sec Loss 24.2849 LearningRate 0.0968 Epoch: 0 Global Step: 3980 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:32,433-Speed 3088.24 samples/sec Loss 24.2342 LearningRate 0.0968 Epoch: 0 Global Step: 3990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:35,811-Speed 3032.30 samples/sec Loss 24.2191 LearningRate 0.0968 Epoch: 0 Global Step: 4000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:39,102-Speed 3111.82 samples/sec Loss 23.9630 LearningRate 0.0968 Epoch: 0 Global Step: 4010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:42,449-Speed 3060.17 samples/sec Loss 24.0225 LearningRate 0.0968 Epoch: 0 Global Step: 4020 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:45,789-Speed 3066.85 samples/sec Loss 24.1475 LearningRate 0.0968 Epoch: 0 Global Step: 4030 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:08:49,120-Speed 3074.72 samples/sec Loss 23.9153 LearningRate 0.0968 Epoch: 0 Global Step: 4040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:08:52,424-Speed 3101.89 samples/sec Loss 23.9565 LearningRate 0.0968 Epoch: 0 Global Step: 4050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:08:55,785-Speed 3047.13 samples/sec Loss 23.8974 LearningRate 0.0968 Epoch: 0 Global Step: 4060 Fp16 Grad Scale: 65536 Required: 
23 hours Training: 2022-04-27 02:08:59,146-Speed 3048.65 samples/sec Loss 24.0669 LearningRate 0.0968 Epoch: 0 Global Step: 4070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:02,499-Speed 3054.99 samples/sec Loss 23.9203 LearningRate 0.0967 Epoch: 0 Global Step: 4080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:05,832-Speed 3073.10 samples/sec Loss 23.8108 LearningRate 0.0967 Epoch: 0 Global Step: 4090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:09,129-Speed 3106.80 samples/sec Loss 23.8865 LearningRate 0.0967 Epoch: 0 Global Step: 4100 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:12,457-Speed 3077.51 samples/sec Loss 23.7157 LearningRate 0.0967 Epoch: 0 Global Step: 4110 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:15,729-Speed 3131.41 samples/sec Loss 23.7699 LearningRate 0.0967 Epoch: 0 Global Step: 4120 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:18,988-Speed 3142.32 samples/sec Loss 23.6028 LearningRate 0.0967 Epoch: 0 Global Step: 4130 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:09:22,323-Speed 3072.55 samples/sec Loss 23.6766 LearningRate 0.0967 Epoch: 0 Global Step: 4140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:25,590-Speed 3134.97 samples/sec Loss 23.7779 LearningRate 0.0967 Epoch: 0 Global Step: 4150 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:28,902-Speed 3093.53 samples/sec Loss 23.7513 LearningRate 0.0967 Epoch: 0 Global Step: 4160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:32,244-Speed 3064.24 samples/sec Loss 23.5256 LearningRate 0.0967 Epoch: 0 Global Step: 4170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:35,602-Speed 3051.07 samples/sec Loss 23.5894 LearningRate 0.0967 Epoch: 0 Global Step: 4180 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:38,973-Speed 
3038.30 samples/sec Loss 23.4954 LearningRate 0.0967 Epoch: 0 Global Step: 4190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:42,350-Speed 3033.38 samples/sec Loss 23.3024 LearningRate 0.0966 Epoch: 0 Global Step: 4200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:45,693-Speed 3063.70 samples/sec Loss 23.4206 LearningRate 0.0966 Epoch: 0 Global Step: 4210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:49,046-Speed 3055.30 samples/sec Loss 23.2757 LearningRate 0.0966 Epoch: 0 Global Step: 4220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:52,346-Speed 3103.49 samples/sec Loss 23.4442 LearningRate 0.0966 Epoch: 0 Global Step: 4230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:09:55,652-Speed 3098.37 samples/sec Loss 23.2861 LearningRate 0.0966 Epoch: 0 Global Step: 4240 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:09:58,945-Speed 3110.97 samples/sec Loss 23.2979 LearningRate 0.0966 Epoch: 0 Global Step: 4250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:02,274-Speed 3077.20 samples/sec Loss 23.3369 LearningRate 0.0966 Epoch: 0 Global Step: 4260 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:05,598-Speed 3081.80 samples/sec Loss 23.3448 LearningRate 0.0966 Epoch: 0 Global Step: 4270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:08,883-Speed 3117.36 samples/sec Loss 23.0737 LearningRate 0.0966 Epoch: 0 Global Step: 4280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:12,161-Speed 3125.32 samples/sec Loss 23.1285 LearningRate 0.0966 Epoch: 0 Global Step: 4290 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:15,545-Speed 3026.45 samples/sec Loss 23.2084 LearningRate 0.0966 Epoch: 0 Global Step: 4300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:18,883-Speed 3068.94 samples/sec Loss 23.0059 
LearningRate 0.0966 Epoch: 0 Global Step: 4310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:22,172-Speed 3114.65 samples/sec Loss 23.0956 LearningRate 0.0966 Epoch: 0 Global Step: 4320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:25,492-Speed 3084.65 samples/sec Loss 23.1937 LearningRate 0.0965 Epoch: 0 Global Step: 4330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:28,865-Speed 3036.65 samples/sec Loss 23.1930 LearningRate 0.0965 Epoch: 0 Global Step: 4340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:32,140-Speed 3128.22 samples/sec Loss 23.0771 LearningRate 0.0965 Epoch: 0 Global Step: 4350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:35,474-Speed 3072.51 samples/sec Loss 22.8319 LearningRate 0.0965 Epoch: 0 Global Step: 4360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:38,777-Speed 3101.07 samples/sec Loss 23.0082 LearningRate 0.0965 Epoch: 0 Global Step: 4370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:42,072-Speed 3108.93 samples/sec Loss 22.9098 LearningRate 0.0965 Epoch: 0 Global Step: 4380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:45,382-Speed 3094.43 samples/sec Loss 22.8651 LearningRate 0.0965 Epoch: 0 Global Step: 4390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:48,723-Speed 3065.86 samples/sec Loss 22.8753 LearningRate 0.0965 Epoch: 0 Global Step: 4400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:52,116-Speed 3018.74 samples/sec Loss 23.0938 LearningRate 0.0965 Epoch: 0 Global Step: 4410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:55,462-Speed 3061.03 samples/sec Loss 22.7714 LearningRate 0.0965 Epoch: 0 Global Step: 4420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:10:58,769-Speed 3097.97 samples/sec Loss 22.7717 LearningRate 0.0965 Epoch: 0 Global Step: 
4430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:02,181-Speed 3001.75 samples/sec Loss 22.7131 LearningRate 0.0965 Epoch: 0 Global Step: 4440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:05,462-Speed 3122.17 samples/sec Loss 22.6061 LearningRate 0.0964 Epoch: 0 Global Step: 4450 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:11:08,714-Speed 3149.84 samples/sec Loss 22.6175 LearningRate 0.0964 Epoch: 0 Global Step: 4460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:12,053-Speed 3067.84 samples/sec Loss 22.6380 LearningRate 0.0964 Epoch: 0 Global Step: 4470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:15,396-Speed 3063.75 samples/sec Loss 22.6868 LearningRate 0.0964 Epoch: 0 Global Step: 4480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:18,687-Speed 3112.91 samples/sec Loss 22.7763 LearningRate 0.0964 Epoch: 0 Global Step: 4490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:21,936-Speed 3153.19 samples/sec Loss 22.7025 LearningRate 0.0964 Epoch: 0 Global Step: 4500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:25,220-Speed 3118.89 samples/sec Loss 22.6199 LearningRate 0.0964 Epoch: 0 Global Step: 4510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:28,506-Speed 3117.51 samples/sec Loss 22.6784 LearningRate 0.0964 Epoch: 0 Global Step: 4520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:31,780-Speed 3128.73 samples/sec Loss 22.5210 LearningRate 0.0964 Epoch: 0 Global Step: 4530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:35,051-Speed 3130.85 samples/sec Loss 22.4915 LearningRate 0.0964 Epoch: 0 Global Step: 4540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:38,351-Speed 3104.33 samples/sec Loss 22.4563 LearningRate 0.0964 Epoch: 0 Global Step: 4550 Fp16 Grad Scale: 131072 Required: 23 
hours Training: 2022-04-27 02:11:41,631-Speed 3123.54 samples/sec Loss 22.3185 LearningRate 0.0964 Epoch: 0 Global Step: 4560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:44,940-Speed 3095.73 samples/sec Loss 22.6775 LearningRate 0.0964 Epoch: 0 Global Step: 4570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:48,272-Speed 3073.88 samples/sec Loss 22.1786 LearningRate 0.0963 Epoch: 0 Global Step: 4580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:51,536-Speed 3138.14 samples/sec Loss 22.5031 LearningRate 0.0963 Epoch: 0 Global Step: 4590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:54,830-Speed 3109.33 samples/sec Loss 22.2366 LearningRate 0.0963 Epoch: 0 Global Step: 4600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:11:58,134-Speed 3100.20 samples/sec Loss 22.3342 LearningRate 0.0963 Epoch: 0 Global Step: 4610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:01,455-Speed 3084.56 samples/sec Loss 22.1707 LearningRate 0.0963 Epoch: 0 Global Step: 4620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:04,789-Speed 3072.99 samples/sec Loss 22.2078 LearningRate 0.0963 Epoch: 0 Global Step: 4630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:08,074-Speed 3118.62 samples/sec Loss 22.1972 LearningRate 0.0963 Epoch: 0 Global Step: 4640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:11,461-Speed 3024.22 samples/sec Loss 22.2321 LearningRate 0.0963 Epoch: 0 Global Step: 4650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:14,784-Speed 3082.30 samples/sec Loss 22.2108 LearningRate 0.0963 Epoch: 0 Global Step: 4660 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:12:18,099-Speed 3090.50 samples/sec Loss 21.9838 LearningRate 0.0963 Epoch: 0 Global Step: 4670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:12:21,391-Speed 3110.77 samples/sec Loss 22.2515 LearningRate 0.0963 Epoch: 0 Global Step: 4680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:24,735-Speed 3063.70 samples/sec Loss 22.0754 LearningRate 0.0963 Epoch: 0 Global Step: 4690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:28,070-Speed 3071.07 samples/sec Loss 22.0252 LearningRate 0.0963 Epoch: 0 Global Step: 4700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:31,403-Speed 3073.06 samples/sec Loss 21.8959 LearningRate 0.0962 Epoch: 0 Global Step: 4710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:34,713-Speed 3095.45 samples/sec Loss 21.8883 LearningRate 0.0962 Epoch: 0 Global Step: 4720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:38,016-Speed 3100.37 samples/sec Loss 22.0420 LearningRate 0.0962 Epoch: 0 Global Step: 4730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:41,306-Speed 3113.66 samples/sec Loss 22.1140 LearningRate 0.0962 Epoch: 0 Global Step: 4740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:44,618-Speed 3092.20 samples/sec Loss 22.1479 LearningRate 0.0962 Epoch: 0 Global Step: 4750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:47,866-Speed 3154.06 samples/sec Loss 21.7360 LearningRate 0.0962 Epoch: 0 Global Step: 4760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:51,145-Speed 3123.70 samples/sec Loss 21.9875 LearningRate 0.0962 Epoch: 0 Global Step: 4770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:54,454-Speed 3095.44 samples/sec Loss 21.8168 LearningRate 0.0962 Epoch: 0 Global Step: 4780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:12:57,832-Speed 3032.33 samples/sec Loss 21.8797 LearningRate 0.0962 Epoch: 0 Global Step: 4790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:01,165-Speed 3073.48 samples/sec Loss 
21.8195 LearningRate 0.0962 Epoch: 0 Global Step: 4800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:04,488-Speed 3083.27 samples/sec Loss 21.8953 LearningRate 0.0962 Epoch: 0 Global Step: 4810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:07,796-Speed 3096.01 samples/sec Loss 21.6565 LearningRate 0.0962 Epoch: 0 Global Step: 4820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:11,139-Speed 3063.62 samples/sec Loss 21.8401 LearningRate 0.0961 Epoch: 0 Global Step: 4830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:14,559-Speed 2994.88 samples/sec Loss 21.8014 LearningRate 0.0961 Epoch: 0 Global Step: 4840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:17,868-Speed 3095.77 samples/sec Loss 21.6897 LearningRate 0.0961 Epoch: 0 Global Step: 4850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:21,270-Speed 3011.53 samples/sec Loss 21.5503 LearningRate 0.0961 Epoch: 0 Global Step: 4860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:24,556-Speed 3116.43 samples/sec Loss 21.7144 LearningRate 0.0961 Epoch: 0 Global Step: 4870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:27,930-Speed 3036.89 samples/sec Loss 21.7674 LearningRate 0.0961 Epoch: 0 Global Step: 4880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:31,282-Speed 3054.84 samples/sec Loss 21.4156 LearningRate 0.0961 Epoch: 0 Global Step: 4890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:34,621-Speed 3068.71 samples/sec Loss 21.7066 LearningRate 0.0961 Epoch: 0 Global Step: 4900 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:37,986-Speed 3043.84 samples/sec Loss 21.4675 LearningRate 0.0961 Epoch: 0 Global Step: 4910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:41,324-Speed 3068.43 samples/sec Loss 21.4850 LearningRate 0.0961 Epoch: 0 Global 
Step: 4920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:44,639-Speed 3089.48 samples/sec Loss 21.3853 LearningRate 0.0961 Epoch: 0 Global Step: 4930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:48,014-Speed 3035.64 samples/sec Loss 21.3980 LearningRate 0.0961 Epoch: 0 Global Step: 4940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:51,301-Speed 3116.37 samples/sec Loss 21.3667 LearningRate 0.0961 Epoch: 0 Global Step: 4950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:54,594-Speed 3109.97 samples/sec Loss 21.5551 LearningRate 0.0960 Epoch: 0 Global Step: 4960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:13:57,871-Speed 3125.38 samples/sec Loss 21.4481 LearningRate 0.0960 Epoch: 0 Global Step: 4970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:01,154-Speed 3120.45 samples/sec Loss 21.4850 LearningRate 0.0960 Epoch: 0 Global Step: 4980 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:04,470-Speed 3089.27 samples/sec Loss 21.5830 LearningRate 0.0960 Epoch: 0 Global Step: 4990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:07,758-Speed 3115.07 samples/sec Loss 21.4911 LearningRate 0.0960 Epoch: 0 Global Step: 5000 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:11,018-Speed 3142.37 samples/sec Loss 21.3204 LearningRate 0.0960 Epoch: 0 Global Step: 5010 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:14,267-Speed 3152.50 samples/sec Loss 21.4283 LearningRate 0.0960 Epoch: 0 Global Step: 5020 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:17,528-Speed 3140.80 samples/sec Loss 21.4043 LearningRate 0.0960 Epoch: 0 Global Step: 5030 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:20,842-Speed 3091.39 samples/sec Loss 21.5063 LearningRate 0.0960 Epoch: 0 Global Step: 5040 Fp16 Grad Scale: 131072 
Required: 23 hours Training: 2022-04-27 02:14:24,143-Speed 3102.68 samples/sec Loss 21.2209 LearningRate 0.0960 Epoch: 0 Global Step: 5050 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:27,447-Speed 3100.98 samples/sec Loss 21.4731 LearningRate 0.0960 Epoch: 0 Global Step: 5060 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:30,846-Speed 3013.82 samples/sec Loss 21.1527 LearningRate 0.0960 Epoch: 0 Global Step: 5070 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:14:34,137-Speed 3112.45 samples/sec Loss 21.2061 LearningRate 0.0960 Epoch: 0 Global Step: 5080 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:37,439-Speed 3102.35 samples/sec Loss 21.3664 LearningRate 0.0959 Epoch: 0 Global Step: 5090 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:40,756-Speed 3087.88 samples/sec Loss 21.2253 LearningRate 0.0959 Epoch: 0 Global Step: 5100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:44,102-Speed 3061.73 samples/sec Loss 20.9697 LearningRate 0.0959 Epoch: 0 Global Step: 5110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:47,450-Speed 3058.96 samples/sec Loss 21.2181 LearningRate 0.0959 Epoch: 0 Global Step: 5120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:50,709-Speed 3143.44 samples/sec Loss 20.9689 LearningRate 0.0959 Epoch: 0 Global Step: 5130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:54,076-Speed 3042.41 samples/sec Loss 20.9489 LearningRate 0.0959 Epoch: 0 Global Step: 5140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:14:57,399-Speed 3082.29 samples/sec Loss 21.0492 LearningRate 0.0959 Epoch: 0 Global Step: 5150 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:00,746-Speed 3060.53 samples/sec Loss 21.1690 LearningRate 0.0959 Epoch: 0 Global Step: 5160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:15:04,009-Speed 3138.74 samples/sec Loss 21.1233 LearningRate 0.0959 Epoch: 0 Global Step: 5170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:07,285-Speed 3126.84 samples/sec Loss 20.9748 LearningRate 0.0959 Epoch: 0 Global Step: 5180 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:10,553-Speed 3134.67 samples/sec Loss 20.9830 LearningRate 0.0959 Epoch: 0 Global Step: 5190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:13,858-Speed 3098.85 samples/sec Loss 20.7804 LearningRate 0.0959 Epoch: 0 Global Step: 5200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:17,177-Speed 3086.93 samples/sec Loss 20.8297 LearningRate 0.0958 Epoch: 0 Global Step: 5210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:20,541-Speed 3044.37 samples/sec Loss 20.9841 LearningRate 0.0958 Epoch: 0 Global Step: 5220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:23,874-Speed 3073.68 samples/sec Loss 20.9984 LearningRate 0.0958 Epoch: 0 Global Step: 5230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:27,176-Speed 3101.99 samples/sec Loss 21.0428 LearningRate 0.0958 Epoch: 0 Global Step: 5240 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:30,472-Speed 3107.62 samples/sec Loss 21.0216 LearningRate 0.0958 Epoch: 0 Global Step: 5250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:33,820-Speed 3059.26 samples/sec Loss 21.0086 LearningRate 0.0958 Epoch: 0 Global Step: 5260 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:37,150-Speed 3075.77 samples/sec Loss 20.9215 LearningRate 0.0958 Epoch: 0 Global Step: 5270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:40,425-Speed 3127.98 samples/sec Loss 20.7102 LearningRate 0.0958 Epoch: 0 Global Step: 5280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:43,720-Speed 3108.74 samples/sec Loss 
20.7838 LearningRate 0.0958 Epoch: 0 Global Step: 5290 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:47,047-Speed 3078.70 samples/sec Loss 20.7228 LearningRate 0.0958 Epoch: 0 Global Step: 5300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:50,367-Speed 3085.70 samples/sec Loss 20.9335 LearningRate 0.0958 Epoch: 0 Global Step: 5310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:53,705-Speed 3068.32 samples/sec Loss 20.7133 LearningRate 0.0958 Epoch: 0 Global Step: 5320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:15:57,046-Speed 3065.68 samples/sec Loss 20.5680 LearningRate 0.0958 Epoch: 0 Global Step: 5330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:00,424-Speed 3032.56 samples/sec Loss 20.9356 LearningRate 0.0957 Epoch: 0 Global Step: 5340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:03,719-Speed 3109.49 samples/sec Loss 20.8468 LearningRate 0.0957 Epoch: 0 Global Step: 5350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:07,010-Speed 3112.76 samples/sec Loss 20.9152 LearningRate 0.0957 Epoch: 0 Global Step: 5360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:10,381-Speed 3037.76 samples/sec Loss 20.8011 LearningRate 0.0957 Epoch: 0 Global Step: 5370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:13,707-Speed 3080.07 samples/sec Loss 20.8242 LearningRate 0.0957 Epoch: 0 Global Step: 5380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:17,052-Speed 3062.11 samples/sec Loss 20.5695 LearningRate 0.0957 Epoch: 0 Global Step: 5390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:20,393-Speed 3071.26 samples/sec Loss 20.5892 LearningRate 0.0957 Epoch: 0 Global Step: 5400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:23,723-Speed 3075.97 samples/sec Loss 20.6710 LearningRate 0.0957 Epoch: 0 Global 
Step: 5410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:26,976-Speed 3148.49 samples/sec Loss 20.3867 LearningRate 0.0957 Epoch: 0 Global Step: 5420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:30,319-Speed 3064.62 samples/sec Loss 20.5128 LearningRate 0.0957 Epoch: 0 Global Step: 5430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:33,706-Speed 3024.37 samples/sec Loss 20.4502 LearningRate 0.0957 Epoch: 0 Global Step: 5440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:37,057-Speed 3056.03 samples/sec Loss 20.5432 LearningRate 0.0957 Epoch: 0 Global Step: 5450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:40,377-Speed 3085.46 samples/sec Loss 20.4638 LearningRate 0.0957 Epoch: 0 Global Step: 5460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:43,675-Speed 3105.57 samples/sec Loss 20.5217 LearningRate 0.0956 Epoch: 0 Global Step: 5470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:46,994-Speed 3087.44 samples/sec Loss 20.5820 LearningRate 0.0956 Epoch: 0 Global Step: 5480 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:16:50,355-Speed 3047.87 samples/sec Loss 20.4384 LearningRate 0.0956 Epoch: 0 Global Step: 5490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:53,733-Speed 3032.15 samples/sec Loss 20.6463 LearningRate 0.0956 Epoch: 0 Global Step: 5500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:16:57,001-Speed 3134.49 samples/sec Loss 20.5828 LearningRate 0.0956 Epoch: 0 Global Step: 5510 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:00,348-Speed 3060.32 samples/sec Loss 20.4949 LearningRate 0.0956 Epoch: 0 Global Step: 5520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:03,652-Speed 3100.46 samples/sec Loss 20.3902 LearningRate 0.0956 Epoch: 0 Global Step: 5530 Fp16 Grad Scale: 131072 
Required: 23 hours Training: 2022-04-27 02:17:06,988-Speed 3070.29 samples/sec Loss 20.5649 LearningRate 0.0956 Epoch: 0 Global Step: 5540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:10,305-Speed 3087.81 samples/sec Loss 20.5011 LearningRate 0.0956 Epoch: 0 Global Step: 5550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:13,571-Speed 3136.43 samples/sec Loss 20.5212 LearningRate 0.0956 Epoch: 0 Global Step: 5560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:16,869-Speed 3106.79 samples/sec Loss 20.4061 LearningRate 0.0956 Epoch: 0 Global Step: 5570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:20,182-Speed 3090.96 samples/sec Loss 20.3915 LearningRate 0.0956 Epoch: 0 Global Step: 5580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:23,502-Speed 3086.67 samples/sec Loss 20.4285 LearningRate 0.0956 Epoch: 0 Global Step: 5590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:26,858-Speed 3051.57 samples/sec Loss 20.4319 LearningRate 0.0955 Epoch: 0 Global Step: 5600 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:30,215-Speed 3051.22 samples/sec Loss 20.3588 LearningRate 0.0955 Epoch: 0 Global Step: 5610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:33,568-Speed 3054.78 samples/sec Loss 20.2047 LearningRate 0.0955 Epoch: 0 Global Step: 5620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:36,888-Speed 3085.94 samples/sec Loss 20.2525 LearningRate 0.0955 Epoch: 0 Global Step: 5630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:40,186-Speed 3105.27 samples/sec Loss 20.2722 LearningRate 0.0955 Epoch: 0 Global Step: 5640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:43,575-Speed 3022.85 samples/sec Loss 20.3437 LearningRate 0.0955 Epoch: 0 Global Step: 5650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:17:46,850-Speed 3127.58 samples/sec Loss 20.1233 LearningRate 0.0955 Epoch: 0 Global Step: 5660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:50,147-Speed 3105.92 samples/sec Loss 20.2053 LearningRate 0.0955 Epoch: 0 Global Step: 5670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:53,406-Speed 3143.57 samples/sec Loss 20.1774 LearningRate 0.0955 Epoch: 0 Global Step: 5680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:17:56,681-Speed 3127.08 samples/sec Loss 20.3364 LearningRate 0.0955 Epoch: 0 Global Step: 5690 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:17:59,996-Speed 3089.79 samples/sec Loss 20.2994 LearningRate 0.0955 Epoch: 0 Global Step: 5700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:03,263-Speed 3135.97 samples/sec Loss 20.1159 LearningRate 0.0955 Epoch: 0 Global Step: 5710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:06,629-Speed 3043.19 samples/sec Loss 20.2102 LearningRate 0.0954 Epoch: 0 Global Step: 5720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:10,001-Speed 3037.94 samples/sec Loss 19.9816 LearningRate 0.0954 Epoch: 0 Global Step: 5730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:13,324-Speed 3081.91 samples/sec Loss 20.1696 LearningRate 0.0954 Epoch: 0 Global Step: 5740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:16,661-Speed 3069.64 samples/sec Loss 20.0288 LearningRate 0.0954 Epoch: 0 Global Step: 5750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:19,985-Speed 3081.63 samples/sec Loss 19.9735 LearningRate 0.0954 Epoch: 0 Global Step: 5760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:23,376-Speed 3020.68 samples/sec Loss 19.9983 LearningRate 0.0954 Epoch: 0 Global Step: 5770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:26,711-Speed 3071.06 samples/sec Loss 
20.2316 LearningRate 0.0954 Epoch: 0 Global Step: 5780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:30,056-Speed 3062.60 samples/sec Loss 20.0249 LearningRate 0.0954 Epoch: 0 Global Step: 5790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:33,418-Speed 3046.08 samples/sec Loss 19.9719 LearningRate 0.0954 Epoch: 0 Global Step: 5800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:36,807-Speed 3022.69 samples/sec Loss 20.0541 LearningRate 0.0954 Epoch: 0 Global Step: 5810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:40,112-Speed 3099.40 samples/sec Loss 19.8148 LearningRate 0.0954 Epoch: 0 Global Step: 5820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:43,436-Speed 3082.25 samples/sec Loss 20.0510 LearningRate 0.0954 Epoch: 0 Global Step: 5830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:46,771-Speed 3071.01 samples/sec Loss 20.0259 LearningRate 0.0954 Epoch: 0 Global Step: 5840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:50,125-Speed 3054.58 samples/sec Loss 20.0811 LearningRate 0.0953 Epoch: 0 Global Step: 5850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:53,413-Speed 3115.80 samples/sec Loss 19.8388 LearningRate 0.0953 Epoch: 0 Global Step: 5860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:18:56,736-Speed 3082.49 samples/sec Loss 19.9428 LearningRate 0.0953 Epoch: 0 Global Step: 5870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:00,082-Speed 3061.38 samples/sec Loss 20.0231 LearningRate 0.0953 Epoch: 0 Global Step: 5880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:03,403-Speed 3084.25 samples/sec Loss 20.0548 LearningRate 0.0953 Epoch: 0 Global Step: 5890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:06,725-Speed 3083.36 samples/sec Loss 19.9491 LearningRate 0.0953 Epoch: 0 Global 
Step: 5900 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:10,065-Speed 3067.24 samples/sec Loss 19.7273 LearningRate 0.0953 Epoch: 0 Global Step: 5910 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:13,377-Speed 3092.24 samples/sec Loss 19.7878 LearningRate 0.0953 Epoch: 0 Global Step: 5920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:16,654-Speed 3126.33 samples/sec Loss 19.9982 LearningRate 0.0953 Epoch: 0 Global Step: 5930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:20,022-Speed 3040.84 samples/sec Loss 19.9279 LearningRate 0.0953 Epoch: 0 Global Step: 5940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:23,394-Speed 3037.75 samples/sec Loss 20.0919 LearningRate 0.0953 Epoch: 0 Global Step: 5950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:26,730-Speed 3070.73 samples/sec Loss 19.8975 LearningRate 0.0953 Epoch: 0 Global Step: 5960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:30,073-Speed 3063.31 samples/sec Loss 19.8229 LearningRate 0.0953 Epoch: 0 Global Step: 5970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:19:33,369-Speed 3107.92 samples/sec Loss 19.8636 LearningRate 0.0952 Epoch: 0 Global Step: 5980 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:36,748-Speed 3031.16 samples/sec Loss 19.6161 LearningRate 0.0952 Epoch: 0 Global Step: 5990 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:40,053-Speed 3099.56 samples/sec Loss 19.7748 LearningRate 0.0952 Epoch: 0 Global Step: 6000 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:43,367-Speed 3091.30 samples/sec Loss 19.8355 LearningRate 0.0952 Epoch: 0 Global Step: 6010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:46,658-Speed 3112.67 samples/sec Loss 19.5788 LearningRate 0.0952 Epoch: 0 Global Step: 6020 Fp16 Grad Scale: 65536 Required: 23 
hours Training: 2022-04-27 02:19:49,968-Speed 3094.25 samples/sec Loss 19.7887 LearningRate 0.0952 Epoch: 0 Global Step: 6030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:53,301-Speed 3073.13 samples/sec Loss 19.6441 LearningRate 0.0952 Epoch: 0 Global Step: 6040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:56,623-Speed 3083.51 samples/sec Loss 19.5760 LearningRate 0.0952 Epoch: 0 Global Step: 6050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:19:59,975-Speed 3055.85 samples/sec Loss 19.7376 LearningRate 0.0952 Epoch: 0 Global Step: 6060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:20:03,313-Speed 3067.89 samples/sec Loss 19.7200 LearningRate 0.0952 Epoch: 0 Global Step: 6070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:20:06,598-Speed 3118.45 samples/sec Loss 19.6947 LearningRate 0.0952 Epoch: 0 Global Step: 6080 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:09,867-Speed 3133.97 samples/sec Loss 19.5458 LearningRate 0.0952 Epoch: 0 Global Step: 6090 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:13,127-Speed 3141.58 samples/sec Loss 19.8680 LearningRate 0.0951 Epoch: 0 Global Step: 6100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:16,412-Speed 3118.81 samples/sec Loss 19.5631 LearningRate 0.0951 Epoch: 0 Global Step: 6110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:19,717-Speed 3099.04 samples/sec Loss 19.5974 LearningRate 0.0951 Epoch: 0 Global Step: 6120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:23,076-Speed 3048.99 samples/sec Loss 19.7132 LearningRate 0.0951 Epoch: 0 Global Step: 6130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:26,392-Speed 3089.16 samples/sec Loss 19.5789 LearningRate 0.0951 Epoch: 0 Global Step: 6140 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:29,701-Speed 
3095.23 samples/sec Loss 19.5181 LearningRate 0.0951 Epoch: 0 Global Step: 6150 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:33,085-Speed 3027.44 samples/sec Loss 19.5558 LearningRate 0.0951 Epoch: 0 Global Step: 6160 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:36,385-Speed 3103.66 samples/sec Loss 19.4829 LearningRate 0.0951 Epoch: 0 Global Step: 6170 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:39,716-Speed 3075.59 samples/sec Loss 19.4522 LearningRate 0.0951 Epoch: 0 Global Step: 6180 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:20:43,005-Speed 3113.75 samples/sec Loss 19.4170 LearningRate 0.0951 Epoch: 0 Global Step: 6190 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:46,325-Speed 3085.07 samples/sec Loss 19.4612 LearningRate 0.0951 Epoch: 0 Global Step: 6200 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:49,621-Speed 3108.40 samples/sec Loss 19.5873 LearningRate 0.0951 Epoch: 0 Global Step: 6210 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:52,946-Speed 3080.75 samples/sec Loss 19.4750 LearningRate 0.0951 Epoch: 0 Global Step: 6220 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:56,237-Speed 3112.41 samples/sec Loss 19.5312 LearningRate 0.0950 Epoch: 0 Global Step: 6230 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:20:59,561-Speed 3081.36 samples/sec Loss 19.4880 LearningRate 0.0950 Epoch: 0 Global Step: 6240 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:02,881-Speed 3085.66 samples/sec Loss 19.3268 LearningRate 0.0950 Epoch: 0 Global Step: 6250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:06,259-Speed 3031.63 samples/sec Loss 19.4976 LearningRate 0.0950 Epoch: 0 Global Step: 6260 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:09,540-Speed 3122.83 samples/sec Loss 19.3773 
LearningRate 0.0950 Epoch: 0 Global Step: 6270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:12,897-Speed 3051.33 samples/sec Loss 19.5430 LearningRate 0.0950 Epoch: 0 Global Step: 6280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:16,263-Speed 3042.82 samples/sec Loss 19.4248 LearningRate 0.0950 Epoch: 0 Global Step: 6290 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:21:19,631-Speed 3040.83 samples/sec Loss 19.3737 LearningRate 0.0950 Epoch: 0 Global Step: 6300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:22,993-Speed 3046.80 samples/sec Loss 19.3043 LearningRate 0.0950 Epoch: 0 Global Step: 6310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:26,307-Speed 3091.01 samples/sec Loss 19.2820 LearningRate 0.0950 Epoch: 0 Global Step: 6320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:29,627-Speed 3085.41 samples/sec Loss 19.2460 LearningRate 0.0950 Epoch: 0 Global Step: 6330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:32,948-Speed 3084.15 samples/sec Loss 19.2436 LearningRate 0.0950 Epoch: 0 Global Step: 6340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:36,241-Speed 3110.85 samples/sec Loss 19.3404 LearningRate 0.0950 Epoch: 0 Global Step: 6350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:39,530-Speed 3114.70 samples/sec Loss 19.2873 LearningRate 0.0949 Epoch: 0 Global Step: 6360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:42,866-Speed 3069.71 samples/sec Loss 19.1819 LearningRate 0.0949 Epoch: 0 Global Step: 6370 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:46,198-Speed 3074.28 samples/sec Loss 19.2888 LearningRate 0.0949 Epoch: 0 Global Step: 6380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:49,475-Speed 3126.17 samples/sec Loss 19.0305 LearningRate 0.0949 Epoch: 0 Global Step: 
6390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:52,779-Speed 3100.11 samples/sec Loss 19.1540 LearningRate 0.0949 Epoch: 0 Global Step: 6400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:56,036-Speed 3144.80 samples/sec Loss 19.2567 LearningRate 0.0949 Epoch: 0 Global Step: 6410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:21:59,353-Speed 3087.58 samples/sec Loss 19.3704 LearningRate 0.0949 Epoch: 0 Global Step: 6420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:02,742-Speed 3023.19 samples/sec Loss 19.2555 LearningRate 0.0949 Epoch: 0 Global Step: 6430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:06,044-Speed 3101.59 samples/sec Loss 19.2670 LearningRate 0.0949 Epoch: 0 Global Step: 6440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:09,343-Speed 3104.95 samples/sec Loss 19.2263 LearningRate 0.0949 Epoch: 0 Global Step: 6450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:12,620-Speed 3125.83 samples/sec Loss 19.3695 LearningRate 0.0949 Epoch: 0 Global Step: 6460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:15,869-Speed 3152.84 samples/sec Loss 19.2478 LearningRate 0.0949 Epoch: 0 Global Step: 6470 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:19,171-Speed 3102.25 samples/sec Loss 19.2846 LearningRate 0.0949 Epoch: 0 Global Step: 6480 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:22,475-Speed 3099.98 samples/sec Loss 18.9674 LearningRate 0.0948 Epoch: 0 Global Step: 6490 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:25,723-Speed 3154.72 samples/sec Loss 19.0851 LearningRate 0.0948 Epoch: 0 Global Step: 6500 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:29,014-Speed 3112.02 samples/sec Loss 19.2437 LearningRate 0.0948 Epoch: 0 Global Step: 6510 Fp16 Grad Scale: 131072 Required: 23 
hours Training: 2022-04-27 02:22:32,305-Speed 3113.22 samples/sec Loss 19.0568 LearningRate 0.0948 Epoch: 0 Global Step: 6520 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:35,578-Speed 3129.66 samples/sec Loss 19.1583 LearningRate 0.0948 Epoch: 0 Global Step: 6530 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:38,865-Speed 3115.81 samples/sec Loss 19.1227 LearningRate 0.0948 Epoch: 0 Global Step: 6540 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:42,146-Speed 3121.41 samples/sec Loss 19.2857 LearningRate 0.0948 Epoch: 0 Global Step: 6550 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:45,505-Speed 3049.57 samples/sec Loss 19.1820 LearningRate 0.0948 Epoch: 0 Global Step: 6560 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:48,785-Speed 3123.23 samples/sec Loss 18.9996 LearningRate 0.0948 Epoch: 0 Global Step: 6570 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:52,139-Speed 3054.06 samples/sec Loss 19.1636 LearningRate 0.0948 Epoch: 0 Global Step: 6580 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:55,459-Speed 3085.68 samples/sec Loss 19.1632 LearningRate 0.0948 Epoch: 0 Global Step: 6590 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:22:58,807-Speed 3058.41 samples/sec Loss 19.0658 LearningRate 0.0948 Epoch: 0 Global Step: 6600 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:23:02,153-Speed 3061.99 samples/sec Loss 19.0382 LearningRate 0.0947 Epoch: 0 Global Step: 6610 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:05,488-Speed 3071.30 samples/sec Loss 19.2107 LearningRate 0.0947 Epoch: 0 Global Step: 6620 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:08,793-Speed 3099.34 samples/sec Loss 19.2471 LearningRate 0.0947 Epoch: 0 Global Step: 6630 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 
02:23:12,107-Speed 3090.06 samples/sec Loss 18.9455 LearningRate 0.0947 Epoch: 0 Global Step: 6640 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:15,391-Speed 3119.59 samples/sec Loss 18.9784 LearningRate 0.0947 Epoch: 0 Global Step: 6650 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:18,697-Speed 3098.57 samples/sec Loss 19.1930 LearningRate 0.0947 Epoch: 0 Global Step: 6660 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:22,079-Speed 3028.48 samples/sec Loss 19.1510 LearningRate 0.0947 Epoch: 0 Global Step: 6670 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:25,433-Speed 3053.94 samples/sec Loss 19.2007 LearningRate 0.0947 Epoch: 0 Global Step: 6680 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:28,788-Speed 3053.42 samples/sec Loss 19.1713 LearningRate 0.0947 Epoch: 0 Global Step: 6690 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:32,060-Speed 3130.30 samples/sec Loss 19.0521 LearningRate 0.0947 Epoch: 0 Global Step: 6700 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:35,343-Speed 3119.43 samples/sec Loss 18.8403 LearningRate 0.0947 Epoch: 0 Global Step: 6710 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:38,659-Speed 3089.66 samples/sec Loss 18.9383 LearningRate 0.0947 Epoch: 0 Global Step: 6720 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:41,968-Speed 3094.60 samples/sec Loss 19.0378 LearningRate 0.0947 Epoch: 0 Global Step: 6730 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:45,251-Speed 3120.17 samples/sec Loss 18.9137 LearningRate 0.0946 Epoch: 0 Global Step: 6740 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:48,554-Speed 3101.85 samples/sec Loss 18.9992 LearningRate 0.0946 Epoch: 0 Global Step: 6750 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:51,862-Speed 3096.42 samples/sec Loss 
19.0133 LearningRate 0.0946 Epoch: 0 Global Step: 6760 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:55,211-Speed 3058.27 samples/sec Loss 19.0012 LearningRate 0.0946 Epoch: 0 Global Step: 6770 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:23:59,230-Speed 2548.59 samples/sec Loss 18.7555 LearningRate 0.0946 Epoch: 0 Global Step: 6780 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:02,534-Speed 3100.49 samples/sec Loss 18.9308 LearningRate 0.0946 Epoch: 0 Global Step: 6790 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:05,872-Speed 3068.34 samples/sec Loss 18.8846 LearningRate 0.0946 Epoch: 0 Global Step: 6800 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:11,167-Speed 1934.03 samples/sec Loss 19.0349 LearningRate 0.0946 Epoch: 0 Global Step: 6810 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:14,568-Speed 3012.45 samples/sec Loss 18.9321 LearningRate 0.0946 Epoch: 0 Global Step: 6820 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:17,896-Speed 3078.36 samples/sec Loss 18.8400 LearningRate 0.0946 Epoch: 0 Global Step: 6830 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:21,176-Speed 3122.53 samples/sec Loss 18.8680 LearningRate 0.0946 Epoch: 0 Global Step: 6840 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:24,482-Speed 3098.77 samples/sec Loss 18.8512 LearningRate 0.0946 Epoch: 0 Global Step: 6850 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:27,840-Speed 3050.48 samples/sec Loss 19.1030 LearningRate 0.0946 Epoch: 0 Global Step: 6860 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:31,158-Speed 3087.41 samples/sec Loss 19.0234 LearningRate 0.0945 Epoch: 0 Global Step: 6870 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:34,472-Speed 3090.32 samples/sec Loss 18.7708 LearningRate 0.0945 Epoch: 0 Global 
Step: 6880 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:37,764-Speed 3111.58 samples/sec Loss 18.8108 LearningRate 0.0945 Epoch: 0 Global Step: 6890 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:41,132-Speed 3041.59 samples/sec Loss 19.0234 LearningRate 0.0945 Epoch: 0 Global Step: 6900 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:44,483-Speed 3057.71 samples/sec Loss 18.8013 LearningRate 0.0945 Epoch: 0 Global Step: 6910 Fp16 Grad Scale: 262144 Required: 23 hours Training: 2022-04-27 02:24:47,760-Speed 3125.70 samples/sec Loss 18.8203 LearningRate 0.0945 Epoch: 0 Global Step: 6920 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:51,077-Speed 3088.24 samples/sec Loss 18.8103 LearningRate 0.0945 Epoch: 0 Global Step: 6930 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:54,386-Speed 3095.18 samples/sec Loss 18.8100 LearningRate 0.0945 Epoch: 0 Global Step: 6940 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:24:57,766-Speed 3031.30 samples/sec Loss 18.8650 LearningRate 0.0945 Epoch: 0 Global Step: 6950 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:01,031-Speed 3137.04 samples/sec Loss 18.7353 LearningRate 0.0945 Epoch: 0 Global Step: 6960 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:04,327-Speed 3107.21 samples/sec Loss 18.7551 LearningRate 0.0945 Epoch: 0 Global Step: 6970 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:07,667-Speed 3066.57 samples/sec Loss 18.6478 LearningRate 0.0945 Epoch: 0 Global Step: 6980 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:11,002-Speed 3071.94 samples/sec Loss 18.7904 LearningRate 0.0945 Epoch: 0 Global Step: 6990 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:14,292-Speed 3113.98 samples/sec Loss 18.6178 LearningRate 0.0944 Epoch: 0 Global Step: 7000 Fp16 Grad Scale: 65536 Required: 
23 hours Training: 2022-04-27 02:25:17,629-Speed 3068.95 samples/sec Loss 19.0064 LearningRate 0.0944 Epoch: 0 Global Step: 7010 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:20,952-Speed 3082.48 samples/sec Loss 18.7119 LearningRate 0.0944 Epoch: 0 Global Step: 7020 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:24,279-Speed 3078.54 samples/sec Loss 18.5394 LearningRate 0.0944 Epoch: 0 Global Step: 7030 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:27,561-Speed 3121.39 samples/sec Loss 18.5314 LearningRate 0.0944 Epoch: 0 Global Step: 7040 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:30,979-Speed 2996.73 samples/sec Loss 18.6692 LearningRate 0.0944 Epoch: 0 Global Step: 7050 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:34,313-Speed 3072.20 samples/sec Loss 18.7178 LearningRate 0.0944 Epoch: 0 Global Step: 7060 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:37,564-Speed 3150.91 samples/sec Loss 18.5977 LearningRate 0.0944 Epoch: 0 Global Step: 7070 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:40,833-Speed 3133.59 samples/sec Loss 18.8118 LearningRate 0.0944 Epoch: 0 Global Step: 7080 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:44,129-Speed 3108.03 samples/sec Loss 18.7536 LearningRate 0.0944 Epoch: 0 Global Step: 7090 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:25:47,459-Speed 3075.64 samples/sec Loss 18.6429 LearningRate 0.0944 Epoch: 0 Global Step: 7100 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:50,795-Speed 3070.64 samples/sec Loss 18.5319 LearningRate 0.0944 Epoch: 0 Global Step: 7110 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:54,134-Speed 3067.29 samples/sec Loss 18.7066 LearningRate 0.0943 Epoch: 0 Global Step: 7120 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:25:57,484-Speed 
3058.11 samples/sec Loss 18.5186 LearningRate 0.0943 Epoch: 0 Global Step: 7130 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:00,815-Speed 3074.83 samples/sec Loss 18.5296 LearningRate 0.0943 Epoch: 0 Global Step: 7140 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:04,155-Speed 3067.09 samples/sec Loss 18.5767 LearningRate 0.0943 Epoch: 0 Global Step: 7150 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:07,503-Speed 3059.45 samples/sec Loss 18.7272 LearningRate 0.0943 Epoch: 0 Global Step: 7160 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:10,807-Speed 3099.95 samples/sec Loss 18.7411 LearningRate 0.0943 Epoch: 0 Global Step: 7170 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:14,069-Speed 3139.91 samples/sec Loss 18.6171 LearningRate 0.0943 Epoch: 0 Global Step: 7180 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:17,417-Speed 3060.03 samples/sec Loss 18.5394 LearningRate 0.0943 Epoch: 0 Global Step: 7190 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:20,765-Speed 3058.87 samples/sec Loss 18.5701 LearningRate 0.0943 Epoch: 0 Global Step: 7200 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:24,104-Speed 3068.41 samples/sec Loss 18.6500 LearningRate 0.0943 Epoch: 0 Global Step: 7210 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:27,371-Speed 3134.98 samples/sec Loss 18.5264 LearningRate 0.0943 Epoch: 0 Global Step: 7220 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:30,706-Speed 3071.25 samples/sec Loss 18.4681 LearningRate 0.0943 Epoch: 0 Global Step: 7230 Fp16 Grad Scale: 65536 Required: 23 hours Training: 2022-04-27 02:26:34,084-Speed 3032.40 samples/sec Loss 18.5810 LearningRate 0.0943 Epoch: 0 Global Step: 7240 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:37,413-Speed 3076.73 samples/sec Loss 18.3736 LearningRate 0.0942 
Epoch: 0 Global Step: 7250 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:40,724-Speed 3093.26 samples/sec Loss 18.4527 LearningRate 0.0942 Epoch: 0 Global Step: 7260 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:44,072-Speed 3059.64 samples/sec Loss 18.5230 LearningRate 0.0942 Epoch: 0 Global Step: 7270 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:47,486-Speed 3000.62 samples/sec Loss 18.4635 LearningRate 0.0942 Epoch: 0 Global Step: 7280 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:50,869-Speed 3028.26 samples/sec Loss 18.4608 LearningRate 0.0942 Epoch: 0 Global Step: 7290 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:54,193-Speed 3080.99 samples/sec Loss 18.3017 LearningRate 0.0942 Epoch: 0 Global Step: 7300 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:26:57,539-Speed 3061.26 samples/sec Loss 18.5489 LearningRate 0.0942 Epoch: 0 Global Step: 7310 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:00,875-Speed 3070.68 samples/sec Loss 18.4116 LearningRate 0.0942 Epoch: 0 Global Step: 7320 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:04,237-Speed 3046.90 samples/sec Loss 18.3576 LearningRate 0.0942 Epoch: 0 Global Step: 7330 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:07,563-Speed 3079.69 samples/sec Loss 18.4663 LearningRate 0.0942 Epoch: 0 Global Step: 7340 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:10,880-Speed 3087.88 samples/sec Loss 18.3471 LearningRate 0.0942 Epoch: 0 Global Step: 7350 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:14,144-Speed 3138.23 samples/sec Loss 18.5248 LearningRate 0.0942 Epoch: 0 Global Step: 7360 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:17,415-Speed 3131.78 samples/sec Loss 18.3356 LearningRate 0.0942 Epoch: 0 Global Step: 7370 Fp16 Grad Scale: 
131072 Required: 23 hours Training: 2022-04-27 02:27:20,688-Speed 3129.47 samples/sec Loss 18.3940 LearningRate 0.0941 Epoch: 0 Global Step: 7380 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:23,978-Speed 3112.90 samples/sec Loss 18.3422 LearningRate 0.0941 Epoch: 0 Global Step: 7390 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:27,363-Speed 3025.93 samples/sec Loss 18.4069 LearningRate 0.0941 Epoch: 0 Global Step: 7400 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:30,646-Speed 3120.77 samples/sec Loss 18.3746 LearningRate 0.0941 Epoch: 0 Global Step: 7410 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:33,950-Speed 3100.29 samples/sec Loss 18.2429 LearningRate 0.0941 Epoch: 0 Global Step: 7420 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:37,238-Speed 3114.68 samples/sec Loss 18.3878 LearningRate 0.0941 Epoch: 0 Global Step: 7430 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:40,562-Speed 3082.13 samples/sec Loss 18.1791 LearningRate 0.0941 Epoch: 0 Global Step: 7440 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:43,922-Speed 3048.37 samples/sec Loss 18.3116 LearningRate 0.0941 Epoch: 0 Global Step: 7450 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:47,253-Speed 3074.87 samples/sec Loss 18.2192 LearningRate 0.0941 Epoch: 0 Global Step: 7460 Fp16 Grad Scale: 131072 Required: 23 hours Training: 2022-04-27 02:27:50,545-Speed 3111.57 samples/sec Loss 18.3357 LearningRate 0.0941 Epoch: 0 Global Step: 7470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:27:53,863-Speed 3086.74 samples/sec Loss 18.5241 LearningRate 0.0941 Epoch: 0 Global Step: 7480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:27:57,239-Speed 3034.44 samples/sec Loss 18.3231 LearningRate 0.0941 Epoch: 0 Global Step: 7490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 
2022-04-27 02:28:00,616-Speed 3033.12 samples/sec Loss 18.2861 LearningRate 0.0941 Epoch: 0 Global Step: 7500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:03,958-Speed 3065.14 samples/sec Loss 18.2909 LearningRate 0.0940 Epoch: 0 Global Step: 7510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:07,320-Speed 3047.22 samples/sec Loss 18.4021 LearningRate 0.0940 Epoch: 0 Global Step: 7520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:10,662-Speed 3064.57 samples/sec Loss 18.3555 LearningRate 0.0940 Epoch: 0 Global Step: 7530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:13,974-Speed 3092.71 samples/sec Loss 18.3429 LearningRate 0.0940 Epoch: 0 Global Step: 7540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:17,267-Speed 3110.69 samples/sec Loss 18.4042 LearningRate 0.0940 Epoch: 0 Global Step: 7550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:20,641-Speed 3036.18 samples/sec Loss 18.2926 LearningRate 0.0940 Epoch: 0 Global Step: 7560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:23,927-Speed 3117.25 samples/sec Loss 18.3246 LearningRate 0.0940 Epoch: 0 Global Step: 7570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:27,243-Speed 3088.86 samples/sec Loss 18.3415 LearningRate 0.0940 Epoch: 0 Global Step: 7580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:30,600-Speed 3051.73 samples/sec Loss 18.5134 LearningRate 0.0940 Epoch: 0 Global Step: 7590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:33,975-Speed 3034.70 samples/sec Loss 18.4641 LearningRate 0.0940 Epoch: 0 Global Step: 7600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:37,308-Speed 3073.66 samples/sec Loss 18.3261 LearningRate 0.0940 Epoch: 0 Global Step: 7610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:40,616-Speed 3096.66 
samples/sec Loss 18.3400 LearningRate 0.0940 Epoch: 0 Global Step: 7620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:43,983-Speed 3041.98 samples/sec Loss 18.2671 LearningRate 0.0940 Epoch: 0 Global Step: 7630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:47,279-Speed 3108.17 samples/sec Loss 18.1149 LearningRate 0.0939 Epoch: 0 Global Step: 7640 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:28:50,588-Speed 3095.56 samples/sec Loss 18.2471 LearningRate 0.0939 Epoch: 0 Global Step: 7650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:53,917-Speed 3077.36 samples/sec Loss 18.2229 LearningRate 0.0939 Epoch: 0 Global Step: 7660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:28:57,202-Speed 3117.24 samples/sec Loss 18.2652 LearningRate 0.0939 Epoch: 0 Global Step: 7670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:00,508-Speed 3098.97 samples/sec Loss 18.4120 LearningRate 0.0939 Epoch: 0 Global Step: 7680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:03,808-Speed 3103.32 samples/sec Loss 18.2114 LearningRate 0.0939 Epoch: 0 Global Step: 7690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:07,094-Speed 3117.17 samples/sec Loss 18.2054 LearningRate 0.0939 Epoch: 0 Global Step: 7700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:10,358-Speed 3138.96 samples/sec Loss 18.2410 LearningRate 0.0939 Epoch: 0 Global Step: 7710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:13,696-Speed 3068.52 samples/sec Loss 18.2278 LearningRate 0.0939 Epoch: 0 Global Step: 7720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:17,021-Speed 3080.57 samples/sec Loss 18.3821 LearningRate 0.0939 Epoch: 0 Global Step: 7730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:20,327-Speed 3098.66 samples/sec Loss 18.2573 LearningRate 0.0939 
Epoch: 0 Global Step: 7740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:23,653-Speed 3079.34 samples/sec Loss 18.3979 LearningRate 0.0939 Epoch: 0 Global Step: 7750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:27,019-Speed 3043.73 samples/sec Loss 18.2032 LearningRate 0.0939 Epoch: 0 Global Step: 7760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:30,312-Speed 3110.36 samples/sec Loss 18.1230 LearningRate 0.0938 Epoch: 0 Global Step: 7770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:33,657-Speed 3061.93 samples/sec Loss 18.1221 LearningRate 0.0938 Epoch: 0 Global Step: 7780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:36,936-Speed 3123.90 samples/sec Loss 18.1456 LearningRate 0.0938 Epoch: 0 Global Step: 7790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:40,258-Speed 3083.18 samples/sec Loss 18.1196 LearningRate 0.0938 Epoch: 0 Global Step: 7800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:29:43,575-Speed 3087.56 samples/sec Loss 18.0998 LearningRate 0.0938 Epoch: 0 Global Step: 7810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:46,906-Speed 3075.62 samples/sec Loss 18.2451 LearningRate 0.0938 Epoch: 0 Global Step: 7820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:50,284-Speed 3032.53 samples/sec Loss 18.1917 LearningRate 0.0938 Epoch: 0 Global Step: 7830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:53,602-Speed 3086.67 samples/sec Loss 18.1072 LearningRate 0.0938 Epoch: 0 Global Step: 7840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:29:56,922-Speed 3085.54 samples/sec Loss 18.0981 LearningRate 0.0938 Epoch: 0 Global Step: 7850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:00,255-Speed 3073.25 samples/sec Loss 18.0270 LearningRate 0.0938 Epoch: 0 Global Step: 7860 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-04-27 02:30:03,585-Speed 3076.28 samples/sec Loss 18.1079 LearningRate 0.0938 Epoch: 0 Global Step: 7870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:06,943-Speed 3049.32 samples/sec Loss 18.0615 LearningRate 0.0938 Epoch: 0 Global Step: 7880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:10,255-Speed 3093.79 samples/sec Loss 18.2506 LearningRate 0.0937 Epoch: 0 Global Step: 7890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:13,600-Speed 3061.65 samples/sec Loss 18.0253 LearningRate 0.0937 Epoch: 0 Global Step: 7900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:16,885-Speed 3119.11 samples/sec Loss 18.1659 LearningRate 0.0937 Epoch: 0 Global Step: 7910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:20,158-Speed 3129.68 samples/sec Loss 18.1782 LearningRate 0.0937 Epoch: 0 Global Step: 7920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:23,596-Speed 2979.79 samples/sec Loss 18.0803 LearningRate 0.0937 Epoch: 0 Global Step: 7930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:26,913-Speed 3087.91 samples/sec Loss 18.0765 LearningRate 0.0937 Epoch: 0 Global Step: 7940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:30,337-Speed 2991.56 samples/sec Loss 17.9781 LearningRate 0.0937 Epoch: 0 Global Step: 7950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:33,634-Speed 3107.05 samples/sec Loss 18.0380 LearningRate 0.0937 Epoch: 0 Global Step: 7960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:37,011-Speed 3033.19 samples/sec Loss 17.9751 LearningRate 0.0937 Epoch: 0 Global Step: 7970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:40,406-Speed 3016.96 samples/sec Loss 17.8670 LearningRate 0.0937 Epoch: 0 Global Step: 7980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
02:30:43,673-Speed 3135.71 samples/sec Loss 18.1273 LearningRate 0.0937 Epoch: 0 Global Step: 7990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:46,941-Speed 3133.81 samples/sec Loss 17.9458 LearningRate 0.0937 Epoch: 0 Global Step: 8000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:50,296-Speed 3053.49 samples/sec Loss 17.9314 LearningRate 0.0937 Epoch: 0 Global Step: 8010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:53,621-Speed 3080.09 samples/sec Loss 17.8991 LearningRate 0.0936 Epoch: 0 Global Step: 8020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:30:56,953-Speed 3074.00 samples/sec Loss 17.9511 LearningRate 0.0936 Epoch: 0 Global Step: 8030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:00,295-Speed 3065.11 samples/sec Loss 17.7489 LearningRate 0.0936 Epoch: 0 Global Step: 8040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:03,629-Speed 3073.02 samples/sec Loss 17.9536 LearningRate 0.0936 Epoch: 0 Global Step: 8050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:06,906-Speed 3125.03 samples/sec Loss 18.1116 LearningRate 0.0936 Epoch: 0 Global Step: 8060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:10,236-Speed 3076.63 samples/sec Loss 17.8576 LearningRate 0.0936 Epoch: 0 Global Step: 8070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:13,566-Speed 3075.37 samples/sec Loss 17.9977 LearningRate 0.0936 Epoch: 0 Global Step: 8080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:17,002-Speed 2984.55 samples/sec Loss 17.9348 LearningRate 0.0936 Epoch: 0 Global Step: 8090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:20,349-Speed 3060.10 samples/sec Loss 17.9370 LearningRate 0.0936 Epoch: 0 Global Step: 8100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:23,701-Speed 3055.68 samples/sec Loss 
17.7131 LearningRate 0.0936 Epoch: 0 Global Step: 8110 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:31:27,019-Speed 3087.59 samples/sec Loss 17.8274 LearningRate 0.0936 Epoch: 0 Global Step: 8120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:30,304-Speed 3118.29 samples/sec Loss 18.0532 LearningRate 0.0936 Epoch: 0 Global Step: 8130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:33,646-Speed 3064.52 samples/sec Loss 17.9400 LearningRate 0.0936 Epoch: 0 Global Step: 8140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:37,003-Speed 3051.32 samples/sec Loss 17.8820 LearningRate 0.0935 Epoch: 0 Global Step: 8150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:40,379-Speed 3034.43 samples/sec Loss 17.7226 LearningRate 0.0935 Epoch: 0 Global Step: 8160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:43,718-Speed 3067.33 samples/sec Loss 17.8105 LearningRate 0.0935 Epoch: 0 Global Step: 8170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:47,026-Speed 3096.86 samples/sec Loss 17.6787 LearningRate 0.0935 Epoch: 0 Global Step: 8180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:50,342-Speed 3088.90 samples/sec Loss 17.8503 LearningRate 0.0935 Epoch: 0 Global Step: 8190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:53,671-Speed 3076.16 samples/sec Loss 17.8931 LearningRate 0.0935 Epoch: 0 Global Step: 8200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:31:57,001-Speed 3076.37 samples/sec Loss 17.9700 LearningRate 0.0935 Epoch: 0 Global Step: 8210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:00,295-Speed 3110.05 samples/sec Loss 17.9094 LearningRate 0.0935 Epoch: 0 Global Step: 8220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:03,616-Speed 3083.63 samples/sec Loss 17.8457 LearningRate 0.0935 Epoch: 0 Global 
Step: 8230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:06,943-Speed 3079.00 samples/sec Loss 17.6947 LearningRate 0.0935 Epoch: 0 Global Step: 8240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:10,318-Speed 3035.06 samples/sec Loss 17.7821 LearningRate 0.0935 Epoch: 0 Global Step: 8250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:13,608-Speed 3113.33 samples/sec Loss 17.8891 LearningRate 0.0935 Epoch: 0 Global Step: 8260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:16,975-Speed 3042.73 samples/sec Loss 18.0742 LearningRate 0.0935 Epoch: 0 Global Step: 8270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:20,332-Speed 3051.40 samples/sec Loss 17.7631 LearningRate 0.0934 Epoch: 0 Global Step: 8280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:32:23,640-Speed 3095.46 samples/sec Loss 17.7534 LearningRate 0.0934 Epoch: 0 Global Step: 8290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:32:26,924-Speed 3119.94 samples/sec Loss 17.9339 LearningRate 0.0934 Epoch: 0 Global Step: 8300 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:30,182-Speed 3143.90 samples/sec Loss 17.8871 LearningRate 0.0934 Epoch: 0 Global Step: 8310 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:33,468-Speed 3116.40 samples/sec Loss 17.7501 LearningRate 0.0934 Epoch: 0 Global Step: 8320 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:36,799-Speed 3075.66 samples/sec Loss 17.5689 LearningRate 0.0934 Epoch: 0 Global Step: 8330 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:40,096-Speed 3106.42 samples/sec Loss 17.7876 LearningRate 0.0934 Epoch: 0 Global Step: 8340 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:43,399-Speed 3101.47 samples/sec Loss 17.8739 LearningRate 0.0934 Epoch: 0 Global Step: 8350 Fp16 Grad Scale: 32768 Required: 22 
hours Training: 2022-04-27 02:32:46,764-Speed 3043.39 samples/sec Loss 17.8030 LearningRate 0.0934 Epoch: 0 Global Step: 8360 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:50,074-Speed 3094.63 samples/sec Loss 17.7812 LearningRate 0.0934 Epoch: 0 Global Step: 8370 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:53,372-Speed 3105.89 samples/sec Loss 17.7957 LearningRate 0.0934 Epoch: 0 Global Step: 8380 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:32:56,769-Speed 3015.43 samples/sec Loss 18.0063 LearningRate 0.0934 Epoch: 0 Global Step: 8390 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:33:00,087-Speed 3086.74 samples/sec Loss 17.8203 LearningRate 0.0934 Epoch: 0 Global Step: 8400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:03,376-Speed 3115.26 samples/sec Loss 17.8233 LearningRate 0.0933 Epoch: 0 Global Step: 8410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:06,711-Speed 3070.71 samples/sec Loss 17.9071 LearningRate 0.0933 Epoch: 0 Global Step: 8420 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:10,071-Speed 3048.80 samples/sec Loss 17.6880 LearningRate 0.0933 Epoch: 0 Global Step: 8430 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:13,440-Speed 3040.21 samples/sec Loss 17.7228 LearningRate 0.0933 Epoch: 0 Global Step: 8440 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:16,770-Speed 3075.62 samples/sec Loss 17.7492 LearningRate 0.0933 Epoch: 0 Global Step: 8450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:20,029-Speed 3142.79 samples/sec Loss 17.6631 LearningRate 0.0933 Epoch: 0 Global Step: 8460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:23,388-Speed 3049.78 samples/sec Loss 17.6342 LearningRate 0.0933 Epoch: 0 Global Step: 8470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:26,758-Speed 3042.26 
samples/sec Loss 17.7281 LearningRate 0.0933 Epoch: 0 Global Step: 8480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:30,102-Speed 3063.07 samples/sec Loss 17.5854 LearningRate 0.0933 Epoch: 0 Global Step: 8490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:33,434-Speed 3074.62 samples/sec Loss 17.7721 LearningRate 0.0933 Epoch: 0 Global Step: 8500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:33:36,759-Speed 3079.93 samples/sec Loss 17.8680 LearningRate 0.0933 Epoch: 0 Global Step: 8510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:40,056-Speed 3107.64 samples/sec Loss 17.6757 LearningRate 0.0933 Epoch: 0 Global Step: 8520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:43,384-Speed 3078.31 samples/sec Loss 17.8306 LearningRate 0.0933 Epoch: 0 Global Step: 8530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:46,693-Speed 3095.47 samples/sec Loss 17.8835 LearningRate 0.0932 Epoch: 0 Global Step: 8540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:49,964-Speed 3130.73 samples/sec Loss 17.6950 LearningRate 0.0932 Epoch: 0 Global Step: 8550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:53,290-Speed 3079.98 samples/sec Loss 17.8467 LearningRate 0.0932 Epoch: 0 Global Step: 8560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:56,573-Speed 3122.22 samples/sec Loss 17.6042 LearningRate 0.0932 Epoch: 0 Global Step: 8570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:33:59,855-Speed 3121.23 samples/sec Loss 17.6514 LearningRate 0.0932 Epoch: 0 Global Step: 8580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:34:03,151-Speed 3107.92 samples/sec Loss 17.8674 LearningRate 0.0932 Epoch: 0 Global Step: 8590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:34:06,482-Speed 3075.06 samples/sec Loss 17.6041 LearningRate 0.0932 Epoch: 0 
Global Step: 8600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:34:09,791-Speed 3095.17 samples/sec Loss 17.6631 LearningRate 0.0932 Epoch: 0 Global Step: 8610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:13,122-Speed 3075.11 samples/sec Loss 17.6040 LearningRate 0.0932 Epoch: 0 Global Step: 8620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:16,407-Speed 3118.31 samples/sec Loss 17.5257 LearningRate 0.0932 Epoch: 0 Global Step: 8630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:19,733-Speed 3079.27 samples/sec Loss 17.7188 LearningRate 0.0932 Epoch: 0 Global Step: 8640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:23,024-Speed 3112.60 samples/sec Loss 17.5425 LearningRate 0.0932 Epoch: 0 Global Step: 8650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:26,304-Speed 3122.82 samples/sec Loss 17.6552 LearningRate 0.0931 Epoch: 0 Global Step: 8660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:29,582-Speed 3125.08 samples/sec Loss 17.5445 LearningRate 0.0931 Epoch: 0 Global Step: 8670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:32,944-Speed 3046.28 samples/sec Loss 17.6634 LearningRate 0.0931 Epoch: 0 Global Step: 8680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:36,306-Speed 3047.13 samples/sec Loss 17.7452 LearningRate 0.0931 Epoch: 0 Global Step: 8690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:39,635-Speed 3076.83 samples/sec Loss 17.5910 LearningRate 0.0931 Epoch: 0 Global Step: 8700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:42,948-Speed 3092.04 samples/sec Loss 17.5204 LearningRate 0.0931 Epoch: 0 Global Step: 8710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:46,334-Speed 3025.21 samples/sec Loss 17.5473 LearningRate 0.0931 Epoch: 0 Global Step: 8720 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-04-27 02:34:49,721-Speed 3024.29 samples/sec Loss 17.5160 LearningRate 0.0931 Epoch: 0 Global Step: 8730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:53,054-Speed 3073.20 samples/sec Loss 17.5651 LearningRate 0.0931 Epoch: 0 Global Step: 8740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:56,424-Speed 3039.09 samples/sec Loss 17.6578 LearningRate 0.0931 Epoch: 0 Global Step: 8750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:34:59,793-Speed 3040.44 samples/sec Loss 17.5162 LearningRate 0.0931 Epoch: 0 Global Step: 8760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:03,110-Speed 3088.16 samples/sec Loss 17.5835 LearningRate 0.0931 Epoch: 0 Global Step: 8770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:06,481-Speed 3038.43 samples/sec Loss 17.3657 LearningRate 0.0931 Epoch: 0 Global Step: 8780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:09,794-Speed 3092.38 samples/sec Loss 17.4742 LearningRate 0.0930 Epoch: 0 Global Step: 8790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:13,142-Speed 3058.91 samples/sec Loss 17.4236 LearningRate 0.0930 Epoch: 0 Global Step: 8800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:16,410-Speed 3134.73 samples/sec Loss 17.4190 LearningRate 0.0930 Epoch: 0 Global Step: 8810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:19,738-Speed 3077.69 samples/sec Loss 17.5548 LearningRate 0.0930 Epoch: 0 Global Step: 8820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:23,018-Speed 3122.45 samples/sec Loss 17.6469 LearningRate 0.0930 Epoch: 0 Global Step: 8830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:26,368-Speed 3058.27 samples/sec Loss 17.4753 LearningRate 0.0930 Epoch: 0 Global Step: 8840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
02:35:29,689-Speed 3083.86 samples/sec Loss 17.5945 LearningRate 0.0930 Epoch: 0 Global Step: 8850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:33,026-Speed 3069.59 samples/sec Loss 17.5903 LearningRate 0.0930 Epoch: 0 Global Step: 8860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:36,438-Speed 3002.41 samples/sec Loss 17.5442 LearningRate 0.0930 Epoch: 0 Global Step: 8870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:39,794-Speed 3051.93 samples/sec Loss 17.3629 LearningRate 0.0930 Epoch: 0 Global Step: 8880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:43,166-Speed 3037.02 samples/sec Loss 17.3853 LearningRate 0.0930 Epoch: 0 Global Step: 8890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:46,539-Speed 3037.57 samples/sec Loss 17.6210 LearningRate 0.0930 Epoch: 0 Global Step: 8900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:49,798-Speed 3142.95 samples/sec Loss 17.4884 LearningRate 0.0930 Epoch: 0 Global Step: 8910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:53,141-Speed 3064.25 samples/sec Loss 17.5149 LearningRate 0.0929 Epoch: 0 Global Step: 8920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:56,429-Speed 3115.36 samples/sec Loss 17.4718 LearningRate 0.0929 Epoch: 0 Global Step: 8930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:35:59,701-Speed 3130.54 samples/sec Loss 17.4764 LearningRate 0.0929 Epoch: 0 Global Step: 8940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:03,060-Speed 3049.25 samples/sec Loss 17.4505 LearningRate 0.0929 Epoch: 0 Global Step: 8950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:06,369-Speed 3095.62 samples/sec Loss 17.3235 LearningRate 0.0929 Epoch: 0 Global Step: 8960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:09,767-Speed 3014.86 samples/sec Loss 
17.3374 LearningRate 0.0929 Epoch: 0 Global Step: 8970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:13,092-Speed 3080.29 samples/sec Loss 17.5221 LearningRate 0.0929 Epoch: 0 Global Step: 8980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:16,406-Speed 3092.40 samples/sec Loss 17.5774 LearningRate 0.0929 Epoch: 0 Global Step: 8990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:19,717-Speed 3093.88 samples/sec Loss 17.4273 LearningRate 0.0929 Epoch: 0 Global Step: 9000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:22,996-Speed 3123.71 samples/sec Loss 17.5751 LearningRate 0.0929 Epoch: 0 Global Step: 9010 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:36:26,280-Speed 3119.12 samples/sec Loss 17.5250 LearningRate 0.0929 Epoch: 0 Global Step: 9020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:29,684-Speed 3009.03 samples/sec Loss 17.3668 LearningRate 0.0929 Epoch: 0 Global Step: 9030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:32,992-Speed 3096.45 samples/sec Loss 17.4046 LearningRate 0.0929 Epoch: 0 Global Step: 9040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:36,250-Speed 3143.62 samples/sec Loss 17.2880 LearningRate 0.0928 Epoch: 0 Global Step: 9050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:39,534-Speed 3119.38 samples/sec Loss 17.1728 LearningRate 0.0928 Epoch: 0 Global Step: 9060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:42,882-Speed 3059.00 samples/sec Loss 17.2746 LearningRate 0.0928 Epoch: 0 Global Step: 9070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:46,196-Speed 3091.40 samples/sec Loss 17.3934 LearningRate 0.0928 Epoch: 0 Global Step: 9080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:49,592-Speed 3016.44 samples/sec Loss 17.4262 LearningRate 0.0928 Epoch: 0 Global 
Step: 9090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:52,936-Speed 3062.67 samples/sec Loss 17.3712 LearningRate 0.0928 Epoch: 0 Global Step: 9100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:56,293-Speed 3051.04 samples/sec Loss 17.5027 LearningRate 0.0928 Epoch: 0 Global Step: 9110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:36:59,690-Speed 3015.13 samples/sec Loss 17.3492 LearningRate 0.0928 Epoch: 0 Global Step: 9120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:03,046-Speed 3052.74 samples/sec Loss 17.4759 LearningRate 0.0928 Epoch: 0 Global Step: 9130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:06,367-Speed 3084.13 samples/sec Loss 17.3353 LearningRate 0.0928 Epoch: 0 Global Step: 9140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:09,668-Speed 3102.50 samples/sec Loss 17.3069 LearningRate 0.0928 Epoch: 0 Global Step: 9150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:12,953-Speed 3118.78 samples/sec Loss 17.1643 LearningRate 0.0928 Epoch: 0 Global Step: 9160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:16,312-Speed 3049.51 samples/sec Loss 17.3288 LearningRate 0.0928 Epoch: 0 Global Step: 9170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:19,596-Speed 3118.52 samples/sec Loss 17.2242 LearningRate 0.0927 Epoch: 0 Global Step: 9180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:22,949-Speed 3055.27 samples/sec Loss 17.3271 LearningRate 0.0927 Epoch: 0 Global Step: 9190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:26,296-Speed 3060.52 samples/sec Loss 17.4150 LearningRate 0.0927 Epoch: 0 Global Step: 9200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:29,574-Speed 3124.35 samples/sec Loss 17.2778 LearningRate 0.0927 Epoch: 0 Global Step: 9210 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-04-27 02:37:32,888-Speed 3090.63 samples/sec Loss 17.3240 LearningRate 0.0927 Epoch: 0 Global Step: 9220 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:37:36,194-Speed 3098.38 samples/sec Loss 17.4122 LearningRate 0.0927 Epoch: 0 Global Step: 9230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:39,491-Speed 3107.42 samples/sec Loss 17.2904 LearningRate 0.0927 Epoch: 0 Global Step: 9240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:42,836-Speed 3061.90 samples/sec Loss 17.3413 LearningRate 0.0927 Epoch: 0 Global Step: 9250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:46,143-Speed 3097.58 samples/sec Loss 17.0539 LearningRate 0.0927 Epoch: 0 Global Step: 9260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:49,513-Speed 3039.60 samples/sec Loss 17.2606 LearningRate 0.0927 Epoch: 0 Global Step: 9270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:52,851-Speed 3068.10 samples/sec Loss 17.2243 LearningRate 0.0927 Epoch: 0 Global Step: 9280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:56,141-Speed 3113.94 samples/sec Loss 17.4753 LearningRate 0.0927 Epoch: 0 Global Step: 9290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:37:59,528-Speed 3024.09 samples/sec Loss 17.1644 LearningRate 0.0927 Epoch: 0 Global Step: 9300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:38:02,860-Speed 3074.57 samples/sec Loss 17.2562 LearningRate 0.0926 Epoch: 0 Global Step: 9310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:38:06,152-Speed 3111.44 samples/sec Loss 17.2321 LearningRate 0.0926 Epoch: 0 Global Step: 9320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:38:09,431-Speed 3123.67 samples/sec Loss 17.2158 LearningRate 0.0926 Epoch: 0 Global Step: 9330 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 
02:38:12,740-Speed 3095.75 samples/sec Loss 17.2220 LearningRate 0.0926 Epoch: 0 Global Step: 9340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:38:16,046-Speed 3099.27 samples/sec Loss 17.3067 LearningRate 0.0926 Epoch: 0 Global Step: 9350 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:19,404-Speed 3050.54 samples/sec Loss 17.1622 LearningRate 0.0926 Epoch: 0 Global Step: 9360 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:22,709-Speed 3099.30 samples/sec Loss 17.3922 LearningRate 0.0926 Epoch: 0 Global Step: 9370 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:26,022-Speed 3091.36 samples/sec Loss 17.2865 LearningRate 0.0926 Epoch: 0 Global Step: 9380 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:29,404-Speed 3028.92 samples/sec Loss 17.2986 LearningRate 0.0926 Epoch: 0 Global Step: 9390 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:32,721-Speed 3088.12 samples/sec Loss 17.2783 LearningRate 0.0926 Epoch: 0 Global Step: 9400 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:36,148-Speed 2988.77 samples/sec Loss 17.2584 LearningRate 0.0926 Epoch: 0 Global Step: 9410 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:39,523-Speed 3035.27 samples/sec Loss 17.2808 LearningRate 0.0926 Epoch: 0 Global Step: 9420 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:42,816-Speed 3110.65 samples/sec Loss 17.3021 LearningRate 0.0926 Epoch: 0 Global Step: 9430 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:46,116-Speed 3103.46 samples/sec Loss 17.1773 LearningRate 0.0925 Epoch: 0 Global Step: 9440 Fp16 Grad Scale: 32768 Required: 22 hours Training: 2022-04-27 02:38:49,481-Speed 3043.60 samples/sec Loss 17.2579 LearningRate 0.0925 Epoch: 0 Global Step: 9450 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:38:52,782-Speed 3103.22 samples/sec Loss 17.2416 
LearningRate 0.0925 Epoch: 0 Global Step: 9460 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:38:56,090-Speed 3096.78 samples/sec Loss 17.4052 LearningRate 0.0925 Epoch: 0 Global Step: 9470 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:38:59,410-Speed 3084.92 samples/sec Loss 17.3125 LearningRate 0.0925 Epoch: 0 Global Step: 9480 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:02,732-Speed 3083.19 samples/sec Loss 17.0603 LearningRate 0.0925 Epoch: 0 Global Step: 9490 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:06,082-Speed 3057.50 samples/sec Loss 17.0298 LearningRate 0.0925 Epoch: 0 Global Step: 9500 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:09,440-Speed 3050.80 samples/sec Loss 17.1837 LearningRate 0.0925 Epoch: 0 Global Step: 9510 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:12,807-Speed 3042.06 samples/sec Loss 17.0961 LearningRate 0.0925 Epoch: 0 Global Step: 9520 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:16,071-Speed 3138.61 samples/sec Loss 17.1638 LearningRate 0.0925 Epoch: 0 Global Step: 9530 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:19,425-Speed 3053.22 samples/sec Loss 17.1772 LearningRate 0.0925 Epoch: 0 Global Step: 9540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:39:22,818-Speed 3019.05 samples/sec Loss 17.1347 LearningRate 0.0925 Epoch: 0 Global Step: 9550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:26,165-Speed 3060.85 samples/sec Loss 17.2832 LearningRate 0.0925 Epoch: 0 Global Step: 9560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:29,487-Speed 3083.00 samples/sec Loss 17.0081 LearningRate 0.0924 Epoch: 0 Global Step: 9570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:32,807-Speed 3084.73 samples/sec Loss 17.2992 LearningRate 0.0924 Epoch: 0 Global Step: 9580 Fp16 
Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:36,074-Speed 3135.54 samples/sec Loss 17.2285 LearningRate 0.0924 Epoch: 0 Global Step: 9590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:39,471-Speed 3015.69 samples/sec Loss 17.2818 LearningRate 0.0924 Epoch: 0 Global Step: 9600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:42,789-Speed 3086.76 samples/sec Loss 17.3021 LearningRate 0.0924 Epoch: 0 Global Step: 9610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:46,197-Speed 3005.58 samples/sec Loss 17.2346 LearningRate 0.0924 Epoch: 0 Global Step: 9620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:49,495-Speed 3106.24 samples/sec Loss 17.1571 LearningRate 0.0924 Epoch: 0 Global Step: 9630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:52,816-Speed 3084.57 samples/sec Loss 17.2436 LearningRate 0.0924 Epoch: 0 Global Step: 9640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:56,141-Speed 3080.59 samples/sec Loss 17.1263 LearningRate 0.0924 Epoch: 0 Global Step: 9650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:39:59,444-Speed 3101.54 samples/sec Loss 17.1399 LearningRate 0.0924 Epoch: 0 Global Step: 9660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:02,752-Speed 3096.01 samples/sec Loss 17.0466 LearningRate 0.0924 Epoch: 0 Global Step: 9670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:06,082-Speed 3076.07 samples/sec Loss 17.0773 LearningRate 0.0924 Epoch: 0 Global Step: 9680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:09,444-Speed 3047.23 samples/sec Loss 17.1119 LearningRate 0.0924 Epoch: 0 Global Step: 9690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:12,757-Speed 3091.15 samples/sec Loss 16.9363 LearningRate 0.0923 Epoch: 0 Global Step: 9700 Fp16 Grad Scale: 131072 Required: 22 hours 
Training: 2022-04-27 02:40:16,112-Speed 3053.25 samples/sec Loss 17.0753 LearningRate 0.0923 Epoch: 0 Global Step: 9710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:19,516-Speed 3009.48 samples/sec Loss 17.0686 LearningRate 0.0923 Epoch: 0 Global Step: 9720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:22,851-Speed 3070.56 samples/sec Loss 17.1548 LearningRate 0.0923 Epoch: 0 Global Step: 9730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:26,192-Speed 3066.56 samples/sec Loss 17.0724 LearningRate 0.0923 Epoch: 0 Global Step: 9740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:29,491-Speed 3104.60 samples/sec Loss 17.0646 LearningRate 0.0923 Epoch: 0 Global Step: 9750 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:40:32,849-Speed 3050.39 samples/sec Loss 17.3209 LearningRate 0.0923 Epoch: 0 Global Step: 9760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:36,150-Speed 3103.32 samples/sec Loss 17.0678 LearningRate 0.0923 Epoch: 0 Global Step: 9770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:39,445-Speed 3108.86 samples/sec Loss 17.1809 LearningRate 0.0923 Epoch: 0 Global Step: 9780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:42,814-Speed 3039.88 samples/sec Loss 17.1011 LearningRate 0.0923 Epoch: 0 Global Step: 9790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:46,191-Speed 3033.37 samples/sec Loss 17.1482 LearningRate 0.0923 Epoch: 0 Global Step: 9800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:49,468-Speed 3126.28 samples/sec Loss 17.2073 LearningRate 0.0923 Epoch: 0 Global Step: 9810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:52,767-Speed 3105.04 samples/sec Loss 17.0879 LearningRate 0.0923 Epoch: 0 Global Step: 9820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:56,160-Speed 
3018.61 samples/sec Loss 17.2228 LearningRate 0.0922 Epoch: 0 Global Step: 9830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:40:59,510-Speed 3057.79 samples/sec Loss 17.2557 LearningRate 0.0922 Epoch: 0 Global Step: 9840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:02,832-Speed 3082.87 samples/sec Loss 17.0539 LearningRate 0.0922 Epoch: 0 Global Step: 9850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:06,192-Speed 3049.11 samples/sec Loss 17.1212 LearningRate 0.0922 Epoch: 0 Global Step: 9860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:09,524-Speed 3073.76 samples/sec Loss 17.0799 LearningRate 0.0922 Epoch: 0 Global Step: 9870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:12,878-Speed 3053.95 samples/sec Loss 16.9762 LearningRate 0.0922 Epoch: 0 Global Step: 9880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:16,207-Speed 3077.28 samples/sec Loss 17.0514 LearningRate 0.0922 Epoch: 0 Global Step: 9890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:19,520-Speed 3091.76 samples/sec Loss 16.9554 LearningRate 0.0922 Epoch: 0 Global Step: 9900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:22,829-Speed 3095.39 samples/sec Loss 17.0122 LearningRate 0.0922 Epoch: 0 Global Step: 9910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:26,184-Speed 3053.13 samples/sec Loss 16.8065 LearningRate 0.0922 Epoch: 0 Global Step: 9920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:29,554-Speed 3038.94 samples/sec Loss 17.1797 LearningRate 0.0922 Epoch: 0 Global Step: 9930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:32,891-Speed 3069.92 samples/sec Loss 16.9584 LearningRate 0.0922 Epoch: 0 Global Step: 9940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:36,272-Speed 3029.79 samples/sec Loss 17.1549 
LearningRate 0.0921 Epoch: 0 Global Step: 9950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:39,608-Speed 3070.12 samples/sec Loss 17.0574 LearningRate 0.0921 Epoch: 0 Global Step: 9960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:42,911-Speed 3100.88 samples/sec Loss 17.1059 LearningRate 0.0921 Epoch: 0 Global Step: 9970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:46,268-Speed 3052.03 samples/sec Loss 16.9694 LearningRate 0.0921 Epoch: 0 Global Step: 9980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:49,577-Speed 3094.84 samples/sec Loss 16.9030 LearningRate 0.0921 Epoch: 0 Global Step: 9990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:52,915-Speed 3069.60 samples/sec Loss 17.1578 LearningRate 0.0921 Epoch: 0 Global Step: 10000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:56,206-Speed 3112.30 samples/sec Loss 16.8925 LearningRate 0.0921 Epoch: 0 Global Step: 10010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:41:59,523-Speed 3088.72 samples/sec Loss 17.0468 LearningRate 0.0921 Epoch: 0 Global Step: 10020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:02,794-Speed 3131.88 samples/sec Loss 17.0098 LearningRate 0.0921 Epoch: 0 Global Step: 10030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:06,062-Speed 3133.86 samples/sec Loss 16.8644 LearningRate 0.0921 Epoch: 0 Global Step: 10040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:09,360-Speed 3106.44 samples/sec Loss 17.0388 LearningRate 0.0921 Epoch: 0 Global Step: 10050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:12,704-Speed 3063.03 samples/sec Loss 16.9868 LearningRate 0.0921 Epoch: 0 Global Step: 10060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:16,029-Speed 3080.71 samples/sec Loss 16.8921 LearningRate 0.0921 Epoch: 0 Global 
Step: 10070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:19,335-Speed 3098.45 samples/sec Loss 17.0205 LearningRate 0.0920 Epoch: 0 Global Step: 10080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:22,636-Speed 3102.81 samples/sec Loss 16.9533 LearningRate 0.0920 Epoch: 0 Global Step: 10090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:25,952-Speed 3089.30 samples/sec Loss 17.0867 LearningRate 0.0920 Epoch: 0 Global Step: 10100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:29,266-Speed 3091.26 samples/sec Loss 17.1760 LearningRate 0.0920 Epoch: 0 Global Step: 10110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:32,596-Speed 3075.77 samples/sec Loss 16.8928 LearningRate 0.0920 Epoch: 0 Global Step: 10120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:35,967-Speed 3038.85 samples/sec Loss 16.9908 LearningRate 0.0920 Epoch: 0 Global Step: 10130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:39,314-Speed 3060.47 samples/sec Loss 16.8698 LearningRate 0.0920 Epoch: 0 Global Step: 10140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:42,670-Speed 3051.79 samples/sec Loss 16.9795 LearningRate 0.0920 Epoch: 0 Global Step: 10150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:45,992-Speed 3083.74 samples/sec Loss 17.0486 LearningRate 0.0920 Epoch: 0 Global Step: 10160 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:42:49,256-Speed 3137.99 samples/sec Loss 16.9037 LearningRate 0.0920 Epoch: 0 Global Step: 10170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:52,608-Speed 3055.22 samples/sec Loss 16.8647 LearningRate 0.0920 Epoch: 0 Global Step: 10180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:42:55,942-Speed 3072.35 samples/sec Loss 16.9982 LearningRate 0.0920 Epoch: 0 Global Step: 10190 Fp16 Grad Scale: 
131072 Required: 22 hours Training: 2022-04-27 02:42:59,252-Speed 3094.85 samples/sec Loss 17.0433 LearningRate 0.0920 Epoch: 0 Global Step: 10200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:02,539-Speed 3116.09 samples/sec Loss 17.0382 LearningRate 0.0919 Epoch: 0 Global Step: 10210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:05,856-Speed 3088.25 samples/sec Loss 17.0364 LearningRate 0.0919 Epoch: 0 Global Step: 10220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:09,230-Speed 3035.67 samples/sec Loss 16.8776 LearningRate 0.0919 Epoch: 0 Global Step: 10230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:12,512-Speed 3120.82 samples/sec Loss 16.9949 LearningRate 0.0919 Epoch: 0 Global Step: 10240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:15,810-Speed 3106.44 samples/sec Loss 16.7922 LearningRate 0.0919 Epoch: 0 Global Step: 10250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:19,078-Speed 3134.21 samples/sec Loss 16.8799 LearningRate 0.0919 Epoch: 0 Global Step: 10260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:22,332-Speed 3147.56 samples/sec Loss 16.7737 LearningRate 0.0919 Epoch: 0 Global Step: 10270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:25,589-Speed 3146.07 samples/sec Loss 16.7983 LearningRate 0.0919 Epoch: 0 Global Step: 10280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:28,908-Speed 3085.85 samples/sec Loss 16.7373 LearningRate 0.0919 Epoch: 0 Global Step: 10290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:43:32,203-Speed 3108.93 samples/sec Loss 16.9649 LearningRate 0.0919 Epoch: 0 Global Step: 10300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:35,498-Speed 3108.79 samples/sec Loss 16.7647 LearningRate 0.0919 Epoch: 0 Global Step: 10310 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-04-27 02:43:38,768-Speed 3131.92 samples/sec Loss 16.8457 LearningRate 0.0919 Epoch: 0 Global Step: 10320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:42,052-Speed 3119.51 samples/sec Loss 16.8785 LearningRate 0.0919 Epoch: 0 Global Step: 10330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:45,387-Speed 3073.45 samples/sec Loss 16.9440 LearningRate 0.0918 Epoch: 0 Global Step: 10340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:48,739-Speed 3055.59 samples/sec Loss 16.8669 LearningRate 0.0918 Epoch: 0 Global Step: 10350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:52,072-Speed 3073.22 samples/sec Loss 16.9352 LearningRate 0.0918 Epoch: 0 Global Step: 10360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:55,377-Speed 3099.18 samples/sec Loss 16.7513 LearningRate 0.0918 Epoch: 0 Global Step: 10370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:43:58,670-Speed 3111.25 samples/sec Loss 16.8463 LearningRate 0.0918 Epoch: 0 Global Step: 10380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:44:01,990-Speed 3085.08 samples/sec Loss 16.8381 LearningRate 0.0918 Epoch: 0 Global Step: 10390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:44:05,274-Speed 3119.40 samples/sec Loss 16.7063 LearningRate 0.0918 Epoch: 0 Global Step: 10400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:08,647-Speed 3037.12 samples/sec Loss 16.7585 LearningRate 0.0918 Epoch: 0 Global Step: 10410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:12,021-Speed 3035.76 samples/sec Loss 16.8905 LearningRate 0.0918 Epoch: 0 Global Step: 10420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:15,400-Speed 3031.56 samples/sec Loss 16.8120 LearningRate 0.0918 Epoch: 0 Global Step: 10430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:18,731-Speed 
3075.21 samples/sec Loss 16.8998 LearningRate 0.0918 Epoch: 0 Global Step: 10440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:22,005-Speed 3128.63 samples/sec Loss 16.7908 LearningRate 0.0918 Epoch: 0 Global Step: 10450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:25,370-Speed 3044.28 samples/sec Loss 16.7161 LearningRate 0.0918 Epoch: 0 Global Step: 10460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:28,704-Speed 3072.36 samples/sec Loss 16.9426 LearningRate 0.0917 Epoch: 0 Global Step: 10470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:32,118-Speed 3000.22 samples/sec Loss 16.9452 LearningRate 0.0917 Epoch: 0 Global Step: 10480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:35,405-Speed 3116.51 samples/sec Loss 16.9470 LearningRate 0.0917 Epoch: 0 Global Step: 10490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:38,653-Speed 3152.92 samples/sec Loss 16.8677 LearningRate 0.0917 Epoch: 0 Global Step: 10500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:41,983-Speed 3076.69 samples/sec Loss 16.8201 LearningRate 0.0917 Epoch: 0 Global Step: 10510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:45,334-Speed 3056.37 samples/sec Loss 16.6929 LearningRate 0.0917 Epoch: 0 Global Step: 10520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:48,679-Speed 3061.59 samples/sec Loss 16.6830 LearningRate 0.0917 Epoch: 0 Global Step: 10530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:52,008-Speed 3077.13 samples/sec Loss 16.8600 LearningRate 0.0917 Epoch: 0 Global Step: 10540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:55,386-Speed 3032.39 samples/sec Loss 16.7128 LearningRate 0.0917 Epoch: 0 Global Step: 10550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:44:58,760-Speed 3036.28 samples/sec Loss 
16.8211 LearningRate 0.0917 Epoch: 0 Global Step: 10560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:02,090-Speed 3075.76 samples/sec Loss 16.8903 LearningRate 0.0917 Epoch: 0 Global Step: 10570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:05,397-Speed 3097.70 samples/sec Loss 16.7409 LearningRate 0.0917 Epoch: 0 Global Step: 10580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:08,684-Speed 3115.88 samples/sec Loss 16.6223 LearningRate 0.0917 Epoch: 0 Global Step: 10590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:12,007-Speed 3082.64 samples/sec Loss 16.7094 LearningRate 0.0916 Epoch: 0 Global Step: 10600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:15,404-Speed 3015.62 samples/sec Loss 16.7530 LearningRate 0.0916 Epoch: 0 Global Step: 10610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:18,781-Speed 3033.33 samples/sec Loss 16.7522 LearningRate 0.0916 Epoch: 0 Global Step: 10620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:22,122-Speed 3065.98 samples/sec Loss 16.7940 LearningRate 0.0916 Epoch: 0 Global Step: 10630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:25,481-Speed 3049.57 samples/sec Loss 16.6685 LearningRate 0.0916 Epoch: 0 Global Step: 10640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:28,866-Speed 3025.53 samples/sec Loss 16.9121 LearningRate 0.0916 Epoch: 0 Global Step: 10650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:32,240-Speed 3035.93 samples/sec Loss 16.7570 LearningRate 0.0916 Epoch: 0 Global Step: 10660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:35,595-Speed 3052.60 samples/sec Loss 16.6403 LearningRate 0.0916 Epoch: 0 Global Step: 10670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:45:38,920-Speed 3081.50 samples/sec Loss 16.6360 LearningRate 0.0916 Epoch: 0 
Global Step: 10680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:42,243-Speed 3081.78 samples/sec Loss 16.7850 LearningRate 0.0916 Epoch: 0 Global Step: 10690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:45,545-Speed 3102.81 samples/sec Loss 16.7065 LearningRate 0.0916 Epoch: 0 Global Step: 10700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:48,917-Speed 3037.27 samples/sec Loss 16.9738 LearningRate 0.0916 Epoch: 0 Global Step: 10710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:52,245-Speed 3077.69 samples/sec Loss 16.6804 LearningRate 0.0916 Epoch: 0 Global Step: 10720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:55,595-Speed 3057.85 samples/sec Loss 16.8791 LearningRate 0.0915 Epoch: 0 Global Step: 10730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:45:58,968-Speed 3036.93 samples/sec Loss 16.5989 LearningRate 0.0915 Epoch: 0 Global Step: 10740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:02,331-Speed 3045.87 samples/sec Loss 16.8175 LearningRate 0.0915 Epoch: 0 Global Step: 10750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:05,638-Speed 3097.47 samples/sec Loss 16.5677 LearningRate 0.0915 Epoch: 0 Global Step: 10760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:09,022-Speed 3026.49 samples/sec Loss 16.4066 LearningRate 0.0915 Epoch: 0 Global Step: 10770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:12,330-Speed 3095.95 samples/sec Loss 16.8191 LearningRate 0.0915 Epoch: 0 Global Step: 10780 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:46:15,667-Speed 3070.12 samples/sec Loss 16.8569 LearningRate 0.0915 Epoch: 0 Global Step: 10790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:19,076-Speed 3004.73 samples/sec Loss 16.6558 LearningRate 0.0915 Epoch: 0 Global Step: 10800 Fp16 Grad 
Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:22,398-Speed 3082.94 samples/sec Loss 16.4935 LearningRate 0.0915 Epoch: 0 Global Step: 10810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:46:25,655-Speed 3145.38 samples/sec Loss 16.7167 LearningRate 0.0915 Epoch: 0 Global Step: 10820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:28,970-Speed 3089.44 samples/sec Loss 16.7032 LearningRate 0.0915 Epoch: 0 Global Step: 10830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:32,261-Speed 3112.62 samples/sec Loss 16.7091 LearningRate 0.0915 Epoch: 0 Global Step: 10840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:35,552-Speed 3112.84 samples/sec Loss 16.4892 LearningRate 0.0915 Epoch: 0 Global Step: 10850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:38,852-Speed 3103.95 samples/sec Loss 16.7831 LearningRate 0.0914 Epoch: 0 Global Step: 10860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:42,202-Speed 3057.40 samples/sec Loss 16.8078 LearningRate 0.0914 Epoch: 0 Global Step: 10870 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:45,559-Speed 3051.63 samples/sec Loss 16.6877 LearningRate 0.0914 Epoch: 0 Global Step: 10880 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:48,918-Speed 3049.27 samples/sec Loss 16.6148 LearningRate 0.0914 Epoch: 0 Global Step: 10890 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:52,188-Speed 3132.40 samples/sec Loss 16.6374 LearningRate 0.0914 Epoch: 0 Global Step: 10900 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:55,440-Speed 3149.55 samples/sec Loss 16.6191 LearningRate 0.0914 Epoch: 0 Global Step: 10910 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:46:58,729-Speed 3114.53 samples/sec Loss 16.6294 LearningRate 0.0914 Epoch: 0 Global Step: 10920 Fp16 Grad Scale: 131072 Required: 22 hours 
Training: 2022-04-27 02:47:02,032-Speed 3101.40 samples/sec Loss 16.7405 LearningRate 0.0914 Epoch: 0 Global Step: 10930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:05,345-Speed 3092.02 samples/sec Loss 16.7966 LearningRate 0.0914 Epoch: 0 Global Step: 10940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:08,611-Speed 3137.41 samples/sec Loss 16.5467 LearningRate 0.0914 Epoch: 0 Global Step: 10950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:11,912-Speed 3102.93 samples/sec Loss 16.6581 LearningRate 0.0914 Epoch: 0 Global Step: 10960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:15,284-Speed 3037.61 samples/sec Loss 16.6520 LearningRate 0.0914 Epoch: 0 Global Step: 10970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:18,662-Speed 3033.09 samples/sec Loss 16.6672 LearningRate 0.0914 Epoch: 0 Global Step: 10980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:21,990-Speed 3077.49 samples/sec Loss 16.5655 LearningRate 0.0913 Epoch: 0 Global Step: 10990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:25,330-Speed 3066.87 samples/sec Loss 16.6240 LearningRate 0.0913 Epoch: 0 Global Step: 11000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:28,663-Speed 3073.55 samples/sec Loss 16.6023 LearningRate 0.0913 Epoch: 0 Global Step: 11010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:31,992-Speed 3076.78 samples/sec Loss 16.5194 LearningRate 0.0913 Epoch: 0 Global Step: 11020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:35,296-Speed 3100.81 samples/sec Loss 16.7039 LearningRate 0.0913 Epoch: 0 Global Step: 11030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:38,634-Speed 3068.04 samples/sec Loss 16.6981 LearningRate 0.0913 Epoch: 0 Global Step: 11040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
02:47:41,985-Speed 3057.11 samples/sec Loss 16.5740 LearningRate 0.0913 Epoch: 0 Global Step: 11050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:45,283-Speed 3106.57 samples/sec Loss 16.6028 LearningRate 0.0913 Epoch: 0 Global Step: 11060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:48,627-Speed 3062.81 samples/sec Loss 16.6027 LearningRate 0.0913 Epoch: 0 Global Step: 11070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:52,026-Speed 3014.38 samples/sec Loss 16.5663 LearningRate 0.0913 Epoch: 0 Global Step: 11080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:55,405-Speed 3030.52 samples/sec Loss 16.7705 LearningRate 0.0913 Epoch: 0 Global Step: 11090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:47:58,730-Speed 3080.81 samples/sec Loss 16.5998 LearningRate 0.0913 Epoch: 0 Global Step: 11100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:02,040-Speed 3094.91 samples/sec Loss 16.3868 LearningRate 0.0913 Epoch: 0 Global Step: 11110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:05,343-Speed 3100.82 samples/sec Loss 16.6244 LearningRate 0.0912 Epoch: 0 Global Step: 11120 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:48:08,632-Speed 3114.41 samples/sec Loss 16.3685 LearningRate 0.0912 Epoch: 0 Global Step: 11130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:11,939-Speed 3096.81 samples/sec Loss 16.5548 LearningRate 0.0912 Epoch: 0 Global Step: 11140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:15,269-Speed 3076.93 samples/sec Loss 16.4868 LearningRate 0.0912 Epoch: 0 Global Step: 11150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:18,564-Speed 3108.39 samples/sec Loss 16.5884 LearningRate 0.0912 Epoch: 0 Global Step: 11160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:21,869-Speed 3100.56 
samples/sec Loss 16.7002 LearningRate 0.0912 Epoch: 0 Global Step: 11170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:25,275-Speed 3007.11 samples/sec Loss 16.8109 LearningRate 0.0912 Epoch: 0 Global Step: 11180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:28,556-Speed 3122.00 samples/sec Loss 16.5306 LearningRate 0.0912 Epoch: 0 Global Step: 11190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:31,941-Speed 3025.85 samples/sec Loss 16.7107 LearningRate 0.0912 Epoch: 0 Global Step: 11200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:35,255-Speed 3090.24 samples/sec Loss 16.5106 LearningRate 0.0912 Epoch: 0 Global Step: 11210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:38,508-Speed 3148.87 samples/sec Loss 16.4677 LearningRate 0.0912 Epoch: 0 Global Step: 11220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:41,791-Speed 3119.94 samples/sec Loss 16.3701 LearningRate 0.0912 Epoch: 0 Global Step: 11230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:45,127-Speed 3070.58 samples/sec Loss 16.4473 LearningRate 0.0912 Epoch: 0 Global Step: 11240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:48,404-Speed 3125.66 samples/sec Loss 16.5262 LearningRate 0.0911 Epoch: 0 Global Step: 11250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:51,678-Speed 3129.13 samples/sec Loss 16.7497 LearningRate 0.0911 Epoch: 0 Global Step: 11260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:54,938-Speed 3141.67 samples/sec Loss 16.6490 LearningRate 0.0911 Epoch: 0 Global Step: 11270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:48:58,251-Speed 3091.77 samples/sec Loss 16.6507 LearningRate 0.0911 Epoch: 0 Global Step: 11280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:01,565-Speed 3090.53 samples/sec Loss 16.4487 
LearningRate 0.0911 Epoch: 0 Global Step: 11290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:04,818-Speed 3149.00 samples/sec Loss 16.5035 LearningRate 0.0911 Epoch: 0 Global Step: 11300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:08,180-Speed 3046.53 samples/sec Loss 16.6468 LearningRate 0.0911 Epoch: 0 Global Step: 11310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:11,479-Speed 3105.09 samples/sec Loss 16.5853 LearningRate 0.0911 Epoch: 0 Global Step: 11320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:14,744-Speed 3137.57 samples/sec Loss 16.6617 LearningRate 0.0911 Epoch: 0 Global Step: 11330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:18,047-Speed 3100.85 samples/sec Loss 16.6227 LearningRate 0.0911 Epoch: 0 Global Step: 11340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:21,360-Speed 3091.63 samples/sec Loss 16.5954 LearningRate 0.0911 Epoch: 0 Global Step: 11350 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:24,702-Speed 3065.34 samples/sec Loss 16.5837 LearningRate 0.0911 Epoch: 0 Global Step: 11360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:28,004-Speed 3101.57 samples/sec Loss 16.4805 LearningRate 0.0911 Epoch: 0 Global Step: 11370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:31,372-Speed 3041.15 samples/sec Loss 16.4195 LearningRate 0.0910 Epoch: 0 Global Step: 11380 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:34,669-Speed 3107.33 samples/sec Loss 16.4717 LearningRate 0.0910 Epoch: 0 Global Step: 11390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:38,033-Speed 3045.14 samples/sec Loss 16.3940 LearningRate 0.0910 Epoch: 0 Global Step: 11400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:41,346-Speed 3092.13 samples/sec Loss 16.4208 LearningRate 0.0910 Epoch: 0 
Global Step: 11410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:44,688-Speed 3064.94 samples/sec Loss 16.5680 LearningRate 0.0910 Epoch: 0 Global Step: 11420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:48,024-Speed 3070.29 samples/sec Loss 16.5615 LearningRate 0.0910 Epoch: 0 Global Step: 11430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:51,277-Speed 3149.17 samples/sec Loss 16.3639 LearningRate 0.0910 Epoch: 0 Global Step: 11440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:54,557-Speed 3122.59 samples/sec Loss 16.4942 LearningRate 0.0910 Epoch: 0 Global Step: 11450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:49:57,854-Speed 3107.04 samples/sec Loss 16.3812 LearningRate 0.0910 Epoch: 0 Global Step: 11460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:01,127-Speed 3129.22 samples/sec Loss 16.3440 LearningRate 0.0910 Epoch: 0 Global Step: 11470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:04,424-Speed 3107.69 samples/sec Loss 16.2078 LearningRate 0.0910 Epoch: 0 Global Step: 11480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:07,761-Speed 3069.54 samples/sec Loss 16.5060 LearningRate 0.0910 Epoch: 0 Global Step: 11490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:11,061-Speed 3103.25 samples/sec Loss 16.3777 LearningRate 0.0910 Epoch: 0 Global Step: 11500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:14,403-Speed 3065.01 samples/sec Loss 16.4632 LearningRate 0.0909 Epoch: 0 Global Step: 11510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:17,673-Speed 3132.71 samples/sec Loss 16.3848 LearningRate 0.0909 Epoch: 0 Global Step: 11520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:21,028-Speed 3052.74 samples/sec Loss 16.5104 LearningRate 0.0909 Epoch: 0 Global Step: 11530 Fp16 Grad 
Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:24,393-Speed 3044.12 samples/sec Loss 16.3942 LearningRate 0.0909 Epoch: 0 Global Step: 11540 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:27,667-Speed 3128.26 samples/sec Loss 16.3609 LearningRate 0.0909 Epoch: 0 Global Step: 11550 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:31,000-Speed 3072.99 samples/sec Loss 16.3314 LearningRate 0.0909 Epoch: 0 Global Step: 11560 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:34,338-Speed 3069.01 samples/sec Loss 16.4701 LearningRate 0.0909 Epoch: 0 Global Step: 11570 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:37,588-Speed 3151.06 samples/sec Loss 16.3791 LearningRate 0.0909 Epoch: 0 Global Step: 11580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:40,886-Speed 3105.74 samples/sec Loss 16.4381 LearningRate 0.0909 Epoch: 0 Global Step: 11590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:44,206-Speed 3085.80 samples/sec Loss 16.4607 LearningRate 0.0909 Epoch: 0 Global Step: 11600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:47,478-Speed 3129.41 samples/sec Loss 16.3916 LearningRate 0.0909 Epoch: 0 Global Step: 11610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:50,794-Speed 3089.63 samples/sec Loss 16.3516 LearningRate 0.0909 Epoch: 0 Global Step: 11620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:50:54,083-Speed 3114.21 samples/sec Loss 16.3277 LearningRate 0.0909 Epoch: 0 Global Step: 11630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:50:57,366-Speed 3120.05 samples/sec Loss 16.5946 LearningRate 0.0908 Epoch: 0 Global Step: 11640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:00,765-Speed 3013.21 samples/sec Loss 16.3211 LearningRate 0.0908 Epoch: 0 Global Step: 11650 Fp16 Grad Scale: 131072 Required: 22 hours 
Training: 2022-04-27 02:51:04,066-Speed 3103.62 samples/sec Loss 16.4943 LearningRate 0.0908 Epoch: 0 Global Step: 11660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:07,323-Speed 3144.55 samples/sec Loss 16.6607 LearningRate 0.0908 Epoch: 0 Global Step: 11670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:10,679-Speed 3052.61 samples/sec Loss 16.4356 LearningRate 0.0908 Epoch: 0 Global Step: 11680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:14,004-Speed 3080.92 samples/sec Loss 16.2835 LearningRate 0.0908 Epoch: 0 Global Step: 11690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:17,283-Speed 3124.11 samples/sec Loss 16.5251 LearningRate 0.0908 Epoch: 0 Global Step: 11700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:20,557-Speed 3129.41 samples/sec Loss 16.2367 LearningRate 0.0908 Epoch: 0 Global Step: 11710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:23,813-Speed 3144.92 samples/sec Loss 16.4716 LearningRate 0.0908 Epoch: 0 Global Step: 11720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:27,160-Speed 3060.68 samples/sec Loss 16.2839 LearningRate 0.0908 Epoch: 0 Global Step: 11730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:30,471-Speed 3094.19 samples/sec Loss 16.3159 LearningRate 0.0908 Epoch: 0 Global Step: 11740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:33,797-Speed 3079.53 samples/sec Loss 16.4093 LearningRate 0.0908 Epoch: 0 Global Step: 11750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:37,100-Speed 3101.11 samples/sec Loss 16.4318 LearningRate 0.0908 Epoch: 0 Global Step: 11760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:51:40,466-Speed 3042.66 samples/sec Loss 16.4872 LearningRate 0.0907 Epoch: 0 Global Step: 11770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
02:51:43,744-Speed 3125.32 samples/sec Loss 16.3467 LearningRate 0.0907 Epoch: 0 Global Step: 11780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:51:47,115-Speed 3038.66 samples/sec Loss 16.2837 LearningRate 0.0907 Epoch: 0 Global Step: 11790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:51:50,384-Speed 3132.88 samples/sec Loss 16.2637 LearningRate 0.0907 Epoch: 0 Global Step: 11800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:51:53,685-Speed 3103.40 samples/sec Loss 16.5279 LearningRate 0.0907 Epoch: 0 Global Step: 11810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:51:57,002-Speed 3088.33 samples/sec Loss 16.2531 LearningRate 0.0907 Epoch: 0 Global Step: 11820 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:52:00,312-Speed 3093.94 samples/sec Loss 16.3700 LearningRate 0.0907 Epoch: 0 Global Step: 11830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:52:03,627-Speed 3089.89 samples/sec Loss 16.4625 LearningRate 0.0907 Epoch: 0 Global Step: 11840 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:52:06,907-Speed 3123.58 samples/sec Loss 16.4876 LearningRate 0.0907 Epoch: 0 Global Step: 11850 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:52:10,226-Speed 3085.54 samples/sec Loss 16.2732 LearningRate 0.0907 Epoch: 0 Global Step: 11860 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:52:13,550-Speed 3081.06 samples/sec Loss 16.3740 LearningRate 0.0907 Epoch: 0 Global Step: 11870 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:52:16,868-Speed 3088.01 samples/sec Loss 16.2265 LearningRate 0.0907 Epoch: 0 Global Step: 11880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:20,247-Speed 3031.26 samples/sec Loss 16.3603 LearningRate 0.0907 Epoch: 0 Global Step: 11890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:23,563-Speed 3089.37 samples/sec 
Loss 16.3864 LearningRate 0.0906 Epoch: 0 Global Step: 11900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:26,885-Speed 3083.28 samples/sec Loss 16.2316 LearningRate 0.0906 Epoch: 0 Global Step: 11910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:30,261-Speed 3033.89 samples/sec Loss 16.3167 LearningRate 0.0906 Epoch: 0 Global Step: 11920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:33,529-Speed 3135.06 samples/sec Loss 16.4669 LearningRate 0.0906 Epoch: 0 Global Step: 11930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:36,829-Speed 3103.57 samples/sec Loss 16.3446 LearningRate 0.0906 Epoch: 0 Global Step: 11940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:40,227-Speed 3014.19 samples/sec Loss 16.4713 LearningRate 0.0906 Epoch: 0 Global Step: 11950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:43,570-Speed 3064.00 samples/sec Loss 16.1059 LearningRate 0.0906 Epoch: 0 Global Step: 11960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:46,877-Speed 3098.47 samples/sec Loss 16.4929 LearningRate 0.0906 Epoch: 0 Global Step: 11970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:50,201-Speed 3080.99 samples/sec Loss 16.2102 LearningRate 0.0906 Epoch: 0 Global Step: 11980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:53,566-Speed 3044.45 samples/sec Loss 16.3605 LearningRate 0.0906 Epoch: 0 Global Step: 11990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:52:56,845-Speed 3123.99 samples/sec Loss 16.2622 LearningRate 0.0906 Epoch: 0 Global Step: 12000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:00,194-Speed 3057.74 samples/sec Loss 16.3318 LearningRate 0.0906 Epoch: 0 Global Step: 12010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:03,515-Speed 3084.38 samples/sec Loss 16.3079 LearningRate 
0.0906 Epoch: 0 Global Step: 12020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:06,879-Speed 3044.92 samples/sec Loss 16.4042 LearningRate 0.0905 Epoch: 0 Global Step: 12030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:10,203-Speed 3081.91 samples/sec Loss 16.3529 LearningRate 0.0905 Epoch: 0 Global Step: 12040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:13,539-Speed 3070.89 samples/sec Loss 16.3095 LearningRate 0.0905 Epoch: 0 Global Step: 12050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:16,903-Speed 3044.58 samples/sec Loss 16.4192 LearningRate 0.0905 Epoch: 0 Global Step: 12060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:20,219-Speed 3088.87 samples/sec Loss 16.3990 LearningRate 0.0905 Epoch: 0 Global Step: 12070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:23,589-Speed 3039.68 samples/sec Loss 16.2786 LearningRate 0.0905 Epoch: 0 Global Step: 12080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:26,856-Speed 3135.79 samples/sec Loss 16.2042 LearningRate 0.0905 Epoch: 0 Global Step: 12090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:30,129-Speed 3129.29 samples/sec Loss 16.1156 LearningRate 0.0905 Epoch: 0 Global Step: 12100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:33,482-Speed 3054.92 samples/sec Loss 16.4011 LearningRate 0.0905 Epoch: 0 Global Step: 12110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:36,791-Speed 3095.93 samples/sec Loss 16.3716 LearningRate 0.0905 Epoch: 0 Global Step: 12120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:40,130-Speed 3067.78 samples/sec Loss 16.2198 LearningRate 0.0905 Epoch: 0 Global Step: 12130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:43,451-Speed 3084.30 samples/sec Loss 16.3042 LearningRate 0.0905 Epoch: 0 Global Step: 
12140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:46,817-Speed 3042.82 samples/sec Loss 16.4575 LearningRate 0.0905 Epoch: 0 Global Step: 12150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:50,159-Speed 3064.70 samples/sec Loss 16.3431 LearningRate 0.0904 Epoch: 0 Global Step: 12160 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:53,523-Speed 3045.44 samples/sec Loss 16.3703 LearningRate 0.0904 Epoch: 0 Global Step: 12170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:53:56,818-Speed 3108.99 samples/sec Loss 16.4280 LearningRate 0.0904 Epoch: 0 Global Step: 12180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:00,127-Speed 3095.04 samples/sec Loss 16.2319 LearningRate 0.0904 Epoch: 0 Global Step: 12190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:03,411-Speed 3119.86 samples/sec Loss 16.2686 LearningRate 0.0904 Epoch: 0 Global Step: 12200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:06,726-Speed 3089.18 samples/sec Loss 16.1769 LearningRate 0.0904 Epoch: 0 Global Step: 12210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:10,065-Speed 3068.42 samples/sec Loss 16.2671 LearningRate 0.0904 Epoch: 0 Global Step: 12220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:13,380-Speed 3089.54 samples/sec Loss 16.2229 LearningRate 0.0904 Epoch: 0 Global Step: 12230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:16,711-Speed 3075.42 samples/sec Loss 16.3372 LearningRate 0.0904 Epoch: 0 Global Step: 12240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:20,082-Speed 3038.23 samples/sec Loss 16.1857 LearningRate 0.0904 Epoch: 0 Global Step: 12250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:23,426-Speed 3063.40 samples/sec Loss 16.1687 LearningRate 0.0904 Epoch: 0 Global Step: 12260 Fp16 Grad Scale: 65536 Required: 
22 hours Training: 2022-04-27 02:54:26,765-Speed 3067.78 samples/sec Loss 16.2308 LearningRate 0.0904 Epoch: 0 Global Step: 12270 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:30,093-Speed 3077.63 samples/sec Loss 16.2654 LearningRate 0.0904 Epoch: 0 Global Step: 12280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:54:33,423-Speed 3075.96 samples/sec Loss 16.3179 LearningRate 0.0904 Epoch: 0 Global Step: 12290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:54:36,724-Speed 3103.72 samples/sec Loss 16.2387 LearningRate 0.0903 Epoch: 0 Global Step: 12300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:54:40,102-Speed 3032.10 samples/sec Loss 16.3022 LearningRate 0.0903 Epoch: 0 Global Step: 12310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:54:43,457-Speed 3053.22 samples/sec Loss 16.1577 LearningRate 0.0903 Epoch: 0 Global Step: 12320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:46,814-Speed 3050.33 samples/sec Loss 16.3790 LearningRate 0.0903 Epoch: 0 Global Step: 12330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:50,194-Speed 3031.16 samples/sec Loss 16.2926 LearningRate 0.0903 Epoch: 0 Global Step: 12340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:53,574-Speed 3029.85 samples/sec Loss 16.2590 LearningRate 0.0903 Epoch: 0 Global Step: 12350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:54:56,924-Speed 3058.04 samples/sec Loss 16.2450 LearningRate 0.0903 Epoch: 0 Global Step: 12360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:55:00,238-Speed 3090.83 samples/sec Loss 16.1958 LearningRate 0.0903 Epoch: 0 Global Step: 12370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:55:03,569-Speed 3074.94 samples/sec Loss 16.2714 LearningRate 0.0903 Epoch: 0 Global Step: 12380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 
02:55:06,896-Speed 3078.20 samples/sec Loss 16.1941 LearningRate 0.0903 Epoch: 0 Global Step: 12390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:55:10,222-Speed 3079.73 samples/sec Loss 16.2552 LearningRate 0.0903 Epoch: 0 Global Step: 12400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:55:13,726-Speed 2924.05 samples/sec Loss 16.3147 LearningRate 0.0903 Epoch: 0 Global Step: 12410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:55:16,991-Speed 3136.71 samples/sec Loss 16.1806 LearningRate 0.0903 Epoch: 0 Global Step: 12420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:55:49,255-Speed 317.40 samples/sec Loss 14.8885 LearningRate 0.0902 Epoch: 1 Global Step: 12430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:55:52,674-Speed 2996.43 samples/sec Loss 14.7384 LearningRate 0.0902 Epoch: 1 Global Step: 12440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:55:55,976-Speed 3102.16 samples/sec Loss 14.5572 LearningRate 0.0902 Epoch: 1 Global Step: 12450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:55:59,273-Speed 3106.73 samples/sec Loss 14.6650 LearningRate 0.0902 Epoch: 1 Global Step: 12460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:02,616-Speed 3064.37 samples/sec Loss 14.6077 LearningRate 0.0902 Epoch: 1 Global Step: 12470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:05,944-Speed 3077.48 samples/sec Loss 14.5340 LearningRate 0.0902 Epoch: 1 Global Step: 12480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:09,284-Speed 3066.74 samples/sec Loss 14.5251 LearningRate 0.0902 Epoch: 1 Global Step: 12490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:12,639-Speed 3053.45 samples/sec Loss 14.6209 LearningRate 0.0902 Epoch: 1 Global Step: 12500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:15,934-Speed 3108.01 
samples/sec Loss 14.5287 LearningRate 0.0902 Epoch: 1 Global Step: 12510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:19,229-Speed 3109.08 samples/sec Loss 14.6184 LearningRate 0.0902 Epoch: 1 Global Step: 12520 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 02:56:22,593-Speed 3045.45 samples/sec Loss 14.5791 LearningRate 0.0902 Epoch: 1 Global Step: 12530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:25,903-Speed 3093.88 samples/sec Loss 14.6170 LearningRate 0.0902 Epoch: 1 Global Step: 12540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:29,241-Speed 3068.69 samples/sec Loss 14.7209 LearningRate 0.0902 Epoch: 1 Global Step: 12550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:32,586-Speed 3062.77 samples/sec Loss 14.8675 LearningRate 0.0901 Epoch: 1 Global Step: 12560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:35,909-Speed 3081.91 samples/sec Loss 14.6752 LearningRate 0.0901 Epoch: 1 Global Step: 12570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:39,288-Speed 3031.47 samples/sec Loss 14.7787 LearningRate 0.0901 Epoch: 1 Global Step: 12580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:42,642-Speed 3054.01 samples/sec Loss 14.7225 LearningRate 0.0901 Epoch: 1 Global Step: 12590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:45,921-Speed 3124.54 samples/sec Loss 14.6160 LearningRate 0.0901 Epoch: 1 Global Step: 12600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:56:49,205-Speed 3118.58 samples/sec Loss 14.6300 LearningRate 0.0901 Epoch: 1 Global Step: 12610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:56:52,561-Speed 3052.47 samples/sec Loss 14.7292 LearningRate 0.0901 Epoch: 1 Global Step: 12620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:56:55,841-Speed 3122.32 samples/sec Loss 14.7992 
LearningRate 0.0901 Epoch: 1 Global Step: 12630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:56:59,160-Speed 3086.39 samples/sec Loss 14.8322 LearningRate 0.0901 Epoch: 1 Global Step: 12640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:02,456-Speed 3108.29 samples/sec Loss 14.7572 LearningRate 0.0901 Epoch: 1 Global Step: 12650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:05,770-Speed 3090.25 samples/sec Loss 14.4447 LearningRate 0.0901 Epoch: 1 Global Step: 12660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:09,082-Speed 3093.53 samples/sec Loss 14.7249 LearningRate 0.0901 Epoch: 1 Global Step: 12670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:12,814-Speed 2744.38 samples/sec Loss 14.9808 LearningRate 0.0901 Epoch: 1 Global Step: 12680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:16,198-Speed 3027.48 samples/sec Loss 14.8378 LearningRate 0.0900 Epoch: 1 Global Step: 12690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:19,541-Speed 3063.87 samples/sec Loss 14.8591 LearningRate 0.0900 Epoch: 1 Global Step: 12700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:57:22,839-Speed 3106.32 samples/sec Loss 14.7530 LearningRate 0.0900 Epoch: 1 Global Step: 12710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:26,157-Speed 3086.17 samples/sec Loss 14.7569 LearningRate 0.0900 Epoch: 1 Global Step: 12720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:29,522-Speed 3044.46 samples/sec Loss 14.6261 LearningRate 0.0900 Epoch: 1 Global Step: 12730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:32,813-Speed 3112.43 samples/sec Loss 14.7766 LearningRate 0.0900 Epoch: 1 Global Step: 12740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:36,194-Speed 3029.11 samples/sec Loss 14.8770 LearningRate 0.0900 Epoch: 1 Global 
Step: 12750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:39,571-Speed 3034.48 samples/sec Loss 14.8192 LearningRate 0.0900 Epoch: 1 Global Step: 12760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:42,966-Speed 3016.54 samples/sec Loss 14.8349 LearningRate 0.0900 Epoch: 1 Global Step: 12770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:46,291-Speed 3080.25 samples/sec Loss 14.6880 LearningRate 0.0900 Epoch: 1 Global Step: 12780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:49,626-Speed 3071.49 samples/sec Loss 14.8086 LearningRate 0.0900 Epoch: 1 Global Step: 12790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:52,985-Speed 3050.39 samples/sec Loss 14.8144 LearningRate 0.0900 Epoch: 1 Global Step: 12800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:56,300-Speed 3089.43 samples/sec Loss 14.9611 LearningRate 0.0900 Epoch: 1 Global Step: 12810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:57:59,625-Speed 3081.22 samples/sec Loss 14.9227 LearningRate 0.0899 Epoch: 1 Global Step: 12820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:02,936-Speed 3093.63 samples/sec Loss 14.9270 LearningRate 0.0899 Epoch: 1 Global Step: 12830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:06,245-Speed 3095.11 samples/sec Loss 14.8815 LearningRate 0.0899 Epoch: 1 Global Step: 12840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:09,560-Speed 3090.56 samples/sec Loss 14.8677 LearningRate 0.0899 Epoch: 1 Global Step: 12850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:12,914-Speed 3053.77 samples/sec Loss 15.0331 LearningRate 0.0899 Epoch: 1 Global Step: 12860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:16,234-Speed 3085.55 samples/sec Loss 14.9208 LearningRate 0.0899 Epoch: 1 Global Step: 12870 Fp16 Grad Scale: 
131072 Required: 22 hours Training: 2022-04-27 02:58:19,563-Speed 3076.94 samples/sec Loss 14.9253 LearningRate 0.0899 Epoch: 1 Global Step: 12880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:22,886-Speed 3082.04 samples/sec Loss 14.9328 LearningRate 0.0899 Epoch: 1 Global Step: 12890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:26,222-Speed 3071.36 samples/sec Loss 14.8005 LearningRate 0.0899 Epoch: 1 Global Step: 12900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:29,568-Speed 3061.16 samples/sec Loss 15.0490 LearningRate 0.0899 Epoch: 1 Global Step: 12910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:32,932-Speed 3045.08 samples/sec Loss 14.9112 LearningRate 0.0899 Epoch: 1 Global Step: 12920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:36,222-Speed 3114.12 samples/sec Loss 14.8653 LearningRate 0.0899 Epoch: 1 Global Step: 12930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:39,599-Speed 3033.18 samples/sec Loss 14.8855 LearningRate 0.0899 Epoch: 1 Global Step: 12940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:42,898-Speed 3104.61 samples/sec Loss 14.9197 LearningRate 0.0898 Epoch: 1 Global Step: 12950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:46,221-Speed 3081.99 samples/sec Loss 14.9560 LearningRate 0.0898 Epoch: 1 Global Step: 12960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:49,542-Speed 3084.65 samples/sec Loss 14.8906 LearningRate 0.0898 Epoch: 1 Global Step: 12970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:58:52,853-Speed 3094.00 samples/sec Loss 14.9628 LearningRate 0.0898 Epoch: 1 Global Step: 12980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:58:56,154-Speed 3103.16 samples/sec Loss 15.0403 LearningRate 0.0898 Epoch: 1 Global Step: 12990 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-04-27 02:58:59,406-Speed 3149.81 samples/sec Loss 15.0044 LearningRate 0.0898 Epoch: 1 Global Step: 13000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:02,660-Speed 3148.07 samples/sec Loss 14.9806 LearningRate 0.0898 Epoch: 1 Global Step: 13010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:05,962-Speed 3101.59 samples/sec Loss 15.0199 LearningRate 0.0898 Epoch: 1 Global Step: 13020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:09,290-Speed 3078.25 samples/sec Loss 15.0194 LearningRate 0.0898 Epoch: 1 Global Step: 13030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:12,592-Speed 3102.46 samples/sec Loss 14.9975 LearningRate 0.0898 Epoch: 1 Global Step: 13040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:15,922-Speed 3075.51 samples/sec Loss 15.1050 LearningRate 0.0898 Epoch: 1 Global Step: 13050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:19,261-Speed 3067.49 samples/sec Loss 14.9692 LearningRate 0.0898 Epoch: 1 Global Step: 13060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:22,517-Speed 3146.48 samples/sec Loss 15.1222 LearningRate 0.0898 Epoch: 1 Global Step: 13070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:25,795-Speed 3124.95 samples/sec Loss 15.0411 LearningRate 0.0897 Epoch: 1 Global Step: 13080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:59:29,063-Speed 3134.80 samples/sec Loss 14.9702 LearningRate 0.0897 Epoch: 1 Global Step: 13090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:59:32,342-Speed 3124.17 samples/sec Loss 15.1052 LearningRate 0.0897 Epoch: 1 Global Step: 13100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:59:35,594-Speed 3148.94 samples/sec Loss 15.0715 LearningRate 0.0897 Epoch: 1 Global Step: 13110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:59:38,866-Speed 
3131.72 samples/sec Loss 14.9595 LearningRate 0.0897 Epoch: 1 Global Step: 13120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 02:59:42,129-Speed 3138.95 samples/sec Loss 14.9461 LearningRate 0.0897 Epoch: 1 Global Step: 13130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:45,407-Speed 3124.24 samples/sec Loss 15.1797 LearningRate 0.0897 Epoch: 1 Global Step: 13140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:48,697-Speed 3113.64 samples/sec Loss 15.1761 LearningRate 0.0897 Epoch: 1 Global Step: 13150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:51,987-Speed 3113.23 samples/sec Loss 15.0283 LearningRate 0.0897 Epoch: 1 Global Step: 13160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:55,377-Speed 3021.90 samples/sec Loss 15.1091 LearningRate 0.0897 Epoch: 1 Global Step: 13170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 02:59:58,705-Speed 3078.68 samples/sec Loss 15.2196 LearningRate 0.0897 Epoch: 1 Global Step: 13180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:02,083-Speed 3032.48 samples/sec Loss 15.0981 LearningRate 0.0897 Epoch: 1 Global Step: 13190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:05,421-Speed 3068.08 samples/sec Loss 15.0690 LearningRate 0.0897 Epoch: 1 Global Step: 13200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:08,740-Speed 3086.22 samples/sec Loss 15.1177 LearningRate 0.0896 Epoch: 1 Global Step: 13210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:12,045-Speed 3099.62 samples/sec Loss 15.2257 LearningRate 0.0896 Epoch: 1 Global Step: 13220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:15,475-Speed 2986.02 samples/sec Loss 15.2738 LearningRate 0.0896 Epoch: 1 Global Step: 13230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:18,759-Speed 3119.27 samples/sec Loss 15.2274 
LearningRate 0.0896 Epoch: 1 Global Step: 13240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:22,129-Speed 3039.71 samples/sec Loss 15.1597 LearningRate 0.0896 Epoch: 1 Global Step: 13250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:25,459-Speed 3075.26 samples/sec Loss 15.1568 LearningRate 0.0896 Epoch: 1 Global Step: 13260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:28,776-Speed 3088.31 samples/sec Loss 15.2955 LearningRate 0.0896 Epoch: 1 Global Step: 13270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:32,135-Speed 3050.09 samples/sec Loss 15.3286 LearningRate 0.0896 Epoch: 1 Global Step: 13280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:35,477-Speed 3064.53 samples/sec Loss 15.1449 LearningRate 0.0896 Epoch: 1 Global Step: 13290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:38,785-Speed 3096.18 samples/sec Loss 15.0646 LearningRate 0.0896 Epoch: 1 Global Step: 13300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:42,106-Speed 3084.24 samples/sec Loss 15.1554 LearningRate 0.0896 Epoch: 1 Global Step: 13310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:00:45,375-Speed 3133.41 samples/sec Loss 15.1456 LearningRate 0.0896 Epoch: 1 Global Step: 13320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:48,658-Speed 3120.08 samples/sec Loss 15.2742 LearningRate 0.0896 Epoch: 1 Global Step: 13330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:51,940-Speed 3121.11 samples/sec Loss 15.1303 LearningRate 0.0895 Epoch: 1 Global Step: 13340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:55,258-Speed 3087.43 samples/sec Loss 15.1286 LearningRate 0.0895 Epoch: 1 Global Step: 13350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:00:58,533-Speed 3126.95 samples/sec Loss 15.2865 LearningRate 0.0895 Epoch: 1 Global 
Step: 13360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:01:01,869-Speed 3070.32 samples/sec Loss 15.1967 LearningRate 0.0895 Epoch: 1 Global Step: 13370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:01:05,147-Speed 3125.32 samples/sec Loss 15.3553 LearningRate 0.0895 Epoch: 1 Global Step: 13380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:01:08,482-Speed 3071.07 samples/sec Loss 15.1848 LearningRate 0.0895 Epoch: 1 Global Step: 13390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:01:11,785-Speed 3101.40 samples/sec Loss 15.1947 LearningRate 0.0895 Epoch: 1 Global Step: 13400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:01:15,190-Speed 3008.67 samples/sec Loss 15.3134 LearningRate 0.0895 Epoch: 1 Global Step: 13410 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:01:18,488-Speed 3105.30 samples/sec Loss 15.2959 LearningRate 0.0895 Epoch: 1 Global Step: 13420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:21,784-Speed 3107.74 samples/sec Loss 15.2614 LearningRate 0.0895 Epoch: 1 Global Step: 13430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:25,113-Speed 3076.78 samples/sec Loss 15.2290 LearningRate 0.0895 Epoch: 1 Global Step: 13440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:28,379-Speed 3136.74 samples/sec Loss 15.2232 LearningRate 0.0895 Epoch: 1 Global Step: 13450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:31,668-Speed 3114.89 samples/sec Loss 15.2027 LearningRate 0.0895 Epoch: 1 Global Step: 13460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:35,005-Speed 3068.82 samples/sec Loss 15.2724 LearningRate 0.0894 Epoch: 1 Global Step: 13470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:38,281-Speed 3127.71 samples/sec Loss 15.2649 LearningRate 0.0894 Epoch: 1 Global Step: 13480 Fp16 Grad Scale: 131072 
Required: 22 hours Training: 2022-04-27 03:01:41,573-Speed 3111.06 samples/sec Loss 15.4311 LearningRate 0.0894 Epoch: 1 Global Step: 13490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:44,873-Speed 3103.92 samples/sec Loss 15.2016 LearningRate 0.0894 Epoch: 1 Global Step: 13500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:48,192-Speed 3085.66 samples/sec Loss 15.0670 LearningRate 0.0894 Epoch: 1 Global Step: 13510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:51,529-Speed 3070.20 samples/sec Loss 15.2891 LearningRate 0.0894 Epoch: 1 Global Step: 13520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:54,860-Speed 3075.06 samples/sec Loss 15.4165 LearningRate 0.0894 Epoch: 1 Global Step: 13530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:01:58,138-Speed 3124.95 samples/sec Loss 15.3444 LearningRate 0.0894 Epoch: 1 Global Step: 13540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:01,454-Speed 3088.54 samples/sec Loss 15.2358 LearningRate 0.0894 Epoch: 1 Global Step: 13550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:04,769-Speed 3089.72 samples/sec Loss 15.3371 LearningRate 0.0894 Epoch: 1 Global Step: 13560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:08,117-Speed 3059.24 samples/sec Loss 15.3004 LearningRate 0.0894 Epoch: 1 Global Step: 13570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:11,456-Speed 3067.81 samples/sec Loss 15.2648 LearningRate 0.0894 Epoch: 1 Global Step: 13580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:14,791-Speed 3071.78 samples/sec Loss 15.3541 LearningRate 0.0894 Epoch: 1 Global Step: 13590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:18,120-Speed 3076.79 samples/sec Loss 15.2768 LearningRate 0.0894 Epoch: 1 Global Step: 13600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 
2022-04-27 03:02:21,414-Speed 3109.76 samples/sec Loss 15.3392 LearningRate 0.0893 Epoch: 1 Global Step: 13610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:24,736-Speed 3082.56 samples/sec Loss 15.2089 LearningRate 0.0893 Epoch: 1 Global Step: 13620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:28,041-Speed 3099.84 samples/sec Loss 15.2897 LearningRate 0.0893 Epoch: 1 Global Step: 13630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:31,370-Speed 3076.57 samples/sec Loss 15.5311 LearningRate 0.0893 Epoch: 1 Global Step: 13640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:34,651-Speed 3121.93 samples/sec Loss 15.3070 LearningRate 0.0893 Epoch: 1 Global Step: 13650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:37,962-Speed 3093.86 samples/sec Loss 15.4292 LearningRate 0.0893 Epoch: 1 Global Step: 13660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:41,280-Speed 3087.54 samples/sec Loss 15.3525 LearningRate 0.0893 Epoch: 1 Global Step: 13670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:44,579-Speed 3105.11 samples/sec Loss 15.3331 LearningRate 0.0893 Epoch: 1 Global Step: 13680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:47,885-Speed 3098.09 samples/sec Loss 15.2476 LearningRate 0.0893 Epoch: 1 Global Step: 13690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:51,175-Speed 3113.21 samples/sec Loss 15.3441 LearningRate 0.0893 Epoch: 1 Global Step: 13700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:54,518-Speed 3063.84 samples/sec Loss 15.2595 LearningRate 0.0893 Epoch: 1 Global Step: 13710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:02:57,834-Speed 3089.34 samples/sec Loss 15.1712 LearningRate 0.0893 Epoch: 1 Global Step: 13720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:01,239-Speed 
3008.83 samples/sec Loss 15.2991 LearningRate 0.0893 Epoch: 1 Global Step: 13730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:04,596-Speed 3051.73 samples/sec Loss 15.5060 LearningRate 0.0892 Epoch: 1 Global Step: 13740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:07,894-Speed 3105.62 samples/sec Loss 15.3321 LearningRate 0.0892 Epoch: 1 Global Step: 13750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:11,246-Speed 3056.17 samples/sec Loss 15.2763 LearningRate 0.0892 Epoch: 1 Global Step: 13760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:14,544-Speed 3105.26 samples/sec Loss 15.3188 LearningRate 0.0892 Epoch: 1 Global Step: 13770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:17,865-Speed 3084.96 samples/sec Loss 15.3197 LearningRate 0.0892 Epoch: 1 Global Step: 13780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:21,152-Speed 3115.68 samples/sec Loss 15.4541 LearningRate 0.0892 Epoch: 1 Global Step: 13790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:24,446-Speed 3109.78 samples/sec Loss 15.2682 LearningRate 0.0892 Epoch: 1 Global Step: 13800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:27,710-Speed 3138.23 samples/sec Loss 15.4309 LearningRate 0.0892 Epoch: 1 Global Step: 13810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:31,049-Speed 3068.07 samples/sec Loss 15.4095 LearningRate 0.0892 Epoch: 1 Global Step: 13820 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 03:03:34,301-Speed 3149.29 samples/sec Loss 15.3642 LearningRate 0.0892 Epoch: 1 Global Step: 13830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:37,647-Speed 3061.35 samples/sec Loss 15.5513 LearningRate 0.0892 Epoch: 1 Global Step: 13840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:40,960-Speed 3091.89 samples/sec Loss 
15.4691 LearningRate 0.0892 Epoch: 1 Global Step: 13850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:44,284-Speed 3081.77 samples/sec Loss 15.5627 LearningRate 0.0892 Epoch: 1 Global Step: 13860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:47,572-Speed 3115.12 samples/sec Loss 15.5072 LearningRate 0.0891 Epoch: 1 Global Step: 13870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:50,893-Speed 3084.41 samples/sec Loss 15.5746 LearningRate 0.0891 Epoch: 1 Global Step: 13880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:54,228-Speed 3071.26 samples/sec Loss 15.3822 LearningRate 0.0891 Epoch: 1 Global Step: 13890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:03:57,601-Speed 3036.97 samples/sec Loss 15.5090 LearningRate 0.0891 Epoch: 1 Global Step: 13900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:00,970-Speed 3040.14 samples/sec Loss 15.5658 LearningRate 0.0891 Epoch: 1 Global Step: 13910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:04,261-Speed 3112.97 samples/sec Loss 15.3146 LearningRate 0.0891 Epoch: 1 Global Step: 13920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:07,636-Speed 3034.77 samples/sec Loss 15.3316 LearningRate 0.0891 Epoch: 1 Global Step: 13930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:10,941-Speed 3099.36 samples/sec Loss 15.3651 LearningRate 0.0891 Epoch: 1 Global Step: 13940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:14,253-Speed 3092.26 samples/sec Loss 15.4127 LearningRate 0.0891 Epoch: 1 Global Step: 13950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:17,608-Speed 3053.32 samples/sec Loss 15.4631 LearningRate 0.0891 Epoch: 1 Global Step: 13960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:20,936-Speed 3077.21 samples/sec Loss 15.2476 LearningRate 0.0891 
Epoch: 1 Global Step: 13970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:24,304-Speed 3041.96 samples/sec Loss 15.3557 LearningRate 0.0891 Epoch: 1 Global Step: 13980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:27,619-Speed 3089.56 samples/sec Loss 15.3105 LearningRate 0.0891 Epoch: 1 Global Step: 13990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:30,910-Speed 3112.88 samples/sec Loss 15.3819 LearningRate 0.0890 Epoch: 1 Global Step: 14000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:34,232-Speed 3082.63 samples/sec Loss 15.4866 LearningRate 0.0890 Epoch: 1 Global Step: 14010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:37,588-Speed 3052.53 samples/sec Loss 15.4351 LearningRate 0.0890 Epoch: 1 Global Step: 14020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:40,844-Speed 3145.74 samples/sec Loss 15.3783 LearningRate 0.0890 Epoch: 1 Global Step: 14030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:44,131-Speed 3116.43 samples/sec Loss 15.3146 LearningRate 0.0890 Epoch: 1 Global Step: 14040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:47,443-Speed 3092.85 samples/sec Loss 15.4759 LearningRate 0.0890 Epoch: 1 Global Step: 14050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:50,760-Speed 3088.23 samples/sec Loss 15.4416 LearningRate 0.0890 Epoch: 1 Global Step: 14060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:54,058-Speed 3105.51 samples/sec Loss 15.5031 LearningRate 0.0890 Epoch: 1 Global Step: 14070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:04:57,359-Speed 3103.78 samples/sec Loss 15.2425 LearningRate 0.0890 Epoch: 1 Global Step: 14080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:00,674-Speed 3089.02 samples/sec Loss 15.3822 LearningRate 0.0890 Epoch: 1 Global Step: 14090 
Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:04,000-Speed 3079.82 samples/sec Loss 15.3637 LearningRate 0.0890 Epoch: 1 Global Step: 14100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:07,291-Speed 3113.47 samples/sec Loss 15.4101 LearningRate 0.0890 Epoch: 1 Global Step: 14110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:10,577-Speed 3116.90 samples/sec Loss 15.3795 LearningRate 0.0890 Epoch: 1 Global Step: 14120 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:13,876-Speed 3105.07 samples/sec Loss 15.3480 LearningRate 0.0889 Epoch: 1 Global Step: 14130 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:17,198-Speed 3082.96 samples/sec Loss 15.1855 LearningRate 0.0889 Epoch: 1 Global Step: 14140 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:20,490-Speed 3111.96 samples/sec Loss 15.4294 LearningRate 0.0889 Epoch: 1 Global Step: 14150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:23,771-Speed 3121.49 samples/sec Loss 15.4681 LearningRate 0.0889 Epoch: 1 Global Step: 14160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:27,134-Speed 3046.64 samples/sec Loss 15.4432 LearningRate 0.0889 Epoch: 1 Global Step: 14170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:30,437-Speed 3101.16 samples/sec Loss 15.3589 LearningRate 0.0889 Epoch: 1 Global Step: 14180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:33,761-Speed 3082.38 samples/sec Loss 15.4320 LearningRate 0.0889 Epoch: 1 Global Step: 14190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:37,056-Speed 3108.70 samples/sec Loss 15.4694 LearningRate 0.0889 Epoch: 1 Global Step: 14200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:05:40,417-Speed 3047.33 samples/sec Loss 15.4563 LearningRate 0.0889 Epoch: 1 Global Step: 14210 Fp16 Grad Scale: 65536 Required: 22 
hours Training: 2022-04-27 03:05:43,779-Speed 3046.54 samples/sec Loss 15.5992 LearningRate 0.0889 Epoch: 1 Global Step: 14220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:47,159-Speed 3030.30 samples/sec Loss 15.5474 LearningRate 0.0889 Epoch: 1 Global Step: 14230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:50,506-Speed 3060.59 samples/sec Loss 15.4742 LearningRate 0.0889 Epoch: 1 Global Step: 14240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:53,840-Speed 3072.32 samples/sec Loss 15.4804 LearningRate 0.0889 Epoch: 1 Global Step: 14250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:05:57,121-Speed 3121.45 samples/sec Loss 15.3776 LearningRate 0.0888 Epoch: 1 Global Step: 14260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:00,484-Speed 3045.72 samples/sec Loss 15.3304 LearningRate 0.0888 Epoch: 1 Global Step: 14270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:03,823-Speed 3067.77 samples/sec Loss 15.5872 LearningRate 0.0888 Epoch: 1 Global Step: 14280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:07,174-Speed 3056.92 samples/sec Loss 15.3847 LearningRate 0.0888 Epoch: 1 Global Step: 14290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:10,483-Speed 3095.99 samples/sec Loss 15.3499 LearningRate 0.0888 Epoch: 1 Global Step: 14300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:13,789-Speed 3098.24 samples/sec Loss 15.5344 LearningRate 0.0888 Epoch: 1 Global Step: 14310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:17,106-Speed 3088.63 samples/sec Loss 15.4017 LearningRate 0.0888 Epoch: 1 Global Step: 14320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:20,460-Speed 3053.33 samples/sec Loss 15.3367 LearningRate 0.0888 Epoch: 1 Global Step: 14330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 
03:06:23,769-Speed 3096.57 samples/sec Loss 15.2813 LearningRate 0.0888 Epoch: 1 Global Step: 14340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:27,079-Speed 3094.59 samples/sec Loss 15.6328 LearningRate 0.0888 Epoch: 1 Global Step: 14350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:30,400-Speed 3083.65 samples/sec Loss 15.4649 LearningRate 0.0888 Epoch: 1 Global Step: 14360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:33,680-Speed 3123.42 samples/sec Loss 15.4253 LearningRate 0.0888 Epoch: 1 Global Step: 14370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:37,010-Speed 3075.61 samples/sec Loss 15.3773 LearningRate 0.0888 Epoch: 1 Global Step: 14380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:40,372-Speed 3046.73 samples/sec Loss 15.4404 LearningRate 0.0888 Epoch: 1 Global Step: 14390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:43,742-Speed 3039.79 samples/sec Loss 15.3777 LearningRate 0.0887 Epoch: 1 Global Step: 14400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:06:47,083-Speed 3065.63 samples/sec Loss 15.3095 LearningRate 0.0887 Epoch: 1 Global Step: 14410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:50,454-Speed 3038.85 samples/sec Loss 15.6235 LearningRate 0.0887 Epoch: 1 Global Step: 14420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:53,820-Speed 3042.90 samples/sec Loss 15.4072 LearningRate 0.0887 Epoch: 1 Global Step: 14430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:06:57,122-Speed 3102.59 samples/sec Loss 15.5801 LearningRate 0.0887 Epoch: 1 Global Step: 14440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:00,458-Speed 3070.79 samples/sec Loss 15.4274 LearningRate 0.0887 Epoch: 1 Global Step: 14450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:03,868-Speed 3003.63 samples/sec 
Loss 15.4109 LearningRate 0.0887 Epoch: 1 Global Step: 14460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:07,183-Speed 3089.95 samples/sec Loss 15.4790 LearningRate 0.0887 Epoch: 1 Global Step: 14470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:10,557-Speed 3035.90 samples/sec Loss 15.3935 LearningRate 0.0887 Epoch: 1 Global Step: 14480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:13,942-Speed 3025.33 samples/sec Loss 15.3084 LearningRate 0.0887 Epoch: 1 Global Step: 14490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:17,196-Speed 3148.26 samples/sec Loss 15.4054 LearningRate 0.0887 Epoch: 1 Global Step: 14500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:20,540-Speed 3063.40 samples/sec Loss 15.3545 LearningRate 0.0887 Epoch: 1 Global Step: 14510 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 03:07:23,814-Speed 3128.18 samples/sec Loss 15.4571 LearningRate 0.0887 Epoch: 1 Global Step: 14520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:27,069-Speed 3148.38 samples/sec Loss 15.6245 LearningRate 0.0886 Epoch: 1 Global Step: 14530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:30,416-Speed 3059.68 samples/sec Loss 15.4505 LearningRate 0.0886 Epoch: 1 Global Step: 14540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:33,742-Speed 3079.96 samples/sec Loss 15.4376 LearningRate 0.0886 Epoch: 1 Global Step: 14550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:37,093-Speed 3057.23 samples/sec Loss 15.5399 LearningRate 0.0886 Epoch: 1 Global Step: 14560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:40,383-Speed 3113.20 samples/sec Loss 15.4970 LearningRate 0.0886 Epoch: 1 Global Step: 14570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:43,731-Speed 3060.17 samples/sec Loss 15.4645 LearningRate 
0.0886 Epoch: 1 Global Step: 14580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:07:47,032-Speed 3102.32 samples/sec Loss 15.4885 LearningRate 0.0886 Epoch: 1 Global Step: 14590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:07:50,402-Speed 3039.57 samples/sec Loss 15.5237 LearningRate 0.0886 Epoch: 1 Global Step: 14600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:07:53,809-Speed 3006.72 samples/sec Loss 15.4580 LearningRate 0.0886 Epoch: 1 Global Step: 14610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:07:57,192-Speed 3027.76 samples/sec Loss 15.4035 LearningRate 0.0886 Epoch: 1 Global Step: 14620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:00,522-Speed 3076.43 samples/sec Loss 15.3933 LearningRate 0.0886 Epoch: 1 Global Step: 14630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:03,955-Speed 2983.54 samples/sec Loss 15.5051 LearningRate 0.0886 Epoch: 1 Global Step: 14640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:07,282-Speed 3079.10 samples/sec Loss 15.4415 LearningRate 0.0886 Epoch: 1 Global Step: 14650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:10,643-Speed 3047.71 samples/sec Loss 15.5421 LearningRate 0.0885 Epoch: 1 Global Step: 14660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:14,047-Speed 3008.97 samples/sec Loss 15.5142 LearningRate 0.0885 Epoch: 1 Global Step: 14670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:17,359-Speed 3092.16 samples/sec Loss 15.4905 LearningRate 0.0885 Epoch: 1 Global Step: 14680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:20,734-Speed 3034.97 samples/sec Loss 15.3741 LearningRate 0.0885 Epoch: 1 Global Step: 14690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:08:24,147-Speed 3001.55 samples/sec Loss 15.5480 LearningRate 0.0885 Epoch: 1 Global Step: 14700 Fp16 
Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:08:27,486-Speed 3069.54 samples/sec Loss 15.4947 LearningRate 0.0885 Epoch: 1 Global Step: 14710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:08:30,791-Speed 3099.58 samples/sec Loss 15.3588 LearningRate 0.0885 Epoch: 1 Global Step: 14720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:08:34,073-Speed 3120.99 samples/sec Loss 15.4160 LearningRate 0.0885 Epoch: 1 Global Step: 14730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:08:37,436-Speed 3046.33 samples/sec Loss 15.4301 LearningRate 0.0885 Epoch: 1 Global Step: 14740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:40,744-Speed 3096.04 samples/sec Loss 15.3052 LearningRate 0.0885 Epoch: 1 Global Step: 14750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:44,049-Speed 3099.15 samples/sec Loss 15.3785 LearningRate 0.0885 Epoch: 1 Global Step: 14760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:47,340-Speed 3113.03 samples/sec Loss 15.4523 LearningRate 0.0885 Epoch: 1 Global Step: 14770 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:50,689-Speed 3058.10 samples/sec Loss 15.5006 LearningRate 0.0885 Epoch: 1 Global Step: 14780 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:54,038-Speed 3059.01 samples/sec Loss 15.5657 LearningRate 0.0884 Epoch: 1 Global Step: 14790 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:08:57,360-Speed 3083.61 samples/sec Loss 15.4867 LearningRate 0.0884 Epoch: 1 Global Step: 14800 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:09:00,729-Speed 3040.16 samples/sec Loss 15.3469 LearningRate 0.0884 Epoch: 1 Global Step: 14810 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:09:04,074-Speed 3061.93 samples/sec Loss 15.4431 LearningRate 0.0884 Epoch: 1 Global Step: 14820 Fp16 Grad Scale: 65536 Required: 22 hours 
Training: 2022-04-27 03:09:07,424-Speed 3058.37 samples/sec Loss 15.5999 LearningRate 0.0884 Epoch: 1 Global Step: 14830 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:09:10,753-Speed 3077.18 samples/sec Loss 15.4332 LearningRate 0.0884 Epoch: 1 Global Step: 14840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:14,068-Speed 3090.54 samples/sec Loss 15.4075 LearningRate 0.0884 Epoch: 1 Global Step: 14850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:17,421-Speed 3054.29 samples/sec Loss 15.5630 LearningRate 0.0884 Epoch: 1 Global Step: 14860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:20,751-Speed 3076.03 samples/sec Loss 15.4616 LearningRate 0.0884 Epoch: 1 Global Step: 14870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:24,083-Speed 3074.29 samples/sec Loss 15.5662 LearningRate 0.0884 Epoch: 1 Global Step: 14880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:27,374-Speed 3113.01 samples/sec Loss 15.2796 LearningRate 0.0884 Epoch: 1 Global Step: 14890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:30,664-Speed 3112.94 samples/sec Loss 15.4683 LearningRate 0.0884 Epoch: 1 Global Step: 14900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:34,006-Speed 3064.86 samples/sec Loss 15.4387 LearningRate 0.0884 Epoch: 1 Global Step: 14910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:37,431-Speed 2991.55 samples/sec Loss 15.5216 LearningRate 0.0883 Epoch: 1 Global Step: 14920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:40,769-Speed 3068.45 samples/sec Loss 15.4650 LearningRate 0.0883 Epoch: 1 Global Step: 14930 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:44,109-Speed 3066.21 samples/sec Loss 15.4037 LearningRate 0.0883 Epoch: 1 Global Step: 14940 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
03:09:47,425-Speed 3089.75 samples/sec Loss 15.4396 LearningRate 0.0883 Epoch: 1 Global Step: 14950 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:50,742-Speed 3087.81 samples/sec Loss 15.5564 LearningRate 0.0883 Epoch: 1 Global Step: 14960 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:54,024-Speed 3121.26 samples/sec Loss 15.3816 LearningRate 0.0883 Epoch: 1 Global Step: 14970 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:09:57,357-Speed 3072.55 samples/sec Loss 15.4978 LearningRate 0.0883 Epoch: 1 Global Step: 14980 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:00,615-Speed 3144.63 samples/sec Loss 15.3996 LearningRate 0.0883 Epoch: 1 Global Step: 14990 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:03,909-Speed 3109.70 samples/sec Loss 15.4200 LearningRate 0.0883 Epoch: 1 Global Step: 15000 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:07,157-Speed 3153.35 samples/sec Loss 15.3586 LearningRate 0.0883 Epoch: 1 Global Step: 15010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:10,471-Speed 3091.23 samples/sec Loss 15.4132 LearningRate 0.0883 Epoch: 1 Global Step: 15020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:13,877-Speed 3007.46 samples/sec Loss 15.4707 LearningRate 0.0883 Epoch: 1 Global Step: 15030 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:17,209-Speed 3074.37 samples/sec Loss 15.2978 LearningRate 0.0883 Epoch: 1 Global Step: 15040 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:20,529-Speed 3085.25 samples/sec Loss 15.3727 LearningRate 0.0883 Epoch: 1 Global Step: 15050 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:23,846-Speed 3087.36 samples/sec Loss 15.4553 LearningRate 0.0882 Epoch: 1 Global Step: 15060 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:27,207-Speed 3047.48 
samples/sec Loss 15.6635 LearningRate 0.0882 Epoch: 1 Global Step: 15070 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:30,564-Speed 3051.37 samples/sec Loss 15.3493 LearningRate 0.0882 Epoch: 1 Global Step: 15080 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:33,920-Speed 3052.57 samples/sec Loss 15.4685 LearningRate 0.0882 Epoch: 1 Global Step: 15090 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:37,292-Speed 3037.60 samples/sec Loss 15.4378 LearningRate 0.0882 Epoch: 1 Global Step: 15100 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:10:40,575-Speed 3120.11 samples/sec Loss 15.3645 LearningRate 0.0882 Epoch: 1 Global Step: 15110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:44,013-Speed 2979.53 samples/sec Loss 15.4916 LearningRate 0.0882 Epoch: 1 Global Step: 15120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:47,329-Speed 3088.48 samples/sec Loss 15.2273 LearningRate 0.0882 Epoch: 1 Global Step: 15130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:50,649-Speed 3085.42 samples/sec Loss 15.4280 LearningRate 0.0882 Epoch: 1 Global Step: 15140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:53,966-Speed 3088.14 samples/sec Loss 15.4808 LearningRate 0.0882 Epoch: 1 Global Step: 15150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:10:57,261-Speed 3108.36 samples/sec Loss 15.4966 LearningRate 0.0882 Epoch: 1 Global Step: 15160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:00,570-Speed 3095.56 samples/sec Loss 15.4977 LearningRate 0.0882 Epoch: 1 Global Step: 15170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:03,927-Speed 3051.42 samples/sec Loss 15.3355 LearningRate 0.0882 Epoch: 1 Global Step: 15180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:07,274-Speed 3060.72 samples/sec Loss 15.3558 LearningRate 
0.0881 Epoch: 1 Global Step: 15190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:10,611-Speed 3069.34 samples/sec Loss 15.5307 LearningRate 0.0881 Epoch: 1 Global Step: 15200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:13,999-Speed 3023.68 samples/sec Loss 15.4670 LearningRate 0.0881 Epoch: 1 Global Step: 15210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:17,283-Speed 3118.22 samples/sec Loss 15.4985 LearningRate 0.0881 Epoch: 1 Global Step: 15220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:20,588-Speed 3099.43 samples/sec Loss 15.3697 LearningRate 0.0881 Epoch: 1 Global Step: 15230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:23,920-Speed 3074.53 samples/sec Loss 15.4904 LearningRate 0.0881 Epoch: 1 Global Step: 15240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:27,218-Speed 3105.73 samples/sec Loss 15.3399 LearningRate 0.0881 Epoch: 1 Global Step: 15250 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:30,489-Speed 3131.55 samples/sec Loss 15.3573 LearningRate 0.0881 Epoch: 1 Global Step: 15260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:11:33,846-Speed 3050.82 samples/sec Loss 15.3555 LearningRate 0.0881 Epoch: 1 Global Step: 15270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:11:37,181-Speed 3071.45 samples/sec Loss 15.2663 LearningRate 0.0881 Epoch: 1 Global Step: 15280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:11:40,466-Speed 3118.07 samples/sec Loss 15.3818 LearningRate 0.0881 Epoch: 1 Global Step: 15290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:11:43,821-Speed 3052.45 samples/sec Loss 15.3902 LearningRate 0.0881 Epoch: 1 Global Step: 15300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:11:47,186-Speed 3045.04 samples/sec Loss 15.5335 LearningRate 0.0881 Epoch: 1 Global Step: 15310 
Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:50,539-Speed 3054.10 samples/sec Loss 15.5366 LearningRate 0.0880 Epoch: 1 Global Step: 15320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:53,961-Speed 2993.87 samples/sec Loss 15.4498 LearningRate 0.0880 Epoch: 1 Global Step: 15330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:11:57,354-Speed 3018.81 samples/sec Loss 15.3666 LearningRate 0.0880 Epoch: 1 Global Step: 15340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:00,672-Speed 3087.07 samples/sec Loss 15.4831 LearningRate 0.0880 Epoch: 1 Global Step: 15350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:04,042-Speed 3039.79 samples/sec Loss 15.5878 LearningRate 0.0880 Epoch: 1 Global Step: 15360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:07,313-Speed 3131.15 samples/sec Loss 15.3518 LearningRate 0.0880 Epoch: 1 Global Step: 15370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:10,630-Speed 3088.38 samples/sec Loss 15.4739 LearningRate 0.0880 Epoch: 1 Global Step: 15380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:13,971-Speed 3065.47 samples/sec Loss 15.6324 LearningRate 0.0880 Epoch: 1 Global Step: 15390 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:17,344-Speed 3037.03 samples/sec Loss 15.3855 LearningRate 0.0880 Epoch: 1 Global Step: 15400 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:12:20,661-Speed 3088.65 samples/sec Loss 15.4529 LearningRate 0.0880 Epoch: 1 Global Step: 15410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:24,034-Speed 3036.51 samples/sec Loss 15.5002 LearningRate 0.0880 Epoch: 1 Global Step: 15420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:27,342-Speed 3096.46 samples/sec Loss 15.2615 LearningRate 0.0880 Epoch: 1 Global Step: 15430 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-04-27 03:12:30,699-Speed 3051.47 samples/sec Loss 15.5660 LearningRate 0.0880 Epoch: 1 Global Step: 15440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:34,069-Speed 3039.30 samples/sec Loss 15.5052 LearningRate 0.0879 Epoch: 1 Global Step: 15450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:37,343-Speed 3128.98 samples/sec Loss 15.3555 LearningRate 0.0879 Epoch: 1 Global Step: 15460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:40,712-Speed 3040.27 samples/sec Loss 15.3593 LearningRate 0.0879 Epoch: 1 Global Step: 15470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:44,065-Speed 3055.13 samples/sec Loss 15.4546 LearningRate 0.0879 Epoch: 1 Global Step: 15480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:47,365-Speed 3104.93 samples/sec Loss 15.4100 LearningRate 0.0879 Epoch: 1 Global Step: 15490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:50,664-Speed 3104.57 samples/sec Loss 15.4764 LearningRate 0.0879 Epoch: 1 Global Step: 15500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:53,995-Speed 3074.71 samples/sec Loss 15.5998 LearningRate 0.0879 Epoch: 1 Global Step: 15510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:12:57,304-Speed 3095.71 samples/sec Loss 15.4129 LearningRate 0.0879 Epoch: 1 Global Step: 15520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:00,648-Speed 3063.58 samples/sec Loss 15.4968 LearningRate 0.0879 Epoch: 1 Global Step: 15530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:03,996-Speed 3058.87 samples/sec Loss 15.2806 LearningRate 0.0879 Epoch: 1 Global Step: 15540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:07,343-Speed 3060.82 samples/sec Loss 15.4716 LearningRate 0.0879 Epoch: 1 Global Step: 15550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
03:13:10,668-Speed 3080.13 samples/sec Loss 15.4055 LearningRate 0.0879 Epoch: 1 Global Step: 15560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:13,978-Speed 3094.50 samples/sec Loss 15.4458 LearningRate 0.0879 Epoch: 1 Global Step: 15570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:17,259-Speed 3122.18 samples/sec Loss 15.5451 LearningRate 0.0879 Epoch: 1 Global Step: 15580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:20,552-Speed 3110.47 samples/sec Loss 15.4825 LearningRate 0.0878 Epoch: 1 Global Step: 15590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:23,861-Speed 3094.94 samples/sec Loss 15.3750 LearningRate 0.0878 Epoch: 1 Global Step: 15600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:27,146-Speed 3118.39 samples/sec Loss 15.5041 LearningRate 0.0878 Epoch: 1 Global Step: 15610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:30,478-Speed 3075.32 samples/sec Loss 15.4263 LearningRate 0.0878 Epoch: 1 Global Step: 15620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:33,793-Speed 3090.51 samples/sec Loss 15.4014 LearningRate 0.0878 Epoch: 1 Global Step: 15630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:37,074-Speed 3121.67 samples/sec Loss 15.4556 LearningRate 0.0878 Epoch: 1 Global Step: 15640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:40,345-Speed 3131.88 samples/sec Loss 15.5355 LearningRate 0.0878 Epoch: 1 Global Step: 15650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:43,698-Speed 3054.43 samples/sec Loss 15.3735 LearningRate 0.0878 Epoch: 1 Global Step: 15660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:47,071-Speed 3036.61 samples/sec Loss 15.2502 LearningRate 0.0878 Epoch: 1 Global Step: 15670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:50,431-Speed 3049.36 
samples/sec Loss 15.4898 LearningRate 0.0878 Epoch: 1 Global Step: 15680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:53,725-Speed 3109.05 samples/sec Loss 15.2022 LearningRate 0.0878 Epoch: 1 Global Step: 15690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:13:57,038-Speed 3092.34 samples/sec Loss 15.3751 LearningRate 0.0878 Epoch: 1 Global Step: 15700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:00,388-Speed 3057.67 samples/sec Loss 15.4367 LearningRate 0.0878 Epoch: 1 Global Step: 15710 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 03:14:03,630-Speed 3159.01 samples/sec Loss 15.4990 LearningRate 0.0877 Epoch: 1 Global Step: 15720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:06,896-Speed 3136.55 samples/sec Loss 15.2280 LearningRate 0.0877 Epoch: 1 Global Step: 15730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:10,234-Speed 3068.72 samples/sec Loss 15.3373 LearningRate 0.0877 Epoch: 1 Global Step: 15740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:13,647-Speed 3001.52 samples/sec Loss 15.1941 LearningRate 0.0877 Epoch: 1 Global Step: 15750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:16,960-Speed 3091.12 samples/sec Loss 15.5083 LearningRate 0.0877 Epoch: 1 Global Step: 15760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:20,319-Speed 3049.05 samples/sec Loss 15.1937 LearningRate 0.0877 Epoch: 1 Global Step: 15770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:23,672-Speed 3055.27 samples/sec Loss 15.3264 LearningRate 0.0877 Epoch: 1 Global Step: 15780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:26,977-Speed 3099.66 samples/sec Loss 15.2953 LearningRate 0.0877 Epoch: 1 Global Step: 15790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:30,243-Speed 3135.90 samples/sec Loss 15.4505 
LearningRate 0.0877 Epoch: 1 Global Step: 15800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:33,568-Speed 3081.07 samples/sec Loss 15.3647 LearningRate 0.0877 Epoch: 1 Global Step: 15810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:36,892-Speed 3081.29 samples/sec Loss 15.3563 LearningRate 0.0877 Epoch: 1 Global Step: 15820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:40,160-Speed 3133.83 samples/sec Loss 15.1664 LearningRate 0.0877 Epoch: 1 Global Step: 15830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:43,494-Speed 3072.74 samples/sec Loss 15.3219 LearningRate 0.0877 Epoch: 1 Global Step: 15840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:46,845-Speed 3056.62 samples/sec Loss 15.4584 LearningRate 0.0876 Epoch: 1 Global Step: 15850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:50,174-Speed 3076.71 samples/sec Loss 15.4009 LearningRate 0.0876 Epoch: 1 Global Step: 15860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:53,502-Speed 3078.47 samples/sec Loss 15.2413 LearningRate 0.0876 Epoch: 1 Global Step: 15870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:14:56,847-Speed 3061.41 samples/sec Loss 15.4570 LearningRate 0.0876 Epoch: 1 Global Step: 15880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:00,148-Speed 3103.89 samples/sec Loss 15.2431 LearningRate 0.0876 Epoch: 1 Global Step: 15890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:03,484-Speed 3070.68 samples/sec Loss 15.3481 LearningRate 0.0876 Epoch: 1 Global Step: 15900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:06,747-Speed 3138.97 samples/sec Loss 15.6051 LearningRate 0.0876 Epoch: 1 Global Step: 15910 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:10,022-Speed 3127.24 samples/sec Loss 15.4132 LearningRate 0.0876 Epoch: 1 
Global Step: 15920 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:13,335-Speed 3092.68 samples/sec Loss 15.3628 LearningRate 0.0876 Epoch: 1 Global Step: 15930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:16,752-Speed 2996.75 samples/sec Loss 15.3585 LearningRate 0.0876 Epoch: 1 Global Step: 15940 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:20,078-Speed 3080.13 samples/sec Loss 15.3177 LearningRate 0.0876 Epoch: 1 Global Step: 15950 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:23,419-Speed 3066.14 samples/sec Loss 15.4515 LearningRate 0.0876 Epoch: 1 Global Step: 15960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:26,739-Speed 3084.82 samples/sec Loss 15.4290 LearningRate 0.0876 Epoch: 1 Global Step: 15970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:30,002-Speed 3140.06 samples/sec Loss 15.3685 LearningRate 0.0875 Epoch: 1 Global Step: 15980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:33,265-Speed 3139.01 samples/sec Loss 15.4175 LearningRate 0.0875 Epoch: 1 Global Step: 15990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:36,547-Speed 3120.77 samples/sec Loss 15.3907 LearningRate 0.0875 Epoch: 1 Global Step: 16000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:39,842-Speed 3108.82 samples/sec Loss 15.2318 LearningRate 0.0875 Epoch: 1 Global Step: 16010 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:43,100-Speed 3144.02 samples/sec Loss 15.4525 LearningRate 0.0875 Epoch: 1 Global Step: 16020 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:15:46,417-Speed 3087.94 samples/sec Loss 15.4053 LearningRate 0.0875 Epoch: 1 Global Step: 16030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:49,715-Speed 3106.15 samples/sec Loss 15.4783 LearningRate 0.0875 Epoch: 1 Global Step: 16040 Fp16 Grad Scale: 
131072 Required: 22 hours Training: 2022-04-27 03:15:53,051-Speed 3071.02 samples/sec Loss 15.2993 LearningRate 0.0875 Epoch: 1 Global Step: 16050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:56,332-Speed 3122.43 samples/sec Loss 15.5131 LearningRate 0.0875 Epoch: 1 Global Step: 16060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:15:59,722-Speed 3020.99 samples/sec Loss 15.3153 LearningRate 0.0875 Epoch: 1 Global Step: 16070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:03,024-Speed 3102.94 samples/sec Loss 15.3797 LearningRate 0.0875 Epoch: 1 Global Step: 16080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:06,288-Speed 3137.99 samples/sec Loss 15.3157 LearningRate 0.0875 Epoch: 1 Global Step: 16090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:09,643-Speed 3052.79 samples/sec Loss 15.3575 LearningRate 0.0875 Epoch: 1 Global Step: 16100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:12,930-Speed 3117.07 samples/sec Loss 15.3006 LearningRate 0.0875 Epoch: 1 Global Step: 16110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:16,261-Speed 3074.68 samples/sec Loss 15.3374 LearningRate 0.0874 Epoch: 1 Global Step: 16120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:19,535-Speed 3129.43 samples/sec Loss 15.2544 LearningRate 0.0874 Epoch: 1 Global Step: 16130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:22,794-Speed 3143.21 samples/sec Loss 15.3134 LearningRate 0.0874 Epoch: 1 Global Step: 16140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:26,157-Speed 3045.18 samples/sec Loss 15.4423 LearningRate 0.0874 Epoch: 1 Global Step: 16150 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:29,514-Speed 3051.22 samples/sec Loss 15.2315 LearningRate 0.0874 Epoch: 1 Global Step: 16160 Fp16 Grad Scale: 131072 Required: 22 hours 
Training: 2022-04-27 03:16:32,834-Speed 3086.17 samples/sec Loss 15.1249 LearningRate 0.0874 Epoch: 1 Global Step: 16170 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:36,133-Speed 3104.25 samples/sec Loss 15.5116 LearningRate 0.0874 Epoch: 1 Global Step: 16180 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:39,404-Speed 3131.31 samples/sec Loss 15.0256 LearningRate 0.0874 Epoch: 1 Global Step: 16190 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:42,736-Speed 3074.80 samples/sec Loss 15.3360 LearningRate 0.0874 Epoch: 1 Global Step: 16200 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:46,046-Speed 3094.57 samples/sec Loss 15.3551 LearningRate 0.0874 Epoch: 1 Global Step: 16210 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:49,392-Speed 3061.36 samples/sec Loss 15.3842 LearningRate 0.0874 Epoch: 1 Global Step: 16220 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:52,656-Speed 3137.53 samples/sec Loss 15.2977 LearningRate 0.0874 Epoch: 1 Global Step: 16230 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:55,993-Speed 3069.99 samples/sec Loss 15.3538 LearningRate 0.0874 Epoch: 1 Global Step: 16240 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:16:59,343-Speed 3058.30 samples/sec Loss 15.6020 LearningRate 0.0873 Epoch: 1 Global Step: 16250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:02,656-Speed 3093.48 samples/sec Loss 15.3879 LearningRate 0.0873 Epoch: 1 Global Step: 16260 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:06,023-Speed 3041.85 samples/sec Loss 15.5560 LearningRate 0.0873 Epoch: 1 Global Step: 16270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:09,325-Speed 3101.82 samples/sec Loss 15.3597 LearningRate 0.0873 Epoch: 1 Global Step: 16280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
03:17:12,630-Speed 3099.48 samples/sec Loss 15.2653 LearningRate 0.0873 Epoch: 1 Global Step: 16290 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:15,919-Speed 3114.20 samples/sec Loss 15.2252 LearningRate 0.0873 Epoch: 1 Global Step: 16300 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:19,203-Speed 3120.39 samples/sec Loss 15.2944 LearningRate 0.0873 Epoch: 1 Global Step: 16310 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:22,520-Speed 3087.65 samples/sec Loss 15.2865 LearningRate 0.0873 Epoch: 1 Global Step: 16320 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:25,895-Speed 3035.29 samples/sec Loss 15.3299 LearningRate 0.0873 Epoch: 1 Global Step: 16330 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:29,203-Speed 3096.57 samples/sec Loss 15.3196 LearningRate 0.0873 Epoch: 1 Global Step: 16340 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:32,490-Speed 3116.64 samples/sec Loss 15.4667 LearningRate 0.0873 Epoch: 1 Global Step: 16350 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:35,770-Speed 3122.86 samples/sec Loss 15.3270 LearningRate 0.0873 Epoch: 1 Global Step: 16360 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:39,072-Speed 3102.10 samples/sec Loss 15.2819 LearningRate 0.0873 Epoch: 1 Global Step: 16370 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:42,414-Speed 3064.73 samples/sec Loss 15.3246 LearningRate 0.0872 Epoch: 1 Global Step: 16380 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:17:45,709-Speed 3109.06 samples/sec Loss 15.3632 LearningRate 0.0872 Epoch: 1 Global Step: 16390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:49,035-Speed 3079.19 samples/sec Loss 15.1822 LearningRate 0.0872 Epoch: 1 Global Step: 16400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:52,381-Speed 3061.73 samples/sec 
Loss 15.0581 LearningRate 0.0872 Epoch: 1 Global Step: 16410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:55,708-Speed 3078.28 samples/sec Loss 15.2622 LearningRate 0.0872 Epoch: 1 Global Step: 16420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:17:59,041-Speed 3073.85 samples/sec Loss 15.3462 LearningRate 0.0872 Epoch: 1 Global Step: 16430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:02,366-Speed 3080.71 samples/sec Loss 15.3469 LearningRate 0.0872 Epoch: 1 Global Step: 16440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:05,698-Speed 3073.58 samples/sec Loss 15.2015 LearningRate 0.0872 Epoch: 1 Global Step: 16450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:09,051-Speed 3054.94 samples/sec Loss 15.1934 LearningRate 0.0872 Epoch: 1 Global Step: 16460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:12,358-Speed 3097.53 samples/sec Loss 15.3299 LearningRate 0.0872 Epoch: 1 Global Step: 16470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:15,656-Speed 3105.61 samples/sec Loss 15.2421 LearningRate 0.0872 Epoch: 1 Global Step: 16480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:18,956-Speed 3104.12 samples/sec Loss 15.1340 LearningRate 0.0872 Epoch: 1 Global Step: 16490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:22,373-Speed 2997.71 samples/sec Loss 15.2506 LearningRate 0.0872 Epoch: 1 Global Step: 16500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:25,689-Speed 3087.83 samples/sec Loss 15.3546 LearningRate 0.0871 Epoch: 1 Global Step: 16510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:29,030-Speed 3066.61 samples/sec Loss 15.1252 LearningRate 0.0871 Epoch: 1 Global Step: 16520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:32,329-Speed 3104.23 samples/sec Loss 15.3484 LearningRate 
0.0871 Epoch: 1 Global Step: 16530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:35,674-Speed 3062.90 samples/sec Loss 15.2525 LearningRate 0.0871 Epoch: 1 Global Step: 16540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:38,974-Speed 3103.04 samples/sec Loss 15.2216 LearningRate 0.0871 Epoch: 1 Global Step: 16550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:42,346-Speed 3038.32 samples/sec Loss 15.3614 LearningRate 0.0871 Epoch: 1 Global Step: 16560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:45,683-Speed 3069.79 samples/sec Loss 15.2647 LearningRate 0.0871 Epoch: 1 Global Step: 16570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:49,032-Speed 3057.84 samples/sec Loss 15.1958 LearningRate 0.0871 Epoch: 1 Global Step: 16580 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:52,371-Speed 3067.99 samples/sec Loss 15.1609 LearningRate 0.0871 Epoch: 1 Global Step: 16590 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:55,657-Speed 3117.97 samples/sec Loss 15.2466 LearningRate 0.0871 Epoch: 1 Global Step: 16600 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:18:58,965-Speed 3096.19 samples/sec Loss 15.2139 LearningRate 0.0871 Epoch: 1 Global Step: 16610 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:02,274-Speed 3095.22 samples/sec Loss 15.2725 LearningRate 0.0871 Epoch: 1 Global Step: 16620 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:05,607-Speed 3073.57 samples/sec Loss 15.2488 LearningRate 0.0871 Epoch: 1 Global Step: 16630 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:08,911-Speed 3099.66 samples/sec Loss 14.9688 LearningRate 0.0871 Epoch: 1 Global Step: 16640 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:12,276-Speed 3044.14 samples/sec Loss 15.2464 LearningRate 0.0870 Epoch: 1 Global Step: 
16650 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:15,625-Speed 3058.07 samples/sec Loss 15.2217 LearningRate 0.0870 Epoch: 1 Global Step: 16660 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:18,951-Speed 3079.63 samples/sec Loss 15.2078 LearningRate 0.0870 Epoch: 1 Global Step: 16670 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:22,307-Speed 3052.65 samples/sec Loss 15.2525 LearningRate 0.0870 Epoch: 1 Global Step: 16680 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:25,714-Speed 3006.88 samples/sec Loss 15.3526 LearningRate 0.0870 Epoch: 1 Global Step: 16690 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:29,014-Speed 3103.53 samples/sec Loss 15.3918 LearningRate 0.0870 Epoch: 1 Global Step: 16700 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:32,305-Speed 3111.78 samples/sec Loss 15.1994 LearningRate 0.0870 Epoch: 1 Global Step: 16710 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:35,658-Speed 3054.91 samples/sec Loss 15.3343 LearningRate 0.0870 Epoch: 1 Global Step: 16720 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:38,994-Speed 3070.46 samples/sec Loss 15.1545 LearningRate 0.0870 Epoch: 1 Global Step: 16730 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:42,340-Speed 3061.47 samples/sec Loss 15.1522 LearningRate 0.0870 Epoch: 1 Global Step: 16740 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:45,657-Speed 3087.96 samples/sec Loss 15.1580 LearningRate 0.0870 Epoch: 1 Global Step: 16750 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:48,913-Speed 3146.14 samples/sec Loss 15.0609 LearningRate 0.0870 Epoch: 1 Global Step: 16760 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:19:52,194-Speed 3121.16 samples/sec Loss 15.3560 LearningRate 0.0870 Epoch: 1 Global Step: 16770 Fp16 Grad Scale: 65536 Required: 
22 hours Training: 2022-04-27 03:19:55,480-Speed 3117.86 samples/sec Loss 15.3299 LearningRate 0.0869 Epoch: 1 Global Step: 16780 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:19:58,761-Speed 3122.21 samples/sec Loss 15.3381 LearningRate 0.0869 Epoch: 1 Global Step: 16790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:02,095-Speed 3071.33 samples/sec Loss 15.1473 LearningRate 0.0869 Epoch: 1 Global Step: 16800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:05,397-Speed 3102.16 samples/sec Loss 15.0855 LearningRate 0.0869 Epoch: 1 Global Step: 16810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:08,666-Speed 3133.36 samples/sec Loss 15.3119 LearningRate 0.0869 Epoch: 1 Global Step: 16820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:11,959-Speed 3110.80 samples/sec Loss 15.2914 LearningRate 0.0869 Epoch: 1 Global Step: 16830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:15,336-Speed 3032.91 samples/sec Loss 15.2223 LearningRate 0.0869 Epoch: 1 Global Step: 16840 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:18,690-Speed 3054.23 samples/sec Loss 15.3601 LearningRate 0.0869 Epoch: 1 Global Step: 16850 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:22,014-Speed 3081.62 samples/sec Loss 15.1712 LearningRate 0.0869 Epoch: 1 Global Step: 16860 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:25,330-Speed 3088.30 samples/sec Loss 15.4292 LearningRate 0.0869 Epoch: 1 Global Step: 16870 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:28,585-Speed 3147.77 samples/sec Loss 15.2004 LearningRate 0.0869 Epoch: 1 Global Step: 16880 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:31,920-Speed 3070.46 samples/sec Loss 15.3529 LearningRate 0.0869 Epoch: 1 Global Step: 16890 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
03:20:35,171-Speed 3151.18 samples/sec Loss 15.3008 LearningRate 0.0869 Epoch: 1 Global Step: 16900 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:20:38,487-Speed 3089.15 samples/sec Loss 15.2541 LearningRate 0.0868 Epoch: 1 Global Step: 16910 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:20:41,787-Speed 3103.90 samples/sec Loss 15.3058 LearningRate 0.0868 Epoch: 1 Global Step: 16920 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:20:45,037-Speed 3151.92 samples/sec Loss 14.9788 LearningRate 0.0868 Epoch: 1 Global Step: 16930 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:20:48,303-Speed 3135.51 samples/sec Loss 15.3806 LearningRate 0.0868 Epoch: 1 Global Step: 16940 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:20:51,620-Speed 3088.92 samples/sec Loss 15.2195 LearningRate 0.0868 Epoch: 1 Global Step: 16950 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:20:54,950-Speed 3075.42 samples/sec Loss 15.3706 LearningRate 0.0868 Epoch: 1 Global Step: 16960 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:20:58,283-Speed 3073.43 samples/sec Loss 15.1653 LearningRate 0.0868 Epoch: 1 Global Step: 16970 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:21:01,552-Speed 3133.76 samples/sec Loss 15.1028 LearningRate 0.0868 Epoch: 1 Global Step: 16980 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:21:04,839-Speed 3115.88 samples/sec Loss 15.2534 LearningRate 0.0868 Epoch: 1 Global Step: 16990 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:21:08,246-Speed 3007.21 samples/sec Loss 15.1494 LearningRate 0.0868 Epoch: 1 Global Step: 17000 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:21:11,629-Speed 3027.47 samples/sec Loss 15.2051 LearningRate 0.0868 Epoch: 1 Global Step: 17010 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:14,944-Speed 3089.62 samples/sec 
Loss 15.2018 LearningRate 0.0868 Epoch: 1 Global Step: 17020 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:18,284-Speed 3066.66 samples/sec Loss 15.3767 LearningRate 0.0868 Epoch: 1 Global Step: 17030 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:21,598-Speed 3091.48 samples/sec Loss 15.1443 LearningRate 0.0868 Epoch: 1 Global Step: 17040 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:24,920-Speed 3082.71 samples/sec Loss 15.2529 LearningRate 0.0867 Epoch: 1 Global Step: 17050 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:28,262-Speed 3065.40 samples/sec Loss 15.0507 LearningRate 0.0867 Epoch: 1 Global Step: 17060 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:31,579-Speed 3088.55 samples/sec Loss 15.1389 LearningRate 0.0867 Epoch: 1 Global Step: 17070 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:34,978-Speed 3013.42 samples/sec Loss 15.2000 LearningRate 0.0867 Epoch: 1 Global Step: 17080 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:38,255-Speed 3125.38 samples/sec Loss 15.1968 LearningRate 0.0867 Epoch: 1 Global Step: 17090 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:41,535-Speed 3122.48 samples/sec Loss 15.0159 LearningRate 0.0867 Epoch: 1 Global Step: 17100 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:44,777-Speed 3160.20 samples/sec Loss 15.3396 LearningRate 0.0867 Epoch: 1 Global Step: 17110 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:48,096-Speed 3085.76 samples/sec Loss 15.2519 LearningRate 0.0867 Epoch: 1 Global Step: 17120 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:51,438-Speed 3064.86 samples/sec Loss 15.3089 LearningRate 0.0867 Epoch: 1 Global Step: 17130 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:54,697-Speed 3142.98 samples/sec Loss 15.1090 LearningRate 
0.0867 Epoch: 1 Global Step: 17140 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:21:57,991-Speed 3109.22 samples/sec Loss 15.2090 LearningRate 0.0867 Epoch: 1 Global Step: 17150 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:01,329-Speed 3069.03 samples/sec Loss 15.1798 LearningRate 0.0867 Epoch: 1 Global Step: 17160 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:04,610-Speed 3121.99 samples/sec Loss 15.2514 LearningRate 0.0867 Epoch: 1 Global Step: 17170 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:07,926-Speed 3088.26 samples/sec Loss 15.1428 LearningRate 0.0866 Epoch: 1 Global Step: 17180 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:11,328-Speed 3011.41 samples/sec Loss 15.2615 LearningRate 0.0866 Epoch: 1 Global Step: 17190 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:14,694-Speed 3043.19 samples/sec Loss 15.0915 LearningRate 0.0866 Epoch: 1 Global Step: 17200 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:18,035-Speed 3065.92 samples/sec Loss 15.2755 LearningRate 0.0866 Epoch: 1 Global Step: 17210 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:21,347-Speed 3092.97 samples/sec Loss 15.2038 LearningRate 0.0866 Epoch: 1 Global Step: 17220 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:24,642-Speed 3108.83 samples/sec Loss 15.2295 LearningRate 0.0866 Epoch: 1 Global Step: 17230 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:27,898-Speed 3145.63 samples/sec Loss 15.1331 LearningRate 0.0866 Epoch: 1 Global Step: 17240 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:22:31,197-Speed 3104.60 samples/sec Loss 15.1442 LearningRate 0.0866 Epoch: 1 Global Step: 17250 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:34,542-Speed 3062.73 samples/sec Loss 15.3272 LearningRate 0.0866 Epoch: 1 Global Step: 17260 Fp16 
Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:37,809-Speed 3134.34 samples/sec Loss 15.2063 LearningRate 0.0866 Epoch: 1 Global Step: 17270 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:41,130-Speed 3090.13 samples/sec Loss 15.1389 LearningRate 0.0866 Epoch: 1 Global Step: 17280 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:44,511-Speed 3029.76 samples/sec Loss 15.1556 LearningRate 0.0866 Epoch: 1 Global Step: 17290 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:47,826-Speed 3090.01 samples/sec Loss 15.3561 LearningRate 0.0866 Epoch: 1 Global Step: 17300 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:51,141-Speed 3089.76 samples/sec Loss 15.2250 LearningRate 0.0865 Epoch: 1 Global Step: 17310 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:54,439-Speed 3105.74 samples/sec Loss 15.1090 LearningRate 0.0865 Epoch: 1 Global Step: 17320 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:22:57,871-Speed 2983.88 samples/sec Loss 15.1134 LearningRate 0.0865 Epoch: 1 Global Step: 17330 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:01,288-Speed 2998.36 samples/sec Loss 15.1344 LearningRate 0.0865 Epoch: 1 Global Step: 17340 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:04,709-Speed 2993.40 samples/sec Loss 15.2363 LearningRate 0.0865 Epoch: 1 Global Step: 17350 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 03:23:08,014-Speed 3099.81 samples/sec Loss 15.1752 LearningRate 0.0865 Epoch: 1 Global Step: 17360 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:11,366-Speed 3055.86 samples/sec Loss 15.2173 LearningRate 0.0865 Epoch: 1 Global Step: 17370 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:14,685-Speed 3086.14 samples/sec Loss 15.0116 LearningRate 0.0865 Epoch: 1 Global Step: 17380 Fp16 Grad Scale: 131072 Required: 22 
hours Training: 2022-04-27 03:23:17,983-Speed 3105.94 samples/sec Loss 15.1681 LearningRate 0.0865 Epoch: 1 Global Step: 17390 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:21,315-Speed 3074.16 samples/sec Loss 15.0919 LearningRate 0.0865 Epoch: 1 Global Step: 17400 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:24,664-Speed 3058.18 samples/sec Loss 15.1449 LearningRate 0.0865 Epoch: 1 Global Step: 17410 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:28,008-Speed 3063.33 samples/sec Loss 15.0936 LearningRate 0.0865 Epoch: 1 Global Step: 17420 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:31,390-Speed 3028.43 samples/sec Loss 15.1762 LearningRate 0.0865 Epoch: 1 Global Step: 17430 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:34,738-Speed 3060.02 samples/sec Loss 15.3984 LearningRate 0.0865 Epoch: 1 Global Step: 17440 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:38,014-Speed 3126.86 samples/sec Loss 15.2022 LearningRate 0.0864 Epoch: 1 Global Step: 17450 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:41,348-Speed 3071.69 samples/sec Loss 15.1178 LearningRate 0.0864 Epoch: 1 Global Step: 17460 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:44,654-Speed 3098.52 samples/sec Loss 15.1743 LearningRate 0.0864 Epoch: 1 Global Step: 17470 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:48,060-Speed 3007.28 samples/sec Loss 14.9733 LearningRate 0.0864 Epoch: 1 Global Step: 17480 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:51,477-Speed 2998.40 samples/sec Loss 15.2495 LearningRate 0.0864 Epoch: 1 Global Step: 17490 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:23:54,810-Speed 3072.84 samples/sec Loss 15.1854 LearningRate 0.0864 Epoch: 1 Global Step: 17500 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 
03:23:58,176-Speed 3043.14 samples/sec Loss 15.1956 LearningRate 0.0864 Epoch: 1 Global Step: 17510 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:01,483-Speed 3097.62 samples/sec Loss 15.2704 LearningRate 0.0864 Epoch: 1 Global Step: 17520 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:04,760-Speed 3125.59 samples/sec Loss 15.2272 LearningRate 0.0864 Epoch: 1 Global Step: 17530 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:08,161-Speed 3011.88 samples/sec Loss 15.2319 LearningRate 0.0864 Epoch: 1 Global Step: 17540 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:11,601-Speed 2976.76 samples/sec Loss 15.0704 LearningRate 0.0864 Epoch: 1 Global Step: 17550 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:14,914-Speed 3092.22 samples/sec Loss 15.2791 LearningRate 0.0864 Epoch: 1 Global Step: 17560 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:18,276-Speed 3046.40 samples/sec Loss 15.0863 LearningRate 0.0864 Epoch: 1 Global Step: 17570 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:21,588-Speed 3093.88 samples/sec Loss 15.1523 LearningRate 0.0863 Epoch: 1 Global Step: 17580 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:24,897-Speed 3095.27 samples/sec Loss 15.2778 LearningRate 0.0863 Epoch: 1 Global Step: 17590 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:28,246-Speed 3058.43 samples/sec Loss 15.0739 LearningRate 0.0863 Epoch: 1 Global Step: 17600 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:31,616-Speed 3039.69 samples/sec Loss 15.1572 LearningRate 0.0863 Epoch: 1 Global Step: 17610 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:34,962-Speed 3060.32 samples/sec Loss 15.0546 LearningRate 0.0863 Epoch: 1 Global Step: 17620 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:38,266-Speed 3100.86 
samples/sec Loss 15.1887 LearningRate 0.0863 Epoch: 1 Global Step: 17630 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:41,550-Speed 3118.82 samples/sec Loss 14.9595 LearningRate 0.0863 Epoch: 1 Global Step: 17640 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:44,861-Speed 3093.50 samples/sec Loss 15.1884 LearningRate 0.0863 Epoch: 1 Global Step: 17650 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:48,144-Speed 3120.06 samples/sec Loss 15.2785 LearningRate 0.0863 Epoch: 1 Global Step: 17660 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:51,454-Speed 3094.35 samples/sec Loss 15.0099 LearningRate 0.0863 Epoch: 1 Global Step: 17670 Fp16 Grad Scale: 65536 Required: 22 hours Training: 2022-04-27 03:24:54,770-Speed 3089.35 samples/sec Loss 14.9926 LearningRate 0.0863 Epoch: 1 Global Step: 17680 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:24:58,060-Speed 3114.22 samples/sec Loss 15.0618 LearningRate 0.0863 Epoch: 1 Global Step: 17690 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:01,352-Speed 3111.53 samples/sec Loss 15.1348 LearningRate 0.0863 Epoch: 1 Global Step: 17700 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:04,692-Speed 3066.63 samples/sec Loss 14.9178 LearningRate 0.0863 Epoch: 1 Global Step: 17710 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:08,017-Speed 3080.25 samples/sec Loss 14.9843 LearningRate 0.0862 Epoch: 1 Global Step: 17720 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:11,332-Speed 3089.88 samples/sec Loss 15.2903 LearningRate 0.0862 Epoch: 1 Global Step: 17730 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:14,637-Speed 3099.26 samples/sec Loss 15.1036 LearningRate 0.0862 Epoch: 1 Global Step: 17740 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:17,915-Speed 3124.73 samples/sec Loss 15.0109 
LearningRate 0.0862 Epoch: 1 Global Step: 17750 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:21,256-Speed 3065.93 samples/sec Loss 15.0815 LearningRate 0.0862 Epoch: 1 Global Step: 17760 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:24,579-Speed 3082.17 samples/sec Loss 15.2312 LearningRate 0.0862 Epoch: 1 Global Step: 17770 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:27,927-Speed 3059.76 samples/sec Loss 15.1871 LearningRate 0.0862 Epoch: 1 Global Step: 17780 Fp16 Grad Scale: 262144 Required: 22 hours Training: 2022-04-27 03:25:31,200-Speed 3129.10 samples/sec Loss 15.2028 LearningRate 0.0862 Epoch: 1 Global Step: 17790 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:34,493-Speed 3110.98 samples/sec Loss 15.3953 LearningRate 0.0862 Epoch: 1 Global Step: 17800 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:37,745-Speed 3149.50 samples/sec Loss 15.0557 LearningRate 0.0862 Epoch: 1 Global Step: 17810 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:41,032-Speed 3117.50 samples/sec Loss 14.8698 LearningRate 0.0862 Epoch: 1 Global Step: 17820 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:44,321-Speed 3114.29 samples/sec Loss 15.0983 LearningRate 0.0862 Epoch: 1 Global Step: 17830 Fp16 Grad Scale: 131072 Required: 22 hours Training: 2022-04-27 03:25:47,616-Speed 3108.76 samples/sec Loss 14.9814 LearningRate 0.0862 Epoch: 1 Global Step: 17840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:25:50,980-Speed 3044.72 samples/sec Loss 15.1747 LearningRate 0.0861 Epoch: 1 Global Step: 17850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:25:54,305-Speed 3081.33 samples/sec Loss 15.1243 LearningRate 0.0861 Epoch: 1 Global Step: 17860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:25:57,623-Speed 3089.12 samples/sec Loss 15.0665 LearningRate 0.0861 Epoch: 1 
Global Step: 17870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:00,951-Speed 3078.18 samples/sec Loss 15.0478 LearningRate 0.0861 Epoch: 1 Global Step: 17880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:04,288-Speed 3068.94 samples/sec Loss 15.0583 LearningRate 0.0861 Epoch: 1 Global Step: 17890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:07,645-Speed 3050.92 samples/sec Loss 15.1290 LearningRate 0.0861 Epoch: 1 Global Step: 17900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:11,049-Speed 3009.78 samples/sec Loss 15.1210 LearningRate 0.0861 Epoch: 1 Global Step: 17910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:14,398-Speed 3057.95 samples/sec Loss 14.9979 LearningRate 0.0861 Epoch: 1 Global Step: 17920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:17,765-Speed 3042.15 samples/sec Loss 15.0156 LearningRate 0.0861 Epoch: 1 Global Step: 17930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:21,078-Speed 3092.27 samples/sec Loss 15.1855 LearningRate 0.0861 Epoch: 1 Global Step: 17940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:24,382-Speed 3100.37 samples/sec Loss 15.0410 LearningRate 0.0861 Epoch: 1 Global Step: 17950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:27,726-Speed 3062.79 samples/sec Loss 14.9936 LearningRate 0.0861 Epoch: 1 Global Step: 17960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:31,040-Speed 3090.87 samples/sec Loss 15.0221 LearningRate 0.0861 Epoch: 1 Global Step: 17970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:34,409-Speed 3040.61 samples/sec Loss 15.1371 LearningRate 0.0860 Epoch: 1 Global Step: 17980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:37,676-Speed 3135.11 samples/sec Loss 14.9001 LearningRate 0.0860 Epoch: 1 Global Step: 17990 Fp16 Grad 
Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:41,020-Speed 3063.25 samples/sec Loss 15.0990 LearningRate 0.0860 Epoch: 1 Global Step: 18000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:44,384-Speed 3044.52 samples/sec Loss 15.1012 LearningRate 0.0860 Epoch: 1 Global Step: 18010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:47,717-Speed 3073.30 samples/sec Loss 14.9493 LearningRate 0.0860 Epoch: 1 Global Step: 18020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:51,112-Speed 3017.21 samples/sec Loss 15.0398 LearningRate 0.0860 Epoch: 1 Global Step: 18030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:54,470-Speed 3050.20 samples/sec Loss 15.0873 LearningRate 0.0860 Epoch: 1 Global Step: 18040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:26:57,766-Speed 3107.60 samples/sec Loss 15.0193 LearningRate 0.0860 Epoch: 1 Global Step: 18050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:01,122-Speed 3052.89 samples/sec Loss 15.0197 LearningRate 0.0860 Epoch: 1 Global Step: 18060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:04,460-Speed 3068.42 samples/sec Loss 14.9366 LearningRate 0.0860 Epoch: 1 Global Step: 18070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:07,762-Speed 3101.85 samples/sec Loss 15.0481 LearningRate 0.0860 Epoch: 1 Global Step: 18080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:11,158-Speed 3016.49 samples/sec Loss 15.0849 LearningRate 0.0860 Epoch: 1 Global Step: 18090 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:27:14,509-Speed 3056.86 samples/sec Loss 14.9035 LearningRate 0.0860 Epoch: 1 Global Step: 18100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:17,836-Speed 3078.75 samples/sec Loss 15.1691 LearningRate 0.0860 Epoch: 1 Global Step: 18110 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-04-27 03:27:21,157-Speed 3083.93 samples/sec Loss 15.0879 LearningRate 0.0859 Epoch: 1 Global Step: 18120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:24,551-Speed 3018.20 samples/sec Loss 15.0547 LearningRate 0.0859 Epoch: 1 Global Step: 18130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:27,938-Speed 3024.34 samples/sec Loss 14.8966 LearningRate 0.0859 Epoch: 1 Global Step: 18140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:31,289-Speed 3056.57 samples/sec Loss 14.9849 LearningRate 0.0859 Epoch: 1 Global Step: 18150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:34,607-Speed 3087.23 samples/sec Loss 14.9674 LearningRate 0.0859 Epoch: 1 Global Step: 18160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:37,935-Speed 3077.99 samples/sec Loss 15.0055 LearningRate 0.0859 Epoch: 1 Global Step: 18170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:41,297-Speed 3047.13 samples/sec Loss 14.9231 LearningRate 0.0859 Epoch: 1 Global Step: 18180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:44,621-Speed 3081.15 samples/sec Loss 14.9271 LearningRate 0.0859 Epoch: 1 Global Step: 18190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:47,947-Speed 3079.54 samples/sec Loss 15.0827 LearningRate 0.0859 Epoch: 1 Global Step: 18200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:51,319-Speed 3038.22 samples/sec Loss 14.9354 LearningRate 0.0859 Epoch: 1 Global Step: 18210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:54,699-Speed 3030.18 samples/sec Loss 14.8800 LearningRate 0.0859 Epoch: 1 Global Step: 18220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:27:58,023-Speed 3084.43 samples/sec Loss 14.9808 LearningRate 0.0859 Epoch: 1 Global Step: 18230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
03:28:01,363-Speed 3066.21 samples/sec Loss 15.0074 LearningRate 0.0859 Epoch: 1 Global Step: 18240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:04,723-Speed 3048.52 samples/sec Loss 15.1408 LearningRate 0.0858 Epoch: 1 Global Step: 18250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:08,074-Speed 3056.98 samples/sec Loss 15.1166 LearningRate 0.0858 Epoch: 1 Global Step: 18260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:11,420-Speed 3061.59 samples/sec Loss 15.0706 LearningRate 0.0858 Epoch: 1 Global Step: 18270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:14,797-Speed 3032.79 samples/sec Loss 14.8525 LearningRate 0.0858 Epoch: 1 Global Step: 18280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:18,191-Speed 3017.84 samples/sec Loss 15.0327 LearningRate 0.0858 Epoch: 1 Global Step: 18290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:21,470-Speed 3124.53 samples/sec Loss 15.1405 LearningRate 0.0858 Epoch: 1 Global Step: 18300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:24,760-Speed 3113.16 samples/sec Loss 14.9043 LearningRate 0.0858 Epoch: 1 Global Step: 18310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:28,074-Speed 3090.06 samples/sec Loss 15.0298 LearningRate 0.0858 Epoch: 1 Global Step: 18320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:28:31,334-Speed 3143.75 samples/sec Loss 14.9682 LearningRate 0.0858 Epoch: 1 Global Step: 18330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:34,744-Speed 3003.25 samples/sec Loss 15.2191 LearningRate 0.0858 Epoch: 1 Global Step: 18340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:38,085-Speed 3065.29 samples/sec Loss 15.0235 LearningRate 0.0858 Epoch: 1 Global Step: 18350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:41,416-Speed 3075.89 
samples/sec Loss 14.9092 LearningRate 0.0858 Epoch: 1 Global Step: 18360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:44,761-Speed 3062.14 samples/sec Loss 15.0771 LearningRate 0.0858 Epoch: 1 Global Step: 18370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:48,140-Speed 3031.30 samples/sec Loss 14.8969 LearningRate 0.0857 Epoch: 1 Global Step: 18380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:51,454-Speed 3090.87 samples/sec Loss 14.8869 LearningRate 0.0857 Epoch: 1 Global Step: 18390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:54,855-Speed 3011.37 samples/sec Loss 14.8827 LearningRate 0.0857 Epoch: 1 Global Step: 18400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:28:58,199-Speed 3063.35 samples/sec Loss 14.8504 LearningRate 0.0857 Epoch: 1 Global Step: 18410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:29:01,556-Speed 3051.12 samples/sec Loss 15.0340 LearningRate 0.0857 Epoch: 1 Global Step: 18420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:29:04,872-Speed 3089.59 samples/sec Loss 15.0381 LearningRate 0.0857 Epoch: 1 Global Step: 18430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:08,163-Speed 3111.99 samples/sec Loss 14.9338 LearningRate 0.0857 Epoch: 1 Global Step: 18440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:11,477-Speed 3091.07 samples/sec Loss 15.0345 LearningRate 0.0857 Epoch: 1 Global Step: 18450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:14,836-Speed 3049.74 samples/sec Loss 15.0474 LearningRate 0.0857 Epoch: 1 Global Step: 18460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:18,189-Speed 3054.13 samples/sec Loss 14.8443 LearningRate 0.0857 Epoch: 1 Global Step: 18470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:21,512-Speed 3082.59 samples/sec Loss 14.9975 LearningRate 
0.0857 Epoch: 1 Global Step: 18480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:24,783-Speed 3131.47 samples/sec Loss 14.9349 LearningRate 0.0857 Epoch: 1 Global Step: 18490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:28,143-Speed 3048.23 samples/sec Loss 15.1132 LearningRate 0.0857 Epoch: 1 Global Step: 18500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:31,497-Speed 3055.16 samples/sec Loss 14.9985 LearningRate 0.0857 Epoch: 1 Global Step: 18510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:34,800-Speed 3101.04 samples/sec Loss 15.0951 LearningRate 0.0856 Epoch: 1 Global Step: 18520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:38,063-Speed 3138.79 samples/sec Loss 15.0024 LearningRate 0.0856 Epoch: 1 Global Step: 18530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:41,393-Speed 3076.13 samples/sec Loss 14.8511 LearningRate 0.0856 Epoch: 1 Global Step: 18540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:44,725-Speed 3074.48 samples/sec Loss 15.0339 LearningRate 0.0856 Epoch: 1 Global Step: 18550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:48,026-Speed 3102.53 samples/sec Loss 15.1566 LearningRate 0.0856 Epoch: 1 Global Step: 18560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:51,361-Speed 3071.41 samples/sec Loss 14.9229 LearningRate 0.0856 Epoch: 1 Global Step: 18570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:54,676-Speed 3090.10 samples/sec Loss 14.9437 LearningRate 0.0856 Epoch: 1 Global Step: 18580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:29:58,059-Speed 3027.91 samples/sec Loss 15.1181 LearningRate 0.0856 Epoch: 1 Global Step: 18590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:01,413-Speed 3053.72 samples/sec Loss 15.0051 LearningRate 0.0856 Epoch: 1 Global Step: 
18600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:04,732-Speed 3086.79 samples/sec Loss 14.9150 LearningRate 0.0856 Epoch: 1 Global Step: 18610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:08,019-Speed 3116.49 samples/sec Loss 15.0041 LearningRate 0.0856 Epoch: 1 Global Step: 18620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:11,333-Speed 3090.40 samples/sec Loss 15.0693 LearningRate 0.0856 Epoch: 1 Global Step: 18630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:14,665-Speed 3073.99 samples/sec Loss 14.9668 LearningRate 0.0856 Epoch: 1 Global Step: 18640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:17,932-Speed 3135.51 samples/sec Loss 15.0625 LearningRate 0.0855 Epoch: 1 Global Step: 18650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:21,195-Speed 3139.39 samples/sec Loss 15.0575 LearningRate 0.0855 Epoch: 1 Global Step: 18660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:24,587-Speed 3019.26 samples/sec Loss 15.0600 LearningRate 0.0855 Epoch: 1 Global Step: 18670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:27,876-Speed 3114.73 samples/sec Loss 14.8878 LearningRate 0.0855 Epoch: 1 Global Step: 18680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:30:31,133-Speed 3145.38 samples/sec Loss 14.8290 LearningRate 0.0855 Epoch: 1 Global Step: 18690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:34,458-Speed 3080.38 samples/sec Loss 14.9722 LearningRate 0.0855 Epoch: 1 Global Step: 18700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:37,837-Speed 3031.29 samples/sec Loss 14.9493 LearningRate 0.0855 Epoch: 1 Global Step: 18710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:41,190-Speed 3055.04 samples/sec Loss 14.9122 LearningRate 0.0855 Epoch: 1 Global Step: 18720 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-27 03:30:44,486-Speed 3107.30 samples/sec Loss 15.0004 LearningRate 0.0855 Epoch: 1 Global Step: 18730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:47,746-Speed 3142.35 samples/sec Loss 14.8838 LearningRate 0.0855 Epoch: 1 Global Step: 18740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:51,082-Speed 3070.54 samples/sec Loss 15.0348 LearningRate 0.0855 Epoch: 1 Global Step: 18750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:54,470-Speed 3023.05 samples/sec Loss 14.8749 LearningRate 0.0855 Epoch: 1 Global Step: 18760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:30:57,825-Speed 3053.04 samples/sec Loss 15.0023 LearningRate 0.0855 Epoch: 1 Global Step: 18770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:01,136-Speed 3093.72 samples/sec Loss 15.0971 LearningRate 0.0855 Epoch: 1 Global Step: 18780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:04,470-Speed 3072.36 samples/sec Loss 14.9743 LearningRate 0.0854 Epoch: 1 Global Step: 18790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:31:07,778-Speed 3096.13 samples/sec Loss 15.0151 LearningRate 0.0854 Epoch: 1 Global Step: 18800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:31:11,122-Speed 3063.59 samples/sec Loss 14.9366 LearningRate 0.0854 Epoch: 1 Global Step: 18810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:31:14,452-Speed 3076.04 samples/sec Loss 14.9397 LearningRate 0.0854 Epoch: 1 Global Step: 18820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:17,803-Speed 3056.53 samples/sec Loss 14.8274 LearningRate 0.0854 Epoch: 1 Global Step: 18830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:21,071-Speed 3134.67 samples/sec Loss 14.9237 LearningRate 0.0854 Epoch: 1 Global Step: 18840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
03:31:24,379-Speed 3095.93 samples/sec Loss 14.8501 LearningRate 0.0854 Epoch: 1 Global Step: 18850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:27,659-Speed 3123.34 samples/sec Loss 14.9231 LearningRate 0.0854 Epoch: 1 Global Step: 18860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:30,964-Speed 3099.52 samples/sec Loss 14.8360 LearningRate 0.0854 Epoch: 1 Global Step: 18870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:34,281-Speed 3087.42 samples/sec Loss 14.9616 LearningRate 0.0854 Epoch: 1 Global Step: 18880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:37,584-Speed 3102.08 samples/sec Loss 14.8942 LearningRate 0.0854 Epoch: 1 Global Step: 18890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:40,926-Speed 3064.16 samples/sec Loss 15.0211 LearningRate 0.0854 Epoch: 1 Global Step: 18900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:44,204-Speed 3125.46 samples/sec Loss 14.8980 LearningRate 0.0854 Epoch: 1 Global Step: 18910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:31:47,602-Speed 3014.13 samples/sec Loss 14.9933 LearningRate 0.0853 Epoch: 1 Global Step: 18920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:31:50,936-Speed 3072.41 samples/sec Loss 14.8690 LearningRate 0.0853 Epoch: 1 Global Step: 18930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:31:54,281-Speed 3061.65 samples/sec Loss 14.9228 LearningRate 0.0853 Epoch: 1 Global Step: 18940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:31:57,574-Speed 3110.83 samples/sec Loss 14.7758 LearningRate 0.0853 Epoch: 1 Global Step: 18950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:00,923-Speed 3057.91 samples/sec Loss 14.9349 LearningRate 0.0853 Epoch: 1 Global Step: 18960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:04,176-Speed 3149.27 samples/sec 
Loss 14.9482 LearningRate 0.0853 Epoch: 1 Global Step: 18970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:07,484-Speed 3096.30 samples/sec Loss 15.0302 LearningRate 0.0853 Epoch: 1 Global Step: 18980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:10,851-Speed 3041.87 samples/sec Loss 14.7718 LearningRate 0.0853 Epoch: 1 Global Step: 18990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:14,151-Speed 3104.33 samples/sec Loss 14.7778 LearningRate 0.0853 Epoch: 1 Global Step: 19000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:17,493-Speed 3064.74 samples/sec Loss 14.7721 LearningRate 0.0853 Epoch: 1 Global Step: 19010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:20,803-Speed 3094.95 samples/sec Loss 14.9008 LearningRate 0.0853 Epoch: 1 Global Step: 19020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:24,102-Speed 3104.30 samples/sec Loss 14.9968 LearningRate 0.0853 Epoch: 1 Global Step: 19030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:27,391-Speed 3114.42 samples/sec Loss 14.8195 LearningRate 0.0853 Epoch: 1 Global Step: 19040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:30,708-Speed 3087.88 samples/sec Loss 15.0174 LearningRate 0.0853 Epoch: 1 Global Step: 19050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:34,003-Speed 3108.21 samples/sec Loss 14.8088 LearningRate 0.0852 Epoch: 1 Global Step: 19060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:37,378-Speed 3035.34 samples/sec Loss 15.0365 LearningRate 0.0852 Epoch: 1 Global Step: 19070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:40,695-Speed 3087.99 samples/sec Loss 14.8275 LearningRate 0.0852 Epoch: 1 Global Step: 19080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:44,034-Speed 3067.02 samples/sec Loss 14.9582 LearningRate 0.0852 
Epoch: 1 Global Step: 19090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:47,351-Speed 3088.06 samples/sec Loss 14.8908 LearningRate 0.0852 Epoch: 1 Global Step: 19100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:50,650-Speed 3105.13 samples/sec Loss 14.9067 LearningRate 0.0852 Epoch: 1 Global Step: 19110 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:32:53,993-Speed 3063.98 samples/sec Loss 14.8106 LearningRate 0.0852 Epoch: 1 Global Step: 19120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:32:57,333-Speed 3066.58 samples/sec Loss 14.7105 LearningRate 0.0852 Epoch: 1 Global Step: 19130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:00,624-Speed 3112.93 samples/sec Loss 14.8180 LearningRate 0.0852 Epoch: 1 Global Step: 19140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:03,954-Speed 3075.43 samples/sec Loss 14.9042 LearningRate 0.0852 Epoch: 1 Global Step: 19150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:07,260-Speed 3099.00 samples/sec Loss 14.8209 LearningRate 0.0852 Epoch: 1 Global Step: 19160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:10,638-Speed 3031.36 samples/sec Loss 14.8565 LearningRate 0.0852 Epoch: 1 Global Step: 19170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:14,042-Speed 3009.49 samples/sec Loss 14.9181 LearningRate 0.0852 Epoch: 1 Global Step: 19180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:17,365-Speed 3082.12 samples/sec Loss 14.7314 LearningRate 0.0851 Epoch: 1 Global Step: 19190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:20,618-Speed 3149.18 samples/sec Loss 14.8663 LearningRate 0.0851 Epoch: 1 Global Step: 19200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:23,931-Speed 3091.05 samples/sec Loss 14.8037 LearningRate 0.0851 Epoch: 1 Global Step: 19210 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:27,234-Speed 3101.20 samples/sec Loss 15.0185 LearningRate 0.0851 Epoch: 1 Global Step: 19220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:30,521-Speed 3116.47 samples/sec Loss 14.8193 LearningRate 0.0851 Epoch: 1 Global Step: 19230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:33,868-Speed 3060.29 samples/sec Loss 14.6785 LearningRate 0.0851 Epoch: 1 Global Step: 19240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:37,151-Speed 3120.15 samples/sec Loss 15.0391 LearningRate 0.0851 Epoch: 1 Global Step: 19250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:33:40,480-Speed 3076.33 samples/sec Loss 14.8239 LearningRate 0.0851 Epoch: 1 Global Step: 19260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:43,788-Speed 3097.07 samples/sec Loss 14.7861 LearningRate 0.0851 Epoch: 1 Global Step: 19270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:47,121-Speed 3072.72 samples/sec Loss 14.8338 LearningRate 0.0851 Epoch: 1 Global Step: 19280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:50,444-Speed 3082.31 samples/sec Loss 14.8809 LearningRate 0.0851 Epoch: 1 Global Step: 19290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:53,720-Speed 3126.40 samples/sec Loss 14.8438 LearningRate 0.0851 Epoch: 1 Global Step: 19300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:33:56,995-Speed 3127.63 samples/sec Loss 14.7821 LearningRate 0.0851 Epoch: 1 Global Step: 19310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:00,283-Speed 3115.65 samples/sec Loss 14.9898 LearningRate 0.0851 Epoch: 1 Global Step: 19320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:03,567-Speed 3119.00 samples/sec Loss 14.8933 LearningRate 0.0850 Epoch: 1 Global Step: 19330 Fp16 Grad Scale: 131072 Required: 21 hours 
Training: 2022-04-27 03:34:06,931-Speed 3044.17 samples/sec Loss 14.9272 LearningRate 0.0850 Epoch: 1 Global Step: 19340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:10,246-Speed 3090.43 samples/sec Loss 14.8206 LearningRate 0.0850 Epoch: 1 Global Step: 19350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:13,519-Speed 3129.70 samples/sec Loss 14.8624 LearningRate 0.0850 Epoch: 1 Global Step: 19360 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:34:16,837-Speed 3087.24 samples/sec Loss 14.7145 LearningRate 0.0850 Epoch: 1 Global Step: 19370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:20,162-Speed 3079.88 samples/sec Loss 14.8525 LearningRate 0.0850 Epoch: 1 Global Step: 19380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:23,531-Speed 3040.74 samples/sec Loss 14.8709 LearningRate 0.0850 Epoch: 1 Global Step: 19390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:26,796-Speed 3137.43 samples/sec Loss 14.8363 LearningRate 0.0850 Epoch: 1 Global Step: 19400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:30,073-Speed 3125.94 samples/sec Loss 14.7324 LearningRate 0.0850 Epoch: 1 Global Step: 19410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:33,379-Speed 3098.20 samples/sec Loss 14.7105 LearningRate 0.0850 Epoch: 1 Global Step: 19420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:36,668-Speed 3114.51 samples/sec Loss 14.8278 LearningRate 0.0850 Epoch: 1 Global Step: 19430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:39,998-Speed 3075.58 samples/sec Loss 14.8486 LearningRate 0.0850 Epoch: 1 Global Step: 19440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:43,268-Speed 3132.62 samples/sec Loss 14.7592 LearningRate 0.0850 Epoch: 1 Global Step: 19450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
03:34:46,544-Speed 3126.86 samples/sec Loss 14.8965 LearningRate 0.0849 Epoch: 1 Global Step: 19460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:49,791-Speed 3154.33 samples/sec Loss 14.8942 LearningRate 0.0849 Epoch: 1 Global Step: 19470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:53,057-Speed 3135.82 samples/sec Loss 14.8302 LearningRate 0.0849 Epoch: 1 Global Step: 19480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:56,347-Speed 3113.51 samples/sec Loss 14.8247 LearningRate 0.0849 Epoch: 1 Global Step: 19490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:34:59,649-Speed 3102.36 samples/sec Loss 14.7786 LearningRate 0.0849 Epoch: 1 Global Step: 19500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:02,939-Speed 3112.42 samples/sec Loss 14.7231 LearningRate 0.0849 Epoch: 1 Global Step: 19510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:06,316-Speed 3033.21 samples/sec Loss 14.7675 LearningRate 0.0849 Epoch: 1 Global Step: 19520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:09,669-Speed 3056.77 samples/sec Loss 14.8580 LearningRate 0.0849 Epoch: 1 Global Step: 19530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:12,978-Speed 3096.03 samples/sec Loss 14.6900 LearningRate 0.0849 Epoch: 1 Global Step: 19540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:16,253-Speed 3128.03 samples/sec Loss 14.7729 LearningRate 0.0849 Epoch: 1 Global Step: 19550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:19,523-Speed 3132.13 samples/sec Loss 14.6372 LearningRate 0.0849 Epoch: 1 Global Step: 19560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:22,785-Speed 3140.27 samples/sec Loss 14.8490 LearningRate 0.0849 Epoch: 1 Global Step: 19570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:26,143-Speed 3050.26 
samples/sec Loss 14.7617 LearningRate 0.0849 Epoch: 1 Global Step: 19580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:29,502-Speed 3049.15 samples/sec Loss 14.8071 LearningRate 0.0849 Epoch: 1 Global Step: 19590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:32,856-Speed 3053.39 samples/sec Loss 14.7556 LearningRate 0.0848 Epoch: 1 Global Step: 19600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:36,159-Speed 3101.14 samples/sec Loss 14.6961 LearningRate 0.0848 Epoch: 1 Global Step: 19610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:39,475-Speed 3088.90 samples/sec Loss 14.7369 LearningRate 0.0848 Epoch: 1 Global Step: 19620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:42,735-Speed 3142.54 samples/sec Loss 14.8257 LearningRate 0.0848 Epoch: 1 Global Step: 19630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:46,041-Speed 3097.71 samples/sec Loss 14.8930 LearningRate 0.0848 Epoch: 1 Global Step: 19640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:49,380-Speed 3068.01 samples/sec Loss 14.7809 LearningRate 0.0848 Epoch: 1 Global Step: 19650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:52,705-Speed 3080.47 samples/sec Loss 14.8180 LearningRate 0.0848 Epoch: 1 Global Step: 19660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:35:55,978-Speed 3129.37 samples/sec Loss 14.8247 LearningRate 0.0848 Epoch: 1 Global Step: 19670 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:35:59,292-Speed 3090.62 samples/sec Loss 14.9389 LearningRate 0.0848 Epoch: 1 Global Step: 19680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:02,592-Speed 3104.67 samples/sec Loss 14.7493 LearningRate 0.0848 Epoch: 1 Global Step: 19690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:05,899-Speed 3096.91 samples/sec Loss 14.6787 
LearningRate 0.0848 Epoch: 1 Global Step: 19700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:09,181-Speed 3121.38 samples/sec Loss 14.7249 LearningRate 0.0848 Epoch: 1 Global Step: 19710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:12,476-Speed 3108.90 samples/sec Loss 14.8554 LearningRate 0.0848 Epoch: 1 Global Step: 19720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:15,810-Speed 3071.94 samples/sec Loss 14.8379 LearningRate 0.0847 Epoch: 1 Global Step: 19730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:19,102-Speed 3111.51 samples/sec Loss 14.6916 LearningRate 0.0847 Epoch: 1 Global Step: 19740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:22,387-Speed 3117.69 samples/sec Loss 14.7279 LearningRate 0.0847 Epoch: 1 Global Step: 19750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:25,688-Speed 3103.34 samples/sec Loss 14.8611 LearningRate 0.0847 Epoch: 1 Global Step: 19760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:29,034-Speed 3060.97 samples/sec Loss 14.7262 LearningRate 0.0847 Epoch: 1 Global Step: 19770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:32,316-Speed 3120.90 samples/sec Loss 14.6684 LearningRate 0.0847 Epoch: 1 Global Step: 19780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:35,633-Speed 3088.77 samples/sec Loss 14.7746 LearningRate 0.0847 Epoch: 1 Global Step: 19790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:39,041-Speed 3004.97 samples/sec Loss 14.7466 LearningRate 0.0847 Epoch: 1 Global Step: 19800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:42,333-Speed 3111.30 samples/sec Loss 14.7719 LearningRate 0.0847 Epoch: 1 Global Step: 19810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:45,595-Speed 3140.11 samples/sec Loss 14.8382 LearningRate 0.0847 Epoch: 1 Global 
Step: 19820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:48,923-Speed 3078.66 samples/sec Loss 14.9610 LearningRate 0.0847 Epoch: 1 Global Step: 19830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:52,277-Speed 3054.41 samples/sec Loss 14.6003 LearningRate 0.0847 Epoch: 1 Global Step: 19840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:36:55,573-Speed 3107.43 samples/sec Loss 14.7075 LearningRate 0.0847 Epoch: 1 Global Step: 19850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:36:58,849-Speed 3127.41 samples/sec Loss 14.9352 LearningRate 0.0847 Epoch: 1 Global Step: 19860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:02,263-Speed 2999.75 samples/sec Loss 14.7903 LearningRate 0.0846 Epoch: 1 Global Step: 19870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:05,597-Speed 3072.35 samples/sec Loss 14.7385 LearningRate 0.0846 Epoch: 1 Global Step: 19880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:08,961-Speed 3044.94 samples/sec Loss 14.7758 LearningRate 0.0846 Epoch: 1 Global Step: 19890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:12,277-Speed 3089.36 samples/sec Loss 14.5032 LearningRate 0.0846 Epoch: 1 Global Step: 19900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:15,666-Speed 3021.91 samples/sec Loss 14.9059 LearningRate 0.0846 Epoch: 1 Global Step: 19910 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:18,948-Speed 3121.59 samples/sec Loss 14.7804 LearningRate 0.0846 Epoch: 1 Global Step: 19920 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:22,278-Speed 3076.46 samples/sec Loss 14.8154 LearningRate 0.0846 Epoch: 1 Global Step: 19930 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:25,577-Speed 3104.06 samples/sec Loss 14.7191 LearningRate 0.0846 Epoch: 1 Global Step: 19940 Fp16 Grad Scale: 32768 
Required: 21 hours Training: 2022-04-27 03:37:28,869-Speed 3111.19 samples/sec Loss 14.8379 LearningRate 0.0846 Epoch: 1 Global Step: 19950 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:32,196-Speed 3078.84 samples/sec Loss 14.7363 LearningRate 0.0846 Epoch: 1 Global Step: 19960 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:35,506-Speed 3095.10 samples/sec Loss 14.6950 LearningRate 0.0846 Epoch: 1 Global Step: 19970 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:38,814-Speed 3095.59 samples/sec Loss 14.6127 LearningRate 0.0846 Epoch: 1 Global Step: 19980 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:42,101-Speed 3116.97 samples/sec Loss 14.6534 LearningRate 0.0846 Epoch: 1 Global Step: 19990 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:45,386-Speed 3117.55 samples/sec Loss 14.6925 LearningRate 0.0845 Epoch: 1 Global Step: 20000 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:37:48,728-Speed 3065.06 samples/sec Loss 14.8122 LearningRate 0.0845 Epoch: 1 Global Step: 20010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:52,031-Speed 3101.04 samples/sec Loss 14.7494 LearningRate 0.0845 Epoch: 1 Global Step: 20020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:55,347-Speed 3088.80 samples/sec Loss 14.7106 LearningRate 0.0845 Epoch: 1 Global Step: 20030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:37:58,686-Speed 3067.97 samples/sec Loss 14.7012 LearningRate 0.0845 Epoch: 1 Global Step: 20040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:01,960-Speed 3128.62 samples/sec Loss 14.6969 LearningRate 0.0845 Epoch: 1 Global Step: 20050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:05,276-Speed 3089.29 samples/sec Loss 14.6606 LearningRate 0.0845 Epoch: 1 Global Step: 20060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
03:38:08,636-Speed 3049.22 samples/sec Loss 14.8293 LearningRate 0.0845 Epoch: 1 Global Step: 20070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:12,036-Speed 3012.76 samples/sec Loss 14.8227 LearningRate 0.0845 Epoch: 1 Global Step: 20080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:15,356-Speed 3085.14 samples/sec Loss 14.7114 LearningRate 0.0845 Epoch: 1 Global Step: 20090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:18,690-Speed 3072.75 samples/sec Loss 14.7867 LearningRate 0.0845 Epoch: 1 Global Step: 20100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:22,089-Speed 3013.32 samples/sec Loss 14.6424 LearningRate 0.0845 Epoch: 1 Global Step: 20110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:25,452-Speed 3045.63 samples/sec Loss 14.8131 LearningRate 0.0845 Epoch: 1 Global Step: 20120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:28,782-Speed 3075.89 samples/sec Loss 14.6732 LearningRate 0.0845 Epoch: 1 Global Step: 20130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:32,155-Speed 3037.29 samples/sec Loss 14.6971 LearningRate 0.0844 Epoch: 1 Global Step: 20140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:35,449-Speed 3109.13 samples/sec Loss 14.6377 LearningRate 0.0844 Epoch: 1 Global Step: 20150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:38,706-Speed 3145.24 samples/sec Loss 14.8254 LearningRate 0.0844 Epoch: 1 Global Step: 20160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:42,041-Speed 3071.20 samples/sec Loss 14.7258 LearningRate 0.0844 Epoch: 1 Global Step: 20170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:45,335-Speed 3109.90 samples/sec Loss 14.6586 LearningRate 0.0844 Epoch: 1 Global Step: 20180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:48,682-Speed 3060.38 
samples/sec Loss 14.6466 LearningRate 0.0844 Epoch: 1 Global Step: 20190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:38:52,044-Speed 3046.07 samples/sec Loss 14.6600 LearningRate 0.0844 Epoch: 1 Global Step: 20200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:55,376-Speed 3074.45 samples/sec Loss 14.7007 LearningRate 0.0844 Epoch: 1 Global Step: 20210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:38:58,710-Speed 3072.49 samples/sec Loss 14.6659 LearningRate 0.0844 Epoch: 1 Global Step: 20220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:02,041-Speed 3075.02 samples/sec Loss 14.8482 LearningRate 0.0844 Epoch: 1 Global Step: 20230 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:05,394-Speed 3054.36 samples/sec Loss 14.7433 LearningRate 0.0844 Epoch: 1 Global Step: 20240 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:08,737-Speed 3064.46 samples/sec Loss 14.7764 LearningRate 0.0844 Epoch: 1 Global Step: 20250 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:12,047-Speed 3095.13 samples/sec Loss 14.6043 LearningRate 0.0844 Epoch: 1 Global Step: 20260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:15,371-Speed 3081.30 samples/sec Loss 14.6586 LearningRate 0.0843 Epoch: 1 Global Step: 20270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:18,666-Speed 3109.22 samples/sec Loss 14.6181 LearningRate 0.0843 Epoch: 1 Global Step: 20280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:22,112-Speed 2972.41 samples/sec Loss 14.5314 LearningRate 0.0843 Epoch: 1 Global Step: 20290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:25,439-Speed 3077.71 samples/sec Loss 14.7132 LearningRate 0.0843 Epoch: 1 Global Step: 20300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:28,737-Speed 3106.68 samples/sec Loss 14.7074 LearningRate 
0.0843 Epoch: 1 Global Step: 20310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:32,053-Speed 3088.50 samples/sec Loss 14.7239 LearningRate 0.0843 Epoch: 1 Global Step: 20320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:35,397-Speed 3063.26 samples/sec Loss 14.7115 LearningRate 0.0843 Epoch: 1 Global Step: 20330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:38,786-Speed 3022.68 samples/sec Loss 14.7684 LearningRate 0.0843 Epoch: 1 Global Step: 20340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:42,086-Speed 3104.00 samples/sec Loss 14.6937 LearningRate 0.0843 Epoch: 1 Global Step: 20350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:45,447-Speed 3046.52 samples/sec Loss 14.6261 LearningRate 0.0843 Epoch: 1 Global Step: 20360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:39:48,753-Speed 3098.64 samples/sec Loss 14.6358 LearningRate 0.0843 Epoch: 1 Global Step: 20370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:52,088-Speed 3071.82 samples/sec Loss 14.7170 LearningRate 0.0843 Epoch: 1 Global Step: 20380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:55,367-Speed 3123.67 samples/sec Loss 14.6463 LearningRate 0.0843 Epoch: 1 Global Step: 20390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:39:58,703-Speed 3069.91 samples/sec Loss 14.7828 LearningRate 0.0843 Epoch: 1 Global Step: 20400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:02,047-Speed 3063.47 samples/sec Loss 14.7176 LearningRate 0.0842 Epoch: 1 Global Step: 20410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:05,400-Speed 3054.32 samples/sec Loss 14.7955 LearningRate 0.0842 Epoch: 1 Global Step: 20420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:08,678-Speed 3125.15 samples/sec Loss 14.6208 LearningRate 0.0842 Epoch: 1 Global Step: 20430 
Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:11,988-Speed 3094.90 samples/sec Loss 14.6637 LearningRate 0.0842 Epoch: 1 Global Step: 20440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:15,270-Speed 3120.46 samples/sec Loss 14.7888 LearningRate 0.0842 Epoch: 1 Global Step: 20450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:18,564-Speed 3109.95 samples/sec Loss 14.7999 LearningRate 0.0842 Epoch: 1 Global Step: 20460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:21,891-Speed 3078.52 samples/sec Loss 14.6306 LearningRate 0.0842 Epoch: 1 Global Step: 20470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:40:25,184-Speed 3109.88 samples/sec Loss 14.5868 LearningRate 0.0842 Epoch: 1 Global Step: 20480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:40:28,485-Speed 3103.53 samples/sec Loss 14.8133 LearningRate 0.0842 Epoch: 1 Global Step: 20490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:40:31,746-Speed 3141.72 samples/sec Loss 14.7401 LearningRate 0.0842 Epoch: 1 Global Step: 20500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:35,116-Speed 3039.44 samples/sec Loss 14.6674 LearningRate 0.0842 Epoch: 1 Global Step: 20510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:38,383-Speed 3134.70 samples/sec Loss 14.8685 LearningRate 0.0842 Epoch: 1 Global Step: 20520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:41,770-Speed 3024.41 samples/sec Loss 14.5636 LearningRate 0.0842 Epoch: 1 Global Step: 20530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:45,066-Speed 3107.43 samples/sec Loss 14.7227 LearningRate 0.0841 Epoch: 1 Global Step: 20540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:48,354-Speed 3115.67 samples/sec Loss 14.6717 LearningRate 0.0841 Epoch: 1 Global Step: 20550 Fp16 Grad Scale: 65536 Required: 21 
hours Training: 2022-04-27 03:40:51,756-Speed 3010.69 samples/sec Loss 14.6161 LearningRate 0.0841 Epoch: 1 Global Step: 20560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:55,106-Speed 3057.66 samples/sec Loss 14.6239 LearningRate 0.0841 Epoch: 1 Global Step: 20570 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:40:58,437-Speed 3074.95 samples/sec Loss 14.7745 LearningRate 0.0841 Epoch: 1 Global Step: 20580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:01,707-Speed 3132.76 samples/sec Loss 14.6506 LearningRate 0.0841 Epoch: 1 Global Step: 20590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:05,037-Speed 3075.82 samples/sec Loss 14.8112 LearningRate 0.0841 Epoch: 1 Global Step: 20600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:41:08,340-Speed 3101.21 samples/sec Loss 14.6471 LearningRate 0.0841 Epoch: 1 Global Step: 20610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:41:11,619-Speed 3124.22 samples/sec Loss 14.5714 LearningRate 0.0841 Epoch: 1 Global Step: 20620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:41:14,893-Speed 3128.08 samples/sec Loss 14.7007 LearningRate 0.0841 Epoch: 1 Global Step: 20630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:18,221-Speed 3077.78 samples/sec Loss 14.6620 LearningRate 0.0841 Epoch: 1 Global Step: 20640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:21,573-Speed 3055.99 samples/sec Loss 14.4786 LearningRate 0.0841 Epoch: 1 Global Step: 20650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:24,828-Speed 3147.19 samples/sec Loss 14.5958 LearningRate 0.0841 Epoch: 1 Global Step: 20660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:28,148-Speed 3085.50 samples/sec Loss 14.4873 LearningRate 0.0841 Epoch: 1 Global Step: 20670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
03:41:31,430-Speed 3120.48 samples/sec Loss 14.8662 LearningRate 0.0840 Epoch: 1 Global Step: 20680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:34,823-Speed 3019.91 samples/sec Loss 14.4201 LearningRate 0.0840 Epoch: 1 Global Step: 20690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:38,146-Speed 3082.91 samples/sec Loss 14.6640 LearningRate 0.0840 Epoch: 1 Global Step: 20700 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:41,456-Speed 3094.32 samples/sec Loss 14.6337 LearningRate 0.0840 Epoch: 1 Global Step: 20710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:44,808-Speed 3055.12 samples/sec Loss 14.6363 LearningRate 0.0840 Epoch: 1 Global Step: 20720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:41:48,110-Speed 3103.02 samples/sec Loss 14.5090 LearningRate 0.0840 Epoch: 1 Global Step: 20730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:41:51,459-Speed 3058.18 samples/sec Loss 14.8014 LearningRate 0.0840 Epoch: 1 Global Step: 20740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:41:54,849-Speed 3021.85 samples/sec Loss 14.9142 LearningRate 0.0840 Epoch: 1 Global Step: 20750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:41:58,159-Speed 3094.07 samples/sec Loss 14.6756 LearningRate 0.0840 Epoch: 1 Global Step: 20760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:01,443-Speed 3118.86 samples/sec Loss 14.6735 LearningRate 0.0840 Epoch: 1 Global Step: 20770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:04,744-Speed 3102.91 samples/sec Loss 14.7607 LearningRate 0.0840 Epoch: 1 Global Step: 20780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:08,090-Speed 3061.84 samples/sec Loss 14.8778 LearningRate 0.0840 Epoch: 1 Global Step: 20790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:11,421-Speed 3074.61 
samples/sec Loss 14.4739 LearningRate 0.0840 Epoch: 1 Global Step: 20800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:14,777-Speed 3052.22 samples/sec Loss 14.7252 LearningRate 0.0839 Epoch: 1 Global Step: 20810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:18,055-Speed 3124.70 samples/sec Loss 14.5783 LearningRate 0.0839 Epoch: 1 Global Step: 20820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:21,399-Speed 3063.38 samples/sec Loss 14.5694 LearningRate 0.0839 Epoch: 1 Global Step: 20830 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:42:24,738-Speed 3067.21 samples/sec Loss 14.4210 LearningRate 0.0839 Epoch: 1 Global Step: 20840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:42:28,041-Speed 3101.51 samples/sec Loss 14.7088 LearningRate 0.0839 Epoch: 1 Global Step: 20850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:42:31,421-Speed 3030.76 samples/sec Loss 14.5502 LearningRate 0.0839 Epoch: 1 Global Step: 20860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:42:34,822-Speed 3010.99 samples/sec Loss 14.5541 LearningRate 0.0839 Epoch: 1 Global Step: 20870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:42:38,221-Speed 3014.06 samples/sec Loss 14.4918 LearningRate 0.0839 Epoch: 1 Global Step: 20880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:42:41,533-Speed 3092.71 samples/sec Loss 14.5543 LearningRate 0.0839 Epoch: 1 Global Step: 20890 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:42:44,814-Speed 3122.03 samples/sec Loss 14.5946 LearningRate 0.0839 Epoch: 1 Global Step: 20900 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:42:48,086-Speed 3130.05 samples/sec Loss 14.7262 LearningRate 0.0839 Epoch: 1 Global Step: 20910 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:42:51,442-Speed 3052.24 samples/sec Loss 14.7581 LearningRate 
0.0839 Epoch: 1 Global Step: 20920 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:42:54,799-Speed 3051.55 samples/sec Loss 14.6292 LearningRate 0.0839 Epoch: 1 Global Step: 20930 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:42:58,110-Speed 3093.22 samples/sec Loss 14.7280 LearningRate 0.0839 Epoch: 1 Global Step: 20940 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:43:01,471-Speed 3047.44 samples/sec Loss 14.6739 LearningRate 0.0838 Epoch: 1 Global Step: 20950 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:43:04,810-Speed 3067.59 samples/sec Loss 14.6481 LearningRate 0.0838 Epoch: 1 Global Step: 20960 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:43:08,117-Speed 3097.07 samples/sec Loss 14.5514 LearningRate 0.0838 Epoch: 1 Global Step: 20970 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:43:11,409-Speed 3111.91 samples/sec Loss 14.6499 LearningRate 0.0838 Epoch: 1 Global Step: 20980 Fp16 Grad Scale: 32768 Required: 21 hours Training: 2022-04-27 03:43:14,787-Speed 3032.19 samples/sec Loss 14.5205 LearningRate 0.0838 Epoch: 1 Global Step: 20990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:18,144-Speed 3051.06 samples/sec Loss 14.5822 LearningRate 0.0838 Epoch: 1 Global Step: 21000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:21,491-Speed 3060.28 samples/sec Loss 14.7102 LearningRate 0.0838 Epoch: 1 Global Step: 21010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:24,858-Speed 3041.70 samples/sec Loss 14.6374 LearningRate 0.0838 Epoch: 1 Global Step: 21020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:28,191-Speed 3073.37 samples/sec Loss 14.5087 LearningRate 0.0838 Epoch: 1 Global Step: 21030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:31,555-Speed 3044.61 samples/sec Loss 14.5214 LearningRate 0.0838 Epoch: 1 Global Step: 21040 Fp16 
Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:34,884-Speed 3077.60 samples/sec Loss 14.5940 LearningRate 0.0838 Epoch: 1 Global Step: 21050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:38,308-Speed 2990.81 samples/sec Loss 14.6092 LearningRate 0.0838 Epoch: 1 Global Step: 21060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:41,702-Speed 3018.38 samples/sec Loss 14.6307 LearningRate 0.0838 Epoch: 1 Global Step: 21070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:45,051-Speed 3058.81 samples/sec Loss 14.8341 LearningRate 0.0837 Epoch: 1 Global Step: 21080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:43:48,392-Speed 3065.51 samples/sec Loss 14.5790 LearningRate 0.0837 Epoch: 1 Global Step: 21090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:43:51,700-Speed 3097.40 samples/sec Loss 14.5825 LearningRate 0.0837 Epoch: 1 Global Step: 21100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:43:55,050-Speed 3057.07 samples/sec Loss 14.4933 LearningRate 0.0837 Epoch: 1 Global Step: 21110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:43:58,348-Speed 3106.35 samples/sec Loss 14.5751 LearningRate 0.0837 Epoch: 1 Global Step: 21120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:01,624-Speed 3126.57 samples/sec Loss 14.5990 LearningRate 0.0837 Epoch: 1 Global Step: 21130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:04,884-Speed 3141.23 samples/sec Loss 14.6840 LearningRate 0.0837 Epoch: 1 Global Step: 21140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:08,225-Speed 3066.53 samples/sec Loss 14.7286 LearningRate 0.0837 Epoch: 1 Global Step: 21150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:11,549-Speed 3081.54 samples/sec Loss 14.6225 LearningRate 0.0837 Epoch: 1 Global Step: 21160 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-04-27 03:44:14,856-Speed 3097.57 samples/sec Loss 14.6222 LearningRate 0.0837 Epoch: 1 Global Step: 21170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:18,111-Speed 3146.22 samples/sec Loss 14.5480 LearningRate 0.0837 Epoch: 1 Global Step: 21180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:21,422-Speed 3093.58 samples/sec Loss 14.6329 LearningRate 0.0837 Epoch: 1 Global Step: 21190 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:44:24,763-Speed 3066.11 samples/sec Loss 14.5026 LearningRate 0.0837 Epoch: 1 Global Step: 21200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:28,053-Speed 3113.77 samples/sec Loss 14.6561 LearningRate 0.0837 Epoch: 1 Global Step: 21210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:31,360-Speed 3096.91 samples/sec Loss 14.5102 LearningRate 0.0836 Epoch: 1 Global Step: 21220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:34,677-Speed 3088.06 samples/sec Loss 14.4007 LearningRate 0.0836 Epoch: 1 Global Step: 21230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:38,026-Speed 3059.27 samples/sec Loss 14.5292 LearningRate 0.0836 Epoch: 1 Global Step: 21240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:41,367-Speed 3066.13 samples/sec Loss 14.5653 LearningRate 0.0836 Epoch: 1 Global Step: 21250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:44,690-Speed 3081.99 samples/sec Loss 14.5409 LearningRate 0.0836 Epoch: 1 Global Step: 21260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:44:47,997-Speed 3097.15 samples/sec Loss 14.5470 LearningRate 0.0836 Epoch: 1 Global Step: 21270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:44:51,328-Speed 3075.04 samples/sec Loss 14.6583 LearningRate 0.0836 Epoch: 1 Global Step: 21280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
03:44:54,705-Speed 3034.99 samples/sec Loss 14.4229 LearningRate 0.0836 Epoch: 1 Global Step: 21290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:44:58,028-Speed 3082.33 samples/sec Loss 14.3797 LearningRate 0.0836 Epoch: 1 Global Step: 21300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:01,354-Speed 3080.11 samples/sec Loss 14.7542 LearningRate 0.0836 Epoch: 1 Global Step: 21310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:04,603-Speed 3152.61 samples/sec Loss 14.6863 LearningRate 0.0836 Epoch: 1 Global Step: 21320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:07,903-Speed 3102.84 samples/sec Loss 14.5437 LearningRate 0.0836 Epoch: 1 Global Step: 21330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:11,209-Speed 3098.69 samples/sec Loss 14.4919 LearningRate 0.0836 Epoch: 1 Global Step: 21340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:14,543-Speed 3072.65 samples/sec Loss 14.6621 LearningRate 0.0835 Epoch: 1 Global Step: 21350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:17,920-Speed 3033.82 samples/sec Loss 14.4564 LearningRate 0.0835 Epoch: 1 Global Step: 21360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:21,239-Speed 3085.79 samples/sec Loss 14.5076 LearningRate 0.0835 Epoch: 1 Global Step: 21370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:24,562-Speed 3083.14 samples/sec Loss 14.5288 LearningRate 0.0835 Epoch: 1 Global Step: 21380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:27,832-Speed 3133.11 samples/sec Loss 14.5496 LearningRate 0.0835 Epoch: 1 Global Step: 21390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:31,158-Speed 3079.81 samples/sec Loss 14.4816 LearningRate 0.0835 Epoch: 1 Global Step: 21400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:34,449-Speed 3112.18 samples/sec 
Loss 14.2796 LearningRate 0.0835 Epoch: 1 Global Step: 21410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:37,810-Speed 3047.68 samples/sec Loss 14.6034 LearningRate 0.0835 Epoch: 1 Global Step: 21420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:41,081-Speed 3131.13 samples/sec Loss 14.5721 LearningRate 0.0835 Epoch: 1 Global Step: 21430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:44,348-Speed 3134.89 samples/sec Loss 14.7100 LearningRate 0.0835 Epoch: 1 Global Step: 21440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:47,664-Speed 3089.84 samples/sec Loss 14.4435 LearningRate 0.0835 Epoch: 1 Global Step: 21450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:50,997-Speed 3072.60 samples/sec Loss 14.6635 LearningRate 0.0835 Epoch: 1 Global Step: 21460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:45:54,323-Speed 3079.55 samples/sec Loss 14.5835 LearningRate 0.0835 Epoch: 1 Global Step: 21470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:45:57,607-Speed 3119.72 samples/sec Loss 14.4896 LearningRate 0.0835 Epoch: 1 Global Step: 21480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:00,898-Speed 3112.62 samples/sec Loss 14.4684 LearningRate 0.0834 Epoch: 1 Global Step: 21490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:04,222-Speed 3081.16 samples/sec Loss 14.6357 LearningRate 0.0834 Epoch: 1 Global Step: 21500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:07,509-Speed 3116.47 samples/sec Loss 14.5168 LearningRate 0.0834 Epoch: 1 Global Step: 21510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:10,827-Speed 3087.53 samples/sec Loss 14.5067 LearningRate 0.0834 Epoch: 1 Global Step: 21520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:14,184-Speed 3051.51 samples/sec Loss 14.5431 LearningRate 0.0834 
Epoch: 1 Global Step: 21530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:17,515-Speed 3075.00 samples/sec Loss 14.4804 LearningRate 0.0834 Epoch: 1 Global Step: 21540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:20,824-Speed 3095.46 samples/sec Loss 14.6875 LearningRate 0.0834 Epoch: 1 Global Step: 21550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:24,156-Speed 3074.24 samples/sec Loss 14.4842 LearningRate 0.0834 Epoch: 1 Global Step: 21560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:27,456-Speed 3103.29 samples/sec Loss 14.5512 LearningRate 0.0834 Epoch: 1 Global Step: 21570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:46:30,759-Speed 3101.44 samples/sec Loss 14.6993 LearningRate 0.0834 Epoch: 1 Global Step: 21580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:46:34,070-Speed 3093.33 samples/sec Loss 14.6559 LearningRate 0.0834 Epoch: 1 Global Step: 21590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:46:37,335-Speed 3138.05 samples/sec Loss 14.5502 LearningRate 0.0834 Epoch: 1 Global Step: 21600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:40,692-Speed 3050.32 samples/sec Loss 14.5454 LearningRate 0.0834 Epoch: 1 Global Step: 21610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:43,978-Speed 3117.21 samples/sec Loss 14.4663 LearningRate 0.0834 Epoch: 1 Global Step: 21620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:47,331-Speed 3055.71 samples/sec Loss 14.5810 LearningRate 0.0833 Epoch: 1 Global Step: 21630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:50,629-Speed 3104.95 samples/sec Loss 14.3747 LearningRate 0.0833 Epoch: 1 Global Step: 21640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:53,942-Speed 3091.81 samples/sec Loss 14.5487 LearningRate 0.0833 Epoch: 1 Global Step: 21650 Fp16 Grad 
Scale: 65536 Required: 21 hours Training: 2022-04-27 03:46:57,233-Speed 3112.91 samples/sec Loss 14.5956 LearningRate 0.0833 Epoch: 1 Global Step: 21660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:00,507-Speed 3128.49 samples/sec Loss 14.5630 LearningRate 0.0833 Epoch: 1 Global Step: 21670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:03,818-Speed 3096.88 samples/sec Loss 14.4429 LearningRate 0.0833 Epoch: 1 Global Step: 21680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:07,146-Speed 3077.49 samples/sec Loss 14.4222 LearningRate 0.0833 Epoch: 1 Global Step: 21690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:10,500-Speed 3054.21 samples/sec Loss 14.4657 LearningRate 0.0833 Epoch: 1 Global Step: 21700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:47:13,847-Speed 3060.69 samples/sec Loss 14.5779 LearningRate 0.0833 Epoch: 1 Global Step: 21710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:47:17,142-Speed 3108.22 samples/sec Loss 14.4246 LearningRate 0.0833 Epoch: 1 Global Step: 21720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:47:20,423-Speed 3122.15 samples/sec Loss 14.4484 LearningRate 0.0833 Epoch: 1 Global Step: 21730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:47:23,751-Speed 3078.40 samples/sec Loss 14.5739 LearningRate 0.0833 Epoch: 1 Global Step: 21740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:27,066-Speed 3089.64 samples/sec Loss 14.5977 LearningRate 0.0833 Epoch: 1 Global Step: 21750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:30,373-Speed 3097.94 samples/sec Loss 14.4657 LearningRate 0.0832 Epoch: 1 Global Step: 21760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:33,677-Speed 3100.20 samples/sec Loss 14.4659 LearningRate 0.0832 Epoch: 1 Global Step: 21770 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-27 03:47:37,009-Speed 3074.37 samples/sec Loss 14.4349 LearningRate 0.0832 Epoch: 1 Global Step: 21780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:40,348-Speed 3067.46 samples/sec Loss 14.4070 LearningRate 0.0832 Epoch: 1 Global Step: 21790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:43,704-Speed 3052.77 samples/sec Loss 14.4341 LearningRate 0.0832 Epoch: 1 Global Step: 21800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:47,006-Speed 3101.59 samples/sec Loss 14.4830 LearningRate 0.0832 Epoch: 1 Global Step: 21810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:50,321-Speed 3089.72 samples/sec Loss 14.4176 LearningRate 0.0832 Epoch: 1 Global Step: 21820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:53,606-Speed 3118.84 samples/sec Loss 14.3435 LearningRate 0.0832 Epoch: 1 Global Step: 21830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:47:56,928-Speed 3082.73 samples/sec Loss 14.5090 LearningRate 0.0832 Epoch: 1 Global Step: 21840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:00,207-Speed 3124.04 samples/sec Loss 14.4194 LearningRate 0.0832 Epoch: 1 Global Step: 21850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:03,505-Speed 3106.71 samples/sec Loss 14.4517 LearningRate 0.0832 Epoch: 1 Global Step: 21860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:06,796-Speed 3112.38 samples/sec Loss 14.5727 LearningRate 0.0832 Epoch: 1 Global Step: 21870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:10,055-Speed 3143.59 samples/sec Loss 14.3664 LearningRate 0.0832 Epoch: 1 Global Step: 21880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:13,398-Speed 3063.51 samples/sec Loss 14.5426 LearningRate 0.0832 Epoch: 1 Global Step: 21890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
03:48:16,703-Speed 3099.67 samples/sec Loss 14.2990 LearningRate 0.0831 Epoch: 1 Global Step: 21900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:19,981-Speed 3124.87 samples/sec Loss 14.4837 LearningRate 0.0831 Epoch: 1 Global Step: 21910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:23,284-Speed 3101.63 samples/sec Loss 14.4545 LearningRate 0.0831 Epoch: 1 Global Step: 21920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:26,632-Speed 3059.37 samples/sec Loss 14.6092 LearningRate 0.0831 Epoch: 1 Global Step: 21930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:29,989-Speed 3051.57 samples/sec Loss 14.6305 LearningRate 0.0831 Epoch: 1 Global Step: 21940 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:48:33,335-Speed 3060.58 samples/sec Loss 14.5512 LearningRate 0.0831 Epoch: 1 Global Step: 21950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:36,717-Speed 3029.48 samples/sec Loss 14.2926 LearningRate 0.0831 Epoch: 1 Global Step: 21960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:40,025-Speed 3095.94 samples/sec Loss 14.4356 LearningRate 0.0831 Epoch: 1 Global Step: 21970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:43,349-Speed 3080.97 samples/sec Loss 14.2520 LearningRate 0.0831 Epoch: 1 Global Step: 21980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:46,734-Speed 3026.25 samples/sec Loss 14.4511 LearningRate 0.0831 Epoch: 1 Global Step: 21990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:50,069-Speed 3071.53 samples/sec Loss 14.4239 LearningRate 0.0831 Epoch: 1 Global Step: 22000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:53,373-Speed 3100.30 samples/sec Loss 14.4226 LearningRate 0.0831 Epoch: 1 Global Step: 22010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:56,665-Speed 3111.66 
samples/sec Loss 14.4530 LearningRate 0.0831 Epoch: 1 Global Step: 22020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:48:59,976-Speed 3093.04 samples/sec Loss 14.4441 LearningRate 0.0831 Epoch: 1 Global Step: 22030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:03,282-Speed 3098.96 samples/sec Loss 14.4112 LearningRate 0.0830 Epoch: 1 Global Step: 22040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:06,616-Speed 3072.22 samples/sec Loss 14.4487 LearningRate 0.0830 Epoch: 1 Global Step: 22050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:09,901-Speed 3117.40 samples/sec Loss 14.4241 LearningRate 0.0830 Epoch: 1 Global Step: 22060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:13,201-Speed 3103.72 samples/sec Loss 14.3417 LearningRate 0.0830 Epoch: 1 Global Step: 22070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:16,502-Speed 3103.33 samples/sec Loss 14.4533 LearningRate 0.0830 Epoch: 1 Global Step: 22080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:19,788-Speed 3116.94 samples/sec Loss 14.3496 LearningRate 0.0830 Epoch: 1 Global Step: 22090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:23,117-Speed 3077.61 samples/sec Loss 14.4186 LearningRate 0.0830 Epoch: 1 Global Step: 22100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:26,506-Speed 3022.16 samples/sec Loss 14.3280 LearningRate 0.0830 Epoch: 1 Global Step: 22110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:29,832-Speed 3079.90 samples/sec Loss 14.3527 LearningRate 0.0830 Epoch: 1 Global Step: 22120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:49:33,149-Speed 3087.60 samples/sec Loss 14.3493 LearningRate 0.0830 Epoch: 1 Global Step: 22130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:36,424-Speed 3127.35 samples/sec Loss 14.3899 
LearningRate 0.0830 Epoch: 1 Global Step: 22140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:39,766-Speed 3065.40 samples/sec Loss 14.3795 LearningRate 0.0830 Epoch: 1 Global Step: 22150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:43,151-Speed 3025.85 samples/sec Loss 14.4136 LearningRate 0.0830 Epoch: 1 Global Step: 22160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:46,472-Speed 3084.22 samples/sec Loss 14.3866 LearningRate 0.0829 Epoch: 1 Global Step: 22170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:49,784-Speed 3093.25 samples/sec Loss 14.3971 LearningRate 0.0829 Epoch: 1 Global Step: 22180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:53,101-Speed 3088.39 samples/sec Loss 14.3842 LearningRate 0.0829 Epoch: 1 Global Step: 22190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:56,477-Speed 3033.57 samples/sec Loss 14.2916 LearningRate 0.0829 Epoch: 1 Global Step: 22200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:49:59,814-Speed 3069.76 samples/sec Loss 14.4727 LearningRate 0.0829 Epoch: 1 Global Step: 22210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:50:03,148-Speed 3071.74 samples/sec Loss 14.4580 LearningRate 0.0829 Epoch: 1 Global Step: 22220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:50:06,447-Speed 3104.88 samples/sec Loss 14.2196 LearningRate 0.0829 Epoch: 1 Global Step: 22230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:09,764-Speed 3087.92 samples/sec Loss 14.4419 LearningRate 0.0829 Epoch: 1 Global Step: 22240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:13,048-Speed 3119.39 samples/sec Loss 14.5166 LearningRate 0.0829 Epoch: 1 Global Step: 22250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:16,411-Speed 3045.81 samples/sec Loss 14.3988 LearningRate 0.0829 Epoch: 1 Global 
Step: 22260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:19,835-Speed 2991.57 samples/sec Loss 14.3931 LearningRate 0.0829 Epoch: 1 Global Step: 22270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:23,191-Speed 3052.65 samples/sec Loss 14.5936 LearningRate 0.0829 Epoch: 1 Global Step: 22280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:26,518-Speed 3077.96 samples/sec Loss 14.3923 LearningRate 0.0829 Epoch: 1 Global Step: 22290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:29,816-Speed 3105.97 samples/sec Loss 14.4365 LearningRate 0.0829 Epoch: 1 Global Step: 22300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:33,133-Speed 3089.40 samples/sec Loss 14.3916 LearningRate 0.0828 Epoch: 1 Global Step: 22310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:36,441-Speed 3096.16 samples/sec Loss 14.3668 LearningRate 0.0828 Epoch: 1 Global Step: 22320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:39,735-Speed 3109.54 samples/sec Loss 14.3242 LearningRate 0.0828 Epoch: 1 Global Step: 22330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:50:43,072-Speed 3069.11 samples/sec Loss 14.2690 LearningRate 0.0828 Epoch: 1 Global Step: 22340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:50:46,434-Speed 3047.05 samples/sec Loss 14.4222 LearningRate 0.0828 Epoch: 1 Global Step: 22350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:50:49,711-Speed 3125.03 samples/sec Loss 14.5019 LearningRate 0.0828 Epoch: 1 Global Step: 22360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:50:53,043-Speed 3074.78 samples/sec Loss 14.3849 LearningRate 0.0828 Epoch: 1 Global Step: 22370 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:50:56,317-Speed 3127.93 samples/sec Loss 14.3525 LearningRate 0.0828 Epoch: 1 Global Step: 22380 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-27 03:50:59,631-Speed 3090.72 samples/sec Loss 14.1990 LearningRate 0.0828 Epoch: 1 Global Step: 22390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:02,964-Speed 3073.34 samples/sec Loss 14.4575 LearningRate 0.0828 Epoch: 1 Global Step: 22400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:06,310-Speed 3061.70 samples/sec Loss 14.5265 LearningRate 0.0828 Epoch: 1 Global Step: 22410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:09,634-Speed 3081.51 samples/sec Loss 14.4693 LearningRate 0.0828 Epoch: 1 Global Step: 22420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:12,900-Speed 3136.17 samples/sec Loss 14.3083 LearningRate 0.0828 Epoch: 1 Global Step: 22430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:16,316-Speed 2998.34 samples/sec Loss 14.3355 LearningRate 0.0827 Epoch: 1 Global Step: 22440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:51:19,583-Speed 3134.94 samples/sec Loss 14.3864 LearningRate 0.0827 Epoch: 1 Global Step: 22450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:51:22,899-Speed 3089.80 samples/sec Loss 14.3227 LearningRate 0.0827 Epoch: 1 Global Step: 22460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:26,157-Speed 3143.01 samples/sec Loss 14.3566 LearningRate 0.0827 Epoch: 1 Global Step: 22470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:29,450-Speed 3110.22 samples/sec Loss 14.3563 LearningRate 0.0827 Epoch: 1 Global Step: 22480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:32,722-Speed 3131.10 samples/sec Loss 14.4385 LearningRate 0.0827 Epoch: 1 Global Step: 22490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:36,029-Speed 3097.57 samples/sec Loss 14.3251 LearningRate 0.0827 Epoch: 1 Global Step: 22500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
03:51:39,378-Speed 3058.42 samples/sec Loss 14.2929 LearningRate 0.0827 Epoch: 1 Global Step: 22510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:42,666-Speed 3114.84 samples/sec Loss 14.1498 LearningRate 0.0827 Epoch: 1 Global Step: 22520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:45,950-Speed 3118.63 samples/sec Loss 14.2621 LearningRate 0.0827 Epoch: 1 Global Step: 22530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:49,278-Speed 3077.45 samples/sec Loss 14.3252 LearningRate 0.0827 Epoch: 1 Global Step: 22540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:52,657-Speed 3031.93 samples/sec Loss 14.2623 LearningRate 0.0827 Epoch: 1 Global Step: 22550 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:51:55,993-Speed 3069.79 samples/sec Loss 14.4426 LearningRate 0.0827 Epoch: 1 Global Step: 22560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:51:59,309-Speed 3089.41 samples/sec Loss 14.2747 LearningRate 0.0827 Epoch: 1 Global Step: 22570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:02,595-Speed 3116.69 samples/sec Loss 14.3179 LearningRate 0.0826 Epoch: 1 Global Step: 22580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:05,992-Speed 3015.88 samples/sec Loss 14.4433 LearningRate 0.0826 Epoch: 1 Global Step: 22590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:09,298-Speed 3098.34 samples/sec Loss 14.5233 LearningRate 0.0826 Epoch: 1 Global Step: 22600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:12,610-Speed 3092.41 samples/sec Loss 14.5470 LearningRate 0.0826 Epoch: 1 Global Step: 22610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:15,916-Speed 3098.32 samples/sec Loss 14.3838 LearningRate 0.0826 Epoch: 1 Global Step: 22620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:19,195-Speed 3122.97 
samples/sec Loss 14.4619 LearningRate 0.0826 Epoch: 1 Global Step: 22630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:22,525-Speed 3076.47 samples/sec Loss 14.5168 LearningRate 0.0826 Epoch: 1 Global Step: 22640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:25,857-Speed 3074.11 samples/sec Loss 14.3839 LearningRate 0.0826 Epoch: 1 Global Step: 22650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:29,168-Speed 3093.76 samples/sec Loss 14.2067 LearningRate 0.0826 Epoch: 1 Global Step: 22660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:32,464-Speed 3108.18 samples/sec Loss 14.4621 LearningRate 0.0826 Epoch: 1 Global Step: 22670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:35,760-Speed 3107.28 samples/sec Loss 14.2409 LearningRate 0.0826 Epoch: 1 Global Step: 22680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:39,046-Speed 3118.03 samples/sec Loss 14.2186 LearningRate 0.0826 Epoch: 1 Global Step: 22690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:42,344-Speed 3106.49 samples/sec Loss 14.1672 LearningRate 0.0826 Epoch: 1 Global Step: 22700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:45,664-Speed 3084.93 samples/sec Loss 14.3629 LearningRate 0.0826 Epoch: 1 Global Step: 22710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:48,997-Speed 3072.83 samples/sec Loss 14.5362 LearningRate 0.0825 Epoch: 1 Global Step: 22720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:52,299-Speed 3102.19 samples/sec Loss 14.3670 LearningRate 0.0825 Epoch: 1 Global Step: 22730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:55,619-Speed 3085.84 samples/sec Loss 14.3941 LearningRate 0.0825 Epoch: 1 Global Step: 22740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:52:58,985-Speed 3042.71 samples/sec Loss 14.1859 
LearningRate 0.0825 Epoch: 1 Global Step: 22750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:02,337-Speed 3056.13 samples/sec Loss 14.4485 LearningRate 0.0825 Epoch: 1 Global Step: 22760 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:53:05,601-Speed 3137.38 samples/sec Loss 14.4314 LearningRate 0.0825 Epoch: 1 Global Step: 22770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:08,926-Speed 3081.11 samples/sec Loss 14.4876 LearningRate 0.0825 Epoch: 1 Global Step: 22780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:12,295-Speed 3040.25 samples/sec Loss 14.2590 LearningRate 0.0825 Epoch: 1 Global Step: 22790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:15,640-Speed 3062.62 samples/sec Loss 14.2038 LearningRate 0.0825 Epoch: 1 Global Step: 22800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:18,923-Speed 3119.49 samples/sec Loss 14.3631 LearningRate 0.0825 Epoch: 1 Global Step: 22810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:22,239-Speed 3089.22 samples/sec Loss 14.2636 LearningRate 0.0825 Epoch: 1 Global Step: 22820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:25,571-Speed 3073.57 samples/sec Loss 14.2379 LearningRate 0.0825 Epoch: 1 Global Step: 22830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:28,860-Speed 3114.05 samples/sec Loss 14.2151 LearningRate 0.0825 Epoch: 1 Global Step: 22840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:32,121-Speed 3141.91 samples/sec Loss 14.3020 LearningRate 0.0824 Epoch: 1 Global Step: 22850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:35,474-Speed 3053.98 samples/sec Loss 14.2215 LearningRate 0.0824 Epoch: 1 Global Step: 22860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:38,728-Speed 3148.58 samples/sec Loss 14.3355 LearningRate 0.0824 Epoch: 1 
Global Step: 22870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:42,022-Speed 3109.69 samples/sec Loss 14.2646 LearningRate 0.0824 Epoch: 1 Global Step: 22880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:45,334-Speed 3092.20 samples/sec Loss 14.2900 LearningRate 0.0824 Epoch: 1 Global Step: 22890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:49,235-Speed 2625.57 samples/sec Loss 14.2313 LearningRate 0.0824 Epoch: 1 Global Step: 22900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:52,513-Speed 3125.33 samples/sec Loss 14.2814 LearningRate 0.0824 Epoch: 1 Global Step: 22910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:55,816-Speed 3101.07 samples/sec Loss 14.2241 LearningRate 0.0824 Epoch: 1 Global Step: 22920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:53:59,156-Speed 3066.36 samples/sec Loss 14.3420 LearningRate 0.0824 Epoch: 1 Global Step: 22930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:02,461-Speed 3099.48 samples/sec Loss 14.2696 LearningRate 0.0824 Epoch: 1 Global Step: 22940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:05,836-Speed 3034.55 samples/sec Loss 14.4210 LearningRate 0.0824 Epoch: 1 Global Step: 22950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:09,223-Speed 3024.94 samples/sec Loss 14.2564 LearningRate 0.0824 Epoch: 1 Global Step: 22960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:12,543-Speed 3085.53 samples/sec Loss 14.4287 LearningRate 0.0824 Epoch: 1 Global Step: 22970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:15,873-Speed 3075.68 samples/sec Loss 14.2234 LearningRate 0.0824 Epoch: 1 Global Step: 22980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:19,213-Speed 3066.82 samples/sec Loss 14.2941 LearningRate 0.0823 Epoch: 1 Global Step: 22990 Fp16 Grad 
Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:22,549-Speed 3070.90 samples/sec Loss 14.4140 LearningRate 0.0823 Epoch: 1 Global Step: 23000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:25,877-Speed 3077.79 samples/sec Loss 14.4207 LearningRate 0.0823 Epoch: 1 Global Step: 23010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:29,224-Speed 3060.10 samples/sec Loss 14.5048 LearningRate 0.0823 Epoch: 1 Global Step: 23020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:32,516-Speed 3111.46 samples/sec Loss 14.3636 LearningRate 0.0823 Epoch: 1 Global Step: 23030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:35,785-Speed 3133.07 samples/sec Loss 14.2209 LearningRate 0.0823 Epoch: 1 Global Step: 23040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:39,081-Speed 3108.51 samples/sec Loss 14.3992 LearningRate 0.0823 Epoch: 1 Global Step: 23050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:42,419-Speed 3068.29 samples/sec Loss 14.2043 LearningRate 0.0823 Epoch: 1 Global Step: 23060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:45,698-Speed 3124.40 samples/sec Loss 14.2409 LearningRate 0.0823 Epoch: 1 Global Step: 23070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:49,000-Speed 3102.17 samples/sec Loss 14.1932 LearningRate 0.0823 Epoch: 1 Global Step: 23080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:52,311-Speed 3093.35 samples/sec Loss 14.3510 LearningRate 0.0823 Epoch: 1 Global Step: 23090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:55,686-Speed 3034.60 samples/sec Loss 14.4950 LearningRate 0.0823 Epoch: 1 Global Step: 23100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:54:58,955-Speed 3133.44 samples/sec Loss 14.2994 LearningRate 0.0823 Epoch: 1 Global Step: 23110 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-04-27 03:55:02,291-Speed 3070.80 samples/sec Loss 14.2673 LearningRate 0.0823 Epoch: 1 Global Step: 23120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:55:05,654-Speed 3046.12 samples/sec Loss 14.4528 LearningRate 0.0822 Epoch: 1 Global Step: 23130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:08,950-Speed 3106.79 samples/sec Loss 14.3254 LearningRate 0.0822 Epoch: 1 Global Step: 23140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:15,736-Speed 1509.48 samples/sec Loss 14.3521 LearningRate 0.0822 Epoch: 1 Global Step: 23150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:19,046-Speed 3094.23 samples/sec Loss 14.3058 LearningRate 0.0822 Epoch: 1 Global Step: 23160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:22,385-Speed 3067.71 samples/sec Loss 14.1786 LearningRate 0.0822 Epoch: 1 Global Step: 23170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:25,671-Speed 3118.10 samples/sec Loss 14.3225 LearningRate 0.0822 Epoch: 1 Global Step: 23180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:29,007-Speed 3070.54 samples/sec Loss 14.3567 LearningRate 0.0822 Epoch: 1 Global Step: 23190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:32,345-Speed 3068.35 samples/sec Loss 14.3093 LearningRate 0.0822 Epoch: 1 Global Step: 23200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:35,699-Speed 3053.85 samples/sec Loss 14.2388 LearningRate 0.0822 Epoch: 1 Global Step: 23210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:39,084-Speed 3026.62 samples/sec Loss 14.4045 LearningRate 0.0822 Epoch: 1 Global Step: 23220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:55:42,479-Speed 3016.54 samples/sec Loss 14.3020 LearningRate 0.0822 Epoch: 1 Global Step: 23230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
03:55:45,796-Speed 3087.87 samples/sec Loss 14.2691 LearningRate 0.0822 Epoch: 1 Global Step: 23240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:55:49,137-Speed 3066.61 samples/sec Loss 14.3569 LearningRate 0.0822 Epoch: 1 Global Step: 23250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:55:52,442-Speed 3098.89 samples/sec Loss 14.0817 LearningRate 0.0822 Epoch: 1 Global Step: 23260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:55:55,708-Speed 3135.92 samples/sec Loss 14.2021 LearningRate 0.0821 Epoch: 1 Global Step: 23270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:55:59,040-Speed 3074.65 samples/sec Loss 14.1429 LearningRate 0.0821 Epoch: 1 Global Step: 23280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:02,389-Speed 3058.44 samples/sec Loss 14.3212 LearningRate 0.0821 Epoch: 1 Global Step: 23290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:05,711-Speed 3083.38 samples/sec Loss 14.3197 LearningRate 0.0821 Epoch: 1 Global Step: 23300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:09,041-Speed 3075.82 samples/sec Loss 14.2269 LearningRate 0.0821 Epoch: 1 Global Step: 23310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:12,361-Speed 3085.25 samples/sec Loss 14.2044 LearningRate 0.0821 Epoch: 1 Global Step: 23320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:15,613-Speed 3149.41 samples/sec Loss 14.3074 LearningRate 0.0821 Epoch: 1 Global Step: 23330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:18,941-Speed 3078.50 samples/sec Loss 14.1392 LearningRate 0.0821 Epoch: 1 Global Step: 23340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:22,281-Speed 3066.03 samples/sec Loss 14.2299 LearningRate 0.0821 Epoch: 1 Global Step: 23350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:25,628-Speed 3060.57 
samples/sec Loss 14.1074 LearningRate 0.0821 Epoch: 1 Global Step: 23360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:28,969-Speed 3066.14 samples/sec Loss 14.4764 LearningRate 0.0821 Epoch: 1 Global Step: 23370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:56:32,291-Speed 3084.32 samples/sec Loss 14.3077 LearningRate 0.0821 Epoch: 1 Global Step: 23380 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:35,610-Speed 3085.35 samples/sec Loss 14.2924 LearningRate 0.0821 Epoch: 1 Global Step: 23390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:38,962-Speed 3056.03 samples/sec Loss 14.3064 LearningRate 0.0820 Epoch: 1 Global Step: 23400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:42,250-Speed 3115.25 samples/sec Loss 14.2357 LearningRate 0.0820 Epoch: 1 Global Step: 23410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:45,589-Speed 3068.22 samples/sec Loss 14.1190 LearningRate 0.0820 Epoch: 1 Global Step: 23420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:48,943-Speed 3053.68 samples/sec Loss 14.2171 LearningRate 0.0820 Epoch: 1 Global Step: 23430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:52,347-Speed 3010.25 samples/sec Loss 14.1840 LearningRate 0.0820 Epoch: 1 Global Step: 23440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:55,683-Speed 3070.51 samples/sec Loss 14.1915 LearningRate 0.0820 Epoch: 1 Global Step: 23450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:56:59,023-Speed 3067.75 samples/sec Loss 14.2344 LearningRate 0.0820 Epoch: 1 Global Step: 23460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:02,376-Speed 3054.74 samples/sec Loss 14.1686 LearningRate 0.0820 Epoch: 1 Global Step: 23470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:05,729-Speed 3054.87 samples/sec Loss 14.0988 LearningRate 
0.0820 Epoch: 1 Global Step: 23480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:09,100-Speed 3038.58 samples/sec Loss 14.4230 LearningRate 0.0820 Epoch: 1 Global Step: 23490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:12,413-Speed 3092.00 samples/sec Loss 14.0939 LearningRate 0.0820 Epoch: 1 Global Step: 23500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:15,758-Speed 3062.55 samples/sec Loss 14.2228 LearningRate 0.0820 Epoch: 1 Global Step: 23510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:19,074-Speed 3089.59 samples/sec Loss 14.2312 LearningRate 0.0820 Epoch: 1 Global Step: 23520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:22,427-Speed 3054.21 samples/sec Loss 14.2141 LearningRate 0.0820 Epoch: 1 Global Step: 23530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:25,713-Speed 3116.98 samples/sec Loss 14.3136 LearningRate 0.0819 Epoch: 1 Global Step: 23540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:29,105-Speed 3020.06 samples/sec Loss 14.3235 LearningRate 0.0819 Epoch: 1 Global Step: 23550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:32,469-Speed 3044.90 samples/sec Loss 14.0965 LearningRate 0.0819 Epoch: 1 Global Step: 23560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:35,787-Speed 3086.80 samples/sec Loss 14.3022 LearningRate 0.0819 Epoch: 1 Global Step: 23570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:57:39,106-Speed 3086.42 samples/sec Loss 14.1621 LearningRate 0.0819 Epoch: 1 Global Step: 23580 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:42,399-Speed 3110.30 samples/sec Loss 14.2540 LearningRate 0.0819 Epoch: 1 Global Step: 23590 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:45,697-Speed 3106.50 samples/sec Loss 14.3664 LearningRate 0.0819 Epoch: 1 Global Step: 
23600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:48,968-Speed 3131.09 samples/sec Loss 14.4002 LearningRate 0.0819 Epoch: 1 Global Step: 23610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:52,316-Speed 3059.45 samples/sec Loss 14.2801 LearningRate 0.0819 Epoch: 1 Global Step: 23620 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:55,663-Speed 3060.27 samples/sec Loss 14.1632 LearningRate 0.0819 Epoch: 1 Global Step: 23630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:57:59,019-Speed 3052.09 samples/sec Loss 14.1775 LearningRate 0.0819 Epoch: 1 Global Step: 23640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:58:02,310-Speed 3112.86 samples/sec Loss 14.3684 LearningRate 0.0819 Epoch: 1 Global Step: 23650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:58:05,634-Speed 3081.11 samples/sec Loss 14.3534 LearningRate 0.0819 Epoch: 1 Global Step: 23660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:58:08,922-Speed 3115.55 samples/sec Loss 14.1516 LearningRate 0.0819 Epoch: 1 Global Step: 23670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:58:12,229-Speed 3097.72 samples/sec Loss 14.1133 LearningRate 0.0818 Epoch: 1 Global Step: 23680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:15,517-Speed 3115.62 samples/sec Loss 14.0734 LearningRate 0.0818 Epoch: 1 Global Step: 23690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:18,829-Speed 3092.42 samples/sec Loss 14.1004 LearningRate 0.0818 Epoch: 1 Global Step: 23700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:22,132-Speed 3101.33 samples/sec Loss 14.3764 LearningRate 0.0818 Epoch: 1 Global Step: 23710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:25,506-Speed 3035.54 samples/sec Loss 14.1769 LearningRate 0.0818 Epoch: 1 Global Step: 23720 Fp16 Grad Scale: 131072 Required: 
21 hours Training: 2022-04-27 03:58:28,826-Speed 3085.34 samples/sec Loss 14.1682 LearningRate 0.0818 Epoch: 1 Global Step: 23730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:32,120-Speed 3109.59 samples/sec Loss 14.2497 LearningRate 0.0818 Epoch: 1 Global Step: 23740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:35,430-Speed 3094.31 samples/sec Loss 14.2881 LearningRate 0.0818 Epoch: 1 Global Step: 23750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:38,758-Speed 3077.92 samples/sec Loss 14.3298 LearningRate 0.0818 Epoch: 1 Global Step: 23760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:42,058-Speed 3104.82 samples/sec Loss 14.2256 LearningRate 0.0818 Epoch: 1 Global Step: 23770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:45,474-Speed 2998.23 samples/sec Loss 14.2370 LearningRate 0.0818 Epoch: 1 Global Step: 23780 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 03:58:48,845-Speed 3038.88 samples/sec Loss 14.3913 LearningRate 0.0818 Epoch: 1 Global Step: 23790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:52,192-Speed 3060.31 samples/sec Loss 14.1982 LearningRate 0.0818 Epoch: 1 Global Step: 23800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:55,535-Speed 3063.99 samples/sec Loss 14.1415 LearningRate 0.0817 Epoch: 1 Global Step: 23810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:58:58,910-Speed 3035.66 samples/sec Loss 14.3207 LearningRate 0.0817 Epoch: 1 Global Step: 23820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:02,308-Speed 3013.76 samples/sec Loss 14.0917 LearningRate 0.0817 Epoch: 1 Global Step: 23830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:05,599-Speed 3112.83 samples/sec Loss 14.1513 LearningRate 0.0817 Epoch: 1 Global Step: 23840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 
03:59:08,925-Speed 3079.58 samples/sec Loss 14.2206 LearningRate 0.0817 Epoch: 1 Global Step: 23850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:12,258-Speed 3072.99 samples/sec Loss 14.2132 LearningRate 0.0817 Epoch: 1 Global Step: 23860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:15,573-Speed 3089.94 samples/sec Loss 14.2393 LearningRate 0.0817 Epoch: 1 Global Step: 23870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:18,877-Speed 3100.51 samples/sec Loss 14.1913 LearningRate 0.0817 Epoch: 1 Global Step: 23880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:22,125-Speed 3154.02 samples/sec Loss 14.0365 LearningRate 0.0817 Epoch: 1 Global Step: 23890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:25,436-Speed 3093.70 samples/sec Loss 14.1937 LearningRate 0.0817 Epoch: 1 Global Step: 23900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:28,705-Speed 3132.88 samples/sec Loss 14.2114 LearningRate 0.0817 Epoch: 1 Global Step: 23910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 03:59:32,018-Speed 3092.56 samples/sec Loss 14.3223 LearningRate 0.0817 Epoch: 1 Global Step: 23920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:35,342-Speed 3081.62 samples/sec Loss 14.2975 LearningRate 0.0817 Epoch: 1 Global Step: 23930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:38,705-Speed 3045.67 samples/sec Loss 14.0578 LearningRate 0.0817 Epoch: 1 Global Step: 23940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:42,081-Speed 3034.02 samples/sec Loss 14.1143 LearningRate 0.0816 Epoch: 1 Global Step: 23950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:45,409-Speed 3078.04 samples/sec Loss 14.1265 LearningRate 0.0816 Epoch: 1 Global Step: 23960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:48,716-Speed 3097.48 samples/sec 
Loss 14.2224 LearningRate 0.0816 Epoch: 1 Global Step: 23970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:51,997-Speed 3121.62 samples/sec Loss 14.1548 LearningRate 0.0816 Epoch: 1 Global Step: 23980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:55,318-Speed 3084.62 samples/sec Loss 14.1225 LearningRate 0.0816 Epoch: 1 Global Step: 23990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 03:59:58,680-Speed 3046.79 samples/sec Loss 14.1173 LearningRate 0.0816 Epoch: 1 Global Step: 24000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:02,023-Speed 3064.67 samples/sec Loss 14.2402 LearningRate 0.0816 Epoch: 1 Global Step: 24010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:05,308-Speed 3117.21 samples/sec Loss 14.2400 LearningRate 0.0816 Epoch: 1 Global Step: 24020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:08,642-Speed 3072.53 samples/sec Loss 13.9743 LearningRate 0.0816 Epoch: 1 Global Step: 24030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:11,959-Speed 3088.17 samples/sec Loss 14.2410 LearningRate 0.0816 Epoch: 1 Global Step: 24040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:15,299-Speed 3066.74 samples/sec Loss 14.1693 LearningRate 0.0816 Epoch: 1 Global Step: 24050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:18,619-Speed 3085.81 samples/sec Loss 14.2213 LearningRate 0.0816 Epoch: 1 Global Step: 24060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:21,906-Speed 3116.79 samples/sec Loss 14.2529 LearningRate 0.0816 Epoch: 1 Global Step: 24070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:25,289-Speed 3027.33 samples/sec Loss 14.0483 LearningRate 0.0816 Epoch: 1 Global Step: 24080 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:28,578-Speed 3115.05 samples/sec Loss 14.1187 LearningRate 0.0815 Epoch: 
1 Global Step: 24090 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:31,879-Speed 3102.35 samples/sec Loss 14.1120 LearningRate 0.0815 Epoch: 1 Global Step: 24100 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:00:35,140-Speed 3141.99 samples/sec Loss 14.1148 LearningRate 0.0815 Epoch: 1 Global Step: 24110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:38,449-Speed 3095.26 samples/sec Loss 14.0641 LearningRate 0.0815 Epoch: 1 Global Step: 24120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:41,700-Speed 3150.78 samples/sec Loss 14.0759 LearningRate 0.0815 Epoch: 1 Global Step: 24130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:44,982-Speed 3120.64 samples/sec Loss 14.1615 LearningRate 0.0815 Epoch: 1 Global Step: 24140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:48,299-Speed 3088.73 samples/sec Loss 14.1997 LearningRate 0.0815 Epoch: 1 Global Step: 24150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:51,713-Speed 2999.87 samples/sec Loss 14.2136 LearningRate 0.0815 Epoch: 1 Global Step: 24160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:55,004-Speed 3113.08 samples/sec Loss 14.1837 LearningRate 0.0815 Epoch: 1 Global Step: 24170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:00:58,328-Speed 3081.43 samples/sec Loss 14.2340 LearningRate 0.0815 Epoch: 1 Global Step: 24180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:01,599-Speed 3131.12 samples/sec Loss 14.1801 LearningRate 0.0815 Epoch: 1 Global Step: 24190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:04,906-Speed 3097.63 samples/sec Loss 14.2831 LearningRate 0.0815 Epoch: 1 Global Step: 24200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:08,178-Speed 3130.71 samples/sec Loss 14.2598 LearningRate 0.0815 Epoch: 1 Global Step: 24210 Fp16 Grad 
Scale: 262144 Required: 21 hours Training: 2022-04-27 04:01:11,521-Speed 3063.17 samples/sec Loss 14.1664 LearningRate 0.0815 Epoch: 1 Global Step: 24220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:14,869-Speed 3060.02 samples/sec Loss 14.2294 LearningRate 0.0814 Epoch: 1 Global Step: 24230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:18,199-Speed 3076.04 samples/sec Loss 14.2449 LearningRate 0.0814 Epoch: 1 Global Step: 24240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:21,533-Speed 3072.13 samples/sec Loss 14.0886 LearningRate 0.0814 Epoch: 1 Global Step: 24250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:24,880-Speed 3060.20 samples/sec Loss 14.2430 LearningRate 0.0814 Epoch: 1 Global Step: 24260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:28,177-Speed 3106.28 samples/sec Loss 14.0722 LearningRate 0.0814 Epoch: 1 Global Step: 24270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:31,440-Speed 3140.12 samples/sec Loss 14.0344 LearningRate 0.0814 Epoch: 1 Global Step: 24280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:34,697-Speed 3144.38 samples/sec Loss 14.2088 LearningRate 0.0814 Epoch: 1 Global Step: 24290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:38,017-Speed 3084.95 samples/sec Loss 14.1396 LearningRate 0.0814 Epoch: 1 Global Step: 24300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:41,283-Speed 3136.72 samples/sec Loss 14.1472 LearningRate 0.0814 Epoch: 1 Global Step: 24310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:44,540-Speed 3145.59 samples/sec Loss 14.2248 LearningRate 0.0814 Epoch: 1 Global Step: 24320 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 04:01:47,817-Speed 3125.61 samples/sec Loss 14.1106 LearningRate 0.0814 Epoch: 1 Global Step: 24330 Fp16 Grad Scale: 131072 Required: 21 
hours Training: 2022-04-27 04:01:51,102-Speed 3118.09 samples/sec Loss 14.0878 LearningRate 0.0814 Epoch: 1 Global Step: 24340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:54,403-Speed 3103.37 samples/sec Loss 14.2967 LearningRate 0.0814 Epoch: 1 Global Step: 24350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:01:57,742-Speed 3067.80 samples/sec Loss 14.1822 LearningRate 0.0813 Epoch: 1 Global Step: 24360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:01,052-Speed 3094.64 samples/sec Loss 14.1394 LearningRate 0.0813 Epoch: 1 Global Step: 24370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:04,411-Speed 3049.26 samples/sec Loss 14.1319 LearningRate 0.0813 Epoch: 1 Global Step: 24380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:07,762-Speed 3056.72 samples/sec Loss 14.1103 LearningRate 0.0813 Epoch: 1 Global Step: 24390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:11,043-Speed 3121.63 samples/sec Loss 14.1855 LearningRate 0.0813 Epoch: 1 Global Step: 24400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:14,328-Speed 3118.40 samples/sec Loss 14.2588 LearningRate 0.0813 Epoch: 1 Global Step: 24410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:17,697-Speed 3040.03 samples/sec Loss 13.9730 LearningRate 0.0813 Epoch: 1 Global Step: 24420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:21,020-Speed 3082.48 samples/sec Loss 14.1981 LearningRate 0.0813 Epoch: 1 Global Step: 24430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:24,376-Speed 3052.23 samples/sec Loss 14.2257 LearningRate 0.0813 Epoch: 1 Global Step: 24440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:27,729-Speed 3054.88 samples/sec Loss 14.1873 LearningRate 0.0813 Epoch: 1 Global Step: 24450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
04:02:31,032-Speed 3100.91 samples/sec Loss 14.0762 LearningRate 0.0813 Epoch: 1 Global Step: 24460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:34,305-Speed 3129.34 samples/sec Loss 14.1661 LearningRate 0.0813 Epoch: 1 Global Step: 24470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:37,634-Speed 3077.39 samples/sec Loss 13.9165 LearningRate 0.0813 Epoch: 1 Global Step: 24480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:40,994-Speed 3048.52 samples/sec Loss 14.1839 LearningRate 0.0813 Epoch: 1 Global Step: 24490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:44,337-Speed 3064.65 samples/sec Loss 14.0174 LearningRate 0.0812 Epoch: 1 Global Step: 24500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:47,685-Speed 3059.72 samples/sec Loss 14.1252 LearningRate 0.0812 Epoch: 1 Global Step: 24510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:50,986-Speed 3102.92 samples/sec Loss 14.1293 LearningRate 0.0812 Epoch: 1 Global Step: 24520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:54,246-Speed 3142.59 samples/sec Loss 14.0211 LearningRate 0.0812 Epoch: 1 Global Step: 24530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:02:57,550-Speed 3100.16 samples/sec Loss 14.1293 LearningRate 0.0812 Epoch: 1 Global Step: 24540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:00,836-Speed 3117.87 samples/sec Loss 14.1670 LearningRate 0.0812 Epoch: 1 Global Step: 24550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:04,114-Speed 3124.28 samples/sec Loss 14.1846 LearningRate 0.0812 Epoch: 1 Global Step: 24560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:07,433-Speed 3086.63 samples/sec Loss 14.2569 LearningRate 0.0812 Epoch: 1 Global Step: 24570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:10,731-Speed 3105.18 
samples/sec Loss 14.1557 LearningRate 0.0812 Epoch: 1 Global Step: 24580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:14,048-Speed 3088.67 samples/sec Loss 13.9939 LearningRate 0.0812 Epoch: 1 Global Step: 24590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:17,413-Speed 3044.36 samples/sec Loss 14.0813 LearningRate 0.0812 Epoch: 1 Global Step: 24600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:20,754-Speed 3065.36 samples/sec Loss 14.1745 LearningRate 0.0812 Epoch: 1 Global Step: 24610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:24,104-Speed 3057.58 samples/sec Loss 14.2517 LearningRate 0.0812 Epoch: 1 Global Step: 24620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:27,417-Speed 3092.49 samples/sec Loss 14.0991 LearningRate 0.0812 Epoch: 1 Global Step: 24630 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 04:03:30,767-Speed 3057.17 samples/sec Loss 14.0544 LearningRate 0.0811 Epoch: 1 Global Step: 24640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:34,092-Speed 3080.97 samples/sec Loss 14.0559 LearningRate 0.0811 Epoch: 1 Global Step: 24650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:37,425-Speed 3072.73 samples/sec Loss 14.2144 LearningRate 0.0811 Epoch: 1 Global Step: 24660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:40,789-Speed 3044.91 samples/sec Loss 14.0039 LearningRate 0.0811 Epoch: 1 Global Step: 24670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:44,052-Speed 3139.64 samples/sec Loss 14.1149 LearningRate 0.0811 Epoch: 1 Global Step: 24680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:47,402-Speed 3057.60 samples/sec Loss 14.1923 LearningRate 0.0811 Epoch: 1 Global Step: 24690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:50,701-Speed 3104.91 samples/sec Loss 14.0607 
LearningRate 0.0811 Epoch: 1 Global Step: 24700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:53,975-Speed 3128.97 samples/sec Loss 14.0129 LearningRate 0.0811 Epoch: 1 Global Step: 24710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:03:57,299-Speed 3082.32 samples/sec Loss 14.0692 LearningRate 0.0811 Epoch: 1 Global Step: 24720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:04:00,614-Speed 3089.37 samples/sec Loss 14.2281 LearningRate 0.0811 Epoch: 1 Global Step: 24730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:04:03,906-Speed 3112.49 samples/sec Loss 14.1715 LearningRate 0.0811 Epoch: 1 Global Step: 24740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:04:07,238-Speed 3073.74 samples/sec Loss 14.1549 LearningRate 0.0811 Epoch: 1 Global Step: 24750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:04:10,575-Speed 3070.54 samples/sec Loss 14.0938 LearningRate 0.0811 Epoch: 1 Global Step: 24760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:13,840-Speed 3136.73 samples/sec Loss 14.0692 LearningRate 0.0811 Epoch: 1 Global Step: 24770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:17,194-Speed 3054.38 samples/sec Loss 13.9665 LearningRate 0.0810 Epoch: 1 Global Step: 24780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:20,510-Speed 3089.27 samples/sec Loss 13.9075 LearningRate 0.0810 Epoch: 1 Global Step: 24790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:23,861-Speed 3056.28 samples/sec Loss 14.0670 LearningRate 0.0810 Epoch: 1 Global Step: 24800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:27,228-Speed 3042.75 samples/sec Loss 14.0135 LearningRate 0.0810 Epoch: 1 Global Step: 24810 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:30,593-Speed 3044.26 samples/sec Loss 14.1973 LearningRate 0.0810 Epoch: 1 Global 
Step: 24820 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:33,878-Speed 3117.91 samples/sec Loss 13.8699 LearningRate 0.0810 Epoch: 1 Global Step: 24830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:04:37,501-Speed 2828.12 samples/sec Loss 13.9247 LearningRate 0.0810 Epoch: 1 Global Step: 24840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:09,125-Speed 323.82 samples/sec Loss 12.8697 LearningRate 0.0810 Epoch: 2 Global Step: 24850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:12,548-Speed 2993.06 samples/sec Loss 12.5972 LearningRate 0.0810 Epoch: 2 Global Step: 24860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:05:15,824-Speed 3126.15 samples/sec Loss 12.4119 LearningRate 0.0810 Epoch: 2 Global Step: 24870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:05:19,184-Speed 3048.68 samples/sec Loss 12.4678 LearningRate 0.0810 Epoch: 2 Global Step: 24880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:05:22,457-Speed 3130.19 samples/sec Loss 12.5394 LearningRate 0.0810 Epoch: 2 Global Step: 24890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:05:25,754-Speed 3108.35 samples/sec Loss 12.5143 LearningRate 0.0810 Epoch: 2 Global Step: 24900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:05:29,050-Speed 3107.46 samples/sec Loss 12.6044 LearningRate 0.0810 Epoch: 2 Global Step: 24910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:32,339-Speed 3114.05 samples/sec Loss 12.4336 LearningRate 0.0809 Epoch: 2 Global Step: 24920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:35,685-Speed 3061.87 samples/sec Loss 12.3095 LearningRate 0.0809 Epoch: 2 Global Step: 24930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:39,019-Speed 3072.31 samples/sec Loss 12.5376 LearningRate 0.0809 Epoch: 2 Global Step: 24940 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-27 04:05:42,328-Speed 3095.24 samples/sec Loss 12.6920 LearningRate 0.0809 Epoch: 2 Global Step: 24950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:45,668-Speed 3066.92 samples/sec Loss 12.5537 LearningRate 0.0809 Epoch: 2 Global Step: 24960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:49,053-Speed 3026.00 samples/sec Loss 12.4234 LearningRate 0.0809 Epoch: 2 Global Step: 24970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:52,322-Speed 3133.80 samples/sec Loss 12.5275 LearningRate 0.0809 Epoch: 2 Global Step: 24980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:55,622-Speed 3103.94 samples/sec Loss 12.5730 LearningRate 0.0809 Epoch: 2 Global Step: 24990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:05:58,950-Speed 3077.93 samples/sec Loss 12.6667 LearningRate 0.0809 Epoch: 2 Global Step: 25000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:06:02,334-Speed 3026.35 samples/sec Loss 12.6471 LearningRate 0.0809 Epoch: 2 Global Step: 25010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:05,690-Speed 3052.74 samples/sec Loss 12.6139 LearningRate 0.0809 Epoch: 2 Global Step: 25020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:08,949-Speed 3142.43 samples/sec Loss 12.6491 LearningRate 0.0809 Epoch: 2 Global Step: 25030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:12,236-Speed 3116.71 samples/sec Loss 12.7140 LearningRate 0.0809 Epoch: 2 Global Step: 25040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:15,553-Speed 3087.42 samples/sec Loss 12.5319 LearningRate 0.0808 Epoch: 2 Global Step: 25050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:18,812-Speed 3143.61 samples/sec Loss 12.6874 LearningRate 0.0808 Epoch: 2 Global Step: 25060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 
2022-04-27 04:06:22,119-Speed 3097.47 samples/sec Loss 12.8795 LearningRate 0.0808 Epoch: 2 Global Step: 25070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:25,407-Speed 3115.60 samples/sec Loss 12.6344 LearningRate 0.0808 Epoch: 2 Global Step: 25080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:28,836-Speed 2987.31 samples/sec Loss 12.6583 LearningRate 0.0808 Epoch: 2 Global Step: 25090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:32,097-Speed 3140.89 samples/sec Loss 12.6715 LearningRate 0.0808 Epoch: 2 Global Step: 25100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:35,373-Speed 3126.15 samples/sec Loss 12.7642 LearningRate 0.0808 Epoch: 2 Global Step: 25110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:38,694-Speed 3085.04 samples/sec Loss 12.7345 LearningRate 0.0808 Epoch: 2 Global Step: 25120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:41,975-Speed 3121.11 samples/sec Loss 12.6872 LearningRate 0.0808 Epoch: 2 Global Step: 25130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:45,301-Speed 3080.38 samples/sec Loss 12.8515 LearningRate 0.0808 Epoch: 2 Global Step: 25140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:48,624-Speed 3082.68 samples/sec Loss 12.7765 LearningRate 0.0808 Epoch: 2 Global Step: 25150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:51,886-Speed 3139.61 samples/sec Loss 12.6247 LearningRate 0.0808 Epoch: 2 Global Step: 25160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:55,147-Speed 3141.51 samples/sec Loss 12.7436 LearningRate 0.0808 Epoch: 2 Global Step: 25170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:06:58,463-Speed 3089.44 samples/sec Loss 12.6856 LearningRate 0.0808 Epoch: 2 Global Step: 25180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:01,783-Speed 
3084.91 samples/sec Loss 12.7939 LearningRate 0.0807 Epoch: 2 Global Step: 25190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:05,048-Speed 3137.18 samples/sec Loss 12.8209 LearningRate 0.0807 Epoch: 2 Global Step: 25200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:08,354-Speed 3099.07 samples/sec Loss 12.8890 LearningRate 0.0807 Epoch: 2 Global Step: 25210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:11,720-Speed 3042.66 samples/sec Loss 12.9401 LearningRate 0.0807 Epoch: 2 Global Step: 25220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:15,094-Speed 3036.15 samples/sec Loss 12.7394 LearningRate 0.0807 Epoch: 2 Global Step: 25230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:18,393-Speed 3104.76 samples/sec Loss 12.8329 LearningRate 0.0807 Epoch: 2 Global Step: 25240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:21,675-Speed 3121.41 samples/sec Loss 12.8991 LearningRate 0.0807 Epoch: 2 Global Step: 25250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:07:24,969-Speed 3110.24 samples/sec Loss 12.8395 LearningRate 0.0807 Epoch: 2 Global Step: 25260 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:28,262-Speed 3110.55 samples/sec Loss 12.9104 LearningRate 0.0807 Epoch: 2 Global Step: 25270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:31,625-Speed 3045.84 samples/sec Loss 12.9829 LearningRate 0.0807 Epoch: 2 Global Step: 25280 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:34,957-Speed 3073.96 samples/sec Loss 12.8341 LearningRate 0.0807 Epoch: 2 Global Step: 25290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:38,349-Speed 3019.81 samples/sec Loss 12.7605 LearningRate 0.0807 Epoch: 2 Global Step: 25300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:41,666-Speed 3087.66 samples/sec Loss 12.9570 
LearningRate 0.0807 Epoch: 2 Global Step: 25310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:44,961-Speed 3109.06 samples/sec Loss 12.9527 LearningRate 0.0807 Epoch: 2 Global Step: 25320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:48,246-Speed 3118.62 samples/sec Loss 12.9605 LearningRate 0.0806 Epoch: 2 Global Step: 25330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:51,607-Speed 3047.32 samples/sec Loss 12.9940 LearningRate 0.0806 Epoch: 2 Global Step: 25340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:54,985-Speed 3031.91 samples/sec Loss 12.8001 LearningRate 0.0806 Epoch: 2 Global Step: 25350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:07:58,343-Speed 3050.66 samples/sec Loss 13.1202 LearningRate 0.0806 Epoch: 2 Global Step: 25360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:01,614-Speed 3131.81 samples/sec Loss 13.0941 LearningRate 0.0806 Epoch: 2 Global Step: 25370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:04,880-Speed 3135.28 samples/sec Loss 12.9728 LearningRate 0.0806 Epoch: 2 Global Step: 25380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:08,177-Speed 3106.95 samples/sec Loss 13.0060 LearningRate 0.0806 Epoch: 2 Global Step: 25390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:11,491-Speed 3091.24 samples/sec Loss 13.0978 LearningRate 0.0806 Epoch: 2 Global Step: 25400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:14,769-Speed 3125.22 samples/sec Loss 12.9839 LearningRate 0.0806 Epoch: 2 Global Step: 25410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:18,102-Speed 3073.00 samples/sec Loss 12.9628 LearningRate 0.0806 Epoch: 2 Global Step: 25420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:21,487-Speed 3026.16 samples/sec Loss 13.1029 LearningRate 0.0806 Epoch: 2 Global 
Step: 25430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:24,796-Speed 3095.19 samples/sec Loss 13.0235 LearningRate 0.0806 Epoch: 2 Global Step: 25440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:28,145-Speed 3058.96 samples/sec Loss 13.0573 LearningRate 0.0806 Epoch: 2 Global Step: 25450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:08:31,461-Speed 3089.05 samples/sec Loss 13.0414 LearningRate 0.0806 Epoch: 2 Global Step: 25460 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 04:08:34,756-Speed 3109.56 samples/sec Loss 13.0519 LearningRate 0.0805 Epoch: 2 Global Step: 25470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:38,052-Speed 3107.46 samples/sec Loss 13.1016 LearningRate 0.0805 Epoch: 2 Global Step: 25480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:41,353-Speed 3103.43 samples/sec Loss 13.2219 LearningRate 0.0805 Epoch: 2 Global Step: 25490 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:44,725-Speed 3037.10 samples/sec Loss 12.9425 LearningRate 0.0805 Epoch: 2 Global Step: 25500 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:48,056-Speed 3075.13 samples/sec Loss 13.0005 LearningRate 0.0805 Epoch: 2 Global Step: 25510 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:51,393-Speed 3070.05 samples/sec Loss 13.0378 LearningRate 0.0805 Epoch: 2 Global Step: 25520 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:54,722-Speed 3077.28 samples/sec Loss 13.0149 LearningRate 0.0805 Epoch: 2 Global Step: 25530 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:08:58,060-Speed 3067.88 samples/sec Loss 13.0576 LearningRate 0.0805 Epoch: 2 Global Step: 25540 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:09:01,379-Speed 3086.58 samples/sec Loss 13.2541 LearningRate 0.0805 Epoch: 2 Global Step: 25550 Fp16 Grad Scale: 65536 
Required: 21 hours Training: 2022-04-27 04:09:04,686-Speed 3097.30 samples/sec Loss 13.1641 LearningRate 0.0805 Epoch: 2 Global Step: 25560 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:09:08,104-Speed 2997.33 samples/sec Loss 13.1687 LearningRate 0.0805 Epoch: 2 Global Step: 25570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:11,405-Speed 3103.20 samples/sec Loss 13.2196 LearningRate 0.0805 Epoch: 2 Global Step: 25580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:14,755-Speed 3056.73 samples/sec Loss 13.1589 LearningRate 0.0805 Epoch: 2 Global Step: 25590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:18,025-Speed 3133.30 samples/sec Loss 13.1455 LearningRate 0.0805 Epoch: 2 Global Step: 25600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:21,299-Speed 3128.89 samples/sec Loss 13.1844 LearningRate 0.0804 Epoch: 2 Global Step: 25610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:24,637-Speed 3068.89 samples/sec Loss 13.1072 LearningRate 0.0804 Epoch: 2 Global Step: 25620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:27,945-Speed 3095.91 samples/sec Loss 13.1669 LearningRate 0.0804 Epoch: 2 Global Step: 25630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:31,241-Speed 3108.30 samples/sec Loss 13.2314 LearningRate 0.0804 Epoch: 2 Global Step: 25640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:34,558-Speed 3088.02 samples/sec Loss 13.2228 LearningRate 0.0804 Epoch: 2 Global Step: 25650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:37,850-Speed 3110.96 samples/sec Loss 13.3326 LearningRate 0.0804 Epoch: 2 Global Step: 25660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:41,195-Speed 3061.91 samples/sec Loss 13.2058 LearningRate 0.0804 Epoch: 2 Global Step: 25670 Fp16 Grad Scale: 262144 Required: 21 hours Training: 
2022-04-27 04:09:44,499-Speed 3100.42 samples/sec Loss 13.1520 LearningRate 0.0804 Epoch: 2 Global Step: 25680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:47,868-Speed 3040.10 samples/sec Loss 13.1829 LearningRate 0.0804 Epoch: 2 Global Step: 25690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:51,172-Speed 3100.19 samples/sec Loss 13.1797 LearningRate 0.0804 Epoch: 2 Global Step: 25700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:54,486-Speed 3091.52 samples/sec Loss 13.2022 LearningRate 0.0804 Epoch: 2 Global Step: 25710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:09:57,740-Speed 3147.37 samples/sec Loss 13.1975 LearningRate 0.0804 Epoch: 2 Global Step: 25720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:01,037-Speed 3106.54 samples/sec Loss 13.2883 LearningRate 0.0804 Epoch: 2 Global Step: 25730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:04,409-Speed 3037.97 samples/sec Loss 13.2799 LearningRate 0.0804 Epoch: 2 Global Step: 25740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:07,706-Speed 3107.35 samples/sec Loss 13.3904 LearningRate 0.0803 Epoch: 2 Global Step: 25750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:11,024-Speed 3086.95 samples/sec Loss 13.1118 LearningRate 0.0803 Epoch: 2 Global Step: 25760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:14,365-Speed 3066.30 samples/sec Loss 13.2502 LearningRate 0.0803 Epoch: 2 Global Step: 25770 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:17,650-Speed 3117.65 samples/sec Loss 13.2440 LearningRate 0.0803 Epoch: 2 Global Step: 25780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:20,942-Speed 3112.23 samples/sec Loss 13.1858 LearningRate 0.0803 Epoch: 2 Global Step: 25790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:24,218-Speed 
3126.95 samples/sec Loss 13.2251 LearningRate 0.0803 Epoch: 2 Global Step: 25800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:27,499-Speed 3121.76 samples/sec Loss 13.2405 LearningRate 0.0803 Epoch: 2 Global Step: 25810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:30,827-Speed 3078.12 samples/sec Loss 13.3319 LearningRate 0.0803 Epoch: 2 Global Step: 25820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:34,131-Speed 3100.11 samples/sec Loss 13.2975 LearningRate 0.0803 Epoch: 2 Global Step: 25830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:37,440-Speed 3096.22 samples/sec Loss 13.2622 LearningRate 0.0803 Epoch: 2 Global Step: 25840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:40,828-Speed 3023.39 samples/sec Loss 13.3473 LearningRate 0.0803 Epoch: 2 Global Step: 25850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:44,194-Speed 3042.45 samples/sec Loss 13.2358 LearningRate 0.0803 Epoch: 2 Global Step: 25860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:47,555-Speed 3048.16 samples/sec Loss 13.2739 LearningRate 0.0803 Epoch: 2 Global Step: 25870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:50,863-Speed 3096.67 samples/sec Loss 13.2940 LearningRate 0.0802 Epoch: 2 Global Step: 25880 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:54,292-Speed 2987.01 samples/sec Loss 13.2774 LearningRate 0.0802 Epoch: 2 Global Step: 25890 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:10:57,695-Speed 3010.79 samples/sec Loss 13.2098 LearningRate 0.0802 Epoch: 2 Global Step: 25900 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:01,013-Speed 3086.66 samples/sec Loss 13.2957 LearningRate 0.0802 Epoch: 2 Global Step: 25910 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:04,352-Speed 3067.98 samples/sec Loss 
13.2373 LearningRate 0.0802 Epoch: 2 Global Step: 25920 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:07,681-Speed 3076.98 samples/sec Loss 13.2499 LearningRate 0.0802 Epoch: 2 Global Step: 25930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:10,954-Speed 3129.55 samples/sec Loss 13.2189 LearningRate 0.0802 Epoch: 2 Global Step: 25940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:14,246-Speed 3111.40 samples/sec Loss 13.2560 LearningRate 0.0802 Epoch: 2 Global Step: 25950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:17,518-Speed 3130.70 samples/sec Loss 13.3923 LearningRate 0.0802 Epoch: 2 Global Step: 25960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:20,797-Speed 3123.48 samples/sec Loss 13.2917 LearningRate 0.0802 Epoch: 2 Global Step: 25970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:24,129-Speed 3074.20 samples/sec Loss 13.3624 LearningRate 0.0802 Epoch: 2 Global Step: 25980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:27,451-Speed 3083.42 samples/sec Loss 13.3054 LearningRate 0.0802 Epoch: 2 Global Step: 25990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:30,813-Speed 3047.06 samples/sec Loss 13.4357 LearningRate 0.0802 Epoch: 2 Global Step: 26000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:34,168-Speed 3052.79 samples/sec Loss 13.5497 LearningRate 0.0802 Epoch: 2 Global Step: 26010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:37,463-Speed 3109.26 samples/sec Loss 13.3280 LearningRate 0.0801 Epoch: 2 Global Step: 26020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:40,830-Speed 3042.36 samples/sec Loss 13.4148 LearningRate 0.0801 Epoch: 2 Global Step: 26030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:44,162-Speed 3073.20 samples/sec Loss 13.4633 LearningRate 0.0801 Epoch: 2 
Global Step: 26040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:47,475-Speed 3092.37 samples/sec Loss 13.3002 LearningRate 0.0801 Epoch: 2 Global Step: 26050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:11:50,827-Speed 3055.98 samples/sec Loss 13.3578 LearningRate 0.0801 Epoch: 2 Global Step: 26060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:54,188-Speed 3046.68 samples/sec Loss 13.3842 LearningRate 0.0801 Epoch: 2 Global Step: 26070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:11:57,564-Speed 3034.52 samples/sec Loss 13.2572 LearningRate 0.0801 Epoch: 2 Global Step: 26080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:00,892-Speed 3077.55 samples/sec Loss 13.4383 LearningRate 0.0801 Epoch: 2 Global Step: 26090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:04,255-Speed 3045.48 samples/sec Loss 13.3738 LearningRate 0.0801 Epoch: 2 Global Step: 26100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:07,600-Speed 3063.29 samples/sec Loss 13.3507 LearningRate 0.0801 Epoch: 2 Global Step: 26110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:10,953-Speed 3054.76 samples/sec Loss 13.4124 LearningRate 0.0801 Epoch: 2 Global Step: 26120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:14,244-Speed 3112.14 samples/sec Loss 13.1991 LearningRate 0.0801 Epoch: 2 Global Step: 26130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:17,540-Speed 3107.03 samples/sec Loss 13.4492 LearningRate 0.0801 Epoch: 2 Global Step: 26140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:20,834-Speed 3110.31 samples/sec Loss 13.2803 LearningRate 0.0801 Epoch: 2 Global Step: 26150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:24,189-Speed 3052.39 samples/sec Loss 13.3670 LearningRate 0.0800 Epoch: 2 Global Step: 26160 Fp16 Grad 
Scale: 262144 Required: 21 hours Training: 2022-04-27 04:12:27,495-Speed 3098.42 samples/sec Loss 13.4031 LearningRate 0.0800 Epoch: 2 Global Step: 26170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:30,800-Speed 3100.29 samples/sec Loss 13.3939 LearningRate 0.0800 Epoch: 2 Global Step: 26180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:34,129-Speed 3075.80 samples/sec Loss 13.4229 LearningRate 0.0800 Epoch: 2 Global Step: 26190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:37,425-Speed 3107.89 samples/sec Loss 13.2912 LearningRate 0.0800 Epoch: 2 Global Step: 26200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:40,729-Speed 3100.94 samples/sec Loss 13.5243 LearningRate 0.0800 Epoch: 2 Global Step: 26210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:44,055-Speed 3078.84 samples/sec Loss 13.4336 LearningRate 0.0800 Epoch: 2 Global Step: 26220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:47,373-Speed 3087.62 samples/sec Loss 13.5761 LearningRate 0.0800 Epoch: 2 Global Step: 26230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:50,691-Speed 3087.03 samples/sec Loss 13.4652 LearningRate 0.0800 Epoch: 2 Global Step: 26240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:54,020-Speed 3077.11 samples/sec Loss 13.5149 LearningRate 0.0800 Epoch: 2 Global Step: 26250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:12:57,406-Speed 3024.96 samples/sec Loss 13.4165 LearningRate 0.0800 Epoch: 2 Global Step: 26260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:00,694-Speed 3115.33 samples/sec Loss 13.5074 LearningRate 0.0800 Epoch: 2 Global Step: 26270 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:04,028-Speed 3071.82 samples/sec Loss 13.5153 LearningRate 0.0800 Epoch: 2 Global Step: 26280 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-27 04:13:07,358-Speed 3076.35 samples/sec Loss 13.2832 LearningRate 0.0800 Epoch: 2 Global Step: 26290 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:10,643-Speed 3117.44 samples/sec Loss 13.3348 LearningRate 0.0799 Epoch: 2 Global Step: 26300 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:13,981-Speed 3068.71 samples/sec Loss 13.3035 LearningRate 0.0799 Epoch: 2 Global Step: 26310 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:17,279-Speed 3105.65 samples/sec Loss 13.4156 LearningRate 0.0799 Epoch: 2 Global Step: 26320 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:20,676-Speed 3016.87 samples/sec Loss 13.5227 LearningRate 0.0799 Epoch: 2 Global Step: 26330 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:23,968-Speed 3111.59 samples/sec Loss 13.3722 LearningRate 0.0799 Epoch: 2 Global Step: 26340 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:27,321-Speed 3053.79 samples/sec Loss 13.4578 LearningRate 0.0799 Epoch: 2 Global Step: 26350 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:30,670-Speed 3058.54 samples/sec Loss 13.6569 LearningRate 0.0799 Epoch: 2 Global Step: 26360 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:13:33,991-Speed 3084.69 samples/sec Loss 13.5843 LearningRate 0.0799 Epoch: 2 Global Step: 26370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:37,355-Speed 3045.19 samples/sec Loss 13.3414 LearningRate 0.0799 Epoch: 2 Global Step: 26380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:40,658-Speed 3101.10 samples/sec Loss 13.4404 LearningRate 0.0799 Epoch: 2 Global Step: 26390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:43,925-Speed 3135.31 samples/sec Loss 13.5661 LearningRate 0.0799 Epoch: 2 Global Step: 26400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:47,197-Speed 
3130.02 samples/sec Loss 13.4212 LearningRate 0.0799 Epoch: 2 Global Step: 26410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:50,529-Speed 3074.40 samples/sec Loss 13.5245 LearningRate 0.0799 Epoch: 2 Global Step: 26420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:53,818-Speed 3114.59 samples/sec Loss 13.4970 LearningRate 0.0799 Epoch: 2 Global Step: 26430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:13:57,098-Speed 3122.54 samples/sec Loss 13.5489 LearningRate 0.0798 Epoch: 2 Global Step: 26440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:00,386-Speed 3115.50 samples/sec Loss 13.5165 LearningRate 0.0798 Epoch: 2 Global Step: 26450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:03,652-Speed 3136.74 samples/sec Loss 13.5680 LearningRate 0.0798 Epoch: 2 Global Step: 26460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:06,942-Speed 3113.46 samples/sec Loss 13.4717 LearningRate 0.0798 Epoch: 2 Global Step: 26470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:10,212-Speed 3131.76 samples/sec Loss 13.5956 LearningRate 0.0798 Epoch: 2 Global Step: 26480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:13,556-Speed 3063.40 samples/sec Loss 13.4282 LearningRate 0.0798 Epoch: 2 Global Step: 26490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:16,927-Speed 3039.12 samples/sec Loss 13.5331 LearningRate 0.0798 Epoch: 2 Global Step: 26500 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:20,286-Speed 3049.30 samples/sec Loss 13.5100 LearningRate 0.0798 Epoch: 2 Global Step: 26510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:23,687-Speed 3011.61 samples/sec Loss 13.4870 LearningRate 0.0798 Epoch: 2 Global Step: 26520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:26,968-Speed 3122.06 samples/sec Loss 
13.6275 LearningRate 0.0798 Epoch: 2 Global Step: 26530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:30,298-Speed 3076.24 samples/sec Loss 13.4800 LearningRate 0.0798 Epoch: 2 Global Step: 26540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:33,648-Speed 3057.96 samples/sec Loss 13.5390 LearningRate 0.0798 Epoch: 2 Global Step: 26550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:37,031-Speed 3027.44 samples/sec Loss 13.4445 LearningRate 0.0798 Epoch: 2 Global Step: 26560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:40,350-Speed 3086.36 samples/sec Loss 13.5555 LearningRate 0.0798 Epoch: 2 Global Step: 26570 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 04:14:43,639-Speed 3114.83 samples/sec Loss 13.6011 LearningRate 0.0797 Epoch: 2 Global Step: 26580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:46,933-Speed 3109.83 samples/sec Loss 13.5171 LearningRate 0.0797 Epoch: 2 Global Step: 26590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:50,268-Speed 3070.71 samples/sec Loss 13.7166 LearningRate 0.0797 Epoch: 2 Global Step: 26600 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:53,554-Speed 3117.24 samples/sec Loss 13.4712 LearningRate 0.0797 Epoch: 2 Global Step: 26610 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:14:56,901-Speed 3060.71 samples/sec Loss 13.3705 LearningRate 0.0797 Epoch: 2 Global Step: 26620 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:00,234-Speed 3073.45 samples/sec Loss 13.6024 LearningRate 0.0797 Epoch: 2 Global Step: 26630 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:03,581-Speed 3059.55 samples/sec Loss 13.3976 LearningRate 0.0797 Epoch: 2 Global Step: 26640 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:06,932-Speed 3057.07 samples/sec Loss 13.5172 LearningRate 0.0797 
Epoch: 2 Global Step: 26650 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:10,258-Speed 3079.40 samples/sec Loss 13.3320 LearningRate 0.0797 Epoch: 2 Global Step: 26660 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:13,605-Speed 3061.15 samples/sec Loss 13.5578 LearningRate 0.0797 Epoch: 2 Global Step: 26670 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:16,923-Speed 3087.17 samples/sec Loss 13.7853 LearningRate 0.0797 Epoch: 2 Global Step: 26680 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:20,279-Speed 3051.83 samples/sec Loss 13.7889 LearningRate 0.0797 Epoch: 2 Global Step: 26690 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:23,658-Speed 3030.74 samples/sec Loss 13.5547 LearningRate 0.0797 Epoch: 2 Global Step: 26700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:27,038-Speed 3030.46 samples/sec Loss 13.6240 LearningRate 0.0797 Epoch: 2 Global Step: 26710 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:30,335-Speed 3107.15 samples/sec Loss 13.4607 LearningRate 0.0796 Epoch: 2 Global Step: 26720 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:33,658-Speed 3082.24 samples/sec Loss 13.6555 LearningRate 0.0796 Epoch: 2 Global Step: 26730 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:36,984-Speed 3080.45 samples/sec Loss 13.5995 LearningRate 0.0796 Epoch: 2 Global Step: 26740 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:40,312-Speed 3077.60 samples/sec Loss 13.4418 LearningRate 0.0796 Epoch: 2 Global Step: 26750 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:43,681-Speed 3040.51 samples/sec Loss 13.6021 LearningRate 0.0796 Epoch: 2 Global Step: 26760 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:47,051-Speed 3039.64 samples/sec Loss 13.5016 LearningRate 0.0796 Epoch: 2 Global Step: 26770 
Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:50,425-Speed 3036.18 samples/sec Loss 13.4610 LearningRate 0.0796 Epoch: 2 Global Step: 26780 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:53,727-Speed 3101.96 samples/sec Loss 13.5754 LearningRate 0.0796 Epoch: 2 Global Step: 26790 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:15:57,030-Speed 3100.98 samples/sec Loss 13.5721 LearningRate 0.0796 Epoch: 2 Global Step: 26800 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:00,296-Speed 3136.37 samples/sec Loss 13.5738 LearningRate 0.0796 Epoch: 2 Global Step: 26810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:03,636-Speed 3066.71 samples/sec Loss 13.6350 LearningRate 0.0796 Epoch: 2 Global Step: 26820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:06,980-Speed 3063.07 samples/sec Loss 13.7326 LearningRate 0.0796 Epoch: 2 Global Step: 26830 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:10,317-Speed 3069.56 samples/sec Loss 13.3724 LearningRate 0.0796 Epoch: 2 Global Step: 26840 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:13,665-Speed 3060.03 samples/sec Loss 13.5761 LearningRate 0.0796 Epoch: 2 Global Step: 26850 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:17,006-Speed 3066.26 samples/sec Loss 13.4776 LearningRate 0.0795 Epoch: 2 Global Step: 26860 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:20,370-Speed 3044.24 samples/sec Loss 13.5383 LearningRate 0.0795 Epoch: 2 Global Step: 26870 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:16:23,736-Speed 3043.03 samples/sec Loss 13.5240 LearningRate 0.0795 Epoch: 2 Global Step: 26880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:27,028-Speed 3112.15 samples/sec Loss 13.7066 LearningRate 0.0795 Epoch: 2 Global Step: 26890 Fp16 Grad Scale: 65536 Required: 
21 hours Training: 2022-04-27 04:16:30,410-Speed 3028.30 samples/sec Loss 13.6809 LearningRate 0.0795 Epoch: 2 Global Step: 26900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:33,774-Speed 3045.07 samples/sec Loss 13.6003 LearningRate 0.0795 Epoch: 2 Global Step: 26910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:37,112-Speed 3067.87 samples/sec Loss 13.6186 LearningRate 0.0795 Epoch: 2 Global Step: 26920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:40,447-Speed 3071.77 samples/sec Loss 13.5171 LearningRate 0.0795 Epoch: 2 Global Step: 26930 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:43,750-Speed 3101.63 samples/sec Loss 13.6327 LearningRate 0.0795 Epoch: 2 Global Step: 26940 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:47,022-Speed 3130.81 samples/sec Loss 13.5886 LearningRate 0.0795 Epoch: 2 Global Step: 26950 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:50,404-Speed 3028.94 samples/sec Loss 13.5488 LearningRate 0.0795 Epoch: 2 Global Step: 26960 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:53,725-Speed 3084.37 samples/sec Loss 13.7017 LearningRate 0.0795 Epoch: 2 Global Step: 26970 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:16:57,184-Speed 2960.58 samples/sec Loss 13.6403 LearningRate 0.0795 Epoch: 2 Global Step: 26980 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:00,585-Speed 3012.49 samples/sec Loss 13.6375 LearningRate 0.0795 Epoch: 2 Global Step: 26990 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:03,905-Speed 3084.67 samples/sec Loss 13.5953 LearningRate 0.0794 Epoch: 2 Global Step: 27000 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:07,247-Speed 3065.46 samples/sec Loss 13.6985 LearningRate 0.0794 Epoch: 2 Global Step: 27010 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
04:17:10,571-Speed 3081.76 samples/sec Loss 13.6207 LearningRate 0.0794 Epoch: 2 Global Step: 27020 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:13,850-Speed 3123.43 samples/sec Loss 13.6665 LearningRate 0.0794 Epoch: 2 Global Step: 27030 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:17,212-Speed 3047.32 samples/sec Loss 13.7795 LearningRate 0.0794 Epoch: 2 Global Step: 27040 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:20,564-Speed 3055.04 samples/sec Loss 13.6211 LearningRate 0.0794 Epoch: 2 Global Step: 27050 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:23,853-Speed 3114.63 samples/sec Loss 13.6302 LearningRate 0.0794 Epoch: 2 Global Step: 27060 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:27,146-Speed 3110.33 samples/sec Loss 13.7523 LearningRate 0.0794 Epoch: 2 Global Step: 27070 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:30,464-Speed 3087.41 samples/sec Loss 13.4932 LearningRate 0.0794 Epoch: 2 Global Step: 27080 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 04:17:33,761-Speed 3106.93 samples/sec Loss 13.4713 LearningRate 0.0794 Epoch: 2 Global Step: 27090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:37,063-Speed 3102.05 samples/sec Loss 13.5397 LearningRate 0.0794 Epoch: 2 Global Step: 27100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:40,369-Speed 3098.48 samples/sec Loss 13.6291 LearningRate 0.0794 Epoch: 2 Global Step: 27110 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:43,643-Speed 3128.82 samples/sec Loss 13.5371 LearningRate 0.0794 Epoch: 2 Global Step: 27120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:46,970-Speed 3079.19 samples/sec Loss 13.5747 LearningRate 0.0794 Epoch: 2 Global Step: 27130 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:50,225-Speed 3146.06 
samples/sec Loss 13.5265 LearningRate 0.0793 Epoch: 2 Global Step: 27140 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:53,563-Speed 3069.17 samples/sec Loss 13.6397 LearningRate 0.0793 Epoch: 2 Global Step: 27150 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:17:56,856-Speed 3110.28 samples/sec Loss 13.6325 LearningRate 0.0793 Epoch: 2 Global Step: 27160 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:00,204-Speed 3059.72 samples/sec Loss 13.6784 LearningRate 0.0793 Epoch: 2 Global Step: 27170 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:03,508-Speed 3099.54 samples/sec Loss 13.7335 LearningRate 0.0793 Epoch: 2 Global Step: 27180 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:06,855-Speed 3060.56 samples/sec Loss 13.6488 LearningRate 0.0793 Epoch: 2 Global Step: 27190 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:10,240-Speed 3027.06 samples/sec Loss 13.4915 LearningRate 0.0793 Epoch: 2 Global Step: 27200 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:13,545-Speed 3099.33 samples/sec Loss 13.6367 LearningRate 0.0793 Epoch: 2 Global Step: 27210 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:16,853-Speed 3096.95 samples/sec Loss 13.5852 LearningRate 0.0793 Epoch: 2 Global Step: 27220 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:20,188-Speed 3071.64 samples/sec Loss 13.8103 LearningRate 0.0793 Epoch: 2 Global Step: 27230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:23,503-Speed 3089.69 samples/sec Loss 13.7371 LearningRate 0.0793 Epoch: 2 Global Step: 27240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:26,788-Speed 3118.65 samples/sec Loss 13.6256 LearningRate 0.0793 Epoch: 2 Global Step: 27250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:30,068-Speed 3123.24 samples/sec Loss 13.5175 
LearningRate 0.0793 Epoch: 2 Global Step: 27260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:33,383-Speed 3089.50 samples/sec Loss 13.6015 LearningRate 0.0793 Epoch: 2 Global Step: 27270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:36,699-Speed 3089.05 samples/sec Loss 13.6799 LearningRate 0.0792 Epoch: 2 Global Step: 27280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:40,087-Speed 3023.68 samples/sec Loss 13.5009 LearningRate 0.0792 Epoch: 2 Global Step: 27290 Fp16 Grad Scale: 262144 Required: 21 hours Training: 2022-04-27 04:18:43,401-Speed 3090.29 samples/sec Loss 13.7015 LearningRate 0.0792 Epoch: 2 Global Step: 27300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:46,693-Speed 3111.85 samples/sec Loss 13.3657 LearningRate 0.0792 Epoch: 2 Global Step: 27310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:50,087-Speed 3017.33 samples/sec Loss 13.5644 LearningRate 0.0792 Epoch: 2 Global Step: 27320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:53,490-Speed 3010.13 samples/sec Loss 13.5812 LearningRate 0.0792 Epoch: 2 Global Step: 27330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:18:56,821-Speed 3074.97 samples/sec Loss 13.8772 LearningRate 0.0792 Epoch: 2 Global Step: 27340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:00,169-Speed 3059.10 samples/sec Loss 13.7292 LearningRate 0.0792 Epoch: 2 Global Step: 27350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:03,528-Speed 3050.23 samples/sec Loss 13.8678 LearningRate 0.0792 Epoch: 2 Global Step: 27360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:06,824-Speed 3107.75 samples/sec Loss 13.6787 LearningRate 0.0792 Epoch: 2 Global Step: 27370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:10,198-Speed 3035.82 samples/sec Loss 13.7633 LearningRate 0.0792 Epoch: 2 
Global Step: 27380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:13,494-Speed 3107.46 samples/sec Loss 13.6325 LearningRate 0.0792 Epoch: 2 Global Step: 27390 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:16,813-Speed 3087.83 samples/sec Loss 13.7208 LearningRate 0.0792 Epoch: 2 Global Step: 27400 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:20,143-Speed 3076.03 samples/sec Loss 13.6611 LearningRate 0.0791 Epoch: 2 Global Step: 27410 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:23,416-Speed 3129.89 samples/sec Loss 13.7780 LearningRate 0.0791 Epoch: 2 Global Step: 27420 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:26,741-Speed 3080.55 samples/sec Loss 13.5888 LearningRate 0.0791 Epoch: 2 Global Step: 27430 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:30,014-Speed 3129.70 samples/sec Loss 13.7857 LearningRate 0.0791 Epoch: 2 Global Step: 27440 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:33,338-Speed 3081.29 samples/sec Loss 13.7826 LearningRate 0.0791 Epoch: 2 Global Step: 27450 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:36,684-Speed 3061.16 samples/sec Loss 13.5985 LearningRate 0.0791 Epoch: 2 Global Step: 27460 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:40,023-Speed 3067.98 samples/sec Loss 13.6958 LearningRate 0.0791 Epoch: 2 Global Step: 27470 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:43,334-Speed 3093.59 samples/sec Loss 13.6256 LearningRate 0.0791 Epoch: 2 Global Step: 27480 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:19:46,663-Speed 3077.05 samples/sec Loss 13.7637 LearningRate 0.0791 Epoch: 2 Global Step: 27490 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:49,933-Speed 3132.87 samples/sec Loss 13.6159 LearningRate 0.0791 Epoch: 2 Global Step: 27500 Fp16 Grad Scale: 
131072 Required: 21 hours Training: 2022-04-27 04:19:53,259-Speed 3078.93 samples/sec Loss 13.5868 LearningRate 0.0791 Epoch: 2 Global Step: 27510 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:56,588-Speed 3077.04 samples/sec Loss 13.6989 LearningRate 0.0791 Epoch: 2 Global Step: 27520 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:19:59,908-Speed 3085.49 samples/sec Loss 13.7079 LearningRate 0.0791 Epoch: 2 Global Step: 27530 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:03,185-Speed 3125.71 samples/sec Loss 13.7187 LearningRate 0.0791 Epoch: 2 Global Step: 27540 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:06,572-Speed 3024.76 samples/sec Loss 13.7556 LearningRate 0.0790 Epoch: 2 Global Step: 27550 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:09,864-Speed 3110.98 samples/sec Loss 13.7629 LearningRate 0.0790 Epoch: 2 Global Step: 27560 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:13,194-Speed 3076.35 samples/sec Loss 13.7720 LearningRate 0.0790 Epoch: 2 Global Step: 27570 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:16,525-Speed 3074.71 samples/sec Loss 13.5793 LearningRate 0.0790 Epoch: 2 Global Step: 27580 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:19,847-Speed 3083.28 samples/sec Loss 13.6874 LearningRate 0.0790 Epoch: 2 Global Step: 27590 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:23,111-Speed 3138.19 samples/sec Loss 13.6645 LearningRate 0.0790 Epoch: 2 Global Step: 27600 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:26,435-Speed 3081.68 samples/sec Loss 13.7392 LearningRate 0.0790 Epoch: 2 Global Step: 27610 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:29,796-Speed 3047.56 samples/sec Loss 13.6405 LearningRate 0.0790 Epoch: 2 Global Step: 27620 Fp16 Grad Scale: 65536 Required: 21 hours 
Training: 2022-04-27 04:20:33,095-Speed 3104.49 samples/sec Loss 13.6857 LearningRate 0.0790 Epoch: 2 Global Step: 27630 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:36,350-Speed 3147.11 samples/sec Loss 13.7881 LearningRate 0.0790 Epoch: 2 Global Step: 27640 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:39,672-Speed 3083.17 samples/sec Loss 13.5655 LearningRate 0.0790 Epoch: 2 Global Step: 27650 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:42,995-Speed 3083.33 samples/sec Loss 13.7114 LearningRate 0.0790 Epoch: 2 Global Step: 27660 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:46,402-Speed 3005.91 samples/sec Loss 13.6711 LearningRate 0.0790 Epoch: 2 Global Step: 27670 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:49,733-Speed 3075.22 samples/sec Loss 13.6167 LearningRate 0.0790 Epoch: 2 Global Step: 27680 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:53,025-Speed 3112.25 samples/sec Loss 13.7310 LearningRate 0.0789 Epoch: 2 Global Step: 27690 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:20:56,341-Speed 3088.32 samples/sec Loss 13.6701 LearningRate 0.0789 Epoch: 2 Global Step: 27700 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:20:59,663-Speed 3084.07 samples/sec Loss 13.6730 LearningRate 0.0789 Epoch: 2 Global Step: 27710 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:02,992-Speed 3076.93 samples/sec Loss 13.6371 LearningRate 0.0789 Epoch: 2 Global Step: 27720 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:06,362-Speed 3038.49 samples/sec Loss 13.6714 LearningRate 0.0789 Epoch: 2 Global Step: 27730 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:09,752-Speed 3021.83 samples/sec Loss 13.5844 LearningRate 0.0789 Epoch: 2 Global Step: 27740 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:13,060-Speed 
3096.39 samples/sec Loss 13.6538 LearningRate 0.0789 Epoch: 2 Global Step: 27750 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:16,318-Speed 3144.14 samples/sec Loss 13.4738 LearningRate 0.0789 Epoch: 2 Global Step: 27760 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:19,667-Speed 3058.85 samples/sec Loss 13.5471 LearningRate 0.0789 Epoch: 2 Global Step: 27770 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:23,037-Speed 3039.35 samples/sec Loss 13.7857 LearningRate 0.0789 Epoch: 2 Global Step: 27780 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:26,303-Speed 3136.25 samples/sec Loss 13.5663 LearningRate 0.0789 Epoch: 2 Global Step: 27790 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:29,649-Speed 3060.87 samples/sec Loss 13.5161 LearningRate 0.0789 Epoch: 2 Global Step: 27800 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:33,005-Speed 3051.64 samples/sec Loss 13.8040 LearningRate 0.0789 Epoch: 2 Global Step: 27810 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:21:36,375-Speed 3039.65 samples/sec Loss 13.5301 LearningRate 0.0789 Epoch: 2 Global Step: 27820 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:21:39,691-Speed 3088.71 samples/sec Loss 13.5731 LearningRate 0.0788 Epoch: 2 Global Step: 27830 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:43,022-Speed 3075.22 samples/sec Loss 13.6296 LearningRate 0.0788 Epoch: 2 Global Step: 27840 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:46,361-Speed 3067.23 samples/sec Loss 13.5407 LearningRate 0.0788 Epoch: 2 Global Step: 27850 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:49,711-Speed 3057.78 samples/sec Loss 13.7258 LearningRate 0.0788 Epoch: 2 Global Step: 27860 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:53,001-Speed 3113.04 samples/sec Loss 13.7074 
LearningRate 0.0788 Epoch: 2 Global Step: 27870 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:56,315-Speed 3091.42 samples/sec Loss 13.7622 LearningRate 0.0788 Epoch: 2 Global Step: 27880 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:21:59,670-Speed 3053.17 samples/sec Loss 13.5730 LearningRate 0.0788 Epoch: 2 Global Step: 27890 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:02,937-Speed 3135.43 samples/sec Loss 13.6018 LearningRate 0.0788 Epoch: 2 Global Step: 27900 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:06,207-Speed 3131.97 samples/sec Loss 13.6161 LearningRate 0.0788 Epoch: 2 Global Step: 27910 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:09,500-Speed 3110.65 samples/sec Loss 13.6994 LearningRate 0.0788 Epoch: 2 Global Step: 27920 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:12,864-Speed 3044.66 samples/sec Loss 13.7788 LearningRate 0.0788 Epoch: 2 Global Step: 27930 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:22:16,135-Speed 3131.49 samples/sec Loss 13.7415 LearningRate 0.0788 Epoch: 2 Global Step: 27940 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:22:19,505-Speed 3040.19 samples/sec Loss 13.5957 LearningRate 0.0788 Epoch: 2 Global Step: 27950 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:22:22,824-Speed 3086.00 samples/sec Loss 13.6832 LearningRate 0.0788 Epoch: 2 Global Step: 27960 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:22:26,164-Speed 3067.36 samples/sec Loss 13.5804 LearningRate 0.0787 Epoch: 2 Global Step: 27970 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:22:29,498-Speed 3072.17 samples/sec Loss 13.7327 LearningRate 0.0787 Epoch: 2 Global Step: 27980 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:32,802-Speed 3099.81 samples/sec Loss 13.6619 LearningRate 0.0787 Epoch: 2 Global 
Step: 27990 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:36,091-Speed 3114.15 samples/sec Loss 13.7651 LearningRate 0.0787 Epoch: 2 Global Step: 28000 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:39,372-Speed 3122.60 samples/sec Loss 13.6732 LearningRate 0.0787 Epoch: 2 Global Step: 28010 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:42,681-Speed 3095.93 samples/sec Loss 13.5761 LearningRate 0.0787 Epoch: 2 Global Step: 28020 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:45,981-Speed 3103.44 samples/sec Loss 13.7041 LearningRate 0.0787 Epoch: 2 Global Step: 28030 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:49,346-Speed 3044.09 samples/sec Loss 13.6100 LearningRate 0.0787 Epoch: 2 Global Step: 28040 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:52,715-Speed 3040.23 samples/sec Loss 13.6622 LearningRate 0.0787 Epoch: 2 Global Step: 28050 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:56,121-Speed 3008.05 samples/sec Loss 13.7907 LearningRate 0.0787 Epoch: 2 Global Step: 28060 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:22:59,468-Speed 3060.16 samples/sec Loss 13.6090 LearningRate 0.0787 Epoch: 2 Global Step: 28070 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:02,799-Speed 3075.24 samples/sec Loss 13.5500 LearningRate 0.0787 Epoch: 2 Global Step: 28080 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:23:06,138-Speed 3067.81 samples/sec Loss 13.6471 LearningRate 0.0787 Epoch: 2 Global Step: 28090 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:23:09,448-Speed 3094.34 samples/sec Loss 13.5439 LearningRate 0.0787 Epoch: 2 Global Step: 28100 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:23:12,742-Speed 3109.78 samples/sec Loss 13.6082 LearningRate 0.0786 Epoch: 2 Global Step: 28110 Fp16 Grad Scale: 131072 
Required: 21 hours Training: 2022-04-27 04:23:16,088-Speed 3061.01 samples/sec Loss 13.6722 LearningRate 0.0786 Epoch: 2 Global Step: 28120 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:23:19,338-Speed 3151.96 samples/sec Loss 13.7552 LearningRate 0.0786 Epoch: 2 Global Step: 28130 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:22,657-Speed 3086.06 samples/sec Loss 13.5251 LearningRate 0.0786 Epoch: 2 Global Step: 28140 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:26,035-Speed 3032.78 samples/sec Loss 13.7212 LearningRate 0.0786 Epoch: 2 Global Step: 28150 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:29,307-Speed 3129.97 samples/sec Loss 13.5585 LearningRate 0.0786 Epoch: 2 Global Step: 28160 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:32,689-Speed 3028.38 samples/sec Loss 13.6549 LearningRate 0.0786 Epoch: 2 Global Step: 28170 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:36,055-Speed 3043.36 samples/sec Loss 13.6119 LearningRate 0.0786 Epoch: 2 Global Step: 28180 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:39,401-Speed 3061.31 samples/sec Loss 13.5155 LearningRate 0.0786 Epoch: 2 Global Step: 28190 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:42,713-Speed 3093.08 samples/sec Loss 13.7530 LearningRate 0.0786 Epoch: 2 Global Step: 28200 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:46,072-Speed 3049.34 samples/sec Loss 13.6362 LearningRate 0.0786 Epoch: 2 Global Step: 28210 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:49,328-Speed 3145.40 samples/sec Loss 13.7777 LearningRate 0.0786 Epoch: 2 Global Step: 28220 Fp16 Grad Scale: 65536 Required: 21 hours Training: 2022-04-27 04:23:52,585-Speed 3145.36 samples/sec Loss 13.7361 LearningRate 0.0786 Epoch: 2 Global Step: 28230 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 
04:23:55,907-Speed 3084.08 samples/sec Loss 13.6583 LearningRate 0.0786 Epoch: 2 Global Step: 28240 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:23:59,320-Speed 3000.81 samples/sec Loss 13.5749 LearningRate 0.0785 Epoch: 2 Global Step: 28250 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:02,607-Speed 3116.73 samples/sec Loss 13.5988 LearningRate 0.0785 Epoch: 2 Global Step: 28260 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:05,881-Speed 3128.02 samples/sec Loss 13.7135 LearningRate 0.0785 Epoch: 2 Global Step: 28270 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:09,248-Speed 3043.25 samples/sec Loss 13.6831 LearningRate 0.0785 Epoch: 2 Global Step: 28280 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:12,564-Speed 3088.22 samples/sec Loss 13.5543 LearningRate 0.0785 Epoch: 2 Global Step: 28290 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:15,896-Speed 3074.12 samples/sec Loss 13.5843 LearningRate 0.0785 Epoch: 2 Global Step: 28300 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:19,205-Speed 3096.60 samples/sec Loss 13.5057 LearningRate 0.0785 Epoch: 2 Global Step: 28310 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:22,510-Speed 3098.64 samples/sec Loss 13.8234 LearningRate 0.0785 Epoch: 2 Global Step: 28320 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:25,848-Speed 3069.32 samples/sec Loss 13.5930 LearningRate 0.0785 Epoch: 2 Global Step: 28330 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:29,143-Speed 3108.13 samples/sec Loss 13.8097 LearningRate 0.0785 Epoch: 2 Global Step: 28340 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:32,516-Speed 3036.81 samples/sec Loss 13.6475 LearningRate 0.0785 Epoch: 2 Global Step: 28350 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:35,961-Speed 2973.15 
samples/sec Loss 13.6687 LearningRate 0.0785 Epoch: 2 Global Step: 28360 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:39,368-Speed 3006.75 samples/sec Loss 13.6495 LearningRate 0.0785 Epoch: 2 Global Step: 28370 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:42,671-Speed 3101.30 samples/sec Loss 13.5850 LearningRate 0.0785 Epoch: 2 Global Step: 28380 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:46,000-Speed 3076.28 samples/sec Loss 13.5174 LearningRate 0.0784 Epoch: 2 Global Step: 28390 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:49,321-Speed 3085.58 samples/sec Loss 13.6386 LearningRate 0.0784 Epoch: 2 Global Step: 28400 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:52,679-Speed 3050.27 samples/sec Loss 13.6636 LearningRate 0.0784 Epoch: 2 Global Step: 28410 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:56,098-Speed 2996.17 samples/sec Loss 13.5115 LearningRate 0.0784 Epoch: 2 Global Step: 28420 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:24:59,364-Speed 3135.94 samples/sec Loss 13.6837 LearningRate 0.0784 Epoch: 2 Global Step: 28430 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:25:02,701-Speed 3070.25 samples/sec Loss 13.8603 LearningRate 0.0784 Epoch: 2 Global Step: 28440 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:25:06,014-Speed 3091.56 samples/sec Loss 13.5724 LearningRate 0.0784 Epoch: 2 Global Step: 28450 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:25:09,367-Speed 3054.92 samples/sec Loss 13.6485 LearningRate 0.0784 Epoch: 2 Global Step: 28460 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:25:12,755-Speed 3023.68 samples/sec Loss 13.5851 LearningRate 0.0784 Epoch: 2 Global Step: 28470 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:25:16,041-Speed 3117.02 samples/sec Loss 13.5983 
LearningRate 0.0784 Epoch: 2 Global Step: 28480 Fp16 Grad Scale: 131072 Required: 21 hours Training: 2022-04-27 04:25:19,314-Speed 3130.34 samples/sec Loss 13.6311 LearningRate 0.0784 Epoch: 2 Global Step: 28490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:25:22,655-Speed 3065.01 samples/sec Loss 13.7901 LearningRate 0.0784 Epoch: 2 Global Step: 28500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:25:25,989-Speed 3073.03 samples/sec Loss 13.6035 LearningRate 0.0784 Epoch: 2 Global Step: 28510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:25:29,359-Speed 3039.01 samples/sec Loss 13.7311 LearningRate 0.0784 Epoch: 2 Global Step: 28520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:25:32,709-Speed 3057.61 samples/sec Loss 13.8125 LearningRate 0.0783 Epoch: 2 Global Step: 28530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:25:36,071-Speed 3046.65 samples/sec Loss 13.5980 LearningRate 0.0783 Epoch: 2 Global Step: 28540 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:39,408-Speed 3070.06 samples/sec Loss 13.5439 LearningRate 0.0783 Epoch: 2 Global Step: 28550 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:42,703-Speed 3108.98 samples/sec Loss 13.6165 LearningRate 0.0783 Epoch: 2 Global Step: 28560 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:46,068-Speed 3043.66 samples/sec Loss 13.7239 LearningRate 0.0783 Epoch: 2 Global Step: 28570 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:49,422-Speed 3054.20 samples/sec Loss 13.5877 LearningRate 0.0783 Epoch: 2 Global Step: 28580 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:52,746-Speed 3081.71 samples/sec Loss 13.5449 LearningRate 0.0783 Epoch: 2 Global Step: 28590 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:56,100-Speed 3053.81 samples/sec Loss 13.5843 LearningRate 0.0783 Epoch: 2 Global 
Step: 28600 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:25:59,420-Speed 3085.42 samples/sec Loss 13.8068 LearningRate 0.0783 Epoch: 2 Global Step: 28610 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:26:02,737-Speed 3087.66 samples/sec Loss 13.4571 LearningRate 0.0783 Epoch: 2 Global Step: 28620 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:26:06,103-Speed 3043.67 samples/sec Loss 13.6266 LearningRate 0.0783 Epoch: 2 Global Step: 28630 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:26:09,471-Speed 3041.29 samples/sec Loss 13.5626 LearningRate 0.0783 Epoch: 2 Global Step: 28640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:12,819-Speed 3059.68 samples/sec Loss 13.6635 LearningRate 0.0783 Epoch: 2 Global Step: 28650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:16,137-Speed 3087.23 samples/sec Loss 13.6692 LearningRate 0.0783 Epoch: 2 Global Step: 28660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:19,519-Speed 3028.86 samples/sec Loss 13.6678 LearningRate 0.0783 Epoch: 2 Global Step: 28670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:22,864-Speed 3061.94 samples/sec Loss 13.6897 LearningRate 0.0782 Epoch: 2 Global Step: 28680 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:26,230-Speed 3043.58 samples/sec Loss 13.4713 LearningRate 0.0782 Epoch: 2 Global Step: 28690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:29,507-Speed 3126.24 samples/sec Loss 13.6912 LearningRate 0.0782 Epoch: 2 Global Step: 28700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:32,853-Speed 3061.41 samples/sec Loss 13.5977 LearningRate 0.0782 Epoch: 2 Global Step: 28710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:36,189-Speed 3070.76 samples/sec Loss 13.5805 LearningRate 0.0782 Epoch: 2 Global Step: 28720 Fp16 Grad Scale: 65536 
Required: 20 hours Training: 2022-04-27 04:26:39,561-Speed 3036.88 samples/sec Loss 13.6973 LearningRate 0.0782 Epoch: 2 Global Step: 28730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:26:42,885-Speed 3081.85 samples/sec Loss 13.6772 LearningRate 0.0782 Epoch: 2 Global Step: 28740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:26:46,154-Speed 3132.90 samples/sec Loss 13.5886 LearningRate 0.0782 Epoch: 2 Global Step: 28750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:26:49,572-Speed 2997.35 samples/sec Loss 13.5097 LearningRate 0.0782 Epoch: 2 Global Step: 28760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:26:52,959-Speed 3023.93 samples/sec Loss 13.6683 LearningRate 0.0782 Epoch: 2 Global Step: 28770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:26:56,347-Speed 3023.47 samples/sec Loss 13.6752 LearningRate 0.0782 Epoch: 2 Global Step: 28780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:26:59,650-Speed 3101.14 samples/sec Loss 13.6555 LearningRate 0.0782 Epoch: 2 Global Step: 28790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:02,976-Speed 3079.67 samples/sec Loss 13.6190 LearningRate 0.0782 Epoch: 2 Global Step: 28800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:06,260-Speed 3119.35 samples/sec Loss 13.5719 LearningRate 0.0782 Epoch: 2 Global Step: 28810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:09,580-Speed 3085.48 samples/sec Loss 13.6199 LearningRate 0.0781 Epoch: 2 Global Step: 28820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:12,832-Speed 3149.32 samples/sec Loss 13.6784 LearningRate 0.0781 Epoch: 2 Global Step: 28830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:16,199-Speed 3042.27 samples/sec Loss 13.6444 LearningRate 0.0781 Epoch: 2 Global Step: 28840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-27 04:27:19,544-Speed 3062.23 samples/sec Loss 13.6465 LearningRate 0.0781 Epoch: 2 Global Step: 28850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:22,884-Speed 3067.39 samples/sec Loss 13.5387 LearningRate 0.0781 Epoch: 2 Global Step: 28860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:26,214-Speed 3076.21 samples/sec Loss 13.6559 LearningRate 0.0781 Epoch: 2 Global Step: 28870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:29,580-Speed 3042.90 samples/sec Loss 13.6790 LearningRate 0.0781 Epoch: 2 Global Step: 28880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:32,959-Speed 3031.00 samples/sec Loss 13.4524 LearningRate 0.0781 Epoch: 2 Global Step: 28890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:36,249-Speed 3114.67 samples/sec Loss 13.5575 LearningRate 0.0781 Epoch: 2 Global Step: 28900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:39,597-Speed 3059.25 samples/sec Loss 13.4765 LearningRate 0.0781 Epoch: 2 Global Step: 28910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:42,897-Speed 3104.18 samples/sec Loss 13.7536 LearningRate 0.0781 Epoch: 2 Global Step: 28920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:46,207-Speed 3094.42 samples/sec Loss 13.6159 LearningRate 0.0781 Epoch: 2 Global Step: 28930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:49,513-Speed 3099.04 samples/sec Loss 13.7722 LearningRate 0.0781 Epoch: 2 Global Step: 28940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:52,893-Speed 3030.42 samples/sec Loss 13.4876 LearningRate 0.0781 Epoch: 2 Global Step: 28950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:56,204-Speed 3093.65 samples/sec Loss 13.7255 LearningRate 0.0780 Epoch: 2 Global Step: 28960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:27:59,545-Speed 
3065.20 samples/sec Loss 13.5315 LearningRate 0.0780 Epoch: 2 Global Step: 28970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:02,880-Speed 3071.75 samples/sec Loss 13.7524 LearningRate 0.0780 Epoch: 2 Global Step: 28980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:06,278-Speed 3014.48 samples/sec Loss 13.6687 LearningRate 0.0780 Epoch: 2 Global Step: 28990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:09,618-Speed 3066.52 samples/sec Loss 13.5905 LearningRate 0.0780 Epoch: 2 Global Step: 29000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:12,973-Speed 3053.54 samples/sec Loss 13.5925 LearningRate 0.0780 Epoch: 2 Global Step: 29010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:16,376-Speed 3009.58 samples/sec Loss 13.6221 LearningRate 0.0780 Epoch: 2 Global Step: 29020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:19,716-Speed 3066.90 samples/sec Loss 13.7121 LearningRate 0.0780 Epoch: 2 Global Step: 29030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:23,064-Speed 3060.15 samples/sec Loss 13.6336 LearningRate 0.0780 Epoch: 2 Global Step: 29040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:28:26,409-Speed 3061.56 samples/sec Loss 13.6127 LearningRate 0.0780 Epoch: 2 Global Step: 29050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:29,723-Speed 3091.07 samples/sec Loss 13.6533 LearningRate 0.0780 Epoch: 2 Global Step: 29060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:33,054-Speed 3075.36 samples/sec Loss 13.4210 LearningRate 0.0780 Epoch: 2 Global Step: 29070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:36,321-Speed 3134.52 samples/sec Loss 13.5733 LearningRate 0.0780 Epoch: 2 Global Step: 29080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:39,602-Speed 3122.74 samples/sec Loss 13.5159 
LearningRate 0.0780 Epoch: 2 Global Step: 29090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:42,881-Speed 3123.53 samples/sec Loss 13.6542 LearningRate 0.0779 Epoch: 2 Global Step: 29100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:46,188-Speed 3097.76 samples/sec Loss 13.5626 LearningRate 0.0779 Epoch: 2 Global Step: 29110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:49,522-Speed 3072.46 samples/sec Loss 13.5604 LearningRate 0.0779 Epoch: 2 Global Step: 29120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:28:52,806-Speed 3119.04 samples/sec Loss 13.6252 LearningRate 0.0779 Epoch: 2 Global Step: 29130 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:28:56,150-Speed 3062.55 samples/sec Loss 13.5751 LearningRate 0.0779 Epoch: 2 Global Step: 29140 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:28:59,439-Speed 3114.59 samples/sec Loss 13.6363 LearningRate 0.0779 Epoch: 2 Global Step: 29150 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:02,768-Speed 3077.03 samples/sec Loss 13.4658 LearningRate 0.0779 Epoch: 2 Global Step: 29160 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:06,087-Speed 3085.59 samples/sec Loss 13.7012 LearningRate 0.0779 Epoch: 2 Global Step: 29170 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:09,386-Speed 3104.82 samples/sec Loss 13.6240 LearningRate 0.0779 Epoch: 2 Global Step: 29180 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:12,712-Speed 3080.26 samples/sec Loss 13.6694 LearningRate 0.0779 Epoch: 2 Global Step: 29190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:15,989-Speed 3125.70 samples/sec Loss 13.5740 LearningRate 0.0779 Epoch: 2 Global Step: 29200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:19,319-Speed 3075.81 samples/sec Loss 13.6630 LearningRate 0.0779 Epoch: 2 Global Step: 
29210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:22,604-Speed 3118.00 samples/sec Loss 13.6160 LearningRate 0.0779 Epoch: 2 Global Step: 29220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:29:25,951-Speed 3059.87 samples/sec Loss 13.6055 LearningRate 0.0779 Epoch: 2 Global Step: 29230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:29,353-Speed 3011.05 samples/sec Loss 13.5223 LearningRate 0.0778 Epoch: 2 Global Step: 29240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:32,696-Speed 3063.98 samples/sec Loss 13.5838 LearningRate 0.0778 Epoch: 2 Global Step: 29250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:36,039-Speed 3064.07 samples/sec Loss 13.6459 LearningRate 0.0778 Epoch: 2 Global Step: 29260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:39,312-Speed 3129.78 samples/sec Loss 13.6965 LearningRate 0.0778 Epoch: 2 Global Step: 29270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:42,657-Speed 3062.26 samples/sec Loss 13.5480 LearningRate 0.0778 Epoch: 2 Global Step: 29280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:45,939-Speed 3120.09 samples/sec Loss 13.5668 LearningRate 0.0778 Epoch: 2 Global Step: 29290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:49,258-Speed 3085.89 samples/sec Loss 13.5390 LearningRate 0.0778 Epoch: 2 Global Step: 29300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:52,626-Speed 3041.55 samples/sec Loss 13.7917 LearningRate 0.0778 Epoch: 2 Global Step: 29310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:56,027-Speed 3011.67 samples/sec Loss 13.5181 LearningRate 0.0778 Epoch: 2 Global Step: 29320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:29:59,334-Speed 3097.48 samples/sec Loss 13.5705 LearningRate 0.0778 Epoch: 2 Global Step: 29330 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 04:30:02,683-Speed 3059.20 samples/sec Loss 13.5424 LearningRate 0.0778 Epoch: 2 Global Step: 29340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:30:06,042-Speed 3049.51 samples/sec Loss 13.5363 LearningRate 0.0778 Epoch: 2 Global Step: 29350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:30:09,392-Speed 3057.29 samples/sec Loss 13.4735 LearningRate 0.0778 Epoch: 2 Global Step: 29360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:30:12,677-Speed 3118.78 samples/sec Loss 13.6214 LearningRate 0.0778 Epoch: 2 Global Step: 29370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:30:16,032-Speed 3053.25 samples/sec Loss 13.5268 LearningRate 0.0777 Epoch: 2 Global Step: 29380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:19,344-Speed 3092.08 samples/sec Loss 13.6066 LearningRate 0.0777 Epoch: 2 Global Step: 29390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:22,668-Speed 3081.95 samples/sec Loss 13.5516 LearningRate 0.0777 Epoch: 2 Global Step: 29400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:25,990-Speed 3083.47 samples/sec Loss 13.6017 LearningRate 0.0777 Epoch: 2 Global Step: 29410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:29,325-Speed 3071.27 samples/sec Loss 13.5056 LearningRate 0.0777 Epoch: 2 Global Step: 29420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:32,653-Speed 3078.21 samples/sec Loss 13.5013 LearningRate 0.0777 Epoch: 2 Global Step: 29430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:35,940-Speed 3115.93 samples/sec Loss 13.6319 LearningRate 0.0777 Epoch: 2 Global Step: 29440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:39,242-Speed 3101.36 samples/sec Loss 13.4952 LearningRate 0.0777 Epoch: 2 Global Step: 29450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 
04:30:42,571-Speed 3077.40 samples/sec Loss 13.6023 LearningRate 0.0777 Epoch: 2 Global Step: 29460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:45,887-Speed 3088.80 samples/sec Loss 13.6348 LearningRate 0.0777 Epoch: 2 Global Step: 29470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:49,180-Speed 3110.43 samples/sec Loss 13.4611 LearningRate 0.0777 Epoch: 2 Global Step: 29480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:30:52,480-Speed 3104.10 samples/sec Loss 13.4249 LearningRate 0.0777 Epoch: 2 Global Step: 29490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:30:55,815-Speed 3071.60 samples/sec Loss 13.5336 LearningRate 0.0777 Epoch: 2 Global Step: 29500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:30:59,144-Speed 3076.90 samples/sec Loss 13.5344 LearningRate 0.0777 Epoch: 2 Global Step: 29510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:02,479-Speed 3071.65 samples/sec Loss 13.6293 LearningRate 0.0776 Epoch: 2 Global Step: 29520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:05,792-Speed 3091.53 samples/sec Loss 13.4015 LearningRate 0.0776 Epoch: 2 Global Step: 29530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:09,076-Speed 3118.95 samples/sec Loss 13.6521 LearningRate 0.0776 Epoch: 2 Global Step: 29540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:12,384-Speed 3096.70 samples/sec Loss 13.4798 LearningRate 0.0776 Epoch: 2 Global Step: 29550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:15,649-Speed 3137.46 samples/sec Loss 13.6713 LearningRate 0.0776 Epoch: 2 Global Step: 29560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:18,977-Speed 3076.88 samples/sec Loss 13.5579 LearningRate 0.0776 Epoch: 2 Global Step: 29570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:22,307-Speed 3076.80 samples/sec 
Loss 13.6432 LearningRate 0.0776 Epoch: 2 Global Step: 29580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:25,637-Speed 3075.93 samples/sec Loss 13.4846 LearningRate 0.0776 Epoch: 2 Global Step: 29590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:31:28,891-Speed 3148.04 samples/sec Loss 13.5823 LearningRate 0.0776 Epoch: 2 Global Step: 29600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:32,234-Speed 3063.54 samples/sec Loss 13.5909 LearningRate 0.0776 Epoch: 2 Global Step: 29610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:35,572-Speed 3068.84 samples/sec Loss 13.5511 LearningRate 0.0776 Epoch: 2 Global Step: 29620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:38,938-Speed 3042.58 samples/sec Loss 13.5125 LearningRate 0.0776 Epoch: 2 Global Step: 29630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:42,244-Speed 3099.10 samples/sec Loss 13.4525 LearningRate 0.0776 Epoch: 2 Global Step: 29640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:45,584-Speed 3066.42 samples/sec Loss 13.5735 LearningRate 0.0776 Epoch: 2 Global Step: 29650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:48,913-Speed 3077.19 samples/sec Loss 13.7072 LearningRate 0.0775 Epoch: 2 Global Step: 29660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:52,211-Speed 3106.53 samples/sec Loss 13.6688 LearningRate 0.0775 Epoch: 2 Global Step: 29670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:55,513-Speed 3101.92 samples/sec Loss 13.5709 LearningRate 0.0775 Epoch: 2 Global Step: 29680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:31:58,803-Speed 3112.62 samples/sec Loss 13.6285 LearningRate 0.0775 Epoch: 2 Global Step: 29690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:02,169-Speed 3043.03 samples/sec Loss 13.5757 LearningRate 0.0775 
Epoch: 2 Global Step: 29700 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 04:32:05,464-Speed 3109.33 samples/sec Loss 13.4587 LearningRate 0.0775 Epoch: 2 Global Step: 29710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:08,749-Speed 3117.86 samples/sec Loss 13.6255 LearningRate 0.0775 Epoch: 2 Global Step: 29720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:12,090-Speed 3066.13 samples/sec Loss 13.5453 LearningRate 0.0775 Epoch: 2 Global Step: 29730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:15,431-Speed 3065.50 samples/sec Loss 13.5904 LearningRate 0.0775 Epoch: 2 Global Step: 29740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:18,768-Speed 3069.84 samples/sec Loss 13.5913 LearningRate 0.0775 Epoch: 2 Global Step: 29750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:22,044-Speed 3126.71 samples/sec Loss 13.3791 LearningRate 0.0775 Epoch: 2 Global Step: 29760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:25,401-Speed 3051.52 samples/sec Loss 13.5971 LearningRate 0.0775 Epoch: 2 Global Step: 29770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:28,705-Speed 3100.25 samples/sec Loss 13.5473 LearningRate 0.0775 Epoch: 2 Global Step: 29780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:32,036-Speed 3074.55 samples/sec Loss 13.5548 LearningRate 0.0775 Epoch: 2 Global Step: 29790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:35,360-Speed 3082.25 samples/sec Loss 13.4715 LearningRate 0.0774 Epoch: 2 Global Step: 29800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:38,675-Speed 3090.02 samples/sec Loss 13.5135 LearningRate 0.0774 Epoch: 2 Global Step: 29810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:41,967-Speed 3111.32 samples/sec Loss 13.6865 LearningRate 0.0774 Epoch: 2 Global Step: 29820 
Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:32:45,239-Speed 3131.05 samples/sec Loss 13.4668 LearningRate 0.0774 Epoch: 2 Global Step: 29830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:32:48,525-Speed 3116.86 samples/sec Loss 13.5344 LearningRate 0.0774 Epoch: 2 Global Step: 29840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:32:51,855-Speed 3076.22 samples/sec Loss 13.5162 LearningRate 0.0774 Epoch: 2 Global Step: 29850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:32:55,151-Speed 3107.71 samples/sec Loss 13.4288 LearningRate 0.0774 Epoch: 2 Global Step: 29860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:32:58,442-Speed 3112.50 samples/sec Loss 13.5423 LearningRate 0.0774 Epoch: 2 Global Step: 29870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:33:01,778-Speed 3070.54 samples/sec Loss 13.5081 LearningRate 0.0774 Epoch: 2 Global Step: 29880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:33:05,114-Speed 3070.63 samples/sec Loss 13.6163 LearningRate 0.0774 Epoch: 2 Global Step: 29890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:33:08,472-Speed 3049.58 samples/sec Loss 13.4005 LearningRate 0.0774 Epoch: 2 Global Step: 29900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:33:11,797-Speed 3080.59 samples/sec Loss 13.6140 LearningRate 0.0774 Epoch: 2 Global Step: 29910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:33:15,145-Speed 3060.31 samples/sec Loss 13.4652 LearningRate 0.0774 Epoch: 2 Global Step: 29920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:33:18,435-Speed 3112.64 samples/sec Loss 13.4755 LearningRate 0.0774 Epoch: 2 Global Step: 29930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:21,769-Speed 3073.04 samples/sec Loss 13.4874 LearningRate 0.0773 Epoch: 2 Global Step: 29940 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 04:33:25,133-Speed 3045.23 samples/sec Loss 13.5419 LearningRate 0.0773 Epoch: 2 Global Step: 29950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:28,431-Speed 3105.80 samples/sec Loss 13.2860 LearningRate 0.0773 Epoch: 2 Global Step: 29960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:31,802-Speed 3038.11 samples/sec Loss 13.4868 LearningRate 0.0773 Epoch: 2 Global Step: 29970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:35,107-Speed 3098.80 samples/sec Loss 13.4528 LearningRate 0.0773 Epoch: 2 Global Step: 29980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:38,407-Speed 3104.59 samples/sec Loss 13.5212 LearningRate 0.0773 Epoch: 2 Global Step: 29990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:41,708-Speed 3102.68 samples/sec Loss 13.4855 LearningRate 0.0773 Epoch: 2 Global Step: 30000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:45,063-Speed 3052.74 samples/sec Loss 13.6108 LearningRate 0.0773 Epoch: 2 Global Step: 30010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:48,408-Speed 3062.93 samples/sec Loss 13.5350 LearningRate 0.0773 Epoch: 2 Global Step: 30020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:51,683-Speed 3126.94 samples/sec Loss 13.4450 LearningRate 0.0773 Epoch: 2 Global Step: 30030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:55,012-Speed 3077.60 samples/sec Loss 13.4643 LearningRate 0.0773 Epoch: 2 Global Step: 30040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:33:58,376-Speed 3044.31 samples/sec Loss 13.4905 LearningRate 0.0773 Epoch: 2 Global Step: 30050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:01,770-Speed 3018.95 samples/sec Loss 13.4316 LearningRate 0.0773 Epoch: 2 Global Step: 30060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
04:34:05,206-Speed 2980.53 samples/sec Loss 13.3925 LearningRate 0.0773 Epoch: 2 Global Step: 30070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:08,516-Speed 3095.05 samples/sec Loss 13.3680 LearningRate 0.0772 Epoch: 2 Global Step: 30080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:11,871-Speed 3053.26 samples/sec Loss 13.5058 LearningRate 0.0772 Epoch: 2 Global Step: 30090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:15,203-Speed 3073.93 samples/sec Loss 13.6540 LearningRate 0.0772 Epoch: 2 Global Step: 30100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:18,544-Speed 3065.83 samples/sec Loss 13.5310 LearningRate 0.0772 Epoch: 2 Global Step: 30110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:21,883-Speed 3068.07 samples/sec Loss 13.4579 LearningRate 0.0772 Epoch: 2 Global Step: 30120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:25,167-Speed 3119.53 samples/sec Loss 13.5952 LearningRate 0.0772 Epoch: 2 Global Step: 30130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:28,463-Speed 3108.05 samples/sec Loss 13.5606 LearningRate 0.0772 Epoch: 2 Global Step: 30140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:31,809-Speed 3060.60 samples/sec Loss 13.4754 LearningRate 0.0772 Epoch: 2 Global Step: 30150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:35,088-Speed 3124.55 samples/sec Loss 13.6887 LearningRate 0.0772 Epoch: 2 Global Step: 30160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:38,365-Speed 3125.35 samples/sec Loss 13.6072 LearningRate 0.0772 Epoch: 2 Global Step: 30170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:41,668-Speed 3101.65 samples/sec Loss 13.6134 LearningRate 0.0772 Epoch: 2 Global Step: 30180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:44,974-Speed 3097.51 
samples/sec Loss 13.7901 LearningRate 0.0772 Epoch: 2 Global Step: 30190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:48,314-Speed 3067.10 samples/sec Loss 13.4991 LearningRate 0.0772 Epoch: 2 Global Step: 30200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:51,650-Speed 3070.73 samples/sec Loss 13.3225 LearningRate 0.0772 Epoch: 2 Global Step: 30210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:54,985-Speed 3071.39 samples/sec Loss 13.4219 LearningRate 0.0772 Epoch: 2 Global Step: 30220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:34:58,282-Speed 3107.45 samples/sec Loss 13.5371 LearningRate 0.0771 Epoch: 2 Global Step: 30230 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 04:35:01,578-Speed 3106.99 samples/sec Loss 13.3801 LearningRate 0.0771 Epoch: 2 Global Step: 30240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:04,893-Speed 3090.30 samples/sec Loss 13.6691 LearningRate 0.0771 Epoch: 2 Global Step: 30250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:08,203-Speed 3094.10 samples/sec Loss 13.4870 LearningRate 0.0771 Epoch: 2 Global Step: 30260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:11,524-Speed 3085.03 samples/sec Loss 13.4271 LearningRate 0.0771 Epoch: 2 Global Step: 30270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:14,836-Speed 3092.61 samples/sec Loss 13.6125 LearningRate 0.0771 Epoch: 2 Global Step: 30280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:18,111-Speed 3126.97 samples/sec Loss 13.6459 LearningRate 0.0771 Epoch: 2 Global Step: 30290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:21,409-Speed 3106.72 samples/sec Loss 13.4646 LearningRate 0.0771 Epoch: 2 Global Step: 30300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:24,705-Speed 3107.10 samples/sec Loss 13.6533 LearningRate 
0.0771 Epoch: 2 Global Step: 30310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:28,000-Speed 3108.52 samples/sec Loss 13.4745 LearningRate 0.0771 Epoch: 2 Global Step: 30320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:31,300-Speed 3104.65 samples/sec Loss 13.4194 LearningRate 0.0771 Epoch: 2 Global Step: 30330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:35:34,577-Speed 3125.67 samples/sec Loss 13.5321 LearningRate 0.0771 Epoch: 2 Global Step: 30340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:37,880-Speed 3100.32 samples/sec Loss 13.4864 LearningRate 0.0771 Epoch: 2 Global Step: 30350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:41,242-Speed 3046.44 samples/sec Loss 13.3899 LearningRate 0.0771 Epoch: 2 Global Step: 30360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:44,595-Speed 3055.51 samples/sec Loss 13.5554 LearningRate 0.0770 Epoch: 2 Global Step: 30370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:47,990-Speed 3017.06 samples/sec Loss 13.5243 LearningRate 0.0770 Epoch: 2 Global Step: 30380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:51,341-Speed 3057.27 samples/sec Loss 13.6358 LearningRate 0.0770 Epoch: 2 Global Step: 30390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:54,664-Speed 3081.63 samples/sec Loss 13.4809 LearningRate 0.0770 Epoch: 2 Global Step: 30400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:35:58,022-Speed 3050.41 samples/sec Loss 13.3779 LearningRate 0.0770 Epoch: 2 Global Step: 30410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:01,383-Speed 3047.35 samples/sec Loss 13.5444 LearningRate 0.0770 Epoch: 2 Global Step: 30420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:04,824-Speed 2976.99 samples/sec Loss 13.4681 LearningRate 0.0770 Epoch: 2 Global Step: 30430 
Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:08,124-Speed 3104.70 samples/sec Loss 13.5171 LearningRate 0.0770 Epoch: 2 Global Step: 30440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:11,421-Speed 3106.24 samples/sec Loss 13.5270 LearningRate 0.0770 Epoch: 2 Global Step: 30450 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:14,749-Speed 3077.45 samples/sec Loss 13.5385 LearningRate 0.0770 Epoch: 2 Global Step: 30460 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:18,053-Speed 3101.35 samples/sec Loss 13.5951 LearningRate 0.0770 Epoch: 2 Global Step: 30470 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:21,380-Speed 3077.94 samples/sec Loss 13.5470 LearningRate 0.0770 Epoch: 2 Global Step: 30480 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:24,640-Speed 3142.26 samples/sec Loss 13.5981 LearningRate 0.0770 Epoch: 2 Global Step: 30490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:28,008-Speed 3040.93 samples/sec Loss 13.5759 LearningRate 0.0770 Epoch: 2 Global Step: 30500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:31,312-Speed 3100.09 samples/sec Loss 13.5353 LearningRate 0.0769 Epoch: 2 Global Step: 30510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:34,638-Speed 3080.28 samples/sec Loss 13.5841 LearningRate 0.0769 Epoch: 2 Global Step: 30520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:37,947-Speed 3095.33 samples/sec Loss 13.4099 LearningRate 0.0769 Epoch: 2 Global Step: 30530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:36:41,273-Speed 3078.87 samples/sec Loss 13.5408 LearningRate 0.0769 Epoch: 2 Global Step: 30540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:44,542-Speed 3133.71 samples/sec Loss 13.5174 LearningRate 0.0769 Epoch: 2 Global Step: 30550 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 04:36:47,830-Speed 3115.10 samples/sec Loss 13.4870 LearningRate 0.0769 Epoch: 2 Global Step: 30560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:51,100-Speed 3132.97 samples/sec Loss 13.4365 LearningRate 0.0769 Epoch: 2 Global Step: 30570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:54,359-Speed 3142.41 samples/sec Loss 13.4770 LearningRate 0.0769 Epoch: 2 Global Step: 30580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:36:57,671-Speed 3092.39 samples/sec Loss 13.4171 LearningRate 0.0769 Epoch: 2 Global Step: 30590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:00,994-Speed 3082.88 samples/sec Loss 13.5638 LearningRate 0.0769 Epoch: 2 Global Step: 30600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:04,280-Speed 3117.40 samples/sec Loss 13.3821 LearningRate 0.0769 Epoch: 2 Global Step: 30610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:07,588-Speed 3096.43 samples/sec Loss 13.3757 LearningRate 0.0769 Epoch: 2 Global Step: 30620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:10,866-Speed 3124.03 samples/sec Loss 13.3019 LearningRate 0.0769 Epoch: 2 Global Step: 30630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:14,175-Speed 3095.12 samples/sec Loss 13.6636 LearningRate 0.0769 Epoch: 2 Global Step: 30640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:17,464-Speed 3114.34 samples/sec Loss 13.6339 LearningRate 0.0768 Epoch: 2 Global Step: 30650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:20,744-Speed 3123.22 samples/sec Loss 13.5717 LearningRate 0.0768 Epoch: 2 Global Step: 30660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:24,082-Speed 3068.73 samples/sec Loss 13.4007 LearningRate 0.0768 Epoch: 2 Global Step: 30670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
04:37:27,378-Speed 3107.99 samples/sec Loss 13.5280 LearningRate 0.0768 Epoch: 2 Global Step: 30680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:30,750-Speed 3037.30 samples/sec Loss 13.3607 LearningRate 0.0768 Epoch: 2 Global Step: 30690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:34,119-Speed 3040.91 samples/sec Loss 13.5628 LearningRate 0.0768 Epoch: 2 Global Step: 30700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:37,424-Speed 3099.49 samples/sec Loss 13.5927 LearningRate 0.0768 Epoch: 2 Global Step: 30710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:40,716-Speed 3111.17 samples/sec Loss 13.3972 LearningRate 0.0768 Epoch: 2 Global Step: 30720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:44,014-Speed 3105.65 samples/sec Loss 13.5299 LearningRate 0.0768 Epoch: 2 Global Step: 30730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:47,319-Speed 3099.54 samples/sec Loss 13.3788 LearningRate 0.0768 Epoch: 2 Global Step: 30740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:50,649-Speed 3076.08 samples/sec Loss 13.5657 LearningRate 0.0768 Epoch: 2 Global Step: 30750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:54,003-Speed 3054.18 samples/sec Loss 13.4401 LearningRate 0.0768 Epoch: 2 Global Step: 30760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:37:57,337-Speed 3072.06 samples/sec Loss 13.4708 LearningRate 0.0768 Epoch: 2 Global Step: 30770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:00,687-Speed 3057.14 samples/sec Loss 13.5139 LearningRate 0.0768 Epoch: 2 Global Step: 30780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:04,016-Speed 3076.73 samples/sec Loss 13.4310 LearningRate 0.0767 Epoch: 2 Global Step: 30790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:07,363-Speed 3061.08 
samples/sec Loss 13.4078 LearningRate 0.0767 Epoch: 2 Global Step: 30800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:10,704-Speed 3065.81 samples/sec Loss 13.5306 LearningRate 0.0767 Epoch: 2 Global Step: 30810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:14,054-Speed 3057.89 samples/sec Loss 13.5867 LearningRate 0.0767 Epoch: 2 Global Step: 30820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:17,413-Speed 3049.49 samples/sec Loss 13.3574 LearningRate 0.0767 Epoch: 2 Global Step: 30830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:20,711-Speed 3105.70 samples/sec Loss 13.4334 LearningRate 0.0767 Epoch: 2 Global Step: 30840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:24,024-Speed 3091.41 samples/sec Loss 13.3404 LearningRate 0.0767 Epoch: 2 Global Step: 30850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:27,356-Speed 3074.39 samples/sec Loss 13.6204 LearningRate 0.0767 Epoch: 2 Global Step: 30860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:30,682-Speed 3079.83 samples/sec Loss 13.3963 LearningRate 0.0767 Epoch: 2 Global Step: 30870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:38:33,955-Speed 3129.82 samples/sec Loss 13.4417 LearningRate 0.0767 Epoch: 2 Global Step: 30880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:38:37,263-Speed 3096.06 samples/sec Loss 13.4320 LearningRate 0.0767 Epoch: 2 Global Step: 30890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:38:40,627-Speed 3044.89 samples/sec Loss 13.5540 LearningRate 0.0767 Epoch: 2 Global Step: 30900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:38:43,945-Speed 3086.88 samples/sec Loss 13.3726 LearningRate 0.0767 Epoch: 2 Global Step: 30910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:47,286-Speed 3065.73 samples/sec Loss 13.4619 LearningRate 
0.0767 Epoch: 2 Global Step: 30920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:50,662-Speed 3033.77 samples/sec Loss 13.4389 LearningRate 0.0766 Epoch: 2 Global Step: 30930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:54,010-Speed 3059.80 samples/sec Loss 13.6939 LearningRate 0.0766 Epoch: 2 Global Step: 30940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:38:57,331-Speed 3084.87 samples/sec Loss 13.5801 LearningRate 0.0766 Epoch: 2 Global Step: 30950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:00,653-Speed 3083.21 samples/sec Loss 13.2562 LearningRate 0.0766 Epoch: 2 Global Step: 30960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:04,052-Speed 3013.15 samples/sec Loss 13.6125 LearningRate 0.0766 Epoch: 2 Global Step: 30970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:07,433-Speed 3029.60 samples/sec Loss 13.3350 LearningRate 0.0766 Epoch: 2 Global Step: 30980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:10,743-Speed 3094.29 samples/sec Loss 13.4758 LearningRate 0.0766 Epoch: 2 Global Step: 30990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:14,075-Speed 3074.50 samples/sec Loss 13.3469 LearningRate 0.0766 Epoch: 2 Global Step: 31000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:17,409-Speed 3072.95 samples/sec Loss 13.4119 LearningRate 0.0766 Epoch: 2 Global Step: 31010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:20,735-Speed 3079.73 samples/sec Loss 13.5339 LearningRate 0.0766 Epoch: 2 Global Step: 31020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:24,073-Speed 3068.70 samples/sec Loss 13.4494 LearningRate 0.0766 Epoch: 2 Global Step: 31030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:27,398-Speed 3080.07 samples/sec Loss 13.4265 LearningRate 0.0766 Epoch: 2 Global Step: 31040 Fp16 
Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:30,739-Speed 3066.30 samples/sec Loss 13.5099 LearningRate 0.0766 Epoch: 2 Global Step: 31050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:34,090-Speed 3056.53 samples/sec Loss 13.4380 LearningRate 0.0766 Epoch: 2 Global Step: 31060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:37,424-Speed 3072.17 samples/sec Loss 13.4083 LearningRate 0.0766 Epoch: 2 Global Step: 31070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:39:40,773-Speed 3058.66 samples/sec Loss 13.4291 LearningRate 0.0765 Epoch: 2 Global Step: 31080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:44,142-Speed 3040.04 samples/sec Loss 13.4630 LearningRate 0.0765 Epoch: 2 Global Step: 31090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:47,513-Speed 3038.86 samples/sec Loss 13.4754 LearningRate 0.0765 Epoch: 2 Global Step: 31100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:50,854-Speed 3065.72 samples/sec Loss 13.4215 LearningRate 0.0765 Epoch: 2 Global Step: 31110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:54,172-Speed 3087.36 samples/sec Loss 13.5064 LearningRate 0.0765 Epoch: 2 Global Step: 31120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:39:57,514-Speed 3066.62 samples/sec Loss 13.5161 LearningRate 0.0765 Epoch: 2 Global Step: 31130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:40:00,853-Speed 3067.14 samples/sec Loss 13.5224 LearningRate 0.0765 Epoch: 2 Global Step: 31140 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:40:04,159-Speed 3098.50 samples/sec Loss 13.4345 LearningRate 0.0765 Epoch: 2 Global Step: 31150 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:40:07,494-Speed 3071.95 samples/sec Loss 13.4170 LearningRate 0.0765 Epoch: 2 Global Step: 31160 Fp16 Grad Scale: 65536 Required: 20 hours 
Training: 2022-04-27 04:40:10,749-Speed 3146.58 samples/sec Loss 13.3311 LearningRate 0.0765 Epoch: 2 Global Step: 31170 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:40:14,072-Speed 3081.75 samples/sec Loss 13.3474 LearningRate 0.0765 Epoch: 2 Global Step: 31180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:40:17,371-Speed 3105.18 samples/sec Loss 13.4607 LearningRate 0.0765 Epoch: 2 Global Step: 31190 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:20,720-Speed 3058.94 samples/sec Loss 13.5553 LearningRate 0.0765 Epoch: 2 Global Step: 31200 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:24,049-Speed 3076.09 samples/sec Loss 13.5768 LearningRate 0.0765 Epoch: 2 Global Step: 31210 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:27,370-Speed 3084.82 samples/sec Loss 13.5386 LearningRate 0.0764 Epoch: 2 Global Step: 31220 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:30,695-Speed 3080.55 samples/sec Loss 13.4436 LearningRate 0.0764 Epoch: 2 Global Step: 31230 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:34,076-Speed 3029.55 samples/sec Loss 13.4487 LearningRate 0.0764 Epoch: 2 Global Step: 31240 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:37,414-Speed 3068.64 samples/sec Loss 13.4645 LearningRate 0.0764 Epoch: 2 Global Step: 31250 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:40,738-Speed 3081.58 samples/sec Loss 13.2276 LearningRate 0.0764 Epoch: 2 Global Step: 31260 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:44,046-Speed 3096.09 samples/sec Loss 13.4734 LearningRate 0.0764 Epoch: 2 Global Step: 31270 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:47,355-Speed 3095.44 samples/sec Loss 13.2908 LearningRate 0.0764 Epoch: 2 Global Step: 31280 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:40:50,711-Speed 
3052.02 samples/sec Loss 13.3224 LearningRate 0.0764 Epoch: 2 Global Step: 31290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:40:54,053-Speed 3064.84 samples/sec Loss 13.5574 LearningRate 0.0764 Epoch: 2 Global Step: 31300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:40:57,482-Speed 2988.09 samples/sec Loss 13.4098 LearningRate 0.0764 Epoch: 2 Global Step: 31310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:00,828-Speed 3060.73 samples/sec Loss 13.6187 LearningRate 0.0764 Epoch: 2 Global Step: 31320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:04,099-Speed 3131.81 samples/sec Loss 13.2752 LearningRate 0.0764 Epoch: 2 Global Step: 31330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:07,422-Speed 3082.26 samples/sec Loss 13.4369 LearningRate 0.0764 Epoch: 2 Global Step: 31340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:10,747-Speed 3080.63 samples/sec Loss 13.4517 LearningRate 0.0764 Epoch: 2 Global Step: 31350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:14,049-Speed 3102.34 samples/sec Loss 13.4722 LearningRate 0.0763 Epoch: 2 Global Step: 31360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:17,382-Speed 3073.71 samples/sec Loss 13.4252 LearningRate 0.0763 Epoch: 2 Global Step: 31370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:20,715-Speed 3073.29 samples/sec Loss 13.4390 LearningRate 0.0763 Epoch: 2 Global Step: 31380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:41:24,062-Speed 3060.37 samples/sec Loss 13.4766 LearningRate 0.0763 Epoch: 2 Global Step: 31390 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:27,364-Speed 3101.67 samples/sec Loss 13.3338 LearningRate 0.0763 Epoch: 2 Global Step: 31400 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:30,640-Speed 3126.59 samples/sec Loss 13.3974 
LearningRate 0.0763 Epoch: 2 Global Step: 31410 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:33,939-Speed 3104.80 samples/sec Loss 13.3562 LearningRate 0.0763 Epoch: 2 Global Step: 31420 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:37,236-Speed 3106.47 samples/sec Loss 13.5917 LearningRate 0.0763 Epoch: 2 Global Step: 31430 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:40,526-Speed 3112.93 samples/sec Loss 13.3540 LearningRate 0.0763 Epoch: 2 Global Step: 31440 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:43,861-Speed 3071.79 samples/sec Loss 13.3386 LearningRate 0.0763 Epoch: 2 Global Step: 31450 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:47,205-Speed 3063.59 samples/sec Loss 13.4774 LearningRate 0.0763 Epoch: 2 Global Step: 31460 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:50,478-Speed 3129.40 samples/sec Loss 13.2701 LearningRate 0.0763 Epoch: 2 Global Step: 31470 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:53,771-Speed 3110.55 samples/sec Loss 13.4248 LearningRate 0.0763 Epoch: 2 Global Step: 31480 Fp16 Grad Scale: 32768 Required: 20 hours Training: 2022-04-27 04:41:57,052-Speed 3121.87 samples/sec Loss 13.3128 LearningRate 0.0763 Epoch: 2 Global Step: 31490 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:00,371-Speed 3088.80 samples/sec Loss 13.3343 LearningRate 0.0762 Epoch: 2 Global Step: 31500 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:03,675-Speed 3100.25 samples/sec Loss 13.3881 LearningRate 0.0762 Epoch: 2 Global Step: 31510 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:07,004-Speed 3076.62 samples/sec Loss 13.3343 LearningRate 0.0762 Epoch: 2 Global Step: 31520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:10,316-Speed 3092.77 samples/sec Loss 13.3357 LearningRate 0.0762 Epoch: 2 Global Step: 
31530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:13,590-Speed 3128.88 samples/sec Loss 13.3361 LearningRate 0.0762 Epoch: 2 Global Step: 31540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:16,923-Speed 3073.35 samples/sec Loss 13.3159 LearningRate 0.0762 Epoch: 2 Global Step: 31550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:20,299-Speed 3034.09 samples/sec Loss 13.3887 LearningRate 0.0762 Epoch: 2 Global Step: 31560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:23,621-Speed 3083.56 samples/sec Loss 13.3568 LearningRate 0.0762 Epoch: 2 Global Step: 31570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:26,905-Speed 3118.91 samples/sec Loss 13.3530 LearningRate 0.0762 Epoch: 2 Global Step: 31580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:42:30,263-Speed 3050.13 samples/sec Loss 13.3434 LearningRate 0.0762 Epoch: 2 Global Step: 31590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:33,590-Speed 3079.00 samples/sec Loss 13.4376 LearningRate 0.0762 Epoch: 2 Global Step: 31600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:36,888-Speed 3106.36 samples/sec Loss 13.3450 LearningRate 0.0762 Epoch: 2 Global Step: 31610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:40,172-Speed 3118.72 samples/sec Loss 13.4780 LearningRate 0.0762 Epoch: 2 Global Step: 31620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:43,488-Speed 3089.29 samples/sec Loss 13.5170 LearningRate 0.0762 Epoch: 2 Global Step: 31630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:46,795-Speed 3097.69 samples/sec Loss 13.3914 LearningRate 0.0761 Epoch: 2 Global Step: 31640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:50,104-Speed 3095.39 samples/sec Loss 13.2141 LearningRate 0.0761 Epoch: 2 Global Step: 31650 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-27 04:42:53,420-Speed 3088.96 samples/sec Loss 13.2043 LearningRate 0.0761 Epoch: 2 Global Step: 31660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:42:56,751-Speed 3075.09 samples/sec Loss 13.3591 LearningRate 0.0761 Epoch: 2 Global Step: 31670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:00,032-Speed 3121.56 samples/sec Loss 13.2684 LearningRate 0.0761 Epoch: 2 Global Step: 31680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:03,333-Speed 3103.51 samples/sec Loss 13.4536 LearningRate 0.0761 Epoch: 2 Global Step: 31690 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:06,666-Speed 3073.55 samples/sec Loss 13.4442 LearningRate 0.0761 Epoch: 2 Global Step: 31700 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:10,007-Speed 3065.34 samples/sec Loss 13.5985 LearningRate 0.0761 Epoch: 2 Global Step: 31710 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:13,299-Speed 3111.60 samples/sec Loss 13.4507 LearningRate 0.0761 Epoch: 2 Global Step: 31720 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:16,609-Speed 3094.46 samples/sec Loss 13.3228 LearningRate 0.0761 Epoch: 2 Global Step: 31730 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:19,953-Speed 3063.40 samples/sec Loss 13.3683 LearningRate 0.0761 Epoch: 2 Global Step: 31740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:23,297-Speed 3063.37 samples/sec Loss 13.3618 LearningRate 0.0761 Epoch: 2 Global Step: 31750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:26,682-Speed 3025.79 samples/sec Loss 13.3550 LearningRate 0.0761 Epoch: 2 Global Step: 31760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:29,954-Speed 3130.18 samples/sec Loss 13.5097 LearningRate 0.0761 Epoch: 2 Global Step: 31770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 
04:43:33,341-Speed 3023.62 samples/sec Loss 13.3509 LearningRate 0.0761 Epoch: 2 Global Step: 31780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:43:36,621-Speed 3123.08 samples/sec Loss 13.4595 LearningRate 0.0760 Epoch: 2 Global Step: 31790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:39,967-Speed 3061.79 samples/sec Loss 13.3864 LearningRate 0.0760 Epoch: 2 Global Step: 31800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:43,363-Speed 3015.99 samples/sec Loss 13.2631 LearningRate 0.0760 Epoch: 2 Global Step: 31810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:46,711-Speed 3059.13 samples/sec Loss 13.3045 LearningRate 0.0760 Epoch: 2 Global Step: 31820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:50,039-Speed 3078.43 samples/sec Loss 13.1801 LearningRate 0.0760 Epoch: 2 Global Step: 31830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:53,349-Speed 3094.11 samples/sec Loss 13.4373 LearningRate 0.0760 Epoch: 2 Global Step: 31840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:56,666-Speed 3088.62 samples/sec Loss 13.3657 LearningRate 0.0760 Epoch: 2 Global Step: 31850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:43:59,970-Speed 3100.67 samples/sec Loss 13.3814 LearningRate 0.0760 Epoch: 2 Global Step: 31860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:03,311-Speed 3065.52 samples/sec Loss 13.4375 LearningRate 0.0760 Epoch: 2 Global Step: 31870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:06,646-Speed 3071.59 samples/sec Loss 13.2931 LearningRate 0.0760 Epoch: 2 Global Step: 31880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:10,019-Speed 3036.83 samples/sec Loss 13.3725 LearningRate 0.0760 Epoch: 2 Global Step: 31890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:13,309-Speed 3113.48 
samples/sec Loss 13.2024 LearningRate 0.0760 Epoch: 2 Global Step: 31900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:16,704-Speed 3016.81 samples/sec Loss 13.4280 LearningRate 0.0760 Epoch: 2 Global Step: 31910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:19,977-Speed 3129.93 samples/sec Loss 13.4858 LearningRate 0.0760 Epoch: 2 Global Step: 31920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:23,312-Speed 3070.46 samples/sec Loss 13.4076 LearningRate 0.0759 Epoch: 2 Global Step: 31930 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:26,635-Speed 3082.64 samples/sec Loss 13.2960 LearningRate 0.0759 Epoch: 2 Global Step: 31940 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:29,941-Speed 3098.78 samples/sec Loss 13.3431 LearningRate 0.0759 Epoch: 2 Global Step: 31950 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:33,216-Speed 3126.74 samples/sec Loss 13.2922 LearningRate 0.0759 Epoch: 2 Global Step: 31960 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:36,547-Speed 3076.08 samples/sec Loss 13.4354 LearningRate 0.0759 Epoch: 2 Global Step: 31970 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:39,860-Speed 3091.33 samples/sec Loss 13.4107 LearningRate 0.0759 Epoch: 2 Global Step: 31980 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:43,145-Speed 3117.59 samples/sec Loss 13.3609 LearningRate 0.0759 Epoch: 2 Global Step: 31990 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:46,449-Speed 3100.23 samples/sec Loss 13.2766 LearningRate 0.0759 Epoch: 2 Global Step: 32000 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:49,731-Speed 3120.88 samples/sec Loss 13.3433 LearningRate 0.0759 Epoch: 2 Global Step: 32010 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:44:53,026-Speed 3109.09 samples/sec Loss 13.4065 LearningRate 
0.0759 Epoch: 2 Global Step: 32020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:56,314-Speed 3115.44 samples/sec Loss 13.4407 LearningRate 0.0759 Epoch: 2 Global Step: 32030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:44:59,659-Speed 3061.39 samples/sec Loss 13.3698 LearningRate 0.0759 Epoch: 2 Global Step: 32040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:02,974-Speed 3090.00 samples/sec Loss 13.3837 LearningRate 0.0759 Epoch: 2 Global Step: 32050 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:06,290-Speed 3089.68 samples/sec Loss 13.5212 LearningRate 0.0759 Epoch: 2 Global Step: 32060 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:09,622-Speed 3074.42 samples/sec Loss 13.3323 LearningRate 0.0758 Epoch: 2 Global Step: 32070 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:12,913-Speed 3113.18 samples/sec Loss 13.4049 LearningRate 0.0758 Epoch: 2 Global Step: 32080 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:16,244-Speed 3074.83 samples/sec Loss 13.1521 LearningRate 0.0758 Epoch: 2 Global Step: 32090 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:19,601-Speed 3051.02 samples/sec Loss 13.4845 LearningRate 0.0758 Epoch: 2 Global Step: 32100 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:22,971-Speed 3039.76 samples/sec Loss 13.4205 LearningRate 0.0758 Epoch: 2 Global Step: 32110 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:26,331-Speed 3049.30 samples/sec Loss 13.2009 LearningRate 0.0758 Epoch: 2 Global Step: 32120 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:29,668-Speed 3068.82 samples/sec Loss 13.3476 LearningRate 0.0758 Epoch: 2 Global Step: 32130 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:33,032-Speed 3045.41 samples/sec Loss 13.3135 LearningRate 0.0758 Epoch: 2 Global Step: 32140 Fp16 
Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:45:36,427-Speed 3016.76 samples/sec Loss 13.3678 LearningRate 0.0758 Epoch: 2 Global Step: 32150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:39,735-Speed 3097.15 samples/sec Loss 13.3103 LearningRate 0.0758 Epoch: 2 Global Step: 32160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:43,069-Speed 3072.24 samples/sec Loss 13.3128 LearningRate 0.0758 Epoch: 2 Global Step: 32170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:46,386-Speed 3087.43 samples/sec Loss 13.3189 LearningRate 0.0758 Epoch: 2 Global Step: 32180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:49,715-Speed 3077.10 samples/sec Loss 13.5368 LearningRate 0.0758 Epoch: 2 Global Step: 32190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:53,066-Speed 3056.78 samples/sec Loss 13.3019 LearningRate 0.0758 Epoch: 2 Global Step: 32200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:56,358-Speed 3111.75 samples/sec Loss 13.2985 LearningRate 0.0757 Epoch: 2 Global Step: 32210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:45:59,691-Speed 3072.57 samples/sec Loss 13.2087 LearningRate 0.0757 Epoch: 2 Global Step: 32220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:03,009-Speed 3086.98 samples/sec Loss 13.3176 LearningRate 0.0757 Epoch: 2 Global Step: 32230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:06,327-Speed 3087.11 samples/sec Loss 13.2457 LearningRate 0.0757 Epoch: 2 Global Step: 32240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:09,628-Speed 3103.50 samples/sec Loss 13.3458 LearningRate 0.0757 Epoch: 2 Global Step: 32250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:12,995-Speed 3042.26 samples/sec Loss 13.3388 LearningRate 0.0757 Epoch: 2 Global Step: 32260 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 04:46:16,360-Speed 3043.76 samples/sec Loss 13.3952 LearningRate 0.0757 Epoch: 2 Global Step: 32270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:19,638-Speed 3124.67 samples/sec Loss 13.4457 LearningRate 0.0757 Epoch: 2 Global Step: 32280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:23,000-Speed 3046.44 samples/sec Loss 13.3057 LearningRate 0.0757 Epoch: 2 Global Step: 32290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:26,342-Speed 3065.41 samples/sec Loss 13.3968 LearningRate 0.0757 Epoch: 2 Global Step: 32300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:29,732-Speed 3021.52 samples/sec Loss 13.3266 LearningRate 0.0757 Epoch: 2 Global Step: 32310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:33,051-Speed 3086.29 samples/sec Loss 13.3524 LearningRate 0.0757 Epoch: 2 Global Step: 32320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:36,359-Speed 3096.99 samples/sec Loss 13.3455 LearningRate 0.0757 Epoch: 2 Global Step: 32330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:39,633-Speed 3127.73 samples/sec Loss 13.2124 LearningRate 0.0757 Epoch: 2 Global Step: 32340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:42,993-Speed 3049.12 samples/sec Loss 13.3080 LearningRate 0.0757 Epoch: 2 Global Step: 32350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:46,275-Speed 3120.32 samples/sec Loss 13.3328 LearningRate 0.0756 Epoch: 2 Global Step: 32360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:49,630-Speed 3053.26 samples/sec Loss 13.1978 LearningRate 0.0756 Epoch: 2 Global Step: 32370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:52,919-Speed 3114.51 samples/sec Loss 13.3372 LearningRate 0.0756 Epoch: 2 Global Step: 32380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
04:46:56,307-Speed 3023.35 samples/sec Loss 13.2886 LearningRate 0.0756 Epoch: 2 Global Step: 32390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:46:59,718-Speed 3002.61 samples/sec Loss 13.2115 LearningRate 0.0756 Epoch: 2 Global Step: 32400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:02,996-Speed 3124.42 samples/sec Loss 13.4155 LearningRate 0.0756 Epoch: 2 Global Step: 32410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:06,380-Speed 3027.59 samples/sec Loss 13.2229 LearningRate 0.0756 Epoch: 2 Global Step: 32420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:09,722-Speed 3064.87 samples/sec Loss 13.2516 LearningRate 0.0756 Epoch: 2 Global Step: 32430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:13,039-Speed 3087.99 samples/sec Loss 13.3417 LearningRate 0.0756 Epoch: 2 Global Step: 32440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:16,332-Speed 3111.33 samples/sec Loss 13.4043 LearningRate 0.0756 Epoch: 2 Global Step: 32450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:19,685-Speed 3054.08 samples/sec Loss 13.3355 LearningRate 0.0756 Epoch: 2 Global Step: 32460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:22,988-Speed 3100.81 samples/sec Loss 13.4790 LearningRate 0.0756 Epoch: 2 Global Step: 32470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:26,276-Speed 3115.96 samples/sec Loss 13.3677 LearningRate 0.0756 Epoch: 2 Global Step: 32480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:29,650-Speed 3035.49 samples/sec Loss 13.2601 LearningRate 0.0756 Epoch: 2 Global Step: 32490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:32,943-Speed 3110.27 samples/sec Loss 13.2224 LearningRate 0.0755 Epoch: 2 Global Step: 32500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:36,321-Speed 3033.05 
samples/sec Loss 13.2935 LearningRate 0.0755 Epoch: 2 Global Step: 32510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:39,626-Speed 3099.31 samples/sec Loss 13.3237 LearningRate 0.0755 Epoch: 2 Global Step: 32520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:43,002-Speed 3033.95 samples/sec Loss 13.3345 LearningRate 0.0755 Epoch: 2 Global Step: 32530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:46,359-Speed 3051.87 samples/sec Loss 13.3242 LearningRate 0.0755 Epoch: 2 Global Step: 32540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:49,694-Speed 3071.82 samples/sec Loss 13.3677 LearningRate 0.0755 Epoch: 2 Global Step: 32550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:52,995-Speed 3102.77 samples/sec Loss 13.2102 LearningRate 0.0755 Epoch: 2 Global Step: 32560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:56,374-Speed 3030.97 samples/sec Loss 13.2676 LearningRate 0.0755 Epoch: 2 Global Step: 32570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:47:59,681-Speed 3096.94 samples/sec Loss 13.2322 LearningRate 0.0755 Epoch: 2 Global Step: 32580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:03,059-Speed 3032.50 samples/sec Loss 13.2750 LearningRate 0.0755 Epoch: 2 Global Step: 32590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:06,358-Speed 3105.84 samples/sec Loss 13.2238 LearningRate 0.0755 Epoch: 2 Global Step: 32600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:09,622-Speed 3138.37 samples/sec Loss 13.4179 LearningRate 0.0755 Epoch: 2 Global Step: 32610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:12,967-Speed 3061.72 samples/sec Loss 13.2399 LearningRate 0.0755 Epoch: 2 Global Step: 32620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:16,283-Speed 3088.88 samples/sec Loss 13.3454 
LearningRate 0.0755 Epoch: 2 Global Step: 32630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:19,577-Speed 3109.92 samples/sec Loss 13.3401 LearningRate 0.0754 Epoch: 2 Global Step: 32640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:22,857-Speed 3122.74 samples/sec Loss 13.2853 LearningRate 0.0754 Epoch: 2 Global Step: 32650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:26,168-Speed 3094.11 samples/sec Loss 13.1725 LearningRate 0.0754 Epoch: 2 Global Step: 32660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:29,440-Speed 3130.14 samples/sec Loss 13.1905 LearningRate 0.0754 Epoch: 2 Global Step: 32670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:32,768-Speed 3077.74 samples/sec Loss 13.1586 LearningRate 0.0754 Epoch: 2 Global Step: 32680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:36,127-Speed 3049.67 samples/sec Loss 13.1417 LearningRate 0.0754 Epoch: 2 Global Step: 32690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:39,471-Speed 3063.12 samples/sec Loss 13.3151 LearningRate 0.0754 Epoch: 2 Global Step: 32700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:42,790-Speed 3085.79 samples/sec Loss 13.2426 LearningRate 0.0754 Epoch: 2 Global Step: 32710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:46,113-Speed 3082.68 samples/sec Loss 13.3770 LearningRate 0.0754 Epoch: 2 Global Step: 32720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:49,415-Speed 3101.96 samples/sec Loss 13.3610 LearningRate 0.0754 Epoch: 2 Global Step: 32730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:52,764-Speed 3058.18 samples/sec Loss 13.3202 LearningRate 0.0754 Epoch: 2 Global Step: 32740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:48:56,076-Speed 3092.89 samples/sec Loss 13.3338 LearningRate 0.0754 Epoch: 2 
Global Step: 32750 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 04:48:59,413-Speed 3069.94 samples/sec Loss 13.2868 LearningRate 0.0754 Epoch: 2 Global Step: 32760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:02,742-Speed 3076.61 samples/sec Loss 13.2781 LearningRate 0.0754 Epoch: 2 Global Step: 32770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:06,083-Speed 3066.08 samples/sec Loss 13.2765 LearningRate 0.0754 Epoch: 2 Global Step: 32780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:09,371-Speed 3115.33 samples/sec Loss 13.2617 LearningRate 0.0753 Epoch: 2 Global Step: 32790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:12,676-Speed 3098.89 samples/sec Loss 13.3201 LearningRate 0.0753 Epoch: 2 Global Step: 32800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:15,975-Speed 3104.89 samples/sec Loss 13.3497 LearningRate 0.0753 Epoch: 2 Global Step: 32810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:19,324-Speed 3059.08 samples/sec Loss 13.4070 LearningRate 0.0753 Epoch: 2 Global Step: 32820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:22,621-Speed 3106.72 samples/sec Loss 13.2580 LearningRate 0.0753 Epoch: 2 Global Step: 32830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:25,963-Speed 3064.93 samples/sec Loss 13.1695 LearningRate 0.0753 Epoch: 2 Global Step: 32840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:29,289-Speed 3079.63 samples/sec Loss 13.2328 LearningRate 0.0753 Epoch: 2 Global Step: 32850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:32,581-Speed 3111.10 samples/sec Loss 13.3581 LearningRate 0.0753 Epoch: 2 Global Step: 32860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:35,863-Speed 3120.82 samples/sec Loss 13.2442 LearningRate 0.0753 Epoch: 2 Global Step: 32870 Fp16 Grad 
Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:39,211-Speed 3059.98 samples/sec Loss 13.1008 LearningRate 0.0753 Epoch: 2 Global Step: 32880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:42,602-Speed 3020.61 samples/sec Loss 13.3015 LearningRate 0.0753 Epoch: 2 Global Step: 32890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:45,926-Speed 3081.30 samples/sec Loss 13.3068 LearningRate 0.0753 Epoch: 2 Global Step: 32900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:49,290-Speed 3044.85 samples/sec Loss 13.1647 LearningRate 0.0753 Epoch: 2 Global Step: 32910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:52,623-Speed 3073.28 samples/sec Loss 13.0607 LearningRate 0.0753 Epoch: 2 Global Step: 32920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:55,911-Speed 3115.06 samples/sec Loss 13.2826 LearningRate 0.0752 Epoch: 2 Global Step: 32930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:49:59,247-Speed 3071.28 samples/sec Loss 13.1621 LearningRate 0.0752 Epoch: 2 Global Step: 32940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:02,596-Speed 3058.13 samples/sec Loss 13.0954 LearningRate 0.0752 Epoch: 2 Global Step: 32950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:05,896-Speed 3103.60 samples/sec Loss 13.3313 LearningRate 0.0752 Epoch: 2 Global Step: 32960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:09,202-Speed 3099.45 samples/sec Loss 13.1210 LearningRate 0.0752 Epoch: 2 Global Step: 32970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:12,523-Speed 3084.18 samples/sec Loss 13.3512 LearningRate 0.0752 Epoch: 2 Global Step: 32980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:15,870-Speed 3060.65 samples/sec Loss 13.3226 LearningRate 0.0752 Epoch: 2 Global Step: 32990 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 04:50:19,181-Speed 3093.44 samples/sec Loss 13.2293 LearningRate 0.0752 Epoch: 2 Global Step: 33000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:22,476-Speed 3108.75 samples/sec Loss 13.2434 LearningRate 0.0752 Epoch: 2 Global Step: 33010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:25,812-Speed 3070.76 samples/sec Loss 13.2305 LearningRate 0.0752 Epoch: 2 Global Step: 33020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:29,122-Speed 3094.12 samples/sec Loss 13.3315 LearningRate 0.0752 Epoch: 2 Global Step: 33030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:32,536-Speed 3000.37 samples/sec Loss 13.3109 LearningRate 0.0752 Epoch: 2 Global Step: 33040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:35,841-Speed 3099.15 samples/sec Loss 13.2170 LearningRate 0.0752 Epoch: 2 Global Step: 33050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:39,164-Speed 3082.93 samples/sec Loss 13.3055 LearningRate 0.0752 Epoch: 2 Global Step: 33060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:42,526-Speed 3046.19 samples/sec Loss 13.3099 LearningRate 0.0751 Epoch: 2 Global Step: 33070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:45,932-Speed 3007.39 samples/sec Loss 13.2807 LearningRate 0.0751 Epoch: 2 Global Step: 33080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:49,299-Speed 3042.09 samples/sec Loss 13.2328 LearningRate 0.0751 Epoch: 2 Global Step: 33090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:52,687-Speed 3023.63 samples/sec Loss 13.2692 LearningRate 0.0751 Epoch: 2 Global Step: 33100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:50:55,947-Speed 3142.55 samples/sec Loss 13.0934 LearningRate 0.0751 Epoch: 2 Global Step: 33110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
04:50:59,203-Speed 3145.77 samples/sec Loss 13.2274 LearningRate 0.0751 Epoch: 2 Global Step: 33120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:02,527-Speed 3081.57 samples/sec Loss 13.1820 LearningRate 0.0751 Epoch: 2 Global Step: 33130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:05,867-Speed 3066.42 samples/sec Loss 13.2117 LearningRate 0.0751 Epoch: 2 Global Step: 33140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:09,175-Speed 3096.91 samples/sec Loss 13.1509 LearningRate 0.0751 Epoch: 2 Global Step: 33150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:12,484-Speed 3095.24 samples/sec Loss 13.2397 LearningRate 0.0751 Epoch: 2 Global Step: 33160 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 04:51:15,863-Speed 3032.08 samples/sec Loss 13.3193 LearningRate 0.0751 Epoch: 2 Global Step: 33170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:19,222-Speed 3049.67 samples/sec Loss 13.3636 LearningRate 0.0751 Epoch: 2 Global Step: 33180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:22,567-Speed 3061.71 samples/sec Loss 13.2348 LearningRate 0.0751 Epoch: 2 Global Step: 33190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:25,939-Speed 3038.33 samples/sec Loss 13.3891 LearningRate 0.0751 Epoch: 2 Global Step: 33200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:29,291-Speed 3055.43 samples/sec Loss 13.1850 LearningRate 0.0751 Epoch: 2 Global Step: 33210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:32,626-Speed 3071.92 samples/sec Loss 13.0829 LearningRate 0.0750 Epoch: 2 Global Step: 33220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:35,940-Speed 3090.15 samples/sec Loss 13.2218 LearningRate 0.0750 Epoch: 2 Global Step: 33230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:39,332-Speed 3019.95 
samples/sec Loss 13.2636 LearningRate 0.0750 Epoch: 2 Global Step: 33240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:42,684-Speed 3055.82 samples/sec Loss 13.1769 LearningRate 0.0750 Epoch: 2 Global Step: 33250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:46,113-Speed 2986.83 samples/sec Loss 13.3360 LearningRate 0.0750 Epoch: 2 Global Step: 33260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:49,465-Speed 3056.29 samples/sec Loss 13.1496 LearningRate 0.0750 Epoch: 2 Global Step: 33270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:52,808-Speed 3064.21 samples/sec Loss 13.1823 LearningRate 0.0750 Epoch: 2 Global Step: 33280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:56,191-Speed 3027.35 samples/sec Loss 13.1472 LearningRate 0.0750 Epoch: 2 Global Step: 33290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:51:59,549-Speed 3050.39 samples/sec Loss 13.2135 LearningRate 0.0750 Epoch: 2 Global Step: 33300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:02,851-Speed 3102.55 samples/sec Loss 13.1356 LearningRate 0.0750 Epoch: 2 Global Step: 33310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:06,209-Speed 3049.82 samples/sec Loss 13.2167 LearningRate 0.0750 Epoch: 2 Global Step: 33320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:09,486-Speed 3126.18 samples/sec Loss 13.1417 LearningRate 0.0750 Epoch: 2 Global Step: 33330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:12,860-Speed 3035.16 samples/sec Loss 13.1100 LearningRate 0.0750 Epoch: 2 Global Step: 33340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:16,208-Speed 3059.98 samples/sec Loss 13.2857 LearningRate 0.0750 Epoch: 2 Global Step: 33350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:19,515-Speed 3097.18 samples/sec Loss 13.1521 
LearningRate 0.0749 Epoch: 2 Global Step: 33360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:22,837-Speed 3083.41 samples/sec Loss 13.2718 LearningRate 0.0749 Epoch: 2 Global Step: 33370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:26,199-Speed 3046.54 samples/sec Loss 13.1951 LearningRate 0.0749 Epoch: 2 Global Step: 33380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:29,471-Speed 3130.09 samples/sec Loss 13.2699 LearningRate 0.0749 Epoch: 2 Global Step: 33390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:32,783-Speed 3092.98 samples/sec Loss 13.2999 LearningRate 0.0749 Epoch: 2 Global Step: 33400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:36,171-Speed 3023.16 samples/sec Loss 13.2870 LearningRate 0.0749 Epoch: 2 Global Step: 33410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:39,558-Speed 3024.80 samples/sec Loss 13.2108 LearningRate 0.0749 Epoch: 2 Global Step: 33420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:42,867-Speed 3094.65 samples/sec Loss 13.1141 LearningRate 0.0749 Epoch: 2 Global Step: 33430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:46,199-Speed 3074.00 samples/sec Loss 13.3297 LearningRate 0.0749 Epoch: 2 Global Step: 33440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:49,490-Speed 3113.46 samples/sec Loss 13.2736 LearningRate 0.0749 Epoch: 2 Global Step: 33450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:52,822-Speed 3074.90 samples/sec Loss 13.1716 LearningRate 0.0749 Epoch: 2 Global Step: 33460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:56,156-Speed 3072.78 samples/sec Loss 13.2842 LearningRate 0.0749 Epoch: 2 Global Step: 33470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:52:59,485-Speed 3076.33 samples/sec Loss 13.2464 LearningRate 0.0749 Epoch: 2 
Global Step: 33480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:02,772-Speed 3116.37 samples/sec Loss 13.1235 LearningRate 0.0749 Epoch: 2 Global Step: 33490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:06,081-Speed 3095.70 samples/sec Loss 13.1912 LearningRate 0.0748 Epoch: 2 Global Step: 33500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:09,359-Speed 3124.36 samples/sec Loss 13.0870 LearningRate 0.0748 Epoch: 2 Global Step: 33510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:12,688-Speed 3076.89 samples/sec Loss 13.1832 LearningRate 0.0748 Epoch: 2 Global Step: 33520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:16,033-Speed 3062.14 samples/sec Loss 13.3642 LearningRate 0.0748 Epoch: 2 Global Step: 33530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:19,375-Speed 3065.48 samples/sec Loss 13.3165 LearningRate 0.0748 Epoch: 2 Global Step: 33540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:22,726-Speed 3056.60 samples/sec Loss 13.1651 LearningRate 0.0748 Epoch: 2 Global Step: 33550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:26,032-Speed 3098.47 samples/sec Loss 13.2866 LearningRate 0.0748 Epoch: 2 Global Step: 33560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:29,299-Speed 3134.99 samples/sec Loss 13.1866 LearningRate 0.0748 Epoch: 2 Global Step: 33570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:32,641-Speed 3065.28 samples/sec Loss 13.2455 LearningRate 0.0748 Epoch: 2 Global Step: 33580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:35,950-Speed 3095.09 samples/sec Loss 13.2481 LearningRate 0.0748 Epoch: 2 Global Step: 33590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:39,274-Speed 3081.41 samples/sec Loss 13.1498 LearningRate 0.0748 Epoch: 2 Global Step: 33600 Fp16 Grad 
Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:42,603-Speed 3077.59 samples/sec Loss 13.2521 LearningRate 0.0748 Epoch: 2 Global Step: 33610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:45,972-Speed 3039.63 samples/sec Loss 13.0684 LearningRate 0.0748 Epoch: 2 Global Step: 33620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:49,292-Speed 3085.86 samples/sec Loss 13.1607 LearningRate 0.0748 Epoch: 2 Global Step: 33630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:52,648-Speed 3052.39 samples/sec Loss 13.2167 LearningRate 0.0748 Epoch: 2 Global Step: 33640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:55,959-Speed 3093.66 samples/sec Loss 13.1804 LearningRate 0.0747 Epoch: 2 Global Step: 33650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:53:59,314-Speed 3052.65 samples/sec Loss 13.1942 LearningRate 0.0747 Epoch: 2 Global Step: 33660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:02,658-Speed 3063.04 samples/sec Loss 13.1234 LearningRate 0.0747 Epoch: 2 Global Step: 33670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:05,994-Speed 3070.34 samples/sec Loss 13.0450 LearningRate 0.0747 Epoch: 2 Global Step: 33680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:09,330-Speed 3071.89 samples/sec Loss 13.0391 LearningRate 0.0747 Epoch: 2 Global Step: 33690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:12,621-Speed 3112.33 samples/sec Loss 13.2307 LearningRate 0.0747 Epoch: 2 Global Step: 33700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:16,000-Speed 3030.75 samples/sec Loss 13.2446 LearningRate 0.0747 Epoch: 2 Global Step: 33710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:19,409-Speed 3005.01 samples/sec Loss 13.1229 LearningRate 0.0747 Epoch: 2 Global Step: 33720 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 04:54:22,698-Speed 3114.49 samples/sec Loss 13.2272 LearningRate 0.0747 Epoch: 2 Global Step: 33730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:54:26,080-Speed 3028.49 samples/sec Loss 13.1860 LearningRate 0.0747 Epoch: 2 Global Step: 33740 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:29,403-Speed 3082.96 samples/sec Loss 13.0774 LearningRate 0.0747 Epoch: 2 Global Step: 33750 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:32,751-Speed 3058.92 samples/sec Loss 13.0600 LearningRate 0.0747 Epoch: 2 Global Step: 33760 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:36,119-Speed 3041.43 samples/sec Loss 13.2193 LearningRate 0.0747 Epoch: 2 Global Step: 33770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:39,522-Speed 3009.94 samples/sec Loss 13.0191 LearningRate 0.0747 Epoch: 2 Global Step: 33780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:42,902-Speed 3030.90 samples/sec Loss 13.2579 LearningRate 0.0746 Epoch: 2 Global Step: 33790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:46,295-Speed 3017.98 samples/sec Loss 13.3087 LearningRate 0.0746 Epoch: 2 Global Step: 33800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:49,592-Speed 3107.07 samples/sec Loss 13.2166 LearningRate 0.0746 Epoch: 2 Global Step: 33810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:52,903-Speed 3093.72 samples/sec Loss 13.2848 LearningRate 0.0746 Epoch: 2 Global Step: 33820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:56,269-Speed 3043.49 samples/sec Loss 13.1733 LearningRate 0.0746 Epoch: 2 Global Step: 33830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:54:59,573-Speed 3100.24 samples/sec Loss 13.1510 LearningRate 0.0746 Epoch: 2 Global Step: 33840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
04:55:02,839-Speed 3135.94 samples/sec Loss 13.2091 LearningRate 0.0746 Epoch: 2 Global Step: 33850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:06,139-Speed 3104.16 samples/sec Loss 13.2473 LearningRate 0.0746 Epoch: 2 Global Step: 33860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:09,562-Speed 2992.17 samples/sec Loss 13.1724 LearningRate 0.0746 Epoch: 2 Global Step: 33870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:12,858-Speed 3107.64 samples/sec Loss 13.2324 LearningRate 0.0746 Epoch: 2 Global Step: 33880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:16,249-Speed 3020.66 samples/sec Loss 13.2020 LearningRate 0.0746 Epoch: 2 Global Step: 33890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:19,592-Speed 3064.21 samples/sec Loss 13.3491 LearningRate 0.0746 Epoch: 2 Global Step: 33900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:22,894-Speed 3102.36 samples/sec Loss 13.1672 LearningRate 0.0746 Epoch: 2 Global Step: 33910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:26,283-Speed 3021.82 samples/sec Loss 13.2964 LearningRate 0.0746 Epoch: 2 Global Step: 33920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:29,634-Speed 3057.62 samples/sec Loss 13.2988 LearningRate 0.0745 Epoch: 2 Global Step: 33930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:32,994-Speed 3048.33 samples/sec Loss 13.1517 LearningRate 0.0745 Epoch: 2 Global Step: 33940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:36,377-Speed 3027.59 samples/sec Loss 13.1566 LearningRate 0.0745 Epoch: 2 Global Step: 33950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:39,772-Speed 3017.40 samples/sec Loss 13.0291 LearningRate 0.0745 Epoch: 2 Global Step: 33960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:43,090-Speed 3086.74 
samples/sec Loss 13.0850 LearningRate 0.0745 Epoch: 2 Global Step: 33970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:46,441-Speed 3056.38 samples/sec Loss 13.0778 LearningRate 0.0745 Epoch: 2 Global Step: 33980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:49,801-Speed 3048.80 samples/sec Loss 13.2742 LearningRate 0.0745 Epoch: 2 Global Step: 33990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:53,152-Speed 3056.71 samples/sec Loss 13.1367 LearningRate 0.0745 Epoch: 2 Global Step: 34000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:56,439-Speed 3116.27 samples/sec Loss 13.0975 LearningRate 0.0745 Epoch: 2 Global Step: 34010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:55:59,773-Speed 3072.52 samples/sec Loss 13.1594 LearningRate 0.0745 Epoch: 2 Global Step: 34020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:03,060-Speed 3116.02 samples/sec Loss 13.2431 LearningRate 0.0745 Epoch: 2 Global Step: 34030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:06,379-Speed 3086.27 samples/sec Loss 13.2245 LearningRate 0.0745 Epoch: 2 Global Step: 34040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:09,681-Speed 3102.91 samples/sec Loss 13.1897 LearningRate 0.0745 Epoch: 2 Global Step: 34050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:13,014-Speed 3073.27 samples/sec Loss 13.1750 LearningRate 0.0745 Epoch: 2 Global Step: 34060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:16,308-Speed 3109.55 samples/sec Loss 13.1302 LearningRate 0.0745 Epoch: 2 Global Step: 34070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:19,646-Speed 3068.00 samples/sec Loss 13.0939 LearningRate 0.0744 Epoch: 2 Global Step: 34080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:23,000-Speed 3054.12 samples/sec Loss 13.2441 
LearningRate 0.0744 Epoch: 2 Global Step: 34090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:26,326-Speed 3079.77 samples/sec Loss 12.9704 LearningRate 0.0744 Epoch: 2 Global Step: 34100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:29,639-Speed 3092.38 samples/sec Loss 13.2604 LearningRate 0.0744 Epoch: 2 Global Step: 34110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:32,961-Speed 3083.42 samples/sec Loss 13.1384 LearningRate 0.0744 Epoch: 2 Global Step: 34120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:36,326-Speed 3044.13 samples/sec Loss 13.4172 LearningRate 0.0744 Epoch: 2 Global Step: 34130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:39,651-Speed 3080.43 samples/sec Loss 13.1556 LearningRate 0.0744 Epoch: 2 Global Step: 34140 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 04:56:42,936-Speed 3117.89 samples/sec Loss 13.1813 LearningRate 0.0744 Epoch: 2 Global Step: 34150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:46,265-Speed 3076.68 samples/sec Loss 13.1161 LearningRate 0.0744 Epoch: 2 Global Step: 34160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:49,603-Speed 3068.84 samples/sec Loss 13.1920 LearningRate 0.0744 Epoch: 2 Global Step: 34170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:52,939-Speed 3070.84 samples/sec Loss 13.1335 LearningRate 0.0744 Epoch: 2 Global Step: 34180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:56,278-Speed 3067.40 samples/sec Loss 13.1989 LearningRate 0.0744 Epoch: 2 Global Step: 34190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:56:59,618-Speed 3067.78 samples/sec Loss 13.1768 LearningRate 0.0744 Epoch: 2 Global Step: 34200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:57:03,044-Speed 2990.11 samples/sec Loss 13.0319 LearningRate 0.0744 Epoch: 2 
Global Step: 34210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:57:06,367-Speed 3082.43 samples/sec Loss 13.1149 LearningRate 0.0743 Epoch: 2 Global Step: 34220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:57:09,711-Speed 3062.99 samples/sec Loss 13.0723 LearningRate 0.0743 Epoch: 2 Global Step: 34230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:57:13,001-Speed 3113.44 samples/sec Loss 13.0069 LearningRate 0.0743 Epoch: 2 Global Step: 34240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:57:16,315-Speed 3090.51 samples/sec Loss 13.0198 LearningRate 0.0743 Epoch: 2 Global Step: 34250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:19,660-Speed 3062.55 samples/sec Loss 12.8689 LearningRate 0.0743 Epoch: 2 Global Step: 34260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:22,950-Speed 3113.13 samples/sec Loss 12.9774 LearningRate 0.0743 Epoch: 2 Global Step: 34270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:26,292-Speed 3065.10 samples/sec Loss 13.3327 LearningRate 0.0743 Epoch: 2 Global Step: 34280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:29,603-Speed 3093.71 samples/sec Loss 13.1194 LearningRate 0.0743 Epoch: 2 Global Step: 34290 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:32,914-Speed 3093.46 samples/sec Loss 13.2248 LearningRate 0.0743 Epoch: 2 Global Step: 34300 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:36,202-Speed 3115.14 samples/sec Loss 13.1331 LearningRate 0.0743 Epoch: 2 Global Step: 34310 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:39,480-Speed 3125.55 samples/sec Loss 13.1304 LearningRate 0.0743 Epoch: 2 Global Step: 34320 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:42,759-Speed 3123.77 samples/sec Loss 13.2620 LearningRate 0.0743 Epoch: 2 Global Step: 34330 Fp16 Grad Scale: 
65536 Required: 20 hours Training: 2022-04-27 04:57:46,105-Speed 3061.21 samples/sec Loss 13.0486 LearningRate 0.0743 Epoch: 2 Global Step: 34340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:49,405-Speed 3104.02 samples/sec Loss 13.1535 LearningRate 0.0743 Epoch: 2 Global Step: 34350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:52,699-Speed 3109.02 samples/sec Loss 13.1834 LearningRate 0.0743 Epoch: 2 Global Step: 34360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:55,999-Speed 3104.40 samples/sec Loss 13.1164 LearningRate 0.0742 Epoch: 2 Global Step: 34370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:57:59,346-Speed 3060.14 samples/sec Loss 13.1539 LearningRate 0.0742 Epoch: 2 Global Step: 34380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:02,678-Speed 3074.27 samples/sec Loss 13.1674 LearningRate 0.0742 Epoch: 2 Global Step: 34390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:05,977-Speed 3105.53 samples/sec Loss 12.9516 LearningRate 0.0742 Epoch: 2 Global Step: 34400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:09,324-Speed 3059.51 samples/sec Loss 13.0487 LearningRate 0.0742 Epoch: 2 Global Step: 34410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:12,628-Speed 3100.27 samples/sec Loss 13.2246 LearningRate 0.0742 Epoch: 2 Global Step: 34420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:15,967-Speed 3067.22 samples/sec Loss 13.1338 LearningRate 0.0742 Epoch: 2 Global Step: 34430 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:19,290-Speed 3082.95 samples/sec Loss 13.2509 LearningRate 0.0742 Epoch: 2 Global Step: 34440 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 04:58:22,714-Speed 2991.07 samples/sec Loss 13.0597 LearningRate 0.0742 Epoch: 2 Global Step: 34450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-27 04:58:26,084-Speed 3039.95 samples/sec Loss 13.0987 LearningRate 0.0742 Epoch: 2 Global Step: 34460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:29,428-Speed 3062.76 samples/sec Loss 13.2625 LearningRate 0.0742 Epoch: 2 Global Step: 34470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:32,752-Speed 3081.24 samples/sec Loss 13.1655 LearningRate 0.0742 Epoch: 2 Global Step: 34480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:36,097-Speed 3063.21 samples/sec Loss 12.9619 LearningRate 0.0742 Epoch: 2 Global Step: 34490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:39,461-Speed 3045.14 samples/sec Loss 12.9639 LearningRate 0.0742 Epoch: 2 Global Step: 34500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:42,781-Speed 3084.93 samples/sec Loss 12.9820 LearningRate 0.0741 Epoch: 2 Global Step: 34510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:46,084-Speed 3100.96 samples/sec Loss 13.1020 LearningRate 0.0741 Epoch: 2 Global Step: 34520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:49,413-Speed 3077.22 samples/sec Loss 13.0925 LearningRate 0.0741 Epoch: 2 Global Step: 34530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:52,745-Speed 3073.80 samples/sec Loss 13.1557 LearningRate 0.0741 Epoch: 2 Global Step: 34540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:56,016-Speed 3131.72 samples/sec Loss 12.9970 LearningRate 0.0741 Epoch: 2 Global Step: 34550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:58:59,311-Speed 3110.41 samples/sec Loss 13.0702 LearningRate 0.0741 Epoch: 2 Global Step: 34560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:02,605-Speed 3109.50 samples/sec Loss 12.9972 LearningRate 0.0741 Epoch: 2 Global Step: 34570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:05,871-Speed 
3137.28 samples/sec Loss 13.1046 LearningRate 0.0741 Epoch: 2 Global Step: 34580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:09,175-Speed 3099.79 samples/sec Loss 13.2619 LearningRate 0.0741 Epoch: 2 Global Step: 34590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:12,503-Speed 3077.42 samples/sec Loss 13.1754 LearningRate 0.0741 Epoch: 2 Global Step: 34600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:15,818-Speed 3090.41 samples/sec Loss 13.1520 LearningRate 0.0741 Epoch: 2 Global Step: 34610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:19,172-Speed 3054.11 samples/sec Loss 13.0169 LearningRate 0.0741 Epoch: 2 Global Step: 34620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:22,483-Speed 3093.91 samples/sec Loss 13.0646 LearningRate 0.0741 Epoch: 2 Global Step: 34630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:25,752-Speed 3133.01 samples/sec Loss 13.1495 LearningRate 0.0741 Epoch: 2 Global Step: 34640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:29,034-Speed 3120.64 samples/sec Loss 13.2383 LearningRate 0.0740 Epoch: 2 Global Step: 34650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:32,363-Speed 3077.53 samples/sec Loss 12.9846 LearningRate 0.0740 Epoch: 2 Global Step: 34660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:35,657-Speed 3109.43 samples/sec Loss 13.0861 LearningRate 0.0740 Epoch: 2 Global Step: 34670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:38,971-Speed 3090.86 samples/sec Loss 13.2861 LearningRate 0.0740 Epoch: 2 Global Step: 34680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:42,319-Speed 3059.28 samples/sec Loss 13.1531 LearningRate 0.0740 Epoch: 2 Global Step: 34690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:45,585-Speed 3136.30 samples/sec Loss 
13.1684 LearningRate 0.0740 Epoch: 2 Global Step: 34700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:48,938-Speed 3054.45 samples/sec Loss 13.1490 LearningRate 0.0740 Epoch: 2 Global Step: 34710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:52,278-Speed 3067.55 samples/sec Loss 13.0548 LearningRate 0.0740 Epoch: 2 Global Step: 34720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:55,607-Speed 3076.45 samples/sec Loss 13.1122 LearningRate 0.0740 Epoch: 2 Global Step: 34730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 04:59:58,936-Speed 3076.94 samples/sec Loss 13.1256 LearningRate 0.0740 Epoch: 2 Global Step: 34740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:02,278-Speed 3064.95 samples/sec Loss 13.0874 LearningRate 0.0740 Epoch: 2 Global Step: 34750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:05,551-Speed 3130.21 samples/sec Loss 13.1554 LearningRate 0.0740 Epoch: 2 Global Step: 34760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:08,895-Speed 3062.50 samples/sec Loss 13.0342 LearningRate 0.0740 Epoch: 2 Global Step: 34770 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:12,227-Speed 3074.53 samples/sec Loss 13.2825 LearningRate 0.0740 Epoch: 2 Global Step: 34780 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:15,551-Speed 3081.47 samples/sec Loss 13.0297 LearningRate 0.0740 Epoch: 2 Global Step: 34790 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:18,858-Speed 3097.62 samples/sec Loss 13.1795 LearningRate 0.0739 Epoch: 2 Global Step: 34800 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:22,248-Speed 3021.18 samples/sec Loss 12.9859 LearningRate 0.0739 Epoch: 2 Global Step: 34810 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:25,563-Speed 3090.65 samples/sec Loss 13.2028 LearningRate 0.0739 Epoch: 2 
Global Step: 34820 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:28,903-Speed 3065.91 samples/sec Loss 13.2642 LearningRate 0.0739 Epoch: 2 Global Step: 34830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:32,220-Speed 3088.75 samples/sec Loss 13.0566 LearningRate 0.0739 Epoch: 2 Global Step: 34840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:35,517-Speed 3106.87 samples/sec Loss 13.0053 LearningRate 0.0739 Epoch: 2 Global Step: 34850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:38,825-Speed 3097.04 samples/sec Loss 13.0329 LearningRate 0.0739 Epoch: 2 Global Step: 34860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:00:42,103-Speed 3124.53 samples/sec Loss 13.1280 LearningRate 0.0739 Epoch: 2 Global Step: 34870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:45,416-Speed 3091.72 samples/sec Loss 13.2612 LearningRate 0.0739 Epoch: 2 Global Step: 34880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:48,719-Speed 3100.58 samples/sec Loss 13.1161 LearningRate 0.0739 Epoch: 2 Global Step: 34890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:52,065-Speed 3061.53 samples/sec Loss 13.1564 LearningRate 0.0739 Epoch: 2 Global Step: 34900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:55,443-Speed 3033.97 samples/sec Loss 13.0988 LearningRate 0.0739 Epoch: 2 Global Step: 34910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:00:58,724-Speed 3121.60 samples/sec Loss 13.0019 LearningRate 0.0739 Epoch: 2 Global Step: 34920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:02,043-Speed 3086.62 samples/sec Loss 13.2048 LearningRate 0.0739 Epoch: 2 Global Step: 34930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:05,326-Speed 3119.45 samples/sec Loss 12.9974 LearningRate 0.0738 Epoch: 2 Global Step: 34940 Fp16 Grad Scale: 
131072 Required: 20 hours Training: 2022-04-27 05:01:08,664-Speed 3069.28 samples/sec Loss 13.1391 LearningRate 0.0738 Epoch: 2 Global Step: 34950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:11,971-Speed 3096.73 samples/sec Loss 13.0206 LearningRate 0.0738 Epoch: 2 Global Step: 34960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:15,307-Speed 3071.14 samples/sec Loss 13.2023 LearningRate 0.0738 Epoch: 2 Global Step: 34970 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:01:18,606-Speed 3104.13 samples/sec Loss 13.0538 LearningRate 0.0738 Epoch: 2 Global Step: 34980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:21,956-Speed 3058.35 samples/sec Loss 13.1141 LearningRate 0.0738 Epoch: 2 Global Step: 34990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:25,286-Speed 3075.11 samples/sec Loss 13.2038 LearningRate 0.0738 Epoch: 2 Global Step: 35000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:28,574-Speed 3116.44 samples/sec Loss 13.0730 LearningRate 0.0738 Epoch: 2 Global Step: 35010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:31,923-Speed 3057.78 samples/sec Loss 13.0248 LearningRate 0.0738 Epoch: 2 Global Step: 35020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:35,232-Speed 3095.29 samples/sec Loss 12.9567 LearningRate 0.0738 Epoch: 2 Global Step: 35030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:38,606-Speed 3036.17 samples/sec Loss 13.0753 LearningRate 0.0738 Epoch: 2 Global Step: 35040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:41,926-Speed 3085.15 samples/sec Loss 13.0422 LearningRate 0.0738 Epoch: 2 Global Step: 35050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:45,241-Speed 3090.27 samples/sec Loss 13.0883 LearningRate 0.0738 Epoch: 2 Global Step: 35060 Fp16 Grad Scale: 131072 Required: 20 hours 
Training: 2022-04-27 05:01:48,590-Speed 3058.67 samples/sec Loss 12.9390 LearningRate 0.0738 Epoch: 2 Global Step: 35070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:51,920-Speed 3076.84 samples/sec Loss 13.0201 LearningRate 0.0738 Epoch: 2 Global Step: 35080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:55,288-Speed 3041.20 samples/sec Loss 12.9734 LearningRate 0.0737 Epoch: 2 Global Step: 35090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:01:58,611-Speed 3082.16 samples/sec Loss 13.2068 LearningRate 0.0737 Epoch: 2 Global Step: 35100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:01,889-Speed 3125.00 samples/sec Loss 13.1131 LearningRate 0.0737 Epoch: 2 Global Step: 35110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:05,230-Speed 3065.91 samples/sec Loss 12.9593 LearningRate 0.0737 Epoch: 2 Global Step: 35120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:08,564-Speed 3071.34 samples/sec Loss 13.1215 LearningRate 0.0737 Epoch: 2 Global Step: 35130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:11,945-Speed 3029.88 samples/sec Loss 12.9352 LearningRate 0.0737 Epoch: 2 Global Step: 35140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:15,344-Speed 3013.85 samples/sec Loss 13.0940 LearningRate 0.0737 Epoch: 2 Global Step: 35150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:18,698-Speed 3053.32 samples/sec Loss 12.9241 LearningRate 0.0737 Epoch: 2 Global Step: 35160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:21,984-Speed 3117.55 samples/sec Loss 13.0949 LearningRate 0.0737 Epoch: 2 Global Step: 35170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:25,348-Speed 3045.55 samples/sec Loss 13.0184 LearningRate 0.0737 Epoch: 2 Global Step: 35180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
05:02:28,685-Speed 3069.24 samples/sec Loss 13.2305 LearningRate 0.0737 Epoch: 2 Global Step: 35190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:32,019-Speed 3071.96 samples/sec Loss 12.9642 LearningRate 0.0737 Epoch: 2 Global Step: 35200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:35,349-Speed 3076.59 samples/sec Loss 12.9550 LearningRate 0.0737 Epoch: 2 Global Step: 35210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:38,649-Speed 3103.64 samples/sec Loss 12.9185 LearningRate 0.0737 Epoch: 2 Global Step: 35220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:41,964-Speed 3089.65 samples/sec Loss 13.1760 LearningRate 0.0736 Epoch: 2 Global Step: 35230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:45,296-Speed 3074.89 samples/sec Loss 13.0833 LearningRate 0.0736 Epoch: 2 Global Step: 35240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:48,605-Speed 3095.50 samples/sec Loss 13.0108 LearningRate 0.0736 Epoch: 2 Global Step: 35250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:51,920-Speed 3089.26 samples/sec Loss 12.9675 LearningRate 0.0736 Epoch: 2 Global Step: 35260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:55,212-Speed 3111.87 samples/sec Loss 13.0593 LearningRate 0.0736 Epoch: 2 Global Step: 35270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:02:58,539-Speed 3078.99 samples/sec Loss 12.9757 LearningRate 0.0736 Epoch: 2 Global Step: 35280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:01,836-Speed 3106.22 samples/sec Loss 13.0680 LearningRate 0.0736 Epoch: 2 Global Step: 35290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:05,149-Speed 3091.86 samples/sec Loss 12.9848 LearningRate 0.0736 Epoch: 2 Global Step: 35300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:08,417-Speed 3134.36 
samples/sec Loss 13.0243 LearningRate 0.0736 Epoch: 2 Global Step: 35310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:11,728-Speed 3093.26 samples/sec Loss 13.0505 LearningRate 0.0736 Epoch: 2 Global Step: 35320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:15,067-Speed 3068.48 samples/sec Loss 12.9987 LearningRate 0.0736 Epoch: 2 Global Step: 35330 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:18,403-Speed 3069.68 samples/sec Loss 13.0375 LearningRate 0.0736 Epoch: 2 Global Step: 35340 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:21,776-Speed 3037.01 samples/sec Loss 13.0925 LearningRate 0.0736 Epoch: 2 Global Step: 35350 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:25,052-Speed 3126.64 samples/sec Loss 13.1516 LearningRate 0.0736 Epoch: 2 Global Step: 35360 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:28,338-Speed 3117.42 samples/sec Loss 12.9971 LearningRate 0.0736 Epoch: 2 Global Step: 35370 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:31,664-Speed 3079.59 samples/sec Loss 13.0410 LearningRate 0.0735 Epoch: 2 Global Step: 35380 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:34,971-Speed 3097.59 samples/sec Loss 13.0062 LearningRate 0.0735 Epoch: 2 Global Step: 35390 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:38,270-Speed 3104.49 samples/sec Loss 13.0327 LearningRate 0.0735 Epoch: 2 Global Step: 35400 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:41,571-Speed 3102.99 samples/sec Loss 12.9810 LearningRate 0.0735 Epoch: 2 Global Step: 35410 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:44,857-Speed 3118.21 samples/sec Loss 13.0263 LearningRate 0.0735 Epoch: 2 Global Step: 35420 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:03:48,170-Speed 3091.06 samples/sec Loss 13.0954 LearningRate 
0.0735 Epoch: 2 Global Step: 35430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:51,466-Speed 3107.82 samples/sec Loss 12.9053 LearningRate 0.0735 Epoch: 2 Global Step: 35440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:54,808-Speed 3065.30 samples/sec Loss 12.9550 LearningRate 0.0735 Epoch: 2 Global Step: 35450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:03:58,114-Speed 3097.75 samples/sec Loss 13.2120 LearningRate 0.0735 Epoch: 2 Global Step: 35460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:01,423-Speed 3095.61 samples/sec Loss 13.1242 LearningRate 0.0735 Epoch: 2 Global Step: 35470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:04,802-Speed 3034.94 samples/sec Loss 12.9201 LearningRate 0.0735 Epoch: 2 Global Step: 35480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:08,134-Speed 3073.38 samples/sec Loss 13.0782 LearningRate 0.0735 Epoch: 2 Global Step: 35490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:11,438-Speed 3100.12 samples/sec Loss 12.9077 LearningRate 0.0735 Epoch: 2 Global Step: 35500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:14,786-Speed 3059.93 samples/sec Loss 13.0249 LearningRate 0.0735 Epoch: 2 Global Step: 35510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:18,166-Speed 3030.80 samples/sec Loss 13.0155 LearningRate 0.0734 Epoch: 2 Global Step: 35520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:21,422-Speed 3146.43 samples/sec Loss 13.2444 LearningRate 0.0734 Epoch: 2 Global Step: 35530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:24,704-Speed 3120.57 samples/sec Loss 13.0738 LearningRate 0.0734 Epoch: 2 Global Step: 35540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:28,011-Speed 3097.39 samples/sec Loss 12.9534 LearningRate 0.0734 Epoch: 2 Global Step: 
35550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:31,275-Speed 3138.49 samples/sec Loss 12.9029 LearningRate 0.0734 Epoch: 2 Global Step: 35560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:34,557-Speed 3121.13 samples/sec Loss 12.9086 LearningRate 0.0734 Epoch: 2 Global Step: 35570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:37,822-Speed 3137.25 samples/sec Loss 12.8643 LearningRate 0.0734 Epoch: 2 Global Step: 35580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:41,120-Speed 3106.30 samples/sec Loss 13.0320 LearningRate 0.0734 Epoch: 2 Global Step: 35590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:44,386-Speed 3135.46 samples/sec Loss 12.9854 LearningRate 0.0734 Epoch: 2 Global Step: 35600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:47,702-Speed 3089.99 samples/sec Loss 13.0077 LearningRate 0.0734 Epoch: 2 Global Step: 35610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:50,979-Speed 3125.07 samples/sec Loss 13.0629 LearningRate 0.0734 Epoch: 2 Global Step: 35620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:54,273-Speed 3110.52 samples/sec Loss 12.9643 LearningRate 0.0734 Epoch: 2 Global Step: 35630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:04:57,618-Speed 3061.58 samples/sec Loss 13.0042 LearningRate 0.0734 Epoch: 2 Global Step: 35640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:00,941-Speed 3082.79 samples/sec Loss 12.8403 LearningRate 0.0734 Epoch: 2 Global Step: 35650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:04,327-Speed 3024.68 samples/sec Loss 13.0366 LearningRate 0.0734 Epoch: 2 Global Step: 35660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:07,638-Speed 3093.41 samples/sec Loss 12.8554 LearningRate 0.0733 Epoch: 2 Global Step: 35670 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-27 05:05:10,990-Speed 3056.38 samples/sec Loss 12.9044 LearningRate 0.0733 Epoch: 2 Global Step: 35680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:14,387-Speed 3015.11 samples/sec Loss 13.0877 LearningRate 0.0733 Epoch: 2 Global Step: 35690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:17,698-Speed 3094.09 samples/sec Loss 12.9246 LearningRate 0.0733 Epoch: 2 Global Step: 35700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:21,027-Speed 3076.39 samples/sec Loss 12.9905 LearningRate 0.0733 Epoch: 2 Global Step: 35710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:24,436-Speed 3005.35 samples/sec Loss 12.9176 LearningRate 0.0733 Epoch: 2 Global Step: 35720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:27,707-Speed 3130.89 samples/sec Loss 12.9838 LearningRate 0.0733 Epoch: 2 Global Step: 35730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:30,978-Speed 3131.41 samples/sec Loss 12.9956 LearningRate 0.0733 Epoch: 2 Global Step: 35740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:34,297-Speed 3087.07 samples/sec Loss 12.9231 LearningRate 0.0733 Epoch: 2 Global Step: 35750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:37,607-Speed 3094.79 samples/sec Loss 13.0684 LearningRate 0.0733 Epoch: 2 Global Step: 35760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:40,999-Speed 3019.33 samples/sec Loss 13.1908 LearningRate 0.0733 Epoch: 2 Global Step: 35770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:44,337-Speed 3068.95 samples/sec Loss 13.0441 LearningRate 0.0733 Epoch: 2 Global Step: 35780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:47,726-Speed 3021.78 samples/sec Loss 13.0811 LearningRate 0.0733 Epoch: 2 Global Step: 35790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-27 05:05:51,067-Speed 3066.00 samples/sec Loss 12.8813 LearningRate 0.0733 Epoch: 2 Global Step: 35800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:54,379-Speed 3093.19 samples/sec Loss 12.9781 LearningRate 0.0732 Epoch: 2 Global Step: 35810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:05:57,723-Speed 3062.72 samples/sec Loss 13.0098 LearningRate 0.0732 Epoch: 2 Global Step: 35820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:01,016-Speed 3111.24 samples/sec Loss 13.1611 LearningRate 0.0732 Epoch: 2 Global Step: 35830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:04,374-Speed 3049.60 samples/sec Loss 13.1241 LearningRate 0.0732 Epoch: 2 Global Step: 35840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:07,688-Speed 3091.46 samples/sec Loss 12.9969 LearningRate 0.0732 Epoch: 2 Global Step: 35850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:10,993-Speed 3099.43 samples/sec Loss 12.9535 LearningRate 0.0732 Epoch: 2 Global Step: 35860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:14,295-Speed 3102.24 samples/sec Loss 13.0856 LearningRate 0.0732 Epoch: 2 Global Step: 35870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:17,621-Speed 3079.09 samples/sec Loss 12.8756 LearningRate 0.0732 Epoch: 2 Global Step: 35880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:20,944-Speed 3083.32 samples/sec Loss 13.1147 LearningRate 0.0732 Epoch: 2 Global Step: 35890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:24,274-Speed 3076.07 samples/sec Loss 12.9846 LearningRate 0.0732 Epoch: 2 Global Step: 35900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:27,582-Speed 3096.04 samples/sec Loss 12.9816 LearningRate 0.0732 Epoch: 2 Global Step: 35910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:30,854-Speed 
3133.26 samples/sec Loss 12.9605 LearningRate 0.0732 Epoch: 2 Global Step: 35920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:34,191-Speed 3069.12 samples/sec Loss 12.9396 LearningRate 0.0732 Epoch: 2 Global Step: 35930 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:06:37,510-Speed 3086.40 samples/sec Loss 12.8718 LearningRate 0.0732 Epoch: 2 Global Step: 35940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:40,827-Speed 3088.54 samples/sec Loss 12.9024 LearningRate 0.0732 Epoch: 2 Global Step: 35950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:44,252-Speed 2990.67 samples/sec Loss 13.0007 LearningRate 0.0731 Epoch: 2 Global Step: 35960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:47,626-Speed 3036.01 samples/sec Loss 13.0210 LearningRate 0.0731 Epoch: 2 Global Step: 35970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:50,999-Speed 3036.52 samples/sec Loss 13.0151 LearningRate 0.0731 Epoch: 2 Global Step: 35980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:54,280-Speed 3122.43 samples/sec Loss 13.0522 LearningRate 0.0731 Epoch: 2 Global Step: 35990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:06:57,621-Speed 3066.05 samples/sec Loss 13.0137 LearningRate 0.0731 Epoch: 2 Global Step: 36000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:01,009-Speed 3023.27 samples/sec Loss 12.8956 LearningRate 0.0731 Epoch: 2 Global Step: 36010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:04,337-Speed 3078.66 samples/sec Loss 13.1117 LearningRate 0.0731 Epoch: 2 Global Step: 36020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:07,643-Speed 3098.35 samples/sec Loss 12.9829 LearningRate 0.0731 Epoch: 2 Global Step: 36030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:10,946-Speed 3101.50 samples/sec Loss 
12.9416 LearningRate 0.0731 Epoch: 2 Global Step: 36040 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:07:14,251-Speed 3099.52 samples/sec Loss 13.0614 LearningRate 0.0731 Epoch: 2 Global Step: 36050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:17,560-Speed 3095.24 samples/sec Loss 13.0021 LearningRate 0.0731 Epoch: 2 Global Step: 36060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:20,897-Speed 3069.50 samples/sec Loss 12.9547 LearningRate 0.0731 Epoch: 2 Global Step: 36070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:24,208-Speed 3093.60 samples/sec Loss 12.8756 LearningRate 0.0731 Epoch: 2 Global Step: 36080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:27,526-Speed 3087.91 samples/sec Loss 12.8037 LearningRate 0.0731 Epoch: 2 Global Step: 36090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:30,823-Speed 3106.28 samples/sec Loss 12.9073 LearningRate 0.0730 Epoch: 2 Global Step: 36100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:34,096-Speed 3130.03 samples/sec Loss 12.9273 LearningRate 0.0730 Epoch: 2 Global Step: 36110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:37,427-Speed 3074.30 samples/sec Loss 13.1129 LearningRate 0.0730 Epoch: 2 Global Step: 36120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:40,767-Speed 3067.40 samples/sec Loss 12.9305 LearningRate 0.0730 Epoch: 2 Global Step: 36130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:44,100-Speed 3072.62 samples/sec Loss 12.8100 LearningRate 0.0730 Epoch: 2 Global Step: 36140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:47,387-Speed 3116.72 samples/sec Loss 12.9023 LearningRate 0.0730 Epoch: 2 Global Step: 36150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:50,660-Speed 3129.59 samples/sec Loss 12.9999 LearningRate 0.0730 
Epoch: 2 Global Step: 36160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:54,057-Speed 3015.24 samples/sec Loss 12.8754 LearningRate 0.0730 Epoch: 2 Global Step: 36170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:07:57,425-Speed 3040.60 samples/sec Loss 12.8482 LearningRate 0.0730 Epoch: 2 Global Step: 36180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:00,730-Speed 3100.19 samples/sec Loss 12.9470 LearningRate 0.0730 Epoch: 2 Global Step: 36190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:04,096-Speed 3043.34 samples/sec Loss 12.8553 LearningRate 0.0730 Epoch: 2 Global Step: 36200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:07,480-Speed 3026.70 samples/sec Loss 12.8754 LearningRate 0.0730 Epoch: 2 Global Step: 36210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:10,781-Speed 3102.84 samples/sec Loss 12.8222 LearningRate 0.0730 Epoch: 2 Global Step: 36220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:14,076-Speed 3108.67 samples/sec Loss 13.0676 LearningRate 0.0730 Epoch: 2 Global Step: 36230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:17,366-Speed 3113.20 samples/sec Loss 12.8813 LearningRate 0.0730 Epoch: 2 Global Step: 36240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:20,642-Speed 3126.14 samples/sec Loss 12.9625 LearningRate 0.0729 Epoch: 2 Global Step: 36250 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:08:23,960-Speed 3087.40 samples/sec Loss 12.9066 LearningRate 0.0729 Epoch: 2 Global Step: 36260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:27,292-Speed 3073.83 samples/sec Loss 13.0432 LearningRate 0.0729 Epoch: 2 Global Step: 36270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:30,590-Speed 3106.84 samples/sec Loss 12.8477 LearningRate 0.0729 Epoch: 2 Global Step: 36280 
Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:33,894-Speed 3100.20 samples/sec Loss 12.9276 LearningRate 0.0729 Epoch: 2 Global Step: 36290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:37,170-Speed 3127.00 samples/sec Loss 13.0060 LearningRate 0.0729 Epoch: 2 Global Step: 36300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:40,461-Speed 3112.41 samples/sec Loss 12.8923 LearningRate 0.0729 Epoch: 2 Global Step: 36310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:43,815-Speed 3053.82 samples/sec Loss 12.8831 LearningRate 0.0729 Epoch: 2 Global Step: 36320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:47,191-Speed 3034.37 samples/sec Loss 12.9038 LearningRate 0.0729 Epoch: 2 Global Step: 36330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:50,585-Speed 3017.73 samples/sec Loss 12.9913 LearningRate 0.0729 Epoch: 2 Global Step: 36340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:53,890-Speed 3099.23 samples/sec Loss 12.7886 LearningRate 0.0729 Epoch: 2 Global Step: 36350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:08:57,189-Speed 3104.50 samples/sec Loss 12.8896 LearningRate 0.0729 Epoch: 2 Global Step: 36360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:00,525-Speed 3070.67 samples/sec Loss 12.6755 LearningRate 0.0729 Epoch: 2 Global Step: 36370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:03,882-Speed 3051.68 samples/sec Loss 13.0536 LearningRate 0.0729 Epoch: 2 Global Step: 36380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:07,221-Speed 3067.21 samples/sec Loss 12.7660 LearningRate 0.0728 Epoch: 2 Global Step: 36390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:10,533-Speed 3092.35 samples/sec Loss 12.9223 LearningRate 0.0728 Epoch: 2 Global Step: 36400 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-27 05:09:13,858-Speed 3081.44 samples/sec Loss 13.0208 LearningRate 0.0728 Epoch: 2 Global Step: 36410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:17,170-Speed 3092.04 samples/sec Loss 12.9870 LearningRate 0.0728 Epoch: 2 Global Step: 36420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:20,516-Speed 3061.21 samples/sec Loss 13.0470 LearningRate 0.0728 Epoch: 2 Global Step: 36430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:23,809-Speed 3111.34 samples/sec Loss 12.8917 LearningRate 0.0728 Epoch: 2 Global Step: 36440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:27,114-Speed 3099.11 samples/sec Loss 12.9454 LearningRate 0.0728 Epoch: 2 Global Step: 36450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:30,476-Speed 3046.22 samples/sec Loss 12.8939 LearningRate 0.0728 Epoch: 2 Global Step: 36460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:33,844-Speed 3042.17 samples/sec Loss 12.9213 LearningRate 0.0728 Epoch: 2 Global Step: 36470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:37,217-Speed 3035.91 samples/sec Loss 12.6987 LearningRate 0.0728 Epoch: 2 Global Step: 36480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:40,586-Speed 3040.46 samples/sec Loss 12.7704 LearningRate 0.0728 Epoch: 2 Global Step: 36490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:43,958-Speed 3037.76 samples/sec Loss 12.8202 LearningRate 0.0728 Epoch: 2 Global Step: 36500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:47,349-Speed 3020.70 samples/sec Loss 12.9386 LearningRate 0.0728 Epoch: 2 Global Step: 36510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:50,696-Speed 3060.02 samples/sec Loss 12.8461 LearningRate 0.0728 Epoch: 2 Global Step: 36520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-27 05:09:54,053-Speed 3051.82 samples/sec Loss 12.9890 LearningRate 0.0728 Epoch: 2 Global Step: 36530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:09:57,405-Speed 3055.13 samples/sec Loss 12.9805 LearningRate 0.0727 Epoch: 2 Global Step: 36540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:00,794-Speed 3022.33 samples/sec Loss 12.7804 LearningRate 0.0727 Epoch: 2 Global Step: 36550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:04,172-Speed 3032.99 samples/sec Loss 12.9932 LearningRate 0.0727 Epoch: 2 Global Step: 36560 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:10:07,595-Speed 2991.62 samples/sec Loss 12.8991 LearningRate 0.0727 Epoch: 2 Global Step: 36570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:10,904-Speed 3096.02 samples/sec Loss 12.7684 LearningRate 0.0727 Epoch: 2 Global Step: 36580 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:14,204-Speed 3104.45 samples/sec Loss 12.9101 LearningRate 0.0727 Epoch: 2 Global Step: 36590 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:17,483-Speed 3123.17 samples/sec Loss 12.9448 LearningRate 0.0727 Epoch: 2 Global Step: 36600 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:20,812-Speed 3077.10 samples/sec Loss 12.9974 LearningRate 0.0727 Epoch: 2 Global Step: 36610 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:24,124-Speed 3093.07 samples/sec Loss 13.0074 LearningRate 0.0727 Epoch: 2 Global Step: 36620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:27,531-Speed 3005.83 samples/sec Loss 12.9626 LearningRate 0.0727 Epoch: 2 Global Step: 36630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:30,883-Speed 3056.45 samples/sec Loss 12.9321 LearningRate 0.0727 Epoch: 2 Global Step: 36640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:34,263-Speed 
3030.25 samples/sec Loss 12.6570 LearningRate 0.0727 Epoch: 2 Global Step: 36650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:37,554-Speed 3112.68 samples/sec Loss 12.9307 LearningRate 0.0727 Epoch: 2 Global Step: 36660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:40,894-Speed 3066.36 samples/sec Loss 12.9936 LearningRate 0.0727 Epoch: 2 Global Step: 36670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:44,240-Speed 3062.23 samples/sec Loss 12.9812 LearningRate 0.0726 Epoch: 2 Global Step: 36680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:47,577-Speed 3068.69 samples/sec Loss 12.8401 LearningRate 0.0726 Epoch: 2 Global Step: 36690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:50,889-Speed 3093.00 samples/sec Loss 12.9790 LearningRate 0.0726 Epoch: 2 Global Step: 36700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:54,207-Speed 3087.40 samples/sec Loss 13.0494 LearningRate 0.0726 Epoch: 2 Global Step: 36710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:10:57,548-Speed 3065.82 samples/sec Loss 13.0596 LearningRate 0.0726 Epoch: 2 Global Step: 36720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:00,879-Speed 3075.58 samples/sec Loss 13.0030 LearningRate 0.0726 Epoch: 2 Global Step: 36730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:04,185-Speed 3098.10 samples/sec Loss 12.8777 LearningRate 0.0726 Epoch: 2 Global Step: 36740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:07,488-Speed 3101.40 samples/sec Loss 12.8691 LearningRate 0.0726 Epoch: 2 Global Step: 36750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:10,821-Speed 3072.82 samples/sec Loss 12.8913 LearningRate 0.0726 Epoch: 2 Global Step: 36760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:14,266-Speed 2973.72 samples/sec Loss 
12.7072 LearningRate 0.0726 Epoch: 2 Global Step: 36770 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:11:17,608-Speed 3064.47 samples/sec Loss 12.8755 LearningRate 0.0726 Epoch: 2 Global Step: 36780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:20,937-Speed 3077.50 samples/sec Loss 12.9637 LearningRate 0.0726 Epoch: 2 Global Step: 36790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:24,291-Speed 3053.98 samples/sec Loss 12.8786 LearningRate 0.0726 Epoch: 2 Global Step: 36800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:27,637-Speed 3061.64 samples/sec Loss 12.8534 LearningRate 0.0726 Epoch: 2 Global Step: 36810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:30,941-Speed 3099.99 samples/sec Loss 12.8979 LearningRate 0.0726 Epoch: 2 Global Step: 36820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:34,268-Speed 3078.16 samples/sec Loss 12.9558 LearningRate 0.0725 Epoch: 2 Global Step: 36830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:37,580-Speed 3093.59 samples/sec Loss 12.9081 LearningRate 0.0725 Epoch: 2 Global Step: 36840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:40,869-Speed 3113.38 samples/sec Loss 12.9030 LearningRate 0.0725 Epoch: 2 Global Step: 36850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:44,191-Speed 3084.24 samples/sec Loss 12.7949 LearningRate 0.0725 Epoch: 2 Global Step: 36860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:47,526-Speed 3071.28 samples/sec Loss 12.8247 LearningRate 0.0725 Epoch: 2 Global Step: 36870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:50,823-Speed 3106.33 samples/sec Loss 12.9763 LearningRate 0.0725 Epoch: 2 Global Step: 36880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:54,162-Speed 3068.34 samples/sec Loss 12.9014 LearningRate 0.0725 
Epoch: 2 Global Step: 36890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:11:57,445-Speed 3119.97 samples/sec Loss 12.9474 LearningRate 0.0725 Epoch: 2 Global Step: 36900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:00,784-Speed 3067.44 samples/sec Loss 13.1510 LearningRate 0.0725 Epoch: 2 Global Step: 36910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:04,143-Speed 3049.41 samples/sec Loss 12.8148 LearningRate 0.0725 Epoch: 2 Global Step: 36920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:07,476-Speed 3072.95 samples/sec Loss 12.9353 LearningRate 0.0725 Epoch: 2 Global Step: 36930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:10,822-Speed 3061.45 samples/sec Loss 12.7645 LearningRate 0.0725 Epoch: 2 Global Step: 36940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:14,083-Speed 3141.26 samples/sec Loss 12.8052 LearningRate 0.0725 Epoch: 2 Global Step: 36950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:17,413-Speed 3075.07 samples/sec Loss 12.9229 LearningRate 0.0725 Epoch: 2 Global Step: 36960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:20,729-Speed 3089.54 samples/sec Loss 12.8385 LearningRate 0.0725 Epoch: 2 Global Step: 36970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:24,089-Speed 3048.65 samples/sec Loss 12.9046 LearningRate 0.0724 Epoch: 2 Global Step: 36980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:27,452-Speed 3044.99 samples/sec Loss 12.7703 LearningRate 0.0724 Epoch: 2 Global Step: 36990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:30,790-Speed 3068.64 samples/sec Loss 12.8828 LearningRate 0.0724 Epoch: 2 Global Step: 37000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:34,101-Speed 3093.42 samples/sec Loss 12.8340 LearningRate 0.0724 Epoch: 2 Global Step: 37010 
Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:37,421-Speed 3085.13 samples/sec Loss 13.0160 LearningRate 0.0724 Epoch: 2 Global Step: 37020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:40,755-Speed 3072.47 samples/sec Loss 13.0049 LearningRate 0.0724 Epoch: 2 Global Step: 37030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:44,152-Speed 3015.13 samples/sec Loss 12.9755 LearningRate 0.0724 Epoch: 2 Global Step: 37040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:47,465-Speed 3091.49 samples/sec Loss 12.9060 LearningRate 0.0724 Epoch: 2 Global Step: 37050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:50,839-Speed 3035.99 samples/sec Loss 12.8630 LearningRate 0.0724 Epoch: 2 Global Step: 37060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:54,198-Speed 3049.77 samples/sec Loss 12.8024 LearningRate 0.0724 Epoch: 2 Global Step: 37070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:12:57,509-Speed 3093.24 samples/sec Loss 12.8654 LearningRate 0.0724 Epoch: 2 Global Step: 37080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:00,884-Speed 3035.63 samples/sec Loss 12.9864 LearningRate 0.0724 Epoch: 2 Global Step: 37090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:04,241-Speed 3050.92 samples/sec Loss 13.0573 LearningRate 0.0724 Epoch: 2 Global Step: 37100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:07,536-Speed 3108.83 samples/sec Loss 12.9189 LearningRate 0.0724 Epoch: 2 Global Step: 37110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:10,934-Speed 3013.85 samples/sec Loss 12.9223 LearningRate 0.0723 Epoch: 2 Global Step: 37120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:14,237-Speed 3101.79 samples/sec Loss 12.8100 LearningRate 0.0723 Epoch: 2 Global Step: 37130 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-27 05:13:17,575-Speed 3068.10 samples/sec Loss 12.7477 LearningRate 0.0723 Epoch: 2 Global Step: 37140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:20,926-Speed 3057.12 samples/sec Loss 12.7529 LearningRate 0.0723 Epoch: 2 Global Step: 37150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:24,259-Speed 3072.61 samples/sec Loss 12.8531 LearningRate 0.0723 Epoch: 2 Global Step: 37160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:27,528-Speed 3133.84 samples/sec Loss 12.9308 LearningRate 0.0723 Epoch: 2 Global Step: 37170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:30,919-Speed 3020.96 samples/sec Loss 12.8848 LearningRate 0.0723 Epoch: 2 Global Step: 37180 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:13:34,233-Speed 3090.02 samples/sec Loss 12.9662 LearningRate 0.0723 Epoch: 2 Global Step: 37190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:37,499-Speed 3136.67 samples/sec Loss 12.9804 LearningRate 0.0723 Epoch: 2 Global Step: 37200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:40,905-Speed 3007.34 samples/sec Loss 12.6714 LearningRate 0.0723 Epoch: 2 Global Step: 37210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:44,232-Speed 3078.00 samples/sec Loss 12.6555 LearningRate 0.0723 Epoch: 2 Global Step: 37220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:47,595-Speed 3045.75 samples/sec Loss 12.6770 LearningRate 0.0723 Epoch: 2 Global Step: 37230 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:50,939-Speed 3063.60 samples/sec Loss 12.8288 LearningRate 0.0723 Epoch: 2 Global Step: 37240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:13:54,230-Speed 3111.88 samples/sec Loss 13.0078 LearningRate 0.0723 Epoch: 2 Global Step: 37250 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-27 05:13:57,974-Speed 2735.66 samples/sec Loss 12.9341 LearningRate 0.0723 Epoch: 2 Global Step: 37260 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:30,995-Speed 310.12 samples/sec Loss 11.9205 LearningRate 0.0722 Epoch: 3 Global Step: 37270 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:34,426-Speed 2986.57 samples/sec Loss 11.5031 LearningRate 0.0722 Epoch: 3 Global Step: 37280 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:37,940-Speed 2914.85 samples/sec Loss 11.3419 LearningRate 0.0722 Epoch: 3 Global Step: 37290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:41,227-Speed 3116.31 samples/sec Loss 11.2460 LearningRate 0.0722 Epoch: 3 Global Step: 37300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:44,555-Speed 3078.11 samples/sec Loss 11.3438 LearningRate 0.0722 Epoch: 3 Global Step: 37310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:47,883-Speed 3078.09 samples/sec Loss 11.4066 LearningRate 0.0722 Epoch: 3 Global Step: 37320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:51,157-Speed 3128.50 samples/sec Loss 11.4017 LearningRate 0.0722 Epoch: 3 Global Step: 37330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:54,486-Speed 3077.45 samples/sec Loss 11.4480 LearningRate 0.0722 Epoch: 3 Global Step: 37340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:14:57,848-Speed 3046.24 samples/sec Loss 11.3209 LearningRate 0.0722 Epoch: 3 Global Step: 37350 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:01,183-Speed 3072.58 samples/sec Loss 11.3723 LearningRate 0.0722 Epoch: 3 Global Step: 37360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:04,634-Speed 2967.91 samples/sec Loss 11.3667 LearningRate 0.0722 Epoch: 3 Global Step: 37370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:08,277-Speed 
2811.32 samples/sec Loss 11.3046 LearningRate 0.0722 Epoch: 3 Global Step: 37380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:11,596-Speed 3086.62 samples/sec Loss 11.4139 LearningRate 0.0722 Epoch: 3 Global Step: 37390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:14,905-Speed 3094.99 samples/sec Loss 11.5662 LearningRate 0.0722 Epoch: 3 Global Step: 37400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:18,241-Speed 3070.70 samples/sec Loss 11.3513 LearningRate 0.0721 Epoch: 3 Global Step: 37410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:21,532-Speed 3112.65 samples/sec Loss 11.4208 LearningRate 0.0721 Epoch: 3 Global Step: 37420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:24,880-Speed 3059.81 samples/sec Loss 11.4459 LearningRate 0.0721 Epoch: 3 Global Step: 37430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:28,202-Speed 3083.59 samples/sec Loss 11.4062 LearningRate 0.0721 Epoch: 3 Global Step: 37440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:31,641-Speed 2978.35 samples/sec Loss 11.4481 LearningRate 0.0721 Epoch: 3 Global Step: 37450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:34,944-Speed 3100.77 samples/sec Loss 11.5933 LearningRate 0.0721 Epoch: 3 Global Step: 37460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:38,235-Speed 3113.48 samples/sec Loss 11.4029 LearningRate 0.0721 Epoch: 3 Global Step: 37470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:41,546-Speed 3093.61 samples/sec Loss 11.5003 LearningRate 0.0721 Epoch: 3 Global Step: 37480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:44,954-Speed 3005.19 samples/sec Loss 11.4417 LearningRate 0.0721 Epoch: 3 Global Step: 37490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:48,239-Speed 3118.73 samples/sec Loss 
11.5585 LearningRate 0.0721 Epoch: 3 Global Step: 37500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:51,507-Speed 3134.03 samples/sec Loss 11.5626 LearningRate 0.0721 Epoch: 3 Global Step: 37510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:54,790-Speed 3120.40 samples/sec Loss 11.5870 LearningRate 0.0721 Epoch: 3 Global Step: 37520 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:15:58,108-Speed 3087.04 samples/sec Loss 11.5009 LearningRate 0.0721 Epoch: 3 Global Step: 37530 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:01,368-Speed 3142.42 samples/sec Loss 11.4991 LearningRate 0.0721 Epoch: 3 Global Step: 37540 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:04,631-Speed 3139.13 samples/sec Loss 11.4121 LearningRate 0.0721 Epoch: 3 Global Step: 37550 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:07,941-Speed 3094.73 samples/sec Loss 11.5383 LearningRate 0.0720 Epoch: 3 Global Step: 37560 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:11,233-Speed 3112.25 samples/sec Loss 11.4058 LearningRate 0.0720 Epoch: 3 Global Step: 37570 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:14,542-Speed 3095.39 samples/sec Loss 11.5314 LearningRate 0.0720 Epoch: 3 Global Step: 37580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:17,858-Speed 3088.59 samples/sec Loss 11.5564 LearningRate 0.0720 Epoch: 3 Global Step: 37590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:21,210-Speed 3056.46 samples/sec Loss 11.6092 LearningRate 0.0720 Epoch: 3 Global Step: 37600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:24,531-Speed 3084.05 samples/sec Loss 11.5646 LearningRate 0.0720 Epoch: 3 Global Step: 37610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:27,861-Speed 3076.25 samples/sec Loss 11.6669 LearningRate 0.0720 Epoch: 
3 Global Step: 37620 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:31,157-Speed 3108.07 samples/sec Loss 11.6234 LearningRate 0.0720 Epoch: 3 Global Step: 37630 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:34,416-Speed 3142.78 samples/sec Loss 11.6427 LearningRate 0.0720 Epoch: 3 Global Step: 37640 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:37,799-Speed 3028.15 samples/sec Loss 11.6192 LearningRate 0.0720 Epoch: 3 Global Step: 37650 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:41,068-Speed 3133.28 samples/sec Loss 11.6650 LearningRate 0.0720 Epoch: 3 Global Step: 37660 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:44,367-Speed 3104.49 samples/sec Loss 11.6262 LearningRate 0.0720 Epoch: 3 Global Step: 37670 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:16:47,681-Speed 3090.91 samples/sec Loss 11.6907 LearningRate 0.0720 Epoch: 3 Global Step: 37680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:51,002-Speed 3085.47 samples/sec Loss 11.5558 LearningRate 0.0720 Epoch: 3 Global Step: 37690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:54,350-Speed 3060.11 samples/sec Loss 11.6180 LearningRate 0.0720 Epoch: 3 Global Step: 37700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:16:57,725-Speed 3034.75 samples/sec Loss 11.7682 LearningRate 0.0719 Epoch: 3 Global Step: 37710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:01,011-Speed 3117.27 samples/sec Loss 11.6428 LearningRate 0.0719 Epoch: 3 Global Step: 37720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:04,278-Speed 3135.36 samples/sec Loss 11.8409 LearningRate 0.0719 Epoch: 3 Global Step: 37730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:07,560-Speed 3121.23 samples/sec Loss 11.5750 LearningRate 0.0719 Epoch: 3 Global Step: 37740 Fp16 Grad Scale: 
131072 Required: 20 hours Training: 2022-04-27 05:17:10,888-Speed 3077.13 samples/sec Loss 11.7856 LearningRate 0.0719 Epoch: 3 Global Step: 37750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:14,195-Speed 3097.43 samples/sec Loss 11.7447 LearningRate 0.0719 Epoch: 3 Global Step: 37760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:17,489-Speed 3109.94 samples/sec Loss 11.9200 LearningRate 0.0719 Epoch: 3 Global Step: 37770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:20,783-Speed 3109.46 samples/sec Loss 11.8256 LearningRate 0.0719 Epoch: 3 Global Step: 37780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:24,131-Speed 3059.49 samples/sec Loss 11.7022 LearningRate 0.0719 Epoch: 3 Global Step: 37790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:27,485-Speed 3054.64 samples/sec Loss 11.8034 LearningRate 0.0719 Epoch: 3 Global Step: 37800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:30,881-Speed 3016.27 samples/sec Loss 11.8240 LearningRate 0.0719 Epoch: 3 Global Step: 37810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:34,196-Speed 3089.78 samples/sec Loss 11.6209 LearningRate 0.0719 Epoch: 3 Global Step: 37820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:17:37,483-Speed 3116.01 samples/sec Loss 11.7351 LearningRate 0.0719 Epoch: 3 Global Step: 37830 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:17:40,799-Speed 3089.75 samples/sec Loss 11.6422 LearningRate 0.0719 Epoch: 3 Global Step: 37840 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:17:44,064-Speed 3136.50 samples/sec Loss 11.7979 LearningRate 0.0718 Epoch: 3 Global Step: 37850 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:17:47,400-Speed 3070.72 samples/sec Loss 11.8036 LearningRate 0.0718 Epoch: 3 Global Step: 37860 Fp16 Grad Scale: 65536 Required: 20 hours Training: 
2022-04-27 05:17:51,299-Speed 2626.78 samples/sec Loss 11.8661 LearningRate 0.0718 Epoch: 3 Global Step: 37870 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:17:54,640-Speed 3065.97 samples/sec Loss 11.8632 LearningRate 0.0718 Epoch: 3 Global Step: 37880 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:17:58,006-Speed 3043.47 samples/sec Loss 11.6830 LearningRate 0.0718 Epoch: 3 Global Step: 37890 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:18:01,307-Speed 3103.16 samples/sec Loss 11.7900 LearningRate 0.0718 Epoch: 3 Global Step: 37900 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:18:04,610-Speed 3101.41 samples/sec Loss 11.8069 LearningRate 0.0718 Epoch: 3 Global Step: 37910 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:18:07,934-Speed 3081.10 samples/sec Loss 11.7834 LearningRate 0.0718 Epoch: 3 Global Step: 37920 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:18:11,319-Speed 3026.25 samples/sec Loss 12.0536 LearningRate 0.0718 Epoch: 3 Global Step: 37930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:14,658-Speed 3067.18 samples/sec Loss 11.9290 LearningRate 0.0718 Epoch: 3 Global Step: 37940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:17,970-Speed 3092.93 samples/sec Loss 11.9530 LearningRate 0.0718 Epoch: 3 Global Step: 37950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:21,289-Speed 3086.08 samples/sec Loss 11.8556 LearningRate 0.0718 Epoch: 3 Global Step: 37960 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:24,634-Speed 3062.00 samples/sec Loss 11.9205 LearningRate 0.0718 Epoch: 3 Global Step: 37970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:27,985-Speed 3057.02 samples/sec Loss 11.9414 LearningRate 0.0718 Epoch: 3 Global Step: 37980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:31,326-Speed 3065.61 
samples/sec Loss 11.7276 LearningRate 0.0718 Epoch: 3 Global Step: 37990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:34,680-Speed 3053.78 samples/sec Loss 11.9858 LearningRate 0.0717 Epoch: 3 Global Step: 38000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:37,995-Speed 3089.92 samples/sec Loss 11.8407 LearningRate 0.0717 Epoch: 3 Global Step: 38010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:41,289-Speed 3109.68 samples/sec Loss 11.9894 LearningRate 0.0717 Epoch: 3 Global Step: 38020 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:44,647-Speed 3050.75 samples/sec Loss 11.9589 LearningRate 0.0717 Epoch: 3 Global Step: 38030 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:18:47,987-Speed 3066.58 samples/sec Loss 11.9274 LearningRate 0.0717 Epoch: 3 Global Step: 38040 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:18:51,352-Speed 3043.97 samples/sec Loss 11.9925 LearningRate 0.0717 Epoch: 3 Global Step: 38050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:54,665-Speed 3092.18 samples/sec Loss 12.0199 LearningRate 0.0717 Epoch: 3 Global Step: 38060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:18:58,006-Speed 3065.79 samples/sec Loss 11.7407 LearningRate 0.0717 Epoch: 3 Global Step: 38070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:01,372-Speed 3043.96 samples/sec Loss 12.0449 LearningRate 0.0717 Epoch: 3 Global Step: 38080 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:04,734-Speed 3046.79 samples/sec Loss 11.8248 LearningRate 0.0717 Epoch: 3 Global Step: 38090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:08,059-Speed 3080.52 samples/sec Loss 12.0219 LearningRate 0.0717 Epoch: 3 Global Step: 38100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:11,378-Speed 3086.52 samples/sec Loss 12.0451 
LearningRate 0.0717 Epoch: 3 Global Step: 38110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:14,719-Speed 3065.32 samples/sec Loss 12.0044 LearningRate 0.0717 Epoch: 3 Global Step: 38120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:18,044-Speed 3080.60 samples/sec Loss 11.8876 LearningRate 0.0717 Epoch: 3 Global Step: 38130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:21,471-Speed 2988.33 samples/sec Loss 11.8823 LearningRate 0.0717 Epoch: 3 Global Step: 38140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:24,794-Speed 3082.45 samples/sec Loss 12.1163 LearningRate 0.0716 Epoch: 3 Global Step: 38150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:28,136-Speed 3064.97 samples/sec Loss 12.0786 LearningRate 0.0716 Epoch: 3 Global Step: 38160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:31,491-Speed 3053.61 samples/sec Loss 11.9061 LearningRate 0.0716 Epoch: 3 Global Step: 38170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:34,905-Speed 3000.45 samples/sec Loss 11.9836 LearningRate 0.0716 Epoch: 3 Global Step: 38180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:19:38,230-Speed 3080.53 samples/sec Loss 12.0793 LearningRate 0.0716 Epoch: 3 Global Step: 38190 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:19:41,566-Speed 3069.65 samples/sec Loss 12.0546 LearningRate 0.0716 Epoch: 3 Global Step: 38200 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:19:44,946-Speed 3031.01 samples/sec Loss 12.1910 LearningRate 0.0716 Epoch: 3 Global Step: 38210 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:19:48,241-Speed 3108.26 samples/sec Loss 12.2086 LearningRate 0.0716 Epoch: 3 Global Step: 38220 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:19:51,598-Speed 3052.15 samples/sec Loss 11.9569 LearningRate 0.0716 Epoch: 3 Global 
Step: 38230 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:19:54,994-Speed 3015.46 samples/sec Loss 11.9658 LearningRate 0.0716 Epoch: 3 Global Step: 38240 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:19:58,346-Speed 3056.04 samples/sec Loss 11.8958 LearningRate 0.0716 Epoch: 3 Global Step: 38250 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:20:01,686-Speed 3067.19 samples/sec Loss 12.0074 LearningRate 0.0716 Epoch: 3 Global Step: 38260 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:20:05,052-Speed 3042.88 samples/sec Loss 12.2445 LearningRate 0.0716 Epoch: 3 Global Step: 38270 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:20:08,390-Speed 3069.04 samples/sec Loss 12.1076 LearningRate 0.0716 Epoch: 3 Global Step: 38280 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:20:11,701-Speed 3093.22 samples/sec Loss 12.1438 LearningRate 0.0715 Epoch: 3 Global Step: 38290 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:15,061-Speed 3048.36 samples/sec Loss 12.1913 LearningRate 0.0715 Epoch: 3 Global Step: 38300 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:18,358-Speed 3106.71 samples/sec Loss 11.9882 LearningRate 0.0715 Epoch: 3 Global Step: 38310 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:21,671-Speed 3092.93 samples/sec Loss 12.1582 LearningRate 0.0715 Epoch: 3 Global Step: 38320 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:25,016-Speed 3061.55 samples/sec Loss 12.0691 LearningRate 0.0715 Epoch: 3 Global Step: 38330 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:28,391-Speed 3035.42 samples/sec Loss 12.1104 LearningRate 0.0715 Epoch: 3 Global Step: 38340 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:31,780-Speed 3022.44 samples/sec Loss 12.2594 LearningRate 0.0715 Epoch: 3 Global Step: 38350 Fp16 Grad Scale: 131072 
Required: 20 hours Training: 2022-04-27 05:20:35,069-Speed 3114.15 samples/sec Loss 12.1816 LearningRate 0.0715 Epoch: 3 Global Step: 38360 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:38,352-Speed 3119.85 samples/sec Loss 12.3508 LearningRate 0.0715 Epoch: 3 Global Step: 38370 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:41,637-Speed 3118.66 samples/sec Loss 12.2759 LearningRate 0.0715 Epoch: 3 Global Step: 38380 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:44,943-Speed 3098.28 samples/sec Loss 12.1389 LearningRate 0.0715 Epoch: 3 Global Step: 38390 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:48,363-Speed 2994.18 samples/sec Loss 12.1294 LearningRate 0.0715 Epoch: 3 Global Step: 38400 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:51,686-Speed 3083.19 samples/sec Loss 12.1652 LearningRate 0.0715 Epoch: 3 Global Step: 38410 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:54,991-Speed 3098.56 samples/sec Loss 11.9764 LearningRate 0.0715 Epoch: 3 Global Step: 38420 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:20:58,342-Speed 3057.57 samples/sec Loss 12.2184 LearningRate 0.0715 Epoch: 3 Global Step: 38430 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:01,692-Speed 3056.61 samples/sec Loss 12.0630 LearningRate 0.0714 Epoch: 3 Global Step: 38440 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:05,019-Speed 3079.71 samples/sec Loss 12.1810 LearningRate 0.0714 Epoch: 3 Global Step: 38450 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:08,368-Speed 3058.03 samples/sec Loss 12.0914 LearningRate 0.0714 Epoch: 3 Global Step: 38460 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:11,668-Speed 3104.35 samples/sec Loss 12.0928 LearningRate 0.0714 Epoch: 3 Global Step: 38470 Fp16 Grad Scale: 131072 Required: 20 hours Training: 
2022-04-27 05:21:15,021-Speed 3054.71 samples/sec Loss 12.2230 LearningRate 0.0714 Epoch: 3 Global Step: 38480 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:18,315-Speed 3109.12 samples/sec Loss 12.1104 LearningRate 0.0714 Epoch: 3 Global Step: 38490 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:21,680-Speed 3044.32 samples/sec Loss 12.2006 LearningRate 0.0714 Epoch: 3 Global Step: 38500 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:25,009-Speed 3076.58 samples/sec Loss 12.2275 LearningRate 0.0714 Epoch: 3 Global Step: 38510 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:21:28,303-Speed 3109.52 samples/sec Loss 12.1747 LearningRate 0.0714 Epoch: 3 Global Step: 38520 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:31,586-Speed 3120.20 samples/sec Loss 12.2843 LearningRate 0.0714 Epoch: 3 Global Step: 38530 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:34,920-Speed 3072.07 samples/sec Loss 12.1800 LearningRate 0.0714 Epoch: 3 Global Step: 38540 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:38,287-Speed 3042.38 samples/sec Loss 12.2954 LearningRate 0.0714 Epoch: 3 Global Step: 38550 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:41,582-Speed 3108.35 samples/sec Loss 12.1370 LearningRate 0.0714 Epoch: 3 Global Step: 38560 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:44,937-Speed 3053.42 samples/sec Loss 12.1298 LearningRate 0.0714 Epoch: 3 Global Step: 38570 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:48,319-Speed 3028.77 samples/sec Loss 12.3969 LearningRate 0.0714 Epoch: 3 Global Step: 38580 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:51,679-Speed 3048.18 samples/sec Loss 12.1990 LearningRate 0.0713 Epoch: 3 Global Step: 38590 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:54,996-Speed 3087.57 
samples/sec Loss 12.2661 LearningRate 0.0713 Epoch: 3 Global Step: 38600 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:21:58,306-Speed 3095.26 samples/sec Loss 12.0357 LearningRate 0.0713 Epoch: 3 Global Step: 38610 Fp16 Grad Scale: 65536 Required: 20 hours Training: 2022-04-27 05:22:01,642-Speed 3070.26 samples/sec Loss 12.1960 LearningRate 0.0713 Epoch: 3 Global Step: 38620 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:04,998-Speed 3051.78 samples/sec Loss 12.1963 LearningRate 0.0713 Epoch: 3 Global Step: 38630 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:08,370-Speed 3037.87 samples/sec Loss 11.9942 LearningRate 0.0713 Epoch: 3 Global Step: 38640 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:11,723-Speed 3057.52 samples/sec Loss 12.3534 LearningRate 0.0713 Epoch: 3 Global Step: 38650 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:15,173-Speed 2968.90 samples/sec Loss 12.2677 LearningRate 0.0713 Epoch: 3 Global Step: 38660 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:18,493-Speed 3085.72 samples/sec Loss 12.2452 LearningRate 0.0713 Epoch: 3 Global Step: 38670 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:21,798-Speed 3099.44 samples/sec Loss 12.1382 LearningRate 0.0713 Epoch: 3 Global Step: 38680 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:25,188-Speed 3021.17 samples/sec Loss 12.3438 LearningRate 0.0713 Epoch: 3 Global Step: 38690 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:28,458-Speed 3131.95 samples/sec Loss 12.2088 LearningRate 0.0713 Epoch: 3 Global Step: 38700 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:31,725-Speed 3135.79 samples/sec Loss 12.2470 LearningRate 0.0713 Epoch: 3 Global Step: 38710 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:35,007-Speed 3119.98 samples/sec Loss 12.3032 
LearningRate 0.0713 Epoch: 3 Global Step: 38720 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:38,331-Speed 3081.66 samples/sec Loss 12.2575 LearningRate 0.0712 Epoch: 3 Global Step: 38730 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:41,697-Speed 3043.41 samples/sec Loss 12.2594 LearningRate 0.0712 Epoch: 3 Global Step: 38740 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:44,983-Speed 3116.16 samples/sec Loss 12.1834 LearningRate 0.0712 Epoch: 3 Global Step: 38750 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:48,360-Speed 3034.29 samples/sec Loss 12.1471 LearningRate 0.0712 Epoch: 3 Global Step: 38760 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:51,631-Speed 3131.23 samples/sec Loss 12.1082 LearningRate 0.0712 Epoch: 3 Global Step: 38770 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:55,067-Speed 2980.20 samples/sec Loss 12.3610 LearningRate 0.0712 Epoch: 3 Global Step: 38780 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:22:58,393-Speed 3079.69 samples/sec Loss 12.1750 LearningRate 0.0712 Epoch: 3 Global Step: 38790 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:01,813-Speed 2995.16 samples/sec Loss 12.1507 LearningRate 0.0712 Epoch: 3 Global Step: 38800 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:05,186-Speed 3037.77 samples/sec Loss 12.3541 LearningRate 0.0712 Epoch: 3 Global Step: 38810 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:08,511-Speed 3079.84 samples/sec Loss 12.1784 LearningRate 0.0712 Epoch: 3 Global Step: 38820 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:11,811-Speed 3104.73 samples/sec Loss 12.3251 LearningRate 0.0712 Epoch: 3 Global Step: 38830 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:15,173-Speed 3046.32 samples/sec Loss 12.3657 LearningRate 0.0712 Epoch: 3 
Global Step: 38840 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:18,485-Speed 3092.75 samples/sec Loss 12.3164 LearningRate 0.0712 Epoch: 3 Global Step: 38850 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:21,838-Speed 3054.24 samples/sec Loss 12.1710 LearningRate 0.0712 Epoch: 3 Global Step: 38860 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:25,119-Speed 3121.71 samples/sec Loss 12.2899 LearningRate 0.0712 Epoch: 3 Global Step: 38870 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:28,413-Speed 3110.33 samples/sec Loss 12.3232 LearningRate 0.0711 Epoch: 3 Global Step: 38880 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:31,733-Speed 3085.40 samples/sec Loss 12.0157 LearningRate 0.0711 Epoch: 3 Global Step: 38890 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:35,075-Speed 3064.47 samples/sec Loss 12.3465 LearningRate 0.0711 Epoch: 3 Global Step: 38900 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:38,363-Speed 3115.60 samples/sec Loss 12.2577 LearningRate 0.0711 Epoch: 3 Global Step: 38910 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:41,663-Speed 3104.40 samples/sec Loss 12.2596 LearningRate 0.0711 Epoch: 3 Global Step: 38920 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:45,030-Speed 3042.69 samples/sec Loss 12.3845 LearningRate 0.0711 Epoch: 3 Global Step: 38930 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:48,397-Speed 3041.96 samples/sec Loss 12.3351 LearningRate 0.0711 Epoch: 3 Global Step: 38940 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:51,756-Speed 3049.52 samples/sec Loss 12.3550 LearningRate 0.0711 Epoch: 3 Global Step: 38950 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:55,047-Speed 3111.83 samples/sec Loss 12.3568 LearningRate 0.0711 Epoch: 3 Global Step: 38960 Fp16 Grad 
Scale: 131072 Required: 20 hours Training: 2022-04-27 05:23:58,401-Speed 3054.39 samples/sec Loss 12.3820 LearningRate 0.0711 Epoch: 3 Global Step: 38970 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:01,726-Speed 3080.92 samples/sec Loss 12.2201 LearningRate 0.0711 Epoch: 3 Global Step: 38980 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:05,017-Speed 3112.03 samples/sec Loss 12.2793 LearningRate 0.0711 Epoch: 3 Global Step: 38990 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:08,326-Speed 3095.43 samples/sec Loss 12.3387 LearningRate 0.0711 Epoch: 3 Global Step: 39000 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:11,658-Speed 3075.21 samples/sec Loss 12.3265 LearningRate 0.0711 Epoch: 3 Global Step: 39010 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:14,972-Speed 3090.59 samples/sec Loss 12.3376 LearningRate 0.0711 Epoch: 3 Global Step: 39020 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:24:18,260-Speed 3115.61 samples/sec Loss 12.4751 LearningRate 0.0710 Epoch: 3 Global Step: 39030 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:21,550-Speed 3113.06 samples/sec Loss 12.4039 LearningRate 0.0710 Epoch: 3 Global Step: 39040 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:24,891-Speed 3065.71 samples/sec Loss 12.4563 LearningRate 0.0710 Epoch: 3 Global Step: 39050 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:28,268-Speed 3032.94 samples/sec Loss 12.3651 LearningRate 0.0710 Epoch: 3 Global Step: 39060 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:31,602-Speed 3073.52 samples/sec Loss 12.4255 LearningRate 0.0710 Epoch: 3 Global Step: 39070 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:34,919-Speed 3088.16 samples/sec Loss 12.3663 LearningRate 0.0710 Epoch: 3 Global Step: 39080 Fp16 Grad Scale: 131072 Required: 20 
hours Training: 2022-04-27 05:24:38,338-Speed 2996.15 samples/sec Loss 12.3953 LearningRate 0.0710 Epoch: 3 Global Step: 39090 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:41,746-Speed 3005.02 samples/sec Loss 12.3752 LearningRate 0.0710 Epoch: 3 Global Step: 39100 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:45,126-Speed 3031.03 samples/sec Loss 12.2951 LearningRate 0.0710 Epoch: 3 Global Step: 39110 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:48,475-Speed 3058.25 samples/sec Loss 12.3152 LearningRate 0.0710 Epoch: 3 Global Step: 39120 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:51,731-Speed 3145.92 samples/sec Loss 12.4722 LearningRate 0.0710 Epoch: 3 Global Step: 39130 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:55,106-Speed 3034.80 samples/sec Loss 12.4999 LearningRate 0.0710 Epoch: 3 Global Step: 39140 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:24:58,430-Speed 3081.30 samples/sec Loss 12.4214 LearningRate 0.0710 Epoch: 3 Global Step: 39150 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:01,813-Speed 3028.03 samples/sec Loss 12.4076 LearningRate 0.0710 Epoch: 3 Global Step: 39160 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:05,120-Speed 3097.72 samples/sec Loss 12.4416 LearningRate 0.0710 Epoch: 3 Global Step: 39170 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:08,491-Speed 3038.95 samples/sec Loss 12.5211 LearningRate 0.0709 Epoch: 3 Global Step: 39180 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:11,842-Speed 3056.88 samples/sec Loss 12.3173 LearningRate 0.0709 Epoch: 3 Global Step: 39190 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:15,150-Speed 3095.86 samples/sec Loss 12.5403 LearningRate 0.0709 Epoch: 3 Global Step: 39200 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 
05:25:18,428-Speed 3125.45 samples/sec Loss 12.5795 LearningRate 0.0709 Epoch: 3 Global Step: 39210 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:21,766-Speed 3068.89 samples/sec Loss 12.3762 LearningRate 0.0709 Epoch: 3 Global Step: 39220 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:25,065-Speed 3104.51 samples/sec Loss 12.3915 LearningRate 0.0709 Epoch: 3 Global Step: 39230 Fp16 Grad Scale: 262144 Required: 20 hours Training: 2022-04-27 05:25:28,351-Speed 3118.28 samples/sec Loss 12.3515 LearningRate 0.0709 Epoch: 3 Global Step: 39240 Fp16 Grad Scale: 131072 Required: 20 hours Training: 2022-04-27 05:25:31,625-Speed 3128.92 samples/sec Loss 12.4495 LearningRate 0.0709 Epoch: 3 Global Step: 39250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:34,980-Speed 3054.13 samples/sec Loss 12.2142 LearningRate 0.0709 Epoch: 3 Global Step: 39260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:38,274-Speed 3109.35 samples/sec Loss 12.4218 LearningRate 0.0709 Epoch: 3 Global Step: 39270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:41,625-Speed 3056.39 samples/sec Loss 12.4805 LearningRate 0.0709 Epoch: 3 Global Step: 39280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:44,964-Speed 3067.50 samples/sec Loss 12.4209 LearningRate 0.0709 Epoch: 3 Global Step: 39290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:48,270-Speed 3099.06 samples/sec Loss 12.5576 LearningRate 0.0709 Epoch: 3 Global Step: 39300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:51,648-Speed 3032.32 samples/sec Loss 12.5759 LearningRate 0.0709 Epoch: 3 Global Step: 39310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:55,074-Speed 2990.16 samples/sec Loss 12.3980 LearningRate 0.0708 Epoch: 3 Global Step: 39320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:25:58,464-Speed 3021.35 
samples/sec Loss 12.4466 LearningRate 0.0708 Epoch: 3 Global Step: 39330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:01,799-Speed 3071.70 samples/sec Loss 12.4160 LearningRate 0.0708 Epoch: 3 Global Step: 39340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:05,175-Speed 3033.61 samples/sec Loss 12.2531 LearningRate 0.0708 Epoch: 3 Global Step: 39350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:08,549-Speed 3036.25 samples/sec Loss 12.3855 LearningRate 0.0708 Epoch: 3 Global Step: 39360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:11,894-Speed 3062.27 samples/sec Loss 12.4068 LearningRate 0.0708 Epoch: 3 Global Step: 39370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:15,174-Speed 3122.80 samples/sec Loss 12.4296 LearningRate 0.0708 Epoch: 3 Global Step: 39380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:18,522-Speed 3059.89 samples/sec Loss 12.2724 LearningRate 0.0708 Epoch: 3 Global Step: 39390 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:21,936-Speed 3000.42 samples/sec Loss 12.3088 LearningRate 0.0708 Epoch: 3 Global Step: 39400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:25,325-Speed 3021.64 samples/sec Loss 12.4758 LearningRate 0.0708 Epoch: 3 Global Step: 39410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:28,693-Speed 3041.86 samples/sec Loss 12.5726 LearningRate 0.0708 Epoch: 3 Global Step: 39420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:31,988-Speed 3108.23 samples/sec Loss 12.4493 LearningRate 0.0708 Epoch: 3 Global Step: 39430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:26:35,373-Speed 3025.64 samples/sec Loss 12.5835 LearningRate 0.0708 Epoch: 3 Global Step: 39440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:38,783-Speed 3003.87 samples/sec Loss 12.4324 LearningRate 
0.0708 Epoch: 3 Global Step: 39450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:42,097-Speed 3091.40 samples/sec Loss 12.4517 LearningRate 0.0708 Epoch: 3 Global Step: 39460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:45,410-Speed 3091.62 samples/sec Loss 12.4188 LearningRate 0.0707 Epoch: 3 Global Step: 39470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:48,761-Speed 3056.97 samples/sec Loss 12.4432 LearningRate 0.0707 Epoch: 3 Global Step: 39480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:53,302-Speed 2255.93 samples/sec Loss 12.5188 LearningRate 0.0707 Epoch: 3 Global Step: 39490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:26:57,990-Speed 2184.69 samples/sec Loss 12.4680 LearningRate 0.0707 Epoch: 3 Global Step: 39500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:01,320-Speed 3076.26 samples/sec Loss 12.4680 LearningRate 0.0707 Epoch: 3 Global Step: 39510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:04,687-Speed 3042.48 samples/sec Loss 12.3784 LearningRate 0.0707 Epoch: 3 Global Step: 39520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:08,038-Speed 3056.63 samples/sec Loss 12.5120 LearningRate 0.0707 Epoch: 3 Global Step: 39530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:11,416-Speed 3032.28 samples/sec Loss 12.5623 LearningRate 0.0707 Epoch: 3 Global Step: 39540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:14,800-Speed 3027.72 samples/sec Loss 12.3467 LearningRate 0.0707 Epoch: 3 Global Step: 39550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:18,126-Speed 3079.27 samples/sec Loss 12.5298 LearningRate 0.0707 Epoch: 3 Global Step: 39560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:21,473-Speed 3061.04 samples/sec Loss 12.3783 LearningRate 0.0707 Epoch: 3 Global Step: 39570 
Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:24,801-Speed 3077.33 samples/sec Loss 12.5238 LearningRate 0.0707 Epoch: 3 Global Step: 39580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:28,088-Speed 3117.70 samples/sec Loss 12.4797 LearningRate 0.0707 Epoch: 3 Global Step: 39590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:27:31,376-Speed 3115.05 samples/sec Loss 12.3444 LearningRate 0.0707 Epoch: 3 Global Step: 39600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:34,726-Speed 3057.23 samples/sec Loss 12.2828 LearningRate 0.0707 Epoch: 3 Global Step: 39610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:38,115-Speed 3022.30 samples/sec Loss 12.3502 LearningRate 0.0706 Epoch: 3 Global Step: 39620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:41,481-Speed 3044.10 samples/sec Loss 12.6049 LearningRate 0.0706 Epoch: 3 Global Step: 39630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:44,809-Speed 3077.58 samples/sec Loss 12.5241 LearningRate 0.0706 Epoch: 3 Global Step: 39640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:48,122-Speed 3091.59 samples/sec Loss 12.4046 LearningRate 0.0706 Epoch: 3 Global Step: 39650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:51,484-Speed 3046.73 samples/sec Loss 12.3180 LearningRate 0.0706 Epoch: 3 Global Step: 39660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:54,782-Speed 3106.04 samples/sec Loss 12.4850 LearningRate 0.0706 Epoch: 3 Global Step: 39670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:27:58,068-Speed 3117.31 samples/sec Loss 12.3339 LearningRate 0.0706 Epoch: 3 Global Step: 39680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:01,475-Speed 3006.10 samples/sec Loss 12.4977 LearningRate 0.0706 Epoch: 3 Global Step: 39690 Fp16 Grad Scale: 131072 Required: 
19 hours Training: 2022-04-27 05:28:04,866-Speed 3021.17 samples/sec Loss 12.5315 LearningRate 0.0706 Epoch: 3 Global Step: 39700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:08,219-Speed 3054.99 samples/sec Loss 12.2665 LearningRate 0.0706 Epoch: 3 Global Step: 39710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:11,617-Speed 3015.36 samples/sec Loss 12.4228 LearningRate 0.0706 Epoch: 3 Global Step: 39720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:14,906-Speed 3113.96 samples/sec Loss 12.4609 LearningRate 0.0706 Epoch: 3 Global Step: 39730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:18,291-Speed 3026.38 samples/sec Loss 12.4611 LearningRate 0.0706 Epoch: 3 Global Step: 39740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:21,726-Speed 2981.85 samples/sec Loss 12.2233 LearningRate 0.0706 Epoch: 3 Global Step: 39750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:25,064-Speed 3068.70 samples/sec Loss 12.4592 LearningRate 0.0706 Epoch: 3 Global Step: 39760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:28,413-Speed 3058.28 samples/sec Loss 12.3910 LearningRate 0.0705 Epoch: 3 Global Step: 39770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:31,808-Speed 3017.36 samples/sec Loss 12.4739 LearningRate 0.0705 Epoch: 3 Global Step: 39780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:35,131-Speed 3082.60 samples/sec Loss 12.4855 LearningRate 0.0705 Epoch: 3 Global Step: 39790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:28:38,504-Speed 3036.56 samples/sec Loss 12.5104 LearningRate 0.0705 Epoch: 3 Global Step: 39800 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:28:41,828-Speed 3081.48 samples/sec Loss 12.3974 LearningRate 0.0705 Epoch: 3 Global Step: 39810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 
05:28:45,134-Speed 3098.13 samples/sec Loss 12.5955 LearningRate 0.0705 Epoch: 3 Global Step: 39820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:28:48,500-Speed 3043.30 samples/sec Loss 12.4848 LearningRate 0.0705 Epoch: 3 Global Step: 39830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:28:51,835-Speed 3071.85 samples/sec Loss 12.4704 LearningRate 0.0705 Epoch: 3 Global Step: 39840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:28:55,200-Speed 3043.37 samples/sec Loss 12.5525 LearningRate 0.0705 Epoch: 3 Global Step: 39850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:28:58,556-Speed 3052.57 samples/sec Loss 12.4954 LearningRate 0.0705 Epoch: 3 Global Step: 39860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:29:01,893-Speed 3069.86 samples/sec Loss 12.4105 LearningRate 0.0705 Epoch: 3 Global Step: 39870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:29:05,166-Speed 3129.25 samples/sec Loss 12.3791 LearningRate 0.0705 Epoch: 3 Global Step: 39880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:29:08,490-Speed 3081.07 samples/sec Loss 12.4411 LearningRate 0.0705 Epoch: 3 Global Step: 39890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:29:11,786-Speed 3107.91 samples/sec Loss 12.3842 LearningRate 0.0705 Epoch: 3 Global Step: 39900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:29:15,112-Speed 3080.06 samples/sec Loss 12.4662 LearningRate 0.0704 Epoch: 3 Global Step: 39910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:29:18,428-Speed 3089.39 samples/sec Loss 12.4879 LearningRate 0.0704 Epoch: 3 Global Step: 39920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:21,743-Speed 3089.34 samples/sec Loss 12.4418 LearningRate 0.0704 Epoch: 3 Global Step: 39930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:25,081-Speed 3068.85 samples/sec 
Loss 12.5027 LearningRate 0.0704 Epoch: 3 Global Step: 39940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:28,447-Speed 3043.11 samples/sec Loss 12.4592 LearningRate 0.0704 Epoch: 3 Global Step: 39950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:31,906-Speed 2961.08 samples/sec Loss 12.5647 LearningRate 0.0704 Epoch: 3 Global Step: 39960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:35,193-Speed 3116.65 samples/sec Loss 12.6003 LearningRate 0.0704 Epoch: 3 Global Step: 39970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:38,480-Speed 3116.22 samples/sec Loss 12.4405 LearningRate 0.0704 Epoch: 3 Global Step: 39980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:41,861-Speed 3029.15 samples/sec Loss 12.5136 LearningRate 0.0704 Epoch: 3 Global Step: 39990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:45,171-Speed 3094.78 samples/sec Loss 12.5506 LearningRate 0.0704 Epoch: 3 Global Step: 40000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:48,530-Speed 3048.92 samples/sec Loss 12.6624 LearningRate 0.0704 Epoch: 3 Global Step: 40010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:51,923-Speed 3019.90 samples/sec Loss 12.4312 LearningRate 0.0704 Epoch: 3 Global Step: 40020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:55,301-Speed 3032.19 samples/sec Loss 12.6056 LearningRate 0.0704 Epoch: 3 Global Step: 40030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:29:58,640-Speed 3067.61 samples/sec Loss 12.5346 LearningRate 0.0704 Epoch: 3 Global Step: 40040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:01,959-Speed 3085.88 samples/sec Loss 12.5525 LearningRate 0.0704 Epoch: 3 Global Step: 40050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:05,299-Speed 3066.49 samples/sec Loss 12.4807 LearningRate 
0.0703 Epoch: 3 Global Step: 40060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:08,689-Speed 3021.75 samples/sec Loss 12.6405 LearningRate 0.0703 Epoch: 3 Global Step: 40070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:12,006-Speed 3088.03 samples/sec Loss 12.5056 LearningRate 0.0703 Epoch: 3 Global Step: 40080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:15,358-Speed 3056.07 samples/sec Loss 12.5758 LearningRate 0.0703 Epoch: 3 Global Step: 40090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:18,658-Speed 3103.78 samples/sec Loss 12.3732 LearningRate 0.0703 Epoch: 3 Global Step: 40100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:21,988-Speed 3076.23 samples/sec Loss 12.5930 LearningRate 0.0703 Epoch: 3 Global Step: 40110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:25,284-Speed 3107.59 samples/sec Loss 12.4137 LearningRate 0.0703 Epoch: 3 Global Step: 40120 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:30:28,612-Speed 3077.88 samples/sec Loss 12.5349 LearningRate 0.0703 Epoch: 3 Global Step: 40130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:31,944-Speed 3074.71 samples/sec Loss 12.6049 LearningRate 0.0703 Epoch: 3 Global Step: 40140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:35,316-Speed 3037.57 samples/sec Loss 12.3937 LearningRate 0.0703 Epoch: 3 Global Step: 40150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:38,644-Speed 3077.47 samples/sec Loss 12.3588 LearningRate 0.0703 Epoch: 3 Global Step: 40160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:41,973-Speed 3077.33 samples/sec Loss 12.6301 LearningRate 0.0703 Epoch: 3 Global Step: 40170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:45,348-Speed 3034.47 samples/sec Loss 12.6130 LearningRate 0.0703 Epoch: 3 Global Step: 
40180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:48,687-Speed 3067.76 samples/sec Loss 12.4910 LearningRate 0.0703 Epoch: 3 Global Step: 40190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:52,002-Speed 3090.48 samples/sec Loss 12.6477 LearningRate 0.0703 Epoch: 3 Global Step: 40200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:55,362-Speed 3048.52 samples/sec Loss 12.5845 LearningRate 0.0702 Epoch: 3 Global Step: 40210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:30:58,659-Speed 3107.42 samples/sec Loss 12.4460 LearningRate 0.0702 Epoch: 3 Global Step: 40220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:02,046-Speed 3023.80 samples/sec Loss 12.5609 LearningRate 0.0702 Epoch: 3 Global Step: 40230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:05,383-Speed 3069.24 samples/sec Loss 12.5193 LearningRate 0.0702 Epoch: 3 Global Step: 40240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:08,672-Speed 3114.11 samples/sec Loss 12.4122 LearningRate 0.0702 Epoch: 3 Global Step: 40250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:11,961-Speed 3115.35 samples/sec Loss 12.4636 LearningRate 0.0702 Epoch: 3 Global Step: 40260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:15,325-Speed 3044.10 samples/sec Loss 12.5726 LearningRate 0.0702 Epoch: 3 Global Step: 40270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:18,611-Speed 3118.02 samples/sec Loss 12.5618 LearningRate 0.0702 Epoch: 3 Global Step: 40280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:21,915-Speed 3099.60 samples/sec Loss 12.7461 LearningRate 0.0702 Epoch: 3 Global Step: 40290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:25,248-Speed 3073.89 samples/sec Loss 12.5610 LearningRate 0.0702 Epoch: 3 Global Step: 40300 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-27 05:31:28,562-Speed 3090.47 samples/sec Loss 12.4990 LearningRate 0.0702 Epoch: 3 Global Step: 40310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:31,861-Speed 3104.98 samples/sec Loss 12.5312 LearningRate 0.0702 Epoch: 3 Global Step: 40320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:35,143-Speed 3121.02 samples/sec Loss 12.3843 LearningRate 0.0702 Epoch: 3 Global Step: 40330 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:31:38,460-Speed 3088.01 samples/sec Loss 12.6623 LearningRate 0.0702 Epoch: 3 Global Step: 40340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:41,789-Speed 3076.52 samples/sec Loss 12.5734 LearningRate 0.0702 Epoch: 3 Global Step: 40350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:45,100-Speed 3093.58 samples/sec Loss 12.5473 LearningRate 0.0701 Epoch: 3 Global Step: 40360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:48,408-Speed 3096.14 samples/sec Loss 12.5449 LearningRate 0.0701 Epoch: 3 Global Step: 40370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:51,746-Speed 3069.07 samples/sec Loss 12.6228 LearningRate 0.0701 Epoch: 3 Global Step: 40380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:55,101-Speed 3052.66 samples/sec Loss 12.5582 LearningRate 0.0701 Epoch: 3 Global Step: 40390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:31:58,482-Speed 3029.43 samples/sec Loss 12.4855 LearningRate 0.0701 Epoch: 3 Global Step: 40400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:01,864-Speed 3029.42 samples/sec Loss 12.5993 LearningRate 0.0701 Epoch: 3 Global Step: 40410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:05,210-Speed 3061.08 samples/sec Loss 12.7215 LearningRate 0.0701 Epoch: 3 Global Step: 40420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-27 05:32:08,547-Speed 3069.39 samples/sec Loss 12.5495 LearningRate 0.0701 Epoch: 3 Global Step: 40430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:11,920-Speed 3037.26 samples/sec Loss 12.4941 LearningRate 0.0701 Epoch: 3 Global Step: 40440 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:32:15,194-Speed 3128.10 samples/sec Loss 12.5878 LearningRate 0.0701 Epoch: 3 Global Step: 40450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:18,509-Speed 3089.96 samples/sec Loss 12.4186 LearningRate 0.0701 Epoch: 3 Global Step: 40460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:21,836-Speed 3078.19 samples/sec Loss 12.5219 LearningRate 0.0701 Epoch: 3 Global Step: 40470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:25,161-Speed 3081.39 samples/sec Loss 12.5490 LearningRate 0.0701 Epoch: 3 Global Step: 40480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:28,532-Speed 3038.50 samples/sec Loss 12.5635 LearningRate 0.0701 Epoch: 3 Global Step: 40490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:31,903-Speed 3038.43 samples/sec Loss 12.3832 LearningRate 0.0701 Epoch: 3 Global Step: 40500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:35,249-Speed 3061.48 samples/sec Loss 12.6454 LearningRate 0.0700 Epoch: 3 Global Step: 40510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:38,591-Speed 3064.51 samples/sec Loss 12.5892 LearningRate 0.0700 Epoch: 3 Global Step: 40520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:41,875-Speed 3118.70 samples/sec Loss 12.3716 LearningRate 0.0700 Epoch: 3 Global Step: 40530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:45,257-Speed 3029.12 samples/sec Loss 12.7095 LearningRate 0.0700 Epoch: 3 Global Step: 40540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:48,505-Speed 
3153.35 samples/sec Loss 12.4472 LearningRate 0.0700 Epoch: 3 Global Step: 40550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:51,808-Speed 3101.49 samples/sec Loss 12.6154 LearningRate 0.0700 Epoch: 3 Global Step: 40560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:55,110-Speed 3102.22 samples/sec Loss 12.5448 LearningRate 0.0700 Epoch: 3 Global Step: 40570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:32:58,430-Speed 3085.23 samples/sec Loss 12.4088 LearningRate 0.0700 Epoch: 3 Global Step: 40580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:01,750-Speed 3085.04 samples/sec Loss 12.3161 LearningRate 0.0700 Epoch: 3 Global Step: 40590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:05,084-Speed 3071.63 samples/sec Loss 12.6370 LearningRate 0.0700 Epoch: 3 Global Step: 40600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:08,413-Speed 3076.44 samples/sec Loss 12.6830 LearningRate 0.0700 Epoch: 3 Global Step: 40610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:11,707-Speed 3110.20 samples/sec Loss 12.7448 LearningRate 0.0700 Epoch: 3 Global Step: 40620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:15,062-Speed 3053.23 samples/sec Loss 12.4139 LearningRate 0.0700 Epoch: 3 Global Step: 40630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:18,386-Speed 3081.37 samples/sec Loss 12.4854 LearningRate 0.0700 Epoch: 3 Global Step: 40640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:21,726-Speed 3066.52 samples/sec Loss 12.5631 LearningRate 0.0700 Epoch: 3 Global Step: 40650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:25,031-Speed 3099.32 samples/sec Loss 12.7184 LearningRate 0.0699 Epoch: 3 Global Step: 40660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:28,349-Speed 3087.52 samples/sec Loss 
12.4256 LearningRate 0.0699 Epoch: 3 Global Step: 40670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:31,627-Speed 3125.29 samples/sec Loss 12.4796 LearningRate 0.0699 Epoch: 3 Global Step: 40680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:34,933-Speed 3097.32 samples/sec Loss 12.5224 LearningRate 0.0699 Epoch: 3 Global Step: 40690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:38,238-Speed 3099.52 samples/sec Loss 12.5332 LearningRate 0.0699 Epoch: 3 Global Step: 40700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:41,584-Speed 3061.59 samples/sec Loss 12.5824 LearningRate 0.0699 Epoch: 3 Global Step: 40710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:44,961-Speed 3033.88 samples/sec Loss 12.4650 LearningRate 0.0699 Epoch: 3 Global Step: 40720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:48,362-Speed 3011.52 samples/sec Loss 12.4888 LearningRate 0.0699 Epoch: 3 Global Step: 40730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:51,676-Speed 3091.12 samples/sec Loss 12.5063 LearningRate 0.0699 Epoch: 3 Global Step: 40740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:33:54,987-Speed 3093.96 samples/sec Loss 12.5463 LearningRate 0.0699 Epoch: 3 Global Step: 40750 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:33:58,275-Speed 3115.47 samples/sec Loss 12.5510 LearningRate 0.0699 Epoch: 3 Global Step: 40760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:01,601-Speed 3079.17 samples/sec Loss 12.5595 LearningRate 0.0699 Epoch: 3 Global Step: 40770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:04,967-Speed 3044.04 samples/sec Loss 12.4887 LearningRate 0.0699 Epoch: 3 Global Step: 40780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:08,288-Speed 3084.10 samples/sec Loss 12.4942 LearningRate 0.0699 
Epoch: 3 Global Step: 40790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:11,640-Speed 3056.09 samples/sec Loss 12.4054 LearningRate 0.0698 Epoch: 3 Global Step: 40800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:15,014-Speed 3036.16 samples/sec Loss 12.4532 LearningRate 0.0698 Epoch: 3 Global Step: 40810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:18,337-Speed 3081.78 samples/sec Loss 12.6587 LearningRate 0.0698 Epoch: 3 Global Step: 40820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:21,708-Speed 3038.56 samples/sec Loss 12.4574 LearningRate 0.0698 Epoch: 3 Global Step: 40830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:25,022-Speed 3090.84 samples/sec Loss 12.5278 LearningRate 0.0698 Epoch: 3 Global Step: 40840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:28,350-Speed 3077.53 samples/sec Loss 12.4959 LearningRate 0.0698 Epoch: 3 Global Step: 40850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:31,611-Speed 3142.91 samples/sec Loss 12.5502 LearningRate 0.0698 Epoch: 3 Global Step: 40860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:34,936-Speed 3081.03 samples/sec Loss 12.8110 LearningRate 0.0698 Epoch: 3 Global Step: 40870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:38,327-Speed 3021.07 samples/sec Loss 12.6021 LearningRate 0.0698 Epoch: 3 Global Step: 40880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:41,678-Speed 3056.82 samples/sec Loss 12.5015 LearningRate 0.0698 Epoch: 3 Global Step: 40890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:45,020-Speed 3064.50 samples/sec Loss 12.5337 LearningRate 0.0698 Epoch: 3 Global Step: 40900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:48,402-Speed 3028.91 samples/sec Loss 12.5619 LearningRate 0.0698 Epoch: 3 Global Step: 40910 
Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:51,774-Speed 3037.44 samples/sec Loss 12.6577 LearningRate 0.0698 Epoch: 3 Global Step: 40920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:55,051-Speed 3126.60 samples/sec Loss 12.5650 LearningRate 0.0698 Epoch: 3 Global Step: 40930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:34:58,348-Speed 3106.21 samples/sec Loss 12.4253 LearningRate 0.0698 Epoch: 3 Global Step: 40940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:01,683-Speed 3071.29 samples/sec Loss 12.3574 LearningRate 0.0697 Epoch: 3 Global Step: 40950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:04,994-Speed 3093.78 samples/sec Loss 12.6223 LearningRate 0.0697 Epoch: 3 Global Step: 40960 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:35:08,332-Speed 3068.94 samples/sec Loss 12.4038 LearningRate 0.0697 Epoch: 3 Global Step: 40970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:11,676-Speed 3062.62 samples/sec Loss 12.4616 LearningRate 0.0697 Epoch: 3 Global Step: 40980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:15,054-Speed 3031.96 samples/sec Loss 12.5311 LearningRate 0.0697 Epoch: 3 Global Step: 40990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:18,395-Speed 3066.02 samples/sec Loss 12.5175 LearningRate 0.0697 Epoch: 3 Global Step: 41000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:21,785-Speed 3021.66 samples/sec Loss 12.5024 LearningRate 0.0697 Epoch: 3 Global Step: 41010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:25,161-Speed 3034.60 samples/sec Loss 12.4092 LearningRate 0.0697 Epoch: 3 Global Step: 41020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:28,478-Speed 3087.79 samples/sec Loss 12.5286 LearningRate 0.0697 Epoch: 3 Global Step: 41030 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-27 05:35:31,777-Speed 3104.56 samples/sec Loss 12.3709 LearningRate 0.0697 Epoch: 3 Global Step: 41040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:35,095-Speed 3087.26 samples/sec Loss 12.5980 LearningRate 0.0697 Epoch: 3 Global Step: 41050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:38,397-Speed 3102.83 samples/sec Loss 12.4571 LearningRate 0.0697 Epoch: 3 Global Step: 41060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:41,724-Speed 3078.28 samples/sec Loss 12.6438 LearningRate 0.0697 Epoch: 3 Global Step: 41070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:45,076-Speed 3055.43 samples/sec Loss 12.5423 LearningRate 0.0697 Epoch: 3 Global Step: 41080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:48,407-Speed 3074.80 samples/sec Loss 12.4374 LearningRate 0.0697 Epoch: 3 Global Step: 41090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:51,745-Speed 3069.13 samples/sec Loss 12.4508 LearningRate 0.0696 Epoch: 3 Global Step: 41100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:55,094-Speed 3058.18 samples/sec Loss 12.5442 LearningRate 0.0696 Epoch: 3 Global Step: 41110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:35:58,476-Speed 3028.88 samples/sec Loss 12.5390 LearningRate 0.0696 Epoch: 3 Global Step: 41120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:01,797-Speed 3084.42 samples/sec Loss 12.5305 LearningRate 0.0696 Epoch: 3 Global Step: 41130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:05,111-Speed 3090.10 samples/sec Loss 12.5159 LearningRate 0.0696 Epoch: 3 Global Step: 41140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:08,516-Speed 3009.34 samples/sec Loss 12.4193 LearningRate 0.0696 Epoch: 3 Global Step: 41150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-27 05:36:11,879-Speed 3044.81 samples/sec Loss 12.5820 LearningRate 0.0696 Epoch: 3 Global Step: 41160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:15,231-Speed 3055.81 samples/sec Loss 12.5696 LearningRate 0.0696 Epoch: 3 Global Step: 41170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:18,585-Speed 3054.20 samples/sec Loss 12.4668 LearningRate 0.0696 Epoch: 3 Global Step: 41180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:21,909-Speed 3082.16 samples/sec Loss 12.6295 LearningRate 0.0696 Epoch: 3 Global Step: 41190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:25,185-Speed 3125.94 samples/sec Loss 12.4917 LearningRate 0.0696 Epoch: 3 Global Step: 41200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:28,490-Speed 3099.33 samples/sec Loss 12.4442 LearningRate 0.0696 Epoch: 3 Global Step: 41210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:31,861-Speed 3038.64 samples/sec Loss 12.5308 LearningRate 0.0696 Epoch: 3 Global Step: 41220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:35,238-Speed 3032.67 samples/sec Loss 12.5062 LearningRate 0.0696 Epoch: 3 Global Step: 41230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:38,592-Speed 3054.48 samples/sec Loss 12.5859 LearningRate 0.0696 Epoch: 3 Global Step: 41240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:41,915-Speed 3082.32 samples/sec Loss 12.6239 LearningRate 0.0695 Epoch: 3 Global Step: 41250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:45,273-Speed 3050.05 samples/sec Loss 12.5059 LearningRate 0.0695 Epoch: 3 Global Step: 41260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:48,597-Speed 3081.63 samples/sec Loss 12.6391 LearningRate 0.0695 Epoch: 3 Global Step: 41270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:36:51,906-Speed 
3096.26 samples/sec Loss 12.5890 LearningRate 0.0695 Epoch: 3 Global Step: 41280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:36:55,275-Speed 3039.87 samples/sec Loss 12.4169 LearningRate 0.0695 Epoch: 3 Global Step: 41290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:36:58,543-Speed 3134.54 samples/sec Loss 12.5215 LearningRate 0.0695 Epoch: 3 Global Step: 41300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:01,840-Speed 3107.18 samples/sec Loss 12.3687 LearningRate 0.0695 Epoch: 3 Global Step: 41310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:05,190-Speed 3057.01 samples/sec Loss 12.3843 LearningRate 0.0695 Epoch: 3 Global Step: 41320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:08,533-Speed 3064.71 samples/sec Loss 12.5494 LearningRate 0.0695 Epoch: 3 Global Step: 41330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:11,838-Speed 3098.71 samples/sec Loss 12.3671 LearningRate 0.0695 Epoch: 3 Global Step: 41340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:15,139-Speed 3102.76 samples/sec Loss 12.4928 LearningRate 0.0695 Epoch: 3 Global Step: 41350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:18,451-Speed 3093.04 samples/sec Loss 12.5024 LearningRate 0.0695 Epoch: 3 Global Step: 41360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:21,752-Speed 3103.11 samples/sec Loss 12.5313 LearningRate 0.0695 Epoch: 3 Global Step: 41370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:37:25,076-Speed 3081.58 samples/sec Loss 12.5466 LearningRate 0.0695 Epoch: 3 Global Step: 41380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:28,407-Speed 3075.09 samples/sec Loss 12.4450 LearningRate 0.0695 Epoch: 3 Global Step: 41390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:31,795-Speed 3022.91 samples/sec Loss 12.5913 
LearningRate 0.0694 Epoch: 3 Global Step: 41400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:35,189-Speed 3018.56 samples/sec Loss 12.2726 LearningRate 0.0694 Epoch: 3 Global Step: 41410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:38,544-Speed 3053.51 samples/sec Loss 12.5175 LearningRate 0.0694 Epoch: 3 Global Step: 41420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:41,871-Speed 3077.88 samples/sec Loss 12.6453 LearningRate 0.0694 Epoch: 3 Global Step: 41430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:45,214-Speed 3063.70 samples/sec Loss 12.4236 LearningRate 0.0694 Epoch: 3 Global Step: 41440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:48,569-Speed 3053.29 samples/sec Loss 12.5604 LearningRate 0.0694 Epoch: 3 Global Step: 41450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:51,899-Speed 3076.41 samples/sec Loss 12.4828 LearningRate 0.0694 Epoch: 3 Global Step: 41460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:55,323-Speed 2990.81 samples/sec Loss 12.4478 LearningRate 0.0694 Epoch: 3 Global Step: 41470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:37:58,652-Speed 3077.25 samples/sec Loss 12.6363 LearningRate 0.0694 Epoch: 3 Global Step: 41480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:01,994-Speed 3064.66 samples/sec Loss 12.4834 LearningRate 0.0694 Epoch: 3 Global Step: 41490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:05,356-Speed 3047.02 samples/sec Loss 12.4164 LearningRate 0.0694 Epoch: 3 Global Step: 41500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:08,708-Speed 3056.60 samples/sec Loss 12.5189 LearningRate 0.0694 Epoch: 3 Global Step: 41510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:12,024-Speed 3088.69 samples/sec Loss 12.4876 LearningRate 0.0694 Epoch: 3 
Global Step: 41520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:15,401-Speed 3033.57 samples/sec Loss 12.6728 LearningRate 0.0694 Epoch: 3 Global Step: 41530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:18,709-Speed 3096.47 samples/sec Loss 12.4777 LearningRate 0.0694 Epoch: 3 Global Step: 41540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:22,021-Speed 3092.84 samples/sec Loss 12.5734 LearningRate 0.0693 Epoch: 3 Global Step: 41550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:25,401-Speed 3030.16 samples/sec Loss 12.4624 LearningRate 0.0693 Epoch: 3 Global Step: 41560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:28,747-Speed 3061.47 samples/sec Loss 12.6429 LearningRate 0.0693 Epoch: 3 Global Step: 41570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:32,097-Speed 3057.38 samples/sec Loss 12.5839 LearningRate 0.0693 Epoch: 3 Global Step: 41580 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:38:35,369-Speed 3130.02 samples/sec Loss 12.6168 LearningRate 0.0693 Epoch: 3 Global Step: 41590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:38,643-Speed 3128.95 samples/sec Loss 12.3803 LearningRate 0.0693 Epoch: 3 Global Step: 41600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:41,938-Speed 3108.36 samples/sec Loss 12.5064 LearningRate 0.0693 Epoch: 3 Global Step: 41610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:45,239-Speed 3103.56 samples/sec Loss 12.4928 LearningRate 0.0693 Epoch: 3 Global Step: 41620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:48,553-Speed 3090.55 samples/sec Loss 12.5363 LearningRate 0.0693 Epoch: 3 Global Step: 41630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:51,852-Speed 3104.77 samples/sec Loss 12.3706 LearningRate 0.0693 Epoch: 3 Global Step: 41640 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:55,181-Speed 3077.68 samples/sec Loss 12.6176 LearningRate 0.0693 Epoch: 3 Global Step: 41650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:38:58,501-Speed 3085.21 samples/sec Loss 12.5219 LearningRate 0.0693 Epoch: 3 Global Step: 41660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:01,820-Speed 3086.33 samples/sec Loss 12.4497 LearningRate 0.0693 Epoch: 3 Global Step: 41670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:05,193-Speed 3036.26 samples/sec Loss 12.4516 LearningRate 0.0693 Epoch: 3 Global Step: 41680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:08,460-Speed 3135.42 samples/sec Loss 12.5876 LearningRate 0.0693 Epoch: 3 Global Step: 41690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:11,779-Speed 3086.50 samples/sec Loss 12.5585 LearningRate 0.0692 Epoch: 3 Global Step: 41700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:15,062-Speed 3119.10 samples/sec Loss 12.6022 LearningRate 0.0692 Epoch: 3 Global Step: 41710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:18,387-Speed 3081.25 samples/sec Loss 12.5900 LearningRate 0.0692 Epoch: 3 Global Step: 41720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:21,708-Speed 3084.19 samples/sec Loss 12.3372 LearningRate 0.0692 Epoch: 3 Global Step: 41730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:25,094-Speed 3024.75 samples/sec Loss 12.5940 LearningRate 0.0692 Epoch: 3 Global Step: 41740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:28,438-Speed 3062.92 samples/sec Loss 12.4101 LearningRate 0.0692 Epoch: 3 Global Step: 41750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:31,785-Speed 3060.49 samples/sec Loss 12.4045 LearningRate 0.0692 Epoch: 3 Global Step: 41760 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-04-27 05:39:35,279-Speed 2931.67 samples/sec Loss 12.5294 LearningRate 0.0692 Epoch: 3 Global Step: 41770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:38,617-Speed 3068.97 samples/sec Loss 12.5552 LearningRate 0.0692 Epoch: 3 Global Step: 41780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:41,981-Speed 3044.86 samples/sec Loss 12.5207 LearningRate 0.0692 Epoch: 3 Global Step: 41790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:45,307-Speed 3079.48 samples/sec Loss 12.4532 LearningRate 0.0692 Epoch: 3 Global Step: 41800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:48,691-Speed 3027.41 samples/sec Loss 12.5153 LearningRate 0.0692 Epoch: 3 Global Step: 41810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:52,064-Speed 3036.44 samples/sec Loss 12.4807 LearningRate 0.0692 Epoch: 3 Global Step: 41820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:55,394-Speed 3076.79 samples/sec Loss 12.5050 LearningRate 0.0692 Epoch: 3 Global Step: 41830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:39:58,728-Speed 3072.26 samples/sec Loss 12.5754 LearningRate 0.0692 Epoch: 3 Global Step: 41840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:02,063-Speed 3071.16 samples/sec Loss 12.2875 LearningRate 0.0691 Epoch: 3 Global Step: 41850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:05,354-Speed 3112.18 samples/sec Loss 12.5046 LearningRate 0.0691 Epoch: 3 Global Step: 41860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:08,634-Speed 3123.68 samples/sec Loss 12.3587 LearningRate 0.0691 Epoch: 3 Global Step: 41870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:11,953-Speed 3085.57 samples/sec Loss 12.4314 LearningRate 0.0691 Epoch: 3 Global Step: 41880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 
05:40:15,290-Speed 3070.03 samples/sec Loss 12.3890 LearningRate 0.0691 Epoch: 3 Global Step: 41890 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:40:18,622-Speed 3073.91 samples/sec Loss 12.5360 LearningRate 0.0691 Epoch: 3 Global Step: 41900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:21,895-Speed 3129.92 samples/sec Loss 12.5962 LearningRate 0.0691 Epoch: 3 Global Step: 41910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:25,254-Speed 3049.86 samples/sec Loss 12.3088 LearningRate 0.0691 Epoch: 3 Global Step: 41920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:28,579-Speed 3079.87 samples/sec Loss 12.4544 LearningRate 0.0691 Epoch: 3 Global Step: 41930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:31,873-Speed 3109.60 samples/sec Loss 12.5174 LearningRate 0.0691 Epoch: 3 Global Step: 41940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:35,201-Speed 3078.54 samples/sec Loss 12.4370 LearningRate 0.0691 Epoch: 3 Global Step: 41950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:38,490-Speed 3114.17 samples/sec Loss 12.4367 LearningRate 0.0691 Epoch: 3 Global Step: 41960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:41,865-Speed 3034.56 samples/sec Loss 12.3345 LearningRate 0.0691 Epoch: 3 Global Step: 41970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:45,186-Speed 3084.47 samples/sec Loss 12.3982 LearningRate 0.0691 Epoch: 3 Global Step: 41980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:48,487-Speed 3102.70 samples/sec Loss 12.3924 LearningRate 0.0691 Epoch: 3 Global Step: 41990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:51,767-Speed 3122.87 samples/sec Loss 12.5216 LearningRate 0.0690 Epoch: 3 Global Step: 42000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:55,096-Speed 3077.06 
samples/sec Loss 12.6008 LearningRate 0.0690 Epoch: 3 Global Step: 42010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:40:58,424-Speed 3077.29 samples/sec Loss 12.5984 LearningRate 0.0690 Epoch: 3 Global Step: 42020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:01,746-Speed 3083.79 samples/sec Loss 12.4978 LearningRate 0.0690 Epoch: 3 Global Step: 42030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:05,064-Speed 3087.06 samples/sec Loss 12.3646 LearningRate 0.0690 Epoch: 3 Global Step: 42040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:08,391-Speed 3079.07 samples/sec Loss 12.3181 LearningRate 0.0690 Epoch: 3 Global Step: 42050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:11,696-Speed 3099.19 samples/sec Loss 12.5454 LearningRate 0.0690 Epoch: 3 Global Step: 42060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:15,080-Speed 3026.49 samples/sec Loss 12.3721 LearningRate 0.0690 Epoch: 3 Global Step: 42070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:18,377-Speed 3107.13 samples/sec Loss 12.5925 LearningRate 0.0690 Epoch: 3 Global Step: 42080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:21,695-Speed 3086.93 samples/sec Loss 12.4077 LearningRate 0.0690 Epoch: 3 Global Step: 42090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:24,994-Speed 3105.03 samples/sec Loss 12.4697 LearningRate 0.0690 Epoch: 3 Global Step: 42100 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:41:28,346-Speed 3055.69 samples/sec Loss 12.3590 LearningRate 0.0690 Epoch: 3 Global Step: 42110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:31,651-Speed 3099.76 samples/sec Loss 12.6028 LearningRate 0.0690 Epoch: 3 Global Step: 42120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:34,973-Speed 3083.53 samples/sec Loss 12.4327 
LearningRate 0.0690 Epoch: 3 Global Step: 42130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:38,343-Speed 3039.11 samples/sec Loss 12.4501 LearningRate 0.0690 Epoch: 3 Global Step: 42140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:41,691-Speed 3059.80 samples/sec Loss 12.3631 LearningRate 0.0689 Epoch: 3 Global Step: 42150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:44,981-Speed 3113.12 samples/sec Loss 12.4425 LearningRate 0.0689 Epoch: 3 Global Step: 42160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:48,285-Speed 3100.14 samples/sec Loss 12.4039 LearningRate 0.0689 Epoch: 3 Global Step: 42170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:51,599-Speed 3091.42 samples/sec Loss 12.4345 LearningRate 0.0689 Epoch: 3 Global Step: 42180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:54,947-Speed 3059.02 samples/sec Loss 12.6959 LearningRate 0.0689 Epoch: 3 Global Step: 42190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:41:58,288-Speed 3065.66 samples/sec Loss 12.4036 LearningRate 0.0689 Epoch: 3 Global Step: 42200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:01,566-Speed 3124.92 samples/sec Loss 12.5196 LearningRate 0.0689 Epoch: 3 Global Step: 42210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:04,890-Speed 3081.47 samples/sec Loss 12.4163 LearningRate 0.0689 Epoch: 3 Global Step: 42220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:08,193-Speed 3101.52 samples/sec Loss 12.3810 LearningRate 0.0689 Epoch: 3 Global Step: 42230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:11,592-Speed 3012.97 samples/sec Loss 12.6546 LearningRate 0.0689 Epoch: 3 Global Step: 42240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:14,979-Speed 3024.06 samples/sec Loss 12.7075 LearningRate 0.0689 Epoch: 3 
Global Step: 42250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:18,295-Speed 3089.66 samples/sec Loss 12.6464 LearningRate 0.0689 Epoch: 3 Global Step: 42260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:21,657-Speed 3046.31 samples/sec Loss 12.5487 LearningRate 0.0689 Epoch: 3 Global Step: 42270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:25,006-Speed 3058.38 samples/sec Loss 12.4510 LearningRate 0.0689 Epoch: 3 Global Step: 42280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:28,297-Speed 3112.28 samples/sec Loss 12.4322 LearningRate 0.0689 Epoch: 3 Global Step: 42290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:31,622-Speed 3081.32 samples/sec Loss 12.4643 LearningRate 0.0688 Epoch: 3 Global Step: 42300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:35,005-Speed 3027.74 samples/sec Loss 12.5288 LearningRate 0.0688 Epoch: 3 Global Step: 42310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:38,346-Speed 3066.17 samples/sec Loss 12.5297 LearningRate 0.0688 Epoch: 3 Global Step: 42320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:41,650-Speed 3100.31 samples/sec Loss 12.3811 LearningRate 0.0688 Epoch: 3 Global Step: 42330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:44,979-Speed 3076.97 samples/sec Loss 12.4794 LearningRate 0.0688 Epoch: 3 Global Step: 42340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:48,355-Speed 3033.95 samples/sec Loss 12.5707 LearningRate 0.0688 Epoch: 3 Global Step: 42350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:51,694-Speed 3066.99 samples/sec Loss 12.3846 LearningRate 0.0688 Epoch: 3 Global Step: 42360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:54,971-Speed 3125.77 samples/sec Loss 12.4327 LearningRate 0.0688 Epoch: 3 Global Step: 42370 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-04-27 05:42:58,316-Speed 3062.44 samples/sec Loss 12.5351 LearningRate 0.0688 Epoch: 3 Global Step: 42380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:01,594-Speed 3124.45 samples/sec Loss 12.5911 LearningRate 0.0688 Epoch: 3 Global Step: 42390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:04,932-Speed 3069.09 samples/sec Loss 12.5524 LearningRate 0.0688 Epoch: 3 Global Step: 42400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:08,314-Speed 3028.35 samples/sec Loss 12.3494 LearningRate 0.0688 Epoch: 3 Global Step: 42410 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:43:11,683-Speed 3040.26 samples/sec Loss 12.3815 LearningRate 0.0688 Epoch: 3 Global Step: 42420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:15,047-Speed 3045.24 samples/sec Loss 12.4485 LearningRate 0.0688 Epoch: 3 Global Step: 42430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:18,441-Speed 3017.88 samples/sec Loss 12.6118 LearningRate 0.0688 Epoch: 3 Global Step: 42440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:21,786-Speed 3062.25 samples/sec Loss 12.4588 LearningRate 0.0687 Epoch: 3 Global Step: 42450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:25,168-Speed 3029.45 samples/sec Loss 12.4744 LearningRate 0.0687 Epoch: 3 Global Step: 42460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:28,543-Speed 3034.62 samples/sec Loss 12.3774 LearningRate 0.0687 Epoch: 3 Global Step: 42470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:31,866-Speed 3082.29 samples/sec Loss 12.5490 LearningRate 0.0687 Epoch: 3 Global Step: 42480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:43:35,119-Speed 3148.92 samples/sec Loss 12.6097 LearningRate 0.0687 Epoch: 3 Global Step: 42490 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-27 05:43:38,493-Speed 3035.68 samples/sec Loss 12.4364 LearningRate 0.0687 Epoch: 3 Global Step: 42500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:43:41,804-Speed 3094.50 samples/sec Loss 12.5154 LearningRate 0.0687 Epoch: 3 Global Step: 42510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:43:45,150-Speed 3060.59 samples/sec Loss 12.3613 LearningRate 0.0687 Epoch: 3 Global Step: 42520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:43:48,462-Speed 3093.18 samples/sec Loss 12.4787 LearningRate 0.0687 Epoch: 3 Global Step: 42530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:43:51,795-Speed 3072.41 samples/sec Loss 12.5066 LearningRate 0.0687 Epoch: 3 Global Step: 42540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:43:55,161-Speed 3043.48 samples/sec Loss 12.5179 LearningRate 0.0687 Epoch: 3 Global Step: 42550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:43:58,501-Speed 3067.22 samples/sec Loss 12.4877 LearningRate 0.0687 Epoch: 3 Global Step: 42560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:44:01,864-Speed 3045.31 samples/sec Loss 12.3858 LearningRate 0.0687 Epoch: 3 Global Step: 42570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:44:05,215-Speed 3057.17 samples/sec Loss 12.4607 LearningRate 0.0687 Epoch: 3 Global Step: 42580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:44:08,530-Speed 3089.92 samples/sec Loss 12.4162 LearningRate 0.0687 Epoch: 3 Global Step: 42590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:11,812-Speed 3120.88 samples/sec Loss 12.4441 LearningRate 0.0686 Epoch: 3 Global Step: 42600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:15,199-Speed 3024.04 samples/sec Loss 12.3814 LearningRate 0.0686 Epoch: 3 Global Step: 42610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:18,569-Speed 
3039.37 samples/sec Loss 12.4682 LearningRate 0.0686 Epoch: 3 Global Step: 42620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:21,922-Speed 3055.15 samples/sec Loss 12.5228 LearningRate 0.0686 Epoch: 3 Global Step: 42630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:25,308-Speed 3024.63 samples/sec Loss 12.2618 LearningRate 0.0686 Epoch: 3 Global Step: 42640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:28,684-Speed 3034.54 samples/sec Loss 12.4633 LearningRate 0.0686 Epoch: 3 Global Step: 42650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:31,984-Speed 3104.45 samples/sec Loss 12.4418 LearningRate 0.0686 Epoch: 3 Global Step: 42660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:35,269-Speed 3117.31 samples/sec Loss 12.3706 LearningRate 0.0686 Epoch: 3 Global Step: 42670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:38,587-Speed 3087.85 samples/sec Loss 12.4898 LearningRate 0.0686 Epoch: 3 Global Step: 42680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:41,991-Speed 3008.59 samples/sec Loss 12.4300 LearningRate 0.0686 Epoch: 3 Global Step: 42690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:45,314-Speed 3082.07 samples/sec Loss 12.4032 LearningRate 0.0686 Epoch: 3 Global Step: 42700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:48,649-Speed 3071.88 samples/sec Loss 12.5091 LearningRate 0.0686 Epoch: 3 Global Step: 42710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:51,963-Speed 3090.72 samples/sec Loss 12.4249 LearningRate 0.0686 Epoch: 3 Global Step: 42720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:55,291-Speed 3077.74 samples/sec Loss 12.3208 LearningRate 0.0686 Epoch: 3 Global Step: 42730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:44:58,640-Speed 3060.73 samples/sec Loss 
12.4105 LearningRate 0.0686 Epoch: 3 Global Step: 42740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:01,962-Speed 3083.45 samples/sec Loss 12.4678 LearningRate 0.0685 Epoch: 3 Global Step: 42750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:05,333-Speed 3038.59 samples/sec Loss 12.3278 LearningRate 0.0685 Epoch: 3 Global Step: 42760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:08,620-Speed 3116.23 samples/sec Loss 12.5075 LearningRate 0.0685 Epoch: 3 Global Step: 42770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:11,934-Speed 3090.15 samples/sec Loss 12.4322 LearningRate 0.0685 Epoch: 3 Global Step: 42780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:15,229-Speed 3109.70 samples/sec Loss 12.3752 LearningRate 0.0685 Epoch: 3 Global Step: 42790 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:45:18,519-Speed 3113.05 samples/sec Loss 12.3770 LearningRate 0.0685 Epoch: 3 Global Step: 42800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:21,911-Speed 3019.74 samples/sec Loss 12.4023 LearningRate 0.0685 Epoch: 3 Global Step: 42810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:25,271-Speed 3048.64 samples/sec Loss 12.6107 LearningRate 0.0685 Epoch: 3 Global Step: 42820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:28,666-Speed 3017.81 samples/sec Loss 12.4023 LearningRate 0.0685 Epoch: 3 Global Step: 42830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:32,039-Speed 3036.26 samples/sec Loss 12.4476 LearningRate 0.0685 Epoch: 3 Global Step: 42840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:35,366-Speed 3078.91 samples/sec Loss 12.5108 LearningRate 0.0685 Epoch: 3 Global Step: 42850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:38,746-Speed 3030.59 samples/sec Loss 12.4251 LearningRate 0.0685 
Epoch: 3 Global Step: 42860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:45:42,076-Speed 3076.97 samples/sec Loss 12.3388 LearningRate 0.0685 Epoch: 3 Global Step: 42870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:45:45,440-Speed 3044.59 samples/sec Loss 12.3593 LearningRate 0.0685 Epoch: 3 Global Step: 42880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:45:48,731-Speed 3112.15 samples/sec Loss 12.5631 LearningRate 0.0685 Epoch: 3 Global Step: 42890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:45:52,045-Speed 3091.39 samples/sec Loss 12.3737 LearningRate 0.0684 Epoch: 3 Global Step: 42900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:45:55,358-Speed 3091.20 samples/sec Loss 12.2157 LearningRate 0.0684 Epoch: 3 Global Step: 42910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:45:58,686-Speed 3079.77 samples/sec Loss 12.4303 LearningRate 0.0684 Epoch: 3 Global Step: 42920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:46:02,012-Speed 3080.22 samples/sec Loss 12.3957 LearningRate 0.0684 Epoch: 3 Global Step: 42930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:46:05,341-Speed 3076.70 samples/sec Loss 12.4276 LearningRate 0.0684 Epoch: 3 Global Step: 42940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:46:08,692-Speed 3057.14 samples/sec Loss 12.5606 LearningRate 0.0684 Epoch: 3 Global Step: 42950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:46:12,063-Speed 3040.31 samples/sec Loss 12.4345 LearningRate 0.0684 Epoch: 3 Global Step: 42960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:46:15,385-Speed 3083.40 samples/sec Loss 12.3115 LearningRate 0.0684 Epoch: 3 Global Step: 42970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:18,746-Speed 3047.35 samples/sec Loss 12.4215 LearningRate 0.0684 Epoch: 3 Global Step: 42980 Fp16 Grad 
Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:22,130-Speed 3026.95 samples/sec Loss 12.3781 LearningRate 0.0684 Epoch: 3 Global Step: 42990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:25,446-Speed 3088.79 samples/sec Loss 12.3317 LearningRate 0.0684 Epoch: 3 Global Step: 43000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:28,773-Speed 3079.02 samples/sec Loss 12.5736 LearningRate 0.0684 Epoch: 3 Global Step: 43010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:32,052-Speed 3123.81 samples/sec Loss 12.4130 LearningRate 0.0684 Epoch: 3 Global Step: 43020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:35,333-Speed 3122.17 samples/sec Loss 12.4619 LearningRate 0.0684 Epoch: 3 Global Step: 43030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:38,695-Speed 3046.47 samples/sec Loss 12.5018 LearningRate 0.0684 Epoch: 3 Global Step: 43040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:42,074-Speed 3031.31 samples/sec Loss 12.4018 LearningRate 0.0683 Epoch: 3 Global Step: 43050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:45,374-Speed 3104.54 samples/sec Loss 12.4346 LearningRate 0.0683 Epoch: 3 Global Step: 43060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:48,659-Speed 3118.10 samples/sec Loss 12.2570 LearningRate 0.0683 Epoch: 3 Global Step: 43070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:52,020-Speed 3047.01 samples/sec Loss 12.5681 LearningRate 0.0683 Epoch: 3 Global Step: 43080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:55,338-Speed 3087.37 samples/sec Loss 12.4149 LearningRate 0.0683 Epoch: 3 Global Step: 43090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:46:58,648-Speed 3094.87 samples/sec Loss 12.3969 LearningRate 0.0683 Epoch: 3 Global Step: 43100 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-04-27 05:47:01,958-Speed 3094.49 samples/sec Loss 12.3858 LearningRate 0.0683 Epoch: 3 Global Step: 43110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:05,273-Speed 3090.43 samples/sec Loss 12.4800 LearningRate 0.0683 Epoch: 3 Global Step: 43120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:08,626-Speed 3054.00 samples/sec Loss 12.3393 LearningRate 0.0683 Epoch: 3 Global Step: 43130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:11,943-Speed 3088.18 samples/sec Loss 12.4256 LearningRate 0.0683 Epoch: 3 Global Step: 43140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:15,240-Speed 3107.45 samples/sec Loss 12.5309 LearningRate 0.0683 Epoch: 3 Global Step: 43150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:18,572-Speed 3073.19 samples/sec Loss 12.5212 LearningRate 0.0683 Epoch: 3 Global Step: 43160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:21,999-Speed 2989.41 samples/sec Loss 12.4094 LearningRate 0.0683 Epoch: 3 Global Step: 43170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:25,306-Speed 3097.50 samples/sec Loss 12.4150 LearningRate 0.0683 Epoch: 3 Global Step: 43180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:28,624-Speed 3087.18 samples/sec Loss 12.2855 LearningRate 0.0683 Epoch: 3 Global Step: 43190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:31,939-Speed 3089.56 samples/sec Loss 12.2974 LearningRate 0.0682 Epoch: 3 Global Step: 43200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:35,255-Speed 3089.10 samples/sec Loss 12.2374 LearningRate 0.0682 Epoch: 3 Global Step: 43210 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:47:38,600-Speed 3061.97 samples/sec Loss 12.3599 LearningRate 0.0682 Epoch: 3 Global Step: 43220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 
05:47:41,901-Speed 3103.39 samples/sec Loss 12.4862 LearningRate 0.0682 Epoch: 3 Global Step: 43230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:45,185-Speed 3119.09 samples/sec Loss 12.4483 LearningRate 0.0682 Epoch: 3 Global Step: 43240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:48,508-Speed 3083.00 samples/sec Loss 12.4753 LearningRate 0.0682 Epoch: 3 Global Step: 43250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:51,852-Speed 3062.93 samples/sec Loss 12.4780 LearningRate 0.0682 Epoch: 3 Global Step: 43260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:55,159-Speed 3097.81 samples/sec Loss 12.4644 LearningRate 0.0682 Epoch: 3 Global Step: 43270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:47:58,473-Speed 3090.67 samples/sec Loss 12.2041 LearningRate 0.0682 Epoch: 3 Global Step: 43280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:01,775-Speed 3102.25 samples/sec Loss 12.3300 LearningRate 0.0682 Epoch: 3 Global Step: 43290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:05,090-Speed 3089.46 samples/sec Loss 12.3596 LearningRate 0.0682 Epoch: 3 Global Step: 43300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:08,426-Speed 3071.26 samples/sec Loss 12.4623 LearningRate 0.0682 Epoch: 3 Global Step: 43310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:11,699-Speed 3129.07 samples/sec Loss 12.4280 LearningRate 0.0682 Epoch: 3 Global Step: 43320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:15,003-Speed 3100.37 samples/sec Loss 12.3562 LearningRate 0.0682 Epoch: 3 Global Step: 43330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:18,357-Speed 3053.50 samples/sec Loss 12.3980 LearningRate 0.0682 Epoch: 3 Global Step: 43340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:21,659-Speed 3101.97 
samples/sec Loss 12.4919 LearningRate 0.0681 Epoch: 3 Global Step: 43350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:24,975-Speed 3089.41 samples/sec Loss 12.4273 LearningRate 0.0681 Epoch: 3 Global Step: 43360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:28,342-Speed 3041.80 samples/sec Loss 12.3533 LearningRate 0.0681 Epoch: 3 Global Step: 43370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:31,663-Speed 3085.27 samples/sec Loss 12.3494 LearningRate 0.0681 Epoch: 3 Global Step: 43380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:34,989-Speed 3079.02 samples/sec Loss 12.3811 LearningRate 0.0681 Epoch: 3 Global Step: 43390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:38,349-Speed 3049.32 samples/sec Loss 12.4373 LearningRate 0.0681 Epoch: 3 Global Step: 43400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:41,750-Speed 3011.43 samples/sec Loss 12.2182 LearningRate 0.0681 Epoch: 3 Global Step: 43410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:45,029-Speed 3123.88 samples/sec Loss 12.1821 LearningRate 0.0681 Epoch: 3 Global Step: 43420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:48,323-Speed 3108.90 samples/sec Loss 12.3016 LearningRate 0.0681 Epoch: 3 Global Step: 43430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:51,667-Speed 3063.62 samples/sec Loss 12.5779 LearningRate 0.0681 Epoch: 3 Global Step: 43440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:54,983-Speed 3088.61 samples/sec Loss 12.4481 LearningRate 0.0681 Epoch: 3 Global Step: 43450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:48:58,360-Speed 3033.21 samples/sec Loss 12.2661 LearningRate 0.0681 Epoch: 3 Global Step: 43460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:01,718-Speed 3050.82 samples/sec Loss 12.3220 
LearningRate 0.0681 Epoch: 3 Global Step: 43470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:04,996-Speed 3124.70 samples/sec Loss 12.2238 LearningRate 0.0681 Epoch: 3 Global Step: 43480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:08,306-Speed 3094.66 samples/sec Loss 12.4699 LearningRate 0.0681 Epoch: 3 Global Step: 43490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:11,670-Speed 3044.60 samples/sec Loss 12.3322 LearningRate 0.0680 Epoch: 3 Global Step: 43500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:14,981-Speed 3094.32 samples/sec Loss 12.3565 LearningRate 0.0680 Epoch: 3 Global Step: 43510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:18,347-Speed 3043.02 samples/sec Loss 12.4391 LearningRate 0.0680 Epoch: 3 Global Step: 43520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:21,730-Speed 3027.25 samples/sec Loss 12.3698 LearningRate 0.0680 Epoch: 3 Global Step: 43530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:25,041-Speed 3093.94 samples/sec Loss 12.4238 LearningRate 0.0680 Epoch: 3 Global Step: 43540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:28,381-Speed 3066.90 samples/sec Loss 12.3941 LearningRate 0.0680 Epoch: 3 Global Step: 43550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:31,645-Speed 3137.73 samples/sec Loss 12.2798 LearningRate 0.0680 Epoch: 3 Global Step: 43560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:34,993-Speed 3059.80 samples/sec Loss 12.4989 LearningRate 0.0680 Epoch: 3 Global Step: 43570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:49:38,289-Speed 3107.92 samples/sec Loss 12.4030 LearningRate 0.0680 Epoch: 3 Global Step: 43580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:41,634-Speed 3061.93 samples/sec Loss 12.5533 LearningRate 0.0680 Epoch: 3 Global Step: 
43590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:44,907-Speed 3130.07 samples/sec Loss 12.4488 LearningRate 0.0680 Epoch: 3 Global Step: 43600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:48,256-Speed 3057.89 samples/sec Loss 12.4151 LearningRate 0.0680 Epoch: 3 Global Step: 43610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:51,568-Speed 3092.73 samples/sec Loss 12.4607 LearningRate 0.0680 Epoch: 3 Global Step: 43620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:55,024-Speed 2964.29 samples/sec Loss 12.4766 LearningRate 0.0680 Epoch: 3 Global Step: 43630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:49:58,371-Speed 3059.95 samples/sec Loss 12.3626 LearningRate 0.0680 Epoch: 3 Global Step: 43640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:01,691-Speed 3085.83 samples/sec Loss 12.3715 LearningRate 0.0679 Epoch: 3 Global Step: 43650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:05,163-Speed 2950.13 samples/sec Loss 12.3991 LearningRate 0.0679 Epoch: 3 Global Step: 43660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:08,508-Speed 3062.14 samples/sec Loss 12.2719 LearningRate 0.0679 Epoch: 3 Global Step: 43670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:11,825-Speed 3087.71 samples/sec Loss 12.2914 LearningRate 0.0679 Epoch: 3 Global Step: 43680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:15,217-Speed 3020.27 samples/sec Loss 12.3624 LearningRate 0.0679 Epoch: 3 Global Step: 43690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:18,552-Speed 3070.86 samples/sec Loss 12.3573 LearningRate 0.0679 Epoch: 3 Global Step: 43700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:21,940-Speed 3023.63 samples/sec Loss 12.5662 LearningRate 0.0679 Epoch: 3 Global Step: 43710 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-27 05:50:25,263-Speed 3082.80 samples/sec Loss 12.4836 LearningRate 0.0679 Epoch: 3 Global Step: 43720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:28,653-Speed 3021.78 samples/sec Loss 12.4643 LearningRate 0.0679 Epoch: 3 Global Step: 43730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:50:32,003-Speed 3057.39 samples/sec Loss 12.3458 LearningRate 0.0679 Epoch: 3 Global Step: 43740 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:35,365-Speed 3047.35 samples/sec Loss 12.4674 LearningRate 0.0679 Epoch: 3 Global Step: 43750 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:38,677-Speed 3092.26 samples/sec Loss 12.4189 LearningRate 0.0679 Epoch: 3 Global Step: 43760 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:42,026-Speed 3058.71 samples/sec Loss 12.3055 LearningRate 0.0679 Epoch: 3 Global Step: 43770 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:45,386-Speed 3047.97 samples/sec Loss 12.3029 LearningRate 0.0679 Epoch: 3 Global Step: 43780 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:48,820-Speed 2983.31 samples/sec Loss 12.3614 LearningRate 0.0679 Epoch: 3 Global Step: 43790 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:52,099-Speed 3123.63 samples/sec Loss 12.2863 LearningRate 0.0678 Epoch: 3 Global Step: 43800 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:55,466-Speed 3042.40 samples/sec Loss 12.3672 LearningRate 0.0678 Epoch: 3 Global Step: 43810 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:50:58,853-Speed 3023.88 samples/sec Loss 12.3590 LearningRate 0.0678 Epoch: 3 Global Step: 43820 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 05:51:02,225-Speed 3037.79 samples/sec Loss 12.3355 LearningRate 0.0678 Epoch: 3 Global Step: 43830 Fp16 Grad Scale: 32768 Required: 19 hours Training: 2022-04-27 
05:51:05,597-Speed 3037.82 samples/sec Loss 12.2756 LearningRate 0.0678 Epoch: 3 Global Step: 43840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:08,926-Speed 3077.39 samples/sec Loss 12.2757 LearningRate 0.0678 Epoch: 3 Global Step: 43850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:12,240-Speed 3090.28 samples/sec Loss 12.3935 LearningRate 0.0678 Epoch: 3 Global Step: 43860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:15,578-Speed 3068.39 samples/sec Loss 12.3286 LearningRate 0.0678 Epoch: 3 Global Step: 43870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:18,996-Speed 2997.41 samples/sec Loss 12.3514 LearningRate 0.0678 Epoch: 3 Global Step: 43880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:22,351-Speed 3053.13 samples/sec Loss 12.3303 LearningRate 0.0678 Epoch: 3 Global Step: 43890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:25,818-Speed 2954.04 samples/sec Loss 12.3417 LearningRate 0.0678 Epoch: 3 Global Step: 43900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:29,186-Speed 3041.21 samples/sec Loss 12.3248 LearningRate 0.0678 Epoch: 3 Global Step: 43910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:32,542-Speed 3052.44 samples/sec Loss 12.3415 LearningRate 0.0678 Epoch: 3 Global Step: 43920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:35,946-Speed 3008.86 samples/sec Loss 12.4423 LearningRate 0.0678 Epoch: 3 Global Step: 43930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:51:39,338-Speed 3019.58 samples/sec Loss 12.2954 LearningRate 0.0678 Epoch: 3 Global Step: 43940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:51:42,765-Speed 2989.73 samples/sec Loss 12.3772 LearningRate 0.0677 Epoch: 3 Global Step: 43950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:51:46,130-Speed 3044.12 samples/sec 
Loss 12.4305 LearningRate 0.0677 Epoch: 3 Global Step: 43960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:51:49,491-Speed 3047.51 samples/sec Loss 12.3943 LearningRate 0.0677 Epoch: 3 Global Step: 43970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:51:52,790-Speed 3104.83 samples/sec Loss 12.4189 LearningRate 0.0677 Epoch: 3 Global Step: 43980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:51:56,065-Speed 3127.90 samples/sec Loss 12.4352 LearningRate 0.0677 Epoch: 3 Global Step: 43990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:51:59,425-Speed 3048.29 samples/sec Loss 12.2694 LearningRate 0.0677 Epoch: 3 Global Step: 44000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:02,744-Speed 3087.17 samples/sec Loss 12.3484 LearningRate 0.0677 Epoch: 3 Global Step: 44010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:06,095-Speed 3057.01 samples/sec Loss 12.3098 LearningRate 0.0677 Epoch: 3 Global Step: 44020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:09,460-Speed 3043.47 samples/sec Loss 12.3853 LearningRate 0.0677 Epoch: 3 Global Step: 44030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:12,747-Speed 3116.40 samples/sec Loss 12.2673 LearningRate 0.0677 Epoch: 3 Global Step: 44040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:16,021-Speed 3128.59 samples/sec Loss 12.2009 LearningRate 0.0677 Epoch: 3 Global Step: 44050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:19,356-Speed 3071.56 samples/sec Loss 12.4004 LearningRate 0.0677 Epoch: 3 Global Step: 44060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:22,712-Speed 3052.79 samples/sec Loss 12.2869 LearningRate 0.0677 Epoch: 3 Global Step: 44070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:26,026-Speed 3090.90 samples/sec Loss 12.3020 LearningRate 
0.0677 Epoch: 3 Global Step: 44080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:29,336-Speed 3094.49 samples/sec Loss 12.4433 LearningRate 0.0677 Epoch: 3 Global Step: 44090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:32,686-Speed 3057.83 samples/sec Loss 12.4162 LearningRate 0.0676 Epoch: 3 Global Step: 44100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:36,033-Speed 3060.03 samples/sec Loss 12.2311 LearningRate 0.0676 Epoch: 3 Global Step: 44110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:39,444-Speed 3003.14 samples/sec Loss 12.3081 LearningRate 0.0676 Epoch: 3 Global Step: 44120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:42,750-Speed 3098.38 samples/sec Loss 12.2695 LearningRate 0.0676 Epoch: 3 Global Step: 44130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:46,055-Speed 3099.24 samples/sec Loss 12.2862 LearningRate 0.0676 Epoch: 3 Global Step: 44140 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:52:49,327-Speed 3130.63 samples/sec Loss 12.3190 LearningRate 0.0676 Epoch: 3 Global Step: 44150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:52,689-Speed 3046.92 samples/sec Loss 12.3968 LearningRate 0.0676 Epoch: 3 Global Step: 44160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:56,023-Speed 3071.86 samples/sec Loss 12.2713 LearningRate 0.0676 Epoch: 3 Global Step: 44170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:52:59,324-Speed 3103.30 samples/sec Loss 12.4444 LearningRate 0.0676 Epoch: 3 Global Step: 44180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:02,660-Speed 3071.12 samples/sec Loss 12.3069 LearningRate 0.0676 Epoch: 3 Global Step: 44190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:06,003-Speed 3063.95 samples/sec Loss 12.5187 LearningRate 0.0676 Epoch: 3 Global Step: 
44200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:09,321-Speed 3086.80 samples/sec Loss 12.4271 LearningRate 0.0676 Epoch: 3 Global Step: 44210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:12,683-Speed 3046.52 samples/sec Loss 12.2584 LearningRate 0.0676 Epoch: 3 Global Step: 44220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:16,028-Speed 3062.63 samples/sec Loss 12.1407 LearningRate 0.0676 Epoch: 3 Global Step: 44230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:19,318-Speed 3113.47 samples/sec Loss 12.4616 LearningRate 0.0676 Epoch: 3 Global Step: 44240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:22,638-Speed 3085.45 samples/sec Loss 12.4835 LearningRate 0.0675 Epoch: 3 Global Step: 44250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:26,026-Speed 3022.61 samples/sec Loss 12.4200 LearningRate 0.0675 Epoch: 3 Global Step: 44260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:29,387-Speed 3048.43 samples/sec Loss 12.3048 LearningRate 0.0675 Epoch: 3 Global Step: 44270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:32,641-Speed 3147.81 samples/sec Loss 12.2918 LearningRate 0.0675 Epoch: 3 Global Step: 44280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:35,952-Speed 3093.23 samples/sec Loss 12.1901 LearningRate 0.0675 Epoch: 3 Global Step: 44290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:39,261-Speed 3096.24 samples/sec Loss 12.5132 LearningRate 0.0675 Epoch: 3 Global Step: 44300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:42,628-Speed 3041.48 samples/sec Loss 12.3937 LearningRate 0.0675 Epoch: 3 Global Step: 44310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:46,012-Speed 3027.32 samples/sec Loss 12.3020 LearningRate 0.0675 Epoch: 3 Global Step: 44320 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-27 05:53:49,428-Speed 2998.49 samples/sec Loss 12.2844 LearningRate 0.0675 Epoch: 3 Global Step: 44330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:52,732-Speed 3100.33 samples/sec Loss 12.3277 LearningRate 0.0675 Epoch: 3 Global Step: 44340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:53:56,043-Speed 3093.10 samples/sec Loss 12.3226 LearningRate 0.0675 Epoch: 3 Global Step: 44350 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:53:59,314-Speed 3131.42 samples/sec Loss 12.3949 LearningRate 0.0675 Epoch: 3 Global Step: 44360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:02,632-Speed 3087.23 samples/sec Loss 12.3457 LearningRate 0.0675 Epoch: 3 Global Step: 44370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:05,900-Speed 3134.31 samples/sec Loss 12.3097 LearningRate 0.0675 Epoch: 3 Global Step: 44380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:09,197-Speed 3107.51 samples/sec Loss 12.3155 LearningRate 0.0675 Epoch: 3 Global Step: 44390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:12,517-Speed 3084.27 samples/sec Loss 12.3125 LearningRate 0.0674 Epoch: 3 Global Step: 44400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:15,957-Speed 2978.18 samples/sec Loss 12.4556 LearningRate 0.0674 Epoch: 3 Global Step: 44410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:19,313-Speed 3051.82 samples/sec Loss 12.2535 LearningRate 0.0674 Epoch: 3 Global Step: 44420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:22,682-Speed 3040.74 samples/sec Loss 12.4553 LearningRate 0.0674 Epoch: 3 Global Step: 44430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:26,007-Speed 3080.55 samples/sec Loss 12.2148 LearningRate 0.0674 Epoch: 3 Global Step: 44440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-27 05:54:29,311-Speed 3099.70 samples/sec Loss 12.2004 LearningRate 0.0674 Epoch: 3 Global Step: 44450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:32,638-Speed 3079.12 samples/sec Loss 12.3082 LearningRate 0.0674 Epoch: 3 Global Step: 44460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:36,006-Speed 3041.21 samples/sec Loss 12.3827 LearningRate 0.0674 Epoch: 3 Global Step: 44470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:39,307-Speed 3103.48 samples/sec Loss 12.2846 LearningRate 0.0674 Epoch: 3 Global Step: 44480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:42,608-Speed 3102.27 samples/sec Loss 12.2859 LearningRate 0.0674 Epoch: 3 Global Step: 44490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:45,967-Speed 3049.52 samples/sec Loss 12.3547 LearningRate 0.0674 Epoch: 3 Global Step: 44500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:49,384-Speed 2997.89 samples/sec Loss 12.2469 LearningRate 0.0674 Epoch: 3 Global Step: 44510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:52,701-Speed 3087.56 samples/sec Loss 12.3763 LearningRate 0.0674 Epoch: 3 Global Step: 44520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:54:56,002-Speed 3103.47 samples/sec Loss 12.3060 LearningRate 0.0674 Epoch: 3 Global Step: 44530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:54:59,290-Speed 3114.84 samples/sec Loss 12.1564 LearningRate 0.0674 Epoch: 3 Global Step: 44540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:02,597-Speed 3097.19 samples/sec Loss 12.2491 LearningRate 0.0673 Epoch: 3 Global Step: 44550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:06,013-Speed 2999.02 samples/sec Loss 12.4150 LearningRate 0.0673 Epoch: 3 Global Step: 44560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:09,364-Speed 
3056.61 samples/sec Loss 12.4344 LearningRate 0.0673 Epoch: 3 Global Step: 44570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:12,733-Speed 3040.34 samples/sec Loss 12.2977 LearningRate 0.0673 Epoch: 3 Global Step: 44580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:16,075-Speed 3065.14 samples/sec Loss 12.4036 LearningRate 0.0673 Epoch: 3 Global Step: 44590 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:19,401-Speed 3079.13 samples/sec Loss 12.3158 LearningRate 0.0673 Epoch: 3 Global Step: 44600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:22,735-Speed 3072.75 samples/sec Loss 12.1688 LearningRate 0.0673 Epoch: 3 Global Step: 44610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:26,073-Speed 3068.35 samples/sec Loss 12.4049 LearningRate 0.0673 Epoch: 3 Global Step: 44620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:55:29,383-Speed 3094.19 samples/sec Loss 12.2311 LearningRate 0.0673 Epoch: 3 Global Step: 44630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:32,776-Speed 3018.82 samples/sec Loss 12.3083 LearningRate 0.0673 Epoch: 3 Global Step: 44640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:36,103-Speed 3078.72 samples/sec Loss 12.3306 LearningRate 0.0673 Epoch: 3 Global Step: 44650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:39,446-Speed 3064.27 samples/sec Loss 12.3022 LearningRate 0.0673 Epoch: 3 Global Step: 44660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:42,802-Speed 3052.11 samples/sec Loss 12.4138 LearningRate 0.0673 Epoch: 3 Global Step: 44670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:46,133-Speed 3074.78 samples/sec Loss 12.2831 LearningRate 0.0673 Epoch: 3 Global Step: 44680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:49,412-Speed 3124.33 samples/sec Loss 12.3206 
LearningRate 0.0673 Epoch: 3 Global Step: 44690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:52,714-Speed 3101.92 samples/sec Loss 12.4565 LearningRate 0.0673 Epoch: 3 Global Step: 44700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:56,040-Speed 3080.22 samples/sec Loss 12.2082 LearningRate 0.0672 Epoch: 3 Global Step: 44710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:55:59,371-Speed 3074.52 samples/sec Loss 12.2257 LearningRate 0.0672 Epoch: 3 Global Step: 44720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:02,721-Speed 3057.41 samples/sec Loss 12.2184 LearningRate 0.0672 Epoch: 3 Global Step: 44730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:05,997-Speed 3126.98 samples/sec Loss 12.2583 LearningRate 0.0672 Epoch: 3 Global Step: 44740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:09,359-Speed 3046.85 samples/sec Loss 12.4327 LearningRate 0.0672 Epoch: 3 Global Step: 44750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:12,720-Speed 3047.51 samples/sec Loss 12.4521 LearningRate 0.0672 Epoch: 3 Global Step: 44760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:16,064-Speed 3063.03 samples/sec Loss 12.3670 LearningRate 0.0672 Epoch: 3 Global Step: 44770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:19,412-Speed 3059.88 samples/sec Loss 12.3522 LearningRate 0.0672 Epoch: 3 Global Step: 44780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:22,689-Speed 3124.97 samples/sec Loss 12.1413 LearningRate 0.0672 Epoch: 3 Global Step: 44790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:26,026-Speed 3069.80 samples/sec Loss 12.3570 LearningRate 0.0672 Epoch: 3 Global Step: 44800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:29,326-Speed 3104.50 samples/sec Loss 12.3182 LearningRate 0.0672 Epoch: 3 Global 
Step: 44810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:56:32,626-Speed 3104.19 samples/sec Loss 12.3513 LearningRate 0.0672 Epoch: 3 Global Step: 44820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:35,932-Speed 3098.64 samples/sec Loss 12.2420 LearningRate 0.0672 Epoch: 3 Global Step: 44830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:39,269-Speed 3069.20 samples/sec Loss 12.3663 LearningRate 0.0672 Epoch: 3 Global Step: 44840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:42,579-Speed 3094.91 samples/sec Loss 12.2617 LearningRate 0.0672 Epoch: 3 Global Step: 44850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:45,905-Speed 3079.72 samples/sec Loss 12.2341 LearningRate 0.0671 Epoch: 3 Global Step: 44860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:49,232-Speed 3077.79 samples/sec Loss 12.3828 LearningRate 0.0671 Epoch: 3 Global Step: 44870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:52,592-Speed 3048.86 samples/sec Loss 12.3646 LearningRate 0.0671 Epoch: 3 Global Step: 44880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:55,874-Speed 3120.69 samples/sec Loss 12.3610 LearningRate 0.0671 Epoch: 3 Global Step: 44890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:56:59,205-Speed 3075.29 samples/sec Loss 12.3220 LearningRate 0.0671 Epoch: 3 Global Step: 44900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:02,466-Speed 3141.24 samples/sec Loss 12.2149 LearningRate 0.0671 Epoch: 3 Global Step: 44910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:05,829-Speed 3045.85 samples/sec Loss 12.2787 LearningRate 0.0671 Epoch: 3 Global Step: 44920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:09,105-Speed 3126.35 samples/sec Loss 12.2565 LearningRate 0.0671 Epoch: 3 Global Step: 44930 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-04-27 05:57:12,486-Speed 3029.61 samples/sec Loss 12.2046 LearningRate 0.0671 Epoch: 3 Global Step: 44940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:15,784-Speed 3106.52 samples/sec Loss 12.3575 LearningRate 0.0671 Epoch: 3 Global Step: 44950 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:19,098-Speed 3090.63 samples/sec Loss 12.2137 LearningRate 0.0671 Epoch: 3 Global Step: 44960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:22,418-Speed 3085.71 samples/sec Loss 12.2353 LearningRate 0.0671 Epoch: 3 Global Step: 44970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:25,773-Speed 3052.70 samples/sec Loss 12.2729 LearningRate 0.0671 Epoch: 3 Global Step: 44980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:29,093-Speed 3085.59 samples/sec Loss 12.2524 LearningRate 0.0671 Epoch: 3 Global Step: 44990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:32,374-Speed 3121.26 samples/sec Loss 12.2103 LearningRate 0.0671 Epoch: 3 Global Step: 45000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:57:35,621-Speed 3154.76 samples/sec Loss 12.2746 LearningRate 0.0670 Epoch: 3 Global Step: 45010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:38,971-Speed 3058.57 samples/sec Loss 12.2340 LearningRate 0.0670 Epoch: 3 Global Step: 45020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:42,330-Speed 3049.61 samples/sec Loss 12.2470 LearningRate 0.0670 Epoch: 3 Global Step: 45030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:45,679-Speed 3059.29 samples/sec Loss 12.2890 LearningRate 0.0670 Epoch: 3 Global Step: 45040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:48,959-Speed 3124.53 samples/sec Loss 12.1206 LearningRate 0.0670 Epoch: 3 Global Step: 45050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 
05:57:52,270-Speed 3094.09 samples/sec Loss 12.1884 LearningRate 0.0670 Epoch: 3 Global Step: 45060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:55,656-Speed 3025.72 samples/sec Loss 12.2108 LearningRate 0.0670 Epoch: 3 Global Step: 45070 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:57:59,008-Speed 3055.23 samples/sec Loss 12.3746 LearningRate 0.0670 Epoch: 3 Global Step: 45080 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:02,403-Speed 3017.94 samples/sec Loss 12.1899 LearningRate 0.0670 Epoch: 3 Global Step: 45090 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:05,710-Speed 3097.37 samples/sec Loss 12.2396 LearningRate 0.0670 Epoch: 3 Global Step: 45100 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:09,020-Speed 3094.34 samples/sec Loss 12.3952 LearningRate 0.0670 Epoch: 3 Global Step: 45110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:12,324-Speed 3099.84 samples/sec Loss 12.2921 LearningRate 0.0670 Epoch: 3 Global Step: 45120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:15,648-Speed 3081.71 samples/sec Loss 12.3781 LearningRate 0.0670 Epoch: 3 Global Step: 45130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:18,954-Speed 3098.10 samples/sec Loss 12.2428 LearningRate 0.0670 Epoch: 3 Global Step: 45140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:22,295-Speed 3066.31 samples/sec Loss 12.2178 LearningRate 0.0670 Epoch: 3 Global Step: 45150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:25,651-Speed 3052.02 samples/sec Loss 12.1595 LearningRate 0.0669 Epoch: 3 Global Step: 45160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:28,962-Speed 3093.57 samples/sec Loss 12.1822 LearningRate 0.0669 Epoch: 3 Global Step: 45170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:32,273-Speed 3093.42 
samples/sec Loss 12.2966 LearningRate 0.0669 Epoch: 3 Global Step: 45180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:35,641-Speed 3041.97 samples/sec Loss 12.3516 LearningRate 0.0669 Epoch: 3 Global Step: 45190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:38,977-Speed 3069.70 samples/sec Loss 12.2980 LearningRate 0.0669 Epoch: 3 Global Step: 45200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:58:42,272-Speed 3109.23 samples/sec Loss 12.1896 LearningRate 0.0669 Epoch: 3 Global Step: 45210 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 05:58:45,593-Speed 3085.41 samples/sec Loss 12.3391 LearningRate 0.0669 Epoch: 3 Global Step: 45220 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:49,061-Speed 2952.98 samples/sec Loss 12.3223 LearningRate 0.0669 Epoch: 3 Global Step: 45230 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:52,434-Speed 3036.65 samples/sec Loss 12.3270 LearningRate 0.0669 Epoch: 3 Global Step: 45240 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:55,805-Speed 3039.41 samples/sec Loss 12.1968 LearningRate 0.0669 Epoch: 3 Global Step: 45250 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:58:59,175-Speed 3038.72 samples/sec Loss 12.1952 LearningRate 0.0669 Epoch: 3 Global Step: 45260 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:59:02,584-Speed 3005.44 samples/sec Loss 12.3980 LearningRate 0.0669 Epoch: 3 Global Step: 45270 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:59:05,847-Speed 3138.91 samples/sec Loss 12.1447 LearningRate 0.0669 Epoch: 3 Global Step: 45280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:59:09,202-Speed 3053.14 samples/sec Loss 12.1274 LearningRate 0.0669 Epoch: 3 Global Step: 45290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:59:12,523-Speed 3084.39 samples/sec Loss 12.1944 LearningRate 
0.0669 Epoch: 3 Global Step: 45300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:59:15,924-Speed 3011.43 samples/sec Loss 12.1533 LearningRate 0.0668 Epoch: 3 Global Step: 45310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 05:59:19,298-Speed 3035.67 samples/sec Loss 12.2455 LearningRate 0.0668 Epoch: 3 Global Step: 45320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:22,673-Speed 3034.83 samples/sec Loss 12.3415 LearningRate 0.0668 Epoch: 3 Global Step: 45330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:26,086-Speed 3001.35 samples/sec Loss 12.2339 LearningRate 0.0668 Epoch: 3 Global Step: 45340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:29,433-Speed 3060.54 samples/sec Loss 12.2373 LearningRate 0.0668 Epoch: 3 Global Step: 45350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:32,722-Speed 3113.96 samples/sec Loss 12.1898 LearningRate 0.0668 Epoch: 3 Global Step: 45360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:36,046-Speed 3081.86 samples/sec Loss 12.2014 LearningRate 0.0668 Epoch: 3 Global Step: 45370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:39,495-Speed 2969.50 samples/sec Loss 12.3762 LearningRate 0.0668 Epoch: 3 Global Step: 45380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:42,819-Speed 3081.67 samples/sec Loss 12.2697 LearningRate 0.0668 Epoch: 3 Global Step: 45390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:46,148-Speed 3076.99 samples/sec Loss 12.3187 LearningRate 0.0668 Epoch: 3 Global Step: 45400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:49,451-Speed 3100.97 samples/sec Loss 12.2633 LearningRate 0.0668 Epoch: 3 Global Step: 45410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:52,790-Speed 3067.98 samples/sec Loss 12.3408 LearningRate 0.0668 Epoch: 3 Global Step: 
45420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:56,103-Speed 3091.34 samples/sec Loss 12.1121 LearningRate 0.0668 Epoch: 3 Global Step: 45430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 05:59:59,467-Speed 3045.28 samples/sec Loss 12.1909 LearningRate 0.0668 Epoch: 3 Global Step: 45440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:02,761-Speed 3109.61 samples/sec Loss 12.2458 LearningRate 0.0668 Epoch: 3 Global Step: 45450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:06,046-Speed 3118.05 samples/sec Loss 12.4000 LearningRate 0.0667 Epoch: 3 Global Step: 45460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:09,413-Speed 3041.42 samples/sec Loss 12.1353 LearningRate 0.0667 Epoch: 3 Global Step: 45470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:12,795-Speed 3029.58 samples/sec Loss 12.1258 LearningRate 0.0667 Epoch: 3 Global Step: 45480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:16,236-Speed 2976.86 samples/sec Loss 12.3625 LearningRate 0.0667 Epoch: 3 Global Step: 45490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:19,581-Speed 3061.95 samples/sec Loss 12.0884 LearningRate 0.0667 Epoch: 3 Global Step: 45500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:22,840-Speed 3143.17 samples/sec Loss 12.2661 LearningRate 0.0667 Epoch: 3 Global Step: 45510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:26,120-Speed 3122.33 samples/sec Loss 12.2945 LearningRate 0.0667 Epoch: 3 Global Step: 45520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:29,429-Speed 3096.31 samples/sec Loss 12.1118 LearningRate 0.0667 Epoch: 3 Global Step: 45530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:32,815-Speed 3024.82 samples/sec Loss 12.2109 LearningRate 0.0667 Epoch: 3 Global Step: 45540 Fp16 Grad Scale: 65536 Required: 
19 hours Training: 2022-04-27 06:00:36,137-Speed 3082.97 samples/sec Loss 12.2981 LearningRate 0.0667 Epoch: 3 Global Step: 45550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:00:39,442-Speed 3099.64 samples/sec Loss 12.1699 LearningRate 0.0667 Epoch: 3 Global Step: 45560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:42,758-Speed 3088.59 samples/sec Loss 12.0507 LearningRate 0.0667 Epoch: 3 Global Step: 45570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:46,063-Speed 3100.60 samples/sec Loss 12.3340 LearningRate 0.0667 Epoch: 3 Global Step: 45580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:49,436-Speed 3036.72 samples/sec Loss 12.1204 LearningRate 0.0667 Epoch: 3 Global Step: 45590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:52,830-Speed 3017.18 samples/sec Loss 12.1528 LearningRate 0.0667 Epoch: 3 Global Step: 45600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:56,166-Speed 3070.85 samples/sec Loss 12.2769 LearningRate 0.0667 Epoch: 3 Global Step: 45610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:00:59,517-Speed 3056.35 samples/sec Loss 12.2296 LearningRate 0.0666 Epoch: 3 Global Step: 45620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:02,915-Speed 3014.92 samples/sec Loss 12.3475 LearningRate 0.0666 Epoch: 3 Global Step: 45630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:06,252-Speed 3069.16 samples/sec Loss 12.3297 LearningRate 0.0666 Epoch: 3 Global Step: 45640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:09,583-Speed 3075.38 samples/sec Loss 12.2000 LearningRate 0.0666 Epoch: 3 Global Step: 45650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:12,883-Speed 3104.17 samples/sec Loss 12.1228 LearningRate 0.0666 Epoch: 3 Global Step: 45660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 
06:01:16,198-Speed 3092.20 samples/sec Loss 12.1283 LearningRate 0.0666 Epoch: 3 Global Step: 45670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:19,496-Speed 3105.57 samples/sec Loss 12.1286 LearningRate 0.0666 Epoch: 3 Global Step: 45680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:22,804-Speed 3096.03 samples/sec Loss 12.3008 LearningRate 0.0666 Epoch: 3 Global Step: 45690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:26,191-Speed 3024.12 samples/sec Loss 12.1592 LearningRate 0.0666 Epoch: 3 Global Step: 45700 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:29,531-Speed 3067.34 samples/sec Loss 12.2270 LearningRate 0.0666 Epoch: 3 Global Step: 45710 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:32,886-Speed 3052.88 samples/sec Loss 12.3166 LearningRate 0.0666 Epoch: 3 Global Step: 45720 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:36,284-Speed 3014.98 samples/sec Loss 12.1846 LearningRate 0.0666 Epoch: 3 Global Step: 45730 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:39,601-Speed 3088.05 samples/sec Loss 12.2877 LearningRate 0.0666 Epoch: 3 Global Step: 45740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:42,961-Speed 3048.89 samples/sec Loss 12.2481 LearningRate 0.0666 Epoch: 3 Global Step: 45750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:01:46,212-Speed 3150.26 samples/sec Loss 12.3965 LearningRate 0.0666 Epoch: 3 Global Step: 45760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:01:49,498-Speed 3117.66 samples/sec Loss 12.3217 LearningRate 0.0665 Epoch: 3 Global Step: 45770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:01:52,814-Speed 3088.61 samples/sec Loss 12.3936 LearningRate 0.0665 Epoch: 3 Global Step: 45780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:01:56,122-Speed 3096.05 
samples/sec Loss 12.2450 LearningRate 0.0665 Epoch: 3 Global Step: 45790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:01:59,412-Speed 3114.03 samples/sec Loss 12.2030 LearningRate 0.0665 Epoch: 3 Global Step: 45800 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:02:02,703-Speed 3112.15 samples/sec Loss 12.1002 LearningRate 0.0665 Epoch: 3 Global Step: 45810 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:02:05,975-Speed 3131.42 samples/sec Loss 12.3087 LearningRate 0.0665 Epoch: 3 Global Step: 45820 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:02:09,277-Speed 3102.45 samples/sec Loss 12.1870 LearningRate 0.0665 Epoch: 3 Global Step: 45830 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:02:12,562-Speed 3118.39 samples/sec Loss 12.3334 LearningRate 0.0665 Epoch: 3 Global Step: 45840 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:02:15,855-Speed 3109.77 samples/sec Loss 12.3737 LearningRate 0.0665 Epoch: 3 Global Step: 45850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:02:19,213-Speed 3049.98 samples/sec Loss 12.2590 LearningRate 0.0665 Epoch: 3 Global Step: 45860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:22,589-Speed 3034.07 samples/sec Loss 12.2717 LearningRate 0.0665 Epoch: 3 Global Step: 45870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:25,978-Speed 3022.81 samples/sec Loss 12.1308 LearningRate 0.0665 Epoch: 3 Global Step: 45880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:29,369-Speed 3020.64 samples/sec Loss 12.2963 LearningRate 0.0665 Epoch: 3 Global Step: 45890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:32,666-Speed 3106.37 samples/sec Loss 12.1359 LearningRate 0.0665 Epoch: 3 Global Step: 45900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:36,047-Speed 3029.69 samples/sec Loss 12.0449 LearningRate 
0.0665 Epoch: 3 Global Step: 45910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:39,398-Speed 3056.64 samples/sec Loss 12.2284 LearningRate 0.0664 Epoch: 3 Global Step: 45920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:42,698-Speed 3104.49 samples/sec Loss 12.0871 LearningRate 0.0664 Epoch: 3 Global Step: 45930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:46,004-Speed 3097.90 samples/sec Loss 12.1692 LearningRate 0.0664 Epoch: 3 Global Step: 45940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:49,339-Speed 3071.92 samples/sec Loss 12.1835 LearningRate 0.0664 Epoch: 3 Global Step: 45950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:52,675-Speed 3070.09 samples/sec Loss 12.1830 LearningRate 0.0664 Epoch: 3 Global Step: 45960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:56,062-Speed 3024.69 samples/sec Loss 12.1724 LearningRate 0.0664 Epoch: 3 Global Step: 45970 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:02:59,429-Speed 3041.91 samples/sec Loss 12.2625 LearningRate 0.0664 Epoch: 3 Global Step: 45980 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:02,801-Speed 3037.40 samples/sec Loss 12.1990 LearningRate 0.0664 Epoch: 3 Global Step: 45990 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:06,175-Speed 3035.58 samples/sec Loss 12.0902 LearningRate 0.0664 Epoch: 3 Global Step: 46000 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:09,471-Speed 3108.45 samples/sec Loss 12.0105 LearningRate 0.0664 Epoch: 3 Global Step: 46010 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:12,820-Speed 3058.40 samples/sec Loss 12.2529 LearningRate 0.0664 Epoch: 3 Global Step: 46020 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:16,135-Speed 3090.59 samples/sec Loss 12.1180 LearningRate 0.0664 Epoch: 3 Global Step: 
46030 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:19,416-Speed 3121.48 samples/sec Loss 12.2574 LearningRate 0.0664 Epoch: 3 Global Step: 46040 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:22,716-Speed 3104.20 samples/sec Loss 12.1624 LearningRate 0.0664 Epoch: 3 Global Step: 46050 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:26,053-Speed 3069.61 samples/sec Loss 12.1019 LearningRate 0.0664 Epoch: 3 Global Step: 46060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:29,424-Speed 3038.12 samples/sec Loss 12.2317 LearningRate 0.0663 Epoch: 3 Global Step: 46070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:32,760-Speed 3070.83 samples/sec Loss 12.3519 LearningRate 0.0663 Epoch: 3 Global Step: 46080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:36,058-Speed 3105.99 samples/sec Loss 11.9728 LearningRate 0.0663 Epoch: 3 Global Step: 46090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:39,351-Speed 3110.53 samples/sec Loss 12.1091 LearningRate 0.0663 Epoch: 3 Global Step: 46100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:42,691-Speed 3066.82 samples/sec Loss 12.1851 LearningRate 0.0663 Epoch: 3 Global Step: 46110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:46,028-Speed 3069.81 samples/sec Loss 12.3646 LearningRate 0.0663 Epoch: 3 Global Step: 46120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:49,370-Speed 3064.08 samples/sec Loss 12.2763 LearningRate 0.0663 Epoch: 3 Global Step: 46130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:52,733-Speed 3046.48 samples/sec Loss 12.2401 LearningRate 0.0663 Epoch: 3 Global Step: 46140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:03:56,064-Speed 3074.41 samples/sec Loss 12.1488 LearningRate 0.0663 Epoch: 3 Global Step: 46150 Fp16 Grad Scale: 131072 
Required: 19 hours Training: 2022-04-27 06:03:59,328-Speed 3138.39 samples/sec Loss 12.0901 LearningRate 0.0663 Epoch: 3 Global Step: 46160 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 06:04:02,695-Speed 3042.08 samples/sec Loss 12.1838 LearningRate 0.0663 Epoch: 3 Global Step: 46170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:06,035-Speed 3066.95 samples/sec Loss 12.1757 LearningRate 0.0663 Epoch: 3 Global Step: 46180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:09,351-Speed 3088.87 samples/sec Loss 12.2286 LearningRate 0.0663 Epoch: 3 Global Step: 46190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:12,704-Speed 3055.51 samples/sec Loss 12.1668 LearningRate 0.0663 Epoch: 3 Global Step: 46200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:16,076-Speed 3037.36 samples/sec Loss 12.3754 LearningRate 0.0663 Epoch: 3 Global Step: 46210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:19,400-Speed 3082.34 samples/sec Loss 12.1156 LearningRate 0.0663 Epoch: 3 Global Step: 46220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:22,732-Speed 3073.61 samples/sec Loss 12.3309 LearningRate 0.0662 Epoch: 3 Global Step: 46230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:26,039-Speed 3097.69 samples/sec Loss 12.3336 LearningRate 0.0662 Epoch: 3 Global Step: 46240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:29,338-Speed 3104.91 samples/sec Loss 12.2384 LearningRate 0.0662 Epoch: 3 Global Step: 46250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:32,641-Speed 3101.38 samples/sec Loss 12.0258 LearningRate 0.0662 Epoch: 3 Global Step: 46260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:35,991-Speed 3058.01 samples/sec Loss 12.1039 LearningRate 0.0662 Epoch: 3 Global Step: 46270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 
2022-04-27 06:04:39,289-Speed 3106.20 samples/sec Loss 12.3052 LearningRate 0.0662 Epoch: 3 Global Step: 46280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:42,641-Speed 3055.11 samples/sec Loss 12.2394 LearningRate 0.0662 Epoch: 3 Global Step: 46290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:45,937-Speed 3108.31 samples/sec Loss 12.1883 LearningRate 0.0662 Epoch: 3 Global Step: 46300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:49,352-Speed 2998.64 samples/sec Loss 12.1557 LearningRate 0.0662 Epoch: 3 Global Step: 46310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:52,719-Speed 3042.38 samples/sec Loss 12.2865 LearningRate 0.0662 Epoch: 3 Global Step: 46320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:56,127-Speed 3006.14 samples/sec Loss 12.2572 LearningRate 0.0662 Epoch: 3 Global Step: 46330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:04:59,465-Speed 3068.82 samples/sec Loss 12.1056 LearningRate 0.0662 Epoch: 3 Global Step: 46340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:02,853-Speed 3022.95 samples/sec Loss 12.1057 LearningRate 0.0662 Epoch: 3 Global Step: 46350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:06,182-Speed 3077.38 samples/sec Loss 12.0857 LearningRate 0.0662 Epoch: 3 Global Step: 46360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:09,524-Speed 3064.79 samples/sec Loss 12.1423 LearningRate 0.0662 Epoch: 3 Global Step: 46370 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:12,931-Speed 3005.83 samples/sec Loss 12.2254 LearningRate 0.0661 Epoch: 3 Global Step: 46380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:16,210-Speed 3124.43 samples/sec Loss 12.2817 LearningRate 0.0661 Epoch: 3 Global Step: 46390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:19,507-Speed 
3106.79 samples/sec Loss 12.1150 LearningRate 0.0661 Epoch: 3 Global Step: 46400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:22,814-Speed 3097.31 samples/sec Loss 12.0503 LearningRate 0.0661 Epoch: 3 Global Step: 46410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:26,136-Speed 3083.04 samples/sec Loss 12.2277 LearningRate 0.0661 Epoch: 3 Global Step: 46420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:05:29,511-Speed 3035.46 samples/sec Loss 12.1283 LearningRate 0.0661 Epoch: 3 Global Step: 46430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:32,837-Speed 3079.06 samples/sec Loss 12.1088 LearningRate 0.0661 Epoch: 3 Global Step: 46440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:36,157-Speed 3084.95 samples/sec Loss 12.1310 LearningRate 0.0661 Epoch: 3 Global Step: 46450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:39,477-Speed 3085.75 samples/sec Loss 12.1815 LearningRate 0.0661 Epoch: 3 Global Step: 46460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:42,823-Speed 3060.29 samples/sec Loss 12.0538 LearningRate 0.0661 Epoch: 3 Global Step: 46470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:46,084-Speed 3142.01 samples/sec Loss 12.1102 LearningRate 0.0661 Epoch: 3 Global Step: 46480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:49,398-Speed 3090.84 samples/sec Loss 12.0994 LearningRate 0.0661 Epoch: 3 Global Step: 46490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:52,757-Speed 3049.49 samples/sec Loss 12.0570 LearningRate 0.0661 Epoch: 3 Global Step: 46500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:56,101-Speed 3062.90 samples/sec Loss 12.1123 LearningRate 0.0661 Epoch: 3 Global Step: 46510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:05:59,515-Speed 2999.94 samples/sec Loss 12.3364 
LearningRate 0.0661 Epoch: 3 Global Step: 46520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:06:02,928-Speed 3001.05 samples/sec Loss 12.0844 LearningRate 0.0660 Epoch: 3 Global Step: 46530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:06,300-Speed 3037.95 samples/sec Loss 12.2107 LearningRate 0.0660 Epoch: 3 Global Step: 46540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:09,716-Speed 2998.09 samples/sec Loss 12.1246 LearningRate 0.0660 Epoch: 3 Global Step: 46550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:13,039-Speed 3082.16 samples/sec Loss 12.0701 LearningRate 0.0660 Epoch: 3 Global Step: 46560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:16,392-Speed 3055.88 samples/sec Loss 12.3296 LearningRate 0.0660 Epoch: 3 Global Step: 46570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:19,794-Speed 3010.24 samples/sec Loss 12.1560 LearningRate 0.0660 Epoch: 3 Global Step: 46580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:23,161-Speed 3042.29 samples/sec Loss 12.0234 LearningRate 0.0660 Epoch: 3 Global Step: 46590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:26,544-Speed 3027.47 samples/sec Loss 12.2341 LearningRate 0.0660 Epoch: 3 Global Step: 46600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:29,848-Speed 3100.37 samples/sec Loss 12.0960 LearningRate 0.0660 Epoch: 3 Global Step: 46610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:33,170-Speed 3083.93 samples/sec Loss 12.0924 LearningRate 0.0660 Epoch: 3 Global Step: 46620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:36,466-Speed 3107.37 samples/sec Loss 11.9782 LearningRate 0.0660 Epoch: 3 Global Step: 46630 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 06:06:39,779-Speed 3091.49 samples/sec Loss 12.2870 LearningRate 0.0660 Epoch: 3 
Global Step: 46640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:43,058-Speed 3124.50 samples/sec Loss 12.0518 LearningRate 0.0660 Epoch: 3 Global Step: 46650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:46,404-Speed 3060.84 samples/sec Loss 12.2461 LearningRate 0.0660 Epoch: 3 Global Step: 46660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:49,690-Speed 3116.87 samples/sec Loss 12.0692 LearningRate 0.0660 Epoch: 3 Global Step: 46670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:06:52,988-Speed 3106.30 samples/sec Loss 12.2173 LearningRate 0.0659 Epoch: 3 Global Step: 46680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:06:56,311-Speed 3082.54 samples/sec Loss 12.1715 LearningRate 0.0659 Epoch: 3 Global Step: 46690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:06:59,660-Speed 3058.32 samples/sec Loss 12.1746 LearningRate 0.0659 Epoch: 3 Global Step: 46700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:03,001-Speed 3066.10 samples/sec Loss 12.2786 LearningRate 0.0659 Epoch: 3 Global Step: 46710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:06,351-Speed 3057.93 samples/sec Loss 12.1090 LearningRate 0.0659 Epoch: 3 Global Step: 46720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:09,707-Speed 3052.00 samples/sec Loss 12.0826 LearningRate 0.0659 Epoch: 3 Global Step: 46730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:13,040-Speed 3072.98 samples/sec Loss 12.0911 LearningRate 0.0659 Epoch: 3 Global Step: 46740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:16,403-Speed 3046.18 samples/sec Loss 12.1108 LearningRate 0.0659 Epoch: 3 Global Step: 46750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:19,783-Speed 3030.43 samples/sec Loss 12.1841 LearningRate 0.0659 Epoch: 3 Global Step: 46760 Fp16 Grad Scale: 
65536 Required: 19 hours Training: 2022-04-27 06:07:23,093-Speed 3094.97 samples/sec Loss 12.0911 LearningRate 0.0659 Epoch: 3 Global Step: 46770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:26,480-Speed 3024.23 samples/sec Loss 12.1768 LearningRate 0.0659 Epoch: 3 Global Step: 46780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:29,822-Speed 3064.19 samples/sec Loss 12.1760 LearningRate 0.0659 Epoch: 3 Global Step: 46790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:33,205-Speed 3027.90 samples/sec Loss 12.1735 LearningRate 0.0659 Epoch: 3 Global Step: 46800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:36,593-Speed 3023.80 samples/sec Loss 12.1883 LearningRate 0.0659 Epoch: 3 Global Step: 46810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:39,953-Speed 3048.38 samples/sec Loss 12.0950 LearningRate 0.0659 Epoch: 3 Global Step: 46820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:43,342-Speed 3022.94 samples/sec Loss 12.2753 LearningRate 0.0659 Epoch: 3 Global Step: 46830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:46,692-Speed 3057.19 samples/sec Loss 12.0439 LearningRate 0.0658 Epoch: 3 Global Step: 46840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:07:50,000-Speed 3096.94 samples/sec Loss 12.1505 LearningRate 0.0658 Epoch: 3 Global Step: 46850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:53,376-Speed 3034.11 samples/sec Loss 12.1665 LearningRate 0.0658 Epoch: 3 Global Step: 46860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:07:56,738-Speed 3046.65 samples/sec Loss 12.1962 LearningRate 0.0658 Epoch: 3 Global Step: 46870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:00,058-Speed 3084.72 samples/sec Loss 12.1061 LearningRate 0.0658 Epoch: 3 Global Step: 46880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 
2022-04-27 06:08:03,330-Speed 3130.35 samples/sec Loss 12.1100 LearningRate 0.0658 Epoch: 3 Global Step: 46890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:06,662-Speed 3074.68 samples/sec Loss 12.0665 LearningRate 0.0658 Epoch: 3 Global Step: 46900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:10,009-Speed 3060.72 samples/sec Loss 12.1966 LearningRate 0.0658 Epoch: 3 Global Step: 46910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:13,401-Speed 3019.09 samples/sec Loss 12.2276 LearningRate 0.0658 Epoch: 3 Global Step: 46920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:16,764-Speed 3046.17 samples/sec Loss 12.1021 LearningRate 0.0658 Epoch: 3 Global Step: 46930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:20,053-Speed 3114.23 samples/sec Loss 12.1109 LearningRate 0.0658 Epoch: 3 Global Step: 46940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:23,355-Speed 3102.95 samples/sec Loss 12.2303 LearningRate 0.0658 Epoch: 3 Global Step: 46950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:08:26,682-Speed 3078.90 samples/sec Loss 12.2072 LearningRate 0.0658 Epoch: 3 Global Step: 46960 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:30,021-Speed 3067.25 samples/sec Loss 12.0061 LearningRate 0.0658 Epoch: 3 Global Step: 46970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:33,342-Speed 3084.53 samples/sec Loss 12.2295 LearningRate 0.0658 Epoch: 3 Global Step: 46980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:36,649-Speed 3097.68 samples/sec Loss 12.0647 LearningRate 0.0657 Epoch: 3 Global Step: 46990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:39,939-Speed 3113.61 samples/sec Loss 11.9765 LearningRate 0.0657 Epoch: 3 Global Step: 47000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:43,296-Speed 3050.54 
samples/sec Loss 12.3553 LearningRate 0.0657 Epoch: 3 Global Step: 47010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:46,670-Speed 3036.22 samples/sec Loss 12.2312 LearningRate 0.0657 Epoch: 3 Global Step: 47020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:50,013-Speed 3064.25 samples/sec Loss 12.0114 LearningRate 0.0657 Epoch: 3 Global Step: 47030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:53,408-Speed 3016.73 samples/sec Loss 12.1842 LearningRate 0.0657 Epoch: 3 Global Step: 47040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:08:56,759-Speed 3057.30 samples/sec Loss 12.0820 LearningRate 0.0657 Epoch: 3 Global Step: 47050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:00,115-Speed 3051.32 samples/sec Loss 12.2177 LearningRate 0.0657 Epoch: 3 Global Step: 47060 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:03,416-Speed 3103.79 samples/sec Loss 12.2724 LearningRate 0.0657 Epoch: 3 Global Step: 47070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:06,754-Speed 3068.44 samples/sec Loss 11.9520 LearningRate 0.0657 Epoch: 3 Global Step: 47080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:10,061-Speed 3097.69 samples/sec Loss 12.1612 LearningRate 0.0657 Epoch: 3 Global Step: 47090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:13,442-Speed 3029.59 samples/sec Loss 12.2317 LearningRate 0.0657 Epoch: 3 Global Step: 47100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:16,747-Speed 3099.51 samples/sec Loss 12.0277 LearningRate 0.0657 Epoch: 3 Global Step: 47110 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:20,080-Speed 3072.91 samples/sec Loss 12.0614 LearningRate 0.0657 Epoch: 3 Global Step: 47120 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:23,390-Speed 3094.84 samples/sec Loss 12.1619 LearningRate 
0.0657 Epoch: 3 Global Step: 47130 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:26,717-Speed 3078.57 samples/sec Loss 12.1054 LearningRate 0.0656 Epoch: 3 Global Step: 47140 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:29,998-Speed 3122.12 samples/sec Loss 12.0885 LearningRate 0.0656 Epoch: 3 Global Step: 47150 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:33,326-Speed 3077.46 samples/sec Loss 12.1081 LearningRate 0.0656 Epoch: 3 Global Step: 47160 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:36,655-Speed 3076.94 samples/sec Loss 12.1633 LearningRate 0.0656 Epoch: 3 Global Step: 47170 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:40,012-Speed 3051.07 samples/sec Loss 12.2000 LearningRate 0.0656 Epoch: 3 Global Step: 47180 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:43,309-Speed 3107.65 samples/sec Loss 12.1628 LearningRate 0.0656 Epoch: 3 Global Step: 47190 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:46,577-Speed 3134.36 samples/sec Loss 12.1095 LearningRate 0.0656 Epoch: 3 Global Step: 47200 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:09:49,949-Speed 3037.42 samples/sec Loss 11.9439 LearningRate 0.0656 Epoch: 3 Global Step: 47210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:53,378-Speed 2987.04 samples/sec Loss 12.1476 LearningRate 0.0656 Epoch: 3 Global Step: 47220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:09:56,753-Speed 3035.97 samples/sec Loss 12.0995 LearningRate 0.0656 Epoch: 3 Global Step: 47230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:00,121-Speed 3040.91 samples/sec Loss 12.1312 LearningRate 0.0656 Epoch: 3 Global Step: 47240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:03,503-Speed 3028.57 samples/sec Loss 12.1437 LearningRate 0.0656 Epoch: 3 Global Step: 47250 Fp16 
Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:06,901-Speed 3014.33 samples/sec Loss 12.1619 LearningRate 0.0656 Epoch: 3 Global Step: 47260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:10,232-Speed 3074.95 samples/sec Loss 12.1585 LearningRate 0.0656 Epoch: 3 Global Step: 47270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:13,552-Speed 3085.45 samples/sec Loss 12.1127 LearningRate 0.0656 Epoch: 3 Global Step: 47280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:16,880-Speed 3078.11 samples/sec Loss 12.0247 LearningRate 0.0656 Epoch: 3 Global Step: 47290 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:20,245-Speed 3043.91 samples/sec Loss 12.1246 LearningRate 0.0655 Epoch: 3 Global Step: 47300 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:23,607-Speed 3046.30 samples/sec Loss 12.0968 LearningRate 0.0655 Epoch: 3 Global Step: 47310 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:26,940-Speed 3074.13 samples/sec Loss 12.0554 LearningRate 0.0655 Epoch: 3 Global Step: 47320 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:30,286-Speed 3060.54 samples/sec Loss 12.0931 LearningRate 0.0655 Epoch: 3 Global Step: 47330 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:33,683-Speed 3016.50 samples/sec Loss 11.9874 LearningRate 0.0655 Epoch: 3 Global Step: 47340 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:37,051-Speed 3040.45 samples/sec Loss 11.9903 LearningRate 0.0655 Epoch: 3 Global Step: 47350 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:40,385-Speed 3072.38 samples/sec Loss 12.1202 LearningRate 0.0655 Epoch: 3 Global Step: 47360 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:43,751-Speed 3043.22 samples/sec Loss 12.0959 LearningRate 0.0655 Epoch: 3 Global Step: 47370 Fp16 Grad Scale: 131072 Required: 19 
hours Training: 2022-04-27 06:10:47,129-Speed 3032.06 samples/sec Loss 11.9588 LearningRate 0.0655 Epoch: 3 Global Step: 47380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:50,534-Speed 3008.04 samples/sec Loss 12.1008 LearningRate 0.0655 Epoch: 3 Global Step: 47390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:53,887-Speed 3055.37 samples/sec Loss 12.0515 LearningRate 0.0655 Epoch: 3 Global Step: 47400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:10:57,211-Speed 3081.37 samples/sec Loss 12.2769 LearningRate 0.0655 Epoch: 3 Global Step: 47410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:00,560-Speed 3059.02 samples/sec Loss 12.0777 LearningRate 0.0655 Epoch: 3 Global Step: 47420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:03,933-Speed 3036.67 samples/sec Loss 12.1677 LearningRate 0.0655 Epoch: 3 Global Step: 47430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:07,269-Speed 3070.70 samples/sec Loss 12.0520 LearningRate 0.0655 Epoch: 3 Global Step: 47440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:10,538-Speed 3133.72 samples/sec Loss 12.1240 LearningRate 0.0654 Epoch: 3 Global Step: 47450 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:13,818-Speed 3122.13 samples/sec Loss 11.9635 LearningRate 0.0654 Epoch: 3 Global Step: 47460 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:17,179-Speed 3048.21 samples/sec Loss 12.0728 LearningRate 0.0654 Epoch: 3 Global Step: 47470 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:20,534-Speed 3053.07 samples/sec Loss 12.0544 LearningRate 0.0654 Epoch: 3 Global Step: 47480 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:23,890-Speed 3051.79 samples/sec Loss 12.1657 LearningRate 0.0654 Epoch: 3 Global Step: 47490 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 
06:11:27,191-Speed 3103.03 samples/sec Loss 12.0951 LearningRate 0.0654 Epoch: 3 Global Step: 47500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:30,569-Speed 3031.94 samples/sec Loss 11.9874 LearningRate 0.0654 Epoch: 3 Global Step: 47510 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 06:11:33,955-Speed 3025.58 samples/sec Loss 12.0162 LearningRate 0.0654 Epoch: 3 Global Step: 47520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:37,312-Speed 3051.05 samples/sec Loss 12.0082 LearningRate 0.0654 Epoch: 3 Global Step: 47530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:40,725-Speed 3000.94 samples/sec Loss 12.0262 LearningRate 0.0654 Epoch: 3 Global Step: 47540 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:44,072-Speed 3060.22 samples/sec Loss 12.0793 LearningRate 0.0654 Epoch: 3 Global Step: 47550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:47,368-Speed 3108.10 samples/sec Loss 11.9963 LearningRate 0.0654 Epoch: 3 Global Step: 47560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:50,669-Speed 3102.85 samples/sec Loss 12.0906 LearningRate 0.0654 Epoch: 3 Global Step: 47570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:54,054-Speed 3026.02 samples/sec Loss 12.1313 LearningRate 0.0654 Epoch: 3 Global Step: 47580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:11:57,395-Speed 3066.68 samples/sec Loss 12.2214 LearningRate 0.0654 Epoch: 3 Global Step: 47590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:00,741-Speed 3061.01 samples/sec Loss 12.0871 LearningRate 0.0653 Epoch: 3 Global Step: 47600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:04,160-Speed 2995.94 samples/sec Loss 12.1443 LearningRate 0.0653 Epoch: 3 Global Step: 47610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:07,486-Speed 3079.27 
samples/sec Loss 12.1358 LearningRate 0.0653 Epoch: 3 Global Step: 47620 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 06:12:10,756-Speed 3131.96 samples/sec Loss 12.1064 LearningRate 0.0653 Epoch: 3 Global Step: 47630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:14,118-Speed 3047.40 samples/sec Loss 12.0990 LearningRate 0.0653 Epoch: 3 Global Step: 47640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:17,436-Speed 3086.99 samples/sec Loss 12.1277 LearningRate 0.0653 Epoch: 3 Global Step: 47650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:20,805-Speed 3039.71 samples/sec Loss 11.9947 LearningRate 0.0653 Epoch: 3 Global Step: 47660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:24,237-Speed 2985.05 samples/sec Loss 12.0681 LearningRate 0.0653 Epoch: 3 Global Step: 47670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:27,565-Speed 3077.25 samples/sec Loss 11.9034 LearningRate 0.0653 Epoch: 3 Global Step: 47680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:30,960-Speed 3020.48 samples/sec Loss 12.0767 LearningRate 0.0653 Epoch: 3 Global Step: 47690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:34,308-Speed 3059.12 samples/sec Loss 12.0780 LearningRate 0.0653 Epoch: 3 Global Step: 47700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:37,613-Speed 3099.64 samples/sec Loss 12.0977 LearningRate 0.0653 Epoch: 3 Global Step: 47710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:40,998-Speed 3026.43 samples/sec Loss 11.8794 LearningRate 0.0653 Epoch: 3 Global Step: 47720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:44,381-Speed 3027.63 samples/sec Loss 12.0886 LearningRate 0.0653 Epoch: 3 Global Step: 47730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:47,739-Speed 3050.50 samples/sec Loss 12.1018 LearningRate 
0.0653 Epoch: 3 Global Step: 47740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:51,038-Speed 3104.95 samples/sec Loss 11.8268 LearningRate 0.0653 Epoch: 3 Global Step: 47750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:12:54,417-Speed 3031.28 samples/sec Loss 12.0410 LearningRate 0.0652 Epoch: 3 Global Step: 47760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:12:57,819-Speed 3010.93 samples/sec Loss 11.8594 LearningRate 0.0652 Epoch: 3 Global Step: 47770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:01,195-Speed 3033.90 samples/sec Loss 12.0085 LearningRate 0.0652 Epoch: 3 Global Step: 47780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:04,615-Speed 2994.84 samples/sec Loss 12.0599 LearningRate 0.0652 Epoch: 3 Global Step: 47790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:07,964-Speed 3059.25 samples/sec Loss 11.9746 LearningRate 0.0652 Epoch: 3 Global Step: 47800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:11,318-Speed 3053.75 samples/sec Loss 12.1198 LearningRate 0.0652 Epoch: 3 Global Step: 47810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:14,628-Speed 3093.86 samples/sec Loss 12.0721 LearningRate 0.0652 Epoch: 3 Global Step: 47820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:18,013-Speed 3026.38 samples/sec Loss 12.0804 LearningRate 0.0652 Epoch: 3 Global Step: 47830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:21,353-Speed 3067.25 samples/sec Loss 11.9830 LearningRate 0.0652 Epoch: 3 Global Step: 47840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:24,636-Speed 3119.84 samples/sec Loss 12.0987 LearningRate 0.0652 Epoch: 3 Global Step: 47850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:27,943-Speed 3097.24 samples/sec Loss 12.1273 LearningRate 0.0652 Epoch: 3 Global Step: 
47860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:31,241-Speed 3105.31 samples/sec Loss 12.0696 LearningRate 0.0652 Epoch: 3 Global Step: 47870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:34,559-Speed 3087.32 samples/sec Loss 12.0750 LearningRate 0.0652 Epoch: 3 Global Step: 47880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:37,866-Speed 3098.22 samples/sec Loss 12.0281 LearningRate 0.0652 Epoch: 3 Global Step: 47890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:41,140-Speed 3128.39 samples/sec Loss 11.7802 LearningRate 0.0652 Epoch: 3 Global Step: 47900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:44,509-Speed 3040.80 samples/sec Loss 11.9988 LearningRate 0.0651 Epoch: 3 Global Step: 47910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:47,808-Speed 3105.13 samples/sec Loss 11.9639 LearningRate 0.0651 Epoch: 3 Global Step: 47920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:51,116-Speed 3096.43 samples/sec Loss 11.9876 LearningRate 0.0651 Epoch: 3 Global Step: 47930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:54,478-Speed 3046.91 samples/sec Loss 12.0508 LearningRate 0.0651 Epoch: 3 Global Step: 47940 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:13:57,909-Speed 2984.73 samples/sec Loss 11.9857 LearningRate 0.0651 Epoch: 3 Global Step: 47950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:01,265-Speed 3052.74 samples/sec Loss 12.1019 LearningRate 0.0651 Epoch: 3 Global Step: 47960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:04,588-Speed 3082.22 samples/sec Loss 12.0955 LearningRate 0.0651 Epoch: 3 Global Step: 47970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:07,920-Speed 3074.77 samples/sec Loss 12.1171 LearningRate 0.0651 Epoch: 3 Global Step: 47980 Fp16 Grad Scale: 65536 
Required: 19 hours Training: 2022-04-27 06:14:11,215-Speed 3108.49 samples/sec Loss 12.1011 LearningRate 0.0651 Epoch: 3 Global Step: 47990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:14,567-Speed 3055.45 samples/sec Loss 11.9936 LearningRate 0.0651 Epoch: 3 Global Step: 48000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:17,914-Speed 3060.67 samples/sec Loss 12.0503 LearningRate 0.0651 Epoch: 3 Global Step: 48010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:21,282-Speed 3041.28 samples/sec Loss 12.0815 LearningRate 0.0651 Epoch: 3 Global Step: 48020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:24,573-Speed 3112.94 samples/sec Loss 12.0213 LearningRate 0.0651 Epoch: 3 Global Step: 48030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:27,909-Speed 3069.93 samples/sec Loss 11.9802 LearningRate 0.0651 Epoch: 3 Global Step: 48040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:31,204-Speed 3109.02 samples/sec Loss 12.0263 LearningRate 0.0651 Epoch: 3 Global Step: 48050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:34,521-Speed 3088.05 samples/sec Loss 12.0217 LearningRate 0.0651 Epoch: 3 Global Step: 48060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:14:37,839-Speed 3086.76 samples/sec Loss 11.9985 LearningRate 0.0650 Epoch: 3 Global Step: 48070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:41,142-Speed 3100.87 samples/sec Loss 12.0295 LearningRate 0.0650 Epoch: 3 Global Step: 48080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:44,488-Speed 3061.64 samples/sec Loss 11.9656 LearningRate 0.0650 Epoch: 3 Global Step: 48090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:47,780-Speed 3111.49 samples/sec Loss 12.0758 LearningRate 0.0650 Epoch: 3 Global Step: 48100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 
06:14:51,071-Speed 3112.16 samples/sec Loss 11.9947 LearningRate 0.0650 Epoch: 3 Global Step: 48110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:54,394-Speed 3083.02 samples/sec Loss 12.1112 LearningRate 0.0650 Epoch: 3 Global Step: 48120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:14:57,805-Speed 3003.17 samples/sec Loss 12.0732 LearningRate 0.0650 Epoch: 3 Global Step: 48130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:01,157-Speed 3055.21 samples/sec Loss 11.8938 LearningRate 0.0650 Epoch: 3 Global Step: 48140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:04,502-Speed 3062.68 samples/sec Loss 12.0383 LearningRate 0.0650 Epoch: 3 Global Step: 48150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:07,837-Speed 3071.59 samples/sec Loss 12.0893 LearningRate 0.0650 Epoch: 3 Global Step: 48160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:11,163-Speed 3079.41 samples/sec Loss 12.0812 LearningRate 0.0650 Epoch: 3 Global Step: 48170 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:14,516-Speed 3054.93 samples/sec Loss 11.9647 LearningRate 0.0650 Epoch: 3 Global Step: 48180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:17,873-Speed 3050.41 samples/sec Loss 12.0607 LearningRate 0.0650 Epoch: 3 Global Step: 48190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:21,308-Speed 2982.43 samples/sec Loss 11.9657 LearningRate 0.0650 Epoch: 3 Global Step: 48200 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:24,586-Speed 3124.43 samples/sec Loss 11.8875 LearningRate 0.0650 Epoch: 3 Global Step: 48210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:27,880-Speed 3109.63 samples/sec Loss 11.8199 LearningRate 0.0649 Epoch: 3 Global Step: 48220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:31,282-Speed 3011.34 
samples/sec Loss 11.9958 LearningRate 0.0649 Epoch: 3 Global Step: 48230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:34,587-Speed 3098.65 samples/sec Loss 11.8962 LearningRate 0.0649 Epoch: 3 Global Step: 48240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:37,911-Speed 3082.23 samples/sec Loss 11.8996 LearningRate 0.0649 Epoch: 3 Global Step: 48250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:41,277-Speed 3042.89 samples/sec Loss 12.1352 LearningRate 0.0649 Epoch: 3 Global Step: 48260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:44,581-Speed 3100.21 samples/sec Loss 12.0876 LearningRate 0.0649 Epoch: 3 Global Step: 48270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:15:47,861-Speed 3123.44 samples/sec Loss 11.9613 LearningRate 0.0649 Epoch: 3 Global Step: 48280 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:15:51,165-Speed 3100.26 samples/sec Loss 12.0102 LearningRate 0.0649 Epoch: 3 Global Step: 48290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:15:54,512-Speed 3060.48 samples/sec Loss 11.9436 LearningRate 0.0649 Epoch: 3 Global Step: 48300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:15:57,858-Speed 3061.76 samples/sec Loss 11.9274 LearningRate 0.0649 Epoch: 3 Global Step: 48310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:01,241-Speed 3027.76 samples/sec Loss 11.8980 LearningRate 0.0649 Epoch: 3 Global Step: 48320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:04,593-Speed 3055.96 samples/sec Loss 12.1108 LearningRate 0.0649 Epoch: 3 Global Step: 48330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:07,947-Speed 3053.67 samples/sec Loss 11.9890 LearningRate 0.0649 Epoch: 3 Global Step: 48340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:11,267-Speed 3084.79 samples/sec Loss 12.1710 LearningRate 
0.0649 Epoch: 3 Global Step: 48350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:14,612-Speed 3062.61 samples/sec Loss 12.1199 LearningRate 0.0649 Epoch: 3 Global Step: 48360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:17,987-Speed 3034.99 samples/sec Loss 11.9555 LearningRate 0.0648 Epoch: 3 Global Step: 48370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:21,376-Speed 3021.89 samples/sec Loss 11.9520 LearningRate 0.0648 Epoch: 3 Global Step: 48380 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:16:24,763-Speed 3024.10 samples/sec Loss 12.0338 LearningRate 0.0648 Epoch: 3 Global Step: 48390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:16:28,126-Speed 3046.46 samples/sec Loss 11.9849 LearningRate 0.0648 Epoch: 3 Global Step: 48400 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:31,448-Speed 3083.45 samples/sec Loss 12.2002 LearningRate 0.0648 Epoch: 3 Global Step: 48410 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:34,766-Speed 3086.85 samples/sec Loss 11.9756 LearningRate 0.0648 Epoch: 3 Global Step: 48420 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:38,077-Speed 3094.00 samples/sec Loss 11.9112 LearningRate 0.0648 Epoch: 3 Global Step: 48430 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:41,421-Speed 3062.69 samples/sec Loss 12.0586 LearningRate 0.0648 Epoch: 3 Global Step: 48440 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:44,725-Speed 3101.05 samples/sec Loss 12.0921 LearningRate 0.0648 Epoch: 3 Global Step: 48450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:48,132-Speed 3006.80 samples/sec Loss 11.9494 LearningRate 0.0648 Epoch: 3 Global Step: 48460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:51,464-Speed 3073.83 samples/sec Loss 12.0282 LearningRate 0.0648 Epoch: 3 Global Step: 48470 Fp16 
Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:54,914-Speed 2968.59 samples/sec Loss 12.0514 LearningRate 0.0648 Epoch: 3 Global Step: 48480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:16:58,330-Speed 2998.92 samples/sec Loss 12.0671 LearningRate 0.0648 Epoch: 3 Global Step: 48490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:01,709-Speed 3031.08 samples/sec Loss 12.1015 LearningRate 0.0648 Epoch: 3 Global Step: 48500 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:17:05,033-Speed 3082.24 samples/sec Loss 11.8893 LearningRate 0.0648 Epoch: 3 Global Step: 48510 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:17:08,437-Speed 3008.59 samples/sec Loss 12.0235 LearningRate 0.0648 Epoch: 3 Global Step: 48520 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:17:11,753-Speed 3089.27 samples/sec Loss 11.9666 LearningRate 0.0647 Epoch: 3 Global Step: 48530 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:17:15,095-Speed 3065.01 samples/sec Loss 11.8148 LearningRate 0.0647 Epoch: 3 Global Step: 48540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:18,515-Speed 2995.16 samples/sec Loss 12.0942 LearningRate 0.0647 Epoch: 3 Global Step: 48550 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:21,891-Speed 3034.22 samples/sec Loss 11.8147 LearningRate 0.0647 Epoch: 3 Global Step: 48560 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:25,211-Speed 3085.41 samples/sec Loss 11.8972 LearningRate 0.0647 Epoch: 3 Global Step: 48570 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:28,538-Speed 3078.31 samples/sec Loss 11.8683 LearningRate 0.0647 Epoch: 3 Global Step: 48580 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:31,880-Speed 3064.44 samples/sec Loss 11.9260 LearningRate 0.0647 Epoch: 3 Global Step: 48590 Fp16 Grad Scale: 65536 Required: 19 hours 
Training: 2022-04-27 06:17:35,280-Speed 3013.78 samples/sec Loss 12.0573 LearningRate 0.0647 Epoch: 3 Global Step: 48600 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:38,646-Speed 3043.39 samples/sec Loss 11.9277 LearningRate 0.0647 Epoch: 3 Global Step: 48610 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:41,995-Speed 3058.54 samples/sec Loss 11.9320 LearningRate 0.0647 Epoch: 3 Global Step: 48620 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:45,357-Speed 3046.34 samples/sec Loss 12.0262 LearningRate 0.0647 Epoch: 3 Global Step: 48630 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:48,723-Speed 3042.75 samples/sec Loss 11.9681 LearningRate 0.0647 Epoch: 3 Global Step: 48640 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:52,075-Speed 3056.23 samples/sec Loss 11.9163 LearningRate 0.0647 Epoch: 3 Global Step: 48650 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:55,454-Speed 3031.38 samples/sec Loss 12.0751 LearningRate 0.0647 Epoch: 3 Global Step: 48660 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:17:58,764-Speed 3094.37 samples/sec Loss 11.9658 LearningRate 0.0647 Epoch: 3 Global Step: 48670 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:02,081-Speed 3088.34 samples/sec Loss 11.8607 LearningRate 0.0646 Epoch: 3 Global Step: 48680 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:05,447-Speed 3042.78 samples/sec Loss 11.9155 LearningRate 0.0646 Epoch: 3 Global Step: 48690 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:08,809-Speed 3047.00 samples/sec Loss 11.8850 LearningRate 0.0646 Epoch: 3 Global Step: 48700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:12,125-Speed 3088.96 samples/sec Loss 11.9892 LearningRate 0.0646 Epoch: 3 Global Step: 48710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:15,446-Speed 
3084.83 samples/sec Loss 11.8988 LearningRate 0.0646 Epoch: 3 Global Step: 48720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:18,857-Speed 3002.52 samples/sec Loss 11.9154 LearningRate 0.0646 Epoch: 3 Global Step: 48730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:18:22,272-Speed 2999.63 samples/sec Loss 12.0619 LearningRate 0.0646 Epoch: 3 Global Step: 48740 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:25,632-Speed 3048.84 samples/sec Loss 11.7943 LearningRate 0.0646 Epoch: 3 Global Step: 48750 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:28,939-Speed 3097.33 samples/sec Loss 11.9614 LearningRate 0.0646 Epoch: 3 Global Step: 48760 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:32,239-Speed 3103.78 samples/sec Loss 12.0207 LearningRate 0.0646 Epoch: 3 Global Step: 48770 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:35,570-Speed 3075.00 samples/sec Loss 11.7383 LearningRate 0.0646 Epoch: 3 Global Step: 48780 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:38,911-Speed 3066.30 samples/sec Loss 11.9627 LearningRate 0.0646 Epoch: 3 Global Step: 48790 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:42,245-Speed 3072.02 samples/sec Loss 12.0579 LearningRate 0.0646 Epoch: 3 Global Step: 48800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:45,538-Speed 3110.41 samples/sec Loss 11.9105 LearningRate 0.0646 Epoch: 3 Global Step: 48810 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:48,866-Speed 3078.38 samples/sec Loss 11.8778 LearningRate 0.0646 Epoch: 3 Global Step: 48820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:52,259-Speed 3018.03 samples/sec Loss 11.9723 LearningRate 0.0646 Epoch: 3 Global Step: 48830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:55,650-Speed 3021.75 samples/sec Loss 11.9565 
LearningRate 0.0645 Epoch: 3 Global Step: 48840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:18:59,094-Speed 2974.15 samples/sec Loss 11.9586 LearningRate 0.0645 Epoch: 3 Global Step: 48850 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:02,487-Speed 3019.02 samples/sec Loss 11.9363 LearningRate 0.0645 Epoch: 3 Global Step: 48860 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:05,814-Speed 3079.08 samples/sec Loss 11.9814 LearningRate 0.0645 Epoch: 3 Global Step: 48870 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:09,118-Speed 3099.99 samples/sec Loss 12.0615 LearningRate 0.0645 Epoch: 3 Global Step: 48880 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:12,491-Speed 3036.82 samples/sec Loss 12.0541 LearningRate 0.0645 Epoch: 3 Global Step: 48890 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:15,836-Speed 3061.54 samples/sec Loss 12.0175 LearningRate 0.0645 Epoch: 3 Global Step: 48900 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:19,201-Speed 3043.90 samples/sec Loss 11.9344 LearningRate 0.0645 Epoch: 3 Global Step: 48910 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:22,633-Speed 2984.76 samples/sec Loss 12.0083 LearningRate 0.0645 Epoch: 3 Global Step: 48920 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:25,952-Speed 3086.10 samples/sec Loss 11.8750 LearningRate 0.0645 Epoch: 3 Global Step: 48930 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:29,284-Speed 3073.95 samples/sec Loss 12.1422 LearningRate 0.0645 Epoch: 3 Global Step: 48940 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 06:19:32,652-Speed 3041.66 samples/sec Loss 11.9305 LearningRate 0.0645 Epoch: 3 Global Step: 48950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:35,981-Speed 3077.22 samples/sec Loss 11.9345 LearningRate 0.0645 Epoch: 3 
Global Step: 48960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:19:39,357-Speed 3033.14 samples/sec Loss 11.8726 LearningRate 0.0645 Epoch: 3 Global Step: 48970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:19:42,718-Speed 3048.27 samples/sec Loss 11.9255 LearningRate 0.0645 Epoch: 3 Global Step: 48980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:19:46,140-Speed 2993.21 samples/sec Loss 11.8440 LearningRate 0.0644 Epoch: 3 Global Step: 48990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:19:49,450-Speed 3094.21 samples/sec Loss 11.9080 LearningRate 0.0644 Epoch: 3 Global Step: 49000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:19:52,820-Speed 3039.64 samples/sec Loss 11.8940 LearningRate 0.0644 Epoch: 3 Global Step: 49010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:19:56,168-Speed 3059.66 samples/sec Loss 11.9700 LearningRate 0.0644 Epoch: 3 Global Step: 49020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:19:59,548-Speed 3030.22 samples/sec Loss 11.9914 LearningRate 0.0644 Epoch: 3 Global Step: 49030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:20:02,919-Speed 3038.53 samples/sec Loss 12.1211 LearningRate 0.0644 Epoch: 3 Global Step: 49040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:20:06,321-Speed 3010.99 samples/sec Loss 11.8753 LearningRate 0.0644 Epoch: 3 Global Step: 49050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:20:09,752-Speed 2985.65 samples/sec Loss 11.9878 LearningRate 0.0644 Epoch: 3 Global Step: 49060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:20:13,053-Speed 3102.92 samples/sec Loss 11.9194 LearningRate 0.0644 Epoch: 3 Global Step: 49070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:16,354-Speed 3103.12 samples/sec Loss 11.7589 LearningRate 0.0644 Epoch: 3 Global Step: 49080 Fp16 Grad Scale: 
131072 Required: 19 hours Training: 2022-04-27 06:20:19,739-Speed 3026.31 samples/sec Loss 12.0237 LearningRate 0.0644 Epoch: 3 Global Step: 49090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:23,063-Speed 3081.35 samples/sec Loss 11.9556 LearningRate 0.0644 Epoch: 3 Global Step: 49100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:26,458-Speed 3016.93 samples/sec Loss 11.9139 LearningRate 0.0644 Epoch: 3 Global Step: 49110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:29,811-Speed 3054.85 samples/sec Loss 11.9722 LearningRate 0.0644 Epoch: 3 Global Step: 49120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:33,198-Speed 3024.68 samples/sec Loss 12.0197 LearningRate 0.0644 Epoch: 3 Global Step: 49130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:36,596-Speed 3014.38 samples/sec Loss 11.8811 LearningRate 0.0644 Epoch: 3 Global Step: 49140 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:39,911-Speed 3089.84 samples/sec Loss 12.0120 LearningRate 0.0643 Epoch: 3 Global Step: 49150 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:43,238-Speed 3078.84 samples/sec Loss 11.9888 LearningRate 0.0643 Epoch: 3 Global Step: 49160 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:46,629-Speed 3021.29 samples/sec Loss 12.0609 LearningRate 0.0643 Epoch: 3 Global Step: 49170 Fp16 Grad Scale: 262144 Required: 19 hours Training: 2022-04-27 06:20:49,985-Speed 3051.85 samples/sec Loss 11.9287 LearningRate 0.0643 Epoch: 3 Global Step: 49180 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:53,356-Speed 3038.13 samples/sec Loss 12.0815 LearningRate 0.0643 Epoch: 3 Global Step: 49190 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:20:56,677-Speed 3084.60 samples/sec Loss 11.8564 LearningRate 0.0643 Epoch: 3 Global Step: 49200 Fp16 Grad Scale: 131072 Required: 19 hours 
Training: 2022-04-27 06:21:00,069-Speed 3020.36 samples/sec Loss 12.0223 LearningRate 0.0643 Epoch: 3 Global Step: 49210 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:03,407-Speed 3067.87 samples/sec Loss 11.9772 LearningRate 0.0643 Epoch: 3 Global Step: 49220 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:06,767-Speed 3049.34 samples/sec Loss 11.8801 LearningRate 0.0643 Epoch: 3 Global Step: 49230 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:10,124-Speed 3050.87 samples/sec Loss 11.9108 LearningRate 0.0643 Epoch: 3 Global Step: 49240 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:13,500-Speed 3034.49 samples/sec Loss 11.9317 LearningRate 0.0643 Epoch: 3 Global Step: 49250 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:16,850-Speed 3057.31 samples/sec Loss 12.1295 LearningRate 0.0643 Epoch: 3 Global Step: 49260 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:20,191-Speed 3066.14 samples/sec Loss 11.9112 LearningRate 0.0643 Epoch: 3 Global Step: 49270 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:23,520-Speed 3076.22 samples/sec Loss 12.0057 LearningRate 0.0643 Epoch: 3 Global Step: 49280 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:21:26,852-Speed 3074.69 samples/sec Loss 11.8881 LearningRate 0.0643 Epoch: 3 Global Step: 49290 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:30,197-Speed 3061.54 samples/sec Loss 11.9553 LearningRate 0.0642 Epoch: 3 Global Step: 49300 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:33,594-Speed 3015.11 samples/sec Loss 12.0822 LearningRate 0.0642 Epoch: 3 Global Step: 49310 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:36,953-Speed 3049.89 samples/sec Loss 11.9296 LearningRate 0.0642 Epoch: 3 Global Step: 49320 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 
06:21:40,283-Speed 3075.99 samples/sec Loss 12.0479 LearningRate 0.0642 Epoch: 3 Global Step: 49330 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:43,690-Speed 3006.63 samples/sec Loss 11.9228 LearningRate 0.0642 Epoch: 3 Global Step: 49340 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:47,051-Speed 3047.55 samples/sec Loss 11.9105 LearningRate 0.0642 Epoch: 3 Global Step: 49350 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:50,447-Speed 3016.32 samples/sec Loss 11.8460 LearningRate 0.0642 Epoch: 3 Global Step: 49360 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:53,778-Speed 3074.41 samples/sec Loss 12.0231 LearningRate 0.0642 Epoch: 3 Global Step: 49370 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:21:57,153-Speed 3035.16 samples/sec Loss 11.8866 LearningRate 0.0642 Epoch: 3 Global Step: 49380 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:00,557-Speed 3009.07 samples/sec Loss 11.9844 LearningRate 0.0642 Epoch: 3 Global Step: 49390 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:03,867-Speed 3094.94 samples/sec Loss 11.7919 LearningRate 0.0642 Epoch: 3 Global Step: 49400 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:07,229-Speed 3046.61 samples/sec Loss 11.8777 LearningRate 0.0642 Epoch: 3 Global Step: 49410 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:10,545-Speed 3089.01 samples/sec Loss 11.7681 LearningRate 0.0642 Epoch: 3 Global Step: 49420 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:13,947-Speed 3010.62 samples/sec Loss 11.8379 LearningRate 0.0642 Epoch: 3 Global Step: 49430 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:17,259-Speed 3092.95 samples/sec Loss 11.7795 LearningRate 0.0642 Epoch: 3 Global Step: 49440 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:20,609-Speed 3057.24 
samples/sec Loss 11.9648 LearningRate 0.0642 Epoch: 3 Global Step: 49450 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:24,034-Speed 2990.80 samples/sec Loss 11.8449 LearningRate 0.0641 Epoch: 3 Global Step: 49460 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:27,443-Speed 3004.38 samples/sec Loss 11.8452 LearningRate 0.0641 Epoch: 3 Global Step: 49470 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:30,757-Speed 3091.88 samples/sec Loss 11.9846 LearningRate 0.0641 Epoch: 3 Global Step: 49480 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:34,135-Speed 3031.68 samples/sec Loss 11.9669 LearningRate 0.0641 Epoch: 3 Global Step: 49490 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:37,562-Speed 2988.78 samples/sec Loss 11.8749 LearningRate 0.0641 Epoch: 3 Global Step: 49500 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:40,904-Speed 3065.34 samples/sec Loss 11.9091 LearningRate 0.0641 Epoch: 3 Global Step: 49510 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:44,237-Speed 3073.35 samples/sec Loss 11.9416 LearningRate 0.0641 Epoch: 3 Global Step: 49520 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:47,610-Speed 3036.74 samples/sec Loss 11.8455 LearningRate 0.0641 Epoch: 3 Global Step: 49530 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:50,951-Speed 3066.15 samples/sec Loss 11.8635 LearningRate 0.0641 Epoch: 3 Global Step: 49540 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:22:54,389-Speed 2979.02 samples/sec Loss 11.9360 LearningRate 0.0641 Epoch: 3 Global Step: 49550 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:22:57,765-Speed 3034.53 samples/sec Loss 12.0381 LearningRate 0.0641 Epoch: 3 Global Step: 49560 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:01,184-Speed 2995.96 samples/sec Loss 11.8689 LearningRate 
0.0641 Epoch: 3 Global Step: 49570 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:04,508-Speed 3080.51 samples/sec Loss 11.9864 LearningRate 0.0641 Epoch: 3 Global Step: 49580 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:07,790-Speed 3120.98 samples/sec Loss 11.9076 LearningRate 0.0641 Epoch: 3 Global Step: 49590 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:11,138-Speed 3060.38 samples/sec Loss 11.9142 LearningRate 0.0641 Epoch: 3 Global Step: 49600 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:14,584-Speed 2972.15 samples/sec Loss 12.0358 LearningRate 0.0640 Epoch: 3 Global Step: 49610 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:18,009-Speed 2991.27 samples/sec Loss 12.0725 LearningRate 0.0640 Epoch: 3 Global Step: 49620 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:21,456-Speed 2971.05 samples/sec Loss 12.0775 LearningRate 0.0640 Epoch: 3 Global Step: 49630 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:24,881-Speed 2990.79 samples/sec Loss 11.8763 LearningRate 0.0640 Epoch: 3 Global Step: 49640 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:28,220-Speed 3068.35 samples/sec Loss 11.8891 LearningRate 0.0640 Epoch: 3 Global Step: 49650 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:31,551-Speed 3075.48 samples/sec Loss 11.8944 LearningRate 0.0640 Epoch: 3 Global Step: 49660 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:34,862-Speed 3093.67 samples/sec Loss 11.8110 LearningRate 0.0640 Epoch: 3 Global Step: 49670 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:23:38,424-Speed 2875.35 samples/sec Loss 12.0148 LearningRate 0.0640 Epoch: 3 Global Step: 49680 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:24:10,150-Speed 322.78 samples/sec Loss 11.0676 LearningRate 0.0640 Epoch: 4 Global Step: 
49690 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:24:13,718-Speed 2870.86 samples/sec Loss 10.5282 LearningRate 0.0640 Epoch: 4 Global Step: 49700 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:17,004-Speed 3117.28 samples/sec Loss 10.4182 LearningRate 0.0640 Epoch: 4 Global Step: 49710 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:20,306-Speed 3102.77 samples/sec Loss 10.3401 LearningRate 0.0640 Epoch: 4 Global Step: 49720 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:23,550-Speed 3157.65 samples/sec Loss 10.3440 LearningRate 0.0640 Epoch: 4 Global Step: 49730 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:26,927-Speed 3033.29 samples/sec Loss 10.4689 LearningRate 0.0640 Epoch: 4 Global Step: 49740 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:30,204-Speed 3125.20 samples/sec Loss 10.4103 LearningRate 0.0640 Epoch: 4 Global Step: 49750 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:33,535-Speed 3075.88 samples/sec Loss 10.3721 LearningRate 0.0640 Epoch: 4 Global Step: 49760 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:36,838-Speed 3100.98 samples/sec Loss 10.4029 LearningRate 0.0639 Epoch: 4 Global Step: 49770 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:40,190-Speed 3056.54 samples/sec Loss 10.5132 LearningRate 0.0639 Epoch: 4 Global Step: 49780 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:43,586-Speed 3016.13 samples/sec Loss 10.3815 LearningRate 0.0639 Epoch: 4 Global Step: 49790 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:24:46,881-Speed 3108.63 samples/sec Loss 10.4688 LearningRate 0.0639 Epoch: 4 Global Step: 49800 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:24:50,222-Speed 3066.66 samples/sec Loss 10.6229 LearningRate 0.0639 Epoch: 4 Global Step: 49810 Fp16 Grad Scale: 131072 Required: 
19 hours Training: 2022-04-27 06:24:53,556-Speed 3071.83 samples/sec Loss 10.4594 LearningRate 0.0639 Epoch: 4 Global Step: 49820 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:24:56,855-Speed 3105.29 samples/sec Loss 10.4788 LearningRate 0.0639 Epoch: 4 Global Step: 49830 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:25:00,172-Speed 3088.30 samples/sec Loss 10.6266 LearningRate 0.0639 Epoch: 4 Global Step: 49840 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:25:03,449-Speed 3125.85 samples/sec Loss 10.4870 LearningRate 0.0639 Epoch: 4 Global Step: 49850 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:06,815-Speed 3043.24 samples/sec Loss 10.5963 LearningRate 0.0639 Epoch: 4 Global Step: 49860 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:10,131-Speed 3088.94 samples/sec Loss 10.5054 LearningRate 0.0639 Epoch: 4 Global Step: 49870 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:13,406-Speed 3127.85 samples/sec Loss 10.6787 LearningRate 0.0639 Epoch: 4 Global Step: 49880 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:16,918-Speed 2916.43 samples/sec Loss 10.5208 LearningRate 0.0639 Epoch: 4 Global Step: 49890 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:20,277-Speed 3048.81 samples/sec Loss 10.6117 LearningRate 0.0639 Epoch: 4 Global Step: 49900 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:23,605-Speed 3078.77 samples/sec Loss 10.4589 LearningRate 0.0639 Epoch: 4 Global Step: 49910 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:27,077-Speed 2950.06 samples/sec Loss 10.3768 LearningRate 0.0638 Epoch: 4 Global Step: 49920 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:30,393-Speed 3089.17 samples/sec Loss 10.3702 LearningRate 0.0638 Epoch: 4 Global Step: 49930 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 
06:25:33,715-Speed 3083.57 samples/sec Loss 10.5707 LearningRate 0.0638 Epoch: 4 Global Step: 49940 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:37,018-Speed 3101.05 samples/sec Loss 10.3478 LearningRate 0.0638 Epoch: 4 Global Step: 49950 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:25:40,419-Speed 3011.58 samples/sec Loss 10.4418 LearningRate 0.0638 Epoch: 4 Global Step: 49960 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:25:43,765-Speed 3061.68 samples/sec Loss 10.5226 LearningRate 0.0638 Epoch: 4 Global Step: 49970 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:47,093-Speed 3078.20 samples/sec Loss 10.5069 LearningRate 0.0638 Epoch: 4 Global Step: 49980 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:50,446-Speed 3054.20 samples/sec Loss 10.6944 LearningRate 0.0638 Epoch: 4 Global Step: 49990 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:53,837-Speed 3021.12 samples/sec Loss 10.6058 LearningRate 0.0638 Epoch: 4 Global Step: 50000 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:25:57,252-Speed 2999.59 samples/sec Loss 10.5323 LearningRate 0.0638 Epoch: 4 Global Step: 50010 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:26:00,577-Speed 3080.61 samples/sec Loss 10.6741 LearningRate 0.0638 Epoch: 4 Global Step: 50020 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:26:03,967-Speed 3022.62 samples/sec Loss 10.6120 LearningRate 0.0638 Epoch: 4 Global Step: 50030 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:26:07,328-Speed 3047.27 samples/sec Loss 10.6727 LearningRate 0.0638 Epoch: 4 Global Step: 50040 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:26:10,634-Speed 3098.39 samples/sec Loss 10.7496 LearningRate 0.0638 Epoch: 4 Global Step: 50050 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:26:13,924-Speed 3113.62 samples/sec 
Loss 10.6979 LearningRate 0.0638 Epoch: 4 Global Step: 50060 Fp16 Grad Scale: 65536 Required: 19 hours Training: 2022-04-27 06:26:17,198-Speed 3127.92 samples/sec Loss 10.6649 LearningRate 0.0638 Epoch: 4 Global Step: 50070 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:20,502-Speed 3101.30 samples/sec Loss 10.8206 LearningRate 0.0637 Epoch: 4 Global Step: 50080 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:23,868-Speed 3042.77 samples/sec Loss 10.6953 LearningRate 0.0637 Epoch: 4 Global Step: 50090 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:27,259-Speed 3020.70 samples/sec Loss 10.7809 LearningRate 0.0637 Epoch: 4 Global Step: 50100 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:30,596-Speed 3069.50 samples/sec Loss 10.6507 LearningRate 0.0637 Epoch: 4 Global Step: 50110 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:33,975-Speed 3031.08 samples/sec Loss 10.6695 LearningRate 0.0637 Epoch: 4 Global Step: 50120 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:37,314-Speed 3068.17 samples/sec Loss 10.4377 LearningRate 0.0637 Epoch: 4 Global Step: 50130 Fp16 Grad Scale: 131072 Required: 19 hours Training: 2022-04-27 06:26:40,653-Speed 3067.66 samples/sec Loss 10.6130 LearningRate 0.0637 Epoch: 4 Global Step: 50140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:26:44,080-Speed 2988.57 samples/sec Loss 10.7870 LearningRate 0.0637 Epoch: 4 Global Step: 50150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:26:47,355-Speed 3127.98 samples/sec Loss 10.6142 LearningRate 0.0637 Epoch: 4 Global Step: 50160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:26:50,693-Speed 3068.54 samples/sec Loss 10.7926 LearningRate 0.0637 Epoch: 4 Global Step: 50170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:26:54,035-Speed 3064.44 samples/sec Loss 10.7994 LearningRate 0.0637 
Epoch: 4 Global Step: 50180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:26:57,384-Speed 3058.85 samples/sec Loss 10.8156 LearningRate 0.0637 Epoch: 4 Global Step: 50190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:00,740-Speed 3052.82 samples/sec Loss 10.8625 LearningRate 0.0637 Epoch: 4 Global Step: 50200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:04,101-Speed 3047.15 samples/sec Loss 10.7036 LearningRate 0.0637 Epoch: 4 Global Step: 50210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:07,482-Speed 3030.49 samples/sec Loss 10.6692 LearningRate 0.0637 Epoch: 4 Global Step: 50220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:10,747-Speed 3137.51 samples/sec Loss 10.7677 LearningRate 0.0636 Epoch: 4 Global Step: 50230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:14,110-Speed 3046.04 samples/sec Loss 10.8734 LearningRate 0.0636 Epoch: 4 Global Step: 50240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:17,432-Speed 3082.99 samples/sec Loss 10.7799 LearningRate 0.0636 Epoch: 4 Global Step: 50250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:20,740-Speed 3096.53 samples/sec Loss 10.8928 LearningRate 0.0636 Epoch: 4 Global Step: 50260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:24,001-Speed 3141.55 samples/sec Loss 10.8116 LearningRate 0.0636 Epoch: 4 Global Step: 50270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:27,361-Speed 3048.83 samples/sec Loss 10.8954 LearningRate 0.0636 Epoch: 4 Global Step: 50280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:30,742-Speed 3029.49 samples/sec Loss 10.9963 LearningRate 0.0636 Epoch: 4 Global Step: 50290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:34,043-Speed 3103.65 samples/sec Loss 10.7886 LearningRate 0.0636 Epoch: 4 Global Step: 50300 Fp16 
Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:37,344-Speed 3103.29 samples/sec Loss 10.7822 LearningRate 0.0636 Epoch: 4 Global Step: 50310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:40,724-Speed 3030.22 samples/sec Loss 10.8244 LearningRate 0.0636 Epoch: 4 Global Step: 50320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:44,054-Speed 3076.41 samples/sec Loss 10.9105 LearningRate 0.0636 Epoch: 4 Global Step: 50330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:47,389-Speed 3070.63 samples/sec Loss 10.8206 LearningRate 0.0636 Epoch: 4 Global Step: 50340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:27:50,734-Speed 3062.57 samples/sec Loss 11.0732 LearningRate 0.0636 Epoch: 4 Global Step: 50350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:54,055-Speed 3084.17 samples/sec Loss 10.6925 LearningRate 0.0636 Epoch: 4 Global Step: 50360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:27:57,446-Speed 3020.78 samples/sec Loss 10.9035 LearningRate 0.0636 Epoch: 4 Global Step: 50370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:00,875-Speed 2986.94 samples/sec Loss 10.8431 LearningRate 0.0636 Epoch: 4 Global Step: 50380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:04,240-Speed 3044.26 samples/sec Loss 10.9426 LearningRate 0.0635 Epoch: 4 Global Step: 50390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:07,580-Speed 3066.21 samples/sec Loss 10.9983 LearningRate 0.0635 Epoch: 4 Global Step: 50400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:10,879-Speed 3105.19 samples/sec Loss 10.9250 LearningRate 0.0635 Epoch: 4 Global Step: 50410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:14,293-Speed 3000.57 samples/sec Loss 10.8605 LearningRate 0.0635 Epoch: 4 Global Step: 50420 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-04-27 06:28:17,741-Speed 2970.54 samples/sec Loss 10.8231 LearningRate 0.0635 Epoch: 4 Global Step: 50430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:21,169-Speed 2988.41 samples/sec Loss 10.9182 LearningRate 0.0635 Epoch: 4 Global Step: 50440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:24,535-Speed 3043.09 samples/sec Loss 10.8506 LearningRate 0.0635 Epoch: 4 Global Step: 50450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:27,960-Speed 2990.62 samples/sec Loss 10.9320 LearningRate 0.0635 Epoch: 4 Global Step: 50460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:31,279-Speed 3085.86 samples/sec Loss 11.0188 LearningRate 0.0635 Epoch: 4 Global Step: 50470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:34,676-Speed 3016.08 samples/sec Loss 11.0753 LearningRate 0.0635 Epoch: 4 Global Step: 50480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:38,027-Speed 3056.64 samples/sec Loss 11.1873 LearningRate 0.0635 Epoch: 4 Global Step: 50490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:41,355-Speed 3078.32 samples/sec Loss 11.0564 LearningRate 0.0635 Epoch: 4 Global Step: 50500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:44,686-Speed 3075.45 samples/sec Loss 11.0815 LearningRate 0.0635 Epoch: 4 Global Step: 50510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:48,071-Speed 3025.46 samples/sec Loss 10.9153 LearningRate 0.0635 Epoch: 4 Global Step: 50520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:51,416-Speed 3063.05 samples/sec Loss 11.0282 LearningRate 0.0635 Epoch: 4 Global Step: 50530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:28:54,712-Speed 3107.07 samples/sec Loss 11.0750 LearningRate 0.0634 Epoch: 4 Global Step: 50540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
06:28:58,046-Speed 3072.85 samples/sec Loss 11.0528 LearningRate 0.0634 Epoch: 4 Global Step: 50550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:29:01,376-Speed 3075.75 samples/sec Loss 11.0616 LearningRate 0.0634 Epoch: 4 Global Step: 50560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:29:04,711-Speed 3072.43 samples/sec Loss 11.0440 LearningRate 0.0634 Epoch: 4 Global Step: 50570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:29:08,073-Speed 3046.58 samples/sec Loss 10.9424 LearningRate 0.0634 Epoch: 4 Global Step: 50580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:29:11,456-Speed 3027.61 samples/sec Loss 11.0346 LearningRate 0.0634 Epoch: 4 Global Step: 50590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:29:14,758-Speed 3102.35 samples/sec Loss 10.9380 LearningRate 0.0634 Epoch: 4 Global Step: 50600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:18,126-Speed 3041.38 samples/sec Loss 11.0084 LearningRate 0.0634 Epoch: 4 Global Step: 50610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:21,490-Speed 3045.10 samples/sec Loss 10.9701 LearningRate 0.0634 Epoch: 4 Global Step: 50620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:24,839-Speed 3057.90 samples/sec Loss 11.0177 LearningRate 0.0634 Epoch: 4 Global Step: 50630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:28,236-Speed 3015.60 samples/sec Loss 11.1063 LearningRate 0.0634 Epoch: 4 Global Step: 50640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:31,560-Speed 3081.29 samples/sec Loss 10.9856 LearningRate 0.0634 Epoch: 4 Global Step: 50650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:34,933-Speed 3037.47 samples/sec Loss 11.0476 LearningRate 0.0634 Epoch: 4 Global Step: 50660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:38,328-Speed 3016.54 samples/sec 
Loss 11.0885 LearningRate 0.0634 Epoch: 4 Global Step: 50670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:41,720-Speed 3020.26 samples/sec Loss 11.1328 LearningRate 0.0634 Epoch: 4 Global Step: 50680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:45,021-Speed 3102.69 samples/sec Loss 11.0660 LearningRate 0.0634 Epoch: 4 Global Step: 50690 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:48,285-Speed 3137.90 samples/sec Loss 11.0669 LearningRate 0.0633 Epoch: 4 Global Step: 50700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:29:51,639-Speed 3053.89 samples/sec Loss 11.0864 LearningRate 0.0633 Epoch: 4 Global Step: 50710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:55,016-Speed 3033.94 samples/sec Loss 11.0763 LearningRate 0.0633 Epoch: 4 Global Step: 50720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:29:58,297-Speed 3121.08 samples/sec Loss 11.1872 LearningRate 0.0633 Epoch: 4 Global Step: 50730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:01,647-Speed 3057.87 samples/sec Loss 11.1234 LearningRate 0.0633 Epoch: 4 Global Step: 50740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:05,014-Speed 3041.97 samples/sec Loss 11.1439 LearningRate 0.0633 Epoch: 4 Global Step: 50750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:08,367-Speed 3055.40 samples/sec Loss 11.1688 LearningRate 0.0633 Epoch: 4 Global Step: 50760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:11,694-Speed 3078.86 samples/sec Loss 11.1679 LearningRate 0.0633 Epoch: 4 Global Step: 50770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:15,116-Speed 2993.70 samples/sec Loss 11.0803 LearningRate 0.0633 Epoch: 4 Global Step: 50780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:18,441-Speed 3082.71 samples/sec Loss 11.0808 LearningRate 0.0633 Epoch: 4 
Global Step: 50790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:21,784-Speed 3063.83 samples/sec Loss 11.3074 LearningRate 0.0633 Epoch: 4 Global Step: 50800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:30:25,138-Speed 3054.02 samples/sec Loss 11.0956 LearningRate 0.0633 Epoch: 4 Global Step: 50810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:28,523-Speed 3026.28 samples/sec Loss 11.3605 LearningRate 0.0633 Epoch: 4 Global Step: 50820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:31,864-Speed 3066.04 samples/sec Loss 11.0976 LearningRate 0.0633 Epoch: 4 Global Step: 50830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:35,225-Speed 3047.83 samples/sec Loss 11.2913 LearningRate 0.0633 Epoch: 4 Global Step: 50840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:38,627-Speed 3010.64 samples/sec Loss 11.0790 LearningRate 0.0633 Epoch: 4 Global Step: 50850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:42,064-Speed 2980.19 samples/sec Loss 11.0554 LearningRate 0.0632 Epoch: 4 Global Step: 50860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:45,456-Speed 3019.06 samples/sec Loss 11.2641 LearningRate 0.0632 Epoch: 4 Global Step: 50870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:48,846-Speed 3021.87 samples/sec Loss 11.2661 LearningRate 0.0632 Epoch: 4 Global Step: 50880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:52,144-Speed 3105.86 samples/sec Loss 11.0307 LearningRate 0.0632 Epoch: 4 Global Step: 50890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:55,445-Speed 3103.49 samples/sec Loss 11.0827 LearningRate 0.0632 Epoch: 4 Global Step: 50900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:30:58,773-Speed 3077.23 samples/sec Loss 11.1074 LearningRate 0.0632 Epoch: 4 Global Step: 50910 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:02,069-Speed 3108.14 samples/sec Loss 11.0001 LearningRate 0.0632 Epoch: 4 Global Step: 50920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:05,437-Speed 3041.52 samples/sec Loss 11.2092 LearningRate 0.0632 Epoch: 4 Global Step: 50930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:08,762-Speed 3080.92 samples/sec Loss 11.3024 LearningRate 0.0632 Epoch: 4 Global Step: 50940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:12,173-Speed 3003.14 samples/sec Loss 11.3352 LearningRate 0.0632 Epoch: 4 Global Step: 50950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:15,554-Speed 3028.85 samples/sec Loss 11.2500 LearningRate 0.0632 Epoch: 4 Global Step: 50960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:18,850-Speed 3108.71 samples/sec Loss 11.0118 LearningRate 0.0632 Epoch: 4 Global Step: 50970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:22,225-Speed 3034.48 samples/sec Loss 11.1825 LearningRate 0.0632 Epoch: 4 Global Step: 50980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:25,609-Speed 3027.51 samples/sec Loss 11.3511 LearningRate 0.0632 Epoch: 4 Global Step: 50990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:29,026-Speed 2997.28 samples/sec Loss 11.3320 LearningRate 0.0632 Epoch: 4 Global Step: 51000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:32,412-Speed 3025.91 samples/sec Loss 11.1758 LearningRate 0.0631 Epoch: 4 Global Step: 51010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:35,768-Speed 3052.11 samples/sec Loss 11.2056 LearningRate 0.0631 Epoch: 4 Global Step: 51020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:31:39,110-Speed 3064.25 samples/sec Loss 11.2543 LearningRate 0.0631 Epoch: 4 Global Step: 51030 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-04-27 06:31:42,479-Speed 3040.55 samples/sec Loss 11.2878 LearningRate 0.0631 Epoch: 4 Global Step: 51040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:45,809-Speed 3076.10 samples/sec Loss 11.2499 LearningRate 0.0631 Epoch: 4 Global Step: 51050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:49,179-Speed 3039.64 samples/sec Loss 11.3215 LearningRate 0.0631 Epoch: 4 Global Step: 51060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:52,563-Speed 3027.03 samples/sec Loss 11.2938 LearningRate 0.0631 Epoch: 4 Global Step: 51070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:55,993-Speed 2985.96 samples/sec Loss 11.3209 LearningRate 0.0631 Epoch: 4 Global Step: 51080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:31:59,408-Speed 3000.03 samples/sec Loss 11.4735 LearningRate 0.0631 Epoch: 4 Global Step: 51090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:02,802-Speed 3018.67 samples/sec Loss 11.3373 LearningRate 0.0631 Epoch: 4 Global Step: 51100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:06,185-Speed 3027.59 samples/sec Loss 11.2587 LearningRate 0.0631 Epoch: 4 Global Step: 51110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:09,535-Speed 3057.26 samples/sec Loss 11.2616 LearningRate 0.0631 Epoch: 4 Global Step: 51120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:12,883-Speed 3059.75 samples/sec Loss 11.1369 LearningRate 0.0631 Epoch: 4 Global Step: 51130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:16,197-Speed 3090.55 samples/sec Loss 11.4007 LearningRate 0.0631 Epoch: 4 Global Step: 51140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:19,552-Speed 3053.69 samples/sec Loss 11.2803 LearningRate 0.0631 Epoch: 4 Global Step: 51150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
06:32:22,865-Speed 3091.43 samples/sec Loss 11.2991 LearningRate 0.0631 Epoch: 4 Global Step: 51160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:26,153-Speed 3115.64 samples/sec Loss 11.1851 LearningRate 0.0630 Epoch: 4 Global Step: 51170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:29,529-Speed 3034.24 samples/sec Loss 11.2745 LearningRate 0.0630 Epoch: 4 Global Step: 51180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:32,916-Speed 3024.44 samples/sec Loss 11.4857 LearningRate 0.0630 Epoch: 4 Global Step: 51190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:36,260-Speed 3063.10 samples/sec Loss 11.2209 LearningRate 0.0630 Epoch: 4 Global Step: 51200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:39,669-Speed 3005.03 samples/sec Loss 11.3500 LearningRate 0.0630 Epoch: 4 Global Step: 51210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:42,973-Speed 3100.55 samples/sec Loss 11.3563 LearningRate 0.0630 Epoch: 4 Global Step: 51220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:46,315-Speed 3065.07 samples/sec Loss 11.2981 LearningRate 0.0630 Epoch: 4 Global Step: 51230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:49,673-Speed 3050.30 samples/sec Loss 11.3745 LearningRate 0.0630 Epoch: 4 Global Step: 51240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:53,022-Speed 3059.11 samples/sec Loss 11.3323 LearningRate 0.0630 Epoch: 4 Global Step: 51250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:56,425-Speed 3010.28 samples/sec Loss 11.2998 LearningRate 0.0630 Epoch: 4 Global Step: 51260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:32:59,851-Speed 2989.97 samples/sec Loss 11.4462 LearningRate 0.0630 Epoch: 4 Global Step: 51270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:03,207-Speed 3051.73 
samples/sec Loss 11.1832 LearningRate 0.0630 Epoch: 4 Global Step: 51280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:06,556-Speed 3059.31 samples/sec Loss 11.2283 LearningRate 0.0630 Epoch: 4 Global Step: 51290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:09,989-Speed 2983.61 samples/sec Loss 11.4069 LearningRate 0.0630 Epoch: 4 Global Step: 51300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:13,408-Speed 2995.32 samples/sec Loss 11.2544 LearningRate 0.0630 Epoch: 4 Global Step: 51310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:16,790-Speed 3029.65 samples/sec Loss 11.3990 LearningRate 0.0630 Epoch: 4 Global Step: 51320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:20,136-Speed 3061.58 samples/sec Loss 11.4037 LearningRate 0.0629 Epoch: 4 Global Step: 51330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:23,474-Speed 3068.37 samples/sec Loss 11.2165 LearningRate 0.0629 Epoch: 4 Global Step: 51340 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 06:33:26,825-Speed 3057.00 samples/sec Loss 11.3671 LearningRate 0.0629 Epoch: 4 Global Step: 51350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:30,235-Speed 3003.84 samples/sec Loss 11.4477 LearningRate 0.0629 Epoch: 4 Global Step: 51360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:33,536-Speed 3103.00 samples/sec Loss 11.3713 LearningRate 0.0629 Epoch: 4 Global Step: 51370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:36,852-Speed 3088.81 samples/sec Loss 11.3408 LearningRate 0.0629 Epoch: 4 Global Step: 51380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:40,185-Speed 3073.37 samples/sec Loss 11.2923 LearningRate 0.0629 Epoch: 4 Global Step: 51390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:43,604-Speed 2995.65 samples/sec Loss 11.3757 
LearningRate 0.0629 Epoch: 4 Global Step: 51400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:46,917-Speed 3091.33 samples/sec Loss 11.4027 LearningRate 0.0629 Epoch: 4 Global Step: 51410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:50,283-Speed 3043.70 samples/sec Loss 11.4656 LearningRate 0.0629 Epoch: 4 Global Step: 51420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:53,574-Speed 3112.55 samples/sec Loss 11.3181 LearningRate 0.0629 Epoch: 4 Global Step: 51430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:33:56,880-Speed 3098.18 samples/sec Loss 11.1800 LearningRate 0.0629 Epoch: 4 Global Step: 51440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:00,228-Speed 3059.65 samples/sec Loss 11.4256 LearningRate 0.0629 Epoch: 4 Global Step: 51450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:03,586-Speed 3050.18 samples/sec Loss 11.3915 LearningRate 0.0629 Epoch: 4 Global Step: 51460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:06,980-Speed 3017.95 samples/sec Loss 11.4319 LearningRate 0.0629 Epoch: 4 Global Step: 51470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:10,315-Speed 3071.09 samples/sec Loss 11.5441 LearningRate 0.0628 Epoch: 4 Global Step: 51480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:13,662-Speed 3059.99 samples/sec Loss 11.3398 LearningRate 0.0628 Epoch: 4 Global Step: 51490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:17,049-Speed 3024.56 samples/sec Loss 11.4796 LearningRate 0.0628 Epoch: 4 Global Step: 51500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:20,433-Speed 3027.61 samples/sec Loss 11.4263 LearningRate 0.0628 Epoch: 4 Global Step: 51510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:23,850-Speed 2997.60 samples/sec Loss 11.4646 LearningRate 0.0628 Epoch: 4 
Global Step: 51520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:27,211-Speed 3047.44 samples/sec Loss 11.3640 LearningRate 0.0628 Epoch: 4 Global Step: 51530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:30,623-Speed 3002.17 samples/sec Loss 11.2918 LearningRate 0.0628 Epoch: 4 Global Step: 51540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:33,960-Speed 3069.67 samples/sec Loss 11.4209 LearningRate 0.0628 Epoch: 4 Global Step: 51550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:37,325-Speed 3044.10 samples/sec Loss 11.5225 LearningRate 0.0628 Epoch: 4 Global Step: 51560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:34:40,696-Speed 3038.72 samples/sec Loss 11.4546 LearningRate 0.0628 Epoch: 4 Global Step: 51570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:34:44,029-Speed 3072.54 samples/sec Loss 11.3316 LearningRate 0.0628 Epoch: 4 Global Step: 51580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:34:47,447-Speed 2997.01 samples/sec Loss 11.5013 LearningRate 0.0628 Epoch: 4 Global Step: 51590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:34:50,765-Speed 3087.05 samples/sec Loss 11.2580 LearningRate 0.0628 Epoch: 4 Global Step: 51600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:34:54,174-Speed 3004.67 samples/sec Loss 11.4342 LearningRate 0.0628 Epoch: 4 Global Step: 51610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:34:57,548-Speed 3036.51 samples/sec Loss 11.3983 LearningRate 0.0628 Epoch: 4 Global Step: 51620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:35:00,876-Speed 3077.83 samples/sec Loss 11.4574 LearningRate 0.0628 Epoch: 4 Global Step: 51630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:35:04,281-Speed 3007.86 samples/sec Loss 11.6563 LearningRate 0.0627 Epoch: 4 Global Step: 51640 Fp16 Grad Scale: 
65536 Required: 18 hours Training: 2022-04-27 06:35:07,609-Speed 3077.58 samples/sec Loss 11.5676 LearningRate 0.0627 Epoch: 4 Global Step: 51650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:35:10,942-Speed 3072.97 samples/sec Loss 11.5249 LearningRate 0.0627 Epoch: 4 Global Step: 51660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:35:14,345-Speed 3010.31 samples/sec Loss 11.5381 LearningRate 0.0627 Epoch: 4 Global Step: 51670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:17,739-Speed 3018.12 samples/sec Loss 11.4557 LearningRate 0.0627 Epoch: 4 Global Step: 51680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:21,054-Speed 3089.22 samples/sec Loss 11.2974 LearningRate 0.0627 Epoch: 4 Global Step: 51690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:24,345-Speed 3112.49 samples/sec Loss 11.4376 LearningRate 0.0627 Epoch: 4 Global Step: 51700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:27,690-Speed 3062.23 samples/sec Loss 11.3306 LearningRate 0.0627 Epoch: 4 Global Step: 51710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:31,043-Speed 3055.63 samples/sec Loss 11.5178 LearningRate 0.0627 Epoch: 4 Global Step: 51720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:34,410-Speed 3041.82 samples/sec Loss 11.5115 LearningRate 0.0627 Epoch: 4 Global Step: 51730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:35:37,744-Speed 3073.02 samples/sec Loss 11.3897 LearningRate 0.0627 Epoch: 4 Global Step: 51740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:35:41,070-Speed 3079.65 samples/sec Loss 11.4555 LearningRate 0.0627 Epoch: 4 Global Step: 51750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:35:44,468-Speed 3015.01 samples/sec Loss 11.6378 LearningRate 0.0627 Epoch: 4 Global Step: 51760 Fp16 Grad Scale: 32768 Required: 18 hours Training: 
2022-04-27 06:35:47,768-Speed 3103.98 samples/sec Loss 11.5636 LearningRate 0.0627 Epoch: 4 Global Step: 51770 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:35:51,085-Speed 3088.15 samples/sec Loss 11.3124 LearningRate 0.0627 Epoch: 4 Global Step: 51780 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:35:54,480-Speed 3016.55 samples/sec Loss 11.5098 LearningRate 0.0627 Epoch: 4 Global Step: 51790 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:35:57,909-Speed 2987.56 samples/sec Loss 11.6361 LearningRate 0.0626 Epoch: 4 Global Step: 51800 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:36:01,238-Speed 3077.28 samples/sec Loss 11.3764 LearningRate 0.0626 Epoch: 4 Global Step: 51810 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:36:04,679-Speed 2977.00 samples/sec Loss 11.5924 LearningRate 0.0626 Epoch: 4 Global Step: 51820 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:36:07,994-Speed 3089.42 samples/sec Loss 11.5526 LearningRate 0.0626 Epoch: 4 Global Step: 51830 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:36:11,318-Speed 3081.43 samples/sec Loss 11.3979 LearningRate 0.0626 Epoch: 4 Global Step: 51840 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:36:14,616-Speed 3106.02 samples/sec Loss 11.3544 LearningRate 0.0626 Epoch: 4 Global Step: 51850 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:36:17,943-Speed 3078.70 samples/sec Loss 11.4557 LearningRate 0.0626 Epoch: 4 Global Step: 51860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:21,276-Speed 3073.06 samples/sec Loss 11.5899 LearningRate 0.0626 Epoch: 4 Global Step: 51870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:24,702-Speed 2990.17 samples/sec Loss 11.5094 LearningRate 0.0626 Epoch: 4 Global Step: 51880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:28,037-Speed 3071.37 
samples/sec Loss 11.6333 LearningRate 0.0626 Epoch: 4 Global Step: 51890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:31,353-Speed 3089.19 samples/sec Loss 11.5299 LearningRate 0.0626 Epoch: 4 Global Step: 51900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:34,661-Speed 3096.48 samples/sec Loss 11.5350 LearningRate 0.0626 Epoch: 4 Global Step: 51910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:37,993-Speed 3073.74 samples/sec Loss 11.5738 LearningRate 0.0626 Epoch: 4 Global Step: 51920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:41,342-Speed 3059.04 samples/sec Loss 11.4182 LearningRate 0.0626 Epoch: 4 Global Step: 51930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:44,689-Speed 3059.76 samples/sec Loss 11.6173 LearningRate 0.0626 Epoch: 4 Global Step: 51940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:48,060-Speed 3039.21 samples/sec Loss 11.5498 LearningRate 0.0625 Epoch: 4 Global Step: 51950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:36:51,403-Speed 3063.52 samples/sec Loss 11.3173 LearningRate 0.0625 Epoch: 4 Global Step: 51960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:36:54,732-Speed 3077.15 samples/sec Loss 11.4753 LearningRate 0.0625 Epoch: 4 Global Step: 51970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:36:58,030-Speed 3106.21 samples/sec Loss 11.4384 LearningRate 0.0625 Epoch: 4 Global Step: 51980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:37:01,381-Speed 3056.22 samples/sec Loss 11.5215 LearningRate 0.0625 Epoch: 4 Global Step: 51990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:37:04,725-Speed 3062.98 samples/sec Loss 11.4861 LearningRate 0.0625 Epoch: 4 Global Step: 52000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:08,078-Speed 3055.06 samples/sec Loss 11.4924 LearningRate 
0.0625 Epoch: 4 Global Step: 52010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:11,513-Speed 2982.13 samples/sec Loss 11.5119 LearningRate 0.0625 Epoch: 4 Global Step: 52020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:14,903-Speed 3021.65 samples/sec Loss 11.3850 LearningRate 0.0625 Epoch: 4 Global Step: 52030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:18,235-Speed 3073.66 samples/sec Loss 11.4635 LearningRate 0.0625 Epoch: 4 Global Step: 52040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:21,548-Speed 3091.90 samples/sec Loss 11.5454 LearningRate 0.0625 Epoch: 4 Global Step: 52050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:24,884-Speed 3070.41 samples/sec Loss 11.3607 LearningRate 0.0625 Epoch: 4 Global Step: 52060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:28,287-Speed 3010.39 samples/sec Loss 11.4421 LearningRate 0.0625 Epoch: 4 Global Step: 52070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:31,602-Speed 3091.05 samples/sec Loss 11.6396 LearningRate 0.0625 Epoch: 4 Global Step: 52080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:34,965-Speed 3045.27 samples/sec Loss 11.3714 LearningRate 0.0625 Epoch: 4 Global Step: 52090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:38,353-Speed 3023.71 samples/sec Loss 11.6197 LearningRate 0.0625 Epoch: 4 Global Step: 52100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:37:41,721-Speed 3041.22 samples/sec Loss 11.6068 LearningRate 0.0624 Epoch: 4 Global Step: 52110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:45,020-Speed 3104.82 samples/sec Loss 11.4673 LearningRate 0.0624 Epoch: 4 Global Step: 52120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:48,378-Speed 3050.80 samples/sec Loss 11.4723 LearningRate 0.0624 Epoch: 4 Global Step: 52130 Fp16 
Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:51,759-Speed 3029.05 samples/sec Loss 11.5202 LearningRate 0.0624 Epoch: 4 Global Step: 52140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:55,160-Speed 3012.21 samples/sec Loss 11.6172 LearningRate 0.0624 Epoch: 4 Global Step: 52150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:37:58,487-Speed 3078.40 samples/sec Loss 11.4489 LearningRate 0.0624 Epoch: 4 Global Step: 52160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:38:01,876-Speed 3022.45 samples/sec Loss 11.6203 LearningRate 0.0624 Epoch: 4 Global Step: 52170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:38:05,204-Speed 3078.08 samples/sec Loss 11.3931 LearningRate 0.0624 Epoch: 4 Global Step: 52180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:38:08,556-Speed 3056.35 samples/sec Loss 11.5527 LearningRate 0.0624 Epoch: 4 Global Step: 52190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:38:11,933-Speed 3032.81 samples/sec Loss 11.5370 LearningRate 0.0624 Epoch: 4 Global Step: 52200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:38:15,271-Speed 3069.24 samples/sec Loss 11.4970 LearningRate 0.0624 Epoch: 4 Global Step: 52210 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:18,747-Speed 2946.82 samples/sec Loss 11.4302 LearningRate 0.0624 Epoch: 4 Global Step: 52220 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:22,087-Speed 3066.63 samples/sec Loss 11.5237 LearningRate 0.0624 Epoch: 4 Global Step: 52230 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:25,444-Speed 3051.12 samples/sec Loss 11.6305 LearningRate 0.0624 Epoch: 4 Global Step: 52240 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:28,768-Speed 3082.14 samples/sec Loss 11.4836 LearningRate 0.0624 Epoch: 4 Global Step: 52250 Fp16 Grad Scale: 16384 Required: 18 hours 
Training: 2022-04-27 06:38:32,152-Speed 3026.81 samples/sec Loss 11.3285 LearningRate 0.0624 Epoch: 4 Global Step: 52260 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:35,523-Speed 3039.12 samples/sec Loss 11.7099 LearningRate 0.0623 Epoch: 4 Global Step: 52270 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:38,927-Speed 3008.86 samples/sec Loss 11.3358 LearningRate 0.0623 Epoch: 4 Global Step: 52280 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:42,309-Speed 3028.85 samples/sec Loss 11.5137 LearningRate 0.0623 Epoch: 4 Global Step: 52290 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:45,773-Speed 2956.48 samples/sec Loss 11.5053 LearningRate 0.0623 Epoch: 4 Global Step: 52300 Fp16 Grad Scale: 16384 Required: 18 hours Training: 2022-04-27 06:38:49,206-Speed 2983.36 samples/sec Loss 11.5574 LearningRate 0.0623 Epoch: 4 Global Step: 52310 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:38:52,606-Speed 3013.69 samples/sec Loss 11.6793 LearningRate 0.0623 Epoch: 4 Global Step: 52320 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:38:55,912-Speed 3097.70 samples/sec Loss 11.7196 LearningRate 0.0623 Epoch: 4 Global Step: 52330 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:38:59,244-Speed 3074.36 samples/sec Loss 11.6159 LearningRate 0.0623 Epoch: 4 Global Step: 52340 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:02,579-Speed 3071.36 samples/sec Loss 11.3768 LearningRate 0.0623 Epoch: 4 Global Step: 52350 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:05,921-Speed 3065.62 samples/sec Loss 11.5131 LearningRate 0.0623 Epoch: 4 Global Step: 52360 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:09,320-Speed 3013.25 samples/sec Loss 11.4978 LearningRate 0.0623 Epoch: 4 Global Step: 52370 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:12,740-Speed 
2995.20 samples/sec Loss 11.4717 LearningRate 0.0623 Epoch: 4 Global Step: 52380 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:16,184-Speed 2973.89 samples/sec Loss 11.5423 LearningRate 0.0623 Epoch: 4 Global Step: 52390 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:19,639-Speed 2964.89 samples/sec Loss 11.5521 LearningRate 0.0623 Epoch: 4 Global Step: 52400 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:39:23,059-Speed 2995.13 samples/sec Loss 11.6368 LearningRate 0.0623 Epoch: 4 Global Step: 52410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:26,403-Speed 3062.90 samples/sec Loss 11.5966 LearningRate 0.0622 Epoch: 4 Global Step: 52420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:29,847-Speed 2974.30 samples/sec Loss 11.5709 LearningRate 0.0622 Epoch: 4 Global Step: 52430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:33,238-Speed 3020.30 samples/sec Loss 11.6526 LearningRate 0.0622 Epoch: 4 Global Step: 52440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:36,629-Speed 3021.44 samples/sec Loss 11.4652 LearningRate 0.0622 Epoch: 4 Global Step: 52450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:39,989-Speed 3048.73 samples/sec Loss 11.6241 LearningRate 0.0622 Epoch: 4 Global Step: 52460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:43,354-Speed 3043.08 samples/sec Loss 11.5563 LearningRate 0.0622 Epoch: 4 Global Step: 52470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:46,697-Speed 3063.98 samples/sec Loss 11.4757 LearningRate 0.0622 Epoch: 4 Global Step: 52480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:50,078-Speed 3029.45 samples/sec Loss 11.4553 LearningRate 0.0622 Epoch: 4 Global Step: 52490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:53,470-Speed 3019.94 samples/sec Loss 11.5550 
LearningRate 0.0622 Epoch: 4 Global Step: 52500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:39:56,832-Speed 3046.45 samples/sec Loss 11.5511 LearningRate 0.0622 Epoch: 4 Global Step: 52510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:00,159-Speed 3078.76 samples/sec Loss 11.5237 LearningRate 0.0622 Epoch: 4 Global Step: 52520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:03,553-Speed 3018.49 samples/sec Loss 11.6642 LearningRate 0.0622 Epoch: 4 Global Step: 52530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:06,868-Speed 3089.18 samples/sec Loss 11.6161 LearningRate 0.0622 Epoch: 4 Global Step: 52540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:10,176-Speed 3096.67 samples/sec Loss 11.6641 LearningRate 0.0622 Epoch: 4 Global Step: 52550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:13,486-Speed 3094.80 samples/sec Loss 11.6242 LearningRate 0.0622 Epoch: 4 Global Step: 52560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:16,882-Speed 3015.58 samples/sec Loss 11.5397 LearningRate 0.0622 Epoch: 4 Global Step: 52570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:20,287-Speed 3008.57 samples/sec Loss 11.6073 LearningRate 0.0621 Epoch: 4 Global Step: 52580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:23,693-Speed 3007.82 samples/sec Loss 11.5848 LearningRate 0.0621 Epoch: 4 Global Step: 52590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:27,070-Speed 3032.78 samples/sec Loss 11.5148 LearningRate 0.0621 Epoch: 4 Global Step: 52600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:30,534-Speed 2957.39 samples/sec Loss 11.5442 LearningRate 0.0621 Epoch: 4 Global Step: 52610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:33,837-Speed 3100.74 samples/sec Loss 11.3956 LearningRate 0.0621 Epoch: 4 Global 
Step: 52620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:37,216-Speed 3030.85 samples/sec Loss 11.4521 LearningRate 0.0621 Epoch: 4 Global Step: 52630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:40,675-Speed 2961.70 samples/sec Loss 11.6088 LearningRate 0.0621 Epoch: 4 Global Step: 52640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:40:44,027-Speed 3056.23 samples/sec Loss 11.4317 LearningRate 0.0621 Epoch: 4 Global Step: 52650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:47,391-Speed 3044.38 samples/sec Loss 11.4772 LearningRate 0.0621 Epoch: 4 Global Step: 52660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:50,731-Speed 3067.19 samples/sec Loss 11.4311 LearningRate 0.0621 Epoch: 4 Global Step: 52670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:54,103-Speed 3037.52 samples/sec Loss 11.5573 LearningRate 0.0621 Epoch: 4 Global Step: 52680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:40:57,483-Speed 3029.99 samples/sec Loss 11.6076 LearningRate 0.0621 Epoch: 4 Global Step: 52690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:00,841-Speed 3049.87 samples/sec Loss 11.4488 LearningRate 0.0621 Epoch: 4 Global Step: 52700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:04,174-Speed 3073.88 samples/sec Loss 11.6093 LearningRate 0.0621 Epoch: 4 Global Step: 52710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:07,565-Speed 3020.47 samples/sec Loss 11.4824 LearningRate 0.0621 Epoch: 4 Global Step: 52720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:10,893-Speed 3078.12 samples/sec Loss 11.3490 LearningRate 0.0621 Epoch: 4 Global Step: 52730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:14,192-Speed 3105.12 samples/sec Loss 11.5135 LearningRate 0.0620 Epoch: 4 Global Step: 52740 Fp16 Grad Scale: 
131072 Required: 18 hours Training: 2022-04-27 06:41:17,577-Speed 3025.69 samples/sec Loss 11.7010 LearningRate 0.0620 Epoch: 4 Global Step: 52750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:20,969-Speed 3019.73 samples/sec Loss 11.5087 LearningRate 0.0620 Epoch: 4 Global Step: 52760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:24,413-Speed 2974.10 samples/sec Loss 11.7607 LearningRate 0.0620 Epoch: 4 Global Step: 52770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:27,815-Speed 3010.89 samples/sec Loss 11.5344 LearningRate 0.0620 Epoch: 4 Global Step: 52780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:31,183-Speed 3041.16 samples/sec Loss 11.4367 LearningRate 0.0620 Epoch: 4 Global Step: 52790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:34,545-Speed 3047.31 samples/sec Loss 11.4832 LearningRate 0.0620 Epoch: 4 Global Step: 52800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:37,900-Speed 3052.74 samples/sec Loss 11.5413 LearningRate 0.0620 Epoch: 4 Global Step: 52810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:41,203-Speed 3101.38 samples/sec Loss 11.6134 LearningRate 0.0620 Epoch: 4 Global Step: 52820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:41:44,504-Speed 3102.35 samples/sec Loss 11.4772 LearningRate 0.0620 Epoch: 4 Global Step: 52830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:41:47,805-Speed 3103.83 samples/sec Loss 11.5041 LearningRate 0.0620 Epoch: 4 Global Step: 52840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:41:51,136-Speed 3075.66 samples/sec Loss 11.7017 LearningRate 0.0620 Epoch: 4 Global Step: 52850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:41:54,480-Speed 3063.17 samples/sec Loss 11.7797 LearningRate 0.0620 Epoch: 4 Global Step: 52860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 06:41:57,862-Speed 3028.98 samples/sec Loss 11.4753 LearningRate 0.0620 Epoch: 4 Global Step: 52870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:01,204-Speed 3064.95 samples/sec Loss 11.5872 LearningRate 0.0620 Epoch: 4 Global Step: 52880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:04,547-Speed 3064.29 samples/sec Loss 11.6314 LearningRate 0.0620 Epoch: 4 Global Step: 52890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:07,912-Speed 3044.13 samples/sec Loss 11.6410 LearningRate 0.0619 Epoch: 4 Global Step: 52900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:11,251-Speed 3067.86 samples/sec Loss 11.5472 LearningRate 0.0619 Epoch: 4 Global Step: 52910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:14,628-Speed 3032.70 samples/sec Loss 11.6932 LearningRate 0.0619 Epoch: 4 Global Step: 52920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:18,045-Speed 2997.49 samples/sec Loss 11.6474 LearningRate 0.0619 Epoch: 4 Global Step: 52930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:42:21,366-Speed 3084.09 samples/sec Loss 11.5503 LearningRate 0.0619 Epoch: 4 Global Step: 52940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:42:24,766-Speed 3012.85 samples/sec Loss 11.6461 LearningRate 0.0619 Epoch: 4 Global Step: 52950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:42:28,168-Speed 3010.77 samples/sec Loss 11.5648 LearningRate 0.0619 Epoch: 4 Global Step: 52960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:31,535-Speed 3042.25 samples/sec Loss 11.6238 LearningRate 0.0619 Epoch: 4 Global Step: 52970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:34,877-Speed 3065.21 samples/sec Loss 11.7090 LearningRate 0.0619 Epoch: 4 Global Step: 52980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:38,266-Speed 3021.92 
samples/sec Loss 11.6024 LearningRate 0.0619 Epoch: 4 Global Step: 52990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:41,623-Speed 3051.21 samples/sec Loss 11.6165 LearningRate 0.0619 Epoch: 4 Global Step: 53000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:45,101-Speed 2945.11 samples/sec Loss 11.5451 LearningRate 0.0619 Epoch: 4 Global Step: 53010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:48,500-Speed 3013.53 samples/sec Loss 11.6437 LearningRate 0.0619 Epoch: 4 Global Step: 53020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:51,917-Speed 2997.66 samples/sec Loss 11.5956 LearningRate 0.0619 Epoch: 4 Global Step: 53030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:55,223-Speed 3097.97 samples/sec Loss 11.6205 LearningRate 0.0619 Epoch: 4 Global Step: 53040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:42:58,559-Speed 3070.31 samples/sec Loss 11.4869 LearningRate 0.0619 Epoch: 4 Global Step: 53050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:43:02,012-Speed 2966.09 samples/sec Loss 11.6238 LearningRate 0.0618 Epoch: 4 Global Step: 53060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:05,332-Speed 3086.17 samples/sec Loss 11.6509 LearningRate 0.0618 Epoch: 4 Global Step: 53070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:08,654-Speed 3083.40 samples/sec Loss 11.5492 LearningRate 0.0618 Epoch: 4 Global Step: 53080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:12,030-Speed 3033.81 samples/sec Loss 11.3747 LearningRate 0.0618 Epoch: 4 Global Step: 53090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:15,388-Speed 3050.52 samples/sec Loss 11.6113 LearningRate 0.0618 Epoch: 4 Global Step: 53100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:18,779-Speed 3020.50 samples/sec Loss 11.6653 LearningRate 
0.0618 Epoch: 4 Global Step: 53110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:22,112-Speed 3072.85 samples/sec Loss 11.6415 LearningRate 0.0618 Epoch: 4 Global Step: 53120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:25,507-Speed 3017.20 samples/sec Loss 11.4282 LearningRate 0.0618 Epoch: 4 Global Step: 53130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:28,929-Speed 2993.33 samples/sec Loss 11.4686 LearningRate 0.0618 Epoch: 4 Global Step: 53140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:32,277-Speed 3059.27 samples/sec Loss 11.4220 LearningRate 0.0618 Epoch: 4 Global Step: 53150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:35,630-Speed 3055.24 samples/sec Loss 11.6781 LearningRate 0.0618 Epoch: 4 Global Step: 53160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:38,997-Speed 3042.00 samples/sec Loss 11.6844 LearningRate 0.0618 Epoch: 4 Global Step: 53170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:42,341-Speed 3063.29 samples/sec Loss 11.5940 LearningRate 0.0618 Epoch: 4 Global Step: 53180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:45,725-Speed 3026.66 samples/sec Loss 11.5914 LearningRate 0.0618 Epoch: 4 Global Step: 53190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:49,142-Speed 2997.25 samples/sec Loss 11.6731 LearningRate 0.0618 Epoch: 4 Global Step: 53200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:52,502-Speed 3048.92 samples/sec Loss 11.5838 LearningRate 0.0617 Epoch: 4 Global Step: 53210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:55,885-Speed 3027.93 samples/sec Loss 11.6365 LearningRate 0.0617 Epoch: 4 Global Step: 53220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:43:59,265-Speed 3030.23 samples/sec Loss 11.6113 LearningRate 0.0617 Epoch: 4 Global Step: 
53230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:02,672-Speed 3006.01 samples/sec Loss 11.7168 LearningRate 0.0617 Epoch: 4 Global Step: 53240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:06,004-Speed 3074.29 samples/sec Loss 11.5103 LearningRate 0.0617 Epoch: 4 Global Step: 53250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:09,327-Speed 3082.42 samples/sec Loss 11.3810 LearningRate 0.0617 Epoch: 4 Global Step: 53260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:12,692-Speed 3043.98 samples/sec Loss 11.5713 LearningRate 0.0617 Epoch: 4 Global Step: 53270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:16,082-Speed 3021.55 samples/sec Loss 11.6224 LearningRate 0.0617 Epoch: 4 Global Step: 53280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:19,476-Speed 3018.24 samples/sec Loss 11.5785 LearningRate 0.0617 Epoch: 4 Global Step: 53290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:22,861-Speed 3026.01 samples/sec Loss 11.5583 LearningRate 0.0617 Epoch: 4 Global Step: 53300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:44:26,237-Speed 3033.94 samples/sec Loss 11.6053 LearningRate 0.0617 Epoch: 4 Global Step: 53310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:29,536-Speed 3104.13 samples/sec Loss 11.4815 LearningRate 0.0617 Epoch: 4 Global Step: 53320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:32,974-Speed 2979.55 samples/sec Loss 11.5005 LearningRate 0.0617 Epoch: 4 Global Step: 53330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:36,437-Speed 2957.77 samples/sec Loss 11.7660 LearningRate 0.0617 Epoch: 4 Global Step: 53340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:39,893-Speed 2964.20 samples/sec Loss 11.6570 LearningRate 0.0617 Epoch: 4 Global Step: 53350 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-04-27 06:44:43,361-Speed 2953.33 samples/sec Loss 11.6226 LearningRate 0.0617 Epoch: 4 Global Step: 53360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:46,704-Speed 3064.19 samples/sec Loss 11.3747 LearningRate 0.0616 Epoch: 4 Global Step: 53370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:50,094-Speed 3021.67 samples/sec Loss 11.5258 LearningRate 0.0616 Epoch: 4 Global Step: 53380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:53,489-Speed 3016.31 samples/sec Loss 11.6545 LearningRate 0.0616 Epoch: 4 Global Step: 53390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:44:56,869-Speed 3031.23 samples/sec Loss 11.5663 LearningRate 0.0616 Epoch: 4 Global Step: 53400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:00,275-Speed 3007.16 samples/sec Loss 11.6166 LearningRate 0.0616 Epoch: 4 Global Step: 53410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:03,681-Speed 3007.00 samples/sec Loss 11.5613 LearningRate 0.0616 Epoch: 4 Global Step: 53420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:07,036-Speed 3053.36 samples/sec Loss 11.6251 LearningRate 0.0616 Epoch: 4 Global Step: 53430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:10,363-Speed 3079.78 samples/sec Loss 11.7030 LearningRate 0.0616 Epoch: 4 Global Step: 53440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:13,735-Speed 3037.29 samples/sec Loss 11.4739 LearningRate 0.0616 Epoch: 4 Global Step: 53450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:17,089-Speed 3054.45 samples/sec Loss 11.7657 LearningRate 0.0616 Epoch: 4 Global Step: 53460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:20,461-Speed 3037.53 samples/sec Loss 11.3226 LearningRate 0.0616 Epoch: 4 Global Step: 53470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 
06:45:23,888-Speed 2988.74 samples/sec Loss 11.4777 LearningRate 0.0616 Epoch: 4 Global Step: 53480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:27,308-Speed 2995.63 samples/sec Loss 11.5155 LearningRate 0.0616 Epoch: 4 Global Step: 53490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:30,728-Speed 2994.59 samples/sec Loss 11.7165 LearningRate 0.0616 Epoch: 4 Global Step: 53500 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:34,103-Speed 3034.83 samples/sec Loss 11.4667 LearningRate 0.0616 Epoch: 4 Global Step: 53510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:37,502-Speed 3013.04 samples/sec Loss 11.5797 LearningRate 0.0616 Epoch: 4 Global Step: 53520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:40,931-Speed 2987.52 samples/sec Loss 11.6972 LearningRate 0.0615 Epoch: 4 Global Step: 53530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:44,306-Speed 3034.99 samples/sec Loss 11.6369 LearningRate 0.0615 Epoch: 4 Global Step: 53540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:45:47,631-Speed 3080.06 samples/sec Loss 11.5096 LearningRate 0.0615 Epoch: 4 Global Step: 53550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:50,999-Speed 3042.53 samples/sec Loss 11.6290 LearningRate 0.0615 Epoch: 4 Global Step: 53560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:54,347-Speed 3059.41 samples/sec Loss 11.5215 LearningRate 0.0615 Epoch: 4 Global Step: 53570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:45:57,699-Speed 3055.45 samples/sec Loss 11.7505 LearningRate 0.0615 Epoch: 4 Global Step: 53580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:46:01,098-Speed 3013.44 samples/sec Loss 11.6729 LearningRate 0.0615 Epoch: 4 Global Step: 53590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:46:04,428-Speed 3076.13 samples/sec 
Loss 11.6419 LearningRate 0.0615 Epoch: 4 Global Step: 53600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:46:07,815-Speed 3024.14 samples/sec Loss 11.5296 LearningRate 0.0615 Epoch: 4 Global Step: 53610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:46:11,206-Speed 3020.58 samples/sec Loss 11.6102 LearningRate 0.0615 Epoch: 4 Global Step: 53620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:46:14,591-Speed 3025.52 samples/sec Loss 11.5859 LearningRate 0.0615 Epoch: 4 Global Step: 53630 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:17,987-Speed 3016.93 samples/sec Loss 11.6633 LearningRate 0.0615 Epoch: 4 Global Step: 53640 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:21,306-Speed 3085.36 samples/sec Loss 11.6721 LearningRate 0.0615 Epoch: 4 Global Step: 53650 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:24,657-Speed 3056.82 samples/sec Loss 11.5103 LearningRate 0.0615 Epoch: 4 Global Step: 53660 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:28,029-Speed 3037.86 samples/sec Loss 11.7440 LearningRate 0.0615 Epoch: 4 Global Step: 53670 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:31,330-Speed 3102.95 samples/sec Loss 11.4998 LearningRate 0.0615 Epoch: 4 Global Step: 53680 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:34,688-Speed 3049.80 samples/sec Loss 11.5729 LearningRate 0.0614 Epoch: 4 Global Step: 53690 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:38,109-Speed 2994.50 samples/sec Loss 11.3812 LearningRate 0.0614 Epoch: 4 Global Step: 53700 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:41,442-Speed 3073.02 samples/sec Loss 11.5781 LearningRate 0.0614 Epoch: 4 Global Step: 53710 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:44,855-Speed 3000.49 samples/sec Loss 11.5163 LearningRate 0.0614 Epoch: 
4 Global Step: 53720 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 06:46:48,256-Speed 3012.62 samples/sec Loss 11.5531 LearningRate 0.0614 Epoch: 4 Global Step: 53730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:46:51,703-Speed 2971.51 samples/sec Loss 11.5736 LearningRate 0.0614 Epoch: 4 Global Step: 53740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:46:55,082-Speed 3030.99 samples/sec Loss 11.5702 LearningRate 0.0614 Epoch: 4 Global Step: 53750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:46:58,511-Speed 2987.66 samples/sec Loss 11.5257 LearningRate 0.0614 Epoch: 4 Global Step: 53760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:01,886-Speed 3034.46 samples/sec Loss 11.5903 LearningRate 0.0614 Epoch: 4 Global Step: 53770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:05,274-Speed 3023.85 samples/sec Loss 11.6636 LearningRate 0.0614 Epoch: 4 Global Step: 53780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:08,738-Speed 2956.99 samples/sec Loss 11.6953 LearningRate 0.0614 Epoch: 4 Global Step: 53790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:12,093-Speed 3053.28 samples/sec Loss 11.5624 LearningRate 0.0614 Epoch: 4 Global Step: 53800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:15,411-Speed 3086.92 samples/sec Loss 11.5044 LearningRate 0.0614 Epoch: 4 Global Step: 53810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:18,793-Speed 3028.85 samples/sec Loss 11.5833 LearningRate 0.0614 Epoch: 4 Global Step: 53820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:47:22,168-Speed 3034.65 samples/sec Loss 11.6299 LearningRate 0.0614 Epoch: 4 Global Step: 53830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:25,574-Speed 3007.05 samples/sec Loss 11.4583 LearningRate 0.0614 Epoch: 4 Global Step: 53840 Fp16 Grad Scale: 
131072 Required: 18 hours Training: 2022-04-27 06:47:28,955-Speed 3030.17 samples/sec Loss 11.5450 LearningRate 0.0613 Epoch: 4 Global Step: 53850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:32,342-Speed 3024.04 samples/sec Loss 11.4612 LearningRate 0.0613 Epoch: 4 Global Step: 53860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:35,823-Speed 2942.08 samples/sec Loss 11.6803 LearningRate 0.0613 Epoch: 4 Global Step: 53870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:39,241-Speed 2997.48 samples/sec Loss 11.4536 LearningRate 0.0613 Epoch: 4 Global Step: 53880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:42,644-Speed 3009.94 samples/sec Loss 11.5574 LearningRate 0.0613 Epoch: 4 Global Step: 53890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:45,962-Speed 3087.06 samples/sec Loss 11.6529 LearningRate 0.0613 Epoch: 4 Global Step: 53900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:49,349-Speed 3024.22 samples/sec Loss 11.5886 LearningRate 0.0613 Epoch: 4 Global Step: 53910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:52,665-Speed 3089.10 samples/sec Loss 11.6872 LearningRate 0.0613 Epoch: 4 Global Step: 53920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:55,967-Speed 3102.01 samples/sec Loss 11.4107 LearningRate 0.0613 Epoch: 4 Global Step: 53930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:47:59,262-Speed 3108.53 samples/sec Loss 11.6026 LearningRate 0.0613 Epoch: 4 Global Step: 53940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:48:02,581-Speed 3086.04 samples/sec Loss 11.5950 LearningRate 0.0613 Epoch: 4 Global Step: 53950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:48:05,987-Speed 3007.21 samples/sec Loss 11.4597 LearningRate 0.0613 Epoch: 4 Global Step: 53960 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-04-27 06:48:09,305-Speed 3087.15 samples/sec Loss 11.6579 LearningRate 0.0613 Epoch: 4 Global Step: 53970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:12,778-Speed 2949.43 samples/sec Loss 11.7087 LearningRate 0.0613 Epoch: 4 Global Step: 53980 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:16,206-Speed 2988.05 samples/sec Loss 11.7129 LearningRate 0.0613 Epoch: 4 Global Step: 53990 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:20,473-Speed 2400.55 samples/sec Loss 11.7100 LearningRate 0.0613 Epoch: 4 Global Step: 54000 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:23,958-Speed 2939.01 samples/sec Loss 11.6768 LearningRate 0.0612 Epoch: 4 Global Step: 54010 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:27,356-Speed 3014.47 samples/sec Loss 11.6457 LearningRate 0.0612 Epoch: 4 Global Step: 54020 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:30,714-Speed 3050.07 samples/sec Loss 11.5811 LearningRate 0.0612 Epoch: 4 Global Step: 54030 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:34,148-Speed 2982.53 samples/sec Loss 11.5617 LearningRate 0.0612 Epoch: 4 Global Step: 54040 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:37,568-Speed 2995.31 samples/sec Loss 11.6401 LearningRate 0.0612 Epoch: 4 Global Step: 54050 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:40,911-Speed 3065.68 samples/sec Loss 11.5651 LearningRate 0.0612 Epoch: 4 Global Step: 54060 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:48:44,202-Speed 3111.71 samples/sec Loss 11.5416 LearningRate 0.0612 Epoch: 4 Global Step: 54070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:48:47,577-Speed 3035.50 samples/sec Loss 11.4634 LearningRate 0.0612 Epoch: 4 Global Step: 54080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:48:51,091-Speed 
2914.44 samples/sec Loss 11.5157 LearningRate 0.0612 Epoch: 4 Global Step: 54090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:48:54,560-Speed 2952.80 samples/sec Loss 11.6055 LearningRate 0.0612 Epoch: 4 Global Step: 54100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:48:57,989-Speed 2987.53 samples/sec Loss 11.7500 LearningRate 0.0612 Epoch: 4 Global Step: 54110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:49:01,390-Speed 3011.65 samples/sec Loss 11.5336 LearningRate 0.0612 Epoch: 4 Global Step: 54120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:04,764-Speed 3035.75 samples/sec Loss 11.6623 LearningRate 0.0612 Epoch: 4 Global Step: 54130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:08,184-Speed 2994.67 samples/sec Loss 11.6055 LearningRate 0.0612 Epoch: 4 Global Step: 54140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:11,580-Speed 3016.82 samples/sec Loss 11.5778 LearningRate 0.0612 Epoch: 4 Global Step: 54150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:14,945-Speed 3043.46 samples/sec Loss 11.6073 LearningRate 0.0611 Epoch: 4 Global Step: 54160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:18,294-Speed 3058.70 samples/sec Loss 11.7006 LearningRate 0.0611 Epoch: 4 Global Step: 54170 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:21,707-Speed 3000.44 samples/sec Loss 11.6388 LearningRate 0.0611 Epoch: 4 Global Step: 54180 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:25,037-Speed 3076.50 samples/sec Loss 11.7313 LearningRate 0.0611 Epoch: 4 Global Step: 54190 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:28,458-Speed 2994.29 samples/sec Loss 11.5348 LearningRate 0.0611 Epoch: 4 Global Step: 54200 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:31,810-Speed 3055.10 samples/sec Loss 11.7427 
LearningRate 0.0611 Epoch: 4 Global Step: 54210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:35,141-Speed 3075.67 samples/sec Loss 11.6284 LearningRate 0.0611 Epoch: 4 Global Step: 54220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:49:38,515-Speed 3035.99 samples/sec Loss 11.6754 LearningRate 0.0611 Epoch: 4 Global Step: 54230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:49:41,837-Speed 3083.24 samples/sec Loss 11.4993 LearningRate 0.0611 Epoch: 4 Global Step: 54240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:49:45,147-Speed 3094.99 samples/sec Loss 11.7094 LearningRate 0.0611 Epoch: 4 Global Step: 54250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:49:48,577-Speed 2986.02 samples/sec Loss 11.4413 LearningRate 0.0611 Epoch: 4 Global Step: 54260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:52,036-Speed 2961.63 samples/sec Loss 11.5536 LearningRate 0.0611 Epoch: 4 Global Step: 54270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:55,409-Speed 3037.23 samples/sec Loss 11.6943 LearningRate 0.0611 Epoch: 4 Global Step: 54280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:49:58,835-Speed 2989.57 samples/sec Loss 11.7027 LearningRate 0.0611 Epoch: 4 Global Step: 54290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:02,183-Speed 3059.69 samples/sec Loss 11.6542 LearningRate 0.0611 Epoch: 4 Global Step: 54300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:05,575-Speed 3019.48 samples/sec Loss 11.5895 LearningRate 0.0611 Epoch: 4 Global Step: 54310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:08,951-Speed 3033.81 samples/sec Loss 11.6485 LearningRate 0.0610 Epoch: 4 Global Step: 54320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:12,297-Speed 3061.52 samples/sec Loss 11.6097 LearningRate 0.0610 Epoch: 4 Global 
Step: 54330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:15,672-Speed 3034.24 samples/sec Loss 11.5364 LearningRate 0.0610 Epoch: 4 Global Step: 54340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:19,019-Speed 3061.27 samples/sec Loss 11.5736 LearningRate 0.0610 Epoch: 4 Global Step: 54350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:22,470-Speed 2968.20 samples/sec Loss 11.5920 LearningRate 0.0610 Epoch: 4 Global Step: 54360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:50:25,827-Speed 3051.15 samples/sec Loss 11.6272 LearningRate 0.0610 Epoch: 4 Global Step: 54370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:29,131-Speed 3099.61 samples/sec Loss 11.4548 LearningRate 0.0610 Epoch: 4 Global Step: 54380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:32,559-Speed 2988.43 samples/sec Loss 11.6443 LearningRate 0.0610 Epoch: 4 Global Step: 54390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:35,934-Speed 3036.30 samples/sec Loss 11.4786 LearningRate 0.0610 Epoch: 4 Global Step: 54400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:39,224-Speed 3113.74 samples/sec Loss 11.6555 LearningRate 0.0610 Epoch: 4 Global Step: 54410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:42,506-Speed 3120.54 samples/sec Loss 11.3626 LearningRate 0.0610 Epoch: 4 Global Step: 54420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:45,961-Speed 2964.25 samples/sec Loss 11.6597 LearningRate 0.0610 Epoch: 4 Global Step: 54430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:49,407-Speed 2972.54 samples/sec Loss 11.7660 LearningRate 0.0610 Epoch: 4 Global Step: 54440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:52,706-Speed 3105.71 samples/sec Loss 11.3766 LearningRate 0.0610 Epoch: 4 Global Step: 54450 Fp16 Grad Scale: 65536 
Required: 18 hours Training: 2022-04-27 06:50:56,114-Speed 3005.07 samples/sec Loss 11.5163 LearningRate 0.0610 Epoch: 4 Global Step: 54460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:50:59,443-Speed 3077.14 samples/sec Loss 11.5583 LearningRate 0.0610 Epoch: 4 Global Step: 54470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:02,758-Speed 3090.07 samples/sec Loss 11.6187 LearningRate 0.0609 Epoch: 4 Global Step: 54480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:06,125-Speed 3042.15 samples/sec Loss 11.6358 LearningRate 0.0609 Epoch: 4 Global Step: 54490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:09,443-Speed 3087.68 samples/sec Loss 11.6346 LearningRate 0.0609 Epoch: 4 Global Step: 54500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:12,851-Speed 3005.67 samples/sec Loss 11.5825 LearningRate 0.0609 Epoch: 4 Global Step: 54510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:16,298-Speed 2971.80 samples/sec Loss 11.6363 LearningRate 0.0609 Epoch: 4 Global Step: 54520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:19,623-Speed 3079.87 samples/sec Loss 11.5246 LearningRate 0.0609 Epoch: 4 Global Step: 54530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:22,970-Speed 3060.64 samples/sec Loss 11.6334 LearningRate 0.0609 Epoch: 4 Global Step: 54540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:26,356-Speed 3025.12 samples/sec Loss 11.6182 LearningRate 0.0609 Epoch: 4 Global Step: 54550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:29,712-Speed 3052.33 samples/sec Loss 11.6346 LearningRate 0.0609 Epoch: 4 Global Step: 54560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:33,116-Speed 3009.03 samples/sec Loss 11.4725 LearningRate 0.0609 Epoch: 4 Global Step: 54570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 06:51:36,496-Speed 3030.79 samples/sec Loss 11.5694 LearningRate 0.0609 Epoch: 4 Global Step: 54580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:39,967-Speed 2950.55 samples/sec Loss 11.4761 LearningRate 0.0609 Epoch: 4 Global Step: 54590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:43,384-Speed 2998.30 samples/sec Loss 11.6286 LearningRate 0.0609 Epoch: 4 Global Step: 54600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:46,795-Speed 3002.90 samples/sec Loss 11.3999 LearningRate 0.0609 Epoch: 4 Global Step: 54610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:50,135-Speed 3066.74 samples/sec Loss 11.3411 LearningRate 0.0609 Epoch: 4 Global Step: 54620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:51:53,540-Speed 3007.39 samples/sec Loss 11.4038 LearningRate 0.0609 Epoch: 4 Global Step: 54630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:51:56,864-Speed 3081.65 samples/sec Loss 11.6866 LearningRate 0.0608 Epoch: 4 Global Step: 54640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:00,241-Speed 3032.96 samples/sec Loss 11.4638 LearningRate 0.0608 Epoch: 4 Global Step: 54650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:03,608-Speed 3042.55 samples/sec Loss 11.6801 LearningRate 0.0608 Epoch: 4 Global Step: 54660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:06,920-Speed 3092.99 samples/sec Loss 11.6925 LearningRate 0.0608 Epoch: 4 Global Step: 54670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:10,298-Speed 3032.00 samples/sec Loss 11.5874 LearningRate 0.0608 Epoch: 4 Global Step: 54680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:13,662-Speed 3044.69 samples/sec Loss 11.5637 LearningRate 0.0608 Epoch: 4 Global Step: 54690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:16,968-Speed 
3098.58 samples/sec Loss 11.3575 LearningRate 0.0608 Epoch: 4 Global Step: 54700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:20,315-Speed 3059.97 samples/sec Loss 11.4761 LearningRate 0.0608 Epoch: 4 Global Step: 54710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:23,713-Speed 3014.92 samples/sec Loss 11.4925 LearningRate 0.0608 Epoch: 4 Global Step: 54720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:27,099-Speed 3024.84 samples/sec Loss 11.6400 LearningRate 0.0608 Epoch: 4 Global Step: 54730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:30,477-Speed 3032.17 samples/sec Loss 11.6669 LearningRate 0.0608 Epoch: 4 Global Step: 54740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:33,799-Speed 3083.49 samples/sec Loss 11.6831 LearningRate 0.0608 Epoch: 4 Global Step: 54750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:37,173-Speed 3035.64 samples/sec Loss 11.5887 LearningRate 0.0608 Epoch: 4 Global Step: 54760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:40,540-Speed 3042.44 samples/sec Loss 11.6317 LearningRate 0.0608 Epoch: 4 Global Step: 54770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:43,988-Speed 2971.30 samples/sec Loss 11.5056 LearningRate 0.0608 Epoch: 4 Global Step: 54780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:47,477-Speed 2935.18 samples/sec Loss 11.7754 LearningRate 0.0608 Epoch: 4 Global Step: 54790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:50,791-Speed 3091.37 samples/sec Loss 11.6175 LearningRate 0.0607 Epoch: 4 Global Step: 54800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:52:54,200-Speed 3004.93 samples/sec Loss 11.6063 LearningRate 0.0607 Epoch: 4 Global Step: 54810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:52:57,595-Speed 3016.84 samples/sec Loss 11.6220 
LearningRate 0.0607 Epoch: 4 Global Step: 54820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:01,008-Speed 3001.26 samples/sec Loss 11.5411 LearningRate 0.0607 Epoch: 4 Global Step: 54830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:04,344-Speed 3071.16 samples/sec Loss 11.4212 LearningRate 0.0607 Epoch: 4 Global Step: 54840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:07,789-Speed 2973.49 samples/sec Loss 11.5562 LearningRate 0.0607 Epoch: 4 Global Step: 54850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:11,188-Speed 3013.86 samples/sec Loss 11.5483 LearningRate 0.0607 Epoch: 4 Global Step: 54860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:14,555-Speed 3042.11 samples/sec Loss 11.5924 LearningRate 0.0607 Epoch: 4 Global Step: 54870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:17,932-Speed 3032.50 samples/sec Loss 11.4875 LearningRate 0.0607 Epoch: 4 Global Step: 54880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:21,290-Speed 3050.34 samples/sec Loss 11.4838 LearningRate 0.0607 Epoch: 4 Global Step: 54890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:24,711-Speed 2994.72 samples/sec Loss 11.6014 LearningRate 0.0607 Epoch: 4 Global Step: 54900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:28,083-Speed 3036.93 samples/sec Loss 11.6597 LearningRate 0.0607 Epoch: 4 Global Step: 54910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:31,469-Speed 3025.14 samples/sec Loss 11.5459 LearningRate 0.0607 Epoch: 4 Global Step: 54920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:34,890-Speed 2993.98 samples/sec Loss 11.3760 LearningRate 0.0607 Epoch: 4 Global Step: 54930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:38,403-Speed 2924.40 samples/sec Loss 11.6175 LearningRate 0.0607 Epoch: 4 
Global Step: 54940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:41,750-Speed 3060.34 samples/sec Loss 11.7068 LearningRate 0.0607 Epoch: 4 Global Step: 54950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:45,075-Speed 3080.55 samples/sec Loss 11.5954 LearningRate 0.0606 Epoch: 4 Global Step: 54960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:48,448-Speed 3036.57 samples/sec Loss 11.5201 LearningRate 0.0606 Epoch: 4 Global Step: 54970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:51,785-Speed 3070.09 samples/sec Loss 11.4773 LearningRate 0.0606 Epoch: 4 Global Step: 54980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:55,150-Speed 3043.62 samples/sec Loss 11.4960 LearningRate 0.0606 Epoch: 4 Global Step: 54990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:53:58,466-Speed 3089.65 samples/sec Loss 11.5797 LearningRate 0.0606 Epoch: 4 Global Step: 55000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:01,840-Speed 3035.52 samples/sec Loss 11.3024 LearningRate 0.0606 Epoch: 4 Global Step: 55010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:05,233-Speed 3018.92 samples/sec Loss 11.6026 LearningRate 0.0606 Epoch: 4 Global Step: 55020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:08,594-Speed 3047.72 samples/sec Loss 11.6042 LearningRate 0.0606 Epoch: 4 Global Step: 55030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:11,932-Speed 3069.19 samples/sec Loss 11.5802 LearningRate 0.0606 Epoch: 4 Global Step: 55040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:15,325-Speed 3018.49 samples/sec Loss 11.4687 LearningRate 0.0606 Epoch: 4 Global Step: 55050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:18,703-Speed 3031.75 samples/sec Loss 11.5485 LearningRate 0.0606 Epoch: 4 Global Step: 55060 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:22,053-Speed 3058.48 samples/sec Loss 11.5818 LearningRate 0.0606 Epoch: 4 Global Step: 55070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:25,424-Speed 3038.78 samples/sec Loss 11.6391 LearningRate 0.0606 Epoch: 4 Global Step: 55080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:28,835-Speed 3002.88 samples/sec Loss 11.4567 LearningRate 0.0606 Epoch: 4 Global Step: 55090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:32,286-Speed 2968.48 samples/sec Loss 11.3760 LearningRate 0.0606 Epoch: 4 Global Step: 55100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:35,620-Speed 3072.28 samples/sec Loss 11.6156 LearningRate 0.0606 Epoch: 4 Global Step: 55110 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 06:54:38,999-Speed 3031.58 samples/sec Loss 11.6581 LearningRate 0.0605 Epoch: 4 Global Step: 55120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:42,336-Speed 3069.20 samples/sec Loss 11.6172 LearningRate 0.0605 Epoch: 4 Global Step: 55130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:45,730-Speed 3018.00 samples/sec Loss 11.5688 LearningRate 0.0605 Epoch: 4 Global Step: 55140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:49,087-Speed 3051.51 samples/sec Loss 11.4421 LearningRate 0.0605 Epoch: 4 Global Step: 55150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:52,506-Speed 2995.62 samples/sec Loss 11.6760 LearningRate 0.0605 Epoch: 4 Global Step: 55160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:55,844-Speed 3068.41 samples/sec Loss 11.5579 LearningRate 0.0605 Epoch: 4 Global Step: 55170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:54:59,217-Speed 3036.36 samples/sec Loss 11.6270 LearningRate 0.0605 Epoch: 4 Global Step: 55180 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-04-27 06:55:02,678-Speed 2959.86 samples/sec Loss 11.4861 LearningRate 0.0605 Epoch: 4 Global Step: 55190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:06,103-Speed 2990.16 samples/sec Loss 11.4877 LearningRate 0.0605 Epoch: 4 Global Step: 55200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:09,447-Speed 3063.74 samples/sec Loss 11.4588 LearningRate 0.0605 Epoch: 4 Global Step: 55210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:12,853-Speed 3007.53 samples/sec Loss 11.5588 LearningRate 0.0605 Epoch: 4 Global Step: 55220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:16,243-Speed 3021.58 samples/sec Loss 11.5880 LearningRate 0.0605 Epoch: 4 Global Step: 55230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:19,609-Speed 3042.40 samples/sec Loss 11.3791 LearningRate 0.0605 Epoch: 4 Global Step: 55240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:22,987-Speed 3032.87 samples/sec Loss 11.5418 LearningRate 0.0605 Epoch: 4 Global Step: 55250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:26,357-Speed 3039.79 samples/sec Loss 11.5544 LearningRate 0.0605 Epoch: 4 Global Step: 55260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:29,770-Speed 3000.78 samples/sec Loss 11.4896 LearningRate 0.0605 Epoch: 4 Global Step: 55270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:33,111-Speed 3066.17 samples/sec Loss 11.5447 LearningRate 0.0604 Epoch: 4 Global Step: 55280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:36,468-Speed 3051.28 samples/sec Loss 11.7086 LearningRate 0.0604 Epoch: 4 Global Step: 55290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:55:39,938-Speed 2951.82 samples/sec Loss 11.5620 LearningRate 0.0604 Epoch: 4 Global Step: 55300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
06:55:43,298-Speed 3047.82 samples/sec Loss 11.6295 LearningRate 0.0604 Epoch: 4 Global Step: 55310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:55:46,698-Speed 3013.52 samples/sec Loss 11.6400 LearningRate 0.0604 Epoch: 4 Global Step: 55320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:55:50,059-Speed 3047.07 samples/sec Loss 11.4215 LearningRate 0.0604 Epoch: 4 Global Step: 55330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:55:53,424-Speed 3044.43 samples/sec Loss 11.4877 LearningRate 0.0604 Epoch: 4 Global Step: 55340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:55:56,880-Speed 2963.68 samples/sec Loss 11.5364 LearningRate 0.0604 Epoch: 4 Global Step: 55350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:56:00,282-Speed 3011.00 samples/sec Loss 11.5523 LearningRate 0.0604 Epoch: 4 Global Step: 55360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:56:03,656-Speed 3036.27 samples/sec Loss 11.3728 LearningRate 0.0604 Epoch: 4 Global Step: 55370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:56:07,032-Speed 3033.49 samples/sec Loss 11.3947 LearningRate 0.0604 Epoch: 4 Global Step: 55380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:56:10,479-Speed 2971.73 samples/sec Loss 11.6221 LearningRate 0.0604 Epoch: 4 Global Step: 55390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:56:13,883-Speed 3008.47 samples/sec Loss 11.4897 LearningRate 0.0604 Epoch: 4 Global Step: 55400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:56:17,275-Speed 3019.85 samples/sec Loss 11.4919 LearningRate 0.0604 Epoch: 4 Global Step: 55410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:20,652-Speed 3033.25 samples/sec Loss 11.4363 LearningRate 0.0604 Epoch: 4 Global Step: 55420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:24,049-Speed 3015.25 samples/sec 
Loss 11.5609 LearningRate 0.0604 Epoch: 4 Global Step: 55430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:27,409-Speed 3048.55 samples/sec Loss 11.5522 LearningRate 0.0603 Epoch: 4 Global Step: 55440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:30,798-Speed 3023.00 samples/sec Loss 11.5554 LearningRate 0.0603 Epoch: 4 Global Step: 55450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:34,222-Speed 2990.65 samples/sec Loss 11.4038 LearningRate 0.0603 Epoch: 4 Global Step: 55460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:37,581-Speed 3049.14 samples/sec Loss 11.5485 LearningRate 0.0603 Epoch: 4 Global Step: 55470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:41,041-Speed 2960.67 samples/sec Loss 11.4855 LearningRate 0.0603 Epoch: 4 Global Step: 55480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:44,465-Speed 2991.82 samples/sec Loss 11.7439 LearningRate 0.0603 Epoch: 4 Global Step: 55490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:47,884-Speed 2995.81 samples/sec Loss 11.5072 LearningRate 0.0603 Epoch: 4 Global Step: 55500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:51,219-Speed 3071.84 samples/sec Loss 11.3675 LearningRate 0.0603 Epoch: 4 Global Step: 55510 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 06:56:54,603-Speed 3026.64 samples/sec Loss 11.5961 LearningRate 0.0603 Epoch: 4 Global Step: 55520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:56:57,973-Speed 3039.31 samples/sec Loss 11.4663 LearningRate 0.0603 Epoch: 4 Global Step: 55530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:01,355-Speed 3029.03 samples/sec Loss 11.5605 LearningRate 0.0603 Epoch: 4 Global Step: 55540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:04,761-Speed 3007.30 samples/sec Loss 11.5025 LearningRate 
0.0603 Epoch: 4 Global Step: 55550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:08,179-Speed 2996.79 samples/sec Loss 11.5076 LearningRate 0.0603 Epoch: 4 Global Step: 55560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:11,602-Speed 2992.08 samples/sec Loss 11.4896 LearningRate 0.0603 Epoch: 4 Global Step: 55570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:15,063-Speed 2960.10 samples/sec Loss 11.6332 LearningRate 0.0603 Epoch: 4 Global Step: 55580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:18,403-Speed 3065.74 samples/sec Loss 11.3800 LearningRate 0.0603 Epoch: 4 Global Step: 55590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:21,760-Speed 3051.45 samples/sec Loss 11.4920 LearningRate 0.0602 Epoch: 4 Global Step: 55600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:25,161-Speed 3012.45 samples/sec Loss 11.5997 LearningRate 0.0602 Epoch: 4 Global Step: 55610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:28,638-Speed 2946.02 samples/sec Loss 11.4053 LearningRate 0.0602 Epoch: 4 Global Step: 55620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:31,963-Speed 3080.39 samples/sec Loss 11.5900 LearningRate 0.0602 Epoch: 4 Global Step: 55630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:35,414-Speed 2967.71 samples/sec Loss 11.5037 LearningRate 0.0602 Epoch: 4 Global Step: 55640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:38,780-Speed 3042.90 samples/sec Loss 11.4142 LearningRate 0.0602 Epoch: 4 Global Step: 55650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:42,230-Speed 2969.39 samples/sec Loss 11.4739 LearningRate 0.0602 Epoch: 4 Global Step: 55660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:45,566-Speed 3070.17 samples/sec Loss 11.5183 LearningRate 0.0602 Epoch: 4 Global Step: 55670 Fp16 
Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:48,960-Speed 3018.36 samples/sec Loss 11.5945 LearningRate 0.0602 Epoch: 4 Global Step: 55680 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:57:52,322-Speed 3047.18 samples/sec Loss 11.3687 LearningRate 0.0602 Epoch: 4 Global Step: 55690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:55,679-Speed 3050.95 samples/sec Loss 11.5815 LearningRate 0.0602 Epoch: 4 Global Step: 55700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:57:59,065-Speed 3025.70 samples/sec Loss 11.3251 LearningRate 0.0602 Epoch: 4 Global Step: 55710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:02,544-Speed 2944.30 samples/sec Loss 11.5321 LearningRate 0.0602 Epoch: 4 Global Step: 55720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:05,928-Speed 3027.01 samples/sec Loss 11.5871 LearningRate 0.0602 Epoch: 4 Global Step: 55730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:09,328-Speed 3012.88 samples/sec Loss 11.3673 LearningRate 0.0602 Epoch: 4 Global Step: 55740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:12,759-Speed 2985.13 samples/sec Loss 11.4013 LearningRate 0.0602 Epoch: 4 Global Step: 55750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:16,176-Speed 2997.61 samples/sec Loss 11.4281 LearningRate 0.0601 Epoch: 4 Global Step: 55760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:19,545-Speed 3040.66 samples/sec Loss 11.4786 LearningRate 0.0601 Epoch: 4 Global Step: 55770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:22,959-Speed 3000.49 samples/sec Loss 11.3154 LearningRate 0.0601 Epoch: 4 Global Step: 55780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:58:26,419-Speed 2960.36 samples/sec Loss 11.4079 LearningRate 0.0601 Epoch: 4 Global Step: 55790 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-04-27 06:58:29,877-Speed 2961.95 samples/sec Loss 11.5757 LearningRate 0.0601 Epoch: 4 Global Step: 55800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:33,295-Speed 2997.28 samples/sec Loss 11.3913 LearningRate 0.0601 Epoch: 4 Global Step: 55810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:36,688-Speed 3018.93 samples/sec Loss 11.4472 LearningRate 0.0601 Epoch: 4 Global Step: 55820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:40,127-Speed 2978.13 samples/sec Loss 11.5921 LearningRate 0.0601 Epoch: 4 Global Step: 55830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:44,829-Speed 2178.20 samples/sec Loss 11.4634 LearningRate 0.0601 Epoch: 4 Global Step: 55840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:48,809-Speed 2573.51 samples/sec Loss 11.4984 LearningRate 0.0601 Epoch: 4 Global Step: 55850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:52,215-Speed 3007.76 samples/sec Loss 11.4198 LearningRate 0.0601 Epoch: 4 Global Step: 55860 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:58:57,020-Speed 2131.66 samples/sec Loss 11.5302 LearningRate 0.0601 Epoch: 4 Global Step: 55870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:59:00,399-Speed 3031.86 samples/sec Loss 11.5322 LearningRate 0.0601 Epoch: 4 Global Step: 55880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:59:03,735-Speed 3069.89 samples/sec Loss 11.3742 LearningRate 0.0601 Epoch: 4 Global Step: 55890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 06:59:07,126-Speed 3020.84 samples/sec Loss 11.4942 LearningRate 0.0601 Epoch: 4 Global Step: 55900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:10,554-Speed 2988.13 samples/sec Loss 11.3215 LearningRate 0.0601 Epoch: 4 Global Step: 55910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
06:59:13,863-Speed 3095.08 samples/sec Loss 11.4072 LearningRate 0.0600 Epoch: 4 Global Step: 55920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:17,207-Speed 3062.97 samples/sec Loss 11.3841 LearningRate 0.0600 Epoch: 4 Global Step: 55930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:20,551-Speed 3063.40 samples/sec Loss 11.5572 LearningRate 0.0600 Epoch: 4 Global Step: 55940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:23,902-Speed 3057.06 samples/sec Loss 11.4554 LearningRate 0.0600 Epoch: 4 Global Step: 55950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:27,295-Speed 3019.62 samples/sec Loss 11.2526 LearningRate 0.0600 Epoch: 4 Global Step: 55960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:30,664-Speed 3039.68 samples/sec Loss 11.5452 LearningRate 0.0600 Epoch: 4 Global Step: 55970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:34,050-Speed 3025.20 samples/sec Loss 11.5616 LearningRate 0.0600 Epoch: 4 Global Step: 55980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:37,528-Speed 2945.73 samples/sec Loss 11.3846 LearningRate 0.0600 Epoch: 4 Global Step: 55990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:40,854-Speed 3079.15 samples/sec Loss 11.5850 LearningRate 0.0600 Epoch: 4 Global Step: 56000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:44,210-Speed 3052.52 samples/sec Loss 11.3889 LearningRate 0.0600 Epoch: 4 Global Step: 56010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:47,613-Speed 3010.19 samples/sec Loss 11.5346 LearningRate 0.0600 Epoch: 4 Global Step: 56020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:51,014-Speed 3011.06 samples/sec Loss 11.4462 LearningRate 0.0600 Epoch: 4 Global Step: 56030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:54,405-Speed 3021.27 
samples/sec Loss 11.3112 LearningRate 0.0600 Epoch: 4 Global Step: 56040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 06:59:57,738-Speed 3072.81 samples/sec Loss 11.6375 LearningRate 0.0600 Epoch: 4 Global Step: 56050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:01,124-Speed 3025.02 samples/sec Loss 11.4595 LearningRate 0.0600 Epoch: 4 Global Step: 56060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:04,588-Speed 2957.20 samples/sec Loss 11.5088 LearningRate 0.0600 Epoch: 4 Global Step: 56070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:08,013-Speed 2990.94 samples/sec Loss 11.5526 LearningRate 0.0599 Epoch: 4 Global Step: 56080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:11,454-Speed 2976.19 samples/sec Loss 11.4891 LearningRate 0.0599 Epoch: 4 Global Step: 56090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:14,816-Speed 3046.72 samples/sec Loss 11.4998 LearningRate 0.0599 Epoch: 4 Global Step: 56100 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:00:18,144-Speed 3078.45 samples/sec Loss 11.5691 LearningRate 0.0599 Epoch: 4 Global Step: 56110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:21,533-Speed 3022.69 samples/sec Loss 11.5568 LearningRate 0.0599 Epoch: 4 Global Step: 56120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:24,956-Speed 2992.72 samples/sec Loss 11.4859 LearningRate 0.0599 Epoch: 4 Global Step: 56130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:28,345-Speed 3022.34 samples/sec Loss 11.7370 LearningRate 0.0599 Epoch: 4 Global Step: 56140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:31,735-Speed 3022.16 samples/sec Loss 11.4940 LearningRate 0.0599 Epoch: 4 Global Step: 56150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:35,208-Speed 2949.01 samples/sec Loss 11.5322 
LearningRate 0.0599 Epoch: 4 Global Step: 56160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:38,570-Speed 3046.96 samples/sec Loss 11.4676 LearningRate 0.0599 Epoch: 4 Global Step: 56170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:41,945-Speed 3035.00 samples/sec Loss 11.2923 LearningRate 0.0599 Epoch: 4 Global Step: 56180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:45,293-Speed 3059.24 samples/sec Loss 11.3572 LearningRate 0.0599 Epoch: 4 Global Step: 56190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:48,730-Speed 2980.28 samples/sec Loss 11.4926 LearningRate 0.0599 Epoch: 4 Global Step: 56200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:52,139-Speed 3004.78 samples/sec Loss 11.3945 LearningRate 0.0599 Epoch: 4 Global Step: 56210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:55,561-Speed 2993.30 samples/sec Loss 11.4882 LearningRate 0.0599 Epoch: 4 Global Step: 56220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:00:59,041-Speed 2943.15 samples/sec Loss 11.5208 LearningRate 0.0599 Epoch: 4 Global Step: 56230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:02,430-Speed 3022.75 samples/sec Loss 11.4793 LearningRate 0.0598 Epoch: 4 Global Step: 56240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:05,815-Speed 3025.36 samples/sec Loss 11.4419 LearningRate 0.0598 Epoch: 4 Global Step: 56250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:09,201-Speed 3025.50 samples/sec Loss 11.4253 LearningRate 0.0598 Epoch: 4 Global Step: 56260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:12,510-Speed 3095.94 samples/sec Loss 11.4288 LearningRate 0.0598 Epoch: 4 Global Step: 56270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:15,905-Speed 3016.86 samples/sec Loss 11.6186 LearningRate 0.0598 Epoch: 4 
Global Step: 56280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:19,301-Speed 3016.31 samples/sec Loss 11.5868 LearningRate 0.0598 Epoch: 4 Global Step: 56290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:22,654-Speed 3056.38 samples/sec Loss 11.3809 LearningRate 0.0598 Epoch: 4 Global Step: 56300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:26,029-Speed 3034.60 samples/sec Loss 11.4971 LearningRate 0.0598 Epoch: 4 Global Step: 56310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:29,367-Speed 3068.74 samples/sec Loss 11.3996 LearningRate 0.0598 Epoch: 4 Global Step: 56320 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:32,814-Speed 2971.41 samples/sec Loss 11.3141 LearningRate 0.0598 Epoch: 4 Global Step: 56330 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:36,207-Speed 3018.72 samples/sec Loss 11.4727 LearningRate 0.0598 Epoch: 4 Global Step: 56340 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:39,579-Speed 3037.91 samples/sec Loss 11.5985 LearningRate 0.0598 Epoch: 4 Global Step: 56350 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:42,942-Speed 3045.95 samples/sec Loss 11.5016 LearningRate 0.0598 Epoch: 4 Global Step: 56360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:46,308-Speed 3043.15 samples/sec Loss 11.5362 LearningRate 0.0598 Epoch: 4 Global Step: 56370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:49,688-Speed 3030.70 samples/sec Loss 11.4841 LearningRate 0.0598 Epoch: 4 Global Step: 56380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:01:53,095-Speed 3006.86 samples/sec Loss 11.4883 LearningRate 0.0598 Epoch: 4 Global Step: 56390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:01:56,607-Speed 2916.61 samples/sec Loss 11.5432 LearningRate 0.0597 Epoch: 4 Global Step: 56400 Fp16 Grad Scale: 
131072 Required: 18 hours Training: 2022-04-27 07:02:00,007-Speed 3011.89 samples/sec Loss 11.4357 LearningRate 0.0597 Epoch: 4 Global Step: 56410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:03,435-Speed 2988.22 samples/sec Loss 11.5043 LearningRate 0.0597 Epoch: 4 Global Step: 56420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:06,843-Speed 3005.89 samples/sec Loss 11.3976 LearningRate 0.0597 Epoch: 4 Global Step: 56430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:10,290-Speed 2971.36 samples/sec Loss 11.3199 LearningRate 0.0597 Epoch: 4 Global Step: 56440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:13,696-Speed 3007.23 samples/sec Loss 11.4037 LearningRate 0.0597 Epoch: 4 Global Step: 56450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:17,188-Speed 2933.04 samples/sec Loss 11.4951 LearningRate 0.0597 Epoch: 4 Global Step: 56460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:20,621-Speed 2984.30 samples/sec Loss 11.4567 LearningRate 0.0597 Epoch: 4 Global Step: 56470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:24,055-Speed 2982.35 samples/sec Loss 11.4138 LearningRate 0.0597 Epoch: 4 Global Step: 56480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:27,475-Speed 2995.15 samples/sec Loss 11.4343 LearningRate 0.0597 Epoch: 4 Global Step: 56490 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:02:30,916-Speed 2976.72 samples/sec Loss 11.3709 LearningRate 0.0597 Epoch: 4 Global Step: 56500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:34,310-Speed 3018.33 samples/sec Loss 11.5384 LearningRate 0.0597 Epoch: 4 Global Step: 56510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:37,664-Speed 3053.54 samples/sec Loss 11.2924 LearningRate 0.0597 Epoch: 4 Global Step: 56520 Fp16 Grad Scale: 131072 Required: 18 hours 
Training: 2022-04-27 07:02:41,082-Speed 2997.24 samples/sec Loss 11.4263 LearningRate 0.0597 Epoch: 4 Global Step: 56530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:44,586-Speed 2923.41 samples/sec Loss 11.3670 LearningRate 0.0597 Epoch: 4 Global Step: 56540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:47,945-Speed 3049.26 samples/sec Loss 11.4072 LearningRate 0.0597 Epoch: 4 Global Step: 56550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:51,368-Speed 2992.52 samples/sec Loss 11.3691 LearningRate 0.0596 Epoch: 4 Global Step: 56560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:54,846-Speed 2944.56 samples/sec Loss 11.5604 LearningRate 0.0596 Epoch: 4 Global Step: 56570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:02:58,296-Speed 2969.87 samples/sec Loss 11.4066 LearningRate 0.0596 Epoch: 4 Global Step: 56580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:01,694-Speed 3014.00 samples/sec Loss 11.3146 LearningRate 0.0596 Epoch: 4 Global Step: 56590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:05,092-Speed 3014.07 samples/sec Loss 11.4655 LearningRate 0.0596 Epoch: 4 Global Step: 56600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:08,566-Speed 2948.07 samples/sec Loss 11.4526 LearningRate 0.0596 Epoch: 4 Global Step: 56610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:11,999-Speed 2983.98 samples/sec Loss 11.2916 LearningRate 0.0596 Epoch: 4 Global Step: 56620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:15,420-Speed 2994.01 samples/sec Loss 11.3023 LearningRate 0.0596 Epoch: 4 Global Step: 56630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:18,773-Speed 3054.63 samples/sec Loss 11.3935 LearningRate 0.0596 Epoch: 4 Global Step: 56640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
07:03:22,126-Speed 3055.54 samples/sec Loss 11.2830 LearningRate 0.0596 Epoch: 4 Global Step: 56650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:25,487-Speed 3047.78 samples/sec Loss 11.4464 LearningRate 0.0596 Epoch: 4 Global Step: 56660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:28,840-Speed 3055.29 samples/sec Loss 11.5545 LearningRate 0.0596 Epoch: 4 Global Step: 56670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:32,197-Speed 3050.52 samples/sec Loss 11.3073 LearningRate 0.0596 Epoch: 4 Global Step: 56680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:35,552-Speed 3053.63 samples/sec Loss 11.3365 LearningRate 0.0596 Epoch: 4 Global Step: 56690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:38,921-Speed 3039.93 samples/sec Loss 11.4233 LearningRate 0.0596 Epoch: 4 Global Step: 56700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:03:42,332-Speed 3002.73 samples/sec Loss 11.3306 LearningRate 0.0596 Epoch: 4 Global Step: 56710 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:03:45,787-Speed 2964.93 samples/sec Loss 11.3830 LearningRate 0.0595 Epoch: 4 Global Step: 56720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:03:49,180-Speed 3018.58 samples/sec Loss 11.3677 LearningRate 0.0595 Epoch: 4 Global Step: 56730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:03:52,512-Speed 3074.30 samples/sec Loss 11.6405 LearningRate 0.0595 Epoch: 4 Global Step: 56740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:03:55,973-Speed 2959.67 samples/sec Loss 11.3311 LearningRate 0.0595 Epoch: 4 Global Step: 56750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:03:59,415-Speed 2976.21 samples/sec Loss 11.3268 LearningRate 0.0595 Epoch: 4 Global Step: 56760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:04:02,824-Speed 3004.57 
samples/sec Loss 11.5467 LearningRate 0.0595 Epoch: 4 Global Step: 56770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:04:06,155-Speed 3074.64 samples/sec Loss 11.3409 LearningRate 0.0595 Epoch: 4 Global Step: 56780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:04:09,536-Speed 3029.51 samples/sec Loss 11.4723 LearningRate 0.0595 Epoch: 4 Global Step: 56790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:04:12,922-Speed 3025.20 samples/sec Loss 11.3872 LearningRate 0.0595 Epoch: 4 Global Step: 56800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:04:16,309-Speed 3024.43 samples/sec Loss 11.5193 LearningRate 0.0595 Epoch: 4 Global Step: 56810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:19,758-Speed 2970.09 samples/sec Loss 11.3382 LearningRate 0.0595 Epoch: 4 Global Step: 56820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:23,164-Speed 3007.06 samples/sec Loss 11.4808 LearningRate 0.0595 Epoch: 4 Global Step: 56830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:26,586-Speed 2993.37 samples/sec Loss 11.3485 LearningRate 0.0595 Epoch: 4 Global Step: 56840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:29,971-Speed 3026.67 samples/sec Loss 11.5241 LearningRate 0.0595 Epoch: 4 Global Step: 56850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:33,392-Speed 2993.47 samples/sec Loss 11.3890 LearningRate 0.0595 Epoch: 4 Global Step: 56860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:36,741-Speed 3058.91 samples/sec Loss 11.3799 LearningRate 0.0595 Epoch: 4 Global Step: 56870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:40,124-Speed 3028.08 samples/sec Loss 11.4795 LearningRate 0.0594 Epoch: 4 Global Step: 56880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:43,622-Speed 2927.75 samples/sec Loss 11.5086 
LearningRate 0.0594 Epoch: 4 Global Step: 56890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:46,985-Speed 3045.88 samples/sec Loss 11.2048 LearningRate 0.0594 Epoch: 4 Global Step: 56900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:50,371-Speed 3025.33 samples/sec Loss 11.4811 LearningRate 0.0594 Epoch: 4 Global Step: 56910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:53,846-Speed 2947.06 samples/sec Loss 11.5453 LearningRate 0.0594 Epoch: 4 Global Step: 56920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:04:57,204-Speed 3050.52 samples/sec Loss 11.3281 LearningRate 0.0594 Epoch: 4 Global Step: 56930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:00,574-Speed 3040.12 samples/sec Loss 11.3996 LearningRate 0.0594 Epoch: 4 Global Step: 56940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:03,929-Speed 3053.32 samples/sec Loss 11.4104 LearningRate 0.0594 Epoch: 4 Global Step: 56950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:07,352-Speed 2992.44 samples/sec Loss 11.5590 LearningRate 0.0594 Epoch: 4 Global Step: 56960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:10,731-Speed 3031.12 samples/sec Loss 11.3539 LearningRate 0.0594 Epoch: 4 Global Step: 56970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:14,185-Speed 2965.50 samples/sec Loss 11.3665 LearningRate 0.0594 Epoch: 4 Global Step: 56980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:17,526-Speed 3065.89 samples/sec Loss 11.4097 LearningRate 0.0594 Epoch: 4 Global Step: 56990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:20,959-Speed 2984.00 samples/sec Loss 11.3228 LearningRate 0.0594 Epoch: 4 Global Step: 57000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:24,289-Speed 3075.32 samples/sec Loss 11.5212 LearningRate 0.0594 Epoch: 4 
Global Step: 57010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:27,647-Speed 3051.09 samples/sec Loss 11.5040 LearningRate 0.0594 Epoch: 4 Global Step: 57020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:31,108-Speed 2959.04 samples/sec Loss 11.5582 LearningRate 0.0594 Epoch: 4 Global Step: 57030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:34,441-Speed 3073.82 samples/sec Loss 11.4292 LearningRate 0.0593 Epoch: 4 Global Step: 57040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:37,792-Speed 3056.08 samples/sec Loss 11.3884 LearningRate 0.0593 Epoch: 4 Global Step: 57050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:41,260-Speed 2954.24 samples/sec Loss 11.3799 LearningRate 0.0593 Epoch: 4 Global Step: 57060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:05:44,560-Speed 3103.15 samples/sec Loss 11.4198 LearningRate 0.0593 Epoch: 4 Global Step: 57070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:05:47,952-Speed 3020.28 samples/sec Loss 11.3774 LearningRate 0.0593 Epoch: 4 Global Step: 57080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:05:51,373-Speed 2993.51 samples/sec Loss 11.4725 LearningRate 0.0593 Epoch: 4 Global Step: 57090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:05:54,806-Speed 2983.85 samples/sec Loss 11.4423 LearningRate 0.0593 Epoch: 4 Global Step: 57100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:05:58,221-Speed 3000.03 samples/sec Loss 11.3209 LearningRate 0.0593 Epoch: 4 Global Step: 57110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:01,668-Speed 2971.88 samples/sec Loss 11.5428 LearningRate 0.0593 Epoch: 4 Global Step: 57120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:05,089-Speed 2993.77 samples/sec Loss 11.4490 LearningRate 0.0593 Epoch: 4 Global Step: 57130 Fp16 Grad Scale: 
65536 Required: 18 hours Training: 2022-04-27 07:06:08,532-Speed 2974.57 samples/sec Loss 11.3488 LearningRate 0.0593 Epoch: 4 Global Step: 57140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:11,887-Speed 3053.15 samples/sec Loss 11.5412 LearningRate 0.0593 Epoch: 4 Global Step: 57150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:15,220-Speed 3073.97 samples/sec Loss 11.3971 LearningRate 0.0593 Epoch: 4 Global Step: 57160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:18,615-Speed 3016.75 samples/sec Loss 11.3871 LearningRate 0.0593 Epoch: 4 Global Step: 57170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:06:22,067-Speed 2967.21 samples/sec Loss 11.6059 LearningRate 0.0593 Epoch: 4 Global Step: 57180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:06:25,465-Speed 3014.50 samples/sec Loss 11.5082 LearningRate 0.0593 Epoch: 4 Global Step: 57190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:06:28,931-Speed 2955.22 samples/sec Loss 11.3759 LearningRate 0.0593 Epoch: 4 Global Step: 57200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:06:32,331-Speed 3012.99 samples/sec Loss 11.3609 LearningRate 0.0592 Epoch: 4 Global Step: 57210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:06:35,681-Speed 3057.85 samples/sec Loss 11.3824 LearningRate 0.0592 Epoch: 4 Global Step: 57220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:39,103-Speed 2993.79 samples/sec Loss 11.4371 LearningRate 0.0592 Epoch: 4 Global Step: 57230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:42,484-Speed 3028.84 samples/sec Loss 11.3725 LearningRate 0.0592 Epoch: 4 Global Step: 57240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:45,877-Speed 3018.82 samples/sec Loss 11.3888 LearningRate 0.0592 Epoch: 4 Global Step: 57250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 
2022-04-27 07:06:49,276-Speed 3014.19 samples/sec Loss 11.4764 LearningRate 0.0592 Epoch: 4 Global Step: 57260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:52,678-Speed 3010.10 samples/sec Loss 11.6325 LearningRate 0.0592 Epoch: 4 Global Step: 57270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:56,117-Speed 2979.09 samples/sec Loss 11.3182 LearningRate 0.0592 Epoch: 4 Global Step: 57280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:06:59,499-Speed 3028.43 samples/sec Loss 11.4831 LearningRate 0.0592 Epoch: 4 Global Step: 57290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:07:02,908-Speed 3005.07 samples/sec Loss 11.4212 LearningRate 0.0592 Epoch: 4 Global Step: 57300 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:07:06,349-Speed 2976.69 samples/sec Loss 11.1888 LearningRate 0.0592 Epoch: 4 Global Step: 57310 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:07:09,815-Speed 2954.84 samples/sec Loss 11.3069 LearningRate 0.0592 Epoch: 4 Global Step: 57320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:13,288-Speed 2949.79 samples/sec Loss 11.3224 LearningRate 0.0592 Epoch: 4 Global Step: 57330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:16,698-Speed 3004.08 samples/sec Loss 11.3678 LearningRate 0.0592 Epoch: 4 Global Step: 57340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:20,102-Speed 3008.74 samples/sec Loss 11.4376 LearningRate 0.0592 Epoch: 4 Global Step: 57350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:23,640-Speed 2894.88 samples/sec Loss 11.2842 LearningRate 0.0592 Epoch: 4 Global Step: 57360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:27,025-Speed 3026.06 samples/sec Loss 11.1437 LearningRate 0.0591 Epoch: 4 Global Step: 57370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:30,468-Speed 2975.27 
samples/sec Loss 11.3502 LearningRate 0.0591 Epoch: 4 Global Step: 57380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:33,868-Speed 3011.89 samples/sec Loss 11.5038 LearningRate 0.0591 Epoch: 4 Global Step: 57390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:37,314-Speed 2972.86 samples/sec Loss 11.3671 LearningRate 0.0591 Epoch: 4 Global Step: 57400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:40,686-Speed 3037.73 samples/sec Loss 11.3121 LearningRate 0.0591 Epoch: 4 Global Step: 57410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:44,108-Speed 2993.42 samples/sec Loss 11.3728 LearningRate 0.0591 Epoch: 4 Global Step: 57420 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:07:47,433-Speed 3080.47 samples/sec Loss 11.3328 LearningRate 0.0591 Epoch: 4 Global Step: 57430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:50,789-Speed 3051.72 samples/sec Loss 11.3456 LearningRate 0.0591 Epoch: 4 Global Step: 57440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:54,198-Speed 3004.48 samples/sec Loss 11.4288 LearningRate 0.0591 Epoch: 4 Global Step: 57450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:07:57,547-Speed 3058.62 samples/sec Loss 11.3556 LearningRate 0.0591 Epoch: 4 Global Step: 57460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:00,927-Speed 3030.24 samples/sec Loss 11.3628 LearningRate 0.0591 Epoch: 4 Global Step: 57470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:04,387-Speed 2960.51 samples/sec Loss 11.3383 LearningRate 0.0591 Epoch: 4 Global Step: 57480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:07,768-Speed 3029.47 samples/sec Loss 11.2078 LearningRate 0.0591 Epoch: 4 Global Step: 57490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:11,146-Speed 3032.13 samples/sec Loss 11.1937 
LearningRate 0.0591 Epoch: 4 Global Step: 57500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:14,494-Speed 3060.00 samples/sec Loss 11.3654 LearningRate 0.0591 Epoch: 4 Global Step: 57510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:17,900-Speed 3006.98 samples/sec Loss 11.5254 LearningRate 0.0591 Epoch: 4 Global Step: 57520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:21,266-Speed 3043.13 samples/sec Loss 11.3117 LearningRate 0.0590 Epoch: 4 Global Step: 57530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:24,774-Speed 2919.48 samples/sec Loss 11.4761 LearningRate 0.0590 Epoch: 4 Global Step: 57540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:28,171-Speed 3015.76 samples/sec Loss 11.3420 LearningRate 0.0590 Epoch: 4 Global Step: 57550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:31,632-Speed 2958.72 samples/sec Loss 11.3003 LearningRate 0.0590 Epoch: 4 Global Step: 57560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:35,051-Speed 2996.35 samples/sec Loss 11.3748 LearningRate 0.0590 Epoch: 4 Global Step: 57570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:38,473-Speed 2993.19 samples/sec Loss 11.4269 LearningRate 0.0590 Epoch: 4 Global Step: 57580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:41,826-Speed 3054.58 samples/sec Loss 11.5106 LearningRate 0.0590 Epoch: 4 Global Step: 57590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:45,251-Speed 2990.75 samples/sec Loss 11.4582 LearningRate 0.0590 Epoch: 4 Global Step: 57600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:48,734-Speed 2941.10 samples/sec Loss 11.3732 LearningRate 0.0590 Epoch: 4 Global Step: 57610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:52,240-Speed 2921.02 samples/sec Loss 11.5324 LearningRate 0.0590 Epoch: 4 
Global Step: 57620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:55,640-Speed 3012.92 samples/sec Loss 11.3975 LearningRate 0.0590 Epoch: 4 Global Step: 57630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:08:59,033-Speed 3019.26 samples/sec Loss 11.3027 LearningRate 0.0590 Epoch: 4 Global Step: 57640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:02,432-Speed 3013.48 samples/sec Loss 11.3165 LearningRate 0.0590 Epoch: 4 Global Step: 57650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:05,958-Speed 2904.63 samples/sec Loss 11.2775 LearningRate 0.0590 Epoch: 4 Global Step: 57660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:09,283-Speed 3080.70 samples/sec Loss 11.2449 LearningRate 0.0590 Epoch: 4 Global Step: 57670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:12,708-Speed 2990.08 samples/sec Loss 11.3460 LearningRate 0.0590 Epoch: 4 Global Step: 57680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:16,060-Speed 3055.85 samples/sec Loss 11.3278 LearningRate 0.0589 Epoch: 4 Global Step: 57690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:19,481-Speed 2995.48 samples/sec Loss 11.3832 LearningRate 0.0589 Epoch: 4 Global Step: 57700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:22,920-Speed 2978.85 samples/sec Loss 11.3696 LearningRate 0.0589 Epoch: 4 Global Step: 57710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:26,363-Speed 2975.43 samples/sec Loss 11.5166 LearningRate 0.0589 Epoch: 4 Global Step: 57720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:29,707-Speed 3063.14 samples/sec Loss 11.4014 LearningRate 0.0589 Epoch: 4 Global Step: 57730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:33,098-Speed 3020.69 samples/sec Loss 11.3007 LearningRate 0.0589 Epoch: 4 Global Step: 57740 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:36,566-Speed 2952.80 samples/sec Loss 11.3314 LearningRate 0.0589 Epoch: 4 Global Step: 57750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:39,976-Speed 3004.56 samples/sec Loss 11.2217 LearningRate 0.0589 Epoch: 4 Global Step: 57760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:43,322-Speed 3060.85 samples/sec Loss 11.2685 LearningRate 0.0589 Epoch: 4 Global Step: 57770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:46,723-Speed 3011.64 samples/sec Loss 11.3583 LearningRate 0.0589 Epoch: 4 Global Step: 57780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:50,100-Speed 3033.32 samples/sec Loss 11.3840 LearningRate 0.0589 Epoch: 4 Global Step: 57790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:53,592-Speed 2932.70 samples/sec Loss 11.4444 LearningRate 0.0589 Epoch: 4 Global Step: 57800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:09:56,958-Speed 3043.62 samples/sec Loss 11.2471 LearningRate 0.0589 Epoch: 4 Global Step: 57810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:00,317-Speed 3049.56 samples/sec Loss 11.3253 LearningRate 0.0589 Epoch: 4 Global Step: 57820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:03,691-Speed 3036.28 samples/sec Loss 11.3543 LearningRate 0.0589 Epoch: 4 Global Step: 57830 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:10:07,025-Speed 3072.18 samples/sec Loss 11.2958 LearningRate 0.0589 Epoch: 4 Global Step: 57840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:10,378-Speed 3054.25 samples/sec Loss 11.3952 LearningRate 0.0588 Epoch: 4 Global Step: 57850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:13,748-Speed 3039.73 samples/sec Loss 11.5260 LearningRate 0.0588 Epoch: 4 Global Step: 57860 Fp16 Grad Scale: 65536 Required: 18 hours 
Training: 2022-04-27 07:10:17,146-Speed 3013.68 samples/sec Loss 11.3742 LearningRate 0.0588 Epoch: 4 Global Step: 57870 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:20,519-Speed 3037.09 samples/sec Loss 11.3554 LearningRate 0.0588 Epoch: 4 Global Step: 57880 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:23,933-Speed 3000.16 samples/sec Loss 11.3668 LearningRate 0.0588 Epoch: 4 Global Step: 57890 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:27,341-Speed 3006.35 samples/sec Loss 11.3275 LearningRate 0.0588 Epoch: 4 Global Step: 57900 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:30,776-Speed 2981.63 samples/sec Loss 11.3532 LearningRate 0.0588 Epoch: 4 Global Step: 57910 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:34,182-Speed 3007.57 samples/sec Loss 11.2745 LearningRate 0.0588 Epoch: 4 Global Step: 57920 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:37,543-Speed 3047.54 samples/sec Loss 11.3975 LearningRate 0.0588 Epoch: 4 Global Step: 57930 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:40,996-Speed 2966.30 samples/sec Loss 11.1503 LearningRate 0.0588 Epoch: 4 Global Step: 57940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:10:44,421-Speed 2990.07 samples/sec Loss 11.3549 LearningRate 0.0588 Epoch: 4 Global Step: 57950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:47,803-Speed 3029.02 samples/sec Loss 11.3112 LearningRate 0.0588 Epoch: 4 Global Step: 57960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:51,161-Speed 3050.38 samples/sec Loss 11.3954 LearningRate 0.0588 Epoch: 4 Global Step: 57970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:54,543-Speed 3028.68 samples/sec Loss 11.5734 LearningRate 0.0588 Epoch: 4 Global Step: 57980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:10:57,922-Speed 
3031.04 samples/sec Loss 11.4223 LearningRate 0.0588 Epoch: 4 Global Step: 57990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:01,372-Speed 2969.83 samples/sec Loss 11.4035 LearningRate 0.0588 Epoch: 4 Global Step: 58000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:04,742-Speed 3038.81 samples/sec Loss 11.3843 LearningRate 0.0587 Epoch: 4 Global Step: 58010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:08,135-Speed 3018.71 samples/sec Loss 11.2458 LearningRate 0.0587 Epoch: 4 Global Step: 58020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:11,478-Speed 3064.15 samples/sec Loss 11.4664 LearningRate 0.0587 Epoch: 4 Global Step: 58030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:14,859-Speed 3030.06 samples/sec Loss 11.4329 LearningRate 0.0587 Epoch: 4 Global Step: 58040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:18,226-Speed 3042.12 samples/sec Loss 11.4566 LearningRate 0.0587 Epoch: 4 Global Step: 58050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:21,703-Speed 2946.07 samples/sec Loss 11.3759 LearningRate 0.0587 Epoch: 4 Global Step: 58060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:25,084-Speed 3029.04 samples/sec Loss 11.1654 LearningRate 0.0587 Epoch: 4 Global Step: 58070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:28,434-Speed 3057.77 samples/sec Loss 11.3912 LearningRate 0.0587 Epoch: 4 Global Step: 58080 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:31,789-Speed 3053.24 samples/sec Loss 11.3301 LearningRate 0.0587 Epoch: 4 Global Step: 58090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:35,123-Speed 3072.37 samples/sec Loss 11.3111 LearningRate 0.0587 Epoch: 4 Global Step: 58100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:38,484-Speed 3047.88 samples/sec Loss 
11.1888 LearningRate 0.0587 Epoch: 4 Global Step: 58110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:41,876-Speed 3019.98 samples/sec Loss 11.4153 LearningRate 0.0587 Epoch: 4 Global Step: 58120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:45,325-Speed 2969.88 samples/sec Loss 11.3556 LearningRate 0.0587 Epoch: 4 Global Step: 58130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:48,804-Speed 2943.62 samples/sec Loss 11.3407 LearningRate 0.0587 Epoch: 4 Global Step: 58140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:52,274-Speed 2952.48 samples/sec Loss 11.2912 LearningRate 0.0587 Epoch: 4 Global Step: 58150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:55,685-Speed 3003.09 samples/sec Loss 11.2934 LearningRate 0.0587 Epoch: 4 Global Step: 58160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:11:59,117-Speed 2984.67 samples/sec Loss 11.3598 LearningRate 0.0587 Epoch: 4 Global Step: 58170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:02,603-Speed 2938.33 samples/sec Loss 11.3196 LearningRate 0.0586 Epoch: 4 Global Step: 58180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:06,018-Speed 2999.88 samples/sec Loss 11.3887 LearningRate 0.0586 Epoch: 4 Global Step: 58190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:09,370-Speed 3054.80 samples/sec Loss 11.2467 LearningRate 0.0586 Epoch: 4 Global Step: 58200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:12,732-Speed 3046.91 samples/sec Loss 11.4070 LearningRate 0.0586 Epoch: 4 Global Step: 58210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:16,094-Speed 3046.57 samples/sec Loss 11.2511 LearningRate 0.0586 Epoch: 4 Global Step: 58220 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:19,554-Speed 2960.75 samples/sec Loss 11.2783 LearningRate 0.0586 
Epoch: 4 Global Step: 58230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:22,948-Speed 3018.49 samples/sec Loss 11.3217 LearningRate 0.0586 Epoch: 4 Global Step: 58240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:26,302-Speed 3053.73 samples/sec Loss 11.2783 LearningRate 0.0586 Epoch: 4 Global Step: 58250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:29,790-Speed 2936.57 samples/sec Loss 11.2870 LearningRate 0.0586 Epoch: 4 Global Step: 58260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:33,111-Speed 3084.47 samples/sec Loss 11.2994 LearningRate 0.0586 Epoch: 4 Global Step: 58270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:36,504-Speed 3018.39 samples/sec Loss 11.3557 LearningRate 0.0586 Epoch: 4 Global Step: 58280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:39,825-Speed 3084.93 samples/sec Loss 11.3997 LearningRate 0.0586 Epoch: 4 Global Step: 58290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:43,296-Speed 2950.63 samples/sec Loss 11.1798 LearningRate 0.0586 Epoch: 4 Global Step: 58300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:46,744-Speed 2970.24 samples/sec Loss 11.3447 LearningRate 0.0586 Epoch: 4 Global Step: 58310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:50,126-Speed 3029.39 samples/sec Loss 11.2367 LearningRate 0.0586 Epoch: 4 Global Step: 58320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:53,533-Speed 3006.13 samples/sec Loss 11.4663 LearningRate 0.0586 Epoch: 4 Global Step: 58330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:12:56,922-Speed 3022.53 samples/sec Loss 11.1299 LearningRate 0.0585 Epoch: 4 Global Step: 58340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:00,247-Speed 3080.41 samples/sec Loss 11.3052 LearningRate 0.0585 Epoch: 4 Global Step: 58350 
Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:03,646-Speed 3013.87 samples/sec Loss 11.1133 LearningRate 0.0585 Epoch: 4 Global Step: 58360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:07,111-Speed 2956.16 samples/sec Loss 11.3770 LearningRate 0.0585 Epoch: 4 Global Step: 58370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:10,523-Speed 3001.89 samples/sec Loss 11.2540 LearningRate 0.0585 Epoch: 4 Global Step: 58380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:13,897-Speed 3046.76 samples/sec Loss 11.3728 LearningRate 0.0585 Epoch: 4 Global Step: 58390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:17,265-Speed 3042.03 samples/sec Loss 11.3631 LearningRate 0.0585 Epoch: 4 Global Step: 58400 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:20,666-Speed 3011.32 samples/sec Loss 11.2826 LearningRate 0.0585 Epoch: 4 Global Step: 58410 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:24,076-Speed 3004.18 samples/sec Loss 11.1115 LearningRate 0.0585 Epoch: 4 Global Step: 58420 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:27,517-Speed 2976.42 samples/sec Loss 11.3421 LearningRate 0.0585 Epoch: 4 Global Step: 58430 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:30,956-Speed 2977.99 samples/sec Loss 11.4141 LearningRate 0.0585 Epoch: 4 Global Step: 58440 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:34,386-Speed 2986.58 samples/sec Loss 11.2073 LearningRate 0.0585 Epoch: 4 Global Step: 58450 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:37,721-Speed 3071.37 samples/sec Loss 11.4584 LearningRate 0.0585 Epoch: 4 Global Step: 58460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:41,089-Speed 3041.02 samples/sec Loss 11.4017 LearningRate 0.0585 Epoch: 4 Global Step: 58470 Fp16 Grad Scale: 131072 
Required: 18 hours Training: 2022-04-27 07:13:44,409-Speed 3086.39 samples/sec Loss 11.2487 LearningRate 0.0585 Epoch: 4 Global Step: 58480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:47,723-Speed 3090.68 samples/sec Loss 11.2098 LearningRate 0.0585 Epoch: 4 Global Step: 58490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:51,082-Speed 3049.05 samples/sec Loss 11.1461 LearningRate 0.0584 Epoch: 4 Global Step: 58500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:13:54,502-Speed 2994.54 samples/sec Loss 11.3112 LearningRate 0.0584 Epoch: 4 Global Step: 58510 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:13:57,864-Speed 3047.68 samples/sec Loss 11.3976 LearningRate 0.0584 Epoch: 4 Global Step: 58520 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:01,256-Speed 3019.30 samples/sec Loss 11.1774 LearningRate 0.0584 Epoch: 4 Global Step: 58530 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:04,578-Speed 3083.99 samples/sec Loss 11.2760 LearningRate 0.0584 Epoch: 4 Global Step: 58540 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:07,932-Speed 3053.82 samples/sec Loss 11.1727 LearningRate 0.0584 Epoch: 4 Global Step: 58550 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:11,375-Speed 2975.30 samples/sec Loss 11.2605 LearningRate 0.0584 Epoch: 4 Global Step: 58560 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:14,742-Speed 3041.97 samples/sec Loss 11.3613 LearningRate 0.0584 Epoch: 4 Global Step: 58570 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:18,112-Speed 3039.36 samples/sec Loss 11.2938 LearningRate 0.0584 Epoch: 4 Global Step: 58580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:21,576-Speed 2957.28 samples/sec Loss 11.2315 LearningRate 0.0584 Epoch: 4 Global Step: 58590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 
07:14:24,979-Speed 3009.51 samples/sec Loss 11.3377 LearningRate 0.0584 Epoch: 4 Global Step: 58600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:14:28,369-Speed 3021.58 samples/sec Loss 11.3645 LearningRate 0.0584 Epoch: 4 Global Step: 58610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:31,711-Speed 3065.18 samples/sec Loss 11.3862 LearningRate 0.0584 Epoch: 4 Global Step: 58620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:35,053-Speed 3065.48 samples/sec Loss 11.2631 LearningRate 0.0584 Epoch: 4 Global Step: 58630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:38,409-Speed 3052.27 samples/sec Loss 11.3678 LearningRate 0.0584 Epoch: 4 Global Step: 58640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:41,798-Speed 3021.53 samples/sec Loss 11.2307 LearningRate 0.0584 Epoch: 4 Global Step: 58650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:45,169-Speed 3039.40 samples/sec Loss 11.3129 LearningRate 0.0583 Epoch: 4 Global Step: 58660 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:48,629-Speed 2960.38 samples/sec Loss 11.2227 LearningRate 0.0583 Epoch: 4 Global Step: 58670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:52,000-Speed 3039.08 samples/sec Loss 11.4249 LearningRate 0.0583 Epoch: 4 Global Step: 58680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:55,407-Speed 3006.32 samples/sec Loss 11.2967 LearningRate 0.0583 Epoch: 4 Global Step: 58690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:14:58,824-Speed 2998.20 samples/sec Loss 11.2023 LearningRate 0.0583 Epoch: 4 Global Step: 58700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:02,337-Speed 2915.01 samples/sec Loss 11.3233 LearningRate 0.0583 Epoch: 4 Global Step: 58710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:05,847-Speed 2918.46 
samples/sec Loss 11.1907 LearningRate 0.0583 Epoch: 4 Global Step: 58720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:09,221-Speed 3035.90 samples/sec Loss 11.2253 LearningRate 0.0583 Epoch: 4 Global Step: 58730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:12,632-Speed 3003.28 samples/sec Loss 11.1547 LearningRate 0.0583 Epoch: 4 Global Step: 58740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:15,957-Speed 3079.96 samples/sec Loss 11.2474 LearningRate 0.0583 Epoch: 4 Global Step: 58750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:19,337-Speed 3030.77 samples/sec Loss 11.3071 LearningRate 0.0583 Epoch: 4 Global Step: 58760 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:22,686-Speed 3058.94 samples/sec Loss 11.2487 LearningRate 0.0583 Epoch: 4 Global Step: 58770 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:26,020-Speed 3071.93 samples/sec Loss 11.2405 LearningRate 0.0583 Epoch: 4 Global Step: 58780 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:29,356-Speed 3070.65 samples/sec Loss 11.4408 LearningRate 0.0583 Epoch: 4 Global Step: 58790 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:32,697-Speed 3066.85 samples/sec Loss 11.1309 LearningRate 0.0583 Epoch: 4 Global Step: 58800 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:35,995-Speed 3106.32 samples/sec Loss 11.2843 LearningRate 0.0583 Epoch: 4 Global Step: 58810 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:39,353-Speed 3050.36 samples/sec Loss 11.3581 LearningRate 0.0583 Epoch: 4 Global Step: 58820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:42,820-Speed 2954.22 samples/sec Loss 11.2335 LearningRate 0.0582 Epoch: 4 Global Step: 58830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:46,154-Speed 3072.53 samples/sec Loss 11.5452 
LearningRate 0.0582 Epoch: 4 Global Step: 58840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:49,558-Speed 3008.78 samples/sec Loss 11.1623 LearningRate 0.0582 Epoch: 4 Global Step: 58850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:52,986-Speed 2987.78 samples/sec Loss 11.1400 LearningRate 0.0582 Epoch: 4 Global Step: 58860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:56,324-Speed 3069.42 samples/sec Loss 11.1985 LearningRate 0.0582 Epoch: 4 Global Step: 58870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:15:59,784-Speed 2960.12 samples/sec Loss 11.3073 LearningRate 0.0582 Epoch: 4 Global Step: 58880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:03,159-Speed 3035.79 samples/sec Loss 11.2745 LearningRate 0.0582 Epoch: 4 Global Step: 58890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:06,520-Speed 3047.46 samples/sec Loss 11.2564 LearningRate 0.0582 Epoch: 4 Global Step: 58900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:09,891-Speed 3038.15 samples/sec Loss 11.2157 LearningRate 0.0582 Epoch: 4 Global Step: 58910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:13,249-Speed 3050.91 samples/sec Loss 11.2782 LearningRate 0.0582 Epoch: 4 Global Step: 58920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:16,556-Speed 3097.82 samples/sec Loss 11.3713 LearningRate 0.0582 Epoch: 4 Global Step: 58930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:19,874-Speed 3087.11 samples/sec Loss 11.3469 LearningRate 0.0582 Epoch: 4 Global Step: 58940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:23,210-Speed 3069.97 samples/sec Loss 11.2098 LearningRate 0.0582 Epoch: 4 Global Step: 58950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:26,538-Speed 3078.38 samples/sec Loss 11.2550 LearningRate 0.0582 Epoch: 4 
Global Step: 58960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:29,924-Speed 3024.78 samples/sec Loss 11.3209 LearningRate 0.0582 Epoch: 4 Global Step: 58970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:33,323-Speed 3013.17 samples/sec Loss 11.1863 LearningRate 0.0582 Epoch: 4 Global Step: 58980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:36,711-Speed 3024.25 samples/sec Loss 11.2248 LearningRate 0.0581 Epoch: 4 Global Step: 58990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:40,107-Speed 3015.41 samples/sec Loss 11.2542 LearningRate 0.0581 Epoch: 4 Global Step: 59000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:43,535-Speed 2988.73 samples/sec Loss 11.3270 LearningRate 0.0581 Epoch: 4 Global Step: 59010 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:16:46,870-Speed 3071.89 samples/sec Loss 11.0734 LearningRate 0.0581 Epoch: 4 Global Step: 59020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:50,263-Speed 3017.76 samples/sec Loss 11.2173 LearningRate 0.0581 Epoch: 4 Global Step: 59030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:53,622-Speed 3050.13 samples/sec Loss 11.2629 LearningRate 0.0581 Epoch: 4 Global Step: 59040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:16:57,006-Speed 3026.86 samples/sec Loss 11.2636 LearningRate 0.0581 Epoch: 4 Global Step: 59050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:00,435-Speed 2987.55 samples/sec Loss 11.2209 LearningRate 0.0581 Epoch: 4 Global Step: 59060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:03,851-Speed 2998.28 samples/sec Loss 11.2626 LearningRate 0.0581 Epoch: 4 Global Step: 59070 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:07,226-Speed 3034.93 samples/sec Loss 11.2610 LearningRate 0.0581 Epoch: 4 Global Step: 59080 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:10,560-Speed 3071.50 samples/sec Loss 11.3697 LearningRate 0.0581 Epoch: 4 Global Step: 59090 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:13,950-Speed 3022.23 samples/sec Loss 11.3169 LearningRate 0.0581 Epoch: 4 Global Step: 59100 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:17,291-Speed 3065.56 samples/sec Loss 11.1991 LearningRate 0.0581 Epoch: 4 Global Step: 59110 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:20,663-Speed 3037.77 samples/sec Loss 11.1506 LearningRate 0.0581 Epoch: 4 Global Step: 59120 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:24,044-Speed 3028.82 samples/sec Loss 11.1910 LearningRate 0.0581 Epoch: 4 Global Step: 59130 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:27,388-Speed 3063.48 samples/sec Loss 11.1781 LearningRate 0.0581 Epoch: 4 Global Step: 59140 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:30,855-Speed 2954.66 samples/sec Loss 11.2046 LearningRate 0.0580 Epoch: 4 Global Step: 59150 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:34,204-Speed 3058.08 samples/sec Loss 11.4008 LearningRate 0.0580 Epoch: 4 Global Step: 59160 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:37,571-Speed 3042.73 samples/sec Loss 11.3308 LearningRate 0.0580 Epoch: 4 Global Step: 59170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:41,004-Speed 2983.01 samples/sec Loss 11.1938 LearningRate 0.0580 Epoch: 4 Global Step: 59180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:44,418-Speed 2999.97 samples/sec Loss 11.2794 LearningRate 0.0580 Epoch: 4 Global Step: 59190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:47,791-Speed 3037.66 samples/sec Loss 11.3349 LearningRate 0.0580 Epoch: 4 Global Step: 59200 Fp16 Grad Scale: 131072 Required: 18 
hours Training: 2022-04-27 07:17:51,125-Speed 3071.89 samples/sec Loss 11.4177 LearningRate 0.0580 Epoch: 4 Global Step: 59210 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:17:54,484-Speed 3048.84 samples/sec Loss 11.1589 LearningRate 0.0580 Epoch: 4 Global Step: 59220 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:17:57,901-Speed 2998.09 samples/sec Loss 11.4366 LearningRate 0.0580 Epoch: 4 Global Step: 59230 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:01,259-Speed 3049.64 samples/sec Loss 11.2909 LearningRate 0.0580 Epoch: 4 Global Step: 59240 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:04,611-Speed 3055.86 samples/sec Loss 11.1808 LearningRate 0.0580 Epoch: 4 Global Step: 59250 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:07,957-Speed 3062.22 samples/sec Loss 11.2847 LearningRate 0.0580 Epoch: 4 Global Step: 59260 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:11,395-Speed 2979.52 samples/sec Loss 11.0426 LearningRate 0.0580 Epoch: 4 Global Step: 59270 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:14,804-Speed 3004.58 samples/sec Loss 11.2825 LearningRate 0.0580 Epoch: 4 Global Step: 59280 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:18,176-Speed 3037.51 samples/sec Loss 11.3475 LearningRate 0.0580 Epoch: 4 Global Step: 59290 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:21,557-Speed 3029.28 samples/sec Loss 11.2898 LearningRate 0.0580 Epoch: 4 Global Step: 59300 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:24,989-Speed 2984.51 samples/sec Loss 11.2033 LearningRate 0.0580 Epoch: 4 Global Step: 59310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:28,348-Speed 3049.79 samples/sec Loss 11.1765 LearningRate 0.0579 Epoch: 4 Global Step: 59320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
07:18:31,671-Speed 3082.60 samples/sec Loss 11.2205 LearningRate 0.0579 Epoch: 4 Global Step: 59330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:35,025-Speed 3053.75 samples/sec Loss 11.2036 LearningRate 0.0579 Epoch: 4 Global Step: 59340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:38,375-Speed 3057.24 samples/sec Loss 11.1290 LearningRate 0.0579 Epoch: 4 Global Step: 59350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:18:41,736-Speed 3048.28 samples/sec Loss 11.2802 LearningRate 0.0579 Epoch: 4 Global Step: 59360 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:18:45,059-Speed 3082.47 samples/sec Loss 11.2414 LearningRate 0.0579 Epoch: 4 Global Step: 59370 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:18:48,422-Speed 3045.60 samples/sec Loss 11.2938 LearningRate 0.0579 Epoch: 4 Global Step: 59380 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:18:51,762-Speed 3066.07 samples/sec Loss 11.2341 LearningRate 0.0579 Epoch: 4 Global Step: 59390 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:18:55,168-Speed 3007.60 samples/sec Loss 11.2467 LearningRate 0.0579 Epoch: 4 Global Step: 59400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:18:58,598-Speed 2986.29 samples/sec Loss 11.1551 LearningRate 0.0579 Epoch: 4 Global Step: 59410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:19:01,999-Speed 3012.17 samples/sec Loss 10.9473 LearningRate 0.0579 Epoch: 4 Global Step: 59420 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:19:05,412-Speed 3000.97 samples/sec Loss 11.2783 LearningRate 0.0579 Epoch: 4 Global Step: 59430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:19:08,776-Speed 3044.88 samples/sec Loss 11.2068 LearningRate 0.0579 Epoch: 4 Global Step: 59440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:19:12,126-Speed 3057.82 samples/sec 
Loss 11.3180 LearningRate 0.0579 Epoch: 4 Global Step: 59450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:19:15,603-Speed 2945.89 samples/sec Loss 11.1940 LearningRate 0.0579 Epoch: 4 Global Step: 59460 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:19,102-Speed 2927.32 samples/sec Loss 11.3416 LearningRate 0.0579 Epoch: 4 Global Step: 59470 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:22,526-Speed 2991.18 samples/sec Loss 11.1437 LearningRate 0.0578 Epoch: 4 Global Step: 59480 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:25,872-Speed 3061.36 samples/sec Loss 11.2616 LearningRate 0.0578 Epoch: 4 Global Step: 59490 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:29,330-Speed 2961.50 samples/sec Loss 11.2547 LearningRate 0.0578 Epoch: 4 Global Step: 59500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:32,752-Speed 2993.90 samples/sec Loss 11.2030 LearningRate 0.0578 Epoch: 4 Global Step: 59510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:36,111-Speed 3049.41 samples/sec Loss 11.2438 LearningRate 0.0578 Epoch: 4 Global Step: 59520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:39,537-Speed 2988.93 samples/sec Loss 11.1023 LearningRate 0.0578 Epoch: 4 Global Step: 59530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:42,968-Speed 2985.99 samples/sec Loss 11.1739 LearningRate 0.0578 Epoch: 4 Global Step: 59540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:46,337-Speed 3040.05 samples/sec Loss 11.1926 LearningRate 0.0578 Epoch: 4 Global Step: 59550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:49,688-Speed 3056.30 samples/sec Loss 11.1892 LearningRate 0.0578 Epoch: 4 Global Step: 59560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:53,057-Speed 3040.68 samples/sec Loss 11.2705 LearningRate 0.0578 
Epoch: 4 Global Step: 59570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:56,462-Speed 3008.21 samples/sec Loss 11.2677 LearningRate 0.0578 Epoch: 4 Global Step: 59580 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:19:59,836-Speed 3035.57 samples/sec Loss 11.2732 LearningRate 0.0578 Epoch: 4 Global Step: 59590 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:03,245-Speed 3004.56 samples/sec Loss 11.2545 LearningRate 0.0578 Epoch: 4 Global Step: 59600 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:06,591-Speed 3061.65 samples/sec Loss 11.0794 LearningRate 0.0578 Epoch: 4 Global Step: 59610 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:10,012-Speed 2993.64 samples/sec Loss 11.1802 LearningRate 0.0578 Epoch: 4 Global Step: 59620 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:13,419-Speed 3006.71 samples/sec Loss 11.1128 LearningRate 0.0578 Epoch: 4 Global Step: 59630 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:16,781-Speed 3046.20 samples/sec Loss 11.2447 LearningRate 0.0577 Epoch: 4 Global Step: 59640 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:20,110-Speed 3076.97 samples/sec Loss 11.4589 LearningRate 0.0577 Epoch: 4 Global Step: 59650 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:23,536-Speed 2989.80 samples/sec Loss 11.3408 LearningRate 0.0577 Epoch: 4 Global Step: 59660 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:20:26,898-Speed 3046.59 samples/sec Loss 11.3307 LearningRate 0.0577 Epoch: 4 Global Step: 59670 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:30,312-Speed 3000.55 samples/sec Loss 11.1660 LearningRate 0.0577 Epoch: 4 Global Step: 59680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:33,689-Speed 3033.16 samples/sec Loss 11.2469 LearningRate 0.0577 Epoch: 4 Global Step: 59690 
Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:36,997-Speed 3095.97 samples/sec Loss 11.2483 LearningRate 0.0577 Epoch: 4 Global Step: 59700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:40,373-Speed 3033.80 samples/sec Loss 11.1691 LearningRate 0.0577 Epoch: 4 Global Step: 59710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:20:43,772-Speed 3013.69 samples/sec Loss 11.2857 LearningRate 0.0577 Epoch: 4 Global Step: 59720 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:20:47,125-Speed 3054.99 samples/sec Loss 11.1574 LearningRate 0.0577 Epoch: 4 Global Step: 59730 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:20:50,462-Speed 3069.18 samples/sec Loss 11.2628 LearningRate 0.0577 Epoch: 4 Global Step: 59740 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:20:53,834-Speed 3038.20 samples/sec Loss 11.2246 LearningRate 0.0577 Epoch: 4 Global Step: 59750 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:20:57,164-Speed 3075.69 samples/sec Loss 11.2061 LearningRate 0.0577 Epoch: 4 Global Step: 59760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:21:00,545-Speed 3029.50 samples/sec Loss 11.4423 LearningRate 0.0577 Epoch: 4 Global Step: 59770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:21:03,947-Speed 3010.63 samples/sec Loss 11.1429 LearningRate 0.0577 Epoch: 4 Global Step: 59780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:21:07,335-Speed 3023.08 samples/sec Loss 11.2141 LearningRate 0.0577 Epoch: 4 Global Step: 59790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:21:10,669-Speed 3072.94 samples/sec Loss 11.1005 LearningRate 0.0577 Epoch: 4 Global Step: 59800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:21:14,008-Speed 3067.65 samples/sec Loss 11.4540 LearningRate 0.0576 Epoch: 4 Global Step: 59810 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-04-27 07:21:17,376-Speed 3041.92 samples/sec Loss 11.3061 LearningRate 0.0576 Epoch: 4 Global Step: 59820 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:20,756-Speed 3030.37 samples/sec Loss 11.1445 LearningRate 0.0576 Epoch: 4 Global Step: 59830 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:24,149-Speed 3019.10 samples/sec Loss 11.0281 LearningRate 0.0576 Epoch: 4 Global Step: 59840 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:27,534-Speed 3026.01 samples/sec Loss 11.0818 LearningRate 0.0576 Epoch: 4 Global Step: 59850 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:30,952-Speed 2996.86 samples/sec Loss 11.1424 LearningRate 0.0576 Epoch: 4 Global Step: 59860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:34,400-Speed 2970.41 samples/sec Loss 11.3719 LearningRate 0.0576 Epoch: 4 Global Step: 59870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:37,884-Speed 2939.70 samples/sec Loss 11.1811 LearningRate 0.0576 Epoch: 4 Global Step: 59880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:41,353-Speed 2953.40 samples/sec Loss 11.2454 LearningRate 0.0576 Epoch: 4 Global Step: 59890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:44,781-Speed 2987.79 samples/sec Loss 11.2274 LearningRate 0.0576 Epoch: 4 Global Step: 59900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:48,169-Speed 3023.08 samples/sec Loss 11.2519 LearningRate 0.0576 Epoch: 4 Global Step: 59910 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:21:51,536-Speed 3042.36 samples/sec Loss 11.1479 LearningRate 0.0576 Epoch: 4 Global Step: 59920 Fp16 Grad Scale: 262144 Required: 18 hours Training: 2022-04-27 07:21:54,938-Speed 3010.49 samples/sec Loss 11.1108 LearningRate 0.0576 Epoch: 4 Global Step: 59930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
07:21:58,374-Speed 2982.24 samples/sec Loss 11.2713 LearningRate 0.0576 Epoch: 4 Global Step: 59940 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:01,823-Speed 2970.50 samples/sec Loss 11.2870 LearningRate 0.0576 Epoch: 4 Global Step: 59950 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:05,184-Speed 3046.76 samples/sec Loss 11.2219 LearningRate 0.0576 Epoch: 4 Global Step: 59960 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:08,613-Speed 2987.69 samples/sec Loss 11.2985 LearningRate 0.0575 Epoch: 4 Global Step: 59970 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:11,986-Speed 3037.04 samples/sec Loss 11.1085 LearningRate 0.0575 Epoch: 4 Global Step: 59980 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:15,373-Speed 3023.65 samples/sec Loss 11.2134 LearningRate 0.0575 Epoch: 4 Global Step: 59990 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:18,702-Speed 3077.20 samples/sec Loss 11.1727 LearningRate 0.0575 Epoch: 4 Global Step: 60000 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:22,070-Speed 3041.66 samples/sec Loss 11.1138 LearningRate 0.0575 Epoch: 4 Global Step: 60010 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:25,499-Speed 2987.00 samples/sec Loss 11.1712 LearningRate 0.0575 Epoch: 4 Global Step: 60020 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:28,862-Speed 3044.92 samples/sec Loss 11.2647 LearningRate 0.0575 Epoch: 4 Global Step: 60030 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:32,310-Speed 2971.22 samples/sec Loss 11.1341 LearningRate 0.0575 Epoch: 4 Global Step: 60040 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:35,705-Speed 3016.81 samples/sec Loss 11.2162 LearningRate 0.0575 Epoch: 4 Global Step: 60050 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:39,080-Speed 3034.62 
samples/sec Loss 11.1132 LearningRate 0.0575 Epoch: 4 Global Step: 60060 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:22:42,442-Speed 3047.51 samples/sec Loss 11.2969 LearningRate 0.0575 Epoch: 4 Global Step: 60070 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:22:45,899-Speed 2962.48 samples/sec Loss 11.2354 LearningRate 0.0575 Epoch: 4 Global Step: 60080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:22:49,374-Speed 2947.94 samples/sec Loss 11.2254 LearningRate 0.0575 Epoch: 4 Global Step: 60090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:22:52,675-Speed 3102.97 samples/sec Loss 11.0932 LearningRate 0.0575 Epoch: 4 Global Step: 60100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:22:56,061-Speed 3025.13 samples/sec Loss 11.1883 LearningRate 0.0575 Epoch: 4 Global Step: 60110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:22:59,416-Speed 3053.13 samples/sec Loss 11.2048 LearningRate 0.0575 Epoch: 4 Global Step: 60120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:02,832-Speed 2998.23 samples/sec Loss 11.2620 LearningRate 0.0574 Epoch: 4 Global Step: 60130 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:06,261-Speed 2987.47 samples/sec Loss 11.3082 LearningRate 0.0574 Epoch: 4 Global Step: 60140 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:09,739-Speed 2945.01 samples/sec Loss 11.2842 LearningRate 0.0574 Epoch: 4 Global Step: 60150 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:13,102-Speed 3045.65 samples/sec Loss 11.1924 LearningRate 0.0574 Epoch: 4 Global Step: 60160 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:16,475-Speed 3037.06 samples/sec Loss 11.2080 LearningRate 0.0574 Epoch: 4 Global Step: 60170 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:23:19,883-Speed 3005.77 samples/sec Loss 10.8748 LearningRate 
0.0574 Epoch: 4 Global Step: 60180 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:23:23,208-Speed 3080.16 samples/sec Loss 11.2766 LearningRate 0.0574 Epoch: 4 Global Step: 60190 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:23:26,552-Speed 3063.17 samples/sec Loss 11.2931 LearningRate 0.0574 Epoch: 4 Global Step: 60200 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:23:29,989-Speed 2980.10 samples/sec Loss 11.2317 LearningRate 0.0574 Epoch: 4 Global Step: 60210 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:33,460-Speed 2950.80 samples/sec Loss 11.1386 LearningRate 0.0574 Epoch: 4 Global Step: 60220 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:36,885-Speed 2990.70 samples/sec Loss 11.2783 LearningRate 0.0574 Epoch: 4 Global Step: 60230 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:40,264-Speed 3030.80 samples/sec Loss 11.1524 LearningRate 0.0574 Epoch: 4 Global Step: 60240 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:43,626-Speed 3047.27 samples/sec Loss 11.1620 LearningRate 0.0574 Epoch: 4 Global Step: 60250 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:47,035-Speed 3004.20 samples/sec Loss 11.3498 LearningRate 0.0574 Epoch: 4 Global Step: 60260 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:50,413-Speed 3032.14 samples/sec Loss 11.0794 LearningRate 0.0574 Epoch: 4 Global Step: 60270 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:53,748-Speed 3071.66 samples/sec Loss 11.0537 LearningRate 0.0574 Epoch: 4 Global Step: 60280 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:23:57,104-Speed 3052.00 samples/sec Loss 11.2670 LearningRate 0.0574 Epoch: 4 Global Step: 60290 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:00,539-Speed 2982.53 samples/sec Loss 11.0650 LearningRate 0.0573 Epoch: 4 Global Step: 60300 Fp16 
Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:03,898-Speed 3049.01 samples/sec Loss 11.2761 LearningRate 0.0573 Epoch: 4 Global Step: 60310 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:07,236-Speed 3068.99 samples/sec Loss 11.1370 LearningRate 0.0573 Epoch: 4 Global Step: 60320 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:10,614-Speed 3031.98 samples/sec Loss 11.0291 LearningRate 0.0573 Epoch: 4 Global Step: 60330 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:14,013-Speed 3013.46 samples/sec Loss 11.1805 LearningRate 0.0573 Epoch: 4 Global Step: 60340 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:17,403-Speed 3021.52 samples/sec Loss 11.1115 LearningRate 0.0573 Epoch: 4 Global Step: 60350 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:20,816-Speed 3001.87 samples/sec Loss 11.2587 LearningRate 0.0573 Epoch: 4 Global Step: 60360 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:24,281-Speed 2956.01 samples/sec Loss 11.0560 LearningRate 0.0573 Epoch: 4 Global Step: 60370 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:27,767-Speed 2938.12 samples/sec Loss 11.1540 LearningRate 0.0573 Epoch: 4 Global Step: 60380 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:31,195-Speed 2988.39 samples/sec Loss 11.1942 LearningRate 0.0573 Epoch: 4 Global Step: 60390 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:24:34,599-Speed 3008.45 samples/sec Loss 11.0612 LearningRate 0.0573 Epoch: 4 Global Step: 60400 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:37,990-Speed 3020.56 samples/sec Loss 11.0661 LearningRate 0.0573 Epoch: 4 Global Step: 60410 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:41,407-Speed 2997.67 samples/sec Loss 11.1319 LearningRate 0.0573 Epoch: 4 Global Step: 60420 Fp16 Grad Scale: 65536 Required: 18 
hours Training: 2022-04-27 07:24:44,823-Speed 2998.29 samples/sec Loss 11.2434 LearningRate 0.0573 Epoch: 4 Global Step: 60430 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:48,161-Speed 3068.59 samples/sec Loss 11.0640 LearningRate 0.0573 Epoch: 4 Global Step: 60440 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:51,622-Speed 2959.83 samples/sec Loss 11.3050 LearningRate 0.0573 Epoch: 4 Global Step: 60450 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:54,972-Speed 3056.97 samples/sec Loss 11.3213 LearningRate 0.0572 Epoch: 4 Global Step: 60460 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:24:58,429-Speed 2963.18 samples/sec Loss 11.3371 LearningRate 0.0572 Epoch: 4 Global Step: 60470 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:01,877-Speed 2970.77 samples/sec Loss 11.2885 LearningRate 0.0572 Epoch: 4 Global Step: 60480 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:05,316-Speed 2978.84 samples/sec Loss 11.2164 LearningRate 0.0572 Epoch: 4 Global Step: 60490 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:08,743-Speed 2988.41 samples/sec Loss 11.1755 LearningRate 0.0572 Epoch: 4 Global Step: 60500 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:12,096-Speed 3055.01 samples/sec Loss 11.1525 LearningRate 0.0572 Epoch: 4 Global Step: 60510 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:15,499-Speed 3009.79 samples/sec Loss 11.1620 LearningRate 0.0572 Epoch: 4 Global Step: 60520 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:18,916-Speed 2997.33 samples/sec Loss 11.0883 LearningRate 0.0572 Epoch: 4 Global Step: 60530 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:22,355-Speed 2979.03 samples/sec Loss 11.1982 LearningRate 0.0572 Epoch: 4 Global Step: 60540 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 
07:25:25,759-Speed 3008.27 samples/sec Loss 11.2685 LearningRate 0.0572 Epoch: 4 Global Step: 60550 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:29,170-Speed 3003.71 samples/sec Loss 11.1812 LearningRate 0.0572 Epoch: 4 Global Step: 60560 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:32,516-Speed 3061.10 samples/sec Loss 11.1234 LearningRate 0.0572 Epoch: 4 Global Step: 60570 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:25:35,847-Speed 3074.55 samples/sec Loss 11.1073 LearningRate 0.0572 Epoch: 4 Global Step: 60580 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:39,231-Speed 3027.42 samples/sec Loss 11.1933 LearningRate 0.0572 Epoch: 4 Global Step: 60590 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:42,703-Speed 2950.14 samples/sec Loss 11.2736 LearningRate 0.0572 Epoch: 4 Global Step: 60600 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:46,066-Speed 3045.61 samples/sec Loss 11.3268 LearningRate 0.0572 Epoch: 4 Global Step: 60610 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:49,446-Speed 3029.83 samples/sec Loss 11.0463 LearningRate 0.0572 Epoch: 4 Global Step: 60620 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:52,767-Speed 3084.66 samples/sec Loss 11.2316 LearningRate 0.0571 Epoch: 4 Global Step: 60630 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:56,090-Speed 3081.90 samples/sec Loss 11.1173 LearningRate 0.0571 Epoch: 4 Global Step: 60640 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:25:59,477-Speed 3024.10 samples/sec Loss 11.2336 LearningRate 0.0571 Epoch: 4 Global Step: 60650 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:02,806-Speed 3077.13 samples/sec Loss 11.1490 LearningRate 0.0571 Epoch: 4 Global Step: 60660 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:06,131-Speed 3080.85 samples/sec 
Loss 11.0104 LearningRate 0.0571 Epoch: 4 Global Step: 60670 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:09,438-Speed 3098.00 samples/sec Loss 11.1834 LearningRate 0.0571 Epoch: 4 Global Step: 60680 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:12,765-Speed 3078.52 samples/sec Loss 11.1156 LearningRate 0.0571 Epoch: 4 Global Step: 60690 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:16,081-Speed 3089.13 samples/sec Loss 11.2682 LearningRate 0.0571 Epoch: 4 Global Step: 60700 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:19,416-Speed 3070.82 samples/sec Loss 11.1588 LearningRate 0.0571 Epoch: 4 Global Step: 60710 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:22,792-Speed 3033.50 samples/sec Loss 11.1742 LearningRate 0.0571 Epoch: 4 Global Step: 60720 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:26,160-Speed 3041.25 samples/sec Loss 11.1945 LearningRate 0.0571 Epoch: 4 Global Step: 60730 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:29,514-Speed 3054.48 samples/sec Loss 10.9830 LearningRate 0.0571 Epoch: 4 Global Step: 60740 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:32,853-Speed 3067.23 samples/sec Loss 11.1539 LearningRate 0.0571 Epoch: 4 Global Step: 60750 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:26:36,188-Speed 3071.43 samples/sec Loss 11.0890 LearningRate 0.0571 Epoch: 4 Global Step: 60760 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:39,580-Speed 3019.34 samples/sec Loss 11.1861 LearningRate 0.0571 Epoch: 4 Global Step: 60770 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:42,889-Speed 3096.15 samples/sec Loss 11.0513 LearningRate 0.0571 Epoch: 4 Global Step: 60780 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:46,279-Speed 3021.35 samples/sec Loss 11.1671 LearningRate 0.0570 
Epoch: 4 Global Step: 60790 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:49,670-Speed 3021.28 samples/sec Loss 11.1419 LearningRate 0.0570 Epoch: 4 Global Step: 60800 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:53,052-Speed 3028.53 samples/sec Loss 11.1774 LearningRate 0.0570 Epoch: 4 Global Step: 60810 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:56,451-Speed 3013.49 samples/sec Loss 11.1326 LearningRate 0.0570 Epoch: 4 Global Step: 60820 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:26:59,863-Speed 3001.89 samples/sec Loss 11.1420 LearningRate 0.0570 Epoch: 4 Global Step: 60830 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:03,194-Speed 3075.30 samples/sec Loss 11.0427 LearningRate 0.0570 Epoch: 4 Global Step: 60840 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:06,568-Speed 3036.18 samples/sec Loss 11.0559 LearningRate 0.0570 Epoch: 4 Global Step: 60850 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:09,969-Speed 3011.18 samples/sec Loss 11.1904 LearningRate 0.0570 Epoch: 4 Global Step: 60860 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:13,282-Speed 3091.67 samples/sec Loss 11.0762 LearningRate 0.0570 Epoch: 4 Global Step: 60870 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:16,625-Speed 3064.03 samples/sec Loss 11.2240 LearningRate 0.0570 Epoch: 4 Global Step: 60880 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:20,020-Speed 3017.78 samples/sec Loss 11.1150 LearningRate 0.0570 Epoch: 4 Global Step: 60890 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:23,377-Speed 3050.42 samples/sec Loss 11.1219 LearningRate 0.0570 Epoch: 4 Global Step: 60900 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:26,790-Speed 3001.60 samples/sec Loss 11.0545 LearningRate 0.0570 Epoch: 4 Global Step: 60910 Fp16 Grad 
Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:30,174-Speed 3026.33 samples/sec Loss 10.9691 LearningRate 0.0570 Epoch: 4 Global Step: 60920 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:33,548-Speed 3036.24 samples/sec Loss 11.1477 LearningRate 0.0570 Epoch: 4 Global Step: 60930 Fp16 Grad Scale: 131072 Required: 18 hours Training: 2022-04-27 07:27:36,865-Speed 3088.25 samples/sec Loss 11.1679 LearningRate 0.0570 Epoch: 4 Global Step: 60940 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:40,203-Speed 3069.41 samples/sec Loss 11.1470 LearningRate 0.0569 Epoch: 4 Global Step: 60950 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:43,593-Speed 3021.51 samples/sec Loss 10.9971 LearningRate 0.0569 Epoch: 4 Global Step: 60960 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:47,002-Speed 3004.55 samples/sec Loss 11.1323 LearningRate 0.0569 Epoch: 4 Global Step: 60970 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:27:50,408-Speed 3007.91 samples/sec Loss 11.0824 LearningRate 0.0569 Epoch: 4 Global Step: 60980 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:27:53,848-Speed 2977.95 samples/sec Loss 11.1218 LearningRate 0.0569 Epoch: 4 Global Step: 60990 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:27:57,229-Speed 3030.94 samples/sec Loss 11.1417 LearningRate 0.0569 Epoch: 4 Global Step: 61000 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:00,586-Speed 3051.08 samples/sec Loss 11.0456 LearningRate 0.0569 Epoch: 4 Global Step: 61010 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:04,006-Speed 2995.53 samples/sec Loss 11.1144 LearningRate 0.0569 Epoch: 4 Global Step: 61020 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:07,360-Speed 3053.29 samples/sec Loss 10.9617 LearningRate 0.0569 Epoch: 4 Global Step: 61030 Fp16 Grad Scale: 32768 Required: 18 hours 
Training: 2022-04-27 07:28:10,682-Speed 3083.52 samples/sec Loss 11.1223 LearningRate 0.0569 Epoch: 4 Global Step: 61040 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:14,092-Speed 3003.72 samples/sec Loss 11.1110 LearningRate 0.0569 Epoch: 4 Global Step: 61050 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:17,460-Speed 3041.03 samples/sec Loss 10.9734 LearningRate 0.0569 Epoch: 4 Global Step: 61060 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:20,893-Speed 2983.81 samples/sec Loss 11.1297 LearningRate 0.0569 Epoch: 4 Global Step: 61070 Fp16 Grad Scale: 32768 Required: 18 hours Training: 2022-04-27 07:28:24,240-Speed 3060.49 samples/sec Loss 11.1981 LearningRate 0.0569 Epoch: 4 Global Step: 61080 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:28:27,659-Speed 2995.72 samples/sec Loss 11.2074 LearningRate 0.0569 Epoch: 4 Global Step: 61090 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:28:31,079-Speed 2995.56 samples/sec Loss 11.1309 LearningRate 0.0569 Epoch: 4 Global Step: 61100 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:28:34,481-Speed 3010.51 samples/sec Loss 11.0526 LearningRate 0.0569 Epoch: 4 Global Step: 61110 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:28:37,884-Speed 3010.34 samples/sec Loss 11.1642 LearningRate 0.0568 Epoch: 4 Global Step: 61120 Fp16 Grad Scale: 65536 Required: 18 hours Training: 2022-04-27 07:28:41,267-Speed 3028.14 samples/sec Loss 11.0651 LearningRate 0.0568 Epoch: 4 Global Step: 61130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:28:44,611-Speed 3062.82 samples/sec Loss 11.1802 LearningRate 0.0568 Epoch: 4 Global Step: 61140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:28:47,986-Speed 3035.16 samples/sec Loss 11.2556 LearningRate 0.0568 Epoch: 4 Global Step: 61150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:28:51,370-Speed 
3027.10 samples/sec Loss 11.1775 LearningRate 0.0568 Epoch: 4 Global Step: 61160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:28:54,784-Speed 3000.00 samples/sec Loss 10.9830 LearningRate 0.0568 Epoch: 4 Global Step: 61170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:28:58,205-Speed 2994.21 samples/sec Loss 11.0995 LearningRate 0.0568 Epoch: 4 Global Step: 61180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:01,576-Speed 3038.85 samples/sec Loss 11.1942 LearningRate 0.0568 Epoch: 4 Global Step: 61190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:05,007-Speed 2985.05 samples/sec Loss 11.0662 LearningRate 0.0568 Epoch: 4 Global Step: 61200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:08,481-Speed 2948.43 samples/sec Loss 11.0456 LearningRate 0.0568 Epoch: 4 Global Step: 61210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:11,890-Speed 3004.85 samples/sec Loss 11.1161 LearningRate 0.0568 Epoch: 4 Global Step: 61220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:15,359-Speed 2952.89 samples/sec Loss 11.1033 LearningRate 0.0568 Epoch: 4 Global Step: 61230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:18,736-Speed 3033.13 samples/sec Loss 11.0022 LearningRate 0.0568 Epoch: 4 Global Step: 61240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:22,170-Speed 2982.85 samples/sec Loss 10.9823 LearningRate 0.0568 Epoch: 4 Global Step: 61250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:25,625-Speed 2965.38 samples/sec Loss 11.1470 LearningRate 0.0568 Epoch: 4 Global Step: 61260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:29,037-Speed 3002.04 samples/sec Loss 10.9460 LearningRate 0.0568 Epoch: 4 Global Step: 61270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:32,490-Speed 2965.97 samples/sec Loss 11.0930 
LearningRate 0.0567 Epoch: 4 Global Step: 61280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:35,889-Speed 3014.11 samples/sec Loss 11.1668 LearningRate 0.0567 Epoch: 4 Global Step: 61290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:29:39,250-Speed 3047.77 samples/sec Loss 11.1518 LearningRate 0.0567 Epoch: 4 Global Step: 61300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:29:42,634-Speed 3026.47 samples/sec Loss 11.2314 LearningRate 0.0567 Epoch: 4 Global Step: 61310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:29:46,029-Speed 3017.77 samples/sec Loss 11.0954 LearningRate 0.0567 Epoch: 4 Global Step: 61320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:29:49,418-Speed 3022.34 samples/sec Loss 11.1765 LearningRate 0.0567 Epoch: 4 Global Step: 61330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:29:52,825-Speed 3005.78 samples/sec Loss 11.2132 LearningRate 0.0567 Epoch: 4 Global Step: 61340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:29:56,226-Speed 3012.12 samples/sec Loss 11.0995 LearningRate 0.0567 Epoch: 4 Global Step: 61350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:29:59,618-Speed 3019.89 samples/sec Loss 11.1593 LearningRate 0.0567 Epoch: 4 Global Step: 61360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:30:03,068-Speed 2969.20 samples/sec Loss 11.0817 LearningRate 0.0567 Epoch: 4 Global Step: 61370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:30:06,416-Speed 3058.96 samples/sec Loss 11.1802 LearningRate 0.0567 Epoch: 4 Global Step: 61380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:30:09,799-Speed 3027.97 samples/sec Loss 11.0379 LearningRate 0.0567 Epoch: 4 Global Step: 61390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:30:13,251-Speed 2967.14 samples/sec Loss 11.0088 LearningRate 0.0567 Epoch: 4 Global Step: 
61400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:16,755-Speed 2923.19 samples/sec Loss 11.0874 LearningRate 0.0567 Epoch: 4 Global Step: 61410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:20,091-Speed 3070.83 samples/sec Loss 11.1335 LearningRate 0.0567 Epoch: 4 Global Step: 61420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:23,466-Speed 3034.67 samples/sec Loss 11.0737 LearningRate 0.0567 Epoch: 4 Global Step: 61430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:26,869-Speed 3010.00 samples/sec Loss 10.9982 LearningRate 0.0567 Epoch: 4 Global Step: 61440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:30,242-Speed 3036.74 samples/sec Loss 10.9013 LearningRate 0.0566 Epoch: 4 Global Step: 61450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:33,677-Speed 2982.20 samples/sec Loss 11.0042 LearningRate 0.0566 Epoch: 4 Global Step: 61460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:37,024-Speed 3060.60 samples/sec Loss 11.0231 LearningRate 0.0566 Epoch: 4 Global Step: 61470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:40,495-Speed 2950.84 samples/sec Loss 11.1729 LearningRate 0.0566 Epoch: 4 Global Step: 61480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:43,867-Speed 3037.68 samples/sec Loss 11.1574 LearningRate 0.0566 Epoch: 4 Global Step: 61490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:47,202-Speed 3071.55 samples/sec Loss 11.0872 LearningRate 0.0566 Epoch: 4 Global Step: 61500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:50,659-Speed 2962.86 samples/sec Loss 11.1720 LearningRate 0.0566 Epoch: 4 Global Step: 61510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:30:54,102-Speed 2975.02 samples/sec Loss 11.1206 LearningRate 0.0566 Epoch: 4 Global Step: 61520 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-04-27 07:30:57,481-Speed 3031.44 samples/sec Loss 10.9862 LearningRate 0.0566 Epoch: 4 Global Step: 61530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:00,853-Speed 3037.77 samples/sec Loss 10.9804 LearningRate 0.0566 Epoch: 4 Global Step: 61540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:04,233-Speed 3030.11 samples/sec Loss 11.1287 LearningRate 0.0566 Epoch: 4 Global Step: 61550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:07,646-Speed 3001.89 samples/sec Loss 10.9787 LearningRate 0.0566 Epoch: 4 Global Step: 61560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:11,033-Speed 3023.76 samples/sec Loss 11.0283 LearningRate 0.0566 Epoch: 4 Global Step: 61570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:14,541-Speed 2919.96 samples/sec Loss 11.1294 LearningRate 0.0566 Epoch: 4 Global Step: 61580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:17,980-Speed 2978.81 samples/sec Loss 11.0197 LearningRate 0.0566 Epoch: 4 Global Step: 61590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:21,420-Speed 2977.53 samples/sec Loss 11.2090 LearningRate 0.0566 Epoch: 4 Global Step: 61600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:24,786-Speed 3042.21 samples/sec Loss 11.1417 LearningRate 0.0565 Epoch: 4 Global Step: 61610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:28,282-Speed 2929.88 samples/sec Loss 11.0679 LearningRate 0.0565 Epoch: 4 Global Step: 61620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:31:31,657-Speed 3035.34 samples/sec Loss 10.8976 LearningRate 0.0565 Epoch: 4 Global Step: 61630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:35,033-Speed 3033.74 samples/sec Loss 11.0406 LearningRate 0.0565 Epoch: 4 Global Step: 61640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 
07:31:38,432-Speed 3013.42 samples/sec Loss 11.1043 LearningRate 0.0565 Epoch: 4 Global Step: 61650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:41,813-Speed 3030.16 samples/sec Loss 11.0649 LearningRate 0.0565 Epoch: 4 Global Step: 61660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:45,148-Speed 3071.34 samples/sec Loss 10.9366 LearningRate 0.0565 Epoch: 4 Global Step: 61670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:48,572-Speed 2991.08 samples/sec Loss 11.0169 LearningRate 0.0565 Epoch: 4 Global Step: 61680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:51,923-Speed 3056.85 samples/sec Loss 11.0757 LearningRate 0.0565 Epoch: 4 Global Step: 61690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:55,347-Speed 2991.55 samples/sec Loss 11.0033 LearningRate 0.0565 Epoch: 4 Global Step: 61700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:31:58,756-Speed 3004.97 samples/sec Loss 11.1916 LearningRate 0.0565 Epoch: 4 Global Step: 61710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:02,131-Speed 3034.86 samples/sec Loss 11.3488 LearningRate 0.0565 Epoch: 4 Global Step: 61720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:05,446-Speed 3089.34 samples/sec Loss 11.0641 LearningRate 0.0565 Epoch: 4 Global Step: 61730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:08,887-Speed 2977.53 samples/sec Loss 11.0887 LearningRate 0.0565 Epoch: 4 Global Step: 61740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:12,289-Speed 3010.17 samples/sec Loss 11.0607 LearningRate 0.0565 Epoch: 4 Global Step: 61750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:15,746-Speed 2963.12 samples/sec Loss 11.1026 LearningRate 0.0565 Epoch: 4 Global Step: 61760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:19,111-Speed 3044.33 
samples/sec Loss 11.0546 LearningRate 0.0565 Epoch: 4 Global Step: 61770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:22,534-Speed 2992.65 samples/sec Loss 11.0142 LearningRate 0.0564 Epoch: 4 Global Step: 61780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:25,968-Speed 2982.47 samples/sec Loss 11.0530 LearningRate 0.0564 Epoch: 4 Global Step: 61790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:29,381-Speed 3001.04 samples/sec Loss 11.1621 LearningRate 0.0564 Epoch: 4 Global Step: 61800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:32:32,879-Speed 2928.48 samples/sec Loss 11.0021 LearningRate 0.0564 Epoch: 4 Global Step: 61810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:36,273-Speed 3017.06 samples/sec Loss 11.0466 LearningRate 0.0564 Epoch: 4 Global Step: 61820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:39,647-Speed 3036.80 samples/sec Loss 11.1266 LearningRate 0.0564 Epoch: 4 Global Step: 61830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:43,064-Speed 2997.13 samples/sec Loss 11.1204 LearningRate 0.0564 Epoch: 4 Global Step: 61840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:46,447-Speed 3027.60 samples/sec Loss 11.1243 LearningRate 0.0564 Epoch: 4 Global Step: 61850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:49,881-Speed 2983.39 samples/sec Loss 11.0452 LearningRate 0.0564 Epoch: 4 Global Step: 61860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:53,305-Speed 2991.72 samples/sec Loss 10.9924 LearningRate 0.0564 Epoch: 4 Global Step: 61870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:32:56,842-Speed 2896.96 samples/sec Loss 11.2675 LearningRate 0.0564 Epoch: 4 Global Step: 61880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:33:00,184-Speed 3065.17 samples/sec Loss 10.9703 
LearningRate 0.0564 Epoch: 4 Global Step: 61890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:33:03,577-Speed 3018.36 samples/sec Loss 10.9994 LearningRate 0.0564 Epoch: 4 Global Step: 61900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:33:06,992-Speed 3000.03 samples/sec Loss 10.9538 LearningRate 0.0564 Epoch: 4 Global Step: 61910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:33:10,360-Speed 3040.95 samples/sec Loss 11.1731 LearningRate 0.0564 Epoch: 4 Global Step: 61920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:33:13,858-Speed 2929.93 samples/sec Loss 11.0406 LearningRate 0.0564 Epoch: 4 Global Step: 61930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:33:17,360-Speed 2924.21 samples/sec Loss 11.0319 LearningRate 0.0563 Epoch: 4 Global Step: 61940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:33:20,738-Speed 3032.30 samples/sec Loss 11.0258 LearningRate 0.0563 Epoch: 4 Global Step: 61950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:33:24,104-Speed 3042.94 samples/sec Loss 11.1742 LearningRate 0.0563 Epoch: 4 Global Step: 61960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:27,503-Speed 3014.09 samples/sec Loss 11.0447 LearningRate 0.0563 Epoch: 4 Global Step: 61970 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:30,922-Speed 2995.65 samples/sec Loss 11.0774 LearningRate 0.0563 Epoch: 4 Global Step: 61980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:34,273-Speed 3056.98 samples/sec Loss 11.0531 LearningRate 0.0563 Epoch: 4 Global Step: 61990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:37,665-Speed 3019.51 samples/sec Loss 11.0466 LearningRate 0.0563 Epoch: 4 Global Step: 62000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:41,083-Speed 2997.11 samples/sec Loss 10.9762 LearningRate 0.0563 Epoch: 4 Global Step: 
62010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:44,538-Speed 2964.62 samples/sec Loss 11.0364 LearningRate 0.0563 Epoch: 4 Global Step: 62020 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:47,919-Speed 3029.45 samples/sec Loss 11.0337 LearningRate 0.0563 Epoch: 4 Global Step: 62030 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:51,355-Speed 2980.89 samples/sec Loss 10.9832 LearningRate 0.0563 Epoch: 4 Global Step: 62040 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:54,762-Speed 3006.58 samples/sec Loss 11.1033 LearningRate 0.0563 Epoch: 4 Global Step: 62050 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:33:58,208-Speed 2972.60 samples/sec Loss 10.9834 LearningRate 0.0563 Epoch: 4 Global Step: 62060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:01,662-Speed 2966.19 samples/sec Loss 11.0649 LearningRate 0.0563 Epoch: 4 Global Step: 62070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:05,064-Speed 3010.35 samples/sec Loss 11.0305 LearningRate 0.0563 Epoch: 4 Global Step: 62080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:08,478-Speed 3000.21 samples/sec Loss 11.0374 LearningRate 0.0563 Epoch: 4 Global Step: 62090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:12,113-Speed 2818.60 samples/sec Loss 11.0921 LearningRate 0.0563 Epoch: 4 Global Step: 62100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:44,975-Speed 311.61 samples/sec Loss 10.3215 LearningRate 0.0562 Epoch: 5 Global Step: 62110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:48,768-Speed 2701.18 samples/sec Loss 9.5626 LearningRate 0.0562 Epoch: 5 Global Step: 62120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:52,112-Speed 3062.74 samples/sec Loss 9.5958 LearningRate 0.0562 Epoch: 5 Global Step: 62130 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-27 07:34:55,573-Speed 2959.98 samples/sec Loss 9.5816 LearningRate 0.0562 Epoch: 5 Global Step: 62140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:34:58,988-Speed 2999.98 samples/sec Loss 9.4290 LearningRate 0.0562 Epoch: 5 Global Step: 62150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:35:02,423-Speed 2981.90 samples/sec Loss 9.6085 LearningRate 0.0562 Epoch: 5 Global Step: 62160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:05,786-Speed 3045.78 samples/sec Loss 9.5169 LearningRate 0.0562 Epoch: 5 Global Step: 62170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:09,194-Speed 3005.15 samples/sec Loss 9.6375 LearningRate 0.0562 Epoch: 5 Global Step: 62180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:12,533-Speed 3068.40 samples/sec Loss 9.6095 LearningRate 0.0562 Epoch: 5 Global Step: 62190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:15,905-Speed 3036.82 samples/sec Loss 9.4291 LearningRate 0.0562 Epoch: 5 Global Step: 62200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:19,263-Speed 3050.83 samples/sec Loss 9.6182 LearningRate 0.0562 Epoch: 5 Global Step: 62210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:22,570-Speed 3096.97 samples/sec Loss 9.4552 LearningRate 0.0562 Epoch: 5 Global Step: 62220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:25,994-Speed 2991.18 samples/sec Loss 9.4728 LearningRate 0.0562 Epoch: 5 Global Step: 62230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:29,446-Speed 2968.06 samples/sec Loss 9.6476 LearningRate 0.0562 Epoch: 5 Global Step: 62240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:32,791-Speed 3061.93 samples/sec Loss 9.7005 LearningRate 0.0562 Epoch: 5 Global Step: 62250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:36,151-Speed 
3049.13 samples/sec Loss 9.7295 LearningRate 0.0562 Epoch: 5 Global Step: 62260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:39,554-Speed 3009.71 samples/sec Loss 9.7019 LearningRate 0.0562 Epoch: 5 Global Step: 62270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:42,972-Speed 2996.32 samples/sec Loss 9.5800 LearningRate 0.0561 Epoch: 5 Global Step: 62280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:46,472-Speed 2926.86 samples/sec Loss 9.6913 LearningRate 0.0561 Epoch: 5 Global Step: 62290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:35:49,887-Speed 2999.49 samples/sec Loss 9.6377 LearningRate 0.0561 Epoch: 5 Global Step: 62300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:35:53,396-Speed 2918.96 samples/sec Loss 9.6242 LearningRate 0.0561 Epoch: 5 Global Step: 62310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:35:56,726-Speed 3075.58 samples/sec Loss 9.5635 LearningRate 0.0561 Epoch: 5 Global Step: 62320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:00,135-Speed 3005.32 samples/sec Loss 9.6383 LearningRate 0.0561 Epoch: 5 Global Step: 62330 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:03,485-Speed 3057.54 samples/sec Loss 9.8049 LearningRate 0.0561 Epoch: 5 Global Step: 62340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:06,871-Speed 3024.89 samples/sec Loss 9.6228 LearningRate 0.0561 Epoch: 5 Global Step: 62350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:10,238-Speed 3041.73 samples/sec Loss 9.9038 LearningRate 0.0561 Epoch: 5 Global Step: 62360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:13,572-Speed 3072.37 samples/sec Loss 9.7633 LearningRate 0.0561 Epoch: 5 Global Step: 62370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:16,951-Speed 3031.99 samples/sec Loss 9.8192 LearningRate 0.0561 
Epoch: 5 Global Step: 62380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:20,354-Speed 3009.13 samples/sec Loss 9.7288 LearningRate 0.0561 Epoch: 5 Global Step: 62390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:23,768-Speed 3000.21 samples/sec Loss 9.8726 LearningRate 0.0561 Epoch: 5 Global Step: 62400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:27,319-Speed 2884.85 samples/sec Loss 9.8785 LearningRate 0.0561 Epoch: 5 Global Step: 62410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:30,705-Speed 3025.17 samples/sec Loss 9.7871 LearningRate 0.0561 Epoch: 5 Global Step: 62420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:34,312-Speed 2839.23 samples/sec Loss 9.8195 LearningRate 0.0561 Epoch: 5 Global Step: 62430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:38,013-Speed 2767.52 samples/sec Loss 9.7767 LearningRate 0.0560 Epoch: 5 Global Step: 62440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:41,456-Speed 2975.30 samples/sec Loss 9.8623 LearningRate 0.0560 Epoch: 5 Global Step: 62450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:44,884-Speed 2988.63 samples/sec Loss 9.7631 LearningRate 0.0560 Epoch: 5 Global Step: 62460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:48,349-Speed 2955.72 samples/sec Loss 9.7628 LearningRate 0.0560 Epoch: 5 Global Step: 62470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:51,846-Speed 2928.78 samples/sec Loss 9.8545 LearningRate 0.0560 Epoch: 5 Global Step: 62480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:36:55,262-Speed 2998.28 samples/sec Loss 9.8812 LearningRate 0.0560 Epoch: 5 Global Step: 62490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:36:58,635-Speed 3036.89 samples/sec Loss 9.6874 LearningRate 0.0560 Epoch: 5 Global Step: 62500 Fp16 Grad Scale: 
65536 Required: 17 hours Training: 2022-04-27 07:37:02,029-Speed 3018.26 samples/sec Loss 9.6779 LearningRate 0.0560 Epoch: 5 Global Step: 62510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:05,398-Speed 3040.71 samples/sec Loss 9.7529 LearningRate 0.0560 Epoch: 5 Global Step: 62520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:08,776-Speed 3032.78 samples/sec Loss 9.9488 LearningRate 0.0560 Epoch: 5 Global Step: 62530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:12,171-Speed 3017.21 samples/sec Loss 9.9177 LearningRate 0.0560 Epoch: 5 Global Step: 62540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:15,629-Speed 2961.71 samples/sec Loss 9.9251 LearningRate 0.0560 Epoch: 5 Global Step: 62550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:19,067-Speed 2979.45 samples/sec Loss 9.9615 LearningRate 0.0560 Epoch: 5 Global Step: 62560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:22,466-Speed 3014.10 samples/sec Loss 9.8064 LearningRate 0.0560 Epoch: 5 Global Step: 62570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:25,859-Speed 3018.21 samples/sec Loss 9.7462 LearningRate 0.0560 Epoch: 5 Global Step: 62580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:29,243-Speed 3027.48 samples/sec Loss 9.8390 LearningRate 0.0560 Epoch: 5 Global Step: 62590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:37:32,591-Speed 3059.40 samples/sec Loss 9.7444 LearningRate 0.0560 Epoch: 5 Global Step: 62600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:35,939-Speed 3059.21 samples/sec Loss 9.8240 LearningRate 0.0559 Epoch: 5 Global Step: 62610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:39,343-Speed 3009.17 samples/sec Loss 9.9918 LearningRate 0.0559 Epoch: 5 Global Step: 62620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
07:37:42,751-Speed 3006.19 samples/sec Loss 10.0691 LearningRate 0.0559 Epoch: 5 Global Step: 62630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:46,132-Speed 3029.53 samples/sec Loss 10.0102 LearningRate 0.0559 Epoch: 5 Global Step: 62640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:49,510-Speed 3032.54 samples/sec Loss 9.8886 LearningRate 0.0559 Epoch: 5 Global Step: 62650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:52,915-Speed 3007.86 samples/sec Loss 9.9772 LearningRate 0.0559 Epoch: 5 Global Step: 62660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:56,405-Speed 2935.90 samples/sec Loss 9.9107 LearningRate 0.0559 Epoch: 5 Global Step: 62670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:37:59,796-Speed 3020.32 samples/sec Loss 9.9604 LearningRate 0.0559 Epoch: 5 Global Step: 62680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:38:03,184-Speed 3023.68 samples/sec Loss 10.0702 LearningRate 0.0559 Epoch: 5 Global Step: 62690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:38:06,576-Speed 3019.27 samples/sec Loss 9.8848 LearningRate 0.0559 Epoch: 5 Global Step: 62700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:09,967-Speed 3020.38 samples/sec Loss 10.0451 LearningRate 0.0559 Epoch: 5 Global Step: 62710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:13,361-Speed 3018.04 samples/sec Loss 10.1155 LearningRate 0.0559 Epoch: 5 Global Step: 62720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:16,795-Speed 2983.06 samples/sec Loss 9.9659 LearningRate 0.0559 Epoch: 5 Global Step: 62730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:20,213-Speed 2996.37 samples/sec Loss 9.8999 LearningRate 0.0559 Epoch: 5 Global Step: 62740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:23,638-Speed 2990.87 samples/sec Loss 
10.1011 LearningRate 0.0559 Epoch: 5 Global Step: 62750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:27,027-Speed 3022.53 samples/sec Loss 9.9075 LearningRate 0.0559 Epoch: 5 Global Step: 62760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:30,489-Speed 2958.49 samples/sec Loss 10.0791 LearningRate 0.0558 Epoch: 5 Global Step: 62770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:33,923-Speed 2983.61 samples/sec Loss 9.9984 LearningRate 0.0558 Epoch: 5 Global Step: 62780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:37,329-Speed 3007.50 samples/sec Loss 9.8182 LearningRate 0.0558 Epoch: 5 Global Step: 62790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:40,758-Speed 2986.72 samples/sec Loss 9.9068 LearningRate 0.0558 Epoch: 5 Global Step: 62800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:44,098-Speed 3067.43 samples/sec Loss 10.0933 LearningRate 0.0558 Epoch: 5 Global Step: 62810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:38:47,399-Speed 3102.48 samples/sec Loss 9.9042 LearningRate 0.0558 Epoch: 5 Global Step: 62820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:38:50,776-Speed 3033.27 samples/sec Loss 10.0639 LearningRate 0.0558 Epoch: 5 Global Step: 62830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:38:54,234-Speed 2962.30 samples/sec Loss 10.1064 LearningRate 0.0558 Epoch: 5 Global Step: 62840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:38:57,617-Speed 3028.14 samples/sec Loss 10.0424 LearningRate 0.0558 Epoch: 5 Global Step: 62850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:01,045-Speed 2987.52 samples/sec Loss 10.0327 LearningRate 0.0558 Epoch: 5 Global Step: 62860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:04,417-Speed 3037.93 samples/sec Loss 9.9690 LearningRate 0.0558 Epoch: 5 
Global Step: 62870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:07,843-Speed 2989.63 samples/sec Loss 10.1914 LearningRate 0.0558 Epoch: 5 Global Step: 62880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:11,252-Speed 3004.48 samples/sec Loss 9.9738 LearningRate 0.0558 Epoch: 5 Global Step: 62890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:14,644-Speed 3020.57 samples/sec Loss 10.0844 LearningRate 0.0558 Epoch: 5 Global Step: 62900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:18,096-Speed 2967.29 samples/sec Loss 10.0437 LearningRate 0.0558 Epoch: 5 Global Step: 62910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:21,457-Speed 3047.04 samples/sec Loss 10.0016 LearningRate 0.0558 Epoch: 5 Global Step: 62920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:39:24,796-Speed 3067.41 samples/sec Loss 10.0604 LearningRate 0.0558 Epoch: 5 Global Step: 62930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:39:28,164-Speed 3041.67 samples/sec Loss 10.1423 LearningRate 0.0557 Epoch: 5 Global Step: 62940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:39:31,527-Speed 3045.70 samples/sec Loss 10.2449 LearningRate 0.0557 Epoch: 5 Global Step: 62950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:39:34,847-Speed 3085.49 samples/sec Loss 10.2750 LearningRate 0.0557 Epoch: 5 Global Step: 62960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:38,249-Speed 3010.88 samples/sec Loss 10.1862 LearningRate 0.0557 Epoch: 5 Global Step: 62970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:41,632-Speed 3027.36 samples/sec Loss 10.1229 LearningRate 0.0557 Epoch: 5 Global Step: 62980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:45,070-Speed 2979.58 samples/sec Loss 10.1011 LearningRate 0.0557 Epoch: 5 Global Step: 62990 Fp16 Grad Scale: 
65536 Required: 17 hours Training: 2022-04-27 07:39:48,471-Speed 3011.28 samples/sec Loss 10.0860 LearningRate 0.0557 Epoch: 5 Global Step: 63000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:51,844-Speed 3036.64 samples/sec Loss 10.0921 LearningRate 0.0557 Epoch: 5 Global Step: 63010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:55,256-Speed 3001.79 samples/sec Loss 10.1756 LearningRate 0.0557 Epoch: 5 Global Step: 63020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:39:58,603-Speed 3060.30 samples/sec Loss 10.0426 LearningRate 0.0557 Epoch: 5 Global Step: 63030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:02,026-Speed 2992.50 samples/sec Loss 10.3137 LearningRate 0.0557 Epoch: 5 Global Step: 63040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:05,406-Speed 3030.77 samples/sec Loss 10.2669 LearningRate 0.0557 Epoch: 5 Global Step: 63050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:08,784-Speed 3031.75 samples/sec Loss 10.2827 LearningRate 0.0557 Epoch: 5 Global Step: 63060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:40:12,140-Speed 3052.68 samples/sec Loss 10.0795 LearningRate 0.0557 Epoch: 5 Global Step: 63070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:15,486-Speed 3060.99 samples/sec Loss 10.0346 LearningRate 0.0557 Epoch: 5 Global Step: 63080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:18,941-Speed 2965.23 samples/sec Loss 10.1577 LearningRate 0.0557 Epoch: 5 Global Step: 63090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:22,268-Speed 3079.08 samples/sec Loss 10.1251 LearningRate 0.0557 Epoch: 5 Global Step: 63100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:25,626-Speed 3049.63 samples/sec Loss 10.2392 LearningRate 0.0556 Epoch: 5 Global Step: 63110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 07:40:29,034-Speed 3005.42 samples/sec Loss 10.3909 LearningRate 0.0556 Epoch: 5 Global Step: 63120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:32,395-Speed 3047.97 samples/sec Loss 10.2874 LearningRate 0.0556 Epoch: 5 Global Step: 63130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:35,827-Speed 2984.13 samples/sec Loss 10.1617 LearningRate 0.0556 Epoch: 5 Global Step: 63140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:39,242-Speed 2999.92 samples/sec Loss 10.2156 LearningRate 0.0556 Epoch: 5 Global Step: 63150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:42,656-Speed 3000.83 samples/sec Loss 10.1042 LearningRate 0.0556 Epoch: 5 Global Step: 63160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:46,062-Speed 3007.17 samples/sec Loss 10.0339 LearningRate 0.0556 Epoch: 5 Global Step: 63170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:40:49,566-Speed 2923.64 samples/sec Loss 10.1711 LearningRate 0.0556 Epoch: 5 Global Step: 63180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:40:52,927-Speed 3047.24 samples/sec Loss 10.0814 LearningRate 0.0556 Epoch: 5 Global Step: 63190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:56,406-Speed 2943.77 samples/sec Loss 10.2197 LearningRate 0.0556 Epoch: 5 Global Step: 63200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:40:59,803-Speed 3015.48 samples/sec Loss 10.1238 LearningRate 0.0556 Epoch: 5 Global Step: 63210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:03,213-Speed 3003.59 samples/sec Loss 10.3654 LearningRate 0.0556 Epoch: 5 Global Step: 63220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:06,651-Speed 2980.13 samples/sec Loss 10.3771 LearningRate 0.0556 Epoch: 5 Global Step: 63230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:10,042-Speed 3020.12 
samples/sec Loss 10.2834 LearningRate 0.0556 Epoch: 5 Global Step: 63240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:13,373-Speed 3075.79 samples/sec Loss 10.1849 LearningRate 0.0556 Epoch: 5 Global Step: 63250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:16,764-Speed 3020.49 samples/sec Loss 10.1723 LearningRate 0.0556 Epoch: 5 Global Step: 63260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:20,114-Speed 3058.07 samples/sec Loss 10.4094 LearningRate 0.0555 Epoch: 5 Global Step: 63270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:23,531-Speed 2997.64 samples/sec Loss 10.2561 LearningRate 0.0555 Epoch: 5 Global Step: 63280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:41:26,879-Speed 3059.02 samples/sec Loss 10.2356 LearningRate 0.0555 Epoch: 5 Global Step: 63290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:30,223-Speed 3063.02 samples/sec Loss 10.1762 LearningRate 0.0555 Epoch: 5 Global Step: 63300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:33,567-Speed 3063.55 samples/sec Loss 10.1985 LearningRate 0.0555 Epoch: 5 Global Step: 63310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:36,990-Speed 2991.92 samples/sec Loss 10.1745 LearningRate 0.0555 Epoch: 5 Global Step: 63320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:40,346-Speed 3052.34 samples/sec Loss 10.3111 LearningRate 0.0555 Epoch: 5 Global Step: 63330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:43,719-Speed 3036.50 samples/sec Loss 10.1673 LearningRate 0.0555 Epoch: 5 Global Step: 63340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:47,056-Speed 3069.53 samples/sec Loss 10.1893 LearningRate 0.0555 Epoch: 5 Global Step: 63350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:50,482-Speed 2989.77 samples/sec Loss 10.2655 
LearningRate 0.0555 Epoch: 5 Global Step: 63360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:53,942-Speed 2960.18 samples/sec Loss 10.2569 LearningRate 0.0555 Epoch: 5 Global Step: 63370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:41:57,346-Speed 3009.65 samples/sec Loss 10.4852 LearningRate 0.0555 Epoch: 5 Global Step: 63380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:00,719-Speed 3035.89 samples/sec Loss 10.5209 LearningRate 0.0555 Epoch: 5 Global Step: 63390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:04,148-Speed 2987.35 samples/sec Loss 10.4006 LearningRate 0.0555 Epoch: 5 Global Step: 63400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:07,554-Speed 3007.58 samples/sec Loss 10.3950 LearningRate 0.0555 Epoch: 5 Global Step: 63410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:10,960-Speed 3007.07 samples/sec Loss 10.4323 LearningRate 0.0555 Epoch: 5 Global Step: 63420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:14,332-Speed 3037.76 samples/sec Loss 10.2640 LearningRate 0.0555 Epoch: 5 Global Step: 63430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:17,665-Speed 3073.94 samples/sec Loss 10.2056 LearningRate 0.0554 Epoch: 5 Global Step: 63440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:21,127-Speed 2957.97 samples/sec Loss 10.4892 LearningRate 0.0554 Epoch: 5 Global Step: 63450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:24,511-Speed 3027.60 samples/sec Loss 10.3815 LearningRate 0.0554 Epoch: 5 Global Step: 63460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:42:27,861-Speed 3056.72 samples/sec Loss 10.4426 LearningRate 0.0554 Epoch: 5 Global Step: 63470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:31,192-Speed 3075.20 samples/sec Loss 10.2695 LearningRate 0.0554 Epoch: 5 
Global Step: 63480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:34,631-Speed 2978.77 samples/sec Loss 10.2664 LearningRate 0.0554 Epoch: 5 Global Step: 63490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:37,993-Speed 3046.46 samples/sec Loss 10.3140 LearningRate 0.0554 Epoch: 5 Global Step: 63500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:41,423-Speed 2986.43 samples/sec Loss 10.4192 LearningRate 0.0554 Epoch: 5 Global Step: 63510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:44,791-Speed 3041.14 samples/sec Loss 10.2234 LearningRate 0.0554 Epoch: 5 Global Step: 63520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:48,165-Speed 3036.31 samples/sec Loss 10.2599 LearningRate 0.0554 Epoch: 5 Global Step: 63530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:51,546-Speed 3029.40 samples/sec Loss 10.2370 LearningRate 0.0554 Epoch: 5 Global Step: 63540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:54,858-Speed 3092.27 samples/sec Loss 10.4867 LearningRate 0.0554 Epoch: 5 Global Step: 63550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:42:58,227-Speed 3040.54 samples/sec Loss 10.3328 LearningRate 0.0554 Epoch: 5 Global Step: 63560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:01,648-Speed 2994.19 samples/sec Loss 10.3742 LearningRate 0.0554 Epoch: 5 Global Step: 63570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:05,075-Speed 2989.26 samples/sec Loss 10.4058 LearningRate 0.0554 Epoch: 5 Global Step: 63580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:08,437-Speed 3046.96 samples/sec Loss 10.3098 LearningRate 0.0554 Epoch: 5 Global Step: 63590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:11,773-Speed 3070.14 samples/sec Loss 10.4993 LearningRate 0.0554 Epoch: 5 Global Step: 63600 Fp16 Grad Scale: 
131072 Required: 17 hours Training: 2022-04-27 07:43:15,120-Speed 3060.56 samples/sec Loss 10.4242 LearningRate 0.0553 Epoch: 5 Global Step: 63610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:18,484-Speed 3045.30 samples/sec Loss 10.5310 LearningRate 0.0553 Epoch: 5 Global Step: 63620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:21,917-Speed 2983.49 samples/sec Loss 10.3320 LearningRate 0.0553 Epoch: 5 Global Step: 63630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:25,280-Speed 3045.68 samples/sec Loss 10.4068 LearningRate 0.0553 Epoch: 5 Global Step: 63640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:28,588-Speed 3096.97 samples/sec Loss 10.4640 LearningRate 0.0553 Epoch: 5 Global Step: 63650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:43:31,963-Speed 3034.76 samples/sec Loss 10.5032 LearningRate 0.0553 Epoch: 5 Global Step: 63660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:35,382-Speed 2995.40 samples/sec Loss 10.2540 LearningRate 0.0553 Epoch: 5 Global Step: 63670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:38,755-Speed 3037.53 samples/sec Loss 10.3982 LearningRate 0.0553 Epoch: 5 Global Step: 63680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:42,151-Speed 3015.21 samples/sec Loss 10.4053 LearningRate 0.0553 Epoch: 5 Global Step: 63690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:45,463-Speed 3093.32 samples/sec Loss 10.4485 LearningRate 0.0553 Epoch: 5 Global Step: 63700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:48,788-Speed 3080.34 samples/sec Loss 10.4929 LearningRate 0.0553 Epoch: 5 Global Step: 63710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:52,188-Speed 3013.12 samples/sec Loss 10.5086 LearningRate 0.0553 Epoch: 5 Global Step: 63720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 
2022-04-27 07:43:55,579-Speed 3020.74 samples/sec Loss 10.4957 LearningRate 0.0553 Epoch: 5 Global Step: 63730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:43:58,933-Speed 3053.89 samples/sec Loss 10.5494 LearningRate 0.0553 Epoch: 5 Global Step: 63740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:02,356-Speed 2992.08 samples/sec Loss 10.5097 LearningRate 0.0553 Epoch: 5 Global Step: 63750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:05,754-Speed 3014.79 samples/sec Loss 10.2893 LearningRate 0.0553 Epoch: 5 Global Step: 63760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:44:09,202-Speed 2970.93 samples/sec Loss 10.4935 LearningRate 0.0552 Epoch: 5 Global Step: 63770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:44:12,603-Speed 3011.33 samples/sec Loss 10.4717 LearningRate 0.0552 Epoch: 5 Global Step: 63780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:44:16,093-Speed 2935.10 samples/sec Loss 10.4197 LearningRate 0.0552 Epoch: 5 Global Step: 63790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:44:19,537-Speed 2974.01 samples/sec Loss 10.4784 LearningRate 0.0552 Epoch: 5 Global Step: 63800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:23,025-Speed 2937.07 samples/sec Loss 10.3926 LearningRate 0.0552 Epoch: 5 Global Step: 63810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:26,426-Speed 3011.00 samples/sec Loss 10.4959 LearningRate 0.0552 Epoch: 5 Global Step: 63820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:29,884-Speed 2962.49 samples/sec Loss 10.5062 LearningRate 0.0552 Epoch: 5 Global Step: 63830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:33,214-Speed 3075.62 samples/sec Loss 10.5311 LearningRate 0.0552 Epoch: 5 Global Step: 63840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:36,624-Speed 3004.35 
samples/sec Loss 10.4073 LearningRate 0.0552 Epoch: 5 Global Step: 63850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:40,086-Speed 2958.59 samples/sec Loss 10.6511 LearningRate 0.0552 Epoch: 5 Global Step: 63860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:43,446-Speed 3049.02 samples/sec Loss 10.6344 LearningRate 0.0552 Epoch: 5 Global Step: 63870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:46,840-Speed 3017.28 samples/sec Loss 10.5207 LearningRate 0.0552 Epoch: 5 Global Step: 63880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:50,253-Speed 3001.58 samples/sec Loss 10.5527 LearningRate 0.0552 Epoch: 5 Global Step: 63890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:44:53,698-Speed 2973.08 samples/sec Loss 10.6098 LearningRate 0.0552 Epoch: 5 Global Step: 63900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:44:57,069-Speed 3038.41 samples/sec Loss 10.5222 LearningRate 0.0552 Epoch: 5 Global Step: 63910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:00,464-Speed 3017.55 samples/sec Loss 10.5640 LearningRate 0.0552 Epoch: 5 Global Step: 63920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:03,896-Speed 2983.95 samples/sec Loss 10.5806 LearningRate 0.0552 Epoch: 5 Global Step: 63930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:07,267-Speed 3038.86 samples/sec Loss 10.4607 LearningRate 0.0551 Epoch: 5 Global Step: 63940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:10,674-Speed 3006.66 samples/sec Loss 10.4725 LearningRate 0.0551 Epoch: 5 Global Step: 63950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:14,071-Speed 3014.91 samples/sec Loss 10.4276 LearningRate 0.0551 Epoch: 5 Global Step: 63960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:17,417-Speed 3060.97 samples/sec Loss 10.4739 
LearningRate 0.0551 Epoch: 5 Global Step: 63970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:20,745-Speed 3078.87 samples/sec Loss 10.4910 LearningRate 0.0551 Epoch: 5 Global Step: 63980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:24,062-Speed 3087.32 samples/sec Loss 10.5808 LearningRate 0.0551 Epoch: 5 Global Step: 63990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:27,413-Speed 3057.32 samples/sec Loss 10.5820 LearningRate 0.0551 Epoch: 5 Global Step: 64000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:30,808-Speed 3016.67 samples/sec Loss 10.5257 LearningRate 0.0551 Epoch: 5 Global Step: 64010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:45:34,137-Speed 3076.75 samples/sec Loss 10.5586 LearningRate 0.0551 Epoch: 5 Global Step: 64020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:37,645-Speed 2920.46 samples/sec Loss 10.6684 LearningRate 0.0551 Epoch: 5 Global Step: 64030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:41,041-Speed 3016.51 samples/sec Loss 10.3975 LearningRate 0.0551 Epoch: 5 Global Step: 64040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:44,368-Speed 3078.19 samples/sec Loss 10.5731 LearningRate 0.0551 Epoch: 5 Global Step: 64050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:47,790-Speed 2993.89 samples/sec Loss 10.5381 LearningRate 0.0551 Epoch: 5 Global Step: 64060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:51,223-Speed 2983.33 samples/sec Loss 10.4621 LearningRate 0.0551 Epoch: 5 Global Step: 64070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:54,655-Speed 2984.52 samples/sec Loss 10.4919 LearningRate 0.0551 Epoch: 5 Global Step: 64080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:45:58,094-Speed 2978.77 samples/sec Loss 10.5207 LearningRate 0.0551 Epoch: 5 Global 
Step: 64090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:01,471-Speed 3033.01 samples/sec Loss 10.5132 LearningRate 0.0551 Epoch: 5 Global Step: 64100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:04,893-Speed 2993.60 samples/sec Loss 10.4540 LearningRate 0.0550 Epoch: 5 Global Step: 64110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:08,284-Speed 3020.41 samples/sec Loss 10.6119 LearningRate 0.0550 Epoch: 5 Global Step: 64120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:46:11,742-Speed 2961.85 samples/sec Loss 10.6403 LearningRate 0.0550 Epoch: 5 Global Step: 64130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:46:15,251-Speed 2919.52 samples/sec Loss 10.5548 LearningRate 0.0550 Epoch: 5 Global Step: 64140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:46:18,605-Speed 3053.39 samples/sec Loss 10.5986 LearningRate 0.0550 Epoch: 5 Global Step: 64150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:46:21,921-Speed 3089.06 samples/sec Loss 10.5812 LearningRate 0.0550 Epoch: 5 Global Step: 64160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:46:25,323-Speed 3011.15 samples/sec Loss 10.4670 LearningRate 0.0550 Epoch: 5 Global Step: 64170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:46:28,764-Speed 2976.94 samples/sec Loss 10.5403 LearningRate 0.0550 Epoch: 5 Global Step: 64180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:32,154-Speed 3021.80 samples/sec Loss 10.6167 LearningRate 0.0550 Epoch: 5 Global Step: 64190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:35,590-Speed 2980.57 samples/sec Loss 10.5670 LearningRate 0.0550 Epoch: 5 Global Step: 64200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:38,999-Speed 3005.21 samples/sec Loss 10.6290 LearningRate 0.0550 Epoch: 5 Global Step: 64210 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-04-27 07:46:42,319-Speed 3084.83 samples/sec Loss 10.5219 LearningRate 0.0550 Epoch: 5 Global Step: 64220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:45,748-Speed 2987.08 samples/sec Loss 10.5588 LearningRate 0.0550 Epoch: 5 Global Step: 64230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:49,106-Speed 3051.31 samples/sec Loss 10.5975 LearningRate 0.0550 Epoch: 5 Global Step: 64240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:52,497-Speed 3020.56 samples/sec Loss 10.6644 LearningRate 0.0550 Epoch: 5 Global Step: 64250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:55,880-Speed 3027.63 samples/sec Loss 10.7322 LearningRate 0.0550 Epoch: 5 Global Step: 64260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:46:59,266-Speed 3025.56 samples/sec Loss 10.5495 LearningRate 0.0550 Epoch: 5 Global Step: 64270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:47:02,587-Speed 3083.80 samples/sec Loss 10.5599 LearningRate 0.0549 Epoch: 5 Global Step: 64280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:05,956-Speed 3040.31 samples/sec Loss 10.7597 LearningRate 0.0549 Epoch: 5 Global Step: 64290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:09,298-Speed 3065.40 samples/sec Loss 10.4944 LearningRate 0.0549 Epoch: 5 Global Step: 64300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:12,607-Speed 3095.41 samples/sec Loss 10.7199 LearningRate 0.0549 Epoch: 5 Global Step: 64310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:16,112-Speed 2922.08 samples/sec Loss 10.6751 LearningRate 0.0549 Epoch: 5 Global Step: 64320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:19,419-Speed 3097.27 samples/sec Loss 10.5976 LearningRate 0.0549 Epoch: 5 Global Step: 64330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-27 07:47:22,774-Speed 3053.52 samples/sec Loss 10.5441 LearningRate 0.0549 Epoch: 5 Global Step: 64340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:26,126-Speed 3055.65 samples/sec Loss 10.6758 LearningRate 0.0549 Epoch: 5 Global Step: 64350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:29,493-Speed 3042.46 samples/sec Loss 10.6265 LearningRate 0.0549 Epoch: 5 Global Step: 64360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:32,877-Speed 3027.14 samples/sec Loss 10.6009 LearningRate 0.0549 Epoch: 5 Global Step: 64370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:47:36,250-Speed 3036.02 samples/sec Loss 10.5105 LearningRate 0.0549 Epoch: 5 Global Step: 64380 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-27 07:47:39,564-Speed 3091.70 samples/sec Loss 10.7475 LearningRate 0.0549 Epoch: 5 Global Step: 64390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:47:42,958-Speed 3017.90 samples/sec Loss 10.6747 LearningRate 0.0549 Epoch: 5 Global Step: 64400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:47:46,363-Speed 3007.84 samples/sec Loss 10.5406 LearningRate 0.0549 Epoch: 5 Global Step: 64410 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:47:49,754-Speed 3021.12 samples/sec Loss 10.6328 LearningRate 0.0549 Epoch: 5 Global Step: 64420 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:47:53,067-Speed 3091.76 samples/sec Loss 10.5414 LearningRate 0.0549 Epoch: 5 Global Step: 64430 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:47:56,439-Speed 3037.21 samples/sec Loss 10.7676 LearningRate 0.0548 Epoch: 5 Global Step: 64440 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:47:59,791-Speed 3056.24 samples/sec Loss 10.5226 LearningRate 0.0548 Epoch: 5 Global Step: 64450 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:48:03,128-Speed 3069.57 
samples/sec Loss 10.6756 LearningRate 0.0548 Epoch: 5 Global Step: 64460 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:48:06,591-Speed 2957.34 samples/sec Loss 10.5627 LearningRate 0.0548 Epoch: 5 Global Step: 64470 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:48:09,984-Speed 3019.20 samples/sec Loss 10.4827 LearningRate 0.0548 Epoch: 5 Global Step: 64480 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:48:13,411-Speed 2989.16 samples/sec Loss 10.6297 LearningRate 0.0548 Epoch: 5 Global Step: 64490 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:48:16,827-Speed 2998.01 samples/sec Loss 10.6500 LearningRate 0.0548 Epoch: 5 Global Step: 64500 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 07:48:20,273-Speed 2973.14 samples/sec Loss 10.6303 LearningRate 0.0548 Epoch: 5 Global Step: 64510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:23,584-Speed 3092.59 samples/sec Loss 10.6915 LearningRate 0.0548 Epoch: 5 Global Step: 64520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:26,991-Speed 3006.83 samples/sec Loss 10.5980 LearningRate 0.0548 Epoch: 5 Global Step: 64530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:30,436-Speed 2973.29 samples/sec Loss 10.8288 LearningRate 0.0548 Epoch: 5 Global Step: 64540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:33,850-Speed 3000.13 samples/sec Loss 10.6002 LearningRate 0.0548 Epoch: 5 Global Step: 64550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:37,200-Speed 3057.76 samples/sec Loss 10.5768 LearningRate 0.0548 Epoch: 5 Global Step: 64560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:40,680-Speed 2943.23 samples/sec Loss 10.6320 LearningRate 0.0548 Epoch: 5 Global Step: 64570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:44,077-Speed 3015.31 samples/sec Loss 10.5357 LearningRate 
0.0548 Epoch: 5 Global Step: 64580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:47,490-Speed 3001.72 samples/sec Loss 10.6220 LearningRate 0.0548 Epoch: 5 Global Step: 64590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:50,901-Speed 3003.30 samples/sec Loss 10.7337 LearningRate 0.0548 Epoch: 5 Global Step: 64600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:48:54,346-Speed 2972.88 samples/sec Loss 10.6445 LearningRate 0.0547 Epoch: 5 Global Step: 64610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:48:57,690-Speed 3063.38 samples/sec Loss 10.7399 LearningRate 0.0547 Epoch: 5 Global Step: 64620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:01,063-Speed 3036.99 samples/sec Loss 10.7357 LearningRate 0.0547 Epoch: 5 Global Step: 64630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:04,460-Speed 3015.31 samples/sec Loss 10.6797 LearningRate 0.0547 Epoch: 5 Global Step: 64640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:07,892-Speed 2984.50 samples/sec Loss 10.6272 LearningRate 0.0547 Epoch: 5 Global Step: 64650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:11,265-Speed 3037.73 samples/sec Loss 10.7166 LearningRate 0.0547 Epoch: 5 Global Step: 64660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:14,639-Speed 3035.64 samples/sec Loss 10.6290 LearningRate 0.0547 Epoch: 5 Global Step: 64670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:18,038-Speed 3013.16 samples/sec Loss 10.5954 LearningRate 0.0547 Epoch: 5 Global Step: 64680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:21,398-Speed 3049.51 samples/sec Loss 10.6760 LearningRate 0.0547 Epoch: 5 Global Step: 64690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:24,854-Speed 2963.39 samples/sec Loss 10.6961 LearningRate 0.0547 Epoch: 5 Global Step: 64700 
Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:28,234-Speed 3030.82 samples/sec Loss 10.6296 LearningRate 0.0547 Epoch: 5 Global Step: 64710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:31,636-Speed 3010.95 samples/sec Loss 10.6559 LearningRate 0.0547 Epoch: 5 Global Step: 64720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:34,994-Speed 3049.71 samples/sec Loss 10.6398 LearningRate 0.0547 Epoch: 5 Global Step: 64730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:38,448-Speed 2965.74 samples/sec Loss 10.6571 LearningRate 0.0547 Epoch: 5 Global Step: 64740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:41,891-Speed 2974.71 samples/sec Loss 10.7311 LearningRate 0.0547 Epoch: 5 Global Step: 64750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:45,241-Speed 3057.71 samples/sec Loss 10.5822 LearningRate 0.0547 Epoch: 5 Global Step: 64760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:49:48,714-Speed 2949.42 samples/sec Loss 10.6384 LearningRate 0.0547 Epoch: 5 Global Step: 64770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:52,171-Speed 2963.78 samples/sec Loss 10.6765 LearningRate 0.0546 Epoch: 5 Global Step: 64780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:55,538-Speed 3041.61 samples/sec Loss 10.5933 LearningRate 0.0546 Epoch: 5 Global Step: 64790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:49:58,911-Speed 3036.87 samples/sec Loss 10.7680 LearningRate 0.0546 Epoch: 5 Global Step: 64800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:02,353-Speed 2976.46 samples/sec Loss 10.7506 LearningRate 0.0546 Epoch: 5 Global Step: 64810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:05,781-Speed 2987.86 samples/sec Loss 10.6981 LearningRate 0.0546 Epoch: 5 Global Step: 64820 Fp16 Grad Scale: 131072 Required: 17 
hours Training: 2022-04-27 07:50:09,211-Speed 2985.93 samples/sec Loss 10.5830 LearningRate 0.0546 Epoch: 5 Global Step: 64830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:12,625-Speed 3000.43 samples/sec Loss 10.7202 LearningRate 0.0546 Epoch: 5 Global Step: 64840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:16,019-Speed 3017.80 samples/sec Loss 10.7496 LearningRate 0.0546 Epoch: 5 Global Step: 64850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:19,424-Speed 3008.99 samples/sec Loss 10.8133 LearningRate 0.0546 Epoch: 5 Global Step: 64860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:22,919-Speed 2930.51 samples/sec Loss 10.6322 LearningRate 0.0546 Epoch: 5 Global Step: 64870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:26,249-Speed 3076.18 samples/sec Loss 10.7574 LearningRate 0.0546 Epoch: 5 Global Step: 64880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:50:29,567-Speed 3087.29 samples/sec Loss 10.9369 LearningRate 0.0546 Epoch: 5 Global Step: 64890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:32,934-Speed 3041.60 samples/sec Loss 10.7420 LearningRate 0.0546 Epoch: 5 Global Step: 64900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:36,290-Speed 3052.02 samples/sec Loss 10.6949 LearningRate 0.0546 Epoch: 5 Global Step: 64910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:39,640-Speed 3058.09 samples/sec Loss 10.7330 LearningRate 0.0546 Epoch: 5 Global Step: 64920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:43,038-Speed 3014.16 samples/sec Loss 10.8981 LearningRate 0.0546 Epoch: 5 Global Step: 64930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:46,418-Speed 3030.26 samples/sec Loss 10.6026 LearningRate 0.0546 Epoch: 5 Global Step: 64940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
07:50:49,767-Speed 3058.65 samples/sec Loss 10.4884 LearningRate 0.0545 Epoch: 5 Global Step: 64950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:53,155-Speed 3023.88 samples/sec Loss 10.7805 LearningRate 0.0545 Epoch: 5 Global Step: 64960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:56,489-Speed 3072.42 samples/sec Loss 10.6522 LearningRate 0.0545 Epoch: 5 Global Step: 64970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:50:59,867-Speed 3032.23 samples/sec Loss 10.7310 LearningRate 0.0545 Epoch: 5 Global Step: 64980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:03,222-Speed 3053.72 samples/sec Loss 10.6713 LearningRate 0.0545 Epoch: 5 Global Step: 64990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:06,620-Speed 3013.72 samples/sec Loss 10.6581 LearningRate 0.0545 Epoch: 5 Global Step: 65000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:10,011-Speed 3021.22 samples/sec Loss 10.6048 LearningRate 0.0545 Epoch: 5 Global Step: 65010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:13,376-Speed 3043.75 samples/sec Loss 10.5077 LearningRate 0.0545 Epoch: 5 Global Step: 65020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:16,745-Speed 3040.64 samples/sec Loss 10.6764 LearningRate 0.0545 Epoch: 5 Global Step: 65030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:20,175-Speed 2986.55 samples/sec Loss 10.7924 LearningRate 0.0545 Epoch: 5 Global Step: 65040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:23,592-Speed 2997.49 samples/sec Loss 10.8096 LearningRate 0.0545 Epoch: 5 Global Step: 65050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:27,044-Speed 2967.37 samples/sec Loss 10.8178 LearningRate 0.0545 Epoch: 5 Global Step: 65060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:30,529-Speed 2939.23 samples/sec Loss 
10.6211 LearningRate 0.0545 Epoch: 5 Global Step: 65070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:33,894-Speed 3044.51 samples/sec Loss 10.7413 LearningRate 0.0545 Epoch: 5 Global Step: 65080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:51:37,248-Speed 3053.52 samples/sec Loss 10.7276 LearningRate 0.0545 Epoch: 5 Global Step: 65090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:51:40,598-Speed 3057.62 samples/sec Loss 10.7941 LearningRate 0.0545 Epoch: 5 Global Step: 65100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:51:44,056-Speed 2962.62 samples/sec Loss 10.6953 LearningRate 0.0545 Epoch: 5 Global Step: 65110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:51:47,421-Speed 3043.51 samples/sec Loss 10.7406 LearningRate 0.0544 Epoch: 5 Global Step: 65120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:51:50,791-Speed 3039.92 samples/sec Loss 10.6936 LearningRate 0.0544 Epoch: 5 Global Step: 65130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:51:54,107-Speed 3089.35 samples/sec Loss 10.7246 LearningRate 0.0544 Epoch: 5 Global Step: 65140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:51:57,492-Speed 3025.52 samples/sec Loss 10.6966 LearningRate 0.0544 Epoch: 5 Global Step: 65150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:00,824-Speed 3074.68 samples/sec Loss 10.6426 LearningRate 0.0544 Epoch: 5 Global Step: 65160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:04,163-Speed 3067.22 samples/sec Loss 10.6004 LearningRate 0.0544 Epoch: 5 Global Step: 65170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:07,475-Speed 3093.26 samples/sec Loss 10.8649 LearningRate 0.0544 Epoch: 5 Global Step: 65180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:10,762-Speed 3117.03 samples/sec Loss 10.7255 LearningRate 0.0544 
Epoch: 5 Global Step: 65190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:14,092-Speed 3075.12 samples/sec Loss 10.7315 LearningRate 0.0544 Epoch: 5 Global Step: 65200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:17,521-Speed 2987.17 samples/sec Loss 10.7873 LearningRate 0.0544 Epoch: 5 Global Step: 65210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:20,925-Speed 3009.40 samples/sec Loss 10.6691 LearningRate 0.0544 Epoch: 5 Global Step: 65220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:24,347-Speed 2993.74 samples/sec Loss 10.7433 LearningRate 0.0544 Epoch: 5 Global Step: 65230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:27,750-Speed 3009.59 samples/sec Loss 10.7275 LearningRate 0.0544 Epoch: 5 Global Step: 65240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:31,102-Speed 3056.74 samples/sec Loss 10.7189 LearningRate 0.0544 Epoch: 5 Global Step: 65250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:34,453-Speed 3056.26 samples/sec Loss 10.7449 LearningRate 0.0544 Epoch: 5 Global Step: 65260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:37,831-Speed 3032.81 samples/sec Loss 10.6309 LearningRate 0.0544 Epoch: 5 Global Step: 65270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:41,217-Speed 3024.51 samples/sec Loss 10.5987 LearningRate 0.0543 Epoch: 5 Global Step: 65280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:44,574-Speed 3051.35 samples/sec Loss 10.8643 LearningRate 0.0543 Epoch: 5 Global Step: 65290 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-27 07:52:47,888-Speed 3090.93 samples/sec Loss 10.8580 LearningRate 0.0543 Epoch: 5 Global Step: 65300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:51,204-Speed 3089.06 samples/sec Loss 10.6975 LearningRate 0.0543 Epoch: 5 Global Step: 65310 
Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:54,608-Speed 3009.28 samples/sec Loss 10.8619 LearningRate 0.0543 Epoch: 5 Global Step: 65320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:52:57,934-Speed 3079.47 samples/sec Loss 10.7533 LearningRate 0.0543 Epoch: 5 Global Step: 65330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:53:01,347-Speed 3001.60 samples/sec Loss 10.7662 LearningRate 0.0543 Epoch: 5 Global Step: 65340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:53:04,675-Speed 3077.89 samples/sec Loss 10.7052 LearningRate 0.0543 Epoch: 5 Global Step: 65350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:08,109-Speed 2983.35 samples/sec Loss 10.7921 LearningRate 0.0543 Epoch: 5 Global Step: 65360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:11,470-Speed 3047.72 samples/sec Loss 10.6530 LearningRate 0.0543 Epoch: 5 Global Step: 65370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:14,816-Speed 3060.79 samples/sec Loss 10.5740 LearningRate 0.0543 Epoch: 5 Global Step: 65380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:18,131-Speed 3089.88 samples/sec Loss 10.7836 LearningRate 0.0543 Epoch: 5 Global Step: 65390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:21,467-Speed 3070.32 samples/sec Loss 10.7657 LearningRate 0.0543 Epoch: 5 Global Step: 65400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:24,844-Speed 3033.71 samples/sec Loss 10.7137 LearningRate 0.0543 Epoch: 5 Global Step: 65410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:28,164-Speed 3085.02 samples/sec Loss 10.6421 LearningRate 0.0543 Epoch: 5 Global Step: 65420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:31,620-Speed 2963.74 samples/sec Loss 10.6673 LearningRate 0.0543 Epoch: 5 Global Step: 65430 Fp16 Grad Scale: 65536 Required: 17 
hours Training: 2022-04-27 07:53:34,987-Speed 3042.57 samples/sec Loss 10.6609 LearningRate 0.0543 Epoch: 5 Global Step: 65440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:38,359-Speed 3037.94 samples/sec Loss 10.6990 LearningRate 0.0542 Epoch: 5 Global Step: 65450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:53:41,791-Speed 2984.76 samples/sec Loss 10.7460 LearningRate 0.0542 Epoch: 5 Global Step: 65460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:53:45,252-Speed 2959.53 samples/sec Loss 10.7478 LearningRate 0.0542 Epoch: 5 Global Step: 65470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:53:48,598-Speed 3060.93 samples/sec Loss 10.7959 LearningRate 0.0542 Epoch: 5 Global Step: 65480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:53:51,897-Speed 3105.76 samples/sec Loss 10.7891 LearningRate 0.0542 Epoch: 5 Global Step: 65490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:55,238-Speed 3066.30 samples/sec Loss 10.7141 LearningRate 0.0542 Epoch: 5 Global Step: 65500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:53:58,689-Speed 2968.37 samples/sec Loss 10.7715 LearningRate 0.0542 Epoch: 5 Global Step: 65510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:02,017-Speed 3077.63 samples/sec Loss 10.7318 LearningRate 0.0542 Epoch: 5 Global Step: 65520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:05,402-Speed 3026.03 samples/sec Loss 10.5821 LearningRate 0.0542 Epoch: 5 Global Step: 65530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:08,770-Speed 3041.36 samples/sec Loss 10.6756 LearningRate 0.0542 Epoch: 5 Global Step: 65540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:12,188-Speed 2996.96 samples/sec Loss 10.7050 LearningRate 0.0542 Epoch: 5 Global Step: 65550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
07:54:15,531-Speed 3063.32 samples/sec Loss 10.7405 LearningRate 0.0542 Epoch: 5 Global Step: 65560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:18,870-Speed 3067.76 samples/sec Loss 10.7285 LearningRate 0.0542 Epoch: 5 Global Step: 65570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:22,184-Speed 3090.94 samples/sec Loss 10.6332 LearningRate 0.0542 Epoch: 5 Global Step: 65580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:25,571-Speed 3024.63 samples/sec Loss 10.8082 LearningRate 0.0542 Epoch: 5 Global Step: 65590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:54:28,958-Speed 3023.52 samples/sec Loss 10.8322 LearningRate 0.0542 Epoch: 5 Global Step: 65600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:54:32,374-Speed 2998.88 samples/sec Loss 10.7941 LearningRate 0.0542 Epoch: 5 Global Step: 65610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:54:35,731-Speed 3050.86 samples/sec Loss 10.6839 LearningRate 0.0541 Epoch: 5 Global Step: 65620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:39,101-Speed 3039.82 samples/sec Loss 10.6138 LearningRate 0.0541 Epoch: 5 Global Step: 65630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:42,461-Speed 3048.32 samples/sec Loss 10.7012 LearningRate 0.0541 Epoch: 5 Global Step: 65640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:45,804-Speed 3063.72 samples/sec Loss 10.6601 LearningRate 0.0541 Epoch: 5 Global Step: 65650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:49,133-Speed 3077.40 samples/sec Loss 10.7972 LearningRate 0.0541 Epoch: 5 Global Step: 65660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:52,501-Speed 3040.95 samples/sec Loss 10.7352 LearningRate 0.0541 Epoch: 5 Global Step: 65670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:55,840-Speed 3067.86 samples/sec 
Loss 10.7868 LearningRate 0.0541 Epoch: 5 Global Step: 65680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:54:59,186-Speed 3061.23 samples/sec Loss 10.7478 LearningRate 0.0541 Epoch: 5 Global Step: 65690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:02,537-Speed 3056.86 samples/sec Loss 10.7550 LearningRate 0.0541 Epoch: 5 Global Step: 65700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:05,912-Speed 3034.94 samples/sec Loss 10.6302 LearningRate 0.0541 Epoch: 5 Global Step: 65710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:09,300-Speed 3023.24 samples/sec Loss 10.6196 LearningRate 0.0541 Epoch: 5 Global Step: 65720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:55:12,638-Speed 3068.50 samples/sec Loss 10.6415 LearningRate 0.0541 Epoch: 5 Global Step: 65730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:15,988-Speed 3057.50 samples/sec Loss 10.6640 LearningRate 0.0541 Epoch: 5 Global Step: 65740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:19,474-Speed 2938.70 samples/sec Loss 10.5794 LearningRate 0.0541 Epoch: 5 Global Step: 65750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:22,823-Speed 3058.70 samples/sec Loss 10.7871 LearningRate 0.0541 Epoch: 5 Global Step: 65760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:26,207-Speed 3026.00 samples/sec Loss 10.6960 LearningRate 0.0541 Epoch: 5 Global Step: 65770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:29,641-Speed 2983.39 samples/sec Loss 10.8620 LearningRate 0.0541 Epoch: 5 Global Step: 65780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:33,135-Speed 2932.01 samples/sec Loss 10.5894 LearningRate 0.0540 Epoch: 5 Global Step: 65790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:36,491-Speed 3052.04 samples/sec Loss 10.8274 LearningRate 0.0540 Epoch: 5 
Global Step: 65800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:39,939-Speed 2970.79 samples/sec Loss 10.7344 LearningRate 0.0540 Epoch: 5 Global Step: 65810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:43,369-Speed 2986.25 samples/sec Loss 10.8136 LearningRate 0.0540 Epoch: 5 Global Step: 65820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:55:46,757-Speed 3023.04 samples/sec Loss 10.6432 LearningRate 0.0540 Epoch: 5 Global Step: 65830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:55:50,272-Speed 2914.44 samples/sec Loss 10.7327 LearningRate 0.0540 Epoch: 5 Global Step: 65840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:55:53,633-Speed 3047.29 samples/sec Loss 10.6697 LearningRate 0.0540 Epoch: 5 Global Step: 65850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:55:57,068-Speed 2981.25 samples/sec Loss 10.7392 LearningRate 0.0540 Epoch: 5 Global Step: 65860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:00,560-Speed 2934.08 samples/sec Loss 10.7960 LearningRate 0.0540 Epoch: 5 Global Step: 65870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:03,869-Speed 3095.68 samples/sec Loss 10.6664 LearningRate 0.0540 Epoch: 5 Global Step: 65880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:07,281-Speed 3001.27 samples/sec Loss 10.7440 LearningRate 0.0540 Epoch: 5 Global Step: 65890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:10,729-Speed 2971.43 samples/sec Loss 10.7427 LearningRate 0.0540 Epoch: 5 Global Step: 65900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:14,248-Speed 2910.49 samples/sec Loss 10.5801 LearningRate 0.0540 Epoch: 5 Global Step: 65910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:17,607-Speed 3049.74 samples/sec Loss 10.7318 LearningRate 0.0540 Epoch: 5 Global Step: 65920 Fp16 Grad 
Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:21,029-Speed 2993.06 samples/sec Loss 10.8733 LearningRate 0.0540 Epoch: 5 Global Step: 65930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:24,371-Speed 3065.09 samples/sec Loss 10.6831 LearningRate 0.0540 Epoch: 5 Global Step: 65940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:27,750-Speed 3031.08 samples/sec Loss 10.7688 LearningRate 0.0540 Epoch: 5 Global Step: 65950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:56:31,096-Speed 3061.18 samples/sec Loss 10.4862 LearningRate 0.0539 Epoch: 5 Global Step: 65960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:34,555-Speed 2961.61 samples/sec Loss 10.7343 LearningRate 0.0539 Epoch: 5 Global Step: 65970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:37,929-Speed 3035.52 samples/sec Loss 10.6260 LearningRate 0.0539 Epoch: 5 Global Step: 65980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:41,411-Speed 2941.66 samples/sec Loss 10.7885 LearningRate 0.0539 Epoch: 5 Global Step: 65990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:44,860-Speed 2970.52 samples/sec Loss 10.8577 LearningRate 0.0539 Epoch: 5 Global Step: 66000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:48,342-Speed 2941.29 samples/sec Loss 10.7123 LearningRate 0.0539 Epoch: 5 Global Step: 66010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:51,720-Speed 3033.05 samples/sec Loss 10.6951 LearningRate 0.0539 Epoch: 5 Global Step: 66020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:55,189-Speed 2952.56 samples/sec Loss 10.7259 LearningRate 0.0539 Epoch: 5 Global Step: 66030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:56:58,547-Speed 3050.12 samples/sec Loss 10.6306 LearningRate 0.0539 Epoch: 5 Global Step: 66040 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-04-27 07:57:01,978-Speed 2986.23 samples/sec Loss 10.7356 LearningRate 0.0539 Epoch: 5 Global Step: 66050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:57:05,445-Speed 2954.37 samples/sec Loss 10.7567 LearningRate 0.0539 Epoch: 5 Global Step: 66060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:08,822-Speed 3033.68 samples/sec Loss 10.7666 LearningRate 0.0539 Epoch: 5 Global Step: 66070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:12,238-Speed 2998.20 samples/sec Loss 10.7375 LearningRate 0.0539 Epoch: 5 Global Step: 66080 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:15,683-Speed 2973.11 samples/sec Loss 10.5962 LearningRate 0.0539 Epoch: 5 Global Step: 66090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:19,068-Speed 3026.33 samples/sec Loss 10.7339 LearningRate 0.0539 Epoch: 5 Global Step: 66100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:22,427-Speed 3050.15 samples/sec Loss 10.8696 LearningRate 0.0539 Epoch: 5 Global Step: 66110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:25,832-Speed 3007.66 samples/sec Loss 10.7003 LearningRate 0.0539 Epoch: 5 Global Step: 66120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:29,310-Speed 2944.62 samples/sec Loss 10.7084 LearningRate 0.0538 Epoch: 5 Global Step: 66130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:32,664-Speed 3054.52 samples/sec Loss 10.7493 LearningRate 0.0538 Epoch: 5 Global Step: 66140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:36,053-Speed 3022.12 samples/sec Loss 10.7715 LearningRate 0.0538 Epoch: 5 Global Step: 66150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:39,392-Speed 3067.87 samples/sec Loss 10.5633 LearningRate 0.0538 Epoch: 5 Global Step: 66160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 
07:57:42,723-Speed 3075.54 samples/sec Loss 10.6867 LearningRate 0.0538 Epoch: 5 Global Step: 66170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:46,044-Speed 3084.27 samples/sec Loss 10.7830 LearningRate 0.0538 Epoch: 5 Global Step: 66180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:49,460-Speed 2998.32 samples/sec Loss 10.7308 LearningRate 0.0538 Epoch: 5 Global Step: 66190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:52,853-Speed 3018.59 samples/sec Loss 10.7854 LearningRate 0.0538 Epoch: 5 Global Step: 66200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:56,262-Speed 3005.25 samples/sec Loss 10.7370 LearningRate 0.0538 Epoch: 5 Global Step: 66210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:57:59,690-Speed 2987.66 samples/sec Loss 10.8925 LearningRate 0.0538 Epoch: 5 Global Step: 66220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:03,088-Speed 3014.42 samples/sec Loss 10.8577 LearningRate 0.0538 Epoch: 5 Global Step: 66230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:06,556-Speed 2953.69 samples/sec Loss 10.7449 LearningRate 0.0538 Epoch: 5 Global Step: 66240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:09,945-Speed 3022.27 samples/sec Loss 10.6715 LearningRate 0.0538 Epoch: 5 Global Step: 66250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:13,346-Speed 3011.26 samples/sec Loss 10.8933 LearningRate 0.0538 Epoch: 5 Global Step: 66260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:16,701-Speed 3053.29 samples/sec Loss 10.8231 LearningRate 0.0538 Epoch: 5 Global Step: 66270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:20,056-Speed 3053.35 samples/sec Loss 10.7559 LearningRate 0.0538 Epoch: 5 Global Step: 66280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:23,436-Speed 3029.94 
samples/sec Loss 10.6013 LearningRate 0.0538 Epoch: 5 Global Step: 66290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:26,782-Speed 3061.05 samples/sec Loss 10.7963 LearningRate 0.0537 Epoch: 5 Global Step: 66300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:30,100-Speed 3088.52 samples/sec Loss 10.6059 LearningRate 0.0537 Epoch: 5 Global Step: 66310 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:33,453-Speed 3054.91 samples/sec Loss 10.6700 LearningRate 0.0537 Epoch: 5 Global Step: 66320 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:58:36,808-Speed 3053.44 samples/sec Loss 10.6728 LearningRate 0.0537 Epoch: 5 Global Step: 66330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:40,204-Speed 3015.80 samples/sec Loss 10.7267 LearningRate 0.0537 Epoch: 5 Global Step: 66340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:43,625-Speed 2994.71 samples/sec Loss 10.8694 LearningRate 0.0537 Epoch: 5 Global Step: 66350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:46,988-Speed 3045.25 samples/sec Loss 10.7261 LearningRate 0.0537 Epoch: 5 Global Step: 66360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:50,363-Speed 3035.34 samples/sec Loss 10.6919 LearningRate 0.0537 Epoch: 5 Global Step: 66370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:53,710-Speed 3059.70 samples/sec Loss 10.7683 LearningRate 0.0537 Epoch: 5 Global Step: 66380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:58:57,031-Speed 3084.42 samples/sec Loss 10.7358 LearningRate 0.0537 Epoch: 5 Global Step: 66390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:00,461-Speed 2986.08 samples/sec Loss 10.7172 LearningRate 0.0537 Epoch: 5 Global Step: 66400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:03,811-Speed 3058.12 samples/sec Loss 10.6795 
LearningRate 0.0537 Epoch: 5 Global Step: 66410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:07,162-Speed 3056.38 samples/sec Loss 10.8024 LearningRate 0.0537 Epoch: 5 Global Step: 66420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:10,541-Speed 3032.11 samples/sec Loss 10.6860 LearningRate 0.0537 Epoch: 5 Global Step: 66430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:13,871-Speed 3075.86 samples/sec Loss 10.8269 LearningRate 0.0537 Epoch: 5 Global Step: 66440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:17,220-Speed 3059.32 samples/sec Loss 10.7616 LearningRate 0.0537 Epoch: 5 Global Step: 66450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:20,650-Speed 2985.20 samples/sec Loss 10.6471 LearningRate 0.0537 Epoch: 5 Global Step: 66460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:24,052-Speed 3011.01 samples/sec Loss 10.7135 LearningRate 0.0536 Epoch: 5 Global Step: 66470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:27,414-Speed 3046.86 samples/sec Loss 10.7473 LearningRate 0.0536 Epoch: 5 Global Step: 66480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:30,812-Speed 3014.43 samples/sec Loss 10.7390 LearningRate 0.0536 Epoch: 5 Global Step: 66490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:34,224-Speed 3002.37 samples/sec Loss 10.7288 LearningRate 0.0536 Epoch: 5 Global Step: 66500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 07:59:37,552-Speed 3077.20 samples/sec Loss 10.6731 LearningRate 0.0536 Epoch: 5 Global Step: 66510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:59:40,976-Speed 2992.64 samples/sec Loss 10.7567 LearningRate 0.0536 Epoch: 5 Global Step: 66520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:59:44,419-Speed 2974.84 samples/sec Loss 10.6476 LearningRate 0.0536 Epoch: 5 
Global Step: 66530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:59:47,929-Speed 2918.15 samples/sec Loss 10.8218 LearningRate 0.0536 Epoch: 5 Global Step: 66540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:59:51,331-Speed 3010.41 samples/sec Loss 10.7018 LearningRate 0.0536 Epoch: 5 Global Step: 66550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:59:54,751-Speed 2994.93 samples/sec Loss 10.5815 LearningRate 0.0536 Epoch: 5 Global Step: 66560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 07:59:58,147-Speed 3016.57 samples/sec Loss 10.7703 LearningRate 0.0536 Epoch: 5 Global Step: 66570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:01,593-Speed 2972.66 samples/sec Loss 10.6692 LearningRate 0.0536 Epoch: 5 Global Step: 66580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:04,944-Speed 3056.81 samples/sec Loss 10.8008 LearningRate 0.0536 Epoch: 5 Global Step: 66590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:08,299-Speed 3053.00 samples/sec Loss 10.6768 LearningRate 0.0536 Epoch: 5 Global Step: 66600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:11,735-Speed 2981.02 samples/sec Loss 10.6653 LearningRate 0.0536 Epoch: 5 Global Step: 66610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:00:15,148-Speed 3001.04 samples/sec Loss 10.5833 LearningRate 0.0536 Epoch: 5 Global Step: 66620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:00:18,545-Speed 3016.27 samples/sec Loss 10.6965 LearningRate 0.0536 Epoch: 5 Global Step: 66630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:00:21,897-Speed 3055.32 samples/sec Loss 10.5915 LearningRate 0.0535 Epoch: 5 Global Step: 66640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:00:25,253-Speed 3052.46 samples/sec Loss 10.7545 LearningRate 0.0535 Epoch: 5 Global Step: 66650 Fp16 Grad Scale: 
131072 Required: 17 hours Training: 2022-04-27 08:00:28,646-Speed 3018.52 samples/sec Loss 10.8225 LearningRate 0.0535 Epoch: 5 Global Step: 66660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:00:31,992-Speed 3061.11 samples/sec Loss 10.7613 LearningRate 0.0535 Epoch: 5 Global Step: 66670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:35,375-Speed 3028.08 samples/sec Loss 10.7635 LearningRate 0.0535 Epoch: 5 Global Step: 66680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:38,759-Speed 3027.05 samples/sec Loss 10.8674 LearningRate 0.0535 Epoch: 5 Global Step: 66690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:42,143-Speed 3026.92 samples/sec Loss 10.8175 LearningRate 0.0535 Epoch: 5 Global Step: 66700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:45,500-Speed 3050.74 samples/sec Loss 10.7858 LearningRate 0.0535 Epoch: 5 Global Step: 66710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:48,854-Speed 3054.54 samples/sec Loss 10.7428 LearningRate 0.0535 Epoch: 5 Global Step: 66720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:52,337-Speed 2940.29 samples/sec Loss 10.9403 LearningRate 0.0535 Epoch: 5 Global Step: 66730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:55,724-Speed 3024.81 samples/sec Loss 10.6059 LearningRate 0.0535 Epoch: 5 Global Step: 66740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:00:59,063-Speed 3067.24 samples/sec Loss 10.7714 LearningRate 0.0535 Epoch: 5 Global Step: 66750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:02,437-Speed 3035.95 samples/sec Loss 10.7440 LearningRate 0.0535 Epoch: 5 Global Step: 66760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:05,852-Speed 2999.15 samples/sec Loss 10.6723 LearningRate 0.0535 Epoch: 5 Global Step: 66770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-27 08:01:09,204-Speed 3056.14 samples/sec Loss 10.7563 LearningRate 0.0535 Epoch: 5 Global Step: 66780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:12,557-Speed 3054.66 samples/sec Loss 10.7768 LearningRate 0.0535 Epoch: 5 Global Step: 66790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:15,896-Speed 3068.17 samples/sec Loss 10.7478 LearningRate 0.0535 Epoch: 5 Global Step: 66800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:19,269-Speed 3036.59 samples/sec Loss 10.7080 LearningRate 0.0534 Epoch: 5 Global Step: 66810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:22,614-Speed 3062.37 samples/sec Loss 10.6169 LearningRate 0.0534 Epoch: 5 Global Step: 66820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:25,980-Speed 3043.60 samples/sec Loss 10.6756 LearningRate 0.0534 Epoch: 5 Global Step: 66830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:29,351-Speed 3038.51 samples/sec Loss 10.6408 LearningRate 0.0534 Epoch: 5 Global Step: 66840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:01:32,707-Speed 3052.12 samples/sec Loss 10.7185 LearningRate 0.0534 Epoch: 5 Global Step: 66850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:36,147-Speed 2977.36 samples/sec Loss 10.6914 LearningRate 0.0534 Epoch: 5 Global Step: 66860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:39,619-Speed 2949.97 samples/sec Loss 10.7722 LearningRate 0.0534 Epoch: 5 Global Step: 66870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:43,016-Speed 3015.68 samples/sec Loss 10.7421 LearningRate 0.0534 Epoch: 5 Global Step: 66880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:46,413-Speed 3015.18 samples/sec Loss 10.7500 LearningRate 0.0534 Epoch: 5 Global Step: 66890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:49,784-Speed 
3039.39 samples/sec Loss 10.8280 LearningRate 0.0534 Epoch: 5 Global Step: 66900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:53,094-Speed 3094.27 samples/sec Loss 10.7813 LearningRate 0.0534 Epoch: 5 Global Step: 66910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:56,458-Speed 3044.58 samples/sec Loss 10.6742 LearningRate 0.0534 Epoch: 5 Global Step: 66920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:01:59,844-Speed 3024.93 samples/sec Loss 10.6434 LearningRate 0.0534 Epoch: 5 Global Step: 66930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:03,287-Speed 2975.00 samples/sec Loss 10.6642 LearningRate 0.0534 Epoch: 5 Global Step: 66940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:06,704-Speed 2998.04 samples/sec Loss 10.8657 LearningRate 0.0534 Epoch: 5 Global Step: 66950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:10,133-Speed 2986.58 samples/sec Loss 10.7198 LearningRate 0.0534 Epoch: 5 Global Step: 66960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:13,483-Speed 3057.70 samples/sec Loss 10.6795 LearningRate 0.0534 Epoch: 5 Global Step: 66970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:16,863-Speed 3030.53 samples/sec Loss 10.6332 LearningRate 0.0533 Epoch: 5 Global Step: 66980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:20,246-Speed 3028.12 samples/sec Loss 10.6551 LearningRate 0.0533 Epoch: 5 Global Step: 66990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:23,658-Speed 3001.21 samples/sec Loss 10.7208 LearningRate 0.0533 Epoch: 5 Global Step: 67000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:27,127-Speed 2952.77 samples/sec Loss 10.6986 LearningRate 0.0533 Epoch: 5 Global Step: 67010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:30,444-Speed 3088.27 samples/sec Loss 10.7268 
LearningRate 0.0533 Epoch: 5 Global Step: 67020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:33,796-Speed 3056.76 samples/sec Loss 10.7955 LearningRate 0.0533 Epoch: 5 Global Step: 67030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:37,179-Speed 3026.90 samples/sec Loss 10.5147 LearningRate 0.0533 Epoch: 5 Global Step: 67040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:02:40,567-Speed 3023.42 samples/sec Loss 10.6530 LearningRate 0.0533 Epoch: 5 Global Step: 67050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:43,919-Speed 3056.04 samples/sec Loss 10.6570 LearningRate 0.0533 Epoch: 5 Global Step: 67060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:47,332-Speed 3001.51 samples/sec Loss 10.7217 LearningRate 0.0533 Epoch: 5 Global Step: 67070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:50,668-Speed 3070.26 samples/sec Loss 10.8263 LearningRate 0.0533 Epoch: 5 Global Step: 67080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:54,047-Speed 3031.46 samples/sec Loss 10.8062 LearningRate 0.0533 Epoch: 5 Global Step: 67090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:02:57,414-Speed 3041.60 samples/sec Loss 10.8308 LearningRate 0.0533 Epoch: 5 Global Step: 67100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:03:00,912-Speed 2928.51 samples/sec Loss 10.8300 LearningRate 0.0533 Epoch: 5 Global Step: 67110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:03:04,267-Speed 3053.01 samples/sec Loss 10.7318 LearningRate 0.0533 Epoch: 5 Global Step: 67120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:03:07,617-Speed 3057.85 samples/sec Loss 10.6815 LearningRate 0.0533 Epoch: 5 Global Step: 67130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:03:10,996-Speed 3031.91 samples/sec Loss 10.8334 LearningRate 0.0533 Epoch: 5 Global 
Step: 67140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:03:14,440-Speed 2974.06 samples/sec Loss 10.6084 LearningRate 0.0532 Epoch: 5 Global Step: 67150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:17,832-Speed 3020.06 samples/sec Loss 10.7531 LearningRate 0.0532 Epoch: 5 Global Step: 67160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:21,196-Speed 3044.87 samples/sec Loss 10.7698 LearningRate 0.0532 Epoch: 5 Global Step: 67170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:24,544-Speed 3059.17 samples/sec Loss 10.6388 LearningRate 0.0532 Epoch: 5 Global Step: 67180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:27,930-Speed 3025.05 samples/sec Loss 10.6577 LearningRate 0.0532 Epoch: 5 Global Step: 67190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:31,264-Speed 3072.59 samples/sec Loss 10.8266 LearningRate 0.0532 Epoch: 5 Global Step: 67200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:34,687-Speed 2992.21 samples/sec Loss 10.7953 LearningRate 0.0532 Epoch: 5 Global Step: 67210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:38,041-Speed 3053.89 samples/sec Loss 10.7433 LearningRate 0.0532 Epoch: 5 Global Step: 67220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:41,430-Speed 3022.58 samples/sec Loss 10.5769 LearningRate 0.0532 Epoch: 5 Global Step: 67230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:44,824-Speed 3018.35 samples/sec Loss 10.6782 LearningRate 0.0532 Epoch: 5 Global Step: 67240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:48,150-Speed 3079.91 samples/sec Loss 10.6509 LearningRate 0.0532 Epoch: 5 Global Step: 67250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:51,476-Speed 3079.45 samples/sec Loss 10.6387 LearningRate 0.0532 Epoch: 5 Global Step: 67260 Fp16 Grad Scale: 
131072 Required: 17 hours Training: 2022-04-27 08:03:54,792-Speed 3089.19 samples/sec Loss 10.7303 LearningRate 0.0532 Epoch: 5 Global Step: 67270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:03:58,152-Speed 3047.96 samples/sec Loss 10.7297 LearningRate 0.0532 Epoch: 5 Global Step: 67280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:01,510-Speed 3050.36 samples/sec Loss 10.6650 LearningRate 0.0532 Epoch: 5 Global Step: 67290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:04,894-Speed 3026.59 samples/sec Loss 10.6303 LearningRate 0.0532 Epoch: 5 Global Step: 67300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:08,280-Speed 3025.75 samples/sec Loss 10.7251 LearningRate 0.0532 Epoch: 5 Global Step: 67310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:11,708-Speed 2987.92 samples/sec Loss 10.6078 LearningRate 0.0531 Epoch: 5 Global Step: 67320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:15,151-Speed 2974.66 samples/sec Loss 10.7807 LearningRate 0.0531 Epoch: 5 Global Step: 67330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:18,521-Speed 3039.48 samples/sec Loss 10.6142 LearningRate 0.0531 Epoch: 5 Global Step: 67340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:21,946-Speed 2990.26 samples/sec Loss 10.5989 LearningRate 0.0531 Epoch: 5 Global Step: 67350 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-27 08:04:25,420-Speed 2949.89 samples/sec Loss 10.7226 LearningRate 0.0531 Epoch: 5 Global Step: 67360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:28,788-Speed 3040.56 samples/sec Loss 10.5936 LearningRate 0.0531 Epoch: 5 Global Step: 67370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:32,189-Speed 3011.54 samples/sec Loss 10.6476 LearningRate 0.0531 Epoch: 5 Global Step: 67380 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-27 08:04:35,555-Speed 3042.96 samples/sec Loss 10.7759 LearningRate 0.0531 Epoch: 5 Global Step: 67390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:38,949-Speed 3019.05 samples/sec Loss 10.7656 LearningRate 0.0531 Epoch: 5 Global Step: 67400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:42,311-Speed 3046.20 samples/sec Loss 10.6993 LearningRate 0.0531 Epoch: 5 Global Step: 67410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:45,686-Speed 3035.01 samples/sec Loss 10.4840 LearningRate 0.0531 Epoch: 5 Global Step: 67420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:49,051-Speed 3044.27 samples/sec Loss 10.7926 LearningRate 0.0531 Epoch: 5 Global Step: 67430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:52,405-Speed 3053.82 samples/sec Loss 10.7368 LearningRate 0.0531 Epoch: 5 Global Step: 67440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:55,852-Speed 2971.65 samples/sec Loss 10.5432 LearningRate 0.0531 Epoch: 5 Global Step: 67450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:04:59,298-Speed 2972.69 samples/sec Loss 10.5743 LearningRate 0.0531 Epoch: 5 Global Step: 67460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:05:02,745-Speed 2971.47 samples/sec Loss 10.7866 LearningRate 0.0531 Epoch: 5 Global Step: 67470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:05:06,124-Speed 3031.19 samples/sec Loss 10.8294 LearningRate 0.0531 Epoch: 5 Global Step: 67480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:09,541-Speed 2997.68 samples/sec Loss 10.7359 LearningRate 0.0530 Epoch: 5 Global Step: 67490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:12,917-Speed 3034.05 samples/sec Loss 10.9050 LearningRate 0.0530 Epoch: 5 Global Step: 67500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
08:05:16,278-Speed 3048.04 samples/sec Loss 10.7911 LearningRate 0.0530 Epoch: 5 Global Step: 67510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:19,739-Speed 2959.52 samples/sec Loss 10.6457 LearningRate 0.0530 Epoch: 5 Global Step: 67520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:23,149-Speed 3003.17 samples/sec Loss 10.7635 LearningRate 0.0530 Epoch: 5 Global Step: 67530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:26,535-Speed 3025.77 samples/sec Loss 10.5966 LearningRate 0.0530 Epoch: 5 Global Step: 67540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:30,011-Speed 2946.61 samples/sec Loss 10.6967 LearningRate 0.0530 Epoch: 5 Global Step: 67550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:33,381-Speed 3039.57 samples/sec Loss 10.7860 LearningRate 0.0530 Epoch: 5 Global Step: 67560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:36,814-Speed 2983.47 samples/sec Loss 10.7720 LearningRate 0.0530 Epoch: 5 Global Step: 67570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:40,187-Speed 3036.72 samples/sec Loss 10.7617 LearningRate 0.0530 Epoch: 5 Global Step: 67580 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:05:43,514-Speed 3078.80 samples/sec Loss 10.7091 LearningRate 0.0530 Epoch: 5 Global Step: 67590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:05:46,892-Speed 3031.64 samples/sec Loss 10.7164 LearningRate 0.0530 Epoch: 5 Global Step: 67600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:05:50,257-Speed 3044.48 samples/sec Loss 10.7892 LearningRate 0.0530 Epoch: 5 Global Step: 67610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:05:53,672-Speed 2999.55 samples/sec Loss 10.7900 LearningRate 0.0530 Epoch: 5 Global Step: 67620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:05:57,084-Speed 3002.07 samples/sec 
Loss 10.7176 LearningRate 0.0530 Epoch: 5 Global Step: 67630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:00,427-Speed 3063.66 samples/sec Loss 10.5414 LearningRate 0.0530 Epoch: 5 Global Step: 67640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:03,878-Speed 2968.41 samples/sec Loss 10.6004 LearningRate 0.0530 Epoch: 5 Global Step: 67650 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:07,321-Speed 2974.75 samples/sec Loss 10.8354 LearningRate 0.0529 Epoch: 5 Global Step: 67660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:10,655-Speed 3072.83 samples/sec Loss 10.8447 LearningRate 0.0529 Epoch: 5 Global Step: 67670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:14,065-Speed 3003.25 samples/sec Loss 10.6623 LearningRate 0.0529 Epoch: 5 Global Step: 67680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:17,402-Speed 3069.56 samples/sec Loss 10.7494 LearningRate 0.0529 Epoch: 5 Global Step: 67690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:20,853-Speed 2968.60 samples/sec Loss 10.7336 LearningRate 0.0529 Epoch: 5 Global Step: 67700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:24,206-Speed 3055.00 samples/sec Loss 10.6503 LearningRate 0.0529 Epoch: 5 Global Step: 67710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:06:27,621-Speed 2999.39 samples/sec Loss 10.8211 LearningRate 0.0529 Epoch: 5 Global Step: 67720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:30,963-Speed 3064.83 samples/sec Loss 10.6921 LearningRate 0.0529 Epoch: 5 Global Step: 67730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:34,372-Speed 3004.73 samples/sec Loss 10.7460 LearningRate 0.0529 Epoch: 5 Global Step: 67740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:37,809-Speed 2979.60 samples/sec Loss 10.6238 LearningRate 0.0529 Epoch: 
5 Global Step: 67750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:41,235-Speed 2989.55 samples/sec Loss 10.8084 LearningRate 0.0529 Epoch: 5 Global Step: 67760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:44,692-Speed 2963.79 samples/sec Loss 10.7771 LearningRate 0.0529 Epoch: 5 Global Step: 67770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:48,149-Speed 2962.94 samples/sec Loss 10.6448 LearningRate 0.0529 Epoch: 5 Global Step: 67780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:51,638-Speed 2936.07 samples/sec Loss 10.7041 LearningRate 0.0529 Epoch: 5 Global Step: 67790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:55,086-Speed 2970.00 samples/sec Loss 10.6903 LearningRate 0.0529 Epoch: 5 Global Step: 67800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:06:58,562-Speed 2946.72 samples/sec Loss 10.7751 LearningRate 0.0529 Epoch: 5 Global Step: 67810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:01,909-Speed 3061.21 samples/sec Loss 10.8012 LearningRate 0.0529 Epoch: 5 Global Step: 67820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:05,295-Speed 3024.75 samples/sec Loss 10.4585 LearningRate 0.0528 Epoch: 5 Global Step: 67830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:08,717-Speed 2993.15 samples/sec Loss 10.6626 LearningRate 0.0528 Epoch: 5 Global Step: 67840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:12,148-Speed 2985.62 samples/sec Loss 10.6975 LearningRate 0.0528 Epoch: 5 Global Step: 67850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:15,552-Speed 3009.46 samples/sec Loss 10.7986 LearningRate 0.0528 Epoch: 5 Global Step: 67860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:18,954-Speed 3010.57 samples/sec Loss 10.6678 LearningRate 0.0528 Epoch: 5 Global Step: 67870 Fp16 Grad 
Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:22,340-Speed 3025.66 samples/sec Loss 10.6437 LearningRate 0.0528 Epoch: 5 Global Step: 67880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:25,678-Speed 3068.52 samples/sec Loss 10.6608 LearningRate 0.0528 Epoch: 5 Global Step: 67890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:29,093-Speed 2998.90 samples/sec Loss 10.7001 LearningRate 0.0528 Epoch: 5 Global Step: 67900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:32,530-Speed 2980.57 samples/sec Loss 10.6478 LearningRate 0.0528 Epoch: 5 Global Step: 67910 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:35,894-Speed 3044.96 samples/sec Loss 10.7160 LearningRate 0.0528 Epoch: 5 Global Step: 67920 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:39,318-Speed 2991.72 samples/sec Loss 10.8014 LearningRate 0.0528 Epoch: 5 Global Step: 67930 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:42,718-Speed 3012.56 samples/sec Loss 10.7614 LearningRate 0.0528 Epoch: 5 Global Step: 67940 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:46,171-Speed 2966.43 samples/sec Loss 10.7054 LearningRate 0.0528 Epoch: 5 Global Step: 67950 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:49,583-Speed 3001.46 samples/sec Loss 10.6141 LearningRate 0.0528 Epoch: 5 Global Step: 67960 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:52,950-Speed 3042.25 samples/sec Loss 10.6882 LearningRate 0.0528 Epoch: 5 Global Step: 67970 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:56,313-Speed 3045.67 samples/sec Loss 10.7782 LearningRate 0.0528 Epoch: 5 Global Step: 67980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:07:59,735-Speed 2993.80 samples/sec Loss 10.5843 LearningRate 0.0528 Epoch: 5 Global Step: 67990 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-04-27 08:08:03,102-Speed 3041.98 samples/sec Loss 10.7033 LearningRate 0.0527 Epoch: 5 Global Step: 68000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:06,508-Speed 3006.84 samples/sec Loss 10.6143 LearningRate 0.0527 Epoch: 5 Global Step: 68010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:09,870-Speed 3047.41 samples/sec Loss 10.5642 LearningRate 0.0527 Epoch: 5 Global Step: 68020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:13,228-Speed 3050.08 samples/sec Loss 10.7817 LearningRate 0.0527 Epoch: 5 Global Step: 68030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:16,635-Speed 3006.04 samples/sec Loss 10.6587 LearningRate 0.0527 Epoch: 5 Global Step: 68040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:20,000-Speed 3044.04 samples/sec Loss 10.7572 LearningRate 0.0527 Epoch: 5 Global Step: 68050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:23,427-Speed 2989.38 samples/sec Loss 10.6839 LearningRate 0.0527 Epoch: 5 Global Step: 68060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:26,839-Speed 3002.23 samples/sec Loss 10.8045 LearningRate 0.0527 Epoch: 5 Global Step: 68070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:30,238-Speed 3013.46 samples/sec Loss 10.7231 LearningRate 0.0527 Epoch: 5 Global Step: 68080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:08:33,657-Speed 2995.92 samples/sec Loss 10.5848 LearningRate 0.0527 Epoch: 5 Global Step: 68090 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:37,076-Speed 2995.58 samples/sec Loss 10.6903 LearningRate 0.0527 Epoch: 5 Global Step: 68100 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:40,431-Speed 3052.91 samples/sec Loss 10.7271 LearningRate 0.0527 Epoch: 5 Global Step: 68110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:43,861-Speed 
2986.49 samples/sec Loss 10.6996 LearningRate 0.0527 Epoch: 5 Global Step: 68120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:47,255-Speed 3018.26 samples/sec Loss 10.6010 LearningRate 0.0527 Epoch: 5 Global Step: 68130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:50,639-Speed 3026.50 samples/sec Loss 10.5644 LearningRate 0.0527 Epoch: 5 Global Step: 68140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:54,036-Speed 3015.50 samples/sec Loss 10.6702 LearningRate 0.0527 Epoch: 5 Global Step: 68150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:08:57,464-Speed 2988.40 samples/sec Loss 10.5686 LearningRate 0.0527 Epoch: 5 Global Step: 68160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:00,833-Speed 3040.16 samples/sec Loss 10.6991 LearningRate 0.0526 Epoch: 5 Global Step: 68170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:04,211-Speed 3031.48 samples/sec Loss 10.6700 LearningRate 0.0526 Epoch: 5 Global Step: 68180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:07,634-Speed 2993.03 samples/sec Loss 10.6844 LearningRate 0.0526 Epoch: 5 Global Step: 68190 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:11,123-Speed 2935.75 samples/sec Loss 10.6550 LearningRate 0.0526 Epoch: 5 Global Step: 68200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:14,468-Speed 3061.98 samples/sec Loss 10.7680 LearningRate 0.0526 Epoch: 5 Global Step: 68210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:17,845-Speed 3034.00 samples/sec Loss 10.6750 LearningRate 0.0526 Epoch: 5 Global Step: 68220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:21,289-Speed 2974.30 samples/sec Loss 10.6280 LearningRate 0.0526 Epoch: 5 Global Step: 68230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:24,745-Speed 2963.96 samples/sec Loss 10.6984 
LearningRate 0.0526 Epoch: 5 Global Step: 68240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:28,149-Speed 3009.09 samples/sec Loss 10.6458 LearningRate 0.0526 Epoch: 5 Global Step: 68250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:31,629-Speed 2943.72 samples/sec Loss 10.7140 LearningRate 0.0526 Epoch: 5 Global Step: 68260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:35,088-Speed 2961.18 samples/sec Loss 10.6177 LearningRate 0.0526 Epoch: 5 Global Step: 68270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:38,415-Speed 3078.67 samples/sec Loss 10.4955 LearningRate 0.0526 Epoch: 5 Global Step: 68280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:09:41,803-Speed 3023.77 samples/sec Loss 10.5385 LearningRate 0.0526 Epoch: 5 Global Step: 68290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:45,168-Speed 3043.76 samples/sec Loss 10.6832 LearningRate 0.0526 Epoch: 5 Global Step: 68300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:48,586-Speed 2996.83 samples/sec Loss 10.5427 LearningRate 0.0526 Epoch: 5 Global Step: 68310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:51,991-Speed 3007.86 samples/sec Loss 10.7618 LearningRate 0.0526 Epoch: 5 Global Step: 68320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:55,404-Speed 3001.80 samples/sec Loss 10.6655 LearningRate 0.0526 Epoch: 5 Global Step: 68330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:09:58,749-Speed 3061.85 samples/sec Loss 10.5728 LearningRate 0.0525 Epoch: 5 Global Step: 68340 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:02,156-Speed 3006.87 samples/sec Loss 10.7355 LearningRate 0.0525 Epoch: 5 Global Step: 68350 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:05,502-Speed 3060.76 samples/sec Loss 10.6452 LearningRate 0.0525 Epoch: 5 Global 
Step: 68360 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:08,905-Speed 3010.34 samples/sec Loss 10.6369 LearningRate 0.0525 Epoch: 5 Global Step: 68370 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:12,337-Speed 2984.46 samples/sec Loss 10.6657 LearningRate 0.0525 Epoch: 5 Global Step: 68380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:16,658-Speed 2370.70 samples/sec Loss 10.7003 LearningRate 0.0525 Epoch: 5 Global Step: 68390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:20,042-Speed 3026.20 samples/sec Loss 10.5975 LearningRate 0.0525 Epoch: 5 Global Step: 68400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:23,489-Speed 2972.36 samples/sec Loss 10.6498 LearningRate 0.0525 Epoch: 5 Global Step: 68410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:26,940-Speed 2967.72 samples/sec Loss 10.6962 LearningRate 0.0525 Epoch: 5 Global Step: 68420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:30,351-Speed 3003.09 samples/sec Loss 10.5735 LearningRate 0.0525 Epoch: 5 Global Step: 68430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:10:33,731-Speed 3031.05 samples/sec Loss 10.6633 LearningRate 0.0525 Epoch: 5 Global Step: 68440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:10:37,057-Speed 3078.73 samples/sec Loss 10.5704 LearningRate 0.0525 Epoch: 5 Global Step: 68450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:10:40,420-Speed 3045.94 samples/sec Loss 10.7241 LearningRate 0.0525 Epoch: 5 Global Step: 68460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:10:43,751-Speed 3075.42 samples/sec Loss 10.6937 LearningRate 0.0525 Epoch: 5 Global Step: 68470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:10:47,138-Speed 3023.63 samples/sec Loss 10.6491 LearningRate 0.0525 Epoch: 5 Global Step: 68480 Fp16 Grad Scale: 131072 
Required: 17 hours Training: 2022-04-27 08:10:50,558-Speed 2995.55 samples/sec Loss 10.5797 LearningRate 0.0525 Epoch: 5 Global Step: 68490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:10:53,886-Speed 3077.45 samples/sec Loss 10.4848 LearningRate 0.0525 Epoch: 5 Global Step: 68500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:10:57,196-Speed 3096.23 samples/sec Loss 10.7289 LearningRate 0.0524 Epoch: 5 Global Step: 68510 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:00,592-Speed 3015.82 samples/sec Loss 10.5722 LearningRate 0.0524 Epoch: 5 Global Step: 68520 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:03,934-Speed 3065.43 samples/sec Loss 10.6418 LearningRate 0.0524 Epoch: 5 Global Step: 68530 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:07,281-Speed 3060.69 samples/sec Loss 10.4871 LearningRate 0.0524 Epoch: 5 Global Step: 68540 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:10,760-Speed 2944.14 samples/sec Loss 10.6076 LearningRate 0.0524 Epoch: 5 Global Step: 68550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:14,124-Speed 3044.56 samples/sec Loss 10.5683 LearningRate 0.0524 Epoch: 5 Global Step: 68560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:17,519-Speed 3017.34 samples/sec Loss 10.5925 LearningRate 0.0524 Epoch: 5 Global Step: 68570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:20,873-Speed 3054.06 samples/sec Loss 10.6176 LearningRate 0.0524 Epoch: 5 Global Step: 68580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:24,230-Speed 3050.78 samples/sec Loss 10.6667 LearningRate 0.0524 Epoch: 5 Global Step: 68590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:27,608-Speed 3033.82 samples/sec Loss 10.7471 LearningRate 0.0524 Epoch: 5 Global Step: 68600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
08:11:30,972-Speed 3044.18 samples/sec Loss 10.5124 LearningRate 0.0524 Epoch: 5 Global Step: 68610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:11:34,367-Speed 3017.14 samples/sec Loss 10.6673 LearningRate 0.0524 Epoch: 5 Global Step: 68620 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:11:37,773-Speed 3007.83 samples/sec Loss 10.7556 LearningRate 0.0524 Epoch: 5 Global Step: 68630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:11:41,136-Speed 3046.16 samples/sec Loss 10.5951 LearningRate 0.0524 Epoch: 5 Global Step: 68640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:11:44,516-Speed 3030.70 samples/sec Loss 10.6359 LearningRate 0.0524 Epoch: 5 Global Step: 68650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:11:47,882-Speed 3042.38 samples/sec Loss 10.6129 LearningRate 0.0524 Epoch: 5 Global Step: 68660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:11:51,292-Speed 3004.60 samples/sec Loss 10.7106 LearningRate 0.0524 Epoch: 5 Global Step: 68670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:11:54,666-Speed 3035.22 samples/sec Loss 10.7844 LearningRate 0.0523 Epoch: 5 Global Step: 68680 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:11:58,062-Speed 3016.77 samples/sec Loss 10.6647 LearningRate 0.0523 Epoch: 5 Global Step: 68690 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:12:01,423-Speed 3047.86 samples/sec Loss 10.4793 LearningRate 0.0523 Epoch: 5 Global Step: 68700 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:12:04,818-Speed 3016.67 samples/sec Loss 10.6465 LearningRate 0.0523 Epoch: 5 Global Step: 68710 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:12:08,167-Speed 3058.86 samples/sec Loss 10.7804 LearningRate 0.0523 Epoch: 5 Global Step: 68720 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:12:11,537-Speed 3039.03 samples/sec 
Loss 10.5121 LearningRate 0.0523 Epoch: 5 Global Step: 68730 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:12:14,941-Speed 3009.39 samples/sec Loss 10.9000 LearningRate 0.0523 Epoch: 5 Global Step: 68740 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:12:18,396-Speed 2964.65 samples/sec Loss 10.6993 LearningRate 0.0523 Epoch: 5 Global Step: 68750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:21,757-Speed 3047.79 samples/sec Loss 10.7276 LearningRate 0.0523 Epoch: 5 Global Step: 68760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:25,113-Speed 3051.93 samples/sec Loss 10.6693 LearningRate 0.0523 Epoch: 5 Global Step: 68770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:28,468-Speed 3054.60 samples/sec Loss 10.5889 LearningRate 0.0523 Epoch: 5 Global Step: 68780 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:31,926-Speed 2962.23 samples/sec Loss 10.6015 LearningRate 0.0523 Epoch: 5 Global Step: 68790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:35,304-Speed 3033.32 samples/sec Loss 10.5562 LearningRate 0.0523 Epoch: 5 Global Step: 68800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:38,650-Speed 3060.56 samples/sec Loss 10.6508 LearningRate 0.0523 Epoch: 5 Global Step: 68810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:42,014-Speed 3044.62 samples/sec Loss 10.7257 LearningRate 0.0523 Epoch: 5 Global Step: 68820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:45,372-Speed 3050.25 samples/sec Loss 10.5395 LearningRate 0.0523 Epoch: 5 Global Step: 68830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:48,780-Speed 3006.19 samples/sec Loss 10.5668 LearningRate 0.0523 Epoch: 5 Global Step: 68840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:52,177-Speed 3014.98 samples/sec Loss 10.4655 LearningRate 0.0523 Epoch: 5 
Global Step: 68850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:12:55,510-Speed 3072.83 samples/sec Loss 10.6507 LearningRate 0.0522 Epoch: 5 Global Step: 68860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:12:58,903-Speed 3020.20 samples/sec Loss 10.6027 LearningRate 0.0522 Epoch: 5 Global Step: 68870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:02,285-Speed 3028.99 samples/sec Loss 10.5544 LearningRate 0.0522 Epoch: 5 Global Step: 68880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:05,754-Speed 2952.38 samples/sec Loss 10.6270 LearningRate 0.0522 Epoch: 5 Global Step: 68890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:09,172-Speed 2996.92 samples/sec Loss 10.5423 LearningRate 0.0522 Epoch: 5 Global Step: 68900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:12,572-Speed 3013.03 samples/sec Loss 10.6322 LearningRate 0.0522 Epoch: 5 Global Step: 68910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:15,999-Speed 2988.56 samples/sec Loss 10.6773 LearningRate 0.0522 Epoch: 5 Global Step: 68920 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:19,484-Speed 2939.09 samples/sec Loss 10.7705 LearningRate 0.0522 Epoch: 5 Global Step: 68930 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:22,904-Speed 2995.25 samples/sec Loss 10.6132 LearningRate 0.0522 Epoch: 5 Global Step: 68940 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:26,337-Speed 2983.78 samples/sec Loss 10.6784 LearningRate 0.0522 Epoch: 5 Global Step: 68950 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:29,691-Speed 3053.84 samples/sec Loss 10.8037 LearningRate 0.0522 Epoch: 5 Global Step: 68960 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:33,052-Speed 3047.25 samples/sec Loss 10.5606 LearningRate 0.0522 Epoch: 5 Global Step: 68970 Fp16 Grad Scale: 32768 
Required: 17 hours Training: 2022-04-27 08:13:36,470-Speed 2997.48 samples/sec Loss 10.5491 LearningRate 0.0522 Epoch: 5 Global Step: 68980 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:39,826-Speed 3052.26 samples/sec Loss 10.5792 LearningRate 0.0522 Epoch: 5 Global Step: 68990 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:43,184-Speed 3049.85 samples/sec Loss 10.6236 LearningRate 0.0522 Epoch: 5 Global Step: 69000 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:46,555-Speed 3038.75 samples/sec Loss 10.7673 LearningRate 0.0522 Epoch: 5 Global Step: 69010 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:13:49,867-Speed 3092.96 samples/sec Loss 10.4815 LearningRate 0.0522 Epoch: 5 Global Step: 69020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:53,208-Speed 3065.61 samples/sec Loss 10.6153 LearningRate 0.0521 Epoch: 5 Global Step: 69030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:13:56,617-Speed 3006.32 samples/sec Loss 10.6084 LearningRate 0.0521 Epoch: 5 Global Step: 69040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:00,029-Speed 3001.84 samples/sec Loss 10.6476 LearningRate 0.0521 Epoch: 5 Global Step: 69050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:03,406-Speed 3033.19 samples/sec Loss 10.6508 LearningRate 0.0521 Epoch: 5 Global Step: 69060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:06,847-Speed 2976.61 samples/sec Loss 10.7111 LearningRate 0.0521 Epoch: 5 Global Step: 69070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:10,166-Speed 3086.03 samples/sec Loss 10.6233 LearningRate 0.0521 Epoch: 5 Global Step: 69080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:13,569-Speed 3011.30 samples/sec Loss 10.4203 LearningRate 0.0521 Epoch: 5 Global Step: 69090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
08:14:16,915-Speed 3061.48 samples/sec Loss 10.6765 LearningRate 0.0521 Epoch: 5 Global Step: 69100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:20,284-Speed 3039.60 samples/sec Loss 10.7655 LearningRate 0.0521 Epoch: 5 Global Step: 69110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:14:23,667-Speed 3028.16 samples/sec Loss 10.7093 LearningRate 0.0521 Epoch: 5 Global Step: 69120 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:27,093-Speed 2989.75 samples/sec Loss 10.6327 LearningRate 0.0521 Epoch: 5 Global Step: 69130 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:30,479-Speed 3025.19 samples/sec Loss 10.5872 LearningRate 0.0521 Epoch: 5 Global Step: 69140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:33,848-Speed 3040.22 samples/sec Loss 10.5254 LearningRate 0.0521 Epoch: 5 Global Step: 69150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:37,311-Speed 2957.51 samples/sec Loss 10.8188 LearningRate 0.0521 Epoch: 5 Global Step: 69160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:40,633-Speed 3083.22 samples/sec Loss 10.5811 LearningRate 0.0521 Epoch: 5 Global Step: 69170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:43,987-Speed 3054.00 samples/sec Loss 10.6571 LearningRate 0.0521 Epoch: 5 Global Step: 69180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:47,363-Speed 3034.84 samples/sec Loss 10.6873 LearningRate 0.0521 Epoch: 5 Global Step: 69190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:50,764-Speed 3011.66 samples/sec Loss 10.5188 LearningRate 0.0520 Epoch: 5 Global Step: 69200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:54,219-Speed 2964.31 samples/sec Loss 10.5958 LearningRate 0.0520 Epoch: 5 Global Step: 69210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:14:57,537-Speed 3087.36 
samples/sec Loss 10.5619 LearningRate 0.0520 Epoch: 5 Global Step: 69220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:00,936-Speed 3013.68 samples/sec Loss 10.5571 LearningRate 0.0520 Epoch: 5 Global Step: 69230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:04,373-Speed 2979.71 samples/sec Loss 10.5578 LearningRate 0.0520 Epoch: 5 Global Step: 69240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:07,790-Speed 2998.16 samples/sec Loss 10.7596 LearningRate 0.0520 Epoch: 5 Global Step: 69250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:11,196-Speed 3006.96 samples/sec Loss 10.7229 LearningRate 0.0520 Epoch: 5 Global Step: 69260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:14,586-Speed 3021.41 samples/sec Loss 10.5617 LearningRate 0.0520 Epoch: 5 Global Step: 69270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:17,923-Speed 3069.50 samples/sec Loss 10.6525 LearningRate 0.0520 Epoch: 5 Global Step: 69280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:21,333-Speed 3004.11 samples/sec Loss 10.7201 LearningRate 0.0520 Epoch: 5 Global Step: 69290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:24,767-Speed 2982.14 samples/sec Loss 10.4245 LearningRate 0.0520 Epoch: 5 Global Step: 69300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:28,230-Speed 2957.94 samples/sec Loss 10.6497 LearningRate 0.0520 Epoch: 5 Global Step: 69310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:31,675-Speed 2973.60 samples/sec Loss 10.7042 LearningRate 0.0520 Epoch: 5 Global Step: 69320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:35,043-Speed 3041.15 samples/sec Loss 10.5257 LearningRate 0.0520 Epoch: 5 Global Step: 69330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:38,544-Speed 2925.83 samples/sec Loss 10.7536 
LearningRate 0.0520 Epoch: 5 Global Step: 69340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:41,974-Speed 2986.33 samples/sec Loss 10.5048 LearningRate 0.0520 Epoch: 5 Global Step: 69350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:45,362-Speed 3023.45 samples/sec Loss 10.6968 LearningRate 0.0520 Epoch: 5 Global Step: 69360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:48,721-Speed 3048.91 samples/sec Loss 10.4959 LearningRate 0.0519 Epoch: 5 Global Step: 69370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:52,216-Speed 2930.97 samples/sec Loss 10.6162 LearningRate 0.0519 Epoch: 5 Global Step: 69380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:55,624-Speed 3005.45 samples/sec Loss 10.6434 LearningRate 0.0519 Epoch: 5 Global Step: 69390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:15:59,011-Speed 3024.63 samples/sec Loss 10.5131 LearningRate 0.0519 Epoch: 5 Global Step: 69400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:02,482-Speed 2950.55 samples/sec Loss 10.6165 LearningRate 0.0519 Epoch: 5 Global Step: 69410 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:05,817-Speed 3071.61 samples/sec Loss 10.6021 LearningRate 0.0519 Epoch: 5 Global Step: 69420 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:09,228-Speed 3002.83 samples/sec Loss 10.4905 LearningRate 0.0519 Epoch: 5 Global Step: 69430 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:12,630-Speed 3011.24 samples/sec Loss 10.8399 LearningRate 0.0519 Epoch: 5 Global Step: 69440 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:16,081-Speed 2967.92 samples/sec Loss 10.6232 LearningRate 0.0519 Epoch: 5 Global Step: 69450 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:19,438-Speed 3051.52 samples/sec Loss 10.4809 LearningRate 0.0519 Epoch: 5 
Global Step: 69460 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:22,851-Speed 3001.19 samples/sec Loss 10.6663 LearningRate 0.0519 Epoch: 5 Global Step: 69470 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:26,234-Speed 3028.25 samples/sec Loss 10.4670 LearningRate 0.0519 Epoch: 5 Global Step: 69480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:29,660-Speed 2989.09 samples/sec Loss 10.5656 LearningRate 0.0519 Epoch: 5 Global Step: 69490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:33,077-Speed 2999.23 samples/sec Loss 10.5798 LearningRate 0.0519 Epoch: 5 Global Step: 69500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:36,498-Speed 2994.63 samples/sec Loss 10.7109 LearningRate 0.0519 Epoch: 5 Global Step: 69510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:39,894-Speed 3015.58 samples/sec Loss 10.6718 LearningRate 0.0519 Epoch: 5 Global Step: 69520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:43,290-Speed 3017.24 samples/sec Loss 10.6079 LearningRate 0.0519 Epoch: 5 Global Step: 69530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:46,709-Speed 2995.78 samples/sec Loss 10.5118 LearningRate 0.0519 Epoch: 5 Global Step: 69540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:50,155-Speed 2971.95 samples/sec Loss 10.5524 LearningRate 0.0518 Epoch: 5 Global Step: 69550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:53,533-Speed 3032.50 samples/sec Loss 10.5674 LearningRate 0.0518 Epoch: 5 Global Step: 69560 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:16:56,978-Speed 2973.60 samples/sec Loss 10.6461 LearningRate 0.0518 Epoch: 5 Global Step: 69570 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:00,426-Speed 2970.47 samples/sec Loss 10.5011 LearningRate 0.0518 Epoch: 5 Global Step: 69580 Fp16 Grad 
Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:03,789-Speed 3045.64 samples/sec Loss 10.6868 LearningRate 0.0518 Epoch: 5 Global Step: 69590 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:07,221-Speed 2984.87 samples/sec Loss 10.5245 LearningRate 0.0518 Epoch: 5 Global Step: 69600 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:10,624-Speed 3010.34 samples/sec Loss 10.6743 LearningRate 0.0518 Epoch: 5 Global Step: 69610 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:13,980-Speed 3051.93 samples/sec Loss 10.6838 LearningRate 0.0518 Epoch: 5 Global Step: 69620 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-27 08:17:17,383-Speed 3009.59 samples/sec Loss 10.6226 LearningRate 0.0518 Epoch: 5 Global Step: 69630 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:20,733-Speed 3058.23 samples/sec Loss 10.6551 LearningRate 0.0518 Epoch: 5 Global Step: 69640 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:24,110-Speed 3032.94 samples/sec Loss 10.6634 LearningRate 0.0518 Epoch: 5 Global Step: 69650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:17:27,416-Speed 3098.01 samples/sec Loss 10.6472 LearningRate 0.0518 Epoch: 5 Global Step: 69660 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:30,834-Speed 2997.36 samples/sec Loss 10.4444 LearningRate 0.0518 Epoch: 5 Global Step: 69670 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:34,267-Speed 2983.29 samples/sec Loss 10.5852 LearningRate 0.0518 Epoch: 5 Global Step: 69680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:37,647-Speed 3031.23 samples/sec Loss 10.5666 LearningRate 0.0518 Epoch: 5 Global Step: 69690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:41,036-Speed 3022.12 samples/sec Loss 10.6151 LearningRate 0.0518 Epoch: 5 Global Step: 69700 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-04-27 08:17:44,351-Speed 3090.31 samples/sec Loss 10.6790 LearningRate 0.0518 Epoch: 5 Global Step: 69710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:47,817-Speed 2955.34 samples/sec Loss 10.6724 LearningRate 0.0517 Epoch: 5 Global Step: 69720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:51,270-Speed 2966.11 samples/sec Loss 10.6730 LearningRate 0.0517 Epoch: 5 Global Step: 69730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:54,663-Speed 3019.73 samples/sec Loss 10.5308 LearningRate 0.0517 Epoch: 5 Global Step: 69740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:17:58,067-Speed 3009.06 samples/sec Loss 10.5820 LearningRate 0.0517 Epoch: 5 Global Step: 69750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:18:01,450-Speed 3027.74 samples/sec Loss 10.6233 LearningRate 0.0517 Epoch: 5 Global Step: 69760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:04,868-Speed 2996.68 samples/sec Loss 10.4945 LearningRate 0.0517 Epoch: 5 Global Step: 69770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:08,239-Speed 3038.53 samples/sec Loss 10.5453 LearningRate 0.0517 Epoch: 5 Global Step: 69780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:11,646-Speed 3007.49 samples/sec Loss 10.4664 LearningRate 0.0517 Epoch: 5 Global Step: 69790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:15,011-Speed 3043.92 samples/sec Loss 10.5945 LearningRate 0.0517 Epoch: 5 Global Step: 69800 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:18,429-Speed 2996.26 samples/sec Loss 10.4083 LearningRate 0.0517 Epoch: 5 Global Step: 69810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:21,811-Speed 3028.44 samples/sec Loss 10.4149 LearningRate 0.0517 Epoch: 5 Global Step: 69820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 
08:18:25,318-Speed 2921.43 samples/sec Loss 10.5983 LearningRate 0.0517 Epoch: 5 Global Step: 69830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:28,746-Speed 2987.31 samples/sec Loss 10.4799 LearningRate 0.0517 Epoch: 5 Global Step: 69840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:32,118-Speed 3037.96 samples/sec Loss 10.5334 LearningRate 0.0517 Epoch: 5 Global Step: 69850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:35,535-Speed 2997.84 samples/sec Loss 10.4926 LearningRate 0.0517 Epoch: 5 Global Step: 69860 Fp16 Grad Scale: 262144 Required: 17 hours Training: 2022-04-27 08:18:38,938-Speed 3010.31 samples/sec Loss 10.6635 LearningRate 0.0517 Epoch: 5 Global Step: 69870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:18:42,310-Speed 3037.38 samples/sec Loss 10.5801 LearningRate 0.0517 Epoch: 5 Global Step: 69880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:18:45,729-Speed 2995.78 samples/sec Loss 10.5335 LearningRate 0.0516 Epoch: 5 Global Step: 69890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:18:49,091-Speed 3046.08 samples/sec Loss 10.7556 LearningRate 0.0516 Epoch: 5 Global Step: 69900 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:18:52,442-Speed 3057.66 samples/sec Loss 10.6249 LearningRate 0.0516 Epoch: 5 Global Step: 69910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:18:55,841-Speed 3013.65 samples/sec Loss 10.5901 LearningRate 0.0516 Epoch: 5 Global Step: 69920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:18:59,209-Speed 3041.26 samples/sec Loss 10.5197 LearningRate 0.0516 Epoch: 5 Global Step: 69930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:02,620-Speed 3003.02 samples/sec Loss 10.5370 LearningRate 0.0516 Epoch: 5 Global Step: 69940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:05,975-Speed 3052.84 samples/sec 
Loss 10.4817 LearningRate 0.0516 Epoch: 5 Global Step: 69950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:09,403-Speed 2988.83 samples/sec Loss 10.5898 LearningRate 0.0516 Epoch: 5 Global Step: 69960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:12,864-Speed 2959.26 samples/sec Loss 10.6417 LearningRate 0.0516 Epoch: 5 Global Step: 69970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:16,356-Speed 2932.73 samples/sec Loss 10.5264 LearningRate 0.0516 Epoch: 5 Global Step: 69980 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:19:19,771-Speed 2999.35 samples/sec Loss 10.4215 LearningRate 0.0516 Epoch: 5 Global Step: 69990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:19:23,208-Speed 2980.82 samples/sec Loss 10.5037 LearningRate 0.0516 Epoch: 5 Global Step: 70000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:19:26,578-Speed 3039.13 samples/sec Loss 10.5840 LearningRate 0.0516 Epoch: 5 Global Step: 70010 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:29,952-Speed 3035.59 samples/sec Loss 10.7062 LearningRate 0.0516 Epoch: 5 Global Step: 70020 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:33,319-Speed 3042.43 samples/sec Loss 10.5122 LearningRate 0.0516 Epoch: 5 Global Step: 70030 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:36,802-Speed 2941.10 samples/sec Loss 10.6049 LearningRate 0.0516 Epoch: 5 Global Step: 70040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:40,167-Speed 3044.41 samples/sec Loss 10.5476 LearningRate 0.0516 Epoch: 5 Global Step: 70050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:43,573-Speed 3006.60 samples/sec Loss 10.6239 LearningRate 0.0515 Epoch: 5 Global Step: 70060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:47,039-Speed 2956.20 samples/sec Loss 10.5605 LearningRate 0.0515 Epoch: 
5 Global Step: 70070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:50,368-Speed 3076.09 samples/sec Loss 10.5203 LearningRate 0.0515 Epoch: 5 Global Step: 70080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:53,730-Speed 3048.16 samples/sec Loss 10.4862 LearningRate 0.0515 Epoch: 5 Global Step: 70090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:19:57,188-Speed 2962.53 samples/sec Loss 10.5130 LearningRate 0.0515 Epoch: 5 Global Step: 70100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:00,575-Speed 3024.22 samples/sec Loss 10.7845 LearningRate 0.0515 Epoch: 5 Global Step: 70110 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:03,941-Speed 3042.88 samples/sec Loss 10.5947 LearningRate 0.0515 Epoch: 5 Global Step: 70120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:07,282-Speed 3065.95 samples/sec Loss 10.3987 LearningRate 0.0515 Epoch: 5 Global Step: 70130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:10,657-Speed 3035.42 samples/sec Loss 10.4712 LearningRate 0.0515 Epoch: 5 Global Step: 70140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:14,009-Speed 3056.21 samples/sec Loss 10.6206 LearningRate 0.0515 Epoch: 5 Global Step: 70150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:17,421-Speed 3001.45 samples/sec Loss 10.4796 LearningRate 0.0515 Epoch: 5 Global Step: 70160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:20,834-Speed 3000.86 samples/sec Loss 10.5846 LearningRate 0.0515 Epoch: 5 Global Step: 70170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:24,199-Speed 3044.21 samples/sec Loss 10.4560 LearningRate 0.0515 Epoch: 5 Global Step: 70180 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:27,598-Speed 3013.66 samples/sec Loss 10.5127 LearningRate 0.0515 Epoch: 5 Global Step: 70190 Fp16 Grad Scale: 
65536 Required: 17 hours Training: 2022-04-27 08:20:31,010-Speed 3001.68 samples/sec Loss 10.5663 LearningRate 0.0515 Epoch: 5 Global Step: 70200 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:34,424-Speed 3000.67 samples/sec Loss 10.5004 LearningRate 0.0515 Epoch: 5 Global Step: 70210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:20:37,790-Speed 3043.01 samples/sec Loss 10.5019 LearningRate 0.0515 Epoch: 5 Global Step: 70220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:41,181-Speed 3020.31 samples/sec Loss 10.5046 LearningRate 0.0515 Epoch: 5 Global Step: 70230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:44,550-Speed 3040.54 samples/sec Loss 10.5327 LearningRate 0.0514 Epoch: 5 Global Step: 70240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:47,937-Speed 3023.62 samples/sec Loss 10.5039 LearningRate 0.0514 Epoch: 5 Global Step: 70250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:51,331-Speed 3018.29 samples/sec Loss 10.6547 LearningRate 0.0514 Epoch: 5 Global Step: 70260 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:54,806-Speed 2947.82 samples/sec Loss 10.6417 LearningRate 0.0514 Epoch: 5 Global Step: 70270 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:20:58,170-Speed 3044.69 samples/sec Loss 10.6402 LearningRate 0.0514 Epoch: 5 Global Step: 70280 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:01,575-Speed 3008.23 samples/sec Loss 10.4684 LearningRate 0.0514 Epoch: 5 Global Step: 70290 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:05,001-Speed 2989.92 samples/sec Loss 10.4927 LearningRate 0.0514 Epoch: 5 Global Step: 70300 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:08,444-Speed 2974.91 samples/sec Loss 10.4119 LearningRate 0.0514 Epoch: 5 Global Step: 70310 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-27 08:21:11,852-Speed 3005.21 samples/sec Loss 10.4856 LearningRate 0.0514 Epoch: 5 Global Step: 70320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:15,340-Speed 2937.12 samples/sec Loss 10.5032 LearningRate 0.0514 Epoch: 5 Global Step: 70330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:18,822-Speed 2941.23 samples/sec Loss 10.5749 LearningRate 0.0514 Epoch: 5 Global Step: 70340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:22,172-Speed 3057.70 samples/sec Loss 10.3565 LearningRate 0.0514 Epoch: 5 Global Step: 70350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:25,624-Speed 2967.10 samples/sec Loss 10.6013 LearningRate 0.0514 Epoch: 5 Global Step: 70360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:28,976-Speed 3056.95 samples/sec Loss 10.6555 LearningRate 0.0514 Epoch: 5 Global Step: 70370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:32,480-Speed 2922.94 samples/sec Loss 10.6889 LearningRate 0.0514 Epoch: 5 Global Step: 70380 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:35,888-Speed 3005.56 samples/sec Loss 10.4752 LearningRate 0.0514 Epoch: 5 Global Step: 70390 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:39,224-Speed 3071.06 samples/sec Loss 10.5276 LearningRate 0.0514 Epoch: 5 Global Step: 70400 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:21:42,602-Speed 3032.05 samples/sec Loss 10.5899 LearningRate 0.0513 Epoch: 5 Global Step: 70410 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:21:46,003-Speed 3011.97 samples/sec Loss 10.5274 LearningRate 0.0513 Epoch: 5 Global Step: 70420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:21:49,416-Speed 3000.79 samples/sec Loss 10.6030 LearningRate 0.0513 Epoch: 5 Global Step: 70430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 
08:21:52,806-Speed 3021.97 samples/sec Loss 10.5811 LearningRate 0.0513 Epoch: 5 Global Step: 70440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:21:56,135-Speed 3076.88 samples/sec Loss 10.5085 LearningRate 0.0513 Epoch: 5 Global Step: 70450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:21:59,506-Speed 3038.27 samples/sec Loss 10.5181 LearningRate 0.0513 Epoch: 5 Global Step: 70460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:02,881-Speed 3035.39 samples/sec Loss 10.6635 LearningRate 0.0513 Epoch: 5 Global Step: 70470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:06,217-Speed 3069.98 samples/sec Loss 10.5055 LearningRate 0.0513 Epoch: 5 Global Step: 70480 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:09,642-Speed 2990.67 samples/sec Loss 10.5972 LearningRate 0.0513 Epoch: 5 Global Step: 70490 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:13,069-Speed 2988.99 samples/sec Loss 10.4974 LearningRate 0.0513 Epoch: 5 Global Step: 70500 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:16,459-Speed 3021.37 samples/sec Loss 10.5343 LearningRate 0.0513 Epoch: 5 Global Step: 70510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:22:19,815-Speed 3052.40 samples/sec Loss 10.5969 LearningRate 0.0513 Epoch: 5 Global Step: 70520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:22:23,162-Speed 3060.35 samples/sec Loss 10.4654 LearningRate 0.0513 Epoch: 5 Global Step: 70530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:22:26,586-Speed 2991.39 samples/sec Loss 10.4582 LearningRate 0.0513 Epoch: 5 Global Step: 70540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:22:29,948-Speed 3046.93 samples/sec Loss 10.5752 LearningRate 0.0513 Epoch: 5 Global Step: 70550 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:33,298-Speed 3058.06 samples/sec 
Loss 10.6620 LearningRate 0.0513 Epoch: 5 Global Step: 70560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:36,751-Speed 2966.41 samples/sec Loss 10.5088 LearningRate 0.0513 Epoch: 5 Global Step: 70570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:40,177-Speed 2990.16 samples/sec Loss 10.6071 LearningRate 0.0512 Epoch: 5 Global Step: 70580 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:43,595-Speed 2996.48 samples/sec Loss 10.6257 LearningRate 0.0512 Epoch: 5 Global Step: 70590 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:47,069-Speed 2948.64 samples/sec Loss 10.5262 LearningRate 0.0512 Epoch: 5 Global Step: 70600 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:50,502-Speed 2982.96 samples/sec Loss 10.3705 LearningRate 0.0512 Epoch: 5 Global Step: 70610 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:53,931-Speed 2987.53 samples/sec Loss 10.4505 LearningRate 0.0512 Epoch: 5 Global Step: 70620 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:22:57,360-Speed 2986.92 samples/sec Loss 10.5163 LearningRate 0.0512 Epoch: 5 Global Step: 70630 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:23:00,728-Speed 3041.35 samples/sec Loss 10.4752 LearningRate 0.0512 Epoch: 5 Global Step: 70640 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:23:04,066-Speed 3068.70 samples/sec Loss 10.5961 LearningRate 0.0512 Epoch: 5 Global Step: 70650 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:07,472-Speed 3007.32 samples/sec Loss 10.2734 LearningRate 0.0512 Epoch: 5 Global Step: 70660 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:10,794-Speed 3083.98 samples/sec Loss 10.4093 LearningRate 0.0512 Epoch: 5 Global Step: 70670 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:14,218-Speed 2990.89 samples/sec Loss 10.5455 LearningRate 0.0512 Epoch: 
5 Global Step: 70680 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:17,547-Speed 3077.03 samples/sec Loss 10.4759 LearningRate 0.0512 Epoch: 5 Global Step: 70690 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:20,878-Speed 3075.00 samples/sec Loss 10.4189 LearningRate 0.0512 Epoch: 5 Global Step: 70700 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:24,262-Speed 3027.12 samples/sec Loss 10.4982 LearningRate 0.0512 Epoch: 5 Global Step: 70710 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:27,720-Speed 2962.41 samples/sec Loss 10.5842 LearningRate 0.0512 Epoch: 5 Global Step: 70720 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:31,163-Speed 2974.39 samples/sec Loss 10.5386 LearningRate 0.0512 Epoch: 5 Global Step: 70730 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:34,611-Speed 2972.36 samples/sec Loss 10.5009 LearningRate 0.0512 Epoch: 5 Global Step: 70740 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:37,979-Speed 3041.18 samples/sec Loss 10.3940 LearningRate 0.0512 Epoch: 5 Global Step: 70750 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:41,420-Speed 2976.72 samples/sec Loss 10.3894 LearningRate 0.0511 Epoch: 5 Global Step: 70760 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:44,858-Speed 2979.31 samples/sec Loss 10.5352 LearningRate 0.0511 Epoch: 5 Global Step: 70770 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:48,252-Speed 3017.79 samples/sec Loss 10.5488 LearningRate 0.0511 Epoch: 5 Global Step: 70780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:51,592-Speed 3067.11 samples/sec Loss 10.3773 LearningRate 0.0511 Epoch: 5 Global Step: 70790 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:54,923-Speed 3075.37 samples/sec Loss 10.3292 LearningRate 0.0511 Epoch: 5 Global Step: 70800 Fp16 Grad 
Scale: 131072 Required: 17 hours Training: 2022-04-27 08:23:58,251-Speed 3077.53 samples/sec Loss 10.6228 LearningRate 0.0511 Epoch: 5 Global Step: 70810 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:01,627-Speed 3034.53 samples/sec Loss 10.4681 LearningRate 0.0511 Epoch: 5 Global Step: 70820 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:05,044-Speed 2997.12 samples/sec Loss 10.4815 LearningRate 0.0511 Epoch: 5 Global Step: 70830 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:08,417-Speed 3036.77 samples/sec Loss 10.4859 LearningRate 0.0511 Epoch: 5 Global Step: 70840 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:11,749-Speed 3074.39 samples/sec Loss 10.4520 LearningRate 0.0511 Epoch: 5 Global Step: 70850 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:15,172-Speed 2993.08 samples/sec Loss 10.5323 LearningRate 0.0511 Epoch: 5 Global Step: 70860 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:18,529-Speed 3050.76 samples/sec Loss 10.4398 LearningRate 0.0511 Epoch: 5 Global Step: 70870 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:21,915-Speed 3024.67 samples/sec Loss 10.4723 LearningRate 0.0511 Epoch: 5 Global Step: 70880 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:25,379-Speed 2957.30 samples/sec Loss 10.3338 LearningRate 0.0511 Epoch: 5 Global Step: 70890 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:28,773-Speed 3017.51 samples/sec Loss 10.5076 LearningRate 0.0511 Epoch: 5 Global Step: 70900 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:24:32,184-Speed 3003.06 samples/sec Loss 10.4001 LearningRate 0.0511 Epoch: 5 Global Step: 70910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:35,632-Speed 2970.58 samples/sec Loss 10.4811 LearningRate 0.0511 Epoch: 5 Global Step: 70920 Fp16 Grad Scale: 65536 Required: 17 hours 
Training: 2022-04-27 08:24:39,105-Speed 2949.74 samples/sec Loss 10.3010 LearningRate 0.0510 Epoch: 5 Global Step: 70930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:42,498-Speed 3019.43 samples/sec Loss 10.4153 LearningRate 0.0510 Epoch: 5 Global Step: 70940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:45,855-Speed 3050.62 samples/sec Loss 10.6066 LearningRate 0.0510 Epoch: 5 Global Step: 70950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:49,241-Speed 3025.38 samples/sec Loss 10.5711 LearningRate 0.0510 Epoch: 5 Global Step: 70960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:52,634-Speed 3018.93 samples/sec Loss 10.4892 LearningRate 0.0510 Epoch: 5 Global Step: 70970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:56,051-Speed 2997.44 samples/sec Loss 10.6682 LearningRate 0.0510 Epoch: 5 Global Step: 70980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:24:59,465-Speed 3000.00 samples/sec Loss 10.4887 LearningRate 0.0510 Epoch: 5 Global Step: 70990 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:02,839-Speed 3036.21 samples/sec Loss 10.4588 LearningRate 0.0510 Epoch: 5 Global Step: 71000 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:06,363-Speed 2906.97 samples/sec Loss 10.5007 LearningRate 0.0510 Epoch: 5 Global Step: 71010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:09,756-Speed 3018.63 samples/sec Loss 10.4127 LearningRate 0.0510 Epoch: 5 Global Step: 71020 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:13,166-Speed 3003.79 samples/sec Loss 10.4959 LearningRate 0.0510 Epoch: 5 Global Step: 71030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:16,558-Speed 3019.54 samples/sec Loss 10.4928 LearningRate 0.0510 Epoch: 5 Global Step: 71040 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:20,019-Speed 
2960.10 samples/sec Loss 10.4659 LearningRate 0.0510 Epoch: 5 Global Step: 71050 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:23,436-Speed 2997.21 samples/sec Loss 10.5250 LearningRate 0.0510 Epoch: 5 Global Step: 71060 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:26,854-Speed 2996.87 samples/sec Loss 10.5204 LearningRate 0.0510 Epoch: 5 Global Step: 71070 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:25:30,239-Speed 3025.87 samples/sec Loss 10.5804 LearningRate 0.0510 Epoch: 5 Global Step: 71080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:33,702-Speed 2958.30 samples/sec Loss 10.4290 LearningRate 0.0510 Epoch: 5 Global Step: 71090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:37,108-Speed 3007.69 samples/sec Loss 10.4970 LearningRate 0.0509 Epoch: 5 Global Step: 71100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:40,591-Speed 2940.73 samples/sec Loss 10.4373 LearningRate 0.0509 Epoch: 5 Global Step: 71110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:44,080-Speed 2935.75 samples/sec Loss 10.5351 LearningRate 0.0509 Epoch: 5 Global Step: 71120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:47,568-Speed 2936.57 samples/sec Loss 10.5264 LearningRate 0.0509 Epoch: 5 Global Step: 71130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:51,016-Speed 2971.19 samples/sec Loss 10.4332 LearningRate 0.0509 Epoch: 5 Global Step: 71140 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:54,451-Speed 2982.48 samples/sec Loss 10.6344 LearningRate 0.0509 Epoch: 5 Global Step: 71150 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:25:57,806-Speed 3052.78 samples/sec Loss 10.4920 LearningRate 0.0509 Epoch: 5 Global Step: 71160 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:01,166-Speed 3048.24 samples/sec Loss 10.5846 
LearningRate 0.0509 Epoch: 5 Global Step: 71170 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:04,509-Speed 3064.56 samples/sec Loss 10.4241 LearningRate 0.0509 Epoch: 5 Global Step: 71180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:26:07,895-Speed 3024.48 samples/sec Loss 10.4300 LearningRate 0.0509 Epoch: 5 Global Step: 71190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:26:11,243-Speed 3059.94 samples/sec Loss 10.4073 LearningRate 0.0509 Epoch: 5 Global Step: 71200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:26:14,630-Speed 3023.52 samples/sec Loss 10.5257 LearningRate 0.0509 Epoch: 5 Global Step: 71210 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:18,070-Speed 2978.40 samples/sec Loss 10.4744 LearningRate 0.0509 Epoch: 5 Global Step: 71220 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:21,534-Speed 2956.53 samples/sec Loss 10.4745 LearningRate 0.0509 Epoch: 5 Global Step: 71230 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:24,851-Speed 3088.17 samples/sec Loss 10.4919 LearningRate 0.0509 Epoch: 5 Global Step: 71240 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:28,217-Speed 3042.76 samples/sec Loss 10.5472 LearningRate 0.0509 Epoch: 5 Global Step: 71250 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:31,543-Speed 3080.01 samples/sec Loss 10.6030 LearningRate 0.0509 Epoch: 5 Global Step: 71260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:34,957-Speed 2999.61 samples/sec Loss 10.4458 LearningRate 0.0509 Epoch: 5 Global Step: 71270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:38,411-Speed 2966.28 samples/sec Loss 10.4434 LearningRate 0.0508 Epoch: 5 Global Step: 71280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:41,831-Speed 2994.93 samples/sec Loss 10.3994 LearningRate 0.0508 Epoch: 5 Global 
Step: 71290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:45,270-Speed 2978.16 samples/sec Loss 10.5918 LearningRate 0.0508 Epoch: 5 Global Step: 71300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:26:48,743-Speed 2949.26 samples/sec Loss 10.4094 LearningRate 0.0508 Epoch: 5 Global Step: 71310 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:26:52,090-Speed 3060.36 samples/sec Loss 10.5391 LearningRate 0.0508 Epoch: 5 Global Step: 71320 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:26:55,496-Speed 3007.21 samples/sec Loss 10.5028 LearningRate 0.0508 Epoch: 5 Global Step: 71330 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:26:58,900-Speed 3009.30 samples/sec Loss 10.6240 LearningRate 0.0508 Epoch: 5 Global Step: 71340 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:02,306-Speed 3007.06 samples/sec Loss 10.4756 LearningRate 0.0508 Epoch: 5 Global Step: 71350 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:05,680-Speed 3037.00 samples/sec Loss 10.4337 LearningRate 0.0508 Epoch: 5 Global Step: 71360 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:09,178-Speed 2928.76 samples/sec Loss 10.3458 LearningRate 0.0508 Epoch: 5 Global Step: 71370 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:12,544-Speed 3042.64 samples/sec Loss 10.4838 LearningRate 0.0508 Epoch: 5 Global Step: 71380 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:15,948-Speed 3009.21 samples/sec Loss 10.4382 LearningRate 0.0508 Epoch: 5 Global Step: 71390 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:19,301-Speed 3055.77 samples/sec Loss 10.4948 LearningRate 0.0508 Epoch: 5 Global Step: 71400 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:22,789-Speed 2936.59 samples/sec Loss 10.3554 LearningRate 0.0508 Epoch: 5 Global Step: 71410 Fp16 Grad Scale: 65536 
Required: 17 hours Training: 2022-04-27 08:27:26,143-Speed 3053.72 samples/sec Loss 10.5778 LearningRate 0.0508 Epoch: 5 Global Step: 71420 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:29,552-Speed 3004.05 samples/sec Loss 10.3816 LearningRate 0.0508 Epoch: 5 Global Step: 71430 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:32,895-Speed 3064.56 samples/sec Loss 10.4523 LearningRate 0.0508 Epoch: 5 Global Step: 71440 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:36,269-Speed 3035.25 samples/sec Loss 10.5072 LearningRate 0.0507 Epoch: 5 Global Step: 71450 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:39,678-Speed 3005.04 samples/sec Loss 10.4357 LearningRate 0.0507 Epoch: 5 Global Step: 71460 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:43,059-Speed 3029.48 samples/sec Loss 10.5619 LearningRate 0.0507 Epoch: 5 Global Step: 71470 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:27:46,421-Speed 3046.72 samples/sec Loss 10.3712 LearningRate 0.0507 Epoch: 5 Global Step: 71480 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:49,781-Speed 3048.30 samples/sec Loss 10.4065 LearningRate 0.0507 Epoch: 5 Global Step: 71490 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:53,245-Speed 2957.44 samples/sec Loss 10.2923 LearningRate 0.0507 Epoch: 5 Global Step: 71500 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:27:56,695-Speed 2968.38 samples/sec Loss 10.4433 LearningRate 0.0507 Epoch: 5 Global Step: 71510 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:28:00,029-Speed 3072.35 samples/sec Loss 10.5186 LearningRate 0.0507 Epoch: 5 Global Step: 71520 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:28:03,383-Speed 3054.73 samples/sec Loss 10.5153 LearningRate 0.0507 Epoch: 5 Global Step: 71530 Fp16 Grad Scale: 131072 Required: 17 hours Training: 
2022-04-27 08:28:06,771-Speed 3023.31 samples/sec Loss 10.4535 LearningRate 0.0507 Epoch: 5 Global Step: 71540 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:28:10,197-Speed 2989.64 samples/sec Loss 10.4274 LearningRate 0.0507 Epoch: 5 Global Step: 71550 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:28:13,662-Speed 2955.90 samples/sec Loss 10.4838 LearningRate 0.0507 Epoch: 5 Global Step: 71560 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:28:17,048-Speed 3025.46 samples/sec Loss 10.4260 LearningRate 0.0507 Epoch: 5 Global Step: 71570 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:28:20,508-Speed 2959.81 samples/sec Loss 10.4768 LearningRate 0.0507 Epoch: 5 Global Step: 71580 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:23,944-Speed 2981.39 samples/sec Loss 10.5295 LearningRate 0.0507 Epoch: 5 Global Step: 71590 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:27,290-Speed 3061.00 samples/sec Loss 10.4264 LearningRate 0.0507 Epoch: 5 Global Step: 71600 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:30,618-Speed 3077.78 samples/sec Loss 10.3983 LearningRate 0.0507 Epoch: 5 Global Step: 71610 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:33,965-Speed 3062.32 samples/sec Loss 10.3368 LearningRate 0.0507 Epoch: 5 Global Step: 71620 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:37,373-Speed 3005.66 samples/sec Loss 10.4078 LearningRate 0.0506 Epoch: 5 Global Step: 71630 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:40,811-Speed 2978.60 samples/sec Loss 10.3488 LearningRate 0.0506 Epoch: 5 Global Step: 71640 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:44,278-Speed 2955.28 samples/sec Loss 10.6279 LearningRate 0.0506 Epoch: 5 Global Step: 71650 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:47,731-Speed 2966.28 
samples/sec Loss 10.3624 LearningRate 0.0506 Epoch: 5 Global Step: 71660 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:51,123-Speed 3019.57 samples/sec Loss 10.4145 LearningRate 0.0506 Epoch: 5 Global Step: 71670 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:28:54,520-Speed 3015.56 samples/sec Loss 10.3859 LearningRate 0.0506 Epoch: 5 Global Step: 71680 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:28:57,947-Speed 2988.38 samples/sec Loss 10.5627 LearningRate 0.0506 Epoch: 5 Global Step: 71690 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:01,369-Speed 2993.68 samples/sec Loss 10.4602 LearningRate 0.0506 Epoch: 5 Global Step: 71700 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:04,771-Speed 3011.11 samples/sec Loss 10.4124 LearningRate 0.0506 Epoch: 5 Global Step: 71710 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:08,161-Speed 3021.39 samples/sec Loss 10.4194 LearningRate 0.0506 Epoch: 5 Global Step: 71720 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:11,618-Speed 2962.52 samples/sec Loss 10.2698 LearningRate 0.0506 Epoch: 5 Global Step: 71730 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:15,002-Speed 3026.84 samples/sec Loss 10.3914 LearningRate 0.0506 Epoch: 5 Global Step: 71740 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:18,409-Speed 3006.38 samples/sec Loss 10.3225 LearningRate 0.0506 Epoch: 5 Global Step: 71750 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:21,781-Speed 3037.93 samples/sec Loss 10.4794 LearningRate 0.0506 Epoch: 5 Global Step: 71760 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:25,208-Speed 2989.28 samples/sec Loss 10.4131 LearningRate 0.0506 Epoch: 5 Global Step: 71770 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:28,553-Speed 3061.73 samples/sec Loss 10.3624 LearningRate 
0.0506 Epoch: 5 Global Step: 71780 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:29:31,880-Speed 3079.06 samples/sec Loss 10.5639 LearningRate 0.0506 Epoch: 5 Global Step: 71790 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:35,251-Speed 3038.64 samples/sec Loss 10.2158 LearningRate 0.0505 Epoch: 5 Global Step: 71800 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:38,599-Speed 3058.67 samples/sec Loss 10.3229 LearningRate 0.0505 Epoch: 5 Global Step: 71810 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:41,955-Speed 3052.24 samples/sec Loss 10.4455 LearningRate 0.0505 Epoch: 5 Global Step: 71820 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:45,355-Speed 3012.86 samples/sec Loss 10.4014 LearningRate 0.0505 Epoch: 5 Global Step: 71830 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:48,729-Speed 3036.01 samples/sec Loss 10.4037 LearningRate 0.0505 Epoch: 5 Global Step: 71840 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:52,128-Speed 3013.28 samples/sec Loss 10.4106 LearningRate 0.0505 Epoch: 5 Global Step: 71850 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:55,504-Speed 3034.09 samples/sec Loss 10.4279 LearningRate 0.0505 Epoch: 5 Global Step: 71860 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:29:58,933-Speed 2987.29 samples/sec Loss 10.3890 LearningRate 0.0505 Epoch: 5 Global Step: 71870 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:02,366-Speed 2983.39 samples/sec Loss 10.3334 LearningRate 0.0505 Epoch: 5 Global Step: 71880 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:05,743-Speed 3033.43 samples/sec Loss 10.4499 LearningRate 0.0505 Epoch: 5 Global Step: 71890 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:09,121-Speed 3032.35 samples/sec Loss 10.4807 LearningRate 0.0505 Epoch: 5 Global Step: 71900 Fp16 
Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:12,539-Speed 2997.27 samples/sec Loss 10.4394 LearningRate 0.0505 Epoch: 5 Global Step: 71910 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:16,030-Speed 2933.86 samples/sec Loss 10.3688 LearningRate 0.0505 Epoch: 5 Global Step: 71920 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:19,453-Speed 2992.81 samples/sec Loss 10.4618 LearningRate 0.0505 Epoch: 5 Global Step: 71930 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:22,848-Speed 3016.89 samples/sec Loss 10.3946 LearningRate 0.0505 Epoch: 5 Global Step: 71940 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:26,159-Speed 3093.86 samples/sec Loss 10.4964 LearningRate 0.0505 Epoch: 5 Global Step: 71950 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:29,540-Speed 3029.95 samples/sec Loss 10.3406 LearningRate 0.0505 Epoch: 5 Global Step: 71960 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:32,919-Speed 3030.90 samples/sec Loss 10.3740 LearningRate 0.0505 Epoch: 5 Global Step: 71970 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:36,322-Speed 3009.82 samples/sec Loss 10.2919 LearningRate 0.0504 Epoch: 5 Global Step: 71980 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:30:39,745-Speed 2992.71 samples/sec Loss 10.4710 LearningRate 0.0504 Epoch: 5 Global Step: 71990 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:30:43,129-Speed 3026.91 samples/sec Loss 10.4038 LearningRate 0.0504 Epoch: 5 Global Step: 72000 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:30:46,518-Speed 3022.82 samples/sec Loss 10.5169 LearningRate 0.0504 Epoch: 5 Global Step: 72010 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:30:49,947-Speed 2987.09 samples/sec Loss 10.5074 LearningRate 0.0504 Epoch: 5 Global Step: 72020 Fp16 Grad Scale: 131072 Required: 17 hours 
Training: 2022-04-27 08:30:53,366-Speed 2995.63 samples/sec Loss 10.5658 LearningRate 0.0504 Epoch: 5 Global Step: 72030 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:30:56,744-Speed 3032.77 samples/sec Loss 10.4648 LearningRate 0.0504 Epoch: 5 Global Step: 72040 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:00,172-Speed 2987.78 samples/sec Loss 10.4619 LearningRate 0.0504 Epoch: 5 Global Step: 72050 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:03,554-Speed 3028.84 samples/sec Loss 10.5254 LearningRate 0.0504 Epoch: 5 Global Step: 72060 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:06,886-Speed 3074.50 samples/sec Loss 10.4253 LearningRate 0.0504 Epoch: 5 Global Step: 72070 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:10,205-Speed 3085.77 samples/sec Loss 10.3156 LearningRate 0.0504 Epoch: 5 Global Step: 72080 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:13,667-Speed 2958.88 samples/sec Loss 10.5082 LearningRate 0.0504 Epoch: 5 Global Step: 72090 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:17,050-Speed 3027.85 samples/sec Loss 10.2629 LearningRate 0.0504 Epoch: 5 Global Step: 72100 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:20,448-Speed 3015.06 samples/sec Loss 10.3715 LearningRate 0.0504 Epoch: 5 Global Step: 72110 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:23,837-Speed 3022.38 samples/sec Loss 10.3280 LearningRate 0.0504 Epoch: 5 Global Step: 72120 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:27,234-Speed 3014.88 samples/sec Loss 10.3258 LearningRate 0.0504 Epoch: 5 Global Step: 72130 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:31:30,576-Speed 3064.78 samples/sec Loss 10.3709 LearningRate 0.0504 Epoch: 5 Global Step: 72140 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:33,979-Speed 
3010.14 samples/sec Loss 10.5082 LearningRate 0.0503 Epoch: 5 Global Step: 72150 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:37,403-Speed 2991.76 samples/sec Loss 10.4847 LearningRate 0.0503 Epoch: 5 Global Step: 72160 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:40,819-Speed 2998.21 samples/sec Loss 10.4222 LearningRate 0.0503 Epoch: 5 Global Step: 72170 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:45,033-Speed 2430.31 samples/sec Loss 10.3397 LearningRate 0.0503 Epoch: 5 Global Step: 72180 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:48,442-Speed 3005.62 samples/sec Loss 10.4122 LearningRate 0.0503 Epoch: 5 Global Step: 72190 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:53,121-Speed 2188.88 samples/sec Loss 10.3602 LearningRate 0.0503 Epoch: 5 Global Step: 72200 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:31:57,129-Speed 2555.73 samples/sec Loss 10.4978 LearningRate 0.0503 Epoch: 5 Global Step: 72210 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:32:00,479-Speed 3058.21 samples/sec Loss 10.3418 LearningRate 0.0503 Epoch: 5 Global Step: 72220 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:32:04,590-Speed 2491.36 samples/sec Loss 10.4354 LearningRate 0.0503 Epoch: 5 Global Step: 72230 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:32:07,978-Speed 3023.06 samples/sec Loss 10.3518 LearningRate 0.0503 Epoch: 5 Global Step: 72240 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:32:11,401-Speed 2991.98 samples/sec Loss 10.4671 LearningRate 0.0503 Epoch: 5 Global Step: 72250 Fp16 Grad Scale: 131072 Required: 17 hours Training: 2022-04-27 08:32:14,748-Speed 3060.88 samples/sec Loss 10.2545 LearningRate 0.0503 Epoch: 5 Global Step: 72260 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:32:18,154-Speed 3007.28 samples/sec Loss 10.4326 
LearningRate 0.0503 Epoch: 5 Global Step: 72270 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:32:21,530-Speed 3034.61 samples/sec Loss 10.4205 LearningRate 0.0503 Epoch: 5 Global Step: 72280 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:32:24,959-Speed 2987.15 samples/sec Loss 10.5108 LearningRate 0.0503 Epoch: 5 Global Step: 72290 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:32:28,358-Speed 3013.18 samples/sec Loss 10.2528 LearningRate 0.0503 Epoch: 5 Global Step: 72300 Fp16 Grad Scale: 65536 Required: 17 hours Training: 2022-04-27 08:32:31,759-Speed 3011.89 samples/sec Loss 10.3837 LearningRate 0.0503 Epoch: 5 Global Step: 72310 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:32:35,131-Speed 3037.38 samples/sec Loss 10.3827 LearningRate 0.0503 Epoch: 5 Global Step: 72320 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:32:38,546-Speed 3000.18 samples/sec Loss 10.3095 LearningRate 0.0502 Epoch: 5 Global Step: 72330 Fp16 Grad Scale: 32768 Required: 17 hours Training: 2022-04-27 08:32:41,940-Speed 3017.80 samples/sec Loss 10.3826 LearningRate 0.0502 Epoch: 5 Global Step: 72340 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:32:45,303-Speed 3045.09 samples/sec Loss 10.2963 LearningRate 0.0502 Epoch: 5 Global Step: 72350 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:32:48,694-Speed 3021.16 samples/sec Loss 10.3069 LearningRate 0.0502 Epoch: 5 Global Step: 72360 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:32:52,063-Speed 3040.04 samples/sec Loss 10.5349 LearningRate 0.0502 Epoch: 5 Global Step: 72370 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:32:55,408-Speed 3062.80 samples/sec Loss 10.4027 LearningRate 0.0502 Epoch: 5 Global Step: 72380 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:32:58,770-Speed 3046.85 samples/sec Loss 10.3717 LearningRate 0.0502 Epoch: 5 Global Step: 
72390 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:33:02,211-Speed 2976.27 samples/sec Loss 10.4102 LearningRate 0.0502 Epoch: 5 Global Step: 72400 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:33:05,590-Speed 3031.32 samples/sec Loss 10.4473 LearningRate 0.0502 Epoch: 5 Global Step: 72410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:08,921-Speed 3074.58 samples/sec Loss 10.3473 LearningRate 0.0502 Epoch: 5 Global Step: 72420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:12,351-Speed 2986.76 samples/sec Loss 10.3422 LearningRate 0.0502 Epoch: 5 Global Step: 72430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:15,775-Speed 2992.68 samples/sec Loss 10.3371 LearningRate 0.0502 Epoch: 5 Global Step: 72440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:19,138-Speed 3046.13 samples/sec Loss 10.5771 LearningRate 0.0502 Epoch: 5 Global Step: 72450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:22,461-Speed 3081.70 samples/sec Loss 10.3407 LearningRate 0.0502 Epoch: 5 Global Step: 72460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:25,846-Speed 3026.20 samples/sec Loss 10.4264 LearningRate 0.0502 Epoch: 5 Global Step: 72470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:29,214-Speed 3041.38 samples/sec Loss 10.3647 LearningRate 0.0502 Epoch: 5 Global Step: 72480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:32,579-Speed 3043.93 samples/sec Loss 10.4161 LearningRate 0.0502 Epoch: 5 Global Step: 72490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:35,912-Speed 3073.49 samples/sec Loss 10.4541 LearningRate 0.0501 Epoch: 5 Global Step: 72500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:33:39,302-Speed 3021.73 samples/sec Loss 10.3491 LearningRate 0.0501 Epoch: 5 Global Step: 72510 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-04-27 08:33:42,622-Speed 3085.23 samples/sec Loss 10.3886 LearningRate 0.0501 Epoch: 5 Global Step: 72520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:33:46,068-Speed 2972.36 samples/sec Loss 10.3699 LearningRate 0.0501 Epoch: 5 Global Step: 72530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:33:49,460-Speed 3019.77 samples/sec Loss 10.3208 LearningRate 0.0501 Epoch: 5 Global Step: 72540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:33:52,842-Speed 3029.09 samples/sec Loss 10.4145 LearningRate 0.0501 Epoch: 5 Global Step: 72550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:33:56,208-Speed 3043.42 samples/sec Loss 10.2853 LearningRate 0.0501 Epoch: 5 Global Step: 72560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:33:59,573-Speed 3044.21 samples/sec Loss 10.2592 LearningRate 0.0501 Epoch: 5 Global Step: 72570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:04,211-Speed 2208.13 samples/sec Loss 10.2738 LearningRate 0.0501 Epoch: 5 Global Step: 72580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:07,648-Speed 2980.70 samples/sec Loss 10.3501 LearningRate 0.0501 Epoch: 5 Global Step: 72590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:11,060-Speed 3001.63 samples/sec Loss 10.4587 LearningRate 0.0501 Epoch: 5 Global Step: 72600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:14,445-Speed 3026.71 samples/sec Loss 10.3599 LearningRate 0.0501 Epoch: 5 Global Step: 72610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:17,850-Speed 3008.00 samples/sec Loss 10.2808 LearningRate 0.0501 Epoch: 5 Global Step: 72620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:21,260-Speed 3004.44 samples/sec Loss 10.4352 LearningRate 0.0501 Epoch: 5 Global Step: 72630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 
08:34:24,585-Speed 3080.22 samples/sec Loss 10.4621 LearningRate 0.0501 Epoch: 5 Global Step: 72640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:27,971-Speed 3024.77 samples/sec Loss 10.3826 LearningRate 0.0501 Epoch: 5 Global Step: 72650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:31,290-Speed 3086.95 samples/sec Loss 10.4165 LearningRate 0.0501 Epoch: 5 Global Step: 72660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:34:34,588-Speed 3104.86 samples/sec Loss 10.2374 LearningRate 0.0501 Epoch: 5 Global Step: 72670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:37,915-Speed 3078.99 samples/sec Loss 10.3428 LearningRate 0.0500 Epoch: 5 Global Step: 72680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:41,337-Speed 2993.67 samples/sec Loss 10.3632 LearningRate 0.0500 Epoch: 5 Global Step: 72690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:44,666-Speed 3077.74 samples/sec Loss 10.4557 LearningRate 0.0500 Epoch: 5 Global Step: 72700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:48,042-Speed 3034.01 samples/sec Loss 10.5162 LearningRate 0.0500 Epoch: 5 Global Step: 72710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:51,529-Speed 2937.71 samples/sec Loss 10.4015 LearningRate 0.0500 Epoch: 5 Global Step: 72720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:54,964-Speed 2981.78 samples/sec Loss 10.4715 LearningRate 0.0500 Epoch: 5 Global Step: 72730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:34:58,350-Speed 3025.36 samples/sec Loss 10.3654 LearningRate 0.0500 Epoch: 5 Global Step: 72740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:01,763-Speed 3000.83 samples/sec Loss 10.4574 LearningRate 0.0500 Epoch: 5 Global Step: 72750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:05,114-Speed 3056.83 samples/sec 
Loss 10.4467 LearningRate 0.0500 Epoch: 5 Global Step: 72760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:08,524-Speed 3003.73 samples/sec Loss 10.4942 LearningRate 0.0500 Epoch: 5 Global Step: 72770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:11,877-Speed 3055.17 samples/sec Loss 10.2488 LearningRate 0.0500 Epoch: 5 Global Step: 72780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:15,200-Speed 3082.55 samples/sec Loss 10.3627 LearningRate 0.0500 Epoch: 5 Global Step: 72790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:18,638-Speed 2979.01 samples/sec Loss 10.3235 LearningRate 0.0500 Epoch: 5 Global Step: 72800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:22,077-Speed 2978.79 samples/sec Loss 10.2140 LearningRate 0.0500 Epoch: 5 Global Step: 72810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:25,443-Speed 3042.73 samples/sec Loss 10.2631 LearningRate 0.0500 Epoch: 5 Global Step: 72820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:28,926-Speed 2940.80 samples/sec Loss 10.3067 LearningRate 0.0500 Epoch: 5 Global Step: 72830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:35:32,411-Speed 2939.14 samples/sec Loss 10.3584 LearningRate 0.0500 Epoch: 5 Global Step: 72840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:35,832-Speed 2994.48 samples/sec Loss 10.4718 LearningRate 0.0499 Epoch: 5 Global Step: 72850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:39,232-Speed 3012.77 samples/sec Loss 10.3875 LearningRate 0.0499 Epoch: 5 Global Step: 72860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:42,617-Speed 3025.79 samples/sec Loss 10.3112 LearningRate 0.0499 Epoch: 5 Global Step: 72870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:46,053-Speed 2981.17 samples/sec Loss 10.2045 LearningRate 0.0499 
Epoch: 5 Global Step: 72880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:49,532-Speed 2944.30 samples/sec Loss 10.3680 LearningRate 0.0499 Epoch: 5 Global Step: 72890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:52,910-Speed 3031.87 samples/sec Loss 10.4006 LearningRate 0.0499 Epoch: 5 Global Step: 72900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:56,288-Speed 3032.81 samples/sec Loss 10.2676 LearningRate 0.0499 Epoch: 5 Global Step: 72910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:35:59,641-Speed 3054.10 samples/sec Loss 10.1740 LearningRate 0.0499 Epoch: 5 Global Step: 72920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:02,992-Speed 3057.45 samples/sec Loss 10.3516 LearningRate 0.0499 Epoch: 5 Global Step: 72930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:06,362-Speed 3039.32 samples/sec Loss 10.3908 LearningRate 0.0499 Epoch: 5 Global Step: 72940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:36:09,750-Speed 3023.21 samples/sec Loss 10.3318 LearningRate 0.0499 Epoch: 5 Global Step: 72950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:13,270-Speed 2910.01 samples/sec Loss 10.2788 LearningRate 0.0499 Epoch: 5 Global Step: 72960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:16,665-Speed 3016.56 samples/sec Loss 10.2621 LearningRate 0.0499 Epoch: 5 Global Step: 72970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:20,073-Speed 3005.96 samples/sec Loss 10.4643 LearningRate 0.0499 Epoch: 5 Global Step: 72980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:23,415-Speed 3064.98 samples/sec Loss 10.3525 LearningRate 0.0499 Epoch: 5 Global Step: 72990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:26,755-Speed 3067.15 samples/sec Loss 10.2337 LearningRate 0.0499 Epoch: 5 Global Step: 73000 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:30,166-Speed 3002.76 samples/sec Loss 10.1478 LearningRate 0.0499 Epoch: 5 Global Step: 73010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:33,627-Speed 2959.51 samples/sec Loss 10.5560 LearningRate 0.0499 Epoch: 5 Global Step: 73020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:37,141-Speed 2915.07 samples/sec Loss 10.4411 LearningRate 0.0498 Epoch: 5 Global Step: 73030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:40,570-Speed 2986.71 samples/sec Loss 10.4128 LearningRate 0.0498 Epoch: 5 Global Step: 73040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:44,002-Speed 2984.63 samples/sec Loss 10.3939 LearningRate 0.0498 Epoch: 5 Global Step: 73050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:36:47,424-Speed 2992.58 samples/sec Loss 10.4430 LearningRate 0.0498 Epoch: 5 Global Step: 73060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:50,863-Speed 2978.43 samples/sec Loss 10.2894 LearningRate 0.0498 Epoch: 5 Global Step: 73070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:54,226-Speed 3045.88 samples/sec Loss 10.2948 LearningRate 0.0498 Epoch: 5 Global Step: 73080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:36:57,563-Speed 3069.92 samples/sec Loss 10.3296 LearningRate 0.0498 Epoch: 5 Global Step: 73090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:00,966-Speed 3010.35 samples/sec Loss 10.4269 LearningRate 0.0498 Epoch: 5 Global Step: 73100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:04,328-Speed 3045.96 samples/sec Loss 10.2462 LearningRate 0.0498 Epoch: 5 Global Step: 73110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:07,705-Speed 3033.44 samples/sec Loss 10.4742 LearningRate 0.0498 Epoch: 5 Global Step: 73120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-27 08:37:11,088-Speed 3027.73 samples/sec Loss 10.2588 LearningRate 0.0498 Epoch: 5 Global Step: 73130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:14,401-Speed 3092.50 samples/sec Loss 10.3065 LearningRate 0.0498 Epoch: 5 Global Step: 73140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:17,813-Speed 3001.15 samples/sec Loss 10.4049 LearningRate 0.0498 Epoch: 5 Global Step: 73150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:21,212-Speed 3014.10 samples/sec Loss 10.1118 LearningRate 0.0498 Epoch: 5 Global Step: 73160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:24,575-Speed 3045.91 samples/sec Loss 10.2891 LearningRate 0.0498 Epoch: 5 Global Step: 73170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:27,948-Speed 3036.66 samples/sec Loss 10.3075 LearningRate 0.0498 Epoch: 5 Global Step: 73180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:31,286-Speed 3068.08 samples/sec Loss 10.2967 LearningRate 0.0498 Epoch: 5 Global Step: 73190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:34,660-Speed 3036.07 samples/sec Loss 10.4052 LearningRate 0.0498 Epoch: 5 Global Step: 73200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:38,006-Speed 3061.28 samples/sec Loss 10.3757 LearningRate 0.0497 Epoch: 5 Global Step: 73210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:41,361-Speed 3053.12 samples/sec Loss 10.2287 LearningRate 0.0497 Epoch: 5 Global Step: 73220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:44,747-Speed 3025.31 samples/sec Loss 10.1888 LearningRate 0.0497 Epoch: 5 Global Step: 73230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:48,134-Speed 3023.39 samples/sec Loss 10.3929 LearningRate 0.0497 Epoch: 5 Global Step: 73240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:51,528-Speed 
3017.90 samples/sec Loss 10.2948 LearningRate 0.0497 Epoch: 5 Global Step: 73250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:37:54,842-Speed 3090.88 samples/sec Loss 10.3308 LearningRate 0.0497 Epoch: 5 Global Step: 73260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:37:58,173-Speed 3075.03 samples/sec Loss 10.3851 LearningRate 0.0497 Epoch: 5 Global Step: 73270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:01,632-Speed 2960.89 samples/sec Loss 10.3053 LearningRate 0.0497 Epoch: 5 Global Step: 73280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:05,054-Speed 2993.66 samples/sec Loss 10.2853 LearningRate 0.0497 Epoch: 5 Global Step: 73290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:08,470-Speed 2998.15 samples/sec Loss 10.3600 LearningRate 0.0497 Epoch: 5 Global Step: 73300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:11,843-Speed 3036.67 samples/sec Loss 10.2980 LearningRate 0.0497 Epoch: 5 Global Step: 73310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:15,302-Speed 2961.09 samples/sec Loss 10.2996 LearningRate 0.0497 Epoch: 5 Global Step: 73320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:18,624-Speed 3084.13 samples/sec Loss 10.4179 LearningRate 0.0497 Epoch: 5 Global Step: 73330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:21,998-Speed 3034.87 samples/sec Loss 10.3767 LearningRate 0.0497 Epoch: 5 Global Step: 73340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:25,396-Speed 3014.31 samples/sec Loss 10.3097 LearningRate 0.0497 Epoch: 5 Global Step: 73350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:38:28,771-Speed 3035.34 samples/sec Loss 10.2880 LearningRate 0.0497 Epoch: 5 Global Step: 73360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:32,120-Speed 3058.38 samples/sec Loss 10.1940 
LearningRate 0.0497 Epoch: 5 Global Step: 73370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:35,530-Speed 3004.10 samples/sec Loss 10.2957 LearningRate 0.0496 Epoch: 5 Global Step: 73380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:38,919-Speed 3022.18 samples/sec Loss 10.2298 LearningRate 0.0496 Epoch: 5 Global Step: 73390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:42,339-Speed 2994.95 samples/sec Loss 10.4740 LearningRate 0.0496 Epoch: 5 Global Step: 73400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:45,705-Speed 3043.61 samples/sec Loss 10.3303 LearningRate 0.0496 Epoch: 5 Global Step: 73410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:49,027-Speed 3082.96 samples/sec Loss 10.2071 LearningRate 0.0496 Epoch: 5 Global Step: 73420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:52,391-Speed 3045.10 samples/sec Loss 10.2561 LearningRate 0.0496 Epoch: 5 Global Step: 73430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:55,783-Speed 3019.45 samples/sec Loss 10.1875 LearningRate 0.0496 Epoch: 5 Global Step: 73440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:38:59,138-Speed 3053.77 samples/sec Loss 10.1387 LearningRate 0.0496 Epoch: 5 Global Step: 73450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:39:02,477-Speed 3067.37 samples/sec Loss 10.1845 LearningRate 0.0496 Epoch: 5 Global Step: 73460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:39:05,799-Speed 3083.20 samples/sec Loss 10.2722 LearningRate 0.0496 Epoch: 5 Global Step: 73470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:39:09,180-Speed 3030.35 samples/sec Loss 10.2357 LearningRate 0.0496 Epoch: 5 Global Step: 73480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:39:12,496-Speed 3088.34 samples/sec Loss 10.3682 LearningRate 0.0496 Epoch: 5 
Global Step: 73490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:39:15,849-Speed 3055.38 samples/sec Loss 10.1973 LearningRate 0.0496 Epoch: 5 Global Step: 73500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:39:19,118-Speed 3132.86 samples/sec Loss 10.4214 LearningRate 0.0496 Epoch: 5 Global Step: 73510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:22,482-Speed 3044.74 samples/sec Loss 10.2996 LearningRate 0.0496 Epoch: 5 Global Step: 73520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:25,808-Speed 3080.15 samples/sec Loss 10.2957 LearningRate 0.0496 Epoch: 5 Global Step: 73530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:29,118-Speed 3094.16 samples/sec Loss 10.4521 LearningRate 0.0496 Epoch: 5 Global Step: 73540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:32,509-Speed 3020.93 samples/sec Loss 10.3228 LearningRate 0.0496 Epoch: 5 Global Step: 73550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:35,890-Speed 3029.07 samples/sec Loss 10.2694 LearningRate 0.0495 Epoch: 5 Global Step: 73560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:39,347-Speed 2962.61 samples/sec Loss 10.4077 LearningRate 0.0495 Epoch: 5 Global Step: 73570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:42,784-Speed 2980.83 samples/sec Loss 10.3145 LearningRate 0.0495 Epoch: 5 Global Step: 73580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:46,126-Speed 3064.14 samples/sec Loss 10.3809 LearningRate 0.0495 Epoch: 5 Global Step: 73590 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:49,487-Speed 3047.55 samples/sec Loss 10.3297 LearningRate 0.0495 Epoch: 5 Global Step: 73600 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:39:52,836-Speed 3058.74 samples/sec Loss 10.2545 LearningRate 0.0495 Epoch: 5 Global Step: 73610 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-27 08:39:56,352-Speed 2912.94 samples/sec Loss 10.1957 LearningRate 0.0495 Epoch: 5 Global Step: 73620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:39:59,729-Speed 3033.49 samples/sec Loss 10.3972 LearningRate 0.0495 Epoch: 5 Global Step: 73630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:03,100-Speed 3037.92 samples/sec Loss 10.3055 LearningRate 0.0495 Epoch: 5 Global Step: 73640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:06,467-Speed 3042.51 samples/sec Loss 10.2458 LearningRate 0.0495 Epoch: 5 Global Step: 73650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:09,888-Speed 2993.89 samples/sec Loss 10.1908 LearningRate 0.0495 Epoch: 5 Global Step: 73660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:13,261-Speed 3036.67 samples/sec Loss 10.2245 LearningRate 0.0495 Epoch: 5 Global Step: 73670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:16,569-Speed 3096.70 samples/sec Loss 10.4708 LearningRate 0.0495 Epoch: 5 Global Step: 73680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:19,886-Speed 3088.15 samples/sec Loss 10.2948 LearningRate 0.0495 Epoch: 5 Global Step: 73690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:40:23,260-Speed 3035.49 samples/sec Loss 10.4001 LearningRate 0.0495 Epoch: 5 Global Step: 73700 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:26,691-Speed 2985.44 samples/sec Loss 10.3691 LearningRate 0.0495 Epoch: 5 Global Step: 73710 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:30,064-Speed 3036.41 samples/sec Loss 10.2779 LearningRate 0.0495 Epoch: 5 Global Step: 73720 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:33,416-Speed 3055.86 samples/sec Loss 10.3558 LearningRate 0.0494 Epoch: 5 Global Step: 73730 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 
08:40:36,792-Speed 3034.00 samples/sec Loss 10.2807 LearningRate 0.0494 Epoch: 5 Global Step: 73740 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:40,160-Speed 3041.82 samples/sec Loss 10.3151 LearningRate 0.0494 Epoch: 5 Global Step: 73750 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:43,540-Speed 3029.88 samples/sec Loss 10.2772 LearningRate 0.0494 Epoch: 5 Global Step: 73760 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:46,938-Speed 3015.00 samples/sec Loss 10.2474 LearningRate 0.0494 Epoch: 5 Global Step: 73770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:50,321-Speed 3027.56 samples/sec Loss 10.4037 LearningRate 0.0494 Epoch: 5 Global Step: 73780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:53,802-Speed 2942.14 samples/sec Loss 10.2101 LearningRate 0.0494 Epoch: 5 Global Step: 73790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:40:57,264-Speed 2959.49 samples/sec Loss 10.2783 LearningRate 0.0494 Epoch: 5 Global Step: 73800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:00,665-Speed 3011.80 samples/sec Loss 10.3216 LearningRate 0.0494 Epoch: 5 Global Step: 73810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:04,064-Speed 3013.84 samples/sec Loss 10.4767 LearningRate 0.0494 Epoch: 5 Global Step: 73820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:07,482-Speed 2997.52 samples/sec Loss 10.3472 LearningRate 0.0494 Epoch: 5 Global Step: 73830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:10,836-Speed 3053.91 samples/sec Loss 10.4145 LearningRate 0.0494 Epoch: 5 Global Step: 73840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:14,203-Speed 3042.67 samples/sec Loss 10.3138 LearningRate 0.0494 Epoch: 5 Global Step: 73850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:17,564-Speed 3047.31 samples/sec Loss 
10.3832 LearningRate 0.0494 Epoch: 5 Global Step: 73860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:20,963-Speed 3012.89 samples/sec Loss 10.2813 LearningRate 0.0494 Epoch: 5 Global Step: 73870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:24,436-Speed 2949.59 samples/sec Loss 10.3801 LearningRate 0.0494 Epoch: 5 Global Step: 73880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:27,852-Speed 2998.74 samples/sec Loss 10.2891 LearningRate 0.0494 Epoch: 5 Global Step: 73890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:41:31,274-Speed 2993.29 samples/sec Loss 10.1221 LearningRate 0.0494 Epoch: 5 Global Step: 73900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:34,595-Speed 3083.45 samples/sec Loss 10.2639 LearningRate 0.0493 Epoch: 5 Global Step: 73910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:37,962-Speed 3042.38 samples/sec Loss 10.1513 LearningRate 0.0493 Epoch: 5 Global Step: 73920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:41,321-Speed 3050.01 samples/sec Loss 10.3111 LearningRate 0.0493 Epoch: 5 Global Step: 73930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:44,759-Speed 2979.39 samples/sec Loss 10.2671 LearningRate 0.0493 Epoch: 5 Global Step: 73940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:48,129-Speed 3038.93 samples/sec Loss 10.3032 LearningRate 0.0493 Epoch: 5 Global Step: 73950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:51,608-Speed 2944.14 samples/sec Loss 10.2279 LearningRate 0.0493 Epoch: 5 Global Step: 73960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:54,994-Speed 3024.92 samples/sec Loss 10.3748 LearningRate 0.0493 Epoch: 5 Global Step: 73970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:41:58,388-Speed 3018.16 samples/sec Loss 10.3386 LearningRate 0.0493 Epoch: 
5 Global Step: 73980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:01,861-Speed 2949.20 samples/sec Loss 10.2583 LearningRate 0.0493 Epoch: 5 Global Step: 73990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:05,242-Speed 3030.13 samples/sec Loss 10.2606 LearningRate 0.0493 Epoch: 5 Global Step: 74000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:08,637-Speed 3016.55 samples/sec Loss 10.2729 LearningRate 0.0493 Epoch: 5 Global Step: 74010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:11,988-Speed 3056.51 samples/sec Loss 10.2261 LearningRate 0.0493 Epoch: 5 Global Step: 74020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:15,343-Speed 3052.84 samples/sec Loss 10.2448 LearningRate 0.0493 Epoch: 5 Global Step: 74030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:18,760-Speed 2997.18 samples/sec Loss 10.2426 LearningRate 0.0493 Epoch: 5 Global Step: 74040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:22,135-Speed 3035.06 samples/sec Loss 10.3444 LearningRate 0.0493 Epoch: 5 Global Step: 74050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:25,521-Speed 3025.88 samples/sec Loss 10.2757 LearningRate 0.0493 Epoch: 5 Global Step: 74060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:28,956-Speed 2981.13 samples/sec Loss 10.1542 LearningRate 0.0493 Epoch: 5 Global Step: 74070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:32,295-Speed 3067.55 samples/sec Loss 10.4233 LearningRate 0.0493 Epoch: 5 Global Step: 74080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:35,650-Speed 3053.46 samples/sec Loss 10.3772 LearningRate 0.0492 Epoch: 5 Global Step: 74090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:39,026-Speed 3033.45 samples/sec Loss 10.3001 LearningRate 0.0492 Epoch: 5 Global Step: 74100 Fp16 Grad 
Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:42,462-Speed 2981.38 samples/sec Loss 10.2053 LearningRate 0.0492 Epoch: 5 Global Step: 74110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:45,851-Speed 3022.44 samples/sec Loss 10.2481 LearningRate 0.0492 Epoch: 5 Global Step: 74120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:49,270-Speed 2995.89 samples/sec Loss 10.2462 LearningRate 0.0492 Epoch: 5 Global Step: 74130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:52,619-Speed 3058.72 samples/sec Loss 10.2839 LearningRate 0.0492 Epoch: 5 Global Step: 74140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:42:56,014-Speed 3018.01 samples/sec Loss 10.1839 LearningRate 0.0492 Epoch: 5 Global Step: 74150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:42:59,432-Speed 2996.59 samples/sec Loss 10.2773 LearningRate 0.0492 Epoch: 5 Global Step: 74160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:02,847-Speed 2999.32 samples/sec Loss 10.1461 LearningRate 0.0492 Epoch: 5 Global Step: 74170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:06,247-Speed 3012.31 samples/sec Loss 10.2035 LearningRate 0.0492 Epoch: 5 Global Step: 74180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:09,580-Speed 3073.41 samples/sec Loss 10.2782 LearningRate 0.0492 Epoch: 5 Global Step: 74190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:12,961-Speed 3029.77 samples/sec Loss 10.3336 LearningRate 0.0492 Epoch: 5 Global Step: 74200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:16,380-Speed 2995.89 samples/sec Loss 10.1883 LearningRate 0.0492 Epoch: 5 Global Step: 74210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:19,731-Speed 3057.20 samples/sec Loss 10.1014 LearningRate 0.0492 Epoch: 5 Global Step: 74220 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-27 08:43:23,128-Speed 3015.63 samples/sec Loss 10.2516 LearningRate 0.0492 Epoch: 5 Global Step: 74230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:43:26,493-Speed 3044.70 samples/sec Loss 10.3004 LearningRate 0.0492 Epoch: 5 Global Step: 74240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:29,962-Speed 2952.18 samples/sec Loss 10.2467 LearningRate 0.0492 Epoch: 5 Global Step: 74250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:33,368-Speed 3008.41 samples/sec Loss 10.1543 LearningRate 0.0492 Epoch: 5 Global Step: 74260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:36,694-Speed 3078.96 samples/sec Loss 10.2853 LearningRate 0.0491 Epoch: 5 Global Step: 74270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:40,049-Speed 3053.33 samples/sec Loss 10.3075 LearningRate 0.0491 Epoch: 5 Global Step: 74280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:43,439-Speed 3020.88 samples/sec Loss 10.3789 LearningRate 0.0491 Epoch: 5 Global Step: 74290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:46,801-Speed 3046.77 samples/sec Loss 10.2828 LearningRate 0.0491 Epoch: 5 Global Step: 74300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:50,165-Speed 3044.98 samples/sec Loss 10.2135 LearningRate 0.0491 Epoch: 5 Global Step: 74310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:53,519-Speed 3053.76 samples/sec Loss 10.1096 LearningRate 0.0491 Epoch: 5 Global Step: 74320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:43:56,988-Speed 2952.87 samples/sec Loss 10.2387 LearningRate 0.0491 Epoch: 5 Global Step: 74330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:00,403-Speed 2998.79 samples/sec Loss 10.3560 LearningRate 0.0491 Epoch: 5 Global Step: 74340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:03,784-Speed 
3030.16 samples/sec Loss 10.2618 LearningRate 0.0491 Epoch: 5 Global Step: 74350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:07,180-Speed 3016.36 samples/sec Loss 10.1436 LearningRate 0.0491 Epoch: 5 Global Step: 74360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:10,558-Speed 3031.99 samples/sec Loss 10.2609 LearningRate 0.0491 Epoch: 5 Global Step: 74370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:13,967-Speed 3004.18 samples/sec Loss 10.2685 LearningRate 0.0491 Epoch: 5 Global Step: 74380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:17,337-Speed 3039.85 samples/sec Loss 10.1258 LearningRate 0.0491 Epoch: 5 Global Step: 74390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:20,759-Speed 2993.13 samples/sec Loss 10.2211 LearningRate 0.0491 Epoch: 5 Global Step: 74400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:24,123-Speed 3044.55 samples/sec Loss 10.3786 LearningRate 0.0491 Epoch: 5 Global Step: 74410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:27,524-Speed 3012.59 samples/sec Loss 10.2506 LearningRate 0.0491 Epoch: 5 Global Step: 74420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:44:30,866-Speed 3064.10 samples/sec Loss 10.2067 LearningRate 0.0491 Epoch: 5 Global Step: 74430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:34,261-Speed 3017.48 samples/sec Loss 10.1625 LearningRate 0.0490 Epoch: 5 Global Step: 74440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:37,656-Speed 3016.82 samples/sec Loss 10.1750 LearningRate 0.0490 Epoch: 5 Global Step: 74450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:41,021-Speed 3043.52 samples/sec Loss 10.1639 LearningRate 0.0490 Epoch: 5 Global Step: 74460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:44,504-Speed 2941.53 samples/sec Loss 10.3189 
LearningRate 0.0490 Epoch: 5 Global Step: 74470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:47,917-Speed 3001.21 samples/sec Loss 10.1237 LearningRate 0.0490 Epoch: 5 Global Step: 74480 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:51,315-Speed 3013.86 samples/sec Loss 10.2248 LearningRate 0.0490 Epoch: 5 Global Step: 74490 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:54,800-Speed 2939.24 samples/sec Loss 10.4323 LearningRate 0.0490 Epoch: 5 Global Step: 74500 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:44:58,192-Speed 3019.95 samples/sec Loss 10.1319 LearningRate 0.0490 Epoch: 5 Global Step: 74510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:45:01,908-Speed 2756.72 samples/sec Loss 10.3306 LearningRate 0.0490 Epoch: 5 Global Step: 74520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:45:33,871-Speed 320.39 samples/sec Loss 9.6998 LearningRate 0.0490 Epoch: 6 Global Step: 74530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:45:37,487-Speed 2833.87 samples/sec Loss 8.9480 LearningRate 0.0490 Epoch: 6 Global Step: 74540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:45:41,025-Speed 2894.89 samples/sec Loss 8.8148 LearningRate 0.0490 Epoch: 6 Global Step: 74550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:45:44,481-Speed 2964.42 samples/sec Loss 8.7749 LearningRate 0.0490 Epoch: 6 Global Step: 74560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:45:47,911-Speed 2986.38 samples/sec Loss 8.6834 LearningRate 0.0490 Epoch: 6 Global Step: 74570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:45:51,311-Speed 3013.05 samples/sec Loss 8.7957 LearningRate 0.0490 Epoch: 6 Global Step: 74580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:45:54,739-Speed 2988.34 samples/sec Loss 8.8649 LearningRate 0.0490 Epoch: 6 Global Step: 74590 
Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:45:58,127-Speed 3023.51 samples/sec Loss 8.7687 LearningRate 0.0490 Epoch: 6 Global Step: 74600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:46:01,553-Speed 2990.12 samples/sec Loss 8.7817 LearningRate 0.0490 Epoch: 6 Global Step: 74610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:46:05,065-Speed 2916.00 samples/sec Loss 8.7817 LearningRate 0.0489 Epoch: 6 Global Step: 74620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:46:08,743-Speed 2785.06 samples/sec Loss 8.8454 LearningRate 0.0489 Epoch: 6 Global Step: 74630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:46:12,174-Speed 2996.80 samples/sec Loss 8.8552 LearningRate 0.0489 Epoch: 6 Global Step: 74640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:46:15,541-Speed 3042.33 samples/sec Loss 8.7392 LearningRate 0.0489 Epoch: 6 Global Step: 74650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:46:18,933-Speed 3019.45 samples/sec Loss 8.8558 LearningRate 0.0489 Epoch: 6 Global Step: 74660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:22,376-Speed 2974.97 samples/sec Loss 9.0353 LearningRate 0.0489 Epoch: 6 Global Step: 74670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:25,749-Speed 3037.25 samples/sec Loss 8.8821 LearningRate 0.0489 Epoch: 6 Global Step: 74680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:29,099-Speed 3058.01 samples/sec Loss 8.9492 LearningRate 0.0489 Epoch: 6 Global Step: 74690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:32,534-Speed 2981.22 samples/sec Loss 8.9338 LearningRate 0.0489 Epoch: 6 Global Step: 74700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:35,970-Speed 2981.41 samples/sec Loss 8.7856 LearningRate 0.0489 Epoch: 6 Global Step: 74710 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-27 08:46:39,330-Speed 3048.67 samples/sec Loss 8.8833 LearningRate 0.0489 Epoch: 6 Global Step: 74720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:42,738-Speed 3005.39 samples/sec Loss 8.8761 LearningRate 0.0489 Epoch: 6 Global Step: 74730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:46,101-Speed 3045.99 samples/sec Loss 8.7385 LearningRate 0.0489 Epoch: 6 Global Step: 74740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:49,554-Speed 2966.73 samples/sec Loss 9.0346 LearningRate 0.0489 Epoch: 6 Global Step: 74750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:52,902-Speed 3059.56 samples/sec Loss 8.9555 LearningRate 0.0489 Epoch: 6 Global Step: 74760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:56,283-Speed 3030.47 samples/sec Loss 8.9609 LearningRate 0.0489 Epoch: 6 Global Step: 74770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:46:59,649-Speed 3042.92 samples/sec Loss 8.9388 LearningRate 0.0489 Epoch: 6 Global Step: 74780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:47:03,035-Speed 3025.20 samples/sec Loss 8.9309 LearningRate 0.0489 Epoch: 6 Global Step: 74790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:47:06,379-Speed 3062.65 samples/sec Loss 8.8191 LearningRate 0.0488 Epoch: 6 Global Step: 74800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:09,739-Speed 3048.25 samples/sec Loss 9.0659 LearningRate 0.0488 Epoch: 6 Global Step: 74810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:13,190-Speed 2968.52 samples/sec Loss 9.0627 LearningRate 0.0488 Epoch: 6 Global Step: 74820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:16,645-Speed 2965.00 samples/sec Loss 8.8793 LearningRate 0.0488 Epoch: 6 Global Step: 74830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:20,084-Speed 2977.69 
samples/sec Loss 9.0565 LearningRate 0.0488 Epoch: 6 Global Step: 74840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:23,578-Speed 2931.75 samples/sec Loss 9.1583 LearningRate 0.0488 Epoch: 6 Global Step: 74850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:26,984-Speed 3007.84 samples/sec Loss 8.9714 LearningRate 0.0488 Epoch: 6 Global Step: 74860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:30,320-Speed 3070.06 samples/sec Loss 9.0181 LearningRate 0.0488 Epoch: 6 Global Step: 74870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:33,700-Speed 3030.44 samples/sec Loss 9.0502 LearningRate 0.0488 Epoch: 6 Global Step: 74880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:37,107-Speed 3006.82 samples/sec Loss 8.9486 LearningRate 0.0488 Epoch: 6 Global Step: 74890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:47:40,525-Speed 2996.57 samples/sec Loss 8.9196 LearningRate 0.0488 Epoch: 6 Global Step: 74900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:47:43,922-Speed 3015.32 samples/sec Loss 8.9909 LearningRate 0.0488 Epoch: 6 Global Step: 74910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:47:47,305-Speed 3028.58 samples/sec Loss 8.9809 LearningRate 0.0488 Epoch: 6 Global Step: 74920 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:47:50,684-Speed 3031.36 samples/sec Loss 9.1026 LearningRate 0.0488 Epoch: 6 Global Step: 74930 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:47:53,997-Speed 3091.93 samples/sec Loss 8.9917 LearningRate 0.0488 Epoch: 6 Global Step: 74940 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:47:57,387-Speed 3021.55 samples/sec Loss 9.0385 LearningRate 0.0488 Epoch: 6 Global Step: 74950 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:00,732-Speed 3061.77 samples/sec Loss 8.8799 LearningRate 0.0488 Epoch: 6 
Global Step: 74960 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:04,088-Speed 3051.88 samples/sec Loss 9.0861 LearningRate 0.0488 Epoch: 6 Global Step: 74970 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:07,510-Speed 2993.58 samples/sec Loss 9.0813 LearningRate 0.0487 Epoch: 6 Global Step: 74980 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:10,862-Speed 3056.20 samples/sec Loss 9.0176 LearningRate 0.0487 Epoch: 6 Global Step: 74990 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:14,205-Speed 3063.71 samples/sec Loss 9.1416 LearningRate 0.0487 Epoch: 6 Global Step: 75000 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:17,667-Speed 2959.12 samples/sec Loss 9.1205 LearningRate 0.0487 Epoch: 6 Global Step: 75010 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 08:48:21,060-Speed 3018.76 samples/sec Loss 9.0506 LearningRate 0.0487 Epoch: 6 Global Step: 75020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:24,458-Speed 3014.93 samples/sec Loss 9.1461 LearningRate 0.0487 Epoch: 6 Global Step: 75030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:27,853-Speed 3017.41 samples/sec Loss 9.0471 LearningRate 0.0487 Epoch: 6 Global Step: 75040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:31,221-Speed 3041.01 samples/sec Loss 9.1475 LearningRate 0.0487 Epoch: 6 Global Step: 75050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:34,560-Speed 3067.87 samples/sec Loss 9.0485 LearningRate 0.0487 Epoch: 6 Global Step: 75060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:37,886-Speed 3079.41 samples/sec Loss 9.0507 LearningRate 0.0487 Epoch: 6 Global Step: 75070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:41,287-Speed 3011.39 samples/sec Loss 9.1527 LearningRate 0.0487 Epoch: 6 Global Step: 75080 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 08:48:44,677-Speed 3022.27 samples/sec Loss 9.0377 LearningRate 0.0487 Epoch: 6 Global Step: 75090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:48,064-Speed 3024.38 samples/sec Loss 9.0279 LearningRate 0.0487 Epoch: 6 Global Step: 75100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:51,439-Speed 3034.55 samples/sec Loss 9.1755 LearningRate 0.0487 Epoch: 6 Global Step: 75110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:48:54,876-Speed 2979.99 samples/sec Loss 9.1152 LearningRate 0.0487 Epoch: 6 Global Step: 75120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:48:58,263-Speed 3024.14 samples/sec Loss 9.0845 LearningRate 0.0487 Epoch: 6 Global Step: 75130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:01,651-Speed 3023.74 samples/sec Loss 9.0695 LearningRate 0.0487 Epoch: 6 Global Step: 75140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:05,063-Speed 3002.00 samples/sec Loss 9.1648 LearningRate 0.0486 Epoch: 6 Global Step: 75150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:08,453-Speed 3020.94 samples/sec Loss 9.1358 LearningRate 0.0486 Epoch: 6 Global Step: 75160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:11,909-Speed 2963.76 samples/sec Loss 9.1411 LearningRate 0.0486 Epoch: 6 Global Step: 75170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:15,327-Speed 2996.60 samples/sec Loss 8.9664 LearningRate 0.0486 Epoch: 6 Global Step: 75180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:18,716-Speed 3022.23 samples/sec Loss 9.3463 LearningRate 0.0486 Epoch: 6 Global Step: 75190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:22,114-Speed 3014.87 samples/sec Loss 9.1414 LearningRate 0.0486 Epoch: 6 Global Step: 75200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:25,521-Speed 
3006.14 samples/sec Loss 9.1977 LearningRate 0.0486 Epoch: 6 Global Step: 75210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:28,882-Speed 3047.24 samples/sec Loss 9.2658 LearningRate 0.0486 Epoch: 6 Global Step: 75220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:32,288-Speed 3007.66 samples/sec Loss 9.2224 LearningRate 0.0486 Epoch: 6 Global Step: 75230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:35,710-Speed 2993.00 samples/sec Loss 9.1116 LearningRate 0.0486 Epoch: 6 Global Step: 75240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:39,086-Speed 3034.27 samples/sec Loss 9.2883 LearningRate 0.0486 Epoch: 6 Global Step: 75250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:42,434-Speed 3058.90 samples/sec Loss 9.3048 LearningRate 0.0486 Epoch: 6 Global Step: 75260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:49:45,786-Speed 3055.93 samples/sec Loss 9.2620 LearningRate 0.0486 Epoch: 6 Global Step: 75270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:49:49,122-Speed 3070.58 samples/sec Loss 9.1512 LearningRate 0.0486 Epoch: 6 Global Step: 75280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:49:52,563-Speed 2976.72 samples/sec Loss 9.1720 LearningRate 0.0486 Epoch: 6 Global Step: 75290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:49:55,944-Speed 3029.84 samples/sec Loss 9.1822 LearningRate 0.0486 Epoch: 6 Global Step: 75300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:49:59,335-Speed 3021.03 samples/sec Loss 9.2114 LearningRate 0.0486 Epoch: 6 Global Step: 75310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:50:02,664-Speed 3076.72 samples/sec Loss 9.1633 LearningRate 0.0486 Epoch: 6 Global Step: 75320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:50:06,059-Speed 3016.99 samples/sec Loss 9.3064 LearningRate 
0.0485 Epoch: 6 Global Step: 75330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:50:09,431-Speed 3038.09 samples/sec Loss 9.1933 LearningRate 0.0485 Epoch: 6 Global Step: 75340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:50:12,841-Speed 3004.49 samples/sec Loss 9.3679 LearningRate 0.0485 Epoch: 6 Global Step: 75350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:50:16,293-Speed 2966.74 samples/sec Loss 9.2500 LearningRate 0.0485 Epoch: 6 Global Step: 75360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:50:19,713-Speed 2995.05 samples/sec Loss 9.2270 LearningRate 0.0485 Epoch: 6 Global Step: 75370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:23,089-Speed 3034.68 samples/sec Loss 9.1307 LearningRate 0.0485 Epoch: 6 Global Step: 75380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:26,490-Speed 3010.92 samples/sec Loss 9.3491 LearningRate 0.0485 Epoch: 6 Global Step: 75390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:29,873-Speed 3028.42 samples/sec Loss 9.2493 LearningRate 0.0485 Epoch: 6 Global Step: 75400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:33,241-Speed 3041.18 samples/sec Loss 9.2028 LearningRate 0.0485 Epoch: 6 Global Step: 75410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:36,621-Speed 3030.30 samples/sec Loss 9.3683 LearningRate 0.0485 Epoch: 6 Global Step: 75420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:39,996-Speed 3034.54 samples/sec Loss 9.2578 LearningRate 0.0485 Epoch: 6 Global Step: 75430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:43,397-Speed 3011.45 samples/sec Loss 9.2727 LearningRate 0.0485 Epoch: 6 Global Step: 75440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:46,786-Speed 3022.95 samples/sec Loss 9.3441 LearningRate 0.0485 Epoch: 6 Global Step: 75450 Fp16 Grad 
Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:50,153-Speed 3041.77 samples/sec Loss 9.1911 LearningRate 0.0485 Epoch: 6 Global Step: 75460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:53,627-Speed 2948.60 samples/sec Loss 9.3408 LearningRate 0.0485 Epoch: 6 Global Step: 75470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:50:57,005-Speed 3032.82 samples/sec Loss 9.3937 LearningRate 0.0485 Epoch: 6 Global Step: 75480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:00,427-Speed 2992.93 samples/sec Loss 9.2078 LearningRate 0.0485 Epoch: 6 Global Step: 75490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:03,862-Speed 2982.22 samples/sec Loss 9.2721 LearningRate 0.0485 Epoch: 6 Global Step: 75500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:07,338-Speed 2947.89 samples/sec Loss 9.3184 LearningRate 0.0484 Epoch: 6 Global Step: 75510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:10,738-Speed 3012.44 samples/sec Loss 9.2335 LearningRate 0.0484 Epoch: 6 Global Step: 75520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:14,215-Speed 2945.52 samples/sec Loss 9.3551 LearningRate 0.0484 Epoch: 6 Global Step: 75530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:17,560-Speed 3062.15 samples/sec Loss 9.4974 LearningRate 0.0484 Epoch: 6 Global Step: 75540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:20,990-Speed 2986.24 samples/sec Loss 9.4088 LearningRate 0.0484 Epoch: 6 Global Step: 75550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:24,407-Speed 2997.61 samples/sec Loss 9.3839 LearningRate 0.0484 Epoch: 6 Global Step: 75560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:27,790-Speed 3028.21 samples/sec Loss 9.4525 LearningRate 0.0484 Epoch: 6 Global Step: 75570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-27 08:51:31,129-Speed 3067.81 samples/sec Loss 9.3630 LearningRate 0.0484 Epoch: 6 Global Step: 75580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:34,492-Speed 3045.69 samples/sec Loss 9.3714 LearningRate 0.0484 Epoch: 6 Global Step: 75590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:37,904-Speed 3002.00 samples/sec Loss 9.3435 LearningRate 0.0484 Epoch: 6 Global Step: 75600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:41,308-Speed 3009.02 samples/sec Loss 9.5091 LearningRate 0.0484 Epoch: 6 Global Step: 75610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:44,619-Speed 3093.27 samples/sec Loss 9.3125 LearningRate 0.0484 Epoch: 6 Global Step: 75620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:48,017-Speed 3014.56 samples/sec Loss 9.4150 LearningRate 0.0484 Epoch: 6 Global Step: 75630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:51:51,375-Speed 3050.83 samples/sec Loss 9.3398 LearningRate 0.0484 Epoch: 6 Global Step: 75640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:54,774-Speed 3013.44 samples/sec Loss 9.5156 LearningRate 0.0484 Epoch: 6 Global Step: 75650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:51:58,208-Speed 2982.88 samples/sec Loss 9.4093 LearningRate 0.0484 Epoch: 6 Global Step: 75660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:01,679-Speed 2950.55 samples/sec Loss 9.3481 LearningRate 0.0484 Epoch: 6 Global Step: 75670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:05,149-Speed 2951.89 samples/sec Loss 9.4359 LearningRate 0.0484 Epoch: 6 Global Step: 75680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:08,506-Speed 3051.20 samples/sec Loss 9.4613 LearningRate 0.0483 Epoch: 6 Global Step: 75690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:11,862-Speed 3051.73 samples/sec 
Loss 9.5154 LearningRate 0.0483 Epoch: 6 Global Step: 75700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:15,226-Speed 3044.85 samples/sec Loss 9.5376 LearningRate 0.0483 Epoch: 6 Global Step: 75710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:18,583-Speed 3051.90 samples/sec Loss 9.4340 LearningRate 0.0483 Epoch: 6 Global Step: 75720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:22,020-Speed 2980.44 samples/sec Loss 9.3456 LearningRate 0.0483 Epoch: 6 Global Step: 75730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:25,363-Speed 3063.93 samples/sec Loss 9.3680 LearningRate 0.0483 Epoch: 6 Global Step: 75740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:28,775-Speed 3002.39 samples/sec Loss 9.4242 LearningRate 0.0483 Epoch: 6 Global Step: 75750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:32,224-Speed 2969.53 samples/sec Loss 9.5886 LearningRate 0.0483 Epoch: 6 Global Step: 75760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:35,636-Speed 3002.11 samples/sec Loss 9.4544 LearningRate 0.0483 Epoch: 6 Global Step: 75770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:39,042-Speed 3006.56 samples/sec Loss 9.4630 LearningRate 0.0483 Epoch: 6 Global Step: 75780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:42,475-Speed 2983.74 samples/sec Loss 9.5633 LearningRate 0.0483 Epoch: 6 Global Step: 75790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:52:45,846-Speed 3039.59 samples/sec Loss 9.5647 LearningRate 0.0483 Epoch: 6 Global Step: 75800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:52:49,225-Speed 3030.71 samples/sec Loss 9.4992 LearningRate 0.0483 Epoch: 6 Global Step: 75810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:52:52,591-Speed 3042.59 samples/sec Loss 9.5042 LearningRate 0.0483 Epoch: 6 
Global Step: 75820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:52:56,016-Speed 2991.65 samples/sec Loss 9.4965 LearningRate 0.0483 Epoch: 6 Global Step: 75830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:52:59,369-Speed 3054.33 samples/sec Loss 9.5315 LearningRate 0.0483 Epoch: 6 Global Step: 75840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:02,750-Speed 3030.00 samples/sec Loss 9.4455 LearningRate 0.0483 Epoch: 6 Global Step: 75850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:06,170-Speed 2994.66 samples/sec Loss 9.4786 LearningRate 0.0483 Epoch: 6 Global Step: 75860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:09,562-Speed 3019.74 samples/sec Loss 9.4987 LearningRate 0.0482 Epoch: 6 Global Step: 75870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:13,055-Speed 2932.40 samples/sec Loss 9.5387 LearningRate 0.0482 Epoch: 6 Global Step: 75880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:16,513-Speed 2961.96 samples/sec Loss 9.4653 LearningRate 0.0482 Epoch: 6 Global Step: 75890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:19,868-Speed 3053.17 samples/sec Loss 9.4321 LearningRate 0.0482 Epoch: 6 Global Step: 75900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:53:23,211-Speed 3064.11 samples/sec Loss 9.5545 LearningRate 0.0482 Epoch: 6 Global Step: 75910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:53:26,602-Speed 3020.89 samples/sec Loss 9.3820 LearningRate 0.0482 Epoch: 6 Global Step: 75920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:53:30,093-Speed 2933.62 samples/sec Loss 9.3922 LearningRate 0.0482 Epoch: 6 Global Step: 75930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:33,438-Speed 3062.47 samples/sec Loss 9.5106 LearningRate 0.0482 Epoch: 6 Global Step: 75940 Fp16 Grad Scale: 65536 Required: 
16 hours Training: 2022-04-27 08:53:36,837-Speed 3013.16 samples/sec Loss 9.3443 LearningRate 0.0482 Epoch: 6 Global Step: 75950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:40,310-Speed 2949.28 samples/sec Loss 9.4151 LearningRate 0.0482 Epoch: 6 Global Step: 75960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:43,766-Speed 2964.23 samples/sec Loss 9.4383 LearningRate 0.0482 Epoch: 6 Global Step: 75970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:47,170-Speed 3008.75 samples/sec Loss 9.3968 LearningRate 0.0482 Epoch: 6 Global Step: 75980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:50,559-Speed 3022.63 samples/sec Loss 9.4784 LearningRate 0.0482 Epoch: 6 Global Step: 75990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:53,872-Speed 3092.09 samples/sec Loss 9.5711 LearningRate 0.0482 Epoch: 6 Global Step: 76000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:53:57,447-Speed 2864.92 samples/sec Loss 9.5863 LearningRate 0.0482 Epoch: 6 Global Step: 76010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:54:00,926-Speed 2944.73 samples/sec Loss 9.5619 LearningRate 0.0482 Epoch: 6 Global Step: 76020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 08:54:04,353-Speed 2988.46 samples/sec Loss 9.5364 LearningRate 0.0482 Epoch: 6 Global Step: 76030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:07,775-Speed 2993.29 samples/sec Loss 9.5417 LearningRate 0.0482 Epoch: 6 Global Step: 76040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:11,223-Speed 2971.13 samples/sec Loss 9.5216 LearningRate 0.0481 Epoch: 6 Global Step: 76050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:14,626-Speed 3009.61 samples/sec Loss 9.4436 LearningRate 0.0481 Epoch: 6 Global Step: 76060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:18,061-Speed 
2981.67 samples/sec Loss 9.4797 LearningRate 0.0481 Epoch: 6 Global Step: 76070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:21,400-Speed 3068.07 samples/sec Loss 9.4801 LearningRate 0.0481 Epoch: 6 Global Step: 76080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:24,728-Speed 3078.09 samples/sec Loss 9.6099 LearningRate 0.0481 Epoch: 6 Global Step: 76090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:28,160-Speed 2984.33 samples/sec Loss 9.5525 LearningRate 0.0481 Epoch: 6 Global Step: 76100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:31,613-Speed 2966.29 samples/sec Loss 9.5740 LearningRate 0.0481 Epoch: 6 Global Step: 76110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:35,011-Speed 3015.05 samples/sec Loss 9.7543 LearningRate 0.0481 Epoch: 6 Global Step: 76120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:38,384-Speed 3036.40 samples/sec Loss 9.4945 LearningRate 0.0481 Epoch: 6 Global Step: 76130 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-04-27 08:54:41,834-Speed 2968.87 samples/sec Loss 9.5437 LearningRate 0.0481 Epoch: 6 Global Step: 76140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:45,216-Speed 3028.09 samples/sec Loss 9.4925 LearningRate 0.0481 Epoch: 6 Global Step: 76150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:48,618-Speed 3011.11 samples/sec Loss 9.7061 LearningRate 0.0481 Epoch: 6 Global Step: 76160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:51,997-Speed 3031.45 samples/sec Loss 9.6177 LearningRate 0.0481 Epoch: 6 Global Step: 76170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:55,466-Speed 2952.75 samples/sec Loss 9.6423 LearningRate 0.0481 Epoch: 6 Global Step: 76180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:54:58,879-Speed 3001.23 samples/sec Loss 9.6546 
LearningRate 0.0481 Epoch: 6 Global Step: 76190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:02,299-Speed 2995.28 samples/sec Loss 9.5206 LearningRate 0.0481 Epoch: 6 Global Step: 76200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:05,795-Speed 2929.63 samples/sec Loss 9.5741 LearningRate 0.0481 Epoch: 6 Global Step: 76210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:09,217-Speed 2994.48 samples/sec Loss 9.4817 LearningRate 0.0480 Epoch: 6 Global Step: 76220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:12,670-Speed 2965.59 samples/sec Loss 9.5384 LearningRate 0.0480 Epoch: 6 Global Step: 76230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:16,072-Speed 3011.48 samples/sec Loss 9.6323 LearningRate 0.0480 Epoch: 6 Global Step: 76240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:19,513-Speed 2977.09 samples/sec Loss 9.6076 LearningRate 0.0480 Epoch: 6 Global Step: 76250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:23,001-Speed 2936.62 samples/sec Loss 9.5322 LearningRate 0.0480 Epoch: 6 Global Step: 76260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:26,370-Speed 3040.37 samples/sec Loss 9.6041 LearningRate 0.0480 Epoch: 6 Global Step: 76270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:29,802-Speed 2984.16 samples/sec Loss 9.6350 LearningRate 0.0480 Epoch: 6 Global Step: 76280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:33,223-Speed 2994.80 samples/sec Loss 9.7607 LearningRate 0.0480 Epoch: 6 Global Step: 76290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:36,697-Speed 2948.43 samples/sec Loss 9.5431 LearningRate 0.0480 Epoch: 6 Global Step: 76300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:40,096-Speed 3013.27 samples/sec Loss 9.6983 LearningRate 0.0480 Epoch: 6 Global Step: 
76310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:43,553-Speed 2962.95 samples/sec Loss 9.5899 LearningRate 0.0480 Epoch: 6 Global Step: 76320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:47,004-Speed 2968.83 samples/sec Loss 9.5463 LearningRate 0.0480 Epoch: 6 Global Step: 76330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:50,388-Speed 3026.72 samples/sec Loss 9.6595 LearningRate 0.0480 Epoch: 6 Global Step: 76340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:53,828-Speed 2977.23 samples/sec Loss 9.8596 LearningRate 0.0480 Epoch: 6 Global Step: 76350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:55:57,215-Speed 3024.18 samples/sec Loss 9.6962 LearningRate 0.0480 Epoch: 6 Global Step: 76360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:00,621-Speed 3007.39 samples/sec Loss 9.7333 LearningRate 0.0480 Epoch: 6 Global Step: 76370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:04,012-Speed 3021.17 samples/sec Loss 9.6678 LearningRate 0.0480 Epoch: 6 Global Step: 76380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:07,454-Speed 2976.02 samples/sec Loss 9.7074 LearningRate 0.0480 Epoch: 6 Global Step: 76390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:10,855-Speed 3011.10 samples/sec Loss 9.6164 LearningRate 0.0479 Epoch: 6 Global Step: 76400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:14,248-Speed 3018.89 samples/sec Loss 9.8359 LearningRate 0.0479 Epoch: 6 Global Step: 76410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:17,603-Speed 3053.53 samples/sec Loss 9.6829 LearningRate 0.0479 Epoch: 6 Global Step: 76420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:20,965-Speed 3046.26 samples/sec Loss 9.5953 LearningRate 0.0479 Epoch: 6 Global Step: 76430 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-04-27 08:56:24,301-Speed 3070.62 samples/sec Loss 9.8254 LearningRate 0.0479 Epoch: 6 Global Step: 76440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:27,713-Speed 3002.20 samples/sec Loss 9.6451 LearningRate 0.0479 Epoch: 6 Global Step: 76450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:31,118-Speed 3008.11 samples/sec Loss 9.7508 LearningRate 0.0479 Epoch: 6 Global Step: 76460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:34,445-Speed 3078.72 samples/sec Loss 9.7547 LearningRate 0.0479 Epoch: 6 Global Step: 76470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:37,923-Speed 2945.62 samples/sec Loss 9.6173 LearningRate 0.0479 Epoch: 6 Global Step: 76480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:41,274-Speed 3056.78 samples/sec Loss 9.6626 LearningRate 0.0479 Epoch: 6 Global Step: 76490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:44,714-Speed 2977.10 samples/sec Loss 9.6271 LearningRate 0.0479 Epoch: 6 Global Step: 76500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:48,143-Speed 2987.19 samples/sec Loss 9.7875 LearningRate 0.0479 Epoch: 6 Global Step: 76510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:51,502-Speed 3049.85 samples/sec Loss 9.6441 LearningRate 0.0479 Epoch: 6 Global Step: 76520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:54,892-Speed 3021.55 samples/sec Loss 9.6263 LearningRate 0.0479 Epoch: 6 Global Step: 76530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:56:58,265-Speed 3036.97 samples/sec Loss 9.7661 LearningRate 0.0479 Epoch: 6 Global Step: 76540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:01,620-Speed 3053.03 samples/sec Loss 9.7210 LearningRate 0.0479 Epoch: 6 Global Step: 76550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 
08:57:05,019-Speed 3015.65 samples/sec Loss 9.7477 LearningRate 0.0479 Epoch: 6 Global Step: 76560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:08,534-Speed 2913.95 samples/sec Loss 9.7983 LearningRate 0.0479 Epoch: 6 Global Step: 76570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:11,990-Speed 2964.24 samples/sec Loss 9.7038 LearningRate 0.0478 Epoch: 6 Global Step: 76580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:15,365-Speed 3034.45 samples/sec Loss 9.7060 LearningRate 0.0478 Epoch: 6 Global Step: 76590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:18,708-Speed 3063.82 samples/sec Loss 9.9089 LearningRate 0.0478 Epoch: 6 Global Step: 76600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:22,083-Speed 3035.19 samples/sec Loss 9.8386 LearningRate 0.0478 Epoch: 6 Global Step: 76610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:25,510-Speed 2989.60 samples/sec Loss 9.6677 LearningRate 0.0478 Epoch: 6 Global Step: 76620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:28,874-Speed 3043.85 samples/sec Loss 9.7126 LearningRate 0.0478 Epoch: 6 Global Step: 76630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:32,214-Speed 3068.44 samples/sec Loss 9.8517 LearningRate 0.0478 Epoch: 6 Global Step: 76640 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:35,694-Speed 2943.20 samples/sec Loss 9.8351 LearningRate 0.0478 Epoch: 6 Global Step: 76650 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:39,177-Speed 2940.92 samples/sec Loss 9.7666 LearningRate 0.0478 Epoch: 6 Global Step: 76660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:42,625-Speed 2970.57 samples/sec Loss 9.8165 LearningRate 0.0478 Epoch: 6 Global Step: 76670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:46,102-Speed 2946.10 samples/sec Loss 
9.5790 LearningRate 0.0478 Epoch: 6 Global Step: 76680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:49,473-Speed 3038.86 samples/sec Loss 9.8062 LearningRate 0.0478 Epoch: 6 Global Step: 76690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:52,832-Speed 3048.93 samples/sec Loss 9.6428 LearningRate 0.0478 Epoch: 6 Global Step: 76700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:56,214-Speed 3029.36 samples/sec Loss 9.7794 LearningRate 0.0478 Epoch: 6 Global Step: 76710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:57:59,611-Speed 3015.48 samples/sec Loss 9.7813 LearningRate 0.0478 Epoch: 6 Global Step: 76720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:02,990-Speed 3031.81 samples/sec Loss 9.7568 LearningRate 0.0478 Epoch: 6 Global Step: 76730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:06,321-Speed 3074.83 samples/sec Loss 9.7910 LearningRate 0.0478 Epoch: 6 Global Step: 76740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:09,745-Speed 2991.33 samples/sec Loss 9.6689 LearningRate 0.0478 Epoch: 6 Global Step: 76750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:13,156-Speed 3002.76 samples/sec Loss 9.7442 LearningRate 0.0477 Epoch: 6 Global Step: 76760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:16,522-Speed 3042.92 samples/sec Loss 9.6425 LearningRate 0.0477 Epoch: 6 Global Step: 76770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:19,890-Speed 3041.19 samples/sec Loss 9.8334 LearningRate 0.0477 Epoch: 6 Global Step: 76780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:23,296-Speed 3007.25 samples/sec Loss 9.6262 LearningRate 0.0477 Epoch: 6 Global Step: 76790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:26,699-Speed 3010.23 samples/sec Loss 9.6280 LearningRate 0.0477 Epoch: 6 Global 
Step: 76800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:30,106-Speed 3006.66 samples/sec Loss 9.6748 LearningRate 0.0477 Epoch: 6 Global Step: 76810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:33,500-Speed 3017.62 samples/sec Loss 9.7366 LearningRate 0.0477 Epoch: 6 Global Step: 76820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:36,961-Speed 2961.04 samples/sec Loss 9.7910 LearningRate 0.0477 Epoch: 6 Global Step: 76830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:40,302-Speed 3065.80 samples/sec Loss 9.6933 LearningRate 0.0477 Epoch: 6 Global Step: 76840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:43,720-Speed 2996.40 samples/sec Loss 9.6481 LearningRate 0.0477 Epoch: 6 Global Step: 76850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:47,069-Speed 3058.58 samples/sec Loss 9.8795 LearningRate 0.0477 Epoch: 6 Global Step: 76860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:50,448-Speed 3031.89 samples/sec Loss 9.7594 LearningRate 0.0477 Epoch: 6 Global Step: 76870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:53,833-Speed 3025.53 samples/sec Loss 9.8806 LearningRate 0.0477 Epoch: 6 Global Step: 76880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:58:57,204-Speed 3038.85 samples/sec Loss 9.7449 LearningRate 0.0477 Epoch: 6 Global Step: 76890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:00,561-Speed 3051.21 samples/sec Loss 9.8068 LearningRate 0.0477 Epoch: 6 Global Step: 76900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:03,933-Speed 3037.65 samples/sec Loss 9.7437 LearningRate 0.0477 Epoch: 6 Global Step: 76910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:07,316-Speed 3028.19 samples/sec Loss 9.7208 LearningRate 0.0477 Epoch: 6 Global Step: 76920 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-04-27 08:59:10,747-Speed 2985.47 samples/sec Loss 9.8356 LearningRate 0.0477 Epoch: 6 Global Step: 76930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:14,122-Speed 3035.04 samples/sec Loss 9.7875 LearningRate 0.0476 Epoch: 6 Global Step: 76940 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-04-27 08:59:17,478-Speed 3052.16 samples/sec Loss 9.9216 LearningRate 0.0476 Epoch: 6 Global Step: 76950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:20,984-Speed 2921.12 samples/sec Loss 9.6758 LearningRate 0.0476 Epoch: 6 Global Step: 76960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:24,420-Speed 2981.48 samples/sec Loss 9.6826 LearningRate 0.0476 Epoch: 6 Global Step: 76970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:27,813-Speed 3018.83 samples/sec Loss 9.9581 LearningRate 0.0476 Epoch: 6 Global Step: 76980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:31,152-Speed 3067.80 samples/sec Loss 9.8006 LearningRate 0.0476 Epoch: 6 Global Step: 76990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:34,567-Speed 2999.12 samples/sec Loss 9.8170 LearningRate 0.0476 Epoch: 6 Global Step: 77000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:37,933-Speed 3043.34 samples/sec Loss 9.7576 LearningRate 0.0476 Epoch: 6 Global Step: 77010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:41,271-Speed 3068.63 samples/sec Loss 9.8400 LearningRate 0.0476 Epoch: 6 Global Step: 77020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:44,679-Speed 3005.59 samples/sec Loss 9.7107 LearningRate 0.0476 Epoch: 6 Global Step: 77030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:48,069-Speed 3021.89 samples/sec Loss 9.7183 LearningRate 0.0476 Epoch: 6 Global Step: 77040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 
08:59:51,412-Speed 3064.01 samples/sec Loss 9.8228 LearningRate 0.0476 Epoch: 6 Global Step: 77050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:54,769-Speed 3051.12 samples/sec Loss 9.6919 LearningRate 0.0476 Epoch: 6 Global Step: 77060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 08:59:58,109-Speed 3067.16 samples/sec Loss 9.9137 LearningRate 0.0476 Epoch: 6 Global Step: 77070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:01,547-Speed 2978.81 samples/sec Loss 9.7698 LearningRate 0.0476 Epoch: 6 Global Step: 77080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:04,969-Speed 2993.81 samples/sec Loss 9.7853 LearningRate 0.0476 Epoch: 6 Global Step: 77090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:08,375-Speed 3006.99 samples/sec Loss 9.8443 LearningRate 0.0476 Epoch: 6 Global Step: 77100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:11,820-Speed 2973.76 samples/sec Loss 9.6998 LearningRate 0.0476 Epoch: 6 Global Step: 77110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:15,149-Speed 3077.02 samples/sec Loss 9.9969 LearningRate 0.0475 Epoch: 6 Global Step: 77120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:18,526-Speed 3032.85 samples/sec Loss 9.8382 LearningRate 0.0475 Epoch: 6 Global Step: 77130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:21,879-Speed 3055.17 samples/sec Loss 9.9079 LearningRate 0.0475 Epoch: 6 Global Step: 77140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:25,243-Speed 3045.07 samples/sec Loss 9.8379 LearningRate 0.0475 Epoch: 6 Global Step: 77150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:28,629-Speed 3024.81 samples/sec Loss 9.9720 LearningRate 0.0475 Epoch: 6 Global Step: 77160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:32,141-Speed 2916.88 samples/sec Loss 
9.8594 LearningRate 0.0475 Epoch: 6 Global Step: 77170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:35,529-Speed 3023.34 samples/sec Loss 9.7779 LearningRate 0.0475 Epoch: 6 Global Step: 77180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:38,845-Speed 3089.53 samples/sec Loss 9.8606 LearningRate 0.0475 Epoch: 6 Global Step: 77190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:42,254-Speed 3004.18 samples/sec Loss 9.8193 LearningRate 0.0475 Epoch: 6 Global Step: 77200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:45,663-Speed 3005.34 samples/sec Loss 9.8569 LearningRate 0.0475 Epoch: 6 Global Step: 77210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:49,029-Speed 3042.41 samples/sec Loss 9.8716 LearningRate 0.0475 Epoch: 6 Global Step: 77220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:52,461-Speed 2984.88 samples/sec Loss 9.8831 LearningRate 0.0475 Epoch: 6 Global Step: 77230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:55,846-Speed 3026.12 samples/sec Loss 9.7007 LearningRate 0.0475 Epoch: 6 Global Step: 77240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:00:59,262-Speed 2998.54 samples/sec Loss 9.7190 LearningRate 0.0475 Epoch: 6 Global Step: 77250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:02,590-Speed 3077.77 samples/sec Loss 9.8217 LearningRate 0.0475 Epoch: 6 Global Step: 77260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:05,959-Speed 3040.16 samples/sec Loss 9.9034 LearningRate 0.0475 Epoch: 6 Global Step: 77270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:09,309-Speed 3057.33 samples/sec Loss 9.8669 LearningRate 0.0475 Epoch: 6 Global Step: 77280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:12,768-Speed 2962.17 samples/sec Loss 9.7956 LearningRate 0.0475 Epoch: 6 Global 
Step: 77290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:16,123-Speed 3052.66 samples/sec Loss 9.8467 LearningRate 0.0474 Epoch: 6 Global Step: 77300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:19,470-Speed 3060.65 samples/sec Loss 9.8088 LearningRate 0.0474 Epoch: 6 Global Step: 77310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:22,817-Speed 3059.61 samples/sec Loss 9.7237 LearningRate 0.0474 Epoch: 6 Global Step: 77320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:26,202-Speed 3027.36 samples/sec Loss 9.7330 LearningRate 0.0474 Epoch: 6 Global Step: 77330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:29,542-Speed 3066.30 samples/sec Loss 9.8755 LearningRate 0.0474 Epoch: 6 Global Step: 77340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:32,881-Speed 3067.43 samples/sec Loss 9.9166 LearningRate 0.0474 Epoch: 6 Global Step: 77350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:36,191-Speed 3095.16 samples/sec Loss 9.9106 LearningRate 0.0474 Epoch: 6 Global Step: 77360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:39,538-Speed 3060.44 samples/sec Loss 9.7226 LearningRate 0.0474 Epoch: 6 Global Step: 77370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:42,906-Speed 3041.00 samples/sec Loss 9.8110 LearningRate 0.0474 Epoch: 6 Global Step: 77380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:46,351-Speed 2973.19 samples/sec Loss 9.8399 LearningRate 0.0474 Epoch: 6 Global Step: 77390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:49,714-Speed 3046.20 samples/sec Loss 9.8450 LearningRate 0.0474 Epoch: 6 Global Step: 77400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:53,083-Speed 3040.45 samples/sec Loss 10.0265 LearningRate 0.0474 Epoch: 6 Global Step: 77410 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-04-27 09:01:56,478-Speed 3017.00 samples/sec Loss 9.9053 LearningRate 0.0474 Epoch: 6 Global Step: 77420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:01:59,953-Speed 2947.54 samples/sec Loss 9.8614 LearningRate 0.0474 Epoch: 6 Global Step: 77430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:03,378-Speed 2991.17 samples/sec Loss 9.7269 LearningRate 0.0474 Epoch: 6 Global Step: 77440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:06,719-Speed 3065.79 samples/sec Loss 9.8659 LearningRate 0.0474 Epoch: 6 Global Step: 77450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:10,130-Speed 3002.79 samples/sec Loss 10.0788 LearningRate 0.0474 Epoch: 6 Global Step: 77460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:13,651-Speed 2908.62 samples/sec Loss 9.7952 LearningRate 0.0474 Epoch: 6 Global Step: 77470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:17,071-Speed 2994.78 samples/sec Loss 9.7468 LearningRate 0.0473 Epoch: 6 Global Step: 77480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:20,504-Speed 2984.00 samples/sec Loss 10.0211 LearningRate 0.0473 Epoch: 6 Global Step: 77490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:23,967-Speed 2957.74 samples/sec Loss 9.8161 LearningRate 0.0473 Epoch: 6 Global Step: 77500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:27,439-Speed 2950.83 samples/sec Loss 9.8479 LearningRate 0.0473 Epoch: 6 Global Step: 77510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:30,906-Speed 2953.66 samples/sec Loss 10.0607 LearningRate 0.0473 Epoch: 6 Global Step: 77520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:34,380-Speed 2949.18 samples/sec Loss 9.9336 LearningRate 0.0473 Epoch: 6 Global Step: 77530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 
09:02:37,816-Speed 2980.76 samples/sec Loss 9.8228 LearningRate 0.0473 Epoch: 6 Global Step: 77540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:41,176-Speed 3048.94 samples/sec Loss 10.0104 LearningRate 0.0473 Epoch: 6 Global Step: 77550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:44,531-Speed 3053.21 samples/sec Loss 9.8831 LearningRate 0.0473 Epoch: 6 Global Step: 77560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:47,866-Speed 3070.81 samples/sec Loss 9.6824 LearningRate 0.0473 Epoch: 6 Global Step: 77570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:51,359-Speed 2932.46 samples/sec Loss 9.7911 LearningRate 0.0473 Epoch: 6 Global Step: 77580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:02:54,691-Speed 3074.51 samples/sec Loss 9.7929 LearningRate 0.0473 Epoch: 6 Global Step: 77590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:02:58,092-Speed 3011.56 samples/sec Loss 9.7632 LearningRate 0.0473 Epoch: 6 Global Step: 77600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:01,472-Speed 3030.62 samples/sec Loss 9.9001 LearningRate 0.0473 Epoch: 6 Global Step: 77610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:04,866-Speed 3017.67 samples/sec Loss 9.9470 LearningRate 0.0473 Epoch: 6 Global Step: 77620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:08,273-Speed 3006.22 samples/sec Loss 9.9889 LearningRate 0.0473 Epoch: 6 Global Step: 77630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:11,676-Speed 3010.17 samples/sec Loss 9.9535 LearningRate 0.0473 Epoch: 6 Global Step: 77640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:15,119-Speed 2975.17 samples/sec Loss 9.7665 LearningRate 0.0473 Epoch: 6 Global Step: 77650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:18,514-Speed 3017.44 samples/sec Loss 
9.9160 LearningRate 0.0472 Epoch: 6 Global Step: 77660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:21,960-Speed 2971.85 samples/sec Loss 9.8505 LearningRate 0.0472 Epoch: 6 Global Step: 77670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:25,309-Speed 3059.13 samples/sec Loss 9.7716 LearningRate 0.0472 Epoch: 6 Global Step: 77680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:03:28,627-Speed 3086.80 samples/sec Loss 9.9210 LearningRate 0.0472 Epoch: 6 Global Step: 77690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:32,013-Speed 3025.04 samples/sec Loss 9.8040 LearningRate 0.0472 Epoch: 6 Global Step: 77700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:35,388-Speed 3035.56 samples/sec Loss 9.9155 LearningRate 0.0472 Epoch: 6 Global Step: 77710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:38,820-Speed 2984.21 samples/sec Loss 9.8581 LearningRate 0.0472 Epoch: 6 Global Step: 77720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:42,209-Speed 3022.86 samples/sec Loss 9.7082 LearningRate 0.0472 Epoch: 6 Global Step: 77730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:45,635-Speed 2990.44 samples/sec Loss 9.9470 LearningRate 0.0472 Epoch: 6 Global Step: 77740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:49,037-Speed 3011.04 samples/sec Loss 9.9775 LearningRate 0.0472 Epoch: 6 Global Step: 77750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:52,394-Speed 3051.58 samples/sec Loss 10.0457 LearningRate 0.0472 Epoch: 6 Global Step: 77760 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:55,742-Speed 3059.98 samples/sec Loss 9.9426 LearningRate 0.0472 Epoch: 6 Global Step: 77770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:03:59,120-Speed 3031.98 samples/sec Loss 10.0268 LearningRate 0.0472 Epoch: 6 Global 
Step: 77780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:02,552-Speed 2984.37 samples/sec Loss 9.9188 LearningRate 0.0472 Epoch: 6 Global Step: 77790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:06,033-Speed 2943.24 samples/sec Loss 9.8458 LearningRate 0.0472 Epoch: 6 Global Step: 77800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:09,499-Speed 2954.66 samples/sec Loss 9.8542 LearningRate 0.0472 Epoch: 6 Global Step: 77810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:12,954-Speed 2964.84 samples/sec Loss 9.9601 LearningRate 0.0472 Epoch: 6 Global Step: 77820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:16,342-Speed 3024.47 samples/sec Loss 9.8400 LearningRate 0.0472 Epoch: 6 Global Step: 77830 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:19,806-Speed 2956.73 samples/sec Loss 10.0471 LearningRate 0.0472 Epoch: 6 Global Step: 77840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:23,249-Speed 2975.74 samples/sec Loss 9.7947 LearningRate 0.0471 Epoch: 6 Global Step: 77850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:26,692-Speed 2974.93 samples/sec Loss 9.8066 LearningRate 0.0471 Epoch: 6 Global Step: 77860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:30,052-Speed 3048.37 samples/sec Loss 9.7963 LearningRate 0.0471 Epoch: 6 Global Step: 77870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:04:33,442-Speed 3021.25 samples/sec Loss 9.9148 LearningRate 0.0471 Epoch: 6 Global Step: 77880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:04:36,948-Speed 2921.91 samples/sec Loss 10.0421 LearningRate 0.0471 Epoch: 6 Global Step: 77890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:04:40,307-Speed 3049.69 samples/sec Loss 9.8903 LearningRate 0.0471 Epoch: 6 Global Step: 77900 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-04-27 09:04:43,692-Speed 3025.50 samples/sec Loss 9.8470 LearningRate 0.0471 Epoch: 6 Global Step: 77910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:04:47,215-Speed 2907.80 samples/sec Loss 9.9131 LearningRate 0.0471 Epoch: 6 Global Step: 77920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:04:50,612-Speed 3015.80 samples/sec Loss 9.8034 LearningRate 0.0471 Epoch: 6 Global Step: 77930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:04:54,014-Speed 3010.96 samples/sec Loss 9.8795 LearningRate 0.0471 Epoch: 6 Global Step: 77940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:04:57,443-Speed 2987.19 samples/sec Loss 10.0352 LearningRate 0.0471 Epoch: 6 Global Step: 77950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:05:00,840-Speed 3015.56 samples/sec Loss 9.8272 LearningRate 0.0471 Epoch: 6 Global Step: 77960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:05:04,264-Speed 2990.93 samples/sec Loss 9.7919 LearningRate 0.0471 Epoch: 6 Global Step: 77970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:07,710-Speed 2972.36 samples/sec Loss 10.0617 LearningRate 0.0471 Epoch: 6 Global Step: 77980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:11,240-Speed 2901.76 samples/sec Loss 9.9047 LearningRate 0.0471 Epoch: 6 Global Step: 77990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:14,665-Speed 2990.63 samples/sec Loss 9.8586 LearningRate 0.0471 Epoch: 6 Global Step: 78000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:18,092-Speed 2989.48 samples/sec Loss 9.8672 LearningRate 0.0471 Epoch: 6 Global Step: 78010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:21,438-Speed 3061.14 samples/sec Loss 9.8918 LearningRate 0.0471 Epoch: 6 Global Step: 78020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:24,807-Speed 
3040.61 samples/sec Loss 9.9822 LearningRate 0.0470 Epoch: 6 Global Step: 78030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:28,210-Speed 3009.87 samples/sec Loss 9.9160 LearningRate 0.0470 Epoch: 6 Global Step: 78040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:31,606-Speed 3016.40 samples/sec Loss 10.0069 LearningRate 0.0470 Epoch: 6 Global Step: 78050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:34,992-Speed 3024.84 samples/sec Loss 9.8208 LearningRate 0.0470 Epoch: 6 Global Step: 78060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:38,375-Speed 3027.39 samples/sec Loss 10.0033 LearningRate 0.0470 Epoch: 6 Global Step: 78070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:41,839-Speed 2957.62 samples/sec Loss 9.7872 LearningRate 0.0470 Epoch: 6 Global Step: 78080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:45,197-Speed 3049.79 samples/sec Loss 9.7035 LearningRate 0.0470 Epoch: 6 Global Step: 78090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:48,655-Speed 2961.96 samples/sec Loss 9.9026 LearningRate 0.0470 Epoch: 6 Global Step: 78100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:52,042-Speed 3024.40 samples/sec Loss 9.8457 LearningRate 0.0470 Epoch: 6 Global Step: 78110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:55,400-Speed 3050.33 samples/sec Loss 9.9822 LearningRate 0.0470 Epoch: 6 Global Step: 78120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:05:58,756-Speed 3052.75 samples/sec Loss 9.9453 LearningRate 0.0470 Epoch: 6 Global Step: 78130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:06:02,191-Speed 2981.15 samples/sec Loss 9.9767 LearningRate 0.0470 Epoch: 6 Global Step: 78140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:06:05,535-Speed 3063.63 samples/sec Loss 9.7857 LearningRate 0.0470 
Epoch: 6 Global Step: 78150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:06:08,989-Speed 2965.40 samples/sec Loss 9.9126 LearningRate 0.0470 Epoch: 6 Global Step: 78160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:06:12,379-Speed 3021.85 samples/sec Loss 9.9509 LearningRate 0.0470 Epoch: 6 Global Step: 78170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:15,878-Speed 2927.45 samples/sec Loss 10.0139 LearningRate 0.0470 Epoch: 6 Global Step: 78180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:19,299-Speed 2994.12 samples/sec Loss 10.0646 LearningRate 0.0470 Epoch: 6 Global Step: 78190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:22,694-Speed 3016.94 samples/sec Loss 9.9149 LearningRate 0.0470 Epoch: 6 Global Step: 78200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:26,178-Speed 2940.30 samples/sec Loss 9.9819 LearningRate 0.0469 Epoch: 6 Global Step: 78210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:29,518-Speed 3066.61 samples/sec Loss 9.8890 LearningRate 0.0469 Epoch: 6 Global Step: 78220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:32,937-Speed 2995.69 samples/sec Loss 9.9682 LearningRate 0.0469 Epoch: 6 Global Step: 78230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:36,370-Speed 2983.63 samples/sec Loss 9.9433 LearningRate 0.0469 Epoch: 6 Global Step: 78240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:39,836-Speed 2955.16 samples/sec Loss 9.8250 LearningRate 0.0469 Epoch: 6 Global Step: 78250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:43,162-Speed 3079.48 samples/sec Loss 9.9014 LearningRate 0.0469 Epoch: 6 Global Step: 78260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:46,474-Speed 3093.47 samples/sec Loss 9.7394 LearningRate 0.0469 Epoch: 6 Global Step: 78270 Fp16 Grad 
Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:49,877-Speed 3009.32 samples/sec Loss 10.0546 LearningRate 0.0469 Epoch: 6 Global Step: 78280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:53,222-Speed 3062.84 samples/sec Loss 9.9108 LearningRate 0.0469 Epoch: 6 Global Step: 78290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:56,585-Speed 3045.55 samples/sec Loss 9.9855 LearningRate 0.0469 Epoch: 6 Global Step: 78300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:06:59,933-Speed 3059.71 samples/sec Loss 10.0335 LearningRate 0.0469 Epoch: 6 Global Step: 78310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:03,337-Speed 3009.28 samples/sec Loss 9.9537 LearningRate 0.0469 Epoch: 6 Global Step: 78320 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:06,710-Speed 3037.41 samples/sec Loss 10.0034 LearningRate 0.0469 Epoch: 6 Global Step: 78330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:10,061-Speed 3057.09 samples/sec Loss 9.8992 LearningRate 0.0469 Epoch: 6 Global Step: 78340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:13,420-Speed 3049.11 samples/sec Loss 10.0294 LearningRate 0.0469 Epoch: 6 Global Step: 78350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:16,788-Speed 3041.65 samples/sec Loss 9.9021 LearningRate 0.0469 Epoch: 6 Global Step: 78360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:20,107-Speed 3086.56 samples/sec Loss 9.7775 LearningRate 0.0469 Epoch: 6 Global Step: 78370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:23,515-Speed 3005.59 samples/sec Loss 9.9037 LearningRate 0.0469 Epoch: 6 Global Step: 78380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:26,915-Speed 3012.36 samples/sec Loss 9.9656 LearningRate 0.0468 Epoch: 6 Global Step: 78390 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-27 09:07:30,279-Speed 3044.91 samples/sec Loss 9.7721 LearningRate 0.0468 Epoch: 6 Global Step: 78400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:33,717-Speed 2979.19 samples/sec Loss 9.8428 LearningRate 0.0468 Epoch: 6 Global Step: 78410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:37,132-Speed 2999.98 samples/sec Loss 9.9329 LearningRate 0.0468 Epoch: 6 Global Step: 78420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:40,555-Speed 2992.06 samples/sec Loss 9.8981 LearningRate 0.0468 Epoch: 6 Global Step: 78430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:43,956-Speed 3011.68 samples/sec Loss 10.0055 LearningRate 0.0468 Epoch: 6 Global Step: 78440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:47,316-Speed 3048.44 samples/sec Loss 9.9664 LearningRate 0.0468 Epoch: 6 Global Step: 78450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:50,723-Speed 3007.21 samples/sec Loss 9.9233 LearningRate 0.0468 Epoch: 6 Global Step: 78460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:54,089-Speed 3043.01 samples/sec Loss 9.9141 LearningRate 0.0468 Epoch: 6 Global Step: 78470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:07:57,487-Speed 3014.08 samples/sec Loss 9.9739 LearningRate 0.0468 Epoch: 6 Global Step: 78480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:08:00,846-Speed 3050.33 samples/sec Loss 10.0895 LearningRate 0.0468 Epoch: 6 Global Step: 78490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:08:04,257-Speed 3002.42 samples/sec Loss 9.8743 LearningRate 0.0468 Epoch: 6 Global Step: 78500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:08:07,664-Speed 3006.59 samples/sec Loss 9.9317 LearningRate 0.0468 Epoch: 6 Global Step: 78510 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:11,075-Speed 
3003.62 samples/sec Loss 9.9453 LearningRate 0.0468 Epoch: 6 Global Step: 78520 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:14,467-Speed 3019.21 samples/sec Loss 10.0000 LearningRate 0.0468 Epoch: 6 Global Step: 78530 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:17,876-Speed 3005.20 samples/sec Loss 9.7598 LearningRate 0.0468 Epoch: 6 Global Step: 78540 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:21,270-Speed 3018.44 samples/sec Loss 9.9614 LearningRate 0.0468 Epoch: 6 Global Step: 78550 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:24,681-Speed 3002.92 samples/sec Loss 9.8951 LearningRate 0.0468 Epoch: 6 Global Step: 78560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:28,071-Speed 3021.04 samples/sec Loss 9.8619 LearningRate 0.0467 Epoch: 6 Global Step: 78570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:31,418-Speed 3060.66 samples/sec Loss 9.7884 LearningRate 0.0467 Epoch: 6 Global Step: 78580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:34,875-Speed 2962.61 samples/sec Loss 9.8880 LearningRate 0.0467 Epoch: 6 Global Step: 78590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:38,214-Speed 3068.33 samples/sec Loss 9.8499 LearningRate 0.0467 Epoch: 6 Global Step: 78600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:41,639-Speed 2990.48 samples/sec Loss 9.8695 LearningRate 0.0467 Epoch: 6 Global Step: 78610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:08:44,996-Speed 3050.65 samples/sec Loss 10.0130 LearningRate 0.0467 Epoch: 6 Global Step: 78620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:48,401-Speed 3008.90 samples/sec Loss 9.8907 LearningRate 0.0467 Epoch: 6 Global Step: 78630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:51,808-Speed 3006.22 samples/sec Loss 9.8947 LearningRate 0.0467 
Epoch: 6 Global Step: 78640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:55,218-Speed 3004.52 samples/sec Loss 9.8517 LearningRate 0.0467 Epoch: 6 Global Step: 78650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:08:58,601-Speed 3027.54 samples/sec Loss 9.8492 LearningRate 0.0467 Epoch: 6 Global Step: 78660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:09:01,971-Speed 3040.11 samples/sec Loss 9.7967 LearningRate 0.0467 Epoch: 6 Global Step: 78670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:09:05,366-Speed 3016.67 samples/sec Loss 9.9256 LearningRate 0.0467 Epoch: 6 Global Step: 78680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:09:08,753-Speed 3024.84 samples/sec Loss 9.8752 LearningRate 0.0467 Epoch: 6 Global Step: 78690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:09:12,104-Speed 3056.15 samples/sec Loss 9.7285 LearningRate 0.0467 Epoch: 6 Global Step: 78700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:09:15,571-Speed 2954.75 samples/sec Loss 10.0214 LearningRate 0.0467 Epoch: 6 Global Step: 78710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:09:19,001-Speed 2986.21 samples/sec Loss 10.0130 LearningRate 0.0467 Epoch: 6 Global Step: 78720 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:22,473-Speed 2950.50 samples/sec Loss 9.9101 LearningRate 0.0467 Epoch: 6 Global Step: 78730 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:25,858-Speed 3025.47 samples/sec Loss 9.7900 LearningRate 0.0467 Epoch: 6 Global Step: 78740 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:29,259-Speed 3012.17 samples/sec Loss 10.0264 LearningRate 0.0466 Epoch: 6 Global Step: 78750 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:32,653-Speed 3017.49 samples/sec Loss 9.8943 LearningRate 0.0466 Epoch: 6 Global Step: 78760 Fp16 Grad Scale: 
131072 Required: 16 hours Training: 2022-04-27 09:09:36,180-Speed 2904.44 samples/sec Loss 10.0131 LearningRate 0.0466 Epoch: 6 Global Step: 78770 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:39,528-Speed 3059.75 samples/sec Loss 10.0115 LearningRate 0.0466 Epoch: 6 Global Step: 78780 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:43,030-Speed 2924.21 samples/sec Loss 10.0714 LearningRate 0.0466 Epoch: 6 Global Step: 78790 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:46,449-Speed 2996.01 samples/sec Loss 9.9299 LearningRate 0.0466 Epoch: 6 Global Step: 78800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:49,846-Speed 3015.50 samples/sec Loss 9.9401 LearningRate 0.0466 Epoch: 6 Global Step: 78810 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:53,159-Speed 3092.26 samples/sec Loss 9.9127 LearningRate 0.0466 Epoch: 6 Global Step: 78820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:56,511-Speed 3055.39 samples/sec Loss 10.0607 LearningRate 0.0466 Epoch: 6 Global Step: 78830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:09:59,858-Speed 3060.05 samples/sec Loss 10.0644 LearningRate 0.0466 Epoch: 6 Global Step: 78840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:03,241-Speed 3028.01 samples/sec Loss 10.0181 LearningRate 0.0466 Epoch: 6 Global Step: 78850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:06,646-Speed 3007.94 samples/sec Loss 9.9421 LearningRate 0.0466 Epoch: 6 Global Step: 78860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:10,085-Speed 2978.23 samples/sec Loss 9.9299 LearningRate 0.0466 Epoch: 6 Global Step: 78870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:13,460-Speed 3035.10 samples/sec Loss 10.0356 LearningRate 0.0466 Epoch: 6 Global Step: 78880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 
2022-04-27 09:10:16,902-Speed 2976.49 samples/sec Loss 9.9891 LearningRate 0.0466 Epoch: 6 Global Step: 78890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:20,341-Speed 2978.41 samples/sec Loss 9.8531 LearningRate 0.0466 Epoch: 6 Global Step: 78900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:23,672-Speed 3074.84 samples/sec Loss 9.8107 LearningRate 0.0466 Epoch: 6 Global Step: 78910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:27,121-Speed 2970.77 samples/sec Loss 9.9356 LearningRate 0.0466 Epoch: 6 Global Step: 78920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:30,496-Speed 3034.17 samples/sec Loss 9.8883 LearningRate 0.0465 Epoch: 6 Global Step: 78930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:33,943-Speed 2972.30 samples/sec Loss 9.9294 LearningRate 0.0465 Epoch: 6 Global Step: 78940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:37,335-Speed 3019.04 samples/sec Loss 9.8915 LearningRate 0.0465 Epoch: 6 Global Step: 78950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:40,695-Speed 3049.21 samples/sec Loss 9.9803 LearningRate 0.0465 Epoch: 6 Global Step: 78960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:44,150-Speed 2964.74 samples/sec Loss 9.9703 LearningRate 0.0465 Epoch: 6 Global Step: 78970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:10:47,554-Speed 3008.70 samples/sec Loss 9.9158 LearningRate 0.0465 Epoch: 6 Global Step: 78980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:50,910-Speed 3052.53 samples/sec Loss 10.1433 LearningRate 0.0465 Epoch: 6 Global Step: 78990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:54,282-Speed 3037.44 samples/sec Loss 9.7681 LearningRate 0.0465 Epoch: 6 Global Step: 79000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:10:57,662-Speed 3030.68 samples/sec 
Loss 9.8737 LearningRate 0.0465 Epoch: 6 Global Step: 79010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:01,082-Speed 2995.24 samples/sec Loss 9.9449 LearningRate 0.0465 Epoch: 6 Global Step: 79020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:04,528-Speed 2972.43 samples/sec Loss 10.0276 LearningRate 0.0465 Epoch: 6 Global Step: 79030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:07,965-Speed 2979.99 samples/sec Loss 9.7789 LearningRate 0.0465 Epoch: 6 Global Step: 79040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:11,421-Speed 2964.20 samples/sec Loss 10.0269 LearningRate 0.0465 Epoch: 6 Global Step: 79050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:14,758-Speed 3068.87 samples/sec Loss 9.9623 LearningRate 0.0465 Epoch: 6 Global Step: 79060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:18,153-Speed 3017.66 samples/sec Loss 9.8099 LearningRate 0.0465 Epoch: 6 Global Step: 79070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:21,555-Speed 3011.04 samples/sec Loss 9.8793 LearningRate 0.0465 Epoch: 6 Global Step: 79080 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:24,875-Speed 3085.00 samples/sec Loss 9.9589 LearningRate 0.0465 Epoch: 6 Global Step: 79090 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:28,191-Speed 3088.64 samples/sec Loss 10.0547 LearningRate 0.0465 Epoch: 6 Global Step: 79100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:31,638-Speed 2972.30 samples/sec Loss 10.0314 LearningRate 0.0465 Epoch: 6 Global Step: 79110 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:35,038-Speed 3013.05 samples/sec Loss 9.9004 LearningRate 0.0464 Epoch: 6 Global Step: 79120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:38,406-Speed 3040.66 samples/sec Loss 10.0068 LearningRate 0.0464 Epoch: 
6 Global Step: 79130 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:41,857-Speed 2968.27 samples/sec Loss 9.9768 LearningRate 0.0464 Epoch: 6 Global Step: 79140 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:45,240-Speed 3027.68 samples/sec Loss 9.8919 LearningRate 0.0464 Epoch: 6 Global Step: 79150 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:48,636-Speed 3017.44 samples/sec Loss 10.1381 LearningRate 0.0464 Epoch: 6 Global Step: 79160 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:52,059-Speed 2992.64 samples/sec Loss 9.9694 LearningRate 0.0464 Epoch: 6 Global Step: 79170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:55,445-Speed 3024.48 samples/sec Loss 9.9636 LearningRate 0.0464 Epoch: 6 Global Step: 79180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:11:58,841-Speed 3016.00 samples/sec Loss 9.7875 LearningRate 0.0464 Epoch: 6 Global Step: 79190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:02,197-Speed 3052.03 samples/sec Loss 9.9042 LearningRate 0.0464 Epoch: 6 Global Step: 79200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:05,627-Speed 2986.53 samples/sec Loss 9.8724 LearningRate 0.0464 Epoch: 6 Global Step: 79210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:09,101-Speed 2948.06 samples/sec Loss 9.8962 LearningRate 0.0464 Epoch: 6 Global Step: 79220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:12,561-Speed 2960.27 samples/sec Loss 9.8178 LearningRate 0.0464 Epoch: 6 Global Step: 79230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:15,970-Speed 3004.60 samples/sec Loss 9.9858 LearningRate 0.0464 Epoch: 6 Global Step: 79240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:19,339-Speed 3040.99 samples/sec Loss 9.8745 LearningRate 0.0464 Epoch: 6 Global Step: 79250 Fp16 Grad Scale: 
65536 Required: 16 hours Training: 2022-04-27 09:12:22,737-Speed 3014.42 samples/sec Loss 10.0019 LearningRate 0.0464 Epoch: 6 Global Step: 79260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:26,156-Speed 2995.52 samples/sec Loss 10.0250 LearningRate 0.0464 Epoch: 6 Global Step: 79270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:29,540-Speed 3027.43 samples/sec Loss 10.0698 LearningRate 0.0464 Epoch: 6 Global Step: 79280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:32,961-Speed 2993.98 samples/sec Loss 9.8625 LearningRate 0.0464 Epoch: 6 Global Step: 79290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:36,397-Speed 2980.79 samples/sec Loss 9.9015 LearningRate 0.0463 Epoch: 6 Global Step: 79300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:39,828-Speed 2986.05 samples/sec Loss 9.9134 LearningRate 0.0463 Epoch: 6 Global Step: 79310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:43,301-Speed 2949.84 samples/sec Loss 9.7248 LearningRate 0.0463 Epoch: 6 Global Step: 79320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:46,664-Speed 3045.44 samples/sec Loss 9.9027 LearningRate 0.0463 Epoch: 6 Global Step: 79330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:50,096-Speed 2985.24 samples/sec Loss 9.8904 LearningRate 0.0463 Epoch: 6 Global Step: 79340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:12:53,517-Speed 2993.40 samples/sec Loss 9.8894 LearningRate 0.0463 Epoch: 6 Global Step: 79350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:12:56,865-Speed 3059.71 samples/sec Loss 9.9071 LearningRate 0.0463 Epoch: 6 Global Step: 79360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:00,352-Speed 2938.05 samples/sec Loss 10.0375 LearningRate 0.0463 Epoch: 6 Global Step: 79370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 
09:13:03,684-Speed 3073.34 samples/sec Loss 9.9229 LearningRate 0.0463 Epoch: 6 Global Step: 79380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:07,113-Speed 2987.75 samples/sec Loss 9.8814 LearningRate 0.0463 Epoch: 6 Global Step: 79390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:10,489-Speed 3035.05 samples/sec Loss 10.0321 LearningRate 0.0463 Epoch: 6 Global Step: 79400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:13,934-Speed 2972.87 samples/sec Loss 9.8075 LearningRate 0.0463 Epoch: 6 Global Step: 79410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:17,355-Speed 2994.04 samples/sec Loss 9.9054 LearningRate 0.0463 Epoch: 6 Global Step: 79420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:20,739-Speed 3027.49 samples/sec Loss 9.9772 LearningRate 0.0463 Epoch: 6 Global Step: 79430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:24,196-Speed 2963.08 samples/sec Loss 9.7782 LearningRate 0.0463 Epoch: 6 Global Step: 79440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:27,576-Speed 3030.82 samples/sec Loss 10.0750 LearningRate 0.0463 Epoch: 6 Global Step: 79450 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:30,992-Speed 2998.18 samples/sec Loss 9.9632 LearningRate 0.0463 Epoch: 6 Global Step: 79460 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:34,397-Speed 3008.59 samples/sec Loss 10.0766 LearningRate 0.0463 Epoch: 6 Global Step: 79470 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:13:37,836-Speed 2978.65 samples/sec Loss 9.9230 LearningRate 0.0462 Epoch: 6 Global Step: 79480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:41,267-Speed 2985.01 samples/sec Loss 9.8539 LearningRate 0.0462 Epoch: 6 Global Step: 79490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:44,697-Speed 2986.78 samples/sec Loss 9.8874 
LearningRate 0.0462 Epoch: 6 Global Step: 79500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:48,141-Speed 2973.80 samples/sec Loss 9.9942 LearningRate 0.0462 Epoch: 6 Global Step: 79510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:51,474-Speed 3073.33 samples/sec Loss 9.6469 LearningRate 0.0462 Epoch: 6 Global Step: 79520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:54,937-Speed 2958.46 samples/sec Loss 9.8867 LearningRate 0.0462 Epoch: 6 Global Step: 79530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:13:58,313-Speed 3033.53 samples/sec Loss 9.8713 LearningRate 0.0462 Epoch: 6 Global Step: 79540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:01,732-Speed 2995.95 samples/sec Loss 9.9515 LearningRate 0.0462 Epoch: 6 Global Step: 79550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:05,191-Speed 2961.35 samples/sec Loss 9.8585 LearningRate 0.0462 Epoch: 6 Global Step: 79560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:08,521-Speed 3075.75 samples/sec Loss 9.8730 LearningRate 0.0462 Epoch: 6 Global Step: 79570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:11,882-Speed 3047.70 samples/sec Loss 10.0942 LearningRate 0.0462 Epoch: 6 Global Step: 79580 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:15,254-Speed 3037.44 samples/sec Loss 9.8733 LearningRate 0.0462 Epoch: 6 Global Step: 79590 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:18,628-Speed 3036.32 samples/sec Loss 9.9007 LearningRate 0.0462 Epoch: 6 Global Step: 79600 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:22,005-Speed 3032.81 samples/sec Loss 9.8307 LearningRate 0.0462 Epoch: 6 Global Step: 79610 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:25,378-Speed 3036.84 samples/sec Loss 9.9275 LearningRate 0.0462 Epoch: 6 Global Step: 
79620 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:28,727-Speed 3059.11 samples/sec Loss 9.8582 LearningRate 0.0462 Epoch: 6 Global Step: 79630 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:14:32,038-Speed 3093.42 samples/sec Loss 9.7542 LearningRate 0.0462 Epoch: 6 Global Step: 79640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:35,457-Speed 2995.02 samples/sec Loss 9.9006 LearningRate 0.0462 Epoch: 6 Global Step: 79650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:38,844-Speed 3024.58 samples/sec Loss 9.7628 LearningRate 0.0461 Epoch: 6 Global Step: 79660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:42,226-Speed 3028.61 samples/sec Loss 10.0072 LearningRate 0.0461 Epoch: 6 Global Step: 79670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:45,579-Speed 3055.06 samples/sec Loss 9.9707 LearningRate 0.0461 Epoch: 6 Global Step: 79680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:48,983-Speed 3009.43 samples/sec Loss 9.9173 LearningRate 0.0461 Epoch: 6 Global Step: 79690 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:52,376-Speed 3019.16 samples/sec Loss 9.9658 LearningRate 0.0461 Epoch: 6 Global Step: 79700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:55,880-Speed 2922.95 samples/sec Loss 9.9208 LearningRate 0.0461 Epoch: 6 Global Step: 79710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:14:59,223-Speed 3064.35 samples/sec Loss 9.8116 LearningRate 0.0461 Epoch: 6 Global Step: 79720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:15:02,645-Speed 2993.55 samples/sec Loss 9.9542 LearningRate 0.0461 Epoch: 6 Global Step: 79730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:15:06,048-Speed 3009.71 samples/sec Loss 9.8271 LearningRate 0.0461 Epoch: 6 Global Step: 79740 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-27 09:15:09,412-Speed 3045.02 samples/sec Loss 9.8479 LearningRate 0.0461 Epoch: 6 Global Step: 79750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:15:12,789-Speed 3033.12 samples/sec Loss 9.9795 LearningRate 0.0461 Epoch: 6 Global Step: 79760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:15:16,129-Speed 3066.96 samples/sec Loss 10.0176 LearningRate 0.0461 Epoch: 6 Global Step: 79770 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:19,482-Speed 3054.32 samples/sec Loss 9.9552 LearningRate 0.0461 Epoch: 6 Global Step: 79780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:22,943-Speed 2960.68 samples/sec Loss 10.0252 LearningRate 0.0461 Epoch: 6 Global Step: 79790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:26,377-Speed 2982.38 samples/sec Loss 9.9071 LearningRate 0.0461 Epoch: 6 Global Step: 79800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:29,754-Speed 3033.78 samples/sec Loss 9.7359 LearningRate 0.0461 Epoch: 6 Global Step: 79810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:33,171-Speed 2997.02 samples/sec Loss 9.9476 LearningRate 0.0461 Epoch: 6 Global Step: 79820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:36,500-Speed 3077.25 samples/sec Loss 9.8493 LearningRate 0.0461 Epoch: 6 Global Step: 79830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:39,818-Speed 3086.80 samples/sec Loss 9.8443 LearningRate 0.0461 Epoch: 6 Global Step: 79840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:43,253-Speed 2982.19 samples/sec Loss 9.8066 LearningRate 0.0460 Epoch: 6 Global Step: 79850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:46,654-Speed 3011.92 samples/sec Loss 9.7738 LearningRate 0.0460 Epoch: 6 Global Step: 79860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:15:49,994-Speed 3067.28 
samples/sec Loss 9.8857 LearningRate 0.0460 Epoch: 6 Global Step: 79870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:15:53,360-Speed 3042.98 samples/sec Loss 10.0170 LearningRate 0.0460 Epoch: 6 Global Step: 79880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:15:56,803-Speed 2975.10 samples/sec Loss 9.7927 LearningRate 0.0460 Epoch: 6 Global Step: 79890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:00,209-Speed 3008.00 samples/sec Loss 9.8631 LearningRate 0.0460 Epoch: 6 Global Step: 79900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:03,544-Speed 3071.45 samples/sec Loss 9.7649 LearningRate 0.0460 Epoch: 6 Global Step: 79910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:07,001-Speed 2962.23 samples/sec Loss 9.8076 LearningRate 0.0460 Epoch: 6 Global Step: 79920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:10,400-Speed 3014.06 samples/sec Loss 9.8919 LearningRate 0.0460 Epoch: 6 Global Step: 79930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:13,821-Speed 2993.74 samples/sec Loss 10.0415 LearningRate 0.0460 Epoch: 6 Global Step: 79940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:17,262-Speed 2977.10 samples/sec Loss 9.9912 LearningRate 0.0460 Epoch: 6 Global Step: 79950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:20,632-Speed 3040.01 samples/sec Loss 9.7659 LearningRate 0.0460 Epoch: 6 Global Step: 79960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:24,017-Speed 3025.86 samples/sec Loss 10.0235 LearningRate 0.0460 Epoch: 6 Global Step: 79970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:16:27,471-Speed 2965.81 samples/sec Loss 9.8859 LearningRate 0.0460 Epoch: 6 Global Step: 79980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:16:30,872-Speed 3012.47 samples/sec Loss 9.9063 LearningRate 0.0460 Epoch: 
6 Global Step: 79990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:16:34,230-Speed 3050.84 samples/sec Loss 9.9558 LearningRate 0.0460 Epoch: 6 Global Step: 80000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:16:37,645-Speed 2999.61 samples/sec Loss 9.8668 LearningRate 0.0460 Epoch: 6 Global Step: 80010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:16:40,989-Speed 3062.79 samples/sec Loss 9.9123 LearningRate 0.0460 Epoch: 6 Global Step: 80020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:44,407-Speed 2997.03 samples/sec Loss 9.9142 LearningRate 0.0459 Epoch: 6 Global Step: 80030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:47,735-Speed 3077.35 samples/sec Loss 9.8123 LearningRate 0.0459 Epoch: 6 Global Step: 80040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:51,130-Speed 3017.65 samples/sec Loss 9.9814 LearningRate 0.0459 Epoch: 6 Global Step: 80050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:54,559-Speed 2987.60 samples/sec Loss 9.9187 LearningRate 0.0459 Epoch: 6 Global Step: 80060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:16:57,927-Speed 3040.40 samples/sec Loss 9.8734 LearningRate 0.0459 Epoch: 6 Global Step: 80070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:01,286-Speed 3049.80 samples/sec Loss 9.9597 LearningRate 0.0459 Epoch: 6 Global Step: 80080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:04,754-Speed 2953.38 samples/sec Loss 9.9226 LearningRate 0.0459 Epoch: 6 Global Step: 80090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:08,161-Speed 3006.66 samples/sec Loss 9.8140 LearningRate 0.0459 Epoch: 6 Global Step: 80100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:11,559-Speed 3014.76 samples/sec Loss 9.9102 LearningRate 0.0459 Epoch: 6 Global Step: 80110 Fp16 Grad Scale: 65536 
Required: 16 hours Training: 2022-04-27 09:17:14,983-Speed 2991.91 samples/sec Loss 9.9186 LearningRate 0.0459 Epoch: 6 Global Step: 80120 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:17:18,448-Speed 2955.61 samples/sec Loss 9.7691 LearningRate 0.0459 Epoch: 6 Global Step: 80130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:21,840-Speed 3019.94 samples/sec Loss 9.8227 LearningRate 0.0459 Epoch: 6 Global Step: 80140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:25,332-Speed 2932.90 samples/sec Loss 9.9210 LearningRate 0.0459 Epoch: 6 Global Step: 80150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:28,770-Speed 2979.51 samples/sec Loss 9.9841 LearningRate 0.0459 Epoch: 6 Global Step: 80160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:32,185-Speed 2999.50 samples/sec Loss 10.0065 LearningRate 0.0459 Epoch: 6 Global Step: 80170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:35,603-Speed 2996.93 samples/sec Loss 9.8681 LearningRate 0.0459 Epoch: 6 Global Step: 80180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:38,910-Speed 3097.09 samples/sec Loss 10.0436 LearningRate 0.0459 Epoch: 6 Global Step: 80190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:42,290-Speed 3030.84 samples/sec Loss 9.9374 LearningRate 0.0459 Epoch: 6 Global Step: 80200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:45,629-Speed 3067.53 samples/sec Loss 9.9245 LearningRate 0.0458 Epoch: 6 Global Step: 80210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:49,030-Speed 3012.33 samples/sec Loss 9.9406 LearningRate 0.0458 Epoch: 6 Global Step: 80220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:17:52,456-Speed 2989.60 samples/sec Loss 9.9263 LearningRate 0.0458 Epoch: 6 Global Step: 80230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 
09:17:55,926-Speed 2951.51 samples/sec Loss 9.8937 LearningRate 0.0458 Epoch: 6 Global Step: 80240 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:17:59,320-Speed 3018.09 samples/sec Loss 9.8821 LearningRate 0.0458 Epoch: 6 Global Step: 80250 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:02,684-Speed 3044.58 samples/sec Loss 9.9868 LearningRate 0.0458 Epoch: 6 Global Step: 80260 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:06,097-Speed 3001.38 samples/sec Loss 9.8796 LearningRate 0.0458 Epoch: 6 Global Step: 80270 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:09,583-Speed 2938.49 samples/sec Loss 9.8408 LearningRate 0.0458 Epoch: 6 Global Step: 80280 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:12,972-Speed 3022.16 samples/sec Loss 9.9400 LearningRate 0.0458 Epoch: 6 Global Step: 80290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:16,350-Speed 3033.28 samples/sec Loss 9.9678 LearningRate 0.0458 Epoch: 6 Global Step: 80300 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:19,732-Speed 3027.87 samples/sec Loss 9.8718 LearningRate 0.0458 Epoch: 6 Global Step: 80310 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:18:23,170-Speed 2979.43 samples/sec Loss 9.9651 LearningRate 0.0458 Epoch: 6 Global Step: 80320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:26,562-Speed 3019.50 samples/sec Loss 9.9749 LearningRate 0.0458 Epoch: 6 Global Step: 80330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:29,975-Speed 3001.63 samples/sec Loss 9.9425 LearningRate 0.0458 Epoch: 6 Global Step: 80340 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:33,395-Speed 2995.15 samples/sec Loss 9.9213 LearningRate 0.0458 Epoch: 6 Global Step: 80350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:36,864-Speed 2953.00 samples/sec Loss 
10.0787 LearningRate 0.0458 Epoch: 6 Global Step: 80360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:40,245-Speed 3029.20 samples/sec Loss 9.9101 LearningRate 0.0458 Epoch: 6 Global Step: 80370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:43,654-Speed 3004.82 samples/sec Loss 9.9625 LearningRate 0.0458 Epoch: 6 Global Step: 80380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:47,023-Speed 3039.88 samples/sec Loss 9.7882 LearningRate 0.0458 Epoch: 6 Global Step: 80390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:50,437-Speed 3000.49 samples/sec Loss 10.0600 LearningRate 0.0457 Epoch: 6 Global Step: 80400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:53,792-Speed 3053.16 samples/sec Loss 9.7911 LearningRate 0.0457 Epoch: 6 Global Step: 80410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:18:57,112-Speed 3085.86 samples/sec Loss 9.9526 LearningRate 0.0457 Epoch: 6 Global Step: 80420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:00,529-Speed 2998.18 samples/sec Loss 9.9087 LearningRate 0.0457 Epoch: 6 Global Step: 80430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:03,888-Speed 3049.46 samples/sec Loss 9.8144 LearningRate 0.0457 Epoch: 6 Global Step: 80440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:07,219-Speed 3074.46 samples/sec Loss 9.9498 LearningRate 0.0457 Epoch: 6 Global Step: 80450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:10,563-Speed 3063.52 samples/sec Loss 9.9748 LearningRate 0.0457 Epoch: 6 Global Step: 80460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:13,967-Speed 3008.53 samples/sec Loss 9.8544 LearningRate 0.0457 Epoch: 6 Global Step: 80470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:17,326-Speed 3049.58 samples/sec Loss 9.9290 LearningRate 0.0457 Epoch: 6 Global 
Step: 80480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:19:20,661-Speed 3072.21 samples/sec Loss 9.7994 LearningRate 0.0457 Epoch: 6 Global Step: 80490 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:24,040-Speed 3030.48 samples/sec Loss 9.8051 LearningRate 0.0457 Epoch: 6 Global Step: 80500 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:27,456-Speed 2999.06 samples/sec Loss 10.0108 LearningRate 0.0457 Epoch: 6 Global Step: 80510 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:30,781-Speed 3080.23 samples/sec Loss 9.9759 LearningRate 0.0457 Epoch: 6 Global Step: 80520 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:34,129-Speed 3059.33 samples/sec Loss 9.9370 LearningRate 0.0457 Epoch: 6 Global Step: 80530 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:37,509-Speed 3030.30 samples/sec Loss 9.9433 LearningRate 0.0457 Epoch: 6 Global Step: 80540 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:40,978-Speed 2953.61 samples/sec Loss 9.8242 LearningRate 0.0457 Epoch: 6 Global Step: 80550 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:44,294-Speed 3089.13 samples/sec Loss 9.9890 LearningRate 0.0457 Epoch: 6 Global Step: 80560 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:47,662-Speed 3040.69 samples/sec Loss 9.9463 LearningRate 0.0457 Epoch: 6 Global Step: 80570 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:51,053-Speed 3021.07 samples/sec Loss 9.8707 LearningRate 0.0456 Epoch: 6 Global Step: 80580 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:19:54,433-Speed 3030.46 samples/sec Loss 10.0690 LearningRate 0.0456 Epoch: 6 Global Step: 80590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:19:57,875-Speed 2975.57 samples/sec Loss 9.9306 LearningRate 0.0456 Epoch: 6 Global Step: 80600 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 09:20:01,355-Speed 2943.28 samples/sec Loss 9.8806 LearningRate 0.0456 Epoch: 6 Global Step: 80610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:04,764-Speed 3005.19 samples/sec Loss 9.9502 LearningRate 0.0456 Epoch: 6 Global Step: 80620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:08,173-Speed 3004.53 samples/sec Loss 9.8826 LearningRate 0.0456 Epoch: 6 Global Step: 80630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:11,551-Speed 3032.45 samples/sec Loss 9.8318 LearningRate 0.0456 Epoch: 6 Global Step: 80640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:14,904-Speed 3054.91 samples/sec Loss 9.8591 LearningRate 0.0456 Epoch: 6 Global Step: 80650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:18,220-Speed 3088.81 samples/sec Loss 9.9565 LearningRate 0.0456 Epoch: 6 Global Step: 80660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:21,579-Speed 3049.10 samples/sec Loss 9.8250 LearningRate 0.0456 Epoch: 6 Global Step: 80670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:25,053-Speed 2948.65 samples/sec Loss 9.8636 LearningRate 0.0456 Epoch: 6 Global Step: 80680 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:28,505-Speed 2967.35 samples/sec Loss 9.7641 LearningRate 0.0456 Epoch: 6 Global Step: 80690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:20:31,901-Speed 3016.28 samples/sec Loss 9.7430 LearningRate 0.0456 Epoch: 6 Global Step: 80700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:35,275-Speed 3035.96 samples/sec Loss 9.9940 LearningRate 0.0456 Epoch: 6 Global Step: 80710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:38,769-Speed 2930.80 samples/sec Loss 9.8736 LearningRate 0.0456 Epoch: 6 Global Step: 80720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:42,224-Speed 2965.54 
samples/sec Loss 9.9020 LearningRate 0.0456 Epoch: 6 Global Step: 80730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:45,639-Speed 2999.67 samples/sec Loss 9.8807 LearningRate 0.0456 Epoch: 6 Global Step: 80740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:49,126-Speed 2937.04 samples/sec Loss 9.9585 LearningRate 0.0456 Epoch: 6 Global Step: 80750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:52,499-Speed 3036.63 samples/sec Loss 9.8521 LearningRate 0.0455 Epoch: 6 Global Step: 80760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:55,896-Speed 3015.55 samples/sec Loss 9.9582 LearningRate 0.0455 Epoch: 6 Global Step: 80770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:20:59,191-Speed 3108.07 samples/sec Loss 9.8250 LearningRate 0.0455 Epoch: 6 Global Step: 80780 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:02,629-Speed 2979.95 samples/sec Loss 9.9859 LearningRate 0.0455 Epoch: 6 Global Step: 80790 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:05,991-Speed 3046.84 samples/sec Loss 9.9067 LearningRate 0.0455 Epoch: 6 Global Step: 80800 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:09,319-Speed 3077.37 samples/sec Loss 9.8363 LearningRate 0.0455 Epoch: 6 Global Step: 80810 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:12,655-Speed 3070.75 samples/sec Loss 9.8797 LearningRate 0.0455 Epoch: 6 Global Step: 80820 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:16,056-Speed 3012.07 samples/sec Loss 9.9981 LearningRate 0.0455 Epoch: 6 Global Step: 80830 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:19,445-Speed 3021.83 samples/sec Loss 9.8536 LearningRate 0.0455 Epoch: 6 Global Step: 80840 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:22,804-Speed 3049.99 samples/sec Loss 9.8232 LearningRate 0.0455 Epoch: 6 
Global Step: 80850 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:26,189-Speed 3025.99 samples/sec Loss 9.9121 LearningRate 0.0455 Epoch: 6 Global Step: 80860 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:29,517-Speed 3078.02 samples/sec Loss 9.7818 LearningRate 0.0455 Epoch: 6 Global Step: 80870 Fp16 Grad Scale: 32768 Required: 16 hours Training: 2022-04-27 09:21:32,885-Speed 3040.42 samples/sec Loss 9.6713 LearningRate 0.0455 Epoch: 6 Global Step: 80880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:36,390-Speed 2923.44 samples/sec Loss 9.9743 LearningRate 0.0455 Epoch: 6 Global Step: 80890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:39,809-Speed 2997.78 samples/sec Loss 9.9061 LearningRate 0.0455 Epoch: 6 Global Step: 80900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:43,258-Speed 2969.53 samples/sec Loss 10.0155 LearningRate 0.0455 Epoch: 6 Global Step: 80910 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:46,599-Speed 3065.61 samples/sec Loss 9.7953 LearningRate 0.0455 Epoch: 6 Global Step: 80920 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:50,007-Speed 3005.34 samples/sec Loss 9.9444 LearningRate 0.0455 Epoch: 6 Global Step: 80930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:53,318-Speed 3094.35 samples/sec Loss 9.9023 LearningRate 0.0455 Epoch: 6 Global Step: 80940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:21:56,690-Speed 3037.45 samples/sec Loss 9.9389 LearningRate 0.0454 Epoch: 6 Global Step: 80950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:22:00,040-Speed 3056.89 samples/sec Loss 9.8309 LearningRate 0.0454 Epoch: 6 Global Step: 80960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:22:03,459-Speed 2996.08 samples/sec Loss 9.9556 LearningRate 0.0454 Epoch: 6 Global Step: 80970 Fp16 Grad Scale: 65536 Required: 16 
hours Training: 2022-04-27 09:22:06,833-Speed 3035.66 samples/sec Loss 9.9730 LearningRate 0.0454 Epoch: 6 Global Step: 80980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:10,230-Speed 3015.64 samples/sec Loss 9.8291 LearningRate 0.0454 Epoch: 6 Global Step: 80990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:13,707-Speed 2945.71 samples/sec Loss 9.7822 LearningRate 0.0454 Epoch: 6 Global Step: 81000 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:17,038-Speed 3075.62 samples/sec Loss 9.7428 LearningRate 0.0454 Epoch: 6 Global Step: 81010 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:20,425-Speed 3023.93 samples/sec Loss 9.9476 LearningRate 0.0454 Epoch: 6 Global Step: 81020 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:23,798-Speed 3036.41 samples/sec Loss 9.9012 LearningRate 0.0454 Epoch: 6 Global Step: 81030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:27,215-Speed 2998.22 samples/sec Loss 9.9425 LearningRate 0.0454 Epoch: 6 Global Step: 81040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:30,622-Speed 3006.04 samples/sec Loss 10.0235 LearningRate 0.0454 Epoch: 6 Global Step: 81050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:33,967-Speed 3062.68 samples/sec Loss 9.7989 LearningRate 0.0454 Epoch: 6 Global Step: 81060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:37,409-Speed 2976.08 samples/sec Loss 9.9324 LearningRate 0.0454 Epoch: 6 Global Step: 81070 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:22:40,703-Speed 3108.94 samples/sec Loss 9.7792 LearningRate 0.0454 Epoch: 6 Global Step: 81080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:22:44,084-Speed 3029.77 samples/sec Loss 9.9850 LearningRate 0.0454 Epoch: 6 Global Step: 81090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 
09:22:47,447-Speed 3045.28 samples/sec Loss 9.8025 LearningRate 0.0454 Epoch: 6 Global Step: 81100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:22:50,798-Speed 3056.47 samples/sec Loss 9.9064 LearningRate 0.0454 Epoch: 6 Global Step: 81110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:22:54,171-Speed 3037.09 samples/sec Loss 10.0240 LearningRate 0.0454 Epoch: 6 Global Step: 81120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:22:57,487-Speed 3088.96 samples/sec Loss 9.8117 LearningRate 0.0453 Epoch: 6 Global Step: 81130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:00,855-Speed 3041.40 samples/sec Loss 9.7351 LearningRate 0.0453 Epoch: 6 Global Step: 81140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:04,212-Speed 3051.48 samples/sec Loss 9.5927 LearningRate 0.0453 Epoch: 6 Global Step: 81150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:07,558-Speed 3060.87 samples/sec Loss 9.9222 LearningRate 0.0453 Epoch: 6 Global Step: 81160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:10,915-Speed 3051.83 samples/sec Loss 9.8944 LearningRate 0.0453 Epoch: 6 Global Step: 81170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:14,317-Speed 3011.04 samples/sec Loss 9.8539 LearningRate 0.0453 Epoch: 6 Global Step: 81180 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:23:17,687-Speed 3038.66 samples/sec Loss 9.8696 LearningRate 0.0453 Epoch: 6 Global Step: 81190 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:23:21,059-Speed 3037.55 samples/sec Loss 9.9418 LearningRate 0.0453 Epoch: 6 Global Step: 81200 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:23:24,415-Speed 3052.70 samples/sec Loss 10.0057 LearningRate 0.0453 Epoch: 6 Global Step: 81210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:23:27,889-Speed 2948.81 samples/sec Loss 
9.8763 LearningRate 0.0453 Epoch: 6 Global Step: 81220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:23:31,276-Speed 3023.37 samples/sec Loss 9.9597 LearningRate 0.0453 Epoch: 6 Global Step: 81230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:34,710-Speed 2982.62 samples/sec Loss 9.8066 LearningRate 0.0453 Epoch: 6 Global Step: 81240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:38,049-Speed 3068.14 samples/sec Loss 9.8529 LearningRate 0.0453 Epoch: 6 Global Step: 81250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:41,545-Speed 2930.29 samples/sec Loss 9.8039 LearningRate 0.0453 Epoch: 6 Global Step: 81260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:44,979-Speed 2982.08 samples/sec Loss 9.9378 LearningRate 0.0453 Epoch: 6 Global Step: 81270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:48,402-Speed 2992.68 samples/sec Loss 9.8591 LearningRate 0.0453 Epoch: 6 Global Step: 81280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:51,847-Speed 2973.90 samples/sec Loss 9.7574 LearningRate 0.0453 Epoch: 6 Global Step: 81290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:55,264-Speed 2997.11 samples/sec Loss 9.8658 LearningRate 0.0453 Epoch: 6 Global Step: 81300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:23:58,701-Speed 2980.22 samples/sec Loss 9.7620 LearningRate 0.0453 Epoch: 6 Global Step: 81310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:24:02,105-Speed 3009.48 samples/sec Loss 9.9030 LearningRate 0.0452 Epoch: 6 Global Step: 81320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:24:05,454-Speed 3058.69 samples/sec Loss 9.9431 LearningRate 0.0452 Epoch: 6 Global Step: 81330 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:08,902-Speed 2970.72 samples/sec Loss 9.9216 LearningRate 0.0452 Epoch: 6 Global Step: 
81340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:12,308-Speed 3007.63 samples/sec Loss 9.9276 LearningRate 0.0452 Epoch: 6 Global Step: 81350 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:15,663-Speed 3052.45 samples/sec Loss 9.8654 LearningRate 0.0452 Epoch: 6 Global Step: 81360 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:19,028-Speed 3043.91 samples/sec Loss 9.8599 LearningRate 0.0452 Epoch: 6 Global Step: 81370 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:22,392-Speed 3044.81 samples/sec Loss 9.8190 LearningRate 0.0452 Epoch: 6 Global Step: 81380 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:25,763-Speed 3039.01 samples/sec Loss 9.9401 LearningRate 0.0452 Epoch: 6 Global Step: 81390 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:29,102-Speed 3068.26 samples/sec Loss 9.9883 LearningRate 0.0452 Epoch: 6 Global Step: 81400 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:32,421-Speed 3085.77 samples/sec Loss 9.6822 LearningRate 0.0452 Epoch: 6 Global Step: 81410 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:35,791-Speed 3039.41 samples/sec Loss 9.8958 LearningRate 0.0452 Epoch: 6 Global Step: 81420 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:39,193-Speed 3011.46 samples/sec Loss 9.8655 LearningRate 0.0452 Epoch: 6 Global Step: 81430 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:42,613-Speed 2995.17 samples/sec Loss 9.9250 LearningRate 0.0452 Epoch: 6 Global Step: 81440 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:46,036-Speed 2992.05 samples/sec Loss 9.8022 LearningRate 0.0452 Epoch: 6 Global Step: 81450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:49,451-Speed 2999.80 samples/sec Loss 9.7725 LearningRate 0.0452 Epoch: 6 Global Step: 81460 Fp16 Grad Scale: 131072 Required: 16 
hours Training: 2022-04-27 09:24:52,803-Speed 3055.68 samples/sec Loss 9.7485 LearningRate 0.0452 Epoch: 6 Global Step: 81470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:56,217-Speed 3000.65 samples/sec Loss 9.8524 LearningRate 0.0452 Epoch: 6 Global Step: 81480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:24:59,569-Speed 3055.58 samples/sec Loss 9.8835 LearningRate 0.0452 Epoch: 6 Global Step: 81490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:03,020-Speed 2967.81 samples/sec Loss 9.7697 LearningRate 0.0451 Epoch: 6 Global Step: 81500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:06,389-Speed 3040.69 samples/sec Loss 9.7130 LearningRate 0.0451 Epoch: 6 Global Step: 81510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:09,791-Speed 3010.77 samples/sec Loss 9.8253 LearningRate 0.0451 Epoch: 6 Global Step: 81520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:13,139-Speed 3059.74 samples/sec Loss 9.7594 LearningRate 0.0451 Epoch: 6 Global Step: 81530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:16,633-Speed 2932.43 samples/sec Loss 9.8023 LearningRate 0.0451 Epoch: 6 Global Step: 81540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:20,001-Speed 3040.57 samples/sec Loss 9.8520 LearningRate 0.0451 Epoch: 6 Global Step: 81550 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:23,428-Speed 2989.17 samples/sec Loss 9.8438 LearningRate 0.0451 Epoch: 6 Global Step: 81560 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:26,836-Speed 3006.30 samples/sec Loss 9.8349 LearningRate 0.0451 Epoch: 6 Global Step: 81570 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:25:30,148-Speed 3093.21 samples/sec Loss 9.7642 LearningRate 0.0451 Epoch: 6 Global Step: 81580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 
09:25:33,593-Speed 2973.10 samples/sec Loss 9.7942 LearningRate 0.0451 Epoch: 6 Global Step: 81590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:37,034-Speed 2976.66 samples/sec Loss 9.8904 LearningRate 0.0451 Epoch: 6 Global Step: 81600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:40,522-Speed 2936.11 samples/sec Loss 9.8992 LearningRate 0.0451 Epoch: 6 Global Step: 81610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:43,935-Speed 3001.34 samples/sec Loss 9.9066 LearningRate 0.0451 Epoch: 6 Global Step: 81620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:47,311-Speed 3034.25 samples/sec Loss 9.8103 LearningRate 0.0451 Epoch: 6 Global Step: 81630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:50,736-Speed 2990.23 samples/sec Loss 9.8012 LearningRate 0.0451 Epoch: 6 Global Step: 81640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:54,169-Speed 2984.22 samples/sec Loss 9.7150 LearningRate 0.0451 Epoch: 6 Global Step: 81650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:25:57,580-Speed 3002.41 samples/sec Loss 9.8334 LearningRate 0.0451 Epoch: 6 Global Step: 81660 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:00,958-Speed 3031.85 samples/sec Loss 9.8646 LearningRate 0.0451 Epoch: 6 Global Step: 81670 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:04,342-Speed 3027.77 samples/sec Loss 9.7748 LearningRate 0.0451 Epoch: 6 Global Step: 81680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:26:07,723-Speed 3029.20 samples/sec Loss 9.8152 LearningRate 0.0450 Epoch: 6 Global Step: 81690 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:26:11,136-Speed 3001.78 samples/sec Loss 9.8601 LearningRate 0.0450 Epoch: 6 Global Step: 81700 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:14,586-Speed 2969.06 samples/sec Loss 9.8618 
LearningRate 0.0450 Epoch: 6 Global Step: 81710 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:17,945-Speed 3049.02 samples/sec Loss 9.8100 LearningRate 0.0450 Epoch: 6 Global Step: 81720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:21,342-Speed 3015.43 samples/sec Loss 9.7499 LearningRate 0.0450 Epoch: 6 Global Step: 81730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:24,772-Speed 2986.54 samples/sec Loss 9.9438 LearningRate 0.0450 Epoch: 6 Global Step: 81740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:28,170-Speed 3015.05 samples/sec Loss 9.8219 LearningRate 0.0450 Epoch: 6 Global Step: 81750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:31,642-Speed 2950.15 samples/sec Loss 9.8867 LearningRate 0.0450 Epoch: 6 Global Step: 81760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:35,021-Speed 3030.50 samples/sec Loss 9.7620 LearningRate 0.0450 Epoch: 6 Global Step: 81770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:38,508-Speed 2937.39 samples/sec Loss 9.8517 LearningRate 0.0450 Epoch: 6 Global Step: 81780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:41,900-Speed 3020.25 samples/sec Loss 9.7934 LearningRate 0.0450 Epoch: 6 Global Step: 81790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:45,288-Speed 3023.50 samples/sec Loss 9.9655 LearningRate 0.0450 Epoch: 6 Global Step: 81800 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:26:48,656-Speed 3041.45 samples/sec Loss 9.8560 LearningRate 0.0450 Epoch: 6 Global Step: 81810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:52,003-Speed 3060.19 samples/sec Loss 9.6061 LearningRate 0.0450 Epoch: 6 Global Step: 81820 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:55,411-Speed 3005.66 samples/sec Loss 9.9837 LearningRate 0.0450 Epoch: 6 Global Step: 81830 Fp16 
Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:26:58,839-Speed 2988.39 samples/sec Loss 9.7736 LearningRate 0.0450 Epoch: 6 Global Step: 81840 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:02,231-Speed 3019.08 samples/sec Loss 9.9156 LearningRate 0.0450 Epoch: 6 Global Step: 81850 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:05,634-Speed 3010.71 samples/sec Loss 9.6171 LearningRate 0.0450 Epoch: 6 Global Step: 81860 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:09,045-Speed 3003.02 samples/sec Loss 9.7061 LearningRate 0.0449 Epoch: 6 Global Step: 81870 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:12,463-Speed 2996.16 samples/sec Loss 9.8300 LearningRate 0.0449 Epoch: 6 Global Step: 81880 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:15,832-Speed 3040.58 samples/sec Loss 9.9236 LearningRate 0.0449 Epoch: 6 Global Step: 81890 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:19,282-Speed 2968.26 samples/sec Loss 9.8619 LearningRate 0.0449 Epoch: 6 Global Step: 81900 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:22,761-Speed 2944.35 samples/sec Loss 9.7810 LearningRate 0.0449 Epoch: 6 Global Step: 81910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:26,222-Speed 2959.76 samples/sec Loss 9.8425 LearningRate 0.0449 Epoch: 6 Global Step: 81920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:29,673-Speed 2967.72 samples/sec Loss 9.8770 LearningRate 0.0449 Epoch: 6 Global Step: 81930 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:33,029-Speed 3052.68 samples/sec Loss 9.9932 LearningRate 0.0449 Epoch: 6 Global Step: 81940 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:36,435-Speed 3007.23 samples/sec Loss 9.7433 LearningRate 0.0449 Epoch: 6 Global Step: 81950 Fp16 Grad Scale: 131072 Required: 16 hours Training: 
2022-04-27 09:27:39,890-Speed 2964.20 samples/sec Loss 9.7382 LearningRate 0.0449 Epoch: 6 Global Step: 81960 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:43,324-Speed 2982.92 samples/sec Loss 9.9012 LearningRate 0.0449 Epoch: 6 Global Step: 81970 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:46,738-Speed 3000.96 samples/sec Loss 9.8182 LearningRate 0.0449 Epoch: 6 Global Step: 81980 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:50,136-Speed 3014.47 samples/sec Loss 9.8802 LearningRate 0.0449 Epoch: 6 Global Step: 81990 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:27:53,561-Speed 2990.27 samples/sec Loss 9.7372 LearningRate 0.0449 Epoch: 6 Global Step: 82000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:27:56,919-Speed 3049.78 samples/sec Loss 9.7345 LearningRate 0.0449 Epoch: 6 Global Step: 82010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:00,268-Speed 3058.70 samples/sec Loss 9.7644 LearningRate 0.0449 Epoch: 6 Global Step: 82020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:03,589-Speed 3085.18 samples/sec Loss 9.8388 LearningRate 0.0449 Epoch: 6 Global Step: 82030 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:06,894-Speed 3098.53 samples/sec Loss 9.7293 LearningRate 0.0449 Epoch: 6 Global Step: 82040 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:10,257-Speed 3046.10 samples/sec Loss 9.8118 LearningRate 0.0449 Epoch: 6 Global Step: 82050 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:13,574-Speed 3089.05 samples/sec Loss 9.8318 LearningRate 0.0448 Epoch: 6 Global Step: 82060 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:16,926-Speed 3055.34 samples/sec Loss 9.8148 LearningRate 0.0448 Epoch: 6 Global Step: 82070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:20,334-Speed 3005.15 samples/sec 
Loss 9.7919 LearningRate 0.0448 Epoch: 6 Global Step: 82080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:23,638-Speed 3100.71 samples/sec Loss 9.8566 LearningRate 0.0448 Epoch: 6 Global Step: 82090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:26,953-Speed 3089.47 samples/sec Loss 9.8298 LearningRate 0.0448 Epoch: 6 Global Step: 82100 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:28:30,357-Speed 3009.45 samples/sec Loss 9.9397 LearningRate 0.0448 Epoch: 6 Global Step: 82110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:33,708-Speed 3056.39 samples/sec Loss 9.8034 LearningRate 0.0448 Epoch: 6 Global Step: 82120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:37,103-Speed 3016.99 samples/sec Loss 9.8336 LearningRate 0.0448 Epoch: 6 Global Step: 82130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:40,554-Speed 2968.15 samples/sec Loss 9.9021 LearningRate 0.0448 Epoch: 6 Global Step: 82140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:43,931-Speed 3033.17 samples/sec Loss 9.7170 LearningRate 0.0448 Epoch: 6 Global Step: 82150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:47,247-Speed 3088.47 samples/sec Loss 9.8138 LearningRate 0.0448 Epoch: 6 Global Step: 82160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:50,626-Speed 3032.23 samples/sec Loss 9.6929 LearningRate 0.0448 Epoch: 6 Global Step: 82170 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:54,042-Speed 2998.31 samples/sec Loss 9.7222 LearningRate 0.0448 Epoch: 6 Global Step: 82180 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:28:57,428-Speed 3024.88 samples/sec Loss 9.7019 LearningRate 0.0448 Epoch: 6 Global Step: 82190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:00,818-Speed 3021.17 samples/sec Loss 9.7830 LearningRate 0.0448 Epoch: 6 Global Step: 
82200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:04,264-Speed 2972.72 samples/sec Loss 9.8291 LearningRate 0.0448 Epoch: 6 Global Step: 82210 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:29:07,608-Speed 3063.29 samples/sec Loss 9.8403 LearningRate 0.0448 Epoch: 6 Global Step: 82220 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:29:11,005-Speed 3014.84 samples/sec Loss 9.7197 LearningRate 0.0448 Epoch: 6 Global Step: 82230 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:29:14,413-Speed 3005.70 samples/sec Loss 9.9420 LearningRate 0.0447 Epoch: 6 Global Step: 82240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:17,870-Speed 2962.88 samples/sec Loss 9.8742 LearningRate 0.0447 Epoch: 6 Global Step: 82250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:21,220-Speed 3058.60 samples/sec Loss 9.8290 LearningRate 0.0447 Epoch: 6 Global Step: 82260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:24,601-Speed 3029.08 samples/sec Loss 9.7106 LearningRate 0.0447 Epoch: 6 Global Step: 82270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:27,928-Speed 3078.60 samples/sec Loss 9.9491 LearningRate 0.0447 Epoch: 6 Global Step: 82280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:31,298-Speed 3039.45 samples/sec Loss 9.8990 LearningRate 0.0447 Epoch: 6 Global Step: 82290 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:34,659-Speed 3047.97 samples/sec Loss 9.7581 LearningRate 0.0447 Epoch: 6 Global Step: 82300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:37,979-Speed 3085.20 samples/sec Loss 9.8474 LearningRate 0.0447 Epoch: 6 Global Step: 82310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:41,365-Speed 3024.84 samples/sec Loss 9.7141 LearningRate 0.0447 Epoch: 6 Global Step: 82320 Fp16 Grad Scale: 65536 Required: 16 hours 
Training: 2022-04-27 09:29:44,798-Speed 2983.60 samples/sec Loss 9.7215 LearningRate 0.0447 Epoch: 6 Global Step: 82330 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:48,166-Speed 3041.43 samples/sec Loss 9.7928 LearningRate 0.0447 Epoch: 6 Global Step: 82340 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:29:51,491-Speed 3081.18 samples/sec Loss 9.7859 LearningRate 0.0447 Epoch: 6 Global Step: 82350 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:54,842-Speed 3056.50 samples/sec Loss 9.8739 LearningRate 0.0447 Epoch: 6 Global Step: 82360 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:29:58,254-Speed 3001.59 samples/sec Loss 9.6801 LearningRate 0.0447 Epoch: 6 Global Step: 82370 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:01,678-Speed 2991.45 samples/sec Loss 9.8270 LearningRate 0.0447 Epoch: 6 Global Step: 82380 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:05,036-Speed 3050.87 samples/sec Loss 9.6994 LearningRate 0.0447 Epoch: 6 Global Step: 82390 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:08,442-Speed 3007.27 samples/sec Loss 9.8301 LearningRate 0.0447 Epoch: 6 Global Step: 82400 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:11,801-Speed 3048.66 samples/sec Loss 9.8519 LearningRate 0.0447 Epoch: 6 Global Step: 82410 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:15,119-Speed 3087.63 samples/sec Loss 9.6506 LearningRate 0.0447 Epoch: 6 Global Step: 82420 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:18,471-Speed 3055.84 samples/sec Loss 9.6624 LearningRate 0.0446 Epoch: 6 Global Step: 82430 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:21,830-Speed 3048.66 samples/sec Loss 9.6897 LearningRate 0.0446 Epoch: 6 Global Step: 82440 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:30:25,189-Speed 3050.11 
samples/sec Loss 9.7483 LearningRate 0.0446 Epoch: 6 Global Step: 82450 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:28,520-Speed 3074.72 samples/sec Loss 9.6448 LearningRate 0.0446 Epoch: 6 Global Step: 82460 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:31,841-Speed 3084.74 samples/sec Loss 9.8070 LearningRate 0.0446 Epoch: 6 Global Step: 82470 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:35,299-Speed 2961.99 samples/sec Loss 9.6670 LearningRate 0.0446 Epoch: 6 Global Step: 82480 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:38,706-Speed 3006.43 samples/sec Loss 9.8398 LearningRate 0.0446 Epoch: 6 Global Step: 82490 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:42,065-Speed 3049.43 samples/sec Loss 9.7466 LearningRate 0.0446 Epoch: 6 Global Step: 82500 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:45,417-Speed 3055.83 samples/sec Loss 9.7428 LearningRate 0.0446 Epoch: 6 Global Step: 82510 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:48,815-Speed 3014.48 samples/sec Loss 9.9718 LearningRate 0.0446 Epoch: 6 Global Step: 82520 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:52,150-Speed 3071.94 samples/sec Loss 9.8265 LearningRate 0.0446 Epoch: 6 Global Step: 82530 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:55,505-Speed 3052.75 samples/sec Loss 9.9564 LearningRate 0.0446 Epoch: 6 Global Step: 82540 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:30:58,892-Speed 3024.40 samples/sec Loss 9.8790 LearningRate 0.0446 Epoch: 6 Global Step: 82550 Fp16 Grad Scale: 262144 Required: 16 hours Training: 2022-04-27 09:31:02,213-Speed 3084.19 samples/sec Loss 9.8043 LearningRate 0.0446 Epoch: 6 Global Step: 82560 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:05,516-Speed 3101.43 samples/sec Loss 9.7167 LearningRate 0.0446 
Epoch: 6 Global Step: 82570 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:08,890-Speed 3035.76 samples/sec Loss 9.7685 LearningRate 0.0446 Epoch: 6 Global Step: 82580 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:12,239-Speed 3058.43 samples/sec Loss 9.6987 LearningRate 0.0446 Epoch: 6 Global Step: 82590 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:15,586-Speed 3060.15 samples/sec Loss 9.7313 LearningRate 0.0446 Epoch: 6 Global Step: 82600 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:18,926-Speed 3066.96 samples/sec Loss 9.6786 LearningRate 0.0446 Epoch: 6 Global Step: 82610 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:22,312-Speed 3025.33 samples/sec Loss 9.8181 LearningRate 0.0445 Epoch: 6 Global Step: 82620 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:25,735-Speed 2992.01 samples/sec Loss 9.8660 LearningRate 0.0445 Epoch: 6 Global Step: 82630 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:29,121-Speed 3025.17 samples/sec Loss 9.7739 LearningRate 0.0445 Epoch: 6 Global Step: 82640 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:32,513-Speed 3020.25 samples/sec Loss 9.7124 LearningRate 0.0445 Epoch: 6 Global Step: 82650 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:35,883-Speed 3039.02 samples/sec Loss 9.5441 LearningRate 0.0445 Epoch: 6 Global Step: 82660 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:31:39,215-Speed 3074.10 samples/sec Loss 9.6621 LearningRate 0.0445 Epoch: 6 Global Step: 82670 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:31:42,599-Speed 3027.21 samples/sec Loss 9.8367 LearningRate 0.0445 Epoch: 6 Global Step: 82680 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:31:45,977-Speed 3031.98 samples/sec Loss 9.7390 LearningRate 0.0445 Epoch: 6 Global Step: 82690 Fp16 Grad Scale: 131072 
Required: 16 hours Training: 2022-04-27 09:31:49,308-Speed 3075.27 samples/sec Loss 9.7824 LearningRate 0.0445 Epoch: 6 Global Step: 82700 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:31:52,737-Speed 2987.22 samples/sec Loss 9.8047 LearningRate 0.0445 Epoch: 6 Global Step: 82710 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:31:56,133-Speed 3015.58 samples/sec Loss 9.7261 LearningRate 0.0445 Epoch: 6 Global Step: 82720 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:31:59,488-Speed 3053.69 samples/sec Loss 9.7911 LearningRate 0.0445 Epoch: 6 Global Step: 82730 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:02,828-Speed 3066.65 samples/sec Loss 9.6781 LearningRate 0.0445 Epoch: 6 Global Step: 82740 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:06,141-Speed 3091.41 samples/sec Loss 9.7717 LearningRate 0.0445 Epoch: 6 Global Step: 82750 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:09,499-Speed 3050.78 samples/sec Loss 9.7712 LearningRate 0.0445 Epoch: 6 Global Step: 82760 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:12,986-Speed 2937.70 samples/sec Loss 9.6895 LearningRate 0.0445 Epoch: 6 Global Step: 82770 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:16,383-Speed 3015.16 samples/sec Loss 9.8024 LearningRate 0.0445 Epoch: 6 Global Step: 82780 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:19,817-Speed 2983.03 samples/sec Loss 9.8966 LearningRate 0.0445 Epoch: 6 Global Step: 82790 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:23,191-Speed 3035.52 samples/sec Loss 9.7819 LearningRate 0.0444 Epoch: 6 Global Step: 82800 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:32:26,695-Speed 2923.65 samples/sec Loss 9.8176 LearningRate 0.0444 Epoch: 6 Global Step: 82810 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 
09:32:30,097-Speed 3011.05 samples/sec Loss 9.7011 LearningRate 0.0444 Epoch: 6 Global Step: 82820 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:33,566-Speed 2952.18 samples/sec Loss 9.7058 LearningRate 0.0444 Epoch: 6 Global Step: 82830 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:36,995-Speed 2987.35 samples/sec Loss 9.6685 LearningRate 0.0444 Epoch: 6 Global Step: 82840 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:40,305-Speed 3094.46 samples/sec Loss 9.7546 LearningRate 0.0444 Epoch: 6 Global Step: 82850 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:43,641-Speed 3071.02 samples/sec Loss 9.8023 LearningRate 0.0444 Epoch: 6 Global Step: 82860 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:46,978-Speed 3069.39 samples/sec Loss 9.7846 LearningRate 0.0444 Epoch: 6 Global Step: 82870 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:50,429-Speed 2967.80 samples/sec Loss 9.7333 LearningRate 0.0444 Epoch: 6 Global Step: 82880 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:53,890-Speed 2959.87 samples/sec Loss 9.6885 LearningRate 0.0444 Epoch: 6 Global Step: 82890 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:32:57,274-Speed 3026.57 samples/sec Loss 9.7130 LearningRate 0.0444 Epoch: 6 Global Step: 82900 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:00,703-Speed 2987.06 samples/sec Loss 9.7781 LearningRate 0.0444 Epoch: 6 Global Step: 82910 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:04,143-Speed 2978.21 samples/sec Loss 9.6479 LearningRate 0.0444 Epoch: 6 Global Step: 82920 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:07,530-Speed 3023.85 samples/sec Loss 9.8575 LearningRate 0.0444 Epoch: 6 Global Step: 82930 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:10,991-Speed 2959.78 samples/sec Loss 
9.7912 LearningRate 0.0444 Epoch: 6 Global Step: 82940 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:14,444-Speed 2966.69 samples/sec Loss 9.8760 LearningRate 0.0444 Epoch: 6 Global Step: 82950 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:17,803-Speed 3049.75 samples/sec Loss 9.8397 LearningRate 0.0444 Epoch: 6 Global Step: 82960 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:21,200-Speed 3015.22 samples/sec Loss 9.8930 LearningRate 0.0444 Epoch: 6 Global Step: 82970 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:24,556-Speed 3051.94 samples/sec Loss 9.7362 LearningRate 0.0444 Epoch: 6 Global Step: 82980 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:27,912-Speed 3052.25 samples/sec Loss 9.8945 LearningRate 0.0443 Epoch: 6 Global Step: 82990 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:31,308-Speed 3016.94 samples/sec Loss 9.6735 LearningRate 0.0443 Epoch: 6 Global Step: 83000 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:34,719-Speed 3002.61 samples/sec Loss 9.6552 LearningRate 0.0443 Epoch: 6 Global Step: 83010 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:38,189-Speed 2951.68 samples/sec Loss 9.9245 LearningRate 0.0443 Epoch: 6 Global Step: 83020 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:41,617-Speed 2988.60 samples/sec Loss 9.7500 LearningRate 0.0443 Epoch: 6 Global Step: 83030 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:45,067-Speed 2969.31 samples/sec Loss 9.7096 LearningRate 0.0443 Epoch: 6 Global Step: 83040 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:48,458-Speed 3020.49 samples/sec Loss 9.8316 LearningRate 0.0443 Epoch: 6 Global Step: 83050 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:51,881-Speed 2992.73 samples/sec Loss 9.7340 LearningRate 0.0443 Epoch: 6 Global Step: 
83060 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:33:55,279-Speed 3014.69 samples/sec Loss 9.8786 LearningRate 0.0443 Epoch: 6 Global Step: 83070 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:33:58,695-Speed 2999.02 samples/sec Loss 9.8211 LearningRate 0.0443 Epoch: 6 Global Step: 83080 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:02,112-Speed 2997.72 samples/sec Loss 9.6287 LearningRate 0.0443 Epoch: 6 Global Step: 83090 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:05,476-Speed 3045.01 samples/sec Loss 9.6756 LearningRate 0.0443 Epoch: 6 Global Step: 83100 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:08,856-Speed 3030.56 samples/sec Loss 9.6666 LearningRate 0.0443 Epoch: 6 Global Step: 83110 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:12,289-Speed 2983.65 samples/sec Loss 9.7164 LearningRate 0.0443 Epoch: 6 Global Step: 83120 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:15,720-Speed 2985.55 samples/sec Loss 9.8116 LearningRate 0.0443 Epoch: 6 Global Step: 83130 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:19,046-Speed 3079.08 samples/sec Loss 9.7455 LearningRate 0.0443 Epoch: 6 Global Step: 83140 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:22,385-Speed 3067.98 samples/sec Loss 9.6743 LearningRate 0.0443 Epoch: 6 Global Step: 83150 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:25,795-Speed 3004.54 samples/sec Loss 9.7693 LearningRate 0.0443 Epoch: 6 Global Step: 83160 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:29,170-Speed 3035.36 samples/sec Loss 9.7711 LearningRate 0.0442 Epoch: 6 Global Step: 83170 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:34:32,573-Speed 3010.25 samples/sec Loss 9.7870 LearningRate 0.0442 Epoch: 6 Global Step: 83180 Fp16 Grad Scale: 131072 Required: 16 hours 
Training: 2022-04-27 09:34:35,915-Speed 3065.32 samples/sec Loss 9.6672 LearningRate 0.0442 Epoch: 6 Global Step: 83190 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:39,230-Speed 3089.79 samples/sec Loss 9.7679 LearningRate 0.0442 Epoch: 6 Global Step: 83200 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:42,647-Speed 2997.38 samples/sec Loss 9.6511 LearningRate 0.0442 Epoch: 6 Global Step: 83210 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:46,043-Speed 3016.86 samples/sec Loss 9.7629 LearningRate 0.0442 Epoch: 6 Global Step: 83220 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:49,520-Speed 2946.07 samples/sec Loss 9.7812 LearningRate 0.0442 Epoch: 6 Global Step: 83230 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:52,897-Speed 3032.73 samples/sec Loss 9.6953 LearningRate 0.0442 Epoch: 6 Global Step: 83240 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:56,339-Speed 2976.71 samples/sec Loss 9.6958 LearningRate 0.0442 Epoch: 6 Global Step: 83250 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:34:59,709-Speed 3039.22 samples/sec Loss 9.7933 LearningRate 0.0442 Epoch: 6 Global Step: 83260 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:35:03,101-Speed 3019.65 samples/sec Loss 9.6602 LearningRate 0.0442 Epoch: 6 Global Step: 83270 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:35:06,466-Speed 3045.04 samples/sec Loss 9.6984 LearningRate 0.0442 Epoch: 6 Global Step: 83280 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:35:09,822-Speed 3051.70 samples/sec Loss 9.6941 LearningRate 0.0442 Epoch: 6 Global Step: 83290 Fp16 Grad Scale: 131072 Required: 16 hours Training: 2022-04-27 09:35:13,233-Speed 3002.67 samples/sec Loss 9.6906 LearningRate 0.0442 Epoch: 6 Global Step: 83300 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:35:16,673-Speed 2977.78 
samples/sec Loss 9.6801 LearningRate 0.0442 Epoch: 6 Global Step: 83310 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:35:20,059-Speed 3025.23 samples/sec Loss 9.5980 LearningRate 0.0442 Epoch: 6 Global Step: 83320 Fp16 Grad Scale: 65536 Required: 16 hours Training: 2022-04-27 09:35:23,478-Speed 2995.10 samples/sec Loss 9.7321 LearningRate 0.0442 Epoch: 6 Global Step: 83330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:35:26,856-Speed 3032.20 samples/sec Loss 9.7236 LearningRate 0.0442 Epoch: 6 Global Step: 83340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:35:30,209-Speed 3054.97 samples/sec Loss 9.7578 LearningRate 0.0442 Epoch: 6 Global Step: 83350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:35:33,581-Speed 3037.50 samples/sec Loss 9.7039 LearningRate 0.0441 Epoch: 6 Global Step: 83360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:35:36,965-Speed 3027.80 samples/sec Loss 9.6978 LearningRate 0.0441 Epoch: 6 Global Step: 83370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:35:40,354-Speed 3021.66 samples/sec Loss 9.6224 LearningRate 0.0441 Epoch: 6 Global Step: 83380 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:35:43,780-Speed 2990.11 samples/sec Loss 9.6020 LearningRate 0.0441 Epoch: 6 Global Step: 83390 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:35:47,241-Speed 2959.06 samples/sec Loss 9.7015 LearningRate 0.0441 Epoch: 6 Global Step: 83400 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:35:50,611-Speed 3039.57 samples/sec Loss 9.7224 LearningRate 0.0441 Epoch: 6 Global Step: 83410 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:35:54,083-Speed 2949.99 samples/sec Loss 9.9047 LearningRate 0.0441 Epoch: 6 Global Step: 83420 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:35:57,438-Speed 3053.44 samples/sec Loss 9.6433 LearningRate 0.0441 Epoch: 6 
Global Step: 83430 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:36:00,814-Speed 3034.07 samples/sec Loss 9.6075 LearningRate 0.0441 Epoch: 6 Global Step: 83440 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:36:04,255-Speed 2977.32 samples/sec Loss 9.8044 LearningRate 0.0441 Epoch: 6 Global Step: 83450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:36:07,765-Speed 2918.34 samples/sec Loss 9.8074 LearningRate 0.0441 Epoch: 6 Global Step: 83460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:36:11,152-Speed 3023.79 samples/sec Loss 9.7239 LearningRate 0.0441 Epoch: 6 Global Step: 83470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:36:14,588-Speed 2981.36 samples/sec Loss 9.8296 LearningRate 0.0441 Epoch: 6 Global Step: 83480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:17,911-Speed 3082.60 samples/sec Loss 9.8217 LearningRate 0.0441 Epoch: 6 Global Step: 83490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:21,243-Speed 3074.36 samples/sec Loss 9.6208 LearningRate 0.0441 Epoch: 6 Global Step: 83500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:24,627-Speed 3026.96 samples/sec Loss 9.6540 LearningRate 0.0441 Epoch: 6 Global Step: 83510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:28,066-Speed 2977.71 samples/sec Loss 9.7093 LearningRate 0.0441 Epoch: 6 Global Step: 83520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:31,572-Speed 2922.43 samples/sec Loss 9.6464 LearningRate 0.0441 Epoch: 6 Global Step: 83530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:35,020-Speed 2970.54 samples/sec Loss 9.6401 LearningRate 0.0441 Epoch: 6 Global Step: 83540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:38,440-Speed 2995.18 samples/sec Loss 9.6447 LearningRate 0.0440 Epoch: 6 Global Step: 83550 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 09:36:41,793-Speed 3054.25 samples/sec Loss 9.8059 LearningRate 0.0440 Epoch: 6 Global Step: 83560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:45,113-Speed 3085.94 samples/sec Loss 9.7815 LearningRate 0.0440 Epoch: 6 Global Step: 83570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:36:48,475-Speed 3046.21 samples/sec Loss 9.6619 LearningRate 0.0440 Epoch: 6 Global Step: 83580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:36:51,827-Speed 3055.80 samples/sec Loss 9.7820 LearningRate 0.0440 Epoch: 6 Global Step: 83590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:36:55,221-Speed 3017.86 samples/sec Loss 9.5901 LearningRate 0.0440 Epoch: 6 Global Step: 83600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:36:58,601-Speed 3030.66 samples/sec Loss 9.7114 LearningRate 0.0440 Epoch: 6 Global Step: 83610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:02,022-Speed 2994.39 samples/sec Loss 9.6113 LearningRate 0.0440 Epoch: 6 Global Step: 83620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:05,371-Speed 3057.82 samples/sec Loss 9.6793 LearningRate 0.0440 Epoch: 6 Global Step: 83630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:08,716-Speed 3062.54 samples/sec Loss 9.7147 LearningRate 0.0440 Epoch: 6 Global Step: 83640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:12,093-Speed 3033.68 samples/sec Loss 9.7576 LearningRate 0.0440 Epoch: 6 Global Step: 83650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:15,500-Speed 3005.76 samples/sec Loss 9.6405 LearningRate 0.0440 Epoch: 6 Global Step: 83660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:18,960-Speed 2960.76 samples/sec Loss 9.6364 LearningRate 0.0440 Epoch: 6 Global Step: 83670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:22,353-Speed 
3018.63 samples/sec Loss 9.7675 LearningRate 0.0440 Epoch: 6 Global Step: 83680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:25,760-Speed 3006.35 samples/sec Loss 9.6339 LearningRate 0.0440 Epoch: 6 Global Step: 83690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:29,119-Speed 3050.09 samples/sec Loss 9.5112 LearningRate 0.0440 Epoch: 6 Global Step: 83700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:32,486-Speed 3042.62 samples/sec Loss 9.6515 LearningRate 0.0440 Epoch: 6 Global Step: 83710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:35,914-Speed 2987.66 samples/sec Loss 9.6531 LearningRate 0.0440 Epoch: 6 Global Step: 83720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:37:39,358-Speed 2974.64 samples/sec Loss 9.6100 LearningRate 0.0440 Epoch: 6 Global Step: 83730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:42,730-Speed 3037.27 samples/sec Loss 9.5506 LearningRate 0.0439 Epoch: 6 Global Step: 83740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:46,191-Speed 2959.38 samples/sec Loss 9.6223 LearningRate 0.0439 Epoch: 6 Global Step: 83750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:49,650-Speed 2961.46 samples/sec Loss 9.7890 LearningRate 0.0439 Epoch: 6 Global Step: 83760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:53,031-Speed 3029.21 samples/sec Loss 9.7306 LearningRate 0.0439 Epoch: 6 Global Step: 83770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:56,347-Speed 3088.70 samples/sec Loss 9.6215 LearningRate 0.0439 Epoch: 6 Global Step: 83780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:37:59,801-Speed 2966.07 samples/sec Loss 9.7709 LearningRate 0.0439 Epoch: 6 Global Step: 83790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:03,196-Speed 3016.89 samples/sec Loss 9.6865 LearningRate 
0.0439 Epoch: 6 Global Step: 83800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:06,543-Speed 3060.42 samples/sec Loss 9.7566 LearningRate 0.0439 Epoch: 6 Global Step: 83810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:11,086-Speed 2254.22 samples/sec Loss 9.7235 LearningRate 0.0439 Epoch: 6 Global Step: 83820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:14,428-Speed 3065.59 samples/sec Loss 9.5953 LearningRate 0.0439 Epoch: 6 Global Step: 83830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:17,855-Speed 2988.89 samples/sec Loss 9.7141 LearningRate 0.0439 Epoch: 6 Global Step: 83840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:21,196-Speed 3065.82 samples/sec Loss 9.5991 LearningRate 0.0439 Epoch: 6 Global Step: 83850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:24,558-Speed 3047.84 samples/sec Loss 9.8023 LearningRate 0.0439 Epoch: 6 Global Step: 83860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:27,985-Speed 2988.78 samples/sec Loss 9.5281 LearningRate 0.0439 Epoch: 6 Global Step: 83870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:31,346-Speed 3047.86 samples/sec Loss 9.8355 LearningRate 0.0439 Epoch: 6 Global Step: 83880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:34,750-Speed 3009.32 samples/sec Loss 9.6993 LearningRate 0.0439 Epoch: 6 Global Step: 83890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:38:38,082-Speed 3075.03 samples/sec Loss 9.8579 LearningRate 0.0439 Epoch: 6 Global Step: 83900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:41,551-Speed 2952.83 samples/sec Loss 9.7327 LearningRate 0.0439 Epoch: 6 Global Step: 83910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:45,050-Speed 2927.20 samples/sec Loss 9.7982 LearningRate 0.0438 Epoch: 6 Global Step: 83920 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-04-27 09:38:48,481-Speed 2985.83 samples/sec Loss 9.6218 LearningRate 0.0438 Epoch: 6 Global Step: 83930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:51,876-Speed 3017.59 samples/sec Loss 9.6533 LearningRate 0.0438 Epoch: 6 Global Step: 83940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:55,229-Speed 3054.42 samples/sec Loss 9.7785 LearningRate 0.0438 Epoch: 6 Global Step: 83950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:38:58,598-Speed 3039.96 samples/sec Loss 9.6088 LearningRate 0.0438 Epoch: 6 Global Step: 83960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:39:01,960-Speed 3047.05 samples/sec Loss 9.5784 LearningRate 0.0438 Epoch: 6 Global Step: 83970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:39:05,334-Speed 3035.60 samples/sec Loss 9.6690 LearningRate 0.0438 Epoch: 6 Global Step: 83980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:39:08,742-Speed 3006.33 samples/sec Loss 9.5447 LearningRate 0.0438 Epoch: 6 Global Step: 83990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:39:12,206-Speed 2956.14 samples/sec Loss 9.6327 LearningRate 0.0438 Epoch: 6 Global Step: 84000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:15,520-Speed 3091.11 samples/sec Loss 9.6959 LearningRate 0.0438 Epoch: 6 Global Step: 84010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:18,907-Speed 3024.08 samples/sec Loss 9.5023 LearningRate 0.0438 Epoch: 6 Global Step: 84020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:22,326-Speed 2996.33 samples/sec Loss 9.7550 LearningRate 0.0438 Epoch: 6 Global Step: 84030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:25,775-Speed 2970.04 samples/sec Loss 9.6975 LearningRate 0.0438 Epoch: 6 Global Step: 84040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 
09:39:29,167-Speed 3019.36 samples/sec Loss 9.7138 LearningRate 0.0438 Epoch: 6 Global Step: 84050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:32,532-Speed 3044.08 samples/sec Loss 9.5024 LearningRate 0.0438 Epoch: 6 Global Step: 84060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:35,904-Speed 3036.97 samples/sec Loss 9.6825 LearningRate 0.0438 Epoch: 6 Global Step: 84070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:39,356-Speed 2968.01 samples/sec Loss 9.5656 LearningRate 0.0438 Epoch: 6 Global Step: 84080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:42,793-Speed 2980.34 samples/sec Loss 9.6268 LearningRate 0.0438 Epoch: 6 Global Step: 84090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:46,208-Speed 2999.27 samples/sec Loss 9.6480 LearningRate 0.0438 Epoch: 6 Global Step: 84100 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-04-27 09:39:49,543-Speed 3071.30 samples/sec Loss 9.7119 LearningRate 0.0437 Epoch: 6 Global Step: 84110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:53,048-Speed 2921.85 samples/sec Loss 9.5932 LearningRate 0.0437 Epoch: 6 Global Step: 84120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:39:56,583-Speed 2898.05 samples/sec Loss 9.7702 LearningRate 0.0437 Epoch: 6 Global Step: 84130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:00,034-Speed 2967.82 samples/sec Loss 9.7324 LearningRate 0.0437 Epoch: 6 Global Step: 84140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:03,485-Speed 2968.04 samples/sec Loss 9.6847 LearningRate 0.0437 Epoch: 6 Global Step: 84150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:06,859-Speed 3036.05 samples/sec Loss 9.6227 LearningRate 0.0437 Epoch: 6 Global Step: 84160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:10,211-Speed 3055.99 samples/sec Loss 
9.5896 LearningRate 0.0437 Epoch: 6 Global Step: 84170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:13,631-Speed 2995.11 samples/sec Loss 9.6982 LearningRate 0.0437 Epoch: 6 Global Step: 84180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:16,940-Speed 3094.97 samples/sec Loss 9.7074 LearningRate 0.0437 Epoch: 6 Global Step: 84190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:20,278-Speed 3069.20 samples/sec Loss 9.7084 LearningRate 0.0437 Epoch: 6 Global Step: 84200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:23,681-Speed 3010.10 samples/sec Loss 9.8006 LearningRate 0.0437 Epoch: 6 Global Step: 84210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:27,088-Speed 3006.81 samples/sec Loss 9.5589 LearningRate 0.0437 Epoch: 6 Global Step: 84220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:30,512-Speed 2991.37 samples/sec Loss 9.5839 LearningRate 0.0437 Epoch: 6 Global Step: 84230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:33,826-Speed 3090.60 samples/sec Loss 9.5900 LearningRate 0.0437 Epoch: 6 Global Step: 84240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:37,215-Speed 3022.29 samples/sec Loss 9.7184 LearningRate 0.0437 Epoch: 6 Global Step: 84250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:40:40,589-Speed 3036.55 samples/sec Loss 9.6812 LearningRate 0.0437 Epoch: 6 Global Step: 84260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:40:44,012-Speed 2992.28 samples/sec Loss 9.7146 LearningRate 0.0437 Epoch: 6 Global Step: 84270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:40:47,347-Speed 3070.83 samples/sec Loss 9.5794 LearningRate 0.0437 Epoch: 6 Global Step: 84280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:40:50,717-Speed 3039.45 samples/sec Loss 9.6097 LearningRate 0.0437 Epoch: 6 Global 
Step: 84290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:40:54,123-Speed 3007.75 samples/sec Loss 9.6741 LearningRate 0.0436 Epoch: 6 Global Step: 84300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:40:57,556-Speed 2983.45 samples/sec Loss 9.6189 LearningRate 0.0436 Epoch: 6 Global Step: 84310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:00,945-Speed 3022.49 samples/sec Loss 9.5700 LearningRate 0.0436 Epoch: 6 Global Step: 84320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:04,356-Speed 3002.94 samples/sec Loss 9.5248 LearningRate 0.0436 Epoch: 6 Global Step: 84330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:07,692-Speed 3070.87 samples/sec Loss 9.5672 LearningRate 0.0436 Epoch: 6 Global Step: 84340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:11,107-Speed 2998.93 samples/sec Loss 9.7231 LearningRate 0.0436 Epoch: 6 Global Step: 84350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:14,518-Speed 3003.26 samples/sec Loss 9.7142 LearningRate 0.0436 Epoch: 6 Global Step: 84360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:41:17,879-Speed 3047.08 samples/sec Loss 9.6925 LearningRate 0.0436 Epoch: 6 Global Step: 84370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:41:21,194-Speed 3089.99 samples/sec Loss 9.5902 LearningRate 0.0436 Epoch: 6 Global Step: 84380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:41:24,559-Speed 3043.67 samples/sec Loss 9.6896 LearningRate 0.0436 Epoch: 6 Global Step: 84390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:41:27,896-Speed 3069.58 samples/sec Loss 9.6415 LearningRate 0.0436 Epoch: 6 Global Step: 84400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:41:31,265-Speed 3041.10 samples/sec Loss 9.6527 LearningRate 0.0436 Epoch: 6 Global Step: 84410 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-04-27 09:41:34,667-Speed 3010.10 samples/sec Loss 9.6579 LearningRate 0.0436 Epoch: 6 Global Step: 84420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:41:38,112-Speed 2974.65 samples/sec Loss 9.6135 LearningRate 0.0436 Epoch: 6 Global Step: 84430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:41,448-Speed 3069.54 samples/sec Loss 9.6205 LearningRate 0.0436 Epoch: 6 Global Step: 84440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:44,794-Speed 3061.63 samples/sec Loss 9.5928 LearningRate 0.0436 Epoch: 6 Global Step: 84450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:48,247-Speed 2966.89 samples/sec Loss 9.7240 LearningRate 0.0436 Epoch: 6 Global Step: 84460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:51,580-Speed 3072.38 samples/sec Loss 9.6714 LearningRate 0.0436 Epoch: 6 Global Step: 84470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:54,899-Speed 3086.07 samples/sec Loss 9.7174 LearningRate 0.0436 Epoch: 6 Global Step: 84480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:41:58,320-Speed 2994.54 samples/sec Loss 9.8488 LearningRate 0.0435 Epoch: 6 Global Step: 84490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:01,748-Speed 2988.00 samples/sec Loss 9.5659 LearningRate 0.0435 Epoch: 6 Global Step: 84500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:05,101-Speed 3055.15 samples/sec Loss 9.6686 LearningRate 0.0435 Epoch: 6 Global Step: 84510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:08,441-Speed 3066.60 samples/sec Loss 9.7264 LearningRate 0.0435 Epoch: 6 Global Step: 84520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:11,769-Speed 3077.59 samples/sec Loss 9.7019 LearningRate 0.0435 Epoch: 6 Global Step: 84530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:15,144-Speed 3035.61 
samples/sec Loss 9.5160 LearningRate 0.0435 Epoch: 6 Global Step: 84540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:18,519-Speed 3034.46 samples/sec Loss 9.6842 LearningRate 0.0435 Epoch: 6 Global Step: 84550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:21,976-Speed 2962.46 samples/sec Loss 9.6775 LearningRate 0.0435 Epoch: 6 Global Step: 84560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:25,375-Speed 3013.73 samples/sec Loss 9.8331 LearningRate 0.0435 Epoch: 6 Global Step: 84570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:28,817-Speed 2975.67 samples/sec Loss 9.7217 LearningRate 0.0435 Epoch: 6 Global Step: 84580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:32,197-Speed 3030.62 samples/sec Loss 9.5682 LearningRate 0.0435 Epoch: 6 Global Step: 84590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:35,558-Speed 3047.49 samples/sec Loss 9.5525 LearningRate 0.0435 Epoch: 6 Global Step: 84600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:38,974-Speed 2998.30 samples/sec Loss 9.7469 LearningRate 0.0435 Epoch: 6 Global Step: 84610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:42,407-Speed 2984.40 samples/sec Loss 9.6323 LearningRate 0.0435 Epoch: 6 Global Step: 84620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:45,804-Speed 3015.19 samples/sec Loss 9.7090 LearningRate 0.0435 Epoch: 6 Global Step: 84630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:42:49,168-Speed 3044.46 samples/sec Loss 9.6615 LearningRate 0.0435 Epoch: 6 Global Step: 84640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:52,581-Speed 3000.89 samples/sec Loss 9.4822 LearningRate 0.0435 Epoch: 6 Global Step: 84650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:55,925-Speed 3063.45 samples/sec Loss 9.6248 LearningRate 0.0435 Epoch: 6 
Global Step: 84660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:42:59,322-Speed 3015.21 samples/sec Loss 9.5806 LearningRate 0.0434 Epoch: 6 Global Step: 84670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:02,693-Speed 3038.79 samples/sec Loss 9.7796 LearningRate 0.0434 Epoch: 6 Global Step: 84680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:06,049-Speed 3052.45 samples/sec Loss 9.5150 LearningRate 0.0434 Epoch: 6 Global Step: 84690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:09,433-Speed 3026.48 samples/sec Loss 9.6263 LearningRate 0.0434 Epoch: 6 Global Step: 84700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:12,804-Speed 3038.94 samples/sec Loss 9.8756 LearningRate 0.0434 Epoch: 6 Global Step: 84710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:16,135-Speed 3075.25 samples/sec Loss 9.5334 LearningRate 0.0434 Epoch: 6 Global Step: 84720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:19,551-Speed 2997.81 samples/sec Loss 9.6695 LearningRate 0.0434 Epoch: 6 Global Step: 84730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:22,902-Speed 3056.97 samples/sec Loss 9.6893 LearningRate 0.0434 Epoch: 6 Global Step: 84740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:43:26,325-Speed 2992.33 samples/sec Loss 9.7348 LearningRate 0.0434 Epoch: 6 Global Step: 84750 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:43:29,694-Speed 3040.17 samples/sec Loss 9.6177 LearningRate 0.0434 Epoch: 6 Global Step: 84760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:33,066-Speed 3037.96 samples/sec Loss 9.6835 LearningRate 0.0434 Epoch: 6 Global Step: 84770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:36,458-Speed 3019.61 samples/sec Loss 9.5989 LearningRate 0.0434 Epoch: 6 Global Step: 84780 Fp16 Grad Scale: 65536 Required: 
15 hours Training: 2022-04-27 09:43:39,863-Speed 3007.65 samples/sec Loss 9.5806 LearningRate 0.0434 Epoch: 6 Global Step: 84790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:43,239-Speed 3035.03 samples/sec Loss 9.7594 LearningRate 0.0434 Epoch: 6 Global Step: 84800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:46,580-Speed 3065.69 samples/sec Loss 9.7063 LearningRate 0.0434 Epoch: 6 Global Step: 84810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:49,915-Speed 3070.84 samples/sec Loss 9.6170 LearningRate 0.0434 Epoch: 6 Global Step: 84820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:53,394-Speed 2944.49 samples/sec Loss 9.4656 LearningRate 0.0434 Epoch: 6 Global Step: 84830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:43:56,829-Speed 2982.14 samples/sec Loss 9.6746 LearningRate 0.0434 Epoch: 6 Global Step: 84840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:00,250-Speed 2993.52 samples/sec Loss 9.6150 LearningRate 0.0434 Epoch: 6 Global Step: 84850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:03,659-Speed 3004.96 samples/sec Loss 9.5118 LearningRate 0.0433 Epoch: 6 Global Step: 84860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:44:07,036-Speed 3033.23 samples/sec Loss 9.4952 LearningRate 0.0433 Epoch: 6 Global Step: 84870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:44:10,472-Speed 2981.64 samples/sec Loss 9.5364 LearningRate 0.0433 Epoch: 6 Global Step: 84880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:44:13,848-Speed 3034.06 samples/sec Loss 9.4527 LearningRate 0.0433 Epoch: 6 Global Step: 84890 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:44:17,261-Speed 3001.94 samples/sec Loss 9.6063 LearningRate 0.0433 Epoch: 6 Global Step: 84900 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:44:20,636-Speed 
3035.07 samples/sec Loss 9.4846 LearningRate 0.0433 Epoch: 6 Global Step: 84910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:44:24,029-Speed 3019.04 samples/sec Loss 9.5990 LearningRate 0.0433 Epoch: 6 Global Step: 84920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:27,392-Speed 3045.34 samples/sec Loss 9.5950 LearningRate 0.0433 Epoch: 6 Global Step: 84930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:30,878-Speed 2938.69 samples/sec Loss 9.6681 LearningRate 0.0433 Epoch: 6 Global Step: 84940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:34,279-Speed 3011.96 samples/sec Loss 9.6085 LearningRate 0.0433 Epoch: 6 Global Step: 84950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:37,673-Speed 3018.45 samples/sec Loss 9.6572 LearningRate 0.0433 Epoch: 6 Global Step: 84960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:41,054-Speed 3029.31 samples/sec Loss 9.5765 LearningRate 0.0433 Epoch: 6 Global Step: 84970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:44,464-Speed 3005.01 samples/sec Loss 9.6366 LearningRate 0.0433 Epoch: 6 Global Step: 84980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:47,828-Speed 3044.85 samples/sec Loss 9.5247 LearningRate 0.0433 Epoch: 6 Global Step: 84990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:51,171-Speed 3063.42 samples/sec Loss 9.5928 LearningRate 0.0433 Epoch: 6 Global Step: 85000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:54,568-Speed 3015.89 samples/sec Loss 9.7868 LearningRate 0.0433 Epoch: 6 Global Step: 85010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:44:57,948-Speed 3030.65 samples/sec Loss 9.6006 LearningRate 0.0433 Epoch: 6 Global Step: 85020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:01,322-Speed 3035.13 samples/sec Loss 9.9367 LearningRate 0.0433 
Epoch: 6 Global Step: 85030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:04,715-Speed 3018.96 samples/sec Loss 9.6872 LearningRate 0.0433 Epoch: 6 Global Step: 85040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:08,079-Speed 3045.97 samples/sec Loss 9.5724 LearningRate 0.0432 Epoch: 6 Global Step: 85050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:11,483-Speed 3009.19 samples/sec Loss 9.4627 LearningRate 0.0432 Epoch: 6 Global Step: 85060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:14,872-Speed 3022.73 samples/sec Loss 9.5306 LearningRate 0.0432 Epoch: 6 Global Step: 85070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:18,247-Speed 3034.41 samples/sec Loss 9.6917 LearningRate 0.0432 Epoch: 6 Global Step: 85080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:21,656-Speed 3004.76 samples/sec Loss 9.6531 LearningRate 0.0432 Epoch: 6 Global Step: 85090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:25,054-Speed 3014.41 samples/sec Loss 9.7032 LearningRate 0.0432 Epoch: 6 Global Step: 85100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:28,435-Speed 3030.07 samples/sec Loss 9.6191 LearningRate 0.0432 Epoch: 6 Global Step: 85110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:31,867-Speed 2984.51 samples/sec Loss 9.5223 LearningRate 0.0432 Epoch: 6 Global Step: 85120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:35,243-Speed 3033.58 samples/sec Loss 9.5096 LearningRate 0.0432 Epoch: 6 Global Step: 85130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:38,598-Speed 3053.51 samples/sec Loss 9.5034 LearningRate 0.0432 Epoch: 6 Global Step: 85140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:42,048-Speed 2969.68 samples/sec Loss 9.6262 LearningRate 0.0432 Epoch: 6 Global Step: 85150 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-27 09:45:45,472-Speed 2991.35 samples/sec Loss 9.7299 LearningRate 0.0432 Epoch: 6 Global Step: 85160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:48,923-Speed 2967.84 samples/sec Loss 9.5964 LearningRate 0.0432 Epoch: 6 Global Step: 85170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:52,357-Speed 2983.14 samples/sec Loss 9.5329 LearningRate 0.0432 Epoch: 6 Global Step: 85180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:45:55,734-Speed 3032.64 samples/sec Loss 9.5358 LearningRate 0.0432 Epoch: 6 Global Step: 85190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:45:59,120-Speed 3025.17 samples/sec Loss 9.5938 LearningRate 0.0432 Epoch: 6 Global Step: 85200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:02,573-Speed 2966.55 samples/sec Loss 9.6750 LearningRate 0.0432 Epoch: 6 Global Step: 85210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:05,935-Speed 3047.13 samples/sec Loss 9.6797 LearningRate 0.0432 Epoch: 6 Global Step: 85220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:09,360-Speed 2990.54 samples/sec Loss 9.6723 LearningRate 0.0432 Epoch: 6 Global Step: 85230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:12,748-Speed 3022.98 samples/sec Loss 9.6677 LearningRate 0.0431 Epoch: 6 Global Step: 85240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:16,101-Speed 3054.58 samples/sec Loss 9.4760 LearningRate 0.0431 Epoch: 6 Global Step: 85250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:19,437-Speed 3070.59 samples/sec Loss 9.6877 LearningRate 0.0431 Epoch: 6 Global Step: 85260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:22,830-Speed 3019.16 samples/sec Loss 9.5035 LearningRate 0.0431 Epoch: 6 Global Step: 85270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 
09:46:26,208-Speed 3032.12 samples/sec Loss 9.5607 LearningRate 0.0431 Epoch: 6 Global Step: 85280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:46:29,660-Speed 2969.63 samples/sec Loss 9.5887 LearningRate 0.0431 Epoch: 6 Global Step: 85290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:33,125-Speed 2956.21 samples/sec Loss 9.4774 LearningRate 0.0431 Epoch: 6 Global Step: 85300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:36,513-Speed 3022.91 samples/sec Loss 9.6588 LearningRate 0.0431 Epoch: 6 Global Step: 85310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:39,935-Speed 2993.50 samples/sec Loss 9.7193 LearningRate 0.0431 Epoch: 6 Global Step: 85320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:43,393-Speed 2962.44 samples/sec Loss 9.7163 LearningRate 0.0431 Epoch: 6 Global Step: 85330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:46,830-Speed 2980.29 samples/sec Loss 9.6185 LearningRate 0.0431 Epoch: 6 Global Step: 85340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:50,252-Speed 2992.93 samples/sec Loss 9.6576 LearningRate 0.0431 Epoch: 6 Global Step: 85350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:53,672-Speed 2995.62 samples/sec Loss 9.5768 LearningRate 0.0431 Epoch: 6 Global Step: 85360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:46:57,047-Speed 3034.47 samples/sec Loss 9.6686 LearningRate 0.0431 Epoch: 6 Global Step: 85370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:00,427-Speed 3030.02 samples/sec Loss 9.4105 LearningRate 0.0431 Epoch: 6 Global Step: 85380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:03,807-Speed 3030.47 samples/sec Loss 9.5970 LearningRate 0.0431 Epoch: 6 Global Step: 85390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:07,306-Speed 2928.08 samples/sec Loss 
9.5237 LearningRate 0.0431 Epoch: 6 Global Step: 85400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:10,729-Speed 2992.15 samples/sec Loss 9.5825 LearningRate 0.0431 Epoch: 6 Global Step: 85410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:14,131-Speed 3011.06 samples/sec Loss 9.5397 LearningRate 0.0431 Epoch: 6 Global Step: 85420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:17,503-Speed 3037.60 samples/sec Loss 9.6539 LearningRate 0.0430 Epoch: 6 Global Step: 85430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:20,879-Speed 3033.76 samples/sec Loss 9.5997 LearningRate 0.0430 Epoch: 6 Global Step: 85440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:24,241-Speed 3046.06 samples/sec Loss 9.4866 LearningRate 0.0430 Epoch: 6 Global Step: 85450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:27,626-Speed 3026.20 samples/sec Loss 9.5535 LearningRate 0.0430 Epoch: 6 Global Step: 85460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:31,024-Speed 3014.35 samples/sec Loss 9.5141 LearningRate 0.0430 Epoch: 6 Global Step: 85470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:34,451-Speed 2988.75 samples/sec Loss 9.5009 LearningRate 0.0430 Epoch: 6 Global Step: 85480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:37,892-Speed 2977.82 samples/sec Loss 9.6659 LearningRate 0.0430 Epoch: 6 Global Step: 85490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:41,292-Speed 3013.04 samples/sec Loss 9.5682 LearningRate 0.0430 Epoch: 6 Global Step: 85500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:44,692-Speed 3012.13 samples/sec Loss 9.6688 LearningRate 0.0430 Epoch: 6 Global Step: 85510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:48,160-Speed 2954.16 samples/sec Loss 9.4740 LearningRate 0.0430 Epoch: 6 Global Step: 85520 
Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:51,573-Speed 3000.57 samples/sec Loss 9.5691 LearningRate 0.0430 Epoch: 6 Global Step: 85530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:54,942-Speed 3040.33 samples/sec Loss 9.4302 LearningRate 0.0430 Epoch: 6 Global Step: 85540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:47:58,415-Speed 2949.44 samples/sec Loss 9.5904 LearningRate 0.0430 Epoch: 6 Global Step: 85550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:01,841-Speed 2990.09 samples/sec Loss 9.4156 LearningRate 0.0430 Epoch: 6 Global Step: 85560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:05,184-Speed 3064.40 samples/sec Loss 9.6464 LearningRate 0.0430 Epoch: 6 Global Step: 85570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:48:08,572-Speed 3022.59 samples/sec Loss 9.6917 LearningRate 0.0430 Epoch: 6 Global Step: 85580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:11,926-Speed 3054.10 samples/sec Loss 9.4380 LearningRate 0.0430 Epoch: 6 Global Step: 85590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:15,347-Speed 2994.11 samples/sec Loss 9.5427 LearningRate 0.0430 Epoch: 6 Global Step: 85600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:18,733-Speed 3025.29 samples/sec Loss 9.6990 LearningRate 0.0430 Epoch: 6 Global Step: 85610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:22,123-Speed 3021.63 samples/sec Loss 9.5881 LearningRate 0.0429 Epoch: 6 Global Step: 85620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:25,571-Speed 2970.54 samples/sec Loss 9.6501 LearningRate 0.0429 Epoch: 6 Global Step: 85630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:28,970-Speed 3013.29 samples/sec Loss 9.4386 LearningRate 0.0429 Epoch: 6 Global Step: 85640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 
2022-04-27 09:48:32,317-Speed 3060.44 samples/sec Loss 9.8613 LearningRate 0.0429 Epoch: 6 Global Step: 85650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:35,678-Speed 3047.99 samples/sec Loss 9.5380 LearningRate 0.0429 Epoch: 6 Global Step: 85660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:39,033-Speed 3052.87 samples/sec Loss 9.6746 LearningRate 0.0429 Epoch: 6 Global Step: 85670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:42,374-Speed 3065.67 samples/sec Loss 9.5556 LearningRate 0.0429 Epoch: 6 Global Step: 85680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:48:45,797-Speed 2992.53 samples/sec Loss 9.5032 LearningRate 0.0429 Epoch: 6 Global Step: 85690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:48:49,132-Speed 3071.29 samples/sec Loss 9.6843 LearningRate 0.0429 Epoch: 6 Global Step: 85700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:52,600-Speed 2953.47 samples/sec Loss 9.5519 LearningRate 0.0429 Epoch: 6 Global Step: 85710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:55,999-Speed 3013.52 samples/sec Loss 9.4998 LearningRate 0.0429 Epoch: 6 Global Step: 85720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:48:59,366-Speed 3042.14 samples/sec Loss 9.6742 LearningRate 0.0429 Epoch: 6 Global Step: 85730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:02,734-Speed 3041.20 samples/sec Loss 9.4778 LearningRate 0.0429 Epoch: 6 Global Step: 85740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:06,076-Speed 3065.48 samples/sec Loss 9.5190 LearningRate 0.0429 Epoch: 6 Global Step: 85750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:09,548-Speed 2949.66 samples/sec Loss 9.6039 LearningRate 0.0429 Epoch: 6 Global Step: 85760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:12,949-Speed 3013.05 samples/sec 
Loss 9.4493 LearningRate 0.0429 Epoch: 6 Global Step: 85770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:16,364-Speed 2999.66 samples/sec Loss 9.5903 LearningRate 0.0429 Epoch: 6 Global Step: 85780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:19,750-Speed 3024.26 samples/sec Loss 9.6695 LearningRate 0.0429 Epoch: 6 Global Step: 85790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:23,157-Speed 3006.95 samples/sec Loss 9.6194 LearningRate 0.0429 Epoch: 6 Global Step: 85800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:49:26,508-Speed 3056.31 samples/sec Loss 9.4967 LearningRate 0.0428 Epoch: 6 Global Step: 85810 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:29,938-Speed 2986.60 samples/sec Loss 9.5300 LearningRate 0.0428 Epoch: 6 Global Step: 85820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:33,322-Speed 3026.29 samples/sec Loss 9.5478 LearningRate 0.0428 Epoch: 6 Global Step: 85830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:36,735-Speed 3001.24 samples/sec Loss 9.4669 LearningRate 0.0428 Epoch: 6 Global Step: 85840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:40,130-Speed 3017.53 samples/sec Loss 9.6346 LearningRate 0.0428 Epoch: 6 Global Step: 85850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:43,597-Speed 2954.11 samples/sec Loss 9.5330 LearningRate 0.0428 Epoch: 6 Global Step: 85860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:46,969-Speed 3037.64 samples/sec Loss 9.5319 LearningRate 0.0428 Epoch: 6 Global Step: 85870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:50,372-Speed 3010.42 samples/sec Loss 9.4257 LearningRate 0.0428 Epoch: 6 Global Step: 85880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:53,732-Speed 3048.69 samples/sec Loss 9.6136 LearningRate 0.0428 Epoch: 6 Global Step: 
85890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:49:57,082-Speed 3057.22 samples/sec Loss 9.6099 LearningRate 0.0428 Epoch: 6 Global Step: 85900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:50:00,447-Speed 3043.76 samples/sec Loss 9.5695 LearningRate 0.0428 Epoch: 6 Global Step: 85910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:03,890-Speed 2974.84 samples/sec Loss 9.5542 LearningRate 0.0428 Epoch: 6 Global Step: 85920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:07,245-Speed 3053.30 samples/sec Loss 9.5825 LearningRate 0.0428 Epoch: 6 Global Step: 85930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:10,641-Speed 3016.74 samples/sec Loss 9.6101 LearningRate 0.0428 Epoch: 6 Global Step: 85940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:14,075-Speed 2982.49 samples/sec Loss 9.4433 LearningRate 0.0428 Epoch: 6 Global Step: 85950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:17,455-Speed 3030.95 samples/sec Loss 9.4958 LearningRate 0.0428 Epoch: 6 Global Step: 85960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:20,860-Speed 3007.47 samples/sec Loss 9.5727 LearningRate 0.0428 Epoch: 6 Global Step: 85970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:24,197-Speed 3069.97 samples/sec Loss 9.5173 LearningRate 0.0428 Epoch: 6 Global Step: 85980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:27,621-Speed 2991.64 samples/sec Loss 9.7483 LearningRate 0.0428 Epoch: 6 Global Step: 85990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:31,092-Speed 2951.18 samples/sec Loss 9.5701 LearningRate 0.0427 Epoch: 6 Global Step: 86000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:34,437-Speed 3061.79 samples/sec Loss 9.4408 LearningRate 0.0427 Epoch: 6 Global Step: 86010 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-04-27 09:50:37,788-Speed 3058.02 samples/sec Loss 9.7122 LearningRate 0.0427 Epoch: 6 Global Step: 86020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:41,157-Speed 3040.33 samples/sec Loss 9.5068 LearningRate 0.0427 Epoch: 6 Global Step: 86030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:44,527-Speed 3038.91 samples/sec Loss 9.6277 LearningRate 0.0427 Epoch: 6 Global Step: 86040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:47,915-Speed 3023.79 samples/sec Loss 9.5735 LearningRate 0.0427 Epoch: 6 Global Step: 86050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:51,412-Speed 2928.36 samples/sec Loss 9.4444 LearningRate 0.0427 Epoch: 6 Global Step: 86060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:54,842-Speed 2986.74 samples/sec Loss 9.5986 LearningRate 0.0427 Epoch: 6 Global Step: 86070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:50:58,196-Speed 3053.46 samples/sec Loss 9.4655 LearningRate 0.0427 Epoch: 6 Global Step: 86080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:01,640-Speed 2973.92 samples/sec Loss 9.3944 LearningRate 0.0427 Epoch: 6 Global Step: 86090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:04,998-Speed 3050.80 samples/sec Loss 9.4877 LearningRate 0.0427 Epoch: 6 Global Step: 86100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:08,356-Speed 3049.89 samples/sec Loss 9.7201 LearningRate 0.0427 Epoch: 6 Global Step: 86110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:11,684-Speed 3078.37 samples/sec Loss 9.4504 LearningRate 0.0427 Epoch: 6 Global Step: 86120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:51:15,060-Speed 3033.89 samples/sec Loss 9.4921 LearningRate 0.0427 Epoch: 6 Global Step: 86130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:18,508-Speed 2970.70 
samples/sec Loss 9.5714 LearningRate 0.0427 Epoch: 6 Global Step: 86140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:21,907-Speed 3013.76 samples/sec Loss 9.6278 LearningRate 0.0427 Epoch: 6 Global Step: 86150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:25,343-Speed 2981.01 samples/sec Loss 9.4617 LearningRate 0.0427 Epoch: 6 Global Step: 86160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:28,789-Speed 2972.53 samples/sec Loss 9.4949 LearningRate 0.0427 Epoch: 6 Global Step: 86170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:32,233-Speed 2973.96 samples/sec Loss 9.5787 LearningRate 0.0427 Epoch: 6 Global Step: 86180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:35,650-Speed 2998.01 samples/sec Loss 9.5475 LearningRate 0.0426 Epoch: 6 Global Step: 86190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:39,098-Speed 2970.91 samples/sec Loss 9.5615 LearningRate 0.0426 Epoch: 6 Global Step: 86200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:42,517-Speed 2995.62 samples/sec Loss 9.4731 LearningRate 0.0426 Epoch: 6 Global Step: 86210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:45,965-Speed 2971.17 samples/sec Loss 9.4520 LearningRate 0.0426 Epoch: 6 Global Step: 86220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:49,363-Speed 3014.73 samples/sec Loss 9.5283 LearningRate 0.0426 Epoch: 6 Global Step: 86230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:51:52,742-Speed 3031.38 samples/sec Loss 9.6252 LearningRate 0.0426 Epoch: 6 Global Step: 86240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:51:56,088-Speed 3060.67 samples/sec Loss 9.5746 LearningRate 0.0426 Epoch: 6 Global Step: 86250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:51:59,519-Speed 2985.47 samples/sec Loss 9.3814 LearningRate 0.0426 Epoch: 6 
Global Step: 86260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:02,994-Speed 2947.96 samples/sec Loss 9.6302 LearningRate 0.0426 Epoch: 6 Global Step: 86270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:06,413-Speed 2996.07 samples/sec Loss 9.4963 LearningRate 0.0426 Epoch: 6 Global Step: 86280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:09,764-Speed 3056.03 samples/sec Loss 9.4970 LearningRate 0.0426 Epoch: 6 Global Step: 86290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:13,149-Speed 3026.26 samples/sec Loss 9.6056 LearningRate 0.0426 Epoch: 6 Global Step: 86300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:16,541-Speed 3020.17 samples/sec Loss 9.4848 LearningRate 0.0426 Epoch: 6 Global Step: 86310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:19,888-Speed 3059.90 samples/sec Loss 9.5863 LearningRate 0.0426 Epoch: 6 Global Step: 86320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:23,352-Speed 2956.93 samples/sec Loss 9.4429 LearningRate 0.0426 Epoch: 6 Global Step: 86330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:26,766-Speed 3000.71 samples/sec Loss 9.4415 LearningRate 0.0426 Epoch: 6 Global Step: 86340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:30,152-Speed 3025.19 samples/sec Loss 9.5100 LearningRate 0.0426 Epoch: 6 Global Step: 86350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:52:33,481-Speed 3077.13 samples/sec Loss 9.5028 LearningRate 0.0426 Epoch: 6 Global Step: 86360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:36,812-Speed 3074.53 samples/sec Loss 9.4140 LearningRate 0.0426 Epoch: 6 Global Step: 86370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:40,175-Speed 3045.91 samples/sec Loss 9.5865 LearningRate 0.0425 Epoch: 6 Global Step: 86380 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 09:52:43,506-Speed 3075.10 samples/sec Loss 9.5508 LearningRate 0.0425 Epoch: 6 Global Step: 86390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:46,948-Speed 2975.95 samples/sec Loss 9.5463 LearningRate 0.0425 Epoch: 6 Global Step: 86400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:50,324-Speed 3034.56 samples/sec Loss 9.5845 LearningRate 0.0425 Epoch: 6 Global Step: 86410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:53,657-Speed 3072.87 samples/sec Loss 9.5141 LearningRate 0.0425 Epoch: 6 Global Step: 86420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:52:57,063-Speed 3007.64 samples/sec Loss 9.3906 LearningRate 0.0425 Epoch: 6 Global Step: 86430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:00,494-Speed 2984.85 samples/sec Loss 9.4743 LearningRate 0.0425 Epoch: 6 Global Step: 86440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:03,923-Speed 2987.45 samples/sec Loss 9.5652 LearningRate 0.0425 Epoch: 6 Global Step: 86450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:07,393-Speed 2951.80 samples/sec Loss 9.4958 LearningRate 0.0425 Epoch: 6 Global Step: 86460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:53:10,755-Speed 3047.35 samples/sec Loss 9.4153 LearningRate 0.0425 Epoch: 6 Global Step: 86470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:53:14,168-Speed 3000.56 samples/sec Loss 9.4741 LearningRate 0.0425 Epoch: 6 Global Step: 86480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:53:17,529-Speed 3047.70 samples/sec Loss 9.5205 LearningRate 0.0425 Epoch: 6 Global Step: 86490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:53:20,850-Speed 3085.69 samples/sec Loss 9.5854 LearningRate 0.0425 Epoch: 6 Global Step: 86500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:24,304-Speed 
2966.16 samples/sec Loss 9.4093 LearningRate 0.0425 Epoch: 6 Global Step: 86510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:27,750-Speed 2971.71 samples/sec Loss 9.4119 LearningRate 0.0425 Epoch: 6 Global Step: 86520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:31,106-Speed 3052.80 samples/sec Loss 9.5935 LearningRate 0.0425 Epoch: 6 Global Step: 86530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:34,481-Speed 3034.43 samples/sec Loss 9.4852 LearningRate 0.0425 Epoch: 6 Global Step: 86540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:37,921-Speed 2977.81 samples/sec Loss 9.4319 LearningRate 0.0425 Epoch: 6 Global Step: 86550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:41,320-Speed 3013.27 samples/sec Loss 9.5248 LearningRate 0.0425 Epoch: 6 Global Step: 86560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:44,796-Speed 2946.53 samples/sec Loss 9.5539 LearningRate 0.0424 Epoch: 6 Global Step: 86570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:48,214-Speed 2997.60 samples/sec Loss 9.3987 LearningRate 0.0424 Epoch: 6 Global Step: 86580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:51,599-Speed 3025.73 samples/sec Loss 9.3548 LearningRate 0.0424 Epoch: 6 Global Step: 86590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:53:55,029-Speed 2986.16 samples/sec Loss 9.4495 LearningRate 0.0424 Epoch: 6 Global Step: 86600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:53:58,398-Speed 3040.06 samples/sec Loss 9.5100 LearningRate 0.0424 Epoch: 6 Global Step: 86610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:54:01,771-Speed 3036.61 samples/sec Loss 9.5515 LearningRate 0.0424 Epoch: 6 Global Step: 86620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:05,075-Speed 3100.01 samples/sec Loss 9.6233 LearningRate 0.0424 
Epoch: 6 Global Step: 86630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:08,447-Speed 3038.25 samples/sec Loss 9.4303 LearningRate 0.0424 Epoch: 6 Global Step: 86640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:11,804-Speed 3051.07 samples/sec Loss 9.5319 LearningRate 0.0424 Epoch: 6 Global Step: 86650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:15,129-Speed 3079.86 samples/sec Loss 9.4896 LearningRate 0.0424 Epoch: 6 Global Step: 86660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:18,503-Speed 3036.66 samples/sec Loss 9.5034 LearningRate 0.0424 Epoch: 6 Global Step: 86670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:21,902-Speed 3013.05 samples/sec Loss 9.4286 LearningRate 0.0424 Epoch: 6 Global Step: 86680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:25,290-Speed 3024.26 samples/sec Loss 9.5582 LearningRate 0.0424 Epoch: 6 Global Step: 86690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:28,674-Speed 3026.19 samples/sec Loss 9.4728 LearningRate 0.0424 Epoch: 6 Global Step: 86700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:32,070-Speed 3016.78 samples/sec Loss 9.4160 LearningRate 0.0424 Epoch: 6 Global Step: 86710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:35,414-Speed 3062.58 samples/sec Loss 9.4431 LearningRate 0.0424 Epoch: 6 Global Step: 86720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:38,769-Speed 3053.96 samples/sec Loss 9.5210 LearningRate 0.0424 Epoch: 6 Global Step: 86730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:42,111-Speed 3064.44 samples/sec Loss 9.6023 LearningRate 0.0424 Epoch: 6 Global Step: 86740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:45,473-Speed 3046.82 samples/sec Loss 9.4413 LearningRate 0.0424 Epoch: 6 Global Step: 86750 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-27 09:54:48,890-Speed 2997.30 samples/sec Loss 9.6043 LearningRate 0.0423 Epoch: 6 Global Step: 86760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:52,340-Speed 2969.32 samples/sec Loss 9.5068 LearningRate 0.0423 Epoch: 6 Global Step: 86770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:55,858-Speed 2911.32 samples/sec Loss 9.4630 LearningRate 0.0423 Epoch: 6 Global Step: 86780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:54:59,283-Speed 2990.43 samples/sec Loss 9.4134 LearningRate 0.0423 Epoch: 6 Global Step: 86790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:55:02,629-Speed 3061.84 samples/sec Loss 9.5620 LearningRate 0.0423 Epoch: 6 Global Step: 86800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:55:06,040-Speed 3002.79 samples/sec Loss 9.5222 LearningRate 0.0423 Epoch: 6 Global Step: 86810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:55:09,366-Speed 3086.28 samples/sec Loss 9.4873 LearningRate 0.0423 Epoch: 6 Global Step: 86820 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:12,783-Speed 2997.77 samples/sec Loss 9.5895 LearningRate 0.0423 Epoch: 6 Global Step: 86830 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:16,170-Speed 3024.73 samples/sec Loss 9.3970 LearningRate 0.0423 Epoch: 6 Global Step: 86840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:19,564-Speed 3018.78 samples/sec Loss 9.6004 LearningRate 0.0423 Epoch: 6 Global Step: 86850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:22,958-Speed 3017.33 samples/sec Loss 9.4227 LearningRate 0.0423 Epoch: 6 Global Step: 86860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:26,335-Speed 3033.08 samples/sec Loss 9.5377 LearningRate 0.0423 Epoch: 6 Global Step: 86870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 
09:55:29,723-Speed 3023.52 samples/sec Loss 9.5177 LearningRate 0.0423 Epoch: 6 Global Step: 86880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:33,210-Speed 2937.76 samples/sec Loss 9.3525 LearningRate 0.0423 Epoch: 6 Global Step: 86890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:36,606-Speed 3015.81 samples/sec Loss 9.3812 LearningRate 0.0423 Epoch: 6 Global Step: 86900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:39,991-Speed 3026.00 samples/sec Loss 9.4342 LearningRate 0.0423 Epoch: 6 Global Step: 86910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 09:55:43,408-Speed 2997.47 samples/sec Loss 9.4053 LearningRate 0.0423 Epoch: 6 Global Step: 86920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:55:46,802-Speed 3017.70 samples/sec Loss 9.3387 LearningRate 0.0423 Epoch: 6 Global Step: 86930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:55:50,488-Speed 2780.26 samples/sec Loss 9.5867 LearningRate 0.0423 Epoch: 6 Global Step: 86940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:21,938-Speed 325.62 samples/sec Loss 9.0649 LearningRate 0.0422 Epoch: 7 Global Step: 86950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:25,354-Speed 2998.83 samples/sec Loss 7.9646 LearningRate 0.0422 Epoch: 7 Global Step: 86960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:28,866-Speed 2916.85 samples/sec Loss 7.9309 LearningRate 0.0422 Epoch: 7 Global Step: 86970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:32,297-Speed 2985.48 samples/sec Loss 7.9687 LearningRate 0.0422 Epoch: 7 Global Step: 86980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:35,760-Speed 2958.38 samples/sec Loss 7.8560 LearningRate 0.0422 Epoch: 7 Global Step: 86990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:39,275-Speed 2913.92 samples/sec Loss 8.0329 
LearningRate 0.0422 Epoch: 7 Global Step: 87000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:42,687-Speed 3001.61 samples/sec Loss 7.9530 LearningRate 0.0422 Epoch: 7 Global Step: 87010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:56:46,153-Speed 2955.95 samples/sec Loss 7.9844 LearningRate 0.0422 Epoch: 7 Global Step: 87020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:56:49,615-Speed 2958.13 samples/sec Loss 7.9556 LearningRate 0.0422 Epoch: 7 Global Step: 87030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:56:53,095-Speed 2943.89 samples/sec Loss 7.9791 LearningRate 0.0422 Epoch: 7 Global Step: 87040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:56:56,673-Speed 2862.70 samples/sec Loss 7.9496 LearningRate 0.0422 Epoch: 7 Global Step: 87050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:00,060-Speed 3024.70 samples/sec Loss 8.0040 LearningRate 0.0422 Epoch: 7 Global Step: 87060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:03,436-Speed 3033.67 samples/sec Loss 7.9732 LearningRate 0.0422 Epoch: 7 Global Step: 87070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:06,911-Speed 2948.06 samples/sec Loss 8.0815 LearningRate 0.0422 Epoch: 7 Global Step: 87080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:10,354-Speed 2974.87 samples/sec Loss 8.0999 LearningRate 0.0422 Epoch: 7 Global Step: 87090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:13,704-Speed 3057.85 samples/sec Loss 8.1002 LearningRate 0.0422 Epoch: 7 Global Step: 87100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:17,253-Speed 2885.66 samples/sec Loss 8.2582 LearningRate 0.0422 Epoch: 7 Global Step: 87110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:20,681-Speed 2988.65 samples/sec Loss 8.0822 LearningRate 0.0422 Epoch: 7 Global Step: 
87120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:24,091-Speed 3003.92 samples/sec Loss 8.0259 LearningRate 0.0422 Epoch: 7 Global Step: 87130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:27,714-Speed 2826.43 samples/sec Loss 7.9177 LearningRate 0.0421 Epoch: 7 Global Step: 87140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:31,216-Speed 2925.26 samples/sec Loss 8.2360 LearningRate 0.0421 Epoch: 7 Global Step: 87150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:34,612-Speed 3016.14 samples/sec Loss 8.1162 LearningRate 0.0421 Epoch: 7 Global Step: 87160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:38,040-Speed 2988.48 samples/sec Loss 7.9445 LearningRate 0.0421 Epoch: 7 Global Step: 87170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:41,456-Speed 2998.24 samples/sec Loss 8.0810 LearningRate 0.0421 Epoch: 7 Global Step: 87180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:44,889-Speed 2983.64 samples/sec Loss 8.0740 LearningRate 0.0421 Epoch: 7 Global Step: 87190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:48,284-Speed 3016.69 samples/sec Loss 8.2013 LearningRate 0.0421 Epoch: 7 Global Step: 87200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:57:51,656-Speed 3037.80 samples/sec Loss 8.1608 LearningRate 0.0421 Epoch: 7 Global Step: 87210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:55,133-Speed 2945.78 samples/sec Loss 8.1599 LearningRate 0.0421 Epoch: 7 Global Step: 87220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:57:58,588-Speed 2965.40 samples/sec Loss 8.1002 LearningRate 0.0421 Epoch: 7 Global Step: 87230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:02,040-Speed 2967.34 samples/sec Loss 8.1885 LearningRate 0.0421 Epoch: 7 Global Step: 87240 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-04-27 09:58:05,517-Speed 2945.81 samples/sec Loss 8.1173 LearningRate 0.0421 Epoch: 7 Global Step: 87250 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:08,916-Speed 3013.77 samples/sec Loss 8.2527 LearningRate 0.0421 Epoch: 7 Global Step: 87260 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:12,255-Speed 3067.02 samples/sec Loss 8.2013 LearningRate 0.0421 Epoch: 7 Global Step: 87270 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:15,666-Speed 3002.81 samples/sec Loss 8.2368 LearningRate 0.0421 Epoch: 7 Global Step: 87280 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:19,158-Speed 2933.53 samples/sec Loss 8.1870 LearningRate 0.0421 Epoch: 7 Global Step: 87290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:22,597-Speed 2978.36 samples/sec Loss 8.2838 LearningRate 0.0421 Epoch: 7 Global Step: 87300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:25,968-Speed 3038.53 samples/sec Loss 8.3135 LearningRate 0.0421 Epoch: 7 Global Step: 87310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:29,323-Speed 3052.83 samples/sec Loss 8.1282 LearningRate 0.0421 Epoch: 7 Global Step: 87320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:32,745-Speed 2993.75 samples/sec Loss 8.1738 LearningRate 0.0420 Epoch: 7 Global Step: 87330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:36,054-Speed 3095.60 samples/sec Loss 8.1563 LearningRate 0.0420 Epoch: 7 Global Step: 87340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:39,438-Speed 3026.89 samples/sec Loss 8.2270 LearningRate 0.0420 Epoch: 7 Global Step: 87350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:42,862-Speed 2991.48 samples/sec Loss 8.1117 LearningRate 0.0420 Epoch: 7 Global Step: 87360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:46,238-Speed 
3033.68 samples/sec Loss 8.2281 LearningRate 0.0420 Epoch: 7 Global Step: 87370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:49,652-Speed 3000.26 samples/sec Loss 8.3919 LearningRate 0.0420 Epoch: 7 Global Step: 87380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:58:53,025-Speed 3037.11 samples/sec Loss 8.2595 LearningRate 0.0420 Epoch: 7 Global Step: 87390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:58:56,355-Speed 3076.25 samples/sec Loss 8.3148 LearningRate 0.0420 Epoch: 7 Global Step: 87400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:58:59,753-Speed 3014.18 samples/sec Loss 8.1404 LearningRate 0.0420 Epoch: 7 Global Step: 87410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:03,122-Speed 3039.98 samples/sec Loss 8.2477 LearningRate 0.0420 Epoch: 7 Global Step: 87420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:06,497-Speed 3034.79 samples/sec Loss 8.4236 LearningRate 0.0420 Epoch: 7 Global Step: 87430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:09,926-Speed 2987.79 samples/sec Loss 8.3309 LearningRate 0.0420 Epoch: 7 Global Step: 87440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:13,336-Speed 3003.45 samples/sec Loss 8.2494 LearningRate 0.0420 Epoch: 7 Global Step: 87450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:16,653-Speed 3087.57 samples/sec Loss 8.3610 LearningRate 0.0420 Epoch: 7 Global Step: 87460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:20,023-Speed 3039.15 samples/sec Loss 8.2972 LearningRate 0.0420 Epoch: 7 Global Step: 87470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:23,399-Speed 3034.72 samples/sec Loss 8.4125 LearningRate 0.0420 Epoch: 7 Global Step: 87480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:26,740-Speed 3065.66 samples/sec Loss 8.3535 LearningRate 0.0420 
Epoch: 7 Global Step: 87490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:30,084-Speed 3062.69 samples/sec Loss 8.5236 LearningRate 0.0420 Epoch: 7 Global Step: 87500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:33,446-Speed 3047.01 samples/sec Loss 8.2762 LearningRate 0.0420 Epoch: 7 Global Step: 87510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:36,835-Speed 3022.17 samples/sec Loss 8.3223 LearningRate 0.0420 Epoch: 7 Global Step: 87520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:40,253-Speed 2996.87 samples/sec Loss 8.3086 LearningRate 0.0419 Epoch: 7 Global Step: 87530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:43,596-Speed 3064.16 samples/sec Loss 8.3891 LearningRate 0.0419 Epoch: 7 Global Step: 87540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:47,006-Speed 3003.65 samples/sec Loss 8.3887 LearningRate 0.0419 Epoch: 7 Global Step: 87550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 09:59:50,514-Speed 2920.35 samples/sec Loss 8.4697 LearningRate 0.0419 Epoch: 7 Global Step: 87560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:53,861-Speed 3060.35 samples/sec Loss 8.3752 LearningRate 0.0419 Epoch: 7 Global Step: 87570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 09:59:57,242-Speed 3029.50 samples/sec Loss 8.4165 LearningRate 0.0419 Epoch: 7 Global Step: 87580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:00,599-Speed 3051.21 samples/sec Loss 8.3229 LearningRate 0.0419 Epoch: 7 Global Step: 87590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:03,923-Speed 3082.89 samples/sec Loss 8.3413 LearningRate 0.0419 Epoch: 7 Global Step: 87600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:07,274-Speed 3056.93 samples/sec Loss 8.4121 LearningRate 0.0419 Epoch: 7 Global Step: 87610 Fp16 Grad Scale: 
65536 Required: 15 hours Training: 2022-04-27 10:00:10,653-Speed 3031.95 samples/sec Loss 8.3110 LearningRate 0.0419 Epoch: 7 Global Step: 87620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:14,131-Speed 2944.78 samples/sec Loss 8.2928 LearningRate 0.0419 Epoch: 7 Global Step: 87630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:17,489-Speed 3050.39 samples/sec Loss 8.3724 LearningRate 0.0419 Epoch: 7 Global Step: 87640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:21,007-Speed 2911.65 samples/sec Loss 8.4822 LearningRate 0.0419 Epoch: 7 Global Step: 87650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:24,429-Speed 2993.42 samples/sec Loss 8.5025 LearningRate 0.0419 Epoch: 7 Global Step: 87660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:27,804-Speed 3034.87 samples/sec Loss 8.4944 LearningRate 0.0419 Epoch: 7 Global Step: 87670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:31,207-Speed 3010.32 samples/sec Loss 8.3300 LearningRate 0.0419 Epoch: 7 Global Step: 87680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:34,631-Speed 2991.43 samples/sec Loss 8.3290 LearningRate 0.0419 Epoch: 7 Global Step: 87690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:37,970-Speed 3067.69 samples/sec Loss 8.3337 LearningRate 0.0419 Epoch: 7 Global Step: 87700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:41,353-Speed 3028.04 samples/sec Loss 8.3552 LearningRate 0.0419 Epoch: 7 Global Step: 87710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:44,703-Speed 3058.29 samples/sec Loss 8.4089 LearningRate 0.0418 Epoch: 7 Global Step: 87720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:00:48,073-Speed 3038.98 samples/sec Loss 8.4817 LearningRate 0.0418 Epoch: 7 Global Step: 87730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 
10:00:51,442-Speed 3039.80 samples/sec Loss 8.4017 LearningRate 0.0418 Epoch: 7 Global Step: 87740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:54,833-Speed 3021.03 samples/sec Loss 8.5269 LearningRate 0.0418 Epoch: 7 Global Step: 87750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:00:58,176-Speed 3063.95 samples/sec Loss 8.4717 LearningRate 0.0418 Epoch: 7 Global Step: 87760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:01,516-Speed 3066.49 samples/sec Loss 8.4960 LearningRate 0.0418 Epoch: 7 Global Step: 87770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:04,915-Speed 3013.86 samples/sec Loss 8.4833 LearningRate 0.0418 Epoch: 7 Global Step: 87780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:08,235-Speed 3085.28 samples/sec Loss 8.4427 LearningRate 0.0418 Epoch: 7 Global Step: 87790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:11,579-Speed 3063.39 samples/sec Loss 8.4338 LearningRate 0.0418 Epoch: 7 Global Step: 87800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:14,920-Speed 3065.72 samples/sec Loss 8.3885 LearningRate 0.0418 Epoch: 7 Global Step: 87810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:18,314-Speed 3017.88 samples/sec Loss 8.4656 LearningRate 0.0418 Epoch: 7 Global Step: 87820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:21,703-Speed 3022.20 samples/sec Loss 8.3215 LearningRate 0.0418 Epoch: 7 Global Step: 87830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:01:25,107-Speed 3009.00 samples/sec Loss 8.5538 LearningRate 0.0418 Epoch: 7 Global Step: 87840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:01:28,537-Speed 2987.00 samples/sec Loss 8.6022 LearningRate 0.0418 Epoch: 7 Global Step: 87850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:01:31,851-Speed 3090.27 samples/sec Loss 8.4965 
LearningRate 0.0418 Epoch: 7 Global Step: 87860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:01:35,228-Speed 3033.24 samples/sec Loss 8.4417 LearningRate 0.0418 Epoch: 7 Global Step: 87870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:01:38,656-Speed 2987.95 samples/sec Loss 8.4123 LearningRate 0.0418 Epoch: 7 Global Step: 87880 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:01:42,063-Speed 3006.29 samples/sec Loss 8.4671 LearningRate 0.0418 Epoch: 7 Global Step: 87890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:45,431-Speed 3042.29 samples/sec Loss 8.5051 LearningRate 0.0418 Epoch: 7 Global Step: 87900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:48,781-Speed 3056.82 samples/sec Loss 8.6747 LearningRate 0.0417 Epoch: 7 Global Step: 87910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:52,159-Speed 3032.90 samples/sec Loss 8.4271 LearningRate 0.0417 Epoch: 7 Global Step: 87920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:55,510-Speed 3055.87 samples/sec Loss 8.5293 LearningRate 0.0417 Epoch: 7 Global Step: 87930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:01:58,859-Speed 3058.66 samples/sec Loss 8.3778 LearningRate 0.0417 Epoch: 7 Global Step: 87940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:02,239-Speed 3031.54 samples/sec Loss 8.5616 LearningRate 0.0417 Epoch: 7 Global Step: 87950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:05,644-Speed 3008.29 samples/sec Loss 8.5126 LearningRate 0.0417 Epoch: 7 Global Step: 87960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:09,108-Speed 2956.78 samples/sec Loss 8.6567 LearningRate 0.0417 Epoch: 7 Global Step: 87970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:12,497-Speed 3022.61 samples/sec Loss 8.4714 LearningRate 0.0417 Epoch: 7 Global Step: 87980 Fp16 
Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:15,837-Speed 3066.69 samples/sec Loss 8.6024 LearningRate 0.0417 Epoch: 7 Global Step: 87990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:02:19,163-Speed 3079.94 samples/sec Loss 8.5607 LearningRate 0.0417 Epoch: 7 Global Step: 88000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:22,506-Speed 3063.38 samples/sec Loss 8.6084 LearningRate 0.0417 Epoch: 7 Global Step: 88010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:25,893-Speed 3024.57 samples/sec Loss 8.6196 LearningRate 0.0417 Epoch: 7 Global Step: 88020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:29,227-Speed 3072.25 samples/sec Loss 8.6666 LearningRate 0.0417 Epoch: 7 Global Step: 88030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:32,590-Speed 3045.94 samples/sec Loss 8.6161 LearningRate 0.0417 Epoch: 7 Global Step: 88040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:35,908-Speed 3086.41 samples/sec Loss 8.6209 LearningRate 0.0417 Epoch: 7 Global Step: 88050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:39,238-Speed 3075.71 samples/sec Loss 8.6672 LearningRate 0.0417 Epoch: 7 Global Step: 88060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:42,596-Speed 3050.71 samples/sec Loss 8.5922 LearningRate 0.0417 Epoch: 7 Global Step: 88070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:45,969-Speed 3036.18 samples/sec Loss 8.4141 LearningRate 0.0417 Epoch: 7 Global Step: 88080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:49,346-Speed 3033.69 samples/sec Loss 8.5021 LearningRate 0.0417 Epoch: 7 Global Step: 88090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:52,699-Speed 3054.26 samples/sec Loss 8.6464 LearningRate 0.0416 Epoch: 7 Global Step: 88100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 
2022-04-27 10:02:56,086-Speed 3024.57 samples/sec Loss 8.5815 LearningRate 0.0416 Epoch: 7 Global Step: 88110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:02:59,459-Speed 3037.08 samples/sec Loss 8.6699 LearningRate 0.0416 Epoch: 7 Global Step: 88120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:02,882-Speed 2991.80 samples/sec Loss 8.5568 LearningRate 0.0416 Epoch: 7 Global Step: 88130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:06,250-Speed 3041.68 samples/sec Loss 8.5696 LearningRate 0.0416 Epoch: 7 Global Step: 88140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:09,570-Speed 3084.56 samples/sec Loss 8.4597 LearningRate 0.0416 Epoch: 7 Global Step: 88150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:12,912-Speed 3065.77 samples/sec Loss 8.6732 LearningRate 0.0416 Epoch: 7 Global Step: 88160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:16,247-Speed 3070.98 samples/sec Loss 8.6871 LearningRate 0.0416 Epoch: 7 Global Step: 88170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:19,612-Speed 3044.42 samples/sec Loss 8.5025 LearningRate 0.0416 Epoch: 7 Global Step: 88180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:22,940-Speed 3077.22 samples/sec Loss 8.5586 LearningRate 0.0416 Epoch: 7 Global Step: 88190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:26,266-Speed 3080.07 samples/sec Loss 8.6933 LearningRate 0.0416 Epoch: 7 Global Step: 88200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:29,608-Speed 3064.96 samples/sec Loss 8.5506 LearningRate 0.0416 Epoch: 7 Global Step: 88210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:03:32,954-Speed 3061.36 samples/sec Loss 8.4229 LearningRate 0.0416 Epoch: 7 Global Step: 88220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:03:36,412-Speed 2961.24 samples/sec 
Loss 8.6029 LearningRate 0.0416 Epoch: 7 Global Step: 88230 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:03:39,810-Speed 3014.45 samples/sec Loss 8.6661 LearningRate 0.0416 Epoch: 7 Global Step: 88240 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:03:43,184-Speed 3036.68 samples/sec Loss 8.7029 LearningRate 0.0416 Epoch: 7 Global Step: 88250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:46,555-Speed 3038.11 samples/sec Loss 8.5381 LearningRate 0.0416 Epoch: 7 Global Step: 88260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:49,907-Speed 3056.43 samples/sec Loss 8.6044 LearningRate 0.0416 Epoch: 7 Global Step: 88270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:53,299-Speed 3019.69 samples/sec Loss 8.7334 LearningRate 0.0416 Epoch: 7 Global Step: 88280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:03:56,715-Speed 2997.89 samples/sec Loss 8.6390 LearningRate 0.0416 Epoch: 7 Global Step: 88290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:04:00,187-Speed 2949.93 samples/sec Loss 8.5757 LearningRate 0.0415 Epoch: 7 Global Step: 88300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:04:03,583-Speed 3016.48 samples/sec Loss 8.6617 LearningRate 0.0415 Epoch: 7 Global Step: 88310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:04:06,912-Speed 3076.51 samples/sec Loss 8.7498 LearningRate 0.0415 Epoch: 7 Global Step: 88320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:04:10,265-Speed 3054.85 samples/sec Loss 8.6363 LearningRate 0.0415 Epoch: 7 Global Step: 88330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:04:13,638-Speed 3036.98 samples/sec Loss 8.7777 LearningRate 0.0415 Epoch: 7 Global Step: 88340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:04:16,971-Speed 3072.87 samples/sec Loss 8.6729 LearningRate 0.0415 Epoch: 7 Global Step: 
88350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:20,308-Speed 3070.04 samples/sec Loss 8.7891 LearningRate 0.0415 Epoch: 7 Global Step: 88360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:23,722-Speed 2999.52 samples/sec Loss 8.5898 LearningRate 0.0415 Epoch: 7 Global Step: 88370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:27,095-Speed 3037.64 samples/sec Loss 8.7945 LearningRate 0.0415 Epoch: 7 Global Step: 88380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:30,410-Speed 3089.82 samples/sec Loss 8.7316 LearningRate 0.0415 Epoch: 7 Global Step: 88390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:33,913-Speed 2923.98 samples/sec Loss 8.7206 LearningRate 0.0415 Epoch: 7 Global Step: 88400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:37,326-Speed 3000.81 samples/sec Loss 8.7759 LearningRate 0.0415 Epoch: 7 Global Step: 88410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:40,759-Speed 2983.59 samples/sec Loss 8.7385 LearningRate 0.0415 Epoch: 7 Global Step: 88420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:44,267-Speed 2919.57 samples/sec Loss 8.8082 LearningRate 0.0415 Epoch: 7 Global Step: 88430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:47,713-Speed 2972.79 samples/sec Loss 8.7018 LearningRate 0.0415 Epoch: 7 Global Step: 88440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:51,137-Speed 2991.98 samples/sec Loss 8.7600 LearningRate 0.0415 Epoch: 7 Global Step: 88450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:54,452-Speed 3089.80 samples/sec Loss 8.6831 LearningRate 0.0415 Epoch: 7 Global Step: 88460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:04:57,818-Speed 3043.05 samples/sec Loss 8.9472 LearningRate 0.0415 Epoch: 7 Global Step: 88470 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-04-27 10:05:01,287-Speed 2952.67 samples/sec Loss 8.6828 LearningRate 0.0415 Epoch: 7 Global Step: 88480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:04,712-Speed 2990.78 samples/sec Loss 8.7276 LearningRate 0.0414 Epoch: 7 Global Step: 88490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:08,045-Speed 3072.95 samples/sec Loss 8.7176 LearningRate 0.0414 Epoch: 7 Global Step: 88500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:11,409-Speed 3044.12 samples/sec Loss 8.8172 LearningRate 0.0414 Epoch: 7 Global Step: 88510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:14,747-Speed 3068.66 samples/sec Loss 8.7217 LearningRate 0.0414 Epoch: 7 Global Step: 88520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:18,704-Speed 2589.02 samples/sec Loss 8.8225 LearningRate 0.0414 Epoch: 7 Global Step: 88530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:22,031-Speed 3078.20 samples/sec Loss 8.7922 LearningRate 0.0414 Epoch: 7 Global Step: 88540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:25,392-Speed 3048.04 samples/sec Loss 8.8568 LearningRate 0.0414 Epoch: 7 Global Step: 88550 Fp16 Grad Scale: 262144 Required: 15 hours Training: 2022-04-27 10:05:30,148-Speed 2153.32 samples/sec Loss 8.8414 LearningRate 0.0414 Epoch: 7 Global Step: 88560 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:33,576-Speed 2988.56 samples/sec Loss 8.8066 LearningRate 0.0414 Epoch: 7 Global Step: 88570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:37,615-Speed 2535.96 samples/sec Loss 8.6957 LearningRate 0.0414 Epoch: 7 Global Step: 88580 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:40,951-Speed 3069.96 samples/sec Loss 8.7383 LearningRate 0.0414 Epoch: 7 Global Step: 88590 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 
10:05:44,294-Speed 3064.27 samples/sec Loss 8.7657 LearningRate 0.0414 Epoch: 7 Global Step: 88600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:47,694-Speed 3012.81 samples/sec Loss 8.7959 LearningRate 0.0414 Epoch: 7 Global Step: 88610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:51,093-Speed 3013.30 samples/sec Loss 8.6624 LearningRate 0.0414 Epoch: 7 Global Step: 88620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:54,503-Speed 3003.77 samples/sec Loss 8.7722 LearningRate 0.0414 Epoch: 7 Global Step: 88630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:05:57,897-Speed 3017.36 samples/sec Loss 8.5679 LearningRate 0.0414 Epoch: 7 Global Step: 88640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:01,322-Speed 2991.09 samples/sec Loss 8.7849 LearningRate 0.0414 Epoch: 7 Global Step: 88650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:04,675-Speed 3054.76 samples/sec Loss 8.8641 LearningRate 0.0414 Epoch: 7 Global Step: 88660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:08,037-Speed 3046.93 samples/sec Loss 8.8363 LearningRate 0.0414 Epoch: 7 Global Step: 88670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:11,472-Speed 2981.71 samples/sec Loss 8.8235 LearningRate 0.0413 Epoch: 7 Global Step: 88680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:14,857-Speed 3025.71 samples/sec Loss 8.6101 LearningRate 0.0413 Epoch: 7 Global Step: 88690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:18,265-Speed 3005.65 samples/sec Loss 8.9281 LearningRate 0.0413 Epoch: 7 Global Step: 88700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:21,687-Speed 2993.86 samples/sec Loss 8.7739 LearningRate 0.0413 Epoch: 7 Global Step: 88710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:25,098-Speed 3002.77 samples/sec Loss 
8.7301 LearningRate 0.0413 Epoch: 7 Global Step: 88720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:28,484-Speed 3025.20 samples/sec Loss 8.7709 LearningRate 0.0413 Epoch: 7 Global Step: 88730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:31,932-Speed 2970.67 samples/sec Loss 8.8623 LearningRate 0.0413 Epoch: 7 Global Step: 88740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:35,396-Speed 2956.36 samples/sec Loss 8.8185 LearningRate 0.0413 Epoch: 7 Global Step: 88750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:06:38,814-Speed 2996.80 samples/sec Loss 8.9481 LearningRate 0.0413 Epoch: 7 Global Step: 88760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:42,265-Speed 2968.38 samples/sec Loss 8.8766 LearningRate 0.0413 Epoch: 7 Global Step: 88770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:45,704-Speed 2978.77 samples/sec Loss 8.8057 LearningRate 0.0413 Epoch: 7 Global Step: 88780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:49,151-Speed 2971.23 samples/sec Loss 8.8650 LearningRate 0.0413 Epoch: 7 Global Step: 88790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:52,576-Speed 2991.25 samples/sec Loss 8.8527 LearningRate 0.0413 Epoch: 7 Global Step: 88800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:56,000-Speed 2991.05 samples/sec Loss 8.8816 LearningRate 0.0413 Epoch: 7 Global Step: 88810 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:06:59,374-Speed 3035.75 samples/sec Loss 8.7575 LearningRate 0.0413 Epoch: 7 Global Step: 88820 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:07:02,752-Speed 3032.20 samples/sec Loss 8.8678 LearningRate 0.0413 Epoch: 7 Global Step: 88830 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:07:06,136-Speed 3027.03 samples/sec Loss 8.8171 LearningRate 0.0413 Epoch: 7 Global 
Step: 88840 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:07:09,467-Speed 3074.27 samples/sec Loss 8.9022 LearningRate 0.0413 Epoch: 7 Global Step: 88850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:12,829-Speed 3047.14 samples/sec Loss 8.8757 LearningRate 0.0413 Epoch: 7 Global Step: 88860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:16,192-Speed 3045.73 samples/sec Loss 8.9141 LearningRate 0.0412 Epoch: 7 Global Step: 88870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:19,548-Speed 3052.05 samples/sec Loss 8.8886 LearningRate 0.0412 Epoch: 7 Global Step: 88880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:22,878-Speed 3076.28 samples/sec Loss 8.7891 LearningRate 0.0412 Epoch: 7 Global Step: 88890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:26,192-Speed 3090.43 samples/sec Loss 8.9329 LearningRate 0.0412 Epoch: 7 Global Step: 88900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:29,531-Speed 3068.36 samples/sec Loss 8.8077 LearningRate 0.0412 Epoch: 7 Global Step: 88910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:33,490-Speed 2586.98 samples/sec Loss 9.0181 LearningRate 0.0412 Epoch: 7 Global Step: 88920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:36,893-Speed 3009.73 samples/sec Loss 8.8072 LearningRate 0.0412 Epoch: 7 Global Step: 88930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:40,243-Speed 3057.15 samples/sec Loss 8.9655 LearningRate 0.0412 Epoch: 7 Global Step: 88940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:43,635-Speed 3020.77 samples/sec Loss 8.8605 LearningRate 0.0412 Epoch: 7 Global Step: 88950 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:07:47,011-Speed 3033.57 samples/sec Loss 8.9091 LearningRate 0.0412 Epoch: 7 Global Step: 88960 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-04-27 10:07:50,487-Speed 2947.91 samples/sec Loss 8.8177 LearningRate 0.0412 Epoch: 7 Global Step: 88970 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:07:53,948-Speed 2960.03 samples/sec Loss 8.9323 LearningRate 0.0412 Epoch: 7 Global Step: 88980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:07:57,354-Speed 3007.52 samples/sec Loss 8.9706 LearningRate 0.0412 Epoch: 7 Global Step: 88990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:00,742-Speed 3023.37 samples/sec Loss 8.8267 LearningRate 0.0412 Epoch: 7 Global Step: 89000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:04,259-Speed 2912.85 samples/sec Loss 8.8299 LearningRate 0.0412 Epoch: 7 Global Step: 89010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:07,667-Speed 3005.70 samples/sec Loss 8.8584 LearningRate 0.0412 Epoch: 7 Global Step: 89020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:11,110-Speed 2975.00 samples/sec Loss 8.9487 LearningRate 0.0412 Epoch: 7 Global Step: 89030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:14,509-Speed 3013.03 samples/sec Loss 8.9425 LearningRate 0.0412 Epoch: 7 Global Step: 89040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:17,894-Speed 3026.18 samples/sec Loss 8.9646 LearningRate 0.0412 Epoch: 7 Global Step: 89050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:21,280-Speed 3025.39 samples/sec Loss 8.8778 LearningRate 0.0412 Epoch: 7 Global Step: 89060 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:24,722-Speed 2975.40 samples/sec Loss 8.9107 LearningRate 0.0411 Epoch: 7 Global Step: 89070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:08:28,178-Speed 2964.25 samples/sec Loss 8.7020 LearningRate 0.0411 Epoch: 7 Global Step: 89080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:31,566-Speed 3022.82 
samples/sec Loss 9.0227 LearningRate 0.0411 Epoch: 7 Global Step: 89090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:34,964-Speed 3014.53 samples/sec Loss 8.9990 LearningRate 0.0411 Epoch: 7 Global Step: 89100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:38,348-Speed 3026.88 samples/sec Loss 8.9012 LearningRate 0.0411 Epoch: 7 Global Step: 89110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:41,767-Speed 2995.75 samples/sec Loss 8.8603 LearningRate 0.0411 Epoch: 7 Global Step: 89120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:45,159-Speed 3020.51 samples/sec Loss 9.0431 LearningRate 0.0411 Epoch: 7 Global Step: 89130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:48,546-Speed 3023.82 samples/sec Loss 8.7395 LearningRate 0.0411 Epoch: 7 Global Step: 89140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:51,889-Speed 3063.74 samples/sec Loss 8.8137 LearningRate 0.0411 Epoch: 7 Global Step: 89150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:55,259-Speed 3039.91 samples/sec Loss 8.9324 LearningRate 0.0411 Epoch: 7 Global Step: 89160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:08:58,687-Speed 2987.34 samples/sec Loss 8.9750 LearningRate 0.0411 Epoch: 7 Global Step: 89170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:09:02,060-Speed 3037.58 samples/sec Loss 8.8807 LearningRate 0.0411 Epoch: 7 Global Step: 89180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:09:05,444-Speed 3026.39 samples/sec Loss 8.9896 LearningRate 0.0411 Epoch: 7 Global Step: 89190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:08,898-Speed 2965.50 samples/sec Loss 8.9443 LearningRate 0.0411 Epoch: 7 Global Step: 89200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:12,269-Speed 3038.33 samples/sec Loss 8.8305 LearningRate 0.0411 
Epoch: 7 Global Step: 89210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:15,643-Speed 3036.31 samples/sec Loss 8.9914 LearningRate 0.0411 Epoch: 7 Global Step: 89220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:19,044-Speed 3010.95 samples/sec Loss 9.0833 LearningRate 0.0411 Epoch: 7 Global Step: 89230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:22,541-Speed 2929.50 samples/sec Loss 8.9433 LearningRate 0.0411 Epoch: 7 Global Step: 89240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:25,924-Speed 3027.54 samples/sec Loss 8.9684 LearningRate 0.0411 Epoch: 7 Global Step: 89250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:29,298-Speed 3036.00 samples/sec Loss 8.8717 LearningRate 0.0410 Epoch: 7 Global Step: 89260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:32,664-Speed 3042.86 samples/sec Loss 8.9998 LearningRate 0.0410 Epoch: 7 Global Step: 89270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:36,079-Speed 2999.68 samples/sec Loss 8.9021 LearningRate 0.0410 Epoch: 7 Global Step: 89280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:09:39,543-Speed 2956.54 samples/sec Loss 8.9251 LearningRate 0.0410 Epoch: 7 Global Step: 89290 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:09:42,884-Speed 3065.97 samples/sec Loss 8.8692 LearningRate 0.0410 Epoch: 7 Global Step: 89300 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:09:46,318-Speed 2983.35 samples/sec Loss 8.9551 LearningRate 0.0410 Epoch: 7 Global Step: 89310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:09:49,727-Speed 3004.02 samples/sec Loss 8.9684 LearningRate 0.0410 Epoch: 7 Global Step: 89320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:09:53,168-Speed 2976.91 samples/sec Loss 8.9648 LearningRate 0.0410 Epoch: 7 Global Step: 89330 Fp16 Grad Scale: 131072 
Required: 15 hours Training: 2022-04-27 10:09:56,510-Speed 3064.74 samples/sec Loss 9.0265 LearningRate 0.0410 Epoch: 7 Global Step: 89340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:00,028-Speed 2911.74 samples/sec Loss 9.0529 LearningRate 0.0410 Epoch: 7 Global Step: 89350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:03,515-Speed 2937.67 samples/sec Loss 9.0528 LearningRate 0.0410 Epoch: 7 Global Step: 89360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:06,948-Speed 2983.46 samples/sec Loss 8.9408 LearningRate 0.0410 Epoch: 7 Global Step: 89370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:10,401-Speed 2966.83 samples/sec Loss 9.0694 LearningRate 0.0410 Epoch: 7 Global Step: 89380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:13,793-Speed 3019.50 samples/sec Loss 8.9792 LearningRate 0.0410 Epoch: 7 Global Step: 89390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:17,133-Speed 3067.07 samples/sec Loss 8.9914 LearningRate 0.0410 Epoch: 7 Global Step: 89400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:20,560-Speed 2989.10 samples/sec Loss 9.0491 LearningRate 0.0410 Epoch: 7 Global Step: 89410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:23,988-Speed 2987.86 samples/sec Loss 9.0314 LearningRate 0.0410 Epoch: 7 Global Step: 89420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:27,420-Speed 2984.70 samples/sec Loss 8.8898 LearningRate 0.0410 Epoch: 7 Global Step: 89430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:30,922-Speed 2924.35 samples/sec Loss 9.1615 LearningRate 0.0410 Epoch: 7 Global Step: 89440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:10:34,267-Speed 3062.01 samples/sec Loss 9.0874 LearningRate 0.0410 Epoch: 7 Global Step: 89450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 
10:10:37,672-Speed 3008.54 samples/sec Loss 8.9138 LearningRate 0.0409 Epoch: 7 Global Step: 89460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:10:41,035-Speed 3046.06 samples/sec Loss 9.0770 LearningRate 0.0409 Epoch: 7 Global Step: 89470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:10:44,356-Speed 3083.58 samples/sec Loss 9.0917 LearningRate 0.0409 Epoch: 7 Global Step: 89480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:10:47,730-Speed 3036.62 samples/sec Loss 9.1051 LearningRate 0.0409 Epoch: 7 Global Step: 89490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:10:51,038-Speed 3095.42 samples/sec Loss 8.9367 LearningRate 0.0409 Epoch: 7 Global Step: 89500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:54,471-Speed 2984.64 samples/sec Loss 9.0223 LearningRate 0.0409 Epoch: 7 Global Step: 89510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:10:57,839-Speed 3040.91 samples/sec Loss 8.9664 LearningRate 0.0409 Epoch: 7 Global Step: 89520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:01,296-Speed 2963.35 samples/sec Loss 9.1226 LearningRate 0.0409 Epoch: 7 Global Step: 89530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:04,673-Speed 3033.02 samples/sec Loss 8.8551 LearningRate 0.0409 Epoch: 7 Global Step: 89540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:08,130-Speed 2962.23 samples/sec Loss 9.1116 LearningRate 0.0409 Epoch: 7 Global Step: 89550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:11,493-Speed 3046.06 samples/sec Loss 8.9471 LearningRate 0.0409 Epoch: 7 Global Step: 89560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:14,823-Speed 3076.35 samples/sec Loss 8.9608 LearningRate 0.0409 Epoch: 7 Global Step: 89570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:18,152-Speed 3076.38 samples/sec Loss 8.9719 
LearningRate 0.0409 Epoch: 7 Global Step: 89580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:21,570-Speed 2996.90 samples/sec Loss 9.0467 LearningRate 0.0409 Epoch: 7 Global Step: 89590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:11:24,947-Speed 3033.13 samples/sec Loss 9.0241 LearningRate 0.0409 Epoch: 7 Global Step: 89600 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:28,354-Speed 3006.83 samples/sec Loss 9.0247 LearningRate 0.0409 Epoch: 7 Global Step: 89610 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:31,739-Speed 3025.95 samples/sec Loss 8.9936 LearningRate 0.0409 Epoch: 7 Global Step: 89620 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:35,076-Speed 3069.11 samples/sec Loss 8.9659 LearningRate 0.0409 Epoch: 7 Global Step: 89630 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:38,428-Speed 3055.80 samples/sec Loss 8.9770 LearningRate 0.0409 Epoch: 7 Global Step: 89640 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:41,795-Speed 3042.54 samples/sec Loss 8.9778 LearningRate 0.0408 Epoch: 7 Global Step: 89650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:45,143-Speed 3059.11 samples/sec Loss 9.1349 LearningRate 0.0408 Epoch: 7 Global Step: 89660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:48,465-Speed 3083.34 samples/sec Loss 8.9212 LearningRate 0.0408 Epoch: 7 Global Step: 89670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:51,935-Speed 2952.15 samples/sec Loss 9.0575 LearningRate 0.0408 Epoch: 7 Global Step: 89680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:55,337-Speed 3010.94 samples/sec Loss 9.0360 LearningRate 0.0408 Epoch: 7 Global Step: 89690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:11:58,688-Speed 3056.90 samples/sec Loss 9.0753 LearningRate 0.0408 Epoch: 7 Global Step: 
89700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:12:02,076-Speed 3022.38 samples/sec Loss 8.9904 LearningRate 0.0408 Epoch: 7 Global Step: 89710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:12:05,449-Speed 3037.28 samples/sec Loss 9.1048 LearningRate 0.0408 Epoch: 7 Global Step: 89720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:08,801-Speed 3056.17 samples/sec Loss 8.9242 LearningRate 0.0408 Epoch: 7 Global Step: 89730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:12,173-Speed 3036.70 samples/sec Loss 9.1564 LearningRate 0.0408 Epoch: 7 Global Step: 89740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:15,511-Speed 3069.50 samples/sec Loss 8.9922 LearningRate 0.0408 Epoch: 7 Global Step: 89750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:18,861-Speed 3057.06 samples/sec Loss 9.1362 LearningRate 0.0408 Epoch: 7 Global Step: 89760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:22,216-Speed 3052.58 samples/sec Loss 9.0106 LearningRate 0.0408 Epoch: 7 Global Step: 89770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:25,624-Speed 3006.07 samples/sec Loss 9.1287 LearningRate 0.0408 Epoch: 7 Global Step: 89780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:28,969-Speed 3063.11 samples/sec Loss 9.0806 LearningRate 0.0408 Epoch: 7 Global Step: 89790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:32,438-Speed 2952.48 samples/sec Loss 9.1127 LearningRate 0.0408 Epoch: 7 Global Step: 89800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:35,811-Speed 3036.88 samples/sec Loss 8.9792 LearningRate 0.0408 Epoch: 7 Global Step: 89810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:39,189-Speed 3032.56 samples/sec Loss 8.9939 LearningRate 0.0408 Epoch: 7 Global Step: 89820 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-04-27 10:12:42,524-Speed 3071.23 samples/sec Loss 8.9222 LearningRate 0.0408 Epoch: 7 Global Step: 89830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:45,933-Speed 3005.92 samples/sec Loss 9.0604 LearningRate 0.0407 Epoch: 7 Global Step: 89840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:49,244-Speed 3093.02 samples/sec Loss 9.1680 LearningRate 0.0407 Epoch: 7 Global Step: 89850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:52,641-Speed 3015.83 samples/sec Loss 8.9649 LearningRate 0.0407 Epoch: 7 Global Step: 89860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:55,999-Speed 3049.81 samples/sec Loss 9.1070 LearningRate 0.0407 Epoch: 7 Global Step: 89870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:12:59,363-Speed 3045.14 samples/sec Loss 9.0568 LearningRate 0.0407 Epoch: 7 Global Step: 89880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:02,688-Speed 3080.60 samples/sec Loss 9.0268 LearningRate 0.0407 Epoch: 7 Global Step: 89890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:06,051-Speed 3045.84 samples/sec Loss 9.0046 LearningRate 0.0407 Epoch: 7 Global Step: 89900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:09,478-Speed 2989.01 samples/sec Loss 8.9566 LearningRate 0.0407 Epoch: 7 Global Step: 89910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:12,807-Speed 3076.97 samples/sec Loss 9.0083 LearningRate 0.0407 Epoch: 7 Global Step: 89920 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:13:16,177-Speed 3039.23 samples/sec Loss 9.1000 LearningRate 0.0407 Epoch: 7 Global Step: 89930 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:13:19,486-Speed 3096.67 samples/sec Loss 9.1193 LearningRate 0.0407 Epoch: 7 Global Step: 89940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:22,932-Speed 2973.01 
samples/sec Loss 9.1041 LearningRate 0.0407 Epoch: 7 Global Step: 89950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:26,327-Speed 3016.20 samples/sec Loss 9.0322 LearningRate 0.0407 Epoch: 7 Global Step: 89960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:29,765-Speed 2980.47 samples/sec Loss 9.0663 LearningRate 0.0407 Epoch: 7 Global Step: 89970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:33,230-Speed 2956.43 samples/sec Loss 8.9947 LearningRate 0.0407 Epoch: 7 Global Step: 89980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:36,610-Speed 3030.34 samples/sec Loss 9.1804 LearningRate 0.0407 Epoch: 7 Global Step: 89990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:40,007-Speed 3015.22 samples/sec Loss 9.0825 LearningRate 0.0407 Epoch: 7 Global Step: 90000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:43,448-Speed 2976.80 samples/sec Loss 9.0072 LearningRate 0.0407 Epoch: 7 Global Step: 90010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:46,930-Speed 2941.42 samples/sec Loss 9.1141 LearningRate 0.0407 Epoch: 7 Global Step: 90020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:50,390-Speed 2961.01 samples/sec Loss 9.0458 LearningRate 0.0407 Epoch: 7 Global Step: 90030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:13:53,908-Speed 2911.27 samples/sec Loss 9.2579 LearningRate 0.0406 Epoch: 7 Global Step: 90040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:13:57,348-Speed 2977.23 samples/sec Loss 9.1649 LearningRate 0.0406 Epoch: 7 Global Step: 90050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:00,710-Speed 3047.28 samples/sec Loss 9.0170 LearningRate 0.0406 Epoch: 7 Global Step: 90060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:04,150-Speed 2977.46 samples/sec Loss 9.0073 LearningRate 0.0406 Epoch: 7 
Global Step: 90070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:07,574-Speed 2991.21 samples/sec Loss 9.1533 LearningRate 0.0406 Epoch: 7 Global Step: 90080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:10,950-Speed 3034.03 samples/sec Loss 9.1766 LearningRate 0.0406 Epoch: 7 Global Step: 90090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:14,290-Speed 3067.00 samples/sec Loss 9.1252 LearningRate 0.0406 Epoch: 7 Global Step: 90100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:17,653-Speed 3046.37 samples/sec Loss 9.1727 LearningRate 0.0406 Epoch: 7 Global Step: 90110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:21,061-Speed 3005.27 samples/sec Loss 9.0984 LearningRate 0.0406 Epoch: 7 Global Step: 90120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:24,478-Speed 2997.16 samples/sec Loss 9.0067 LearningRate 0.0406 Epoch: 7 Global Step: 90130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:27,861-Speed 3027.58 samples/sec Loss 9.0024 LearningRate 0.0406 Epoch: 7 Global Step: 90140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:31,250-Speed 3022.91 samples/sec Loss 9.0591 LearningRate 0.0406 Epoch: 7 Global Step: 90150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:34,614-Speed 3044.66 samples/sec Loss 9.1600 LearningRate 0.0406 Epoch: 7 Global Step: 90160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:38,027-Speed 3001.67 samples/sec Loss 8.9467 LearningRate 0.0406 Epoch: 7 Global Step: 90170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:41,476-Speed 2969.86 samples/sec Loss 9.0760 LearningRate 0.0406 Epoch: 7 Global Step: 90180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:44,891-Speed 2999.15 samples/sec Loss 9.0674 LearningRate 0.0406 Epoch: 7 Global Step: 90190 Fp16 Grad Scale: 65536 Required: 
15 hours Training: 2022-04-27 10:14:48,383-Speed 2933.81 samples/sec Loss 9.1916 LearningRate 0.0406 Epoch: 7 Global Step: 90200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:14:51,805-Speed 2992.50 samples/sec Loss 9.1781 LearningRate 0.0406 Epoch: 7 Global Step: 90210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:55,181-Speed 3034.02 samples/sec Loss 9.1218 LearningRate 0.0406 Epoch: 7 Global Step: 90220 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:14:58,537-Speed 3052.88 samples/sec Loss 8.9273 LearningRate 0.0405 Epoch: 7 Global Step: 90230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:01,973-Speed 2980.46 samples/sec Loss 9.1006 LearningRate 0.0405 Epoch: 7 Global Step: 90240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:05,409-Speed 2985.09 samples/sec Loss 9.1067 LearningRate 0.0405 Epoch: 7 Global Step: 90250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:08,828-Speed 2995.50 samples/sec Loss 9.2770 LearningRate 0.0405 Epoch: 7 Global Step: 90260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:12,217-Speed 3022.78 samples/sec Loss 9.0360 LearningRate 0.0405 Epoch: 7 Global Step: 90270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:15,574-Speed 3050.94 samples/sec Loss 9.0142 LearningRate 0.0405 Epoch: 7 Global Step: 90280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:18,940-Speed 3043.66 samples/sec Loss 9.0540 LearningRate 0.0405 Epoch: 7 Global Step: 90290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:22,265-Speed 3079.78 samples/sec Loss 9.1429 LearningRate 0.0405 Epoch: 7 Global Step: 90300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:25,591-Speed 3079.57 samples/sec Loss 9.1614 LearningRate 0.0405 Epoch: 7 Global Step: 90310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:28,940-Speed 
3058.91 samples/sec Loss 9.0701 LearningRate 0.0405 Epoch: 7 Global Step: 90320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:32,315-Speed 3034.64 samples/sec Loss 9.0903 LearningRate 0.0405 Epoch: 7 Global Step: 90330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:35,799-Speed 2940.31 samples/sec Loss 9.1612 LearningRate 0.0405 Epoch: 7 Global Step: 90340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:39,223-Speed 2992.07 samples/sec Loss 9.0880 LearningRate 0.0405 Epoch: 7 Global Step: 90350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:42,671-Speed 2970.50 samples/sec Loss 9.1082 LearningRate 0.0405 Epoch: 7 Global Step: 90360 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:46,052-Speed 3029.97 samples/sec Loss 9.0228 LearningRate 0.0405 Epoch: 7 Global Step: 90370 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:49,406-Speed 3053.49 samples/sec Loss 9.1425 LearningRate 0.0405 Epoch: 7 Global Step: 90380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:52,859-Speed 2966.63 samples/sec Loss 9.0460 LearningRate 0.0405 Epoch: 7 Global Step: 90390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:15:56,222-Speed 3045.92 samples/sec Loss 9.0226 LearningRate 0.0405 Epoch: 7 Global Step: 90400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:15:59,682-Speed 2959.79 samples/sec Loss 9.0218 LearningRate 0.0405 Epoch: 7 Global Step: 90410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:03,071-Speed 3022.55 samples/sec Loss 9.1065 LearningRate 0.0405 Epoch: 7 Global Step: 90420 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:06,452-Speed 3030.11 samples/sec Loss 9.0658 LearningRate 0.0404 Epoch: 7 Global Step: 90430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:09,823-Speed 3038.60 samples/sec Loss 9.2124 LearningRate 
0.0404 Epoch: 7 Global Step: 90440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:13,221-Speed 3014.43 samples/sec Loss 9.0756 LearningRate 0.0404 Epoch: 7 Global Step: 90450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:16,617-Speed 3015.58 samples/sec Loss 9.1882 LearningRate 0.0404 Epoch: 7 Global Step: 90460 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:20,048-Speed 2985.45 samples/sec Loss 9.1134 LearningRate 0.0404 Epoch: 7 Global Step: 90470 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:23,473-Speed 2990.63 samples/sec Loss 8.9329 LearningRate 0.0404 Epoch: 7 Global Step: 90480 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:26,922-Speed 2969.90 samples/sec Loss 9.0976 LearningRate 0.0404 Epoch: 7 Global Step: 90490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:16:30,419-Speed 2929.70 samples/sec Loss 9.0460 LearningRate 0.0404 Epoch: 7 Global Step: 90500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:33,822-Speed 3009.00 samples/sec Loss 9.0012 LearningRate 0.0404 Epoch: 7 Global Step: 90510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:37,202-Speed 3030.69 samples/sec Loss 9.0834 LearningRate 0.0404 Epoch: 7 Global Step: 90520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:40,578-Speed 3034.91 samples/sec Loss 9.0174 LearningRate 0.0404 Epoch: 7 Global Step: 90530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:43,945-Speed 3042.30 samples/sec Loss 9.1547 LearningRate 0.0404 Epoch: 7 Global Step: 90540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:47,318-Speed 3036.41 samples/sec Loss 9.1736 LearningRate 0.0404 Epoch: 7 Global Step: 90550 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:50,776-Speed 2962.43 samples/sec Loss 9.2045 LearningRate 0.0404 Epoch: 7 Global Step: 90560 Fp16 Grad 
Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:54,119-Speed 3063.64 samples/sec Loss 9.0159 LearningRate 0.0404 Epoch: 7 Global Step: 90570 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:16:57,458-Speed 3067.58 samples/sec Loss 9.3973 LearningRate 0.0404 Epoch: 7 Global Step: 90580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:00,776-Speed 3086.88 samples/sec Loss 9.0767 LearningRate 0.0404 Epoch: 7 Global Step: 90590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:04,114-Speed 3068.80 samples/sec Loss 9.1124 LearningRate 0.0404 Epoch: 7 Global Step: 90600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:07,426-Speed 3092.96 samples/sec Loss 9.2056 LearningRate 0.0404 Epoch: 7 Global Step: 90610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:10,769-Speed 3064.22 samples/sec Loss 9.1909 LearningRate 0.0403 Epoch: 7 Global Step: 90620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:14,150-Speed 3029.30 samples/sec Loss 9.0000 LearningRate 0.0403 Epoch: 7 Global Step: 90630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:17,485-Speed 3071.38 samples/sec Loss 9.1793 LearningRate 0.0403 Epoch: 7 Global Step: 90640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:20,870-Speed 3025.58 samples/sec Loss 9.0551 LearningRate 0.0403 Epoch: 7 Global Step: 90650 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:24,278-Speed 3005.41 samples/sec Loss 9.0951 LearningRate 0.0403 Epoch: 7 Global Step: 90660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:27,717-Speed 2978.59 samples/sec Loss 9.0984 LearningRate 0.0403 Epoch: 7 Global Step: 90670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:31,152-Speed 2982.25 samples/sec Loss 9.1860 LearningRate 0.0403 Epoch: 7 Global Step: 90680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 
10:17:34,502-Speed 3057.55 samples/sec Loss 9.1463 LearningRate 0.0403 Epoch: 7 Global Step: 90690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:17:37,858-Speed 3051.66 samples/sec Loss 9.3023 LearningRate 0.0403 Epoch: 7 Global Step: 90700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:17:41,232-Speed 3036.69 samples/sec Loss 9.1949 LearningRate 0.0403 Epoch: 7 Global Step: 90710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:17:44,583-Speed 3055.88 samples/sec Loss 9.1094 LearningRate 0.0403 Epoch: 7 Global Step: 90720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:17:48,058-Speed 2948.03 samples/sec Loss 9.2244 LearningRate 0.0403 Epoch: 7 Global Step: 90730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:17:51,461-Speed 3009.93 samples/sec Loss 9.1936 LearningRate 0.0403 Epoch: 7 Global Step: 90740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:17:54,877-Speed 2998.59 samples/sec Loss 9.3003 LearningRate 0.0403 Epoch: 7 Global Step: 90750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:17:58,362-Speed 2939.21 samples/sec Loss 9.1378 LearningRate 0.0403 Epoch: 7 Global Step: 90760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:01,793-Speed 2984.84 samples/sec Loss 8.9902 LearningRate 0.0403 Epoch: 7 Global Step: 90770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:05,140-Speed 3060.26 samples/sec Loss 9.1417 LearningRate 0.0403 Epoch: 7 Global Step: 90780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:08,562-Speed 2993.43 samples/sec Loss 9.2030 LearningRate 0.0403 Epoch: 7 Global Step: 90790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:11,971-Speed 3004.54 samples/sec Loss 9.1216 LearningRate 0.0403 Epoch: 7 Global Step: 90800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:15,388-Speed 2998.35 samples/sec Loss 
9.1754 LearningRate 0.0403 Epoch: 7 Global Step: 90810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:18,810-Speed 2993.36 samples/sec Loss 9.1331 LearningRate 0.0402 Epoch: 7 Global Step: 90820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:22,225-Speed 2998.72 samples/sec Loss 8.9983 LearningRate 0.0402 Epoch: 7 Global Step: 90830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:25,630-Speed 3008.62 samples/sec Loss 9.2045 LearningRate 0.0402 Epoch: 7 Global Step: 90840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:29,039-Speed 3005.09 samples/sec Loss 9.1268 LearningRate 0.0402 Epoch: 7 Global Step: 90850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:18:32,459-Speed 2994.81 samples/sec Loss 9.1795 LearningRate 0.0402 Epoch: 7 Global Step: 90860 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:18:35,843-Speed 3026.65 samples/sec Loss 9.1537 LearningRate 0.0402 Epoch: 7 Global Step: 90870 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:18:39,275-Speed 2984.57 samples/sec Loss 9.1172 LearningRate 0.0402 Epoch: 7 Global Step: 90880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:42,731-Speed 2963.60 samples/sec Loss 9.1037 LearningRate 0.0402 Epoch: 7 Global Step: 90890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:46,152-Speed 2994.50 samples/sec Loss 9.1355 LearningRate 0.0402 Epoch: 7 Global Step: 90900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:49,562-Speed 3003.71 samples/sec Loss 9.1430 LearningRate 0.0402 Epoch: 7 Global Step: 90910 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:52,911-Speed 3058.78 samples/sec Loss 9.1474 LearningRate 0.0402 Epoch: 7 Global Step: 90920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:56,262-Speed 3056.29 samples/sec Loss 9.0480 LearningRate 0.0402 Epoch: 7 Global Step: 
90930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:18:59,815-Speed 2883.39 samples/sec Loss 9.2274 LearningRate 0.0402 Epoch: 7 Global Step: 90940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:19:03,192-Speed 3032.65 samples/sec Loss 9.0960 LearningRate 0.0402 Epoch: 7 Global Step: 90950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:19:06,527-Speed 3071.28 samples/sec Loss 9.1639 LearningRate 0.0402 Epoch: 7 Global Step: 90960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:19:09,865-Speed 3068.98 samples/sec Loss 9.2096 LearningRate 0.0402 Epoch: 7 Global Step: 90970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:19:13,229-Speed 3044.69 samples/sec Loss 9.1024 LearningRate 0.0402 Epoch: 7 Global Step: 90980 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:16,632-Speed 3010.59 samples/sec Loss 9.1914 LearningRate 0.0402 Epoch: 7 Global Step: 90990 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:19,977-Speed 3061.52 samples/sec Loss 9.2601 LearningRate 0.0402 Epoch: 7 Global Step: 91000 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:23,389-Speed 3002.63 samples/sec Loss 8.9977 LearningRate 0.0402 Epoch: 7 Global Step: 91010 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:26,780-Speed 3020.75 samples/sec Loss 9.1218 LearningRate 0.0401 Epoch: 7 Global Step: 91020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:30,238-Speed 2962.30 samples/sec Loss 9.1553 LearningRate 0.0401 Epoch: 7 Global Step: 91030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:33,722-Speed 2939.32 samples/sec Loss 9.2405 LearningRate 0.0401 Epoch: 7 Global Step: 91040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:37,205-Speed 2940.62 samples/sec Loss 9.1610 LearningRate 0.0401 Epoch: 7 Global Step: 91050 Fp16 Grad Scale: 131072 Required: 15 hours 
Training: 2022-04-27 10:19:40,684-Speed 2944.78 samples/sec Loss 9.1366 LearningRate 0.0401 Epoch: 7 Global Step: 91060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:44,163-Speed 2944.19 samples/sec Loss 9.2828 LearningRate 0.0401 Epoch: 7 Global Step: 91070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:47,552-Speed 3022.11 samples/sec Loss 9.1007 LearningRate 0.0401 Epoch: 7 Global Step: 91080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:51,058-Speed 2921.74 samples/sec Loss 9.1104 LearningRate 0.0401 Epoch: 7 Global Step: 91090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:54,418-Speed 3048.71 samples/sec Loss 9.0853 LearningRate 0.0401 Epoch: 7 Global Step: 91100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:19:57,784-Speed 3043.35 samples/sec Loss 9.1492 LearningRate 0.0401 Epoch: 7 Global Step: 91110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:01,291-Speed 2920.44 samples/sec Loss 9.0812 LearningRate 0.0401 Epoch: 7 Global Step: 91120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:04,647-Speed 3052.68 samples/sec Loss 9.0170 LearningRate 0.0401 Epoch: 7 Global Step: 91130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:08,119-Speed 2950.27 samples/sec Loss 9.0878 LearningRate 0.0401 Epoch: 7 Global Step: 91140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:11,506-Speed 3024.63 samples/sec Loss 9.1964 LearningRate 0.0401 Epoch: 7 Global Step: 91150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:14,833-Speed 3078.99 samples/sec Loss 9.1612 LearningRate 0.0401 Epoch: 7 Global Step: 91160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:18,156-Speed 3081.98 samples/sec Loss 9.0271 LearningRate 0.0401 Epoch: 7 Global Step: 91170 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:21,506-Speed 3058.58 
samples/sec Loss 9.1139 LearningRate 0.0401 Epoch: 7 Global Step: 91180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:24,914-Speed 3004.88 samples/sec Loss 9.1061 LearningRate 0.0401 Epoch: 7 Global Step: 91190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:28,269-Speed 3053.41 samples/sec Loss 9.1235 LearningRate 0.0401 Epoch: 7 Global Step: 91200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:31,660-Speed 3020.00 samples/sec Loss 9.0985 LearningRate 0.0400 Epoch: 7 Global Step: 91210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:35,118-Speed 2963.89 samples/sec Loss 9.1875 LearningRate 0.0400 Epoch: 7 Global Step: 91220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:38,537-Speed 2996.49 samples/sec Loss 9.1969 LearningRate 0.0400 Epoch: 7 Global Step: 91230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:41,869-Speed 3073.77 samples/sec Loss 9.0515 LearningRate 0.0400 Epoch: 7 Global Step: 91240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:45,249-Speed 3030.92 samples/sec Loss 9.0517 LearningRate 0.0400 Epoch: 7 Global Step: 91250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:48,587-Speed 3068.54 samples/sec Loss 9.2547 LearningRate 0.0400 Epoch: 7 Global Step: 91260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:51,980-Speed 3019.08 samples/sec Loss 9.0727 LearningRate 0.0400 Epoch: 7 Global Step: 91270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:55,338-Speed 3049.65 samples/sec Loss 9.1181 LearningRate 0.0400 Epoch: 7 Global Step: 91280 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:20:58,758-Speed 2995.35 samples/sec Loss 9.2304 LearningRate 0.0400 Epoch: 7 Global Step: 91290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:02,111-Speed 3054.47 samples/sec Loss 9.1333 LearningRate 0.0400 Epoch: 7 
Global Step: 91300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:05,487-Speed 3034.92 samples/sec Loss 9.1819 LearningRate 0.0400 Epoch: 7 Global Step: 91310 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:21:08,916-Speed 2987.29 samples/sec Loss 9.1355 LearningRate 0.0400 Epoch: 7 Global Step: 91320 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:21:12,295-Speed 3031.33 samples/sec Loss 9.2804 LearningRate 0.0400 Epoch: 7 Global Step: 91330 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:21:15,653-Speed 3051.37 samples/sec Loss 9.0987 LearningRate 0.0400 Epoch: 7 Global Step: 91340 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:21:19,036-Speed 3027.15 samples/sec Loss 9.2681 LearningRate 0.0400 Epoch: 7 Global Step: 91350 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:21:22,404-Speed 3041.98 samples/sec Loss 9.1404 LearningRate 0.0400 Epoch: 7 Global Step: 91360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:25,733-Speed 3076.58 samples/sec Loss 9.2019 LearningRate 0.0400 Epoch: 7 Global Step: 91370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:29,115-Speed 3028.69 samples/sec Loss 8.9826 LearningRate 0.0400 Epoch: 7 Global Step: 91380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:32,526-Speed 3003.18 samples/sec Loss 9.1828 LearningRate 0.0400 Epoch: 7 Global Step: 91390 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:35,924-Speed 3014.39 samples/sec Loss 9.1357 LearningRate 0.0400 Epoch: 7 Global Step: 91400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:39,323-Speed 3012.68 samples/sec Loss 9.2415 LearningRate 0.0399 Epoch: 7 Global Step: 91410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:42,734-Speed 3003.05 samples/sec Loss 9.0227 LearningRate 0.0399 Epoch: 7 Global Step: 91420 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-27 10:21:46,155-Speed 2994.40 samples/sec Loss 9.1433 LearningRate 0.0399 Epoch: 7 Global Step: 91430 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:49,627-Speed 2950.43 samples/sec Loss 9.1675 LearningRate 0.0399 Epoch: 7 Global Step: 91440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:53,000-Speed 3036.24 samples/sec Loss 9.1369 LearningRate 0.0399 Epoch: 7 Global Step: 91450 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:21:56,415-Speed 2999.61 samples/sec Loss 9.1907 LearningRate 0.0399 Epoch: 7 Global Step: 91460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:21:59,786-Speed 3038.99 samples/sec Loss 9.2669 LearningRate 0.0399 Epoch: 7 Global Step: 91470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:22:03,135-Speed 3058.18 samples/sec Loss 9.0855 LearningRate 0.0399 Epoch: 7 Global Step: 91480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:22:06,503-Speed 3041.57 samples/sec Loss 9.2340 LearningRate 0.0399 Epoch: 7 Global Step: 91490 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:09,856-Speed 3054.51 samples/sec Loss 9.1951 LearningRate 0.0399 Epoch: 7 Global Step: 91500 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:13,213-Speed 3051.42 samples/sec Loss 9.1888 LearningRate 0.0399 Epoch: 7 Global Step: 91510 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:16,669-Speed 2963.64 samples/sec Loss 9.1226 LearningRate 0.0399 Epoch: 7 Global Step: 91520 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:20,061-Speed 3020.11 samples/sec Loss 9.0709 LearningRate 0.0399 Epoch: 7 Global Step: 91530 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:23,500-Speed 2978.60 samples/sec Loss 9.0868 LearningRate 0.0399 Epoch: 7 Global Step: 91540 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 
10:22:26,910-Speed 3002.97 samples/sec Loss 9.1502 LearningRate 0.0399 Epoch: 7 Global Step: 91550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:30,295-Speed 3026.52 samples/sec Loss 9.2627 LearningRate 0.0399 Epoch: 7 Global Step: 91560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:33,620-Speed 3080.25 samples/sec Loss 9.0323 LearningRate 0.0399 Epoch: 7 Global Step: 91570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:22:36,934-Speed 3091.08 samples/sec Loss 9.2442 LearningRate 0.0399 Epoch: 7 Global Step: 91580 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:22:40,391-Speed 2962.72 samples/sec Loss 9.1462 LearningRate 0.0399 Epoch: 7 Global Step: 91590 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:22:43,768-Speed 3033.66 samples/sec Loss 9.2292 LearningRate 0.0399 Epoch: 7 Global Step: 91600 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:22:47,091-Speed 3081.83 samples/sec Loss 9.1483 LearningRate 0.0398 Epoch: 7 Global Step: 91610 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:22:50,523-Speed 2984.89 samples/sec Loss 9.1846 LearningRate 0.0398 Epoch: 7 Global Step: 91620 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:22:53,908-Speed 3025.91 samples/sec Loss 8.9830 LearningRate 0.0398 Epoch: 7 Global Step: 91630 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:22:57,284-Speed 3033.72 samples/sec Loss 9.0999 LearningRate 0.0398 Epoch: 7 Global Step: 91640 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:23:00,674-Speed 3021.54 samples/sec Loss 9.2521 LearningRate 0.0398 Epoch: 7 Global Step: 91650 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:23:04,101-Speed 2988.71 samples/sec Loss 9.2596 LearningRate 0.0398 Epoch: 7 Global Step: 91660 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:23:07,564-Speed 2958.26 samples/sec Loss 9.1331 
LearningRate 0.0398 Epoch: 7 Global Step: 91670 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:23:10,927-Speed 3045.29 samples/sec Loss 8.9684 LearningRate 0.0398 Epoch: 7 Global Step: 91680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:14,285-Speed 3050.41 samples/sec Loss 9.1814 LearningRate 0.0398 Epoch: 7 Global Step: 91690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:17,679-Speed 3018.61 samples/sec Loss 9.1515 LearningRate 0.0398 Epoch: 7 Global Step: 91700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:21,071-Speed 3019.95 samples/sec Loss 9.0844 LearningRate 0.0398 Epoch: 7 Global Step: 91710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:24,574-Speed 2923.93 samples/sec Loss 9.1215 LearningRate 0.0398 Epoch: 7 Global Step: 91720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:28,013-Speed 2977.68 samples/sec Loss 9.0823 LearningRate 0.0398 Epoch: 7 Global Step: 91730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:31,445-Speed 2984.70 samples/sec Loss 9.2279 LearningRate 0.0398 Epoch: 7 Global Step: 91740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:34,790-Speed 3061.96 samples/sec Loss 9.3143 LearningRate 0.0398 Epoch: 7 Global Step: 91750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:38,150-Speed 3049.90 samples/sec Loss 9.1482 LearningRate 0.0398 Epoch: 7 Global Step: 91760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:41,508-Speed 3050.39 samples/sec Loss 9.2974 LearningRate 0.0398 Epoch: 7 Global Step: 91770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:44,955-Speed 2971.92 samples/sec Loss 9.1688 LearningRate 0.0398 Epoch: 7 Global Step: 91780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:23:48,351-Speed 3015.65 samples/sec Loss 9.2636 LearningRate 0.0398 Epoch: 7 Global Step: 91790 Fp16 
Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:23:51,776-Speed 2991.43 samples/sec Loss 9.1739 LearningRate 0.0397 Epoch: 7 Global Step: 91800 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:23:55,121-Speed 3061.80 samples/sec Loss 9.1269 LearningRate 0.0397 Epoch: 7 Global Step: 91810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:23:58,458-Speed 3070.21 samples/sec Loss 9.3067 LearningRate 0.0397 Epoch: 7 Global Step: 91820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:01,885-Speed 2988.76 samples/sec Loss 9.0654 LearningRate 0.0397 Epoch: 7 Global Step: 91830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:05,331-Speed 2972.37 samples/sec Loss 9.1309 LearningRate 0.0397 Epoch: 7 Global Step: 91840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:08,660-Speed 3076.79 samples/sec Loss 9.1515 LearningRate 0.0397 Epoch: 7 Global Step: 91850 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:11,991-Speed 3075.06 samples/sec Loss 9.0268 LearningRate 0.0397 Epoch: 7 Global Step: 91860 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:15,429-Speed 2980.13 samples/sec Loss 9.1884 LearningRate 0.0397 Epoch: 7 Global Step: 91870 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:18,821-Speed 3018.84 samples/sec Loss 9.1367 LearningRate 0.0397 Epoch: 7 Global Step: 91880 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:22,217-Speed 3016.68 samples/sec Loss 9.1771 LearningRate 0.0397 Epoch: 7 Global Step: 91890 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:25,606-Speed 3022.40 samples/sec Loss 9.2061 LearningRate 0.0397 Epoch: 7 Global Step: 91900 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:28,972-Speed 3043.19 samples/sec Loss 9.1661 LearningRate 0.0397 Epoch: 7 Global Step: 91910 Fp16 Grad Scale: 131072 Required: 15 hours Training: 
2022-04-27 10:24:32,394-Speed 2993.22 samples/sec Loss 9.1425 LearningRate 0.0397 Epoch: 7 Global Step: 91920 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:35,765-Speed 3039.29 samples/sec Loss 9.0099 LearningRate 0.0397 Epoch: 7 Global Step: 91930 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:39,199-Speed 2981.88 samples/sec Loss 9.1978 LearningRate 0.0397 Epoch: 7 Global Step: 91940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:42,622-Speed 2993.07 samples/sec Loss 9.1545 LearningRate 0.0397 Epoch: 7 Global Step: 91950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:45,977-Speed 3052.88 samples/sec Loss 9.0647 LearningRate 0.0397 Epoch: 7 Global Step: 91960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:49,433-Speed 2964.05 samples/sec Loss 9.1536 LearningRate 0.0397 Epoch: 7 Global Step: 91970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:52,919-Speed 2938.30 samples/sec Loss 9.1747 LearningRate 0.0397 Epoch: 7 Global Step: 91980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:56,285-Speed 3042.94 samples/sec Loss 9.1633 LearningRate 0.0397 Epoch: 7 Global Step: 91990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:24:59,645-Speed 3048.33 samples/sec Loss 9.1516 LearningRate 0.0396 Epoch: 7 Global Step: 92000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:03,039-Speed 3018.40 samples/sec Loss 9.1454 LearningRate 0.0396 Epoch: 7 Global Step: 92010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:06,379-Speed 3067.04 samples/sec Loss 9.1392 LearningRate 0.0396 Epoch: 7 Global Step: 92020 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:25:09,730-Speed 3056.44 samples/sec Loss 9.1738 LearningRate 0.0396 Epoch: 7 Global Step: 92030 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:25:13,209-Speed 2943.69 samples/sec 
Loss 9.1685 LearningRate 0.0396 Epoch: 7 Global Step: 92040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:25:16,569-Speed 3048.81 samples/sec Loss 9.2109 LearningRate 0.0396 Epoch: 7 Global Step: 92050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:25:19,923-Speed 3054.21 samples/sec Loss 9.2740 LearningRate 0.0396 Epoch: 7 Global Step: 92060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:25:23,294-Speed 3038.23 samples/sec Loss 9.2095 LearningRate 0.0396 Epoch: 7 Global Step: 92070 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:26,630-Speed 3070.33 samples/sec Loss 9.2236 LearningRate 0.0396 Epoch: 7 Global Step: 92080 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:29,956-Speed 3079.62 samples/sec Loss 9.1149 LearningRate 0.0396 Epoch: 7 Global Step: 92090 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:33,308-Speed 3055.63 samples/sec Loss 9.1290 LearningRate 0.0396 Epoch: 7 Global Step: 92100 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:36,715-Speed 3006.93 samples/sec Loss 8.9805 LearningRate 0.0396 Epoch: 7 Global Step: 92110 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:40,072-Speed 3051.07 samples/sec Loss 9.1512 LearningRate 0.0396 Epoch: 7 Global Step: 92120 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:43,463-Speed 3020.37 samples/sec Loss 9.0936 LearningRate 0.0396 Epoch: 7 Global Step: 92130 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:46,826-Speed 3046.79 samples/sec Loss 9.0041 LearningRate 0.0396 Epoch: 7 Global Step: 92140 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:50,230-Speed 3009.00 samples/sec Loss 9.1959 LearningRate 0.0396 Epoch: 7 Global Step: 92150 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:53,612-Speed 3029.20 samples/sec Loss 9.0799 LearningRate 0.0396 Epoch: 7 Global 
Step: 92160 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:25:56,965-Speed 3054.08 samples/sec Loss 9.1777 LearningRate 0.0396 Epoch: 7 Global Step: 92170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:26:00,276-Speed 3094.00 samples/sec Loss 9.0644 LearningRate 0.0396 Epoch: 7 Global Step: 92180 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:03,638-Speed 3046.39 samples/sec Loss 9.1406 LearningRate 0.0396 Epoch: 7 Global Step: 92190 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:07,072-Speed 2982.76 samples/sec Loss 9.1789 LearningRate 0.0395 Epoch: 7 Global Step: 92200 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:10,486-Speed 3000.75 samples/sec Loss 9.0648 LearningRate 0.0395 Epoch: 7 Global Step: 92210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:13,893-Speed 3006.35 samples/sec Loss 9.1909 LearningRate 0.0395 Epoch: 7 Global Step: 92220 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:17,302-Speed 3004.41 samples/sec Loss 8.9778 LearningRate 0.0395 Epoch: 7 Global Step: 92230 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:20,693-Speed 3020.93 samples/sec Loss 9.0777 LearningRate 0.0395 Epoch: 7 Global Step: 92240 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:24,065-Speed 3037.77 samples/sec Loss 9.1611 LearningRate 0.0395 Epoch: 7 Global Step: 92250 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:27,451-Speed 3025.10 samples/sec Loss 9.1239 LearningRate 0.0395 Epoch: 7 Global Step: 92260 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:30,936-Speed 2939.07 samples/sec Loss 9.1467 LearningRate 0.0395 Epoch: 7 Global Step: 92270 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:34,315-Speed 3031.93 samples/sec Loss 8.9850 LearningRate 0.0395 Epoch: 7 Global Step: 92280 Fp16 Grad Scale: 65536 Required: 15 hours 
Training: 2022-04-27 10:26:37,685-Speed 3039.57 samples/sec Loss 9.1011 LearningRate 0.0395 Epoch: 7 Global Step: 92290 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:41,037-Speed 3055.73 samples/sec Loss 9.0974 LearningRate 0.0395 Epoch: 7 Global Step: 92300 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:44,425-Speed 3023.64 samples/sec Loss 9.0560 LearningRate 0.0395 Epoch: 7 Global Step: 92310 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:47,890-Speed 2955.84 samples/sec Loss 9.0592 LearningRate 0.0395 Epoch: 7 Global Step: 92320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:51,208-Speed 3087.34 samples/sec Loss 9.1497 LearningRate 0.0395 Epoch: 7 Global Step: 92330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:54,578-Speed 3039.92 samples/sec Loss 9.1933 LearningRate 0.0395 Epoch: 7 Global Step: 92340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:26:57,942-Speed 3044.93 samples/sec Loss 9.2674 LearningRate 0.0395 Epoch: 7 Global Step: 92350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:27:01,332-Speed 3020.90 samples/sec Loss 9.0248 LearningRate 0.0395 Epoch: 7 Global Step: 92360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:27:04,808-Speed 2947.40 samples/sec Loss 9.2145 LearningRate 0.0395 Epoch: 7 Global Step: 92370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:27:08,160-Speed 3055.68 samples/sec Loss 9.3523 LearningRate 0.0395 Epoch: 7 Global Step: 92380 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:27:11,584-Speed 2990.50 samples/sec Loss 9.2008 LearningRate 0.0394 Epoch: 7 Global Step: 92390 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:27:14,941-Speed 3052.32 samples/sec Loss 9.1394 LearningRate 0.0394 Epoch: 7 Global Step: 92400 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:27:18,266-Speed 3080.34 
samples/sec Loss 9.2517 LearningRate 0.0394 Epoch: 7 Global Step: 92410 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:27:21,624-Speed 3050.11 samples/sec Loss 9.1648 LearningRate 0.0394 Epoch: 7 Global Step: 92420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:27:24,978-Speed 3053.86 samples/sec Loss 9.1984 LearningRate 0.0394 Epoch: 7 Global Step: 92430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:27:28,291-Speed 3091.89 samples/sec Loss 9.0986 LearningRate 0.0394 Epoch: 7 Global Step: 92440 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:27:31,754-Speed 2958.16 samples/sec Loss 9.0944 LearningRate 0.0394 Epoch: 7 Global Step: 92450 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:35,061-Speed 3096.61 samples/sec Loss 9.2520 LearningRate 0.0394 Epoch: 7 Global Step: 92460 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:38,390-Speed 3077.17 samples/sec Loss 9.1555 LearningRate 0.0394 Epoch: 7 Global Step: 92470 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:41,751-Speed 3048.31 samples/sec Loss 9.2865 LearningRate 0.0394 Epoch: 7 Global Step: 92480 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:45,137-Speed 3024.92 samples/sec Loss 9.1492 LearningRate 0.0394 Epoch: 7 Global Step: 92490 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:48,517-Speed 3030.22 samples/sec Loss 9.1260 LearningRate 0.0394 Epoch: 7 Global Step: 92500 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:51,912-Speed 3016.97 samples/sec Loss 9.1048 LearningRate 0.0394 Epoch: 7 Global Step: 92510 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:55,253-Speed 3066.52 samples/sec Loss 9.0633 LearningRate 0.0394 Epoch: 7 Global Step: 92520 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:27:58,579-Speed 3079.56 samples/sec Loss 9.0958 LearningRate 0.0394 Epoch: 7 
Global Step: 92530 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:28:01,982-Speed 3009.52 samples/sec Loss 9.1255 LearningRate 0.0394 Epoch: 7 Global Step: 92540 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:28:05,461-Speed 2944.82 samples/sec Loss 9.2521 LearningRate 0.0394 Epoch: 7 Global Step: 92550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:08,881-Speed 2995.43 samples/sec Loss 9.2494 LearningRate 0.0394 Epoch: 7 Global Step: 92560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:12,279-Speed 3013.60 samples/sec Loss 9.1877 LearningRate 0.0394 Epoch: 7 Global Step: 92570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:15,623-Speed 3063.25 samples/sec Loss 9.1315 LearningRate 0.0394 Epoch: 7 Global Step: 92580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:18,978-Speed 3053.72 samples/sec Loss 9.0946 LearningRate 0.0393 Epoch: 7 Global Step: 92590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:22,351-Speed 3036.08 samples/sec Loss 9.2183 LearningRate 0.0393 Epoch: 7 Global Step: 92600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:25,810-Speed 2961.17 samples/sec Loss 9.1244 LearningRate 0.0393 Epoch: 7 Global Step: 92610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:29,160-Speed 3057.93 samples/sec Loss 9.2476 LearningRate 0.0393 Epoch: 7 Global Step: 92620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:32,543-Speed 3028.18 samples/sec Loss 9.0062 LearningRate 0.0393 Epoch: 7 Global Step: 92630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:35,900-Speed 3051.25 samples/sec Loss 9.1342 LearningRate 0.0393 Epoch: 7 Global Step: 92640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:39,255-Speed 3052.81 samples/sec Loss 9.0894 LearningRate 0.0393 Epoch: 7 Global Step: 92650 Fp16 Grad Scale: 131072 Required: 15 
hours Training: 2022-04-27 10:28:42,578-Speed 3081.96 samples/sec Loss 9.2032 LearningRate 0.0393 Epoch: 7 Global Step: 92660 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:46,016-Speed 2979.67 samples/sec Loss 9.0718 LearningRate 0.0393 Epoch: 7 Global Step: 92670 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:49,366-Speed 3057.37 samples/sec Loss 9.2426 LearningRate 0.0393 Epoch: 7 Global Step: 92680 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:52,792-Speed 2989.87 samples/sec Loss 9.0153 LearningRate 0.0393 Epoch: 7 Global Step: 92690 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:56,178-Speed 3025.38 samples/sec Loss 9.1030 LearningRate 0.0393 Epoch: 7 Global Step: 92700 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:28:59,497-Speed 3085.84 samples/sec Loss 9.1377 LearningRate 0.0393 Epoch: 7 Global Step: 92710 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:02,913-Speed 2998.75 samples/sec Loss 9.1433 LearningRate 0.0393 Epoch: 7 Global Step: 92720 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:06,333-Speed 2995.05 samples/sec Loss 9.1244 LearningRate 0.0393 Epoch: 7 Global Step: 92730 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:09,739-Speed 3007.32 samples/sec Loss 9.1748 LearningRate 0.0393 Epoch: 7 Global Step: 92740 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:13,054-Speed 3090.09 samples/sec Loss 9.1849 LearningRate 0.0393 Epoch: 7 Global Step: 92750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:16,495-Speed 2976.37 samples/sec Loss 9.2155 LearningRate 0.0393 Epoch: 7 Global Step: 92760 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:29:19,893-Speed 3014.32 samples/sec Loss 9.2942 LearningRate 0.0393 Epoch: 7 Global Step: 92770 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:29:23,242-Speed 3058.57 
samples/sec Loss 9.1171 LearningRate 0.0393 Epoch: 7 Global Step: 92780 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:29:26,690-Speed 2970.44 samples/sec Loss 9.1296 LearningRate 0.0392 Epoch: 7 Global Step: 92790 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:29:29,995-Speed 3099.13 samples/sec Loss 9.1198 LearningRate 0.0392 Epoch: 7 Global Step: 92800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:33,333-Speed 3069.01 samples/sec Loss 9.1266 LearningRate 0.0392 Epoch: 7 Global Step: 92810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:36,807-Speed 2948.84 samples/sec Loss 9.0528 LearningRate 0.0392 Epoch: 7 Global Step: 92820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:40,301-Speed 2931.41 samples/sec Loss 9.0535 LearningRate 0.0392 Epoch: 7 Global Step: 92830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:29:43,776-Speed 2947.27 samples/sec Loss 8.9605 LearningRate 0.0392 Epoch: 7 Global Step: 92840 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:29:47,222-Speed 2972.99 samples/sec Loss 9.2382 LearningRate 0.0392 Epoch: 7 Global Step: 92850 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:29:50,664-Speed 2975.33 samples/sec Loss 9.1902 LearningRate 0.0392 Epoch: 7 Global Step: 92860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:29:54,086-Speed 2993.88 samples/sec Loss 9.1115 LearningRate 0.0392 Epoch: 7 Global Step: 92870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:29:57,491-Speed 3008.18 samples/sec Loss 9.1018 LearningRate 0.0392 Epoch: 7 Global Step: 92880 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:30:00,853-Speed 3046.57 samples/sec Loss 9.1963 LearningRate 0.0392 Epoch: 7 Global Step: 92890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:30:04,310-Speed 2963.41 samples/sec Loss 9.0929 LearningRate 0.0392 Epoch: 7 
Global Step: 92900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:30:07,705-Speed 3016.53 samples/sec Loss 9.1630 LearningRate 0.0392 Epoch: 7 Global Step: 92910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:30:11,188-Speed 2940.94 samples/sec Loss 9.1327 LearningRate 0.0392 Epoch: 7 Global Step: 92920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:30:14,635-Speed 2971.49 samples/sec Loss 9.0990 LearningRate 0.0392 Epoch: 7 Global Step: 92930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:30:17,978-Speed 3064.36 samples/sec Loss 9.1975 LearningRate 0.0392 Epoch: 7 Global Step: 92940 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:21,426-Speed 2970.40 samples/sec Loss 9.0981 LearningRate 0.0392 Epoch: 7 Global Step: 92950 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:24,786-Speed 3049.06 samples/sec Loss 9.0719 LearningRate 0.0392 Epoch: 7 Global Step: 92960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:28,127-Speed 3065.31 samples/sec Loss 9.0974 LearningRate 0.0392 Epoch: 7 Global Step: 92970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:31,538-Speed 3003.31 samples/sec Loss 9.2131 LearningRate 0.0392 Epoch: 7 Global Step: 92980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:35,008-Speed 2951.96 samples/sec Loss 9.0995 LearningRate 0.0391 Epoch: 7 Global Step: 92990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:38,381-Speed 3036.63 samples/sec Loss 9.2087 LearningRate 0.0391 Epoch: 7 Global Step: 93000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:41,774-Speed 3018.39 samples/sec Loss 9.2104 LearningRate 0.0391 Epoch: 7 Global Step: 93010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:45,185-Speed 3003.36 samples/sec Loss 9.0020 LearningRate 0.0391 Epoch: 7 Global Step: 93020 Fp16 Grad Scale: 65536 Required: 15 
hours Training: 2022-04-27 10:30:48,570-Speed 3025.21 samples/sec Loss 9.0372 LearningRate 0.0391 Epoch: 7 Global Step: 93030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:30:52,043-Speed 2949.75 samples/sec Loss 9.1835 LearningRate 0.0391 Epoch: 7 Global Step: 93040 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:30:55,403-Speed 3048.51 samples/sec Loss 9.1082 LearningRate 0.0391 Epoch: 7 Global Step: 93050 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:30:58,859-Speed 2963.63 samples/sec Loss 9.0743 LearningRate 0.0391 Epoch: 7 Global Step: 93060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:31:02,191-Speed 3074.18 samples/sec Loss 9.1870 LearningRate 0.0391 Epoch: 7 Global Step: 93070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:31:05,572-Speed 3030.00 samples/sec Loss 9.1448 LearningRate 0.0391 Epoch: 7 Global Step: 93080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:31:08,926-Speed 3053.63 samples/sec Loss 9.0041 LearningRate 0.0391 Epoch: 7 Global Step: 93090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:31:12,273-Speed 3060.43 samples/sec Loss 9.2301 LearningRate 0.0391 Epoch: 7 Global Step: 93100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:31:15,633-Speed 3048.25 samples/sec Loss 9.0950 LearningRate 0.0391 Epoch: 7 Global Step: 93110 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:19,155-Speed 2907.88 samples/sec Loss 9.1434 LearningRate 0.0391 Epoch: 7 Global Step: 93120 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:22,665-Speed 2918.27 samples/sec Loss 9.1937 LearningRate 0.0391 Epoch: 7 Global Step: 93130 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:26,164-Speed 2928.12 samples/sec Loss 9.1677 LearningRate 0.0391 Epoch: 7 Global Step: 93140 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:29,657-Speed 
2931.72 samples/sec Loss 9.2375 LearningRate 0.0391 Epoch: 7 Global Step: 93150 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:33,032-Speed 3035.26 samples/sec Loss 9.1377 LearningRate 0.0391 Epoch: 7 Global Step: 93160 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:36,428-Speed 3016.30 samples/sec Loss 9.0438 LearningRate 0.0391 Epoch: 7 Global Step: 93170 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:39,864-Speed 2980.78 samples/sec Loss 9.0954 LearningRate 0.0391 Epoch: 7 Global Step: 93180 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:43,261-Speed 3014.99 samples/sec Loss 9.0984 LearningRate 0.0390 Epoch: 7 Global Step: 93190 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:46,678-Speed 2998.09 samples/sec Loss 9.0707 LearningRate 0.0390 Epoch: 7 Global Step: 93200 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:50,113-Speed 2982.13 samples/sec Loss 9.1183 LearningRate 0.0390 Epoch: 7 Global Step: 93210 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:31:53,521-Speed 3004.97 samples/sec Loss 9.2240 LearningRate 0.0390 Epoch: 7 Global Step: 93220 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:31:56,919-Speed 3014.20 samples/sec Loss 9.1270 LearningRate 0.0390 Epoch: 7 Global Step: 93230 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:00,390-Speed 2951.87 samples/sec Loss 9.0673 LearningRate 0.0390 Epoch: 7 Global Step: 93240 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:03,774-Speed 3027.08 samples/sec Loss 9.1012 LearningRate 0.0390 Epoch: 7 Global Step: 93250 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:07,233-Speed 2960.85 samples/sec Loss 8.9304 LearningRate 0.0390 Epoch: 7 Global Step: 93260 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:10,603-Speed 3039.87 samples/sec Loss 9.1549 LearningRate 0.0390 
Epoch: 7 Global Step: 93270 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:14,058-Speed 2964.37 samples/sec Loss 9.2380 LearningRate 0.0390 Epoch: 7 Global Step: 93280 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:17,479-Speed 2994.09 samples/sec Loss 9.0065 LearningRate 0.0390 Epoch: 7 Global Step: 93290 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:20,896-Speed 2997.50 samples/sec Loss 8.9620 LearningRate 0.0390 Epoch: 7 Global Step: 93300 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:24,371-Speed 2947.78 samples/sec Loss 9.1324 LearningRate 0.0390 Epoch: 7 Global Step: 93310 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:32:27,762-Speed 3020.70 samples/sec Loss 9.1725 LearningRate 0.0390 Epoch: 7 Global Step: 93320 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:31,067-Speed 3099.24 samples/sec Loss 9.1099 LearningRate 0.0390 Epoch: 7 Global Step: 93330 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:34,511-Speed 2974.59 samples/sec Loss 9.0156 LearningRate 0.0390 Epoch: 7 Global Step: 93340 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:37,858-Speed 3060.11 samples/sec Loss 9.2039 LearningRate 0.0390 Epoch: 7 Global Step: 93350 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:41,295-Speed 2979.62 samples/sec Loss 9.0942 LearningRate 0.0390 Epoch: 7 Global Step: 93360 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:44,685-Speed 3021.33 samples/sec Loss 9.0807 LearningRate 0.0390 Epoch: 7 Global Step: 93370 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:48,165-Speed 2943.41 samples/sec Loss 9.1144 LearningRate 0.0390 Epoch: 7 Global Step: 93380 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:51,549-Speed 3026.66 samples/sec Loss 9.1260 LearningRate 0.0389 Epoch: 7 Global Step: 93390 Fp16 Grad Scale: 65536 
Required: 15 hours Training: 2022-04-27 10:32:54,910-Speed 3048.19 samples/sec Loss 9.0993 LearningRate 0.0389 Epoch: 7 Global Step: 93400 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:32:58,256-Speed 3060.45 samples/sec Loss 9.1415 LearningRate 0.0389 Epoch: 7 Global Step: 93410 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:33:01,678-Speed 2993.40 samples/sec Loss 9.0866 LearningRate 0.0389 Epoch: 7 Global Step: 93420 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:05,044-Speed 3043.36 samples/sec Loss 9.0557 LearningRate 0.0389 Epoch: 7 Global Step: 93430 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:08,392-Speed 3059.04 samples/sec Loss 9.1366 LearningRate 0.0389 Epoch: 7 Global Step: 93440 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:11,813-Speed 2994.41 samples/sec Loss 9.0954 LearningRate 0.0389 Epoch: 7 Global Step: 93450 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:15,234-Speed 2994.39 samples/sec Loss 9.2174 LearningRate 0.0389 Epoch: 7 Global Step: 93460 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:18,697-Speed 2957.86 samples/sec Loss 9.0233 LearningRate 0.0389 Epoch: 7 Global Step: 93470 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:22,088-Speed 3020.04 samples/sec Loss 9.1072 LearningRate 0.0389 Epoch: 7 Global Step: 93480 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:25,525-Speed 2980.46 samples/sec Loss 9.0992 LearningRate 0.0389 Epoch: 7 Global Step: 93490 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:28,957-Speed 2985.04 samples/sec Loss 9.1474 LearningRate 0.0389 Epoch: 7 Global Step: 93500 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:32,297-Speed 3066.77 samples/sec Loss 9.1485 LearningRate 0.0389 Epoch: 7 Global Step: 93510 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 
10:33:35,712-Speed 2999.88 samples/sec Loss 9.0463 LearningRate 0.0389 Epoch: 7 Global Step: 93520 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:39,138-Speed 2989.84 samples/sec Loss 9.0660 LearningRate 0.0389 Epoch: 7 Global Step: 93530 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:42,500-Speed 3046.04 samples/sec Loss 9.0353 LearningRate 0.0389 Epoch: 7 Global Step: 93540 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:33:45,993-Speed 2932.81 samples/sec Loss 9.1739 LearningRate 0.0389 Epoch: 7 Global Step: 93550 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:33:49,414-Speed 2994.02 samples/sec Loss 9.1664 LearningRate 0.0389 Epoch: 7 Global Step: 93560 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:33:52,855-Speed 2976.73 samples/sec Loss 9.0704 LearningRate 0.0389 Epoch: 7 Global Step: 93570 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:33:56,250-Speed 3017.39 samples/sec Loss 9.0066 LearningRate 0.0389 Epoch: 7 Global Step: 93580 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:33:59,652-Speed 3010.71 samples/sec Loss 9.0633 LearningRate 0.0388 Epoch: 7 Global Step: 93590 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:03,120-Speed 2953.51 samples/sec Loss 9.0086 LearningRate 0.0388 Epoch: 7 Global Step: 93600 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:06,555-Speed 2982.38 samples/sec Loss 9.0033 LearningRate 0.0388 Epoch: 7 Global Step: 93610 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:09,977-Speed 2993.09 samples/sec Loss 9.1273 LearningRate 0.0388 Epoch: 7 Global Step: 93620 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:13,373-Speed 3016.24 samples/sec Loss 9.0647 LearningRate 0.0388 Epoch: 7 Global Step: 93630 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:16,766-Speed 3018.79 samples/sec Loss 8.9985 
LearningRate 0.0388 Epoch: 7 Global Step: 93640 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:20,135-Speed 3039.87 samples/sec Loss 9.0171 LearningRate 0.0388 Epoch: 7 Global Step: 93650 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:23,457-Speed 3084.05 samples/sec Loss 9.1409 LearningRate 0.0388 Epoch: 7 Global Step: 93660 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:26,832-Speed 3034.32 samples/sec Loss 9.1382 LearningRate 0.0388 Epoch: 7 Global Step: 93670 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:30,246-Speed 3000.67 samples/sec Loss 9.0730 LearningRate 0.0388 Epoch: 7 Global Step: 93680 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:33,582-Speed 3070.09 samples/sec Loss 9.1795 LearningRate 0.0388 Epoch: 7 Global Step: 93690 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:36,928-Speed 3060.86 samples/sec Loss 9.1822 LearningRate 0.0388 Epoch: 7 Global Step: 93700 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:40,291-Speed 3046.67 samples/sec Loss 9.1336 LearningRate 0.0388 Epoch: 7 Global Step: 93710 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:43,648-Speed 3050.91 samples/sec Loss 8.9611 LearningRate 0.0388 Epoch: 7 Global Step: 93720 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:46,997-Speed 3058.03 samples/sec Loss 9.1392 LearningRate 0.0388 Epoch: 7 Global Step: 93730 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:50,353-Speed 3052.50 samples/sec Loss 9.0880 LearningRate 0.0388 Epoch: 7 Global Step: 93740 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:34:53,776-Speed 2992.24 samples/sec Loss 9.1068 LearningRate 0.0388 Epoch: 7 Global Step: 93750 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:34:57,184-Speed 3005.47 samples/sec Loss 9.1039 LearningRate 0.0388 Epoch: 7 Global Step: 
93760 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:00,536-Speed 3056.48 samples/sec Loss 9.2326 LearningRate 0.0388 Epoch: 7 Global Step: 93770 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:03,962-Speed 2990.07 samples/sec Loss 9.1298 LearningRate 0.0387 Epoch: 7 Global Step: 93780 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:07,405-Speed 2974.46 samples/sec Loss 9.0621 LearningRate 0.0387 Epoch: 7 Global Step: 93790 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:10,834-Speed 2986.72 samples/sec Loss 9.0674 LearningRate 0.0387 Epoch: 7 Global Step: 93800 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:14,314-Speed 2944.05 samples/sec Loss 9.1100 LearningRate 0.0387 Epoch: 7 Global Step: 93810 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:17,780-Speed 2954.92 samples/sec Loss 9.0283 LearningRate 0.0387 Epoch: 7 Global Step: 93820 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:21,122-Speed 3065.42 samples/sec Loss 9.1766 LearningRate 0.0387 Epoch: 7 Global Step: 93830 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:24,569-Speed 2971.43 samples/sec Loss 9.1073 LearningRate 0.0387 Epoch: 7 Global Step: 93840 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:35:27,980-Speed 3002.40 samples/sec Loss 9.0935 LearningRate 0.0387 Epoch: 7 Global Step: 93850 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:35:31,301-Speed 3084.79 samples/sec Loss 9.0103 LearningRate 0.0387 Epoch: 7 Global Step: 93860 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:34,677-Speed 3034.00 samples/sec Loss 9.0700 LearningRate 0.0387 Epoch: 7 Global Step: 93870 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:38,062-Speed 3025.87 samples/sec Loss 9.0752 LearningRate 0.0387 Epoch: 7 Global Step: 93880 Fp16 Grad Scale: 32768 Required: 15 hours 
Training: 2022-04-27 10:35:41,405-Speed 3064.70 samples/sec Loss 9.1399 LearningRate 0.0387 Epoch: 7 Global Step: 93890 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:44,811-Speed 3007.78 samples/sec Loss 9.0355 LearningRate 0.0387 Epoch: 7 Global Step: 93900 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:48,206-Speed 3016.61 samples/sec Loss 9.1584 LearningRate 0.0387 Epoch: 7 Global Step: 93910 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:51,602-Speed 3016.73 samples/sec Loss 8.9869 LearningRate 0.0387 Epoch: 7 Global Step: 93920 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:54,970-Speed 3040.98 samples/sec Loss 9.1106 LearningRate 0.0387 Epoch: 7 Global Step: 93930 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:35:58,349-Speed 3031.80 samples/sec Loss 9.1174 LearningRate 0.0387 Epoch: 7 Global Step: 93940 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:36:01,792-Speed 2975.05 samples/sec Loss 9.1257 LearningRate 0.0387 Epoch: 7 Global Step: 93950 Fp16 Grad Scale: 32768 Required: 15 hours Training: 2022-04-27 10:36:05,175-Speed 3027.92 samples/sec Loss 9.0662 LearningRate 0.0387 Epoch: 7 Global Step: 93960 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:08,643-Speed 2954.14 samples/sec Loss 9.1535 LearningRate 0.0387 Epoch: 7 Global Step: 93970 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:12,104-Speed 2959.17 samples/sec Loss 8.9671 LearningRate 0.0386 Epoch: 7 Global Step: 93980 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:15,492-Speed 3023.48 samples/sec Loss 9.0323 LearningRate 0.0386 Epoch: 7 Global Step: 93990 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:18,927-Speed 2981.30 samples/sec Loss 9.0353 LearningRate 0.0386 Epoch: 7 Global Step: 94000 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:22,287-Speed 3048.44 
samples/sec Loss 9.0450 LearningRate 0.0386 Epoch: 7 Global Step: 94010 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:25,673-Speed 3025.28 samples/sec Loss 8.9127 LearningRate 0.0386 Epoch: 7 Global Step: 94020 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:29,098-Speed 2990.75 samples/sec Loss 9.0529 LearningRate 0.0386 Epoch: 7 Global Step: 94030 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:32,555-Speed 2962.67 samples/sec Loss 9.1057 LearningRate 0.0386 Epoch: 7 Global Step: 94040 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:35,988-Speed 2983.91 samples/sec Loss 8.9104 LearningRate 0.0386 Epoch: 7 Global Step: 94050 Fp16 Grad Scale: 65536 Required: 15 hours Training: 2022-04-27 10:36:39,439-Speed 2968.19 samples/sec Loss 9.1238 LearningRate 0.0386 Epoch: 7 Global Step: 94060 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:36:42,922-Speed 2940.81 samples/sec Loss 9.1491 LearningRate 0.0386 Epoch: 7 Global Step: 94070 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:36:46,371-Speed 2969.50 samples/sec Loss 9.1184 LearningRate 0.0386 Epoch: 7 Global Step: 94080 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:36:49,864-Speed 2932.16 samples/sec Loss 9.1662 LearningRate 0.0386 Epoch: 7 Global Step: 94090 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:36:53,241-Speed 3034.01 samples/sec Loss 9.1225 LearningRate 0.0386 Epoch: 7 Global Step: 94100 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:36:56,713-Speed 2949.88 samples/sec Loss 8.9696 LearningRate 0.0386 Epoch: 7 Global Step: 94110 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:00,132-Speed 2996.35 samples/sec Loss 9.0880 LearningRate 0.0386 Epoch: 7 Global Step: 94120 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:03,470-Speed 3068.51 samples/sec Loss 9.0866 LearningRate 0.0386 
Epoch: 7 Global Step: 94130 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:06,909-Speed 2978.67 samples/sec Loss 9.1557 LearningRate 0.0386 Epoch: 7 Global Step: 94140 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:10,315-Speed 3007.21 samples/sec Loss 9.1439 LearningRate 0.0386 Epoch: 7 Global Step: 94150 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:13,781-Speed 2955.24 samples/sec Loss 9.0435 LearningRate 0.0386 Epoch: 7 Global Step: 94160 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:17,157-Speed 3034.20 samples/sec Loss 9.1770 LearningRate 0.0386 Epoch: 7 Global Step: 94170 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:20,532-Speed 3035.04 samples/sec Loss 9.0096 LearningRate 0.0385 Epoch: 7 Global Step: 94180 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:24,028-Speed 2929.77 samples/sec Loss 9.1201 LearningRate 0.0385 Epoch: 7 Global Step: 94190 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:27,439-Speed 3005.84 samples/sec Loss 9.0338 LearningRate 0.0385 Epoch: 7 Global Step: 94200 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:30,819-Speed 3031.33 samples/sec Loss 9.1763 LearningRate 0.0385 Epoch: 7 Global Step: 94210 Fp16 Grad Scale: 131072 Required: 15 hours Training: 2022-04-27 10:37:34,251-Speed 2984.19 samples/sec Loss 9.1406 LearningRate 0.0385 Epoch: 7 Global Step: 94220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:37:37,673-Speed 2993.76 samples/sec Loss 9.1738 LearningRate 0.0385 Epoch: 7 Global Step: 94230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:37:41,038-Speed 3043.38 samples/sec Loss 8.9329 LearningRate 0.0385 Epoch: 7 Global Step: 94240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:37:44,545-Speed 2920.57 samples/sec Loss 9.0980 LearningRate 0.0385 Epoch: 7 Global Step: 94250 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-27 10:37:48,018-Speed 2949.99 samples/sec Loss 9.0263 LearningRate 0.0385 Epoch: 7 Global Step: 94260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:37:51,417-Speed 3012.79 samples/sec Loss 9.1357 LearningRate 0.0385 Epoch: 7 Global Step: 94270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:37:54,831-Speed 2999.84 samples/sec Loss 9.1105 LearningRate 0.0385 Epoch: 7 Global Step: 94280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:37:58,218-Speed 3025.24 samples/sec Loss 9.0531 LearningRate 0.0385 Epoch: 7 Global Step: 94290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:01,682-Speed 2957.01 samples/sec Loss 9.1847 LearningRate 0.0385 Epoch: 7 Global Step: 94300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:05,111-Speed 2986.56 samples/sec Loss 9.0214 LearningRate 0.0385 Epoch: 7 Global Step: 94310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:08,547-Speed 2981.11 samples/sec Loss 8.9461 LearningRate 0.0385 Epoch: 7 Global Step: 94320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:11,897-Speed 3057.76 samples/sec Loss 9.0130 LearningRate 0.0385 Epoch: 7 Global Step: 94330 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:38:15,283-Speed 3024.76 samples/sec Loss 8.9391 LearningRate 0.0385 Epoch: 7 Global Step: 94340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:38:18,682-Speed 3013.59 samples/sec Loss 9.0081 LearningRate 0.0385 Epoch: 7 Global Step: 94350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:38:22,045-Speed 3046.33 samples/sec Loss 9.0310 LearningRate 0.0385 Epoch: 7 Global Step: 94360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:25,436-Speed 3020.15 samples/sec Loss 9.0724 LearningRate 0.0385 Epoch: 7 Global Step: 94370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
10:38:28,847-Speed 3003.24 samples/sec Loss 9.1009 LearningRate 0.0384 Epoch: 7 Global Step: 94380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:32,264-Speed 2997.59 samples/sec Loss 9.1198 LearningRate 0.0384 Epoch: 7 Global Step: 94390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:35,670-Speed 3007.62 samples/sec Loss 9.0412 LearningRate 0.0384 Epoch: 7 Global Step: 94400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:39,041-Speed 3038.22 samples/sec Loss 9.0532 LearningRate 0.0384 Epoch: 7 Global Step: 94410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:42,357-Speed 3089.59 samples/sec Loss 9.0688 LearningRate 0.0384 Epoch: 7 Global Step: 94420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:45,827-Speed 2951.34 samples/sec Loss 8.9770 LearningRate 0.0384 Epoch: 7 Global Step: 94430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:49,220-Speed 3018.78 samples/sec Loss 9.0130 LearningRate 0.0384 Epoch: 7 Global Step: 94440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:52,635-Speed 2999.53 samples/sec Loss 9.0987 LearningRate 0.0384 Epoch: 7 Global Step: 94450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:38:56,008-Speed 3037.10 samples/sec Loss 9.1258 LearningRate 0.0384 Epoch: 7 Global Step: 94460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:38:59,449-Speed 2976.54 samples/sec Loss 9.1073 LearningRate 0.0384 Epoch: 7 Global Step: 94470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:39:02,804-Speed 3053.65 samples/sec Loss 9.0276 LearningRate 0.0384 Epoch: 7 Global Step: 94480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:39:06,148-Speed 3063.17 samples/sec Loss 9.1194 LearningRate 0.0384 Epoch: 7 Global Step: 94490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:09,667-Speed 2910.56 samples/sec Loss 9.0589 
LearningRate 0.0384 Epoch: 7 Global Step: 94500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:13,000-Speed 3072.90 samples/sec Loss 9.0308 LearningRate 0.0384 Epoch: 7 Global Step: 94510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:16,377-Speed 3033.40 samples/sec Loss 8.9862 LearningRate 0.0384 Epoch: 7 Global Step: 94520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:19,749-Speed 3037.63 samples/sec Loss 9.0547 LearningRate 0.0384 Epoch: 7 Global Step: 94530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:23,130-Speed 3028.91 samples/sec Loss 9.1080 LearningRate 0.0384 Epoch: 7 Global Step: 94540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:26,612-Speed 2941.85 samples/sec Loss 8.9770 LearningRate 0.0384 Epoch: 7 Global Step: 94550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:30,070-Speed 2962.37 samples/sec Loss 9.0986 LearningRate 0.0384 Epoch: 7 Global Step: 94560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:33,421-Speed 3056.24 samples/sec Loss 8.9825 LearningRate 0.0384 Epoch: 7 Global Step: 94570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:36,841-Speed 2995.43 samples/sec Loss 9.0588 LearningRate 0.0384 Epoch: 7 Global Step: 94580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:40,275-Speed 2983.15 samples/sec Loss 9.0232 LearningRate 0.0383 Epoch: 7 Global Step: 94590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:39:43,741-Speed 2955.62 samples/sec Loss 9.0064 LearningRate 0.0383 Epoch: 7 Global Step: 94600 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:39:47,120-Speed 3031.41 samples/sec Loss 8.9790 LearningRate 0.0383 Epoch: 7 Global Step: 94610 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:39:50,558-Speed 2979.06 samples/sec Loss 8.9276 LearningRate 0.0383 Epoch: 7 Global Step: 94620 Fp16 
Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:39:53,986-Speed 2987.38 samples/sec Loss 9.0510 LearningRate 0.0383 Epoch: 7 Global Step: 94630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:39:57,390-Speed 3009.83 samples/sec Loss 9.0849 LearningRate 0.0383 Epoch: 7 Global Step: 94640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:00,854-Speed 2956.91 samples/sec Loss 9.1913 LearningRate 0.0383 Epoch: 7 Global Step: 94650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:04,248-Speed 3017.90 samples/sec Loss 9.1780 LearningRate 0.0383 Epoch: 7 Global Step: 94660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:07,745-Speed 2929.36 samples/sec Loss 9.1287 LearningRate 0.0383 Epoch: 7 Global Step: 94670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:11,167-Speed 2992.78 samples/sec Loss 8.8825 LearningRate 0.0383 Epoch: 7 Global Step: 94680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:14,565-Speed 3014.66 samples/sec Loss 9.1767 LearningRate 0.0383 Epoch: 7 Global Step: 94690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:18,046-Speed 2942.84 samples/sec Loss 9.0083 LearningRate 0.0383 Epoch: 7 Global Step: 94700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:21,465-Speed 2995.96 samples/sec Loss 9.1165 LearningRate 0.0383 Epoch: 7 Global Step: 94710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:24,891-Speed 2989.40 samples/sec Loss 9.1286 LearningRate 0.0383 Epoch: 7 Global Step: 94720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:40:28,217-Speed 3081.64 samples/sec Loss 8.8765 LearningRate 0.0383 Epoch: 7 Global Step: 94730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:31,537-Speed 3085.29 samples/sec Loss 9.0445 LearningRate 0.0383 Epoch: 7 Global Step: 94740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 10:40:34,882-Speed 3062.13 samples/sec Loss 8.9979 LearningRate 0.0383 Epoch: 7 Global Step: 94750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:38,391-Speed 2919.23 samples/sec Loss 9.0918 LearningRate 0.0383 Epoch: 7 Global Step: 94760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:41,751-Speed 3048.16 samples/sec Loss 9.0605 LearningRate 0.0383 Epoch: 7 Global Step: 94770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:45,154-Speed 3010.46 samples/sec Loss 9.1920 LearningRate 0.0383 Epoch: 7 Global Step: 94780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:48,523-Speed 3040.40 samples/sec Loss 9.2203 LearningRate 0.0382 Epoch: 7 Global Step: 94790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:51,949-Speed 2989.15 samples/sec Loss 9.0910 LearningRate 0.0382 Epoch: 7 Global Step: 94800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:55,397-Speed 2970.35 samples/sec Loss 8.9893 LearningRate 0.0382 Epoch: 7 Global Step: 94810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:40:58,881-Speed 2940.82 samples/sec Loss 8.8941 LearningRate 0.0382 Epoch: 7 Global Step: 94820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:02,276-Speed 3016.56 samples/sec Loss 8.9007 LearningRate 0.0382 Epoch: 7 Global Step: 94830 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:41:05,771-Speed 2930.69 samples/sec Loss 8.9639 LearningRate 0.0382 Epoch: 7 Global Step: 94840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:41:09,168-Speed 3015.33 samples/sec Loss 9.0463 LearningRate 0.0382 Epoch: 7 Global Step: 94850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:41:12,677-Speed 2918.95 samples/sec Loss 9.0711 LearningRate 0.0382 Epoch: 7 Global Step: 94860 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:41:16,082-Speed 3008.63 samples/sec 
Loss 9.1328 LearningRate 0.0382 Epoch: 7 Global Step: 94870 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:41:19,465-Speed 3027.68 samples/sec Loss 8.9984 LearningRate 0.0382 Epoch: 7 Global Step: 94880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:22,915-Speed 2968.96 samples/sec Loss 9.0730 LearningRate 0.0382 Epoch: 7 Global Step: 94890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:26,329-Speed 3000.04 samples/sec Loss 9.1526 LearningRate 0.0382 Epoch: 7 Global Step: 94900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:29,810-Speed 2943.14 samples/sec Loss 9.0606 LearningRate 0.0382 Epoch: 7 Global Step: 94910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:33,216-Speed 3006.74 samples/sec Loss 9.1043 LearningRate 0.0382 Epoch: 7 Global Step: 94920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:36,714-Speed 2928.58 samples/sec Loss 9.0251 LearningRate 0.0382 Epoch: 7 Global Step: 94930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:40,112-Speed 3014.61 samples/sec Loss 8.8790 LearningRate 0.0382 Epoch: 7 Global Step: 94940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:43,486-Speed 3035.25 samples/sec Loss 9.0532 LearningRate 0.0382 Epoch: 7 Global Step: 94950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:46,870-Speed 3026.83 samples/sec Loss 9.0556 LearningRate 0.0382 Epoch: 7 Global Step: 94960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:50,293-Speed 2992.22 samples/sec Loss 9.0258 LearningRate 0.0382 Epoch: 7 Global Step: 94970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:41:53,727-Speed 2982.49 samples/sec Loss 9.0169 LearningRate 0.0382 Epoch: 7 Global Step: 94980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:41:57,211-Speed 2940.41 samples/sec Loss 9.1682 LearningRate 0.0381 Epoch: 7 Global Step: 
94990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:42:00,688-Speed 2946.13 samples/sec Loss 8.9097 LearningRate 0.0381 Epoch: 7 Global Step: 95000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:42:04,178-Speed 2934.37 samples/sec Loss 9.0211 LearningRate 0.0381 Epoch: 7 Global Step: 95010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:42:07,530-Speed 3056.24 samples/sec Loss 8.9912 LearningRate 0.0381 Epoch: 7 Global Step: 95020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:10,898-Speed 3040.92 samples/sec Loss 8.9602 LearningRate 0.0381 Epoch: 7 Global Step: 95030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:14,253-Speed 3053.67 samples/sec Loss 9.1303 LearningRate 0.0381 Epoch: 7 Global Step: 95040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:17,671-Speed 2996.43 samples/sec Loss 9.0646 LearningRate 0.0381 Epoch: 7 Global Step: 95050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:21,032-Speed 3047.62 samples/sec Loss 8.9500 LearningRate 0.0381 Epoch: 7 Global Step: 95060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:24,531-Speed 2927.36 samples/sec Loss 9.1213 LearningRate 0.0381 Epoch: 7 Global Step: 95070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:27,903-Speed 3037.53 samples/sec Loss 8.9328 LearningRate 0.0381 Epoch: 7 Global Step: 95080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:31,257-Speed 3053.92 samples/sec Loss 8.9522 LearningRate 0.0381 Epoch: 7 Global Step: 95090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:34,627-Speed 3039.37 samples/sec Loss 8.9133 LearningRate 0.0381 Epoch: 7 Global Step: 95100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:38,018-Speed 3020.52 samples/sec Loss 8.9986 LearningRate 0.0381 Epoch: 7 Global Step: 95110 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-27 10:42:41,433-Speed 2999.49 samples/sec Loss 9.0202 LearningRate 0.0381 Epoch: 7 Global Step: 95120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:44,763-Speed 3076.57 samples/sec Loss 8.9807 LearningRate 0.0381 Epoch: 7 Global Step: 95130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:48,121-Speed 3049.86 samples/sec Loss 9.0497 LearningRate 0.0381 Epoch: 7 Global Step: 95140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:51,457-Speed 3070.25 samples/sec Loss 8.9372 LearningRate 0.0381 Epoch: 7 Global Step: 95150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:54,892-Speed 2981.92 samples/sec Loss 9.0479 LearningRate 0.0381 Epoch: 7 Global Step: 95160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:42:58,253-Speed 3047.30 samples/sec Loss 9.0341 LearningRate 0.0381 Epoch: 7 Global Step: 95170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:43:01,697-Speed 2973.77 samples/sec Loss 9.1452 LearningRate 0.0381 Epoch: 7 Global Step: 95180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:43:05,034-Speed 3069.93 samples/sec Loss 9.0402 LearningRate 0.0380 Epoch: 7 Global Step: 95190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:43:08,360-Speed 3079.87 samples/sec Loss 8.9677 LearningRate 0.0380 Epoch: 7 Global Step: 95200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:43:11,705-Speed 3061.51 samples/sec Loss 8.9601 LearningRate 0.0380 Epoch: 7 Global Step: 95210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:43:15,052-Speed 3060.44 samples/sec Loss 9.1013 LearningRate 0.0380 Epoch: 7 Global Step: 95220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:43:18,433-Speed 3029.88 samples/sec Loss 9.0844 LearningRate 0.0380 Epoch: 7 Global Step: 95230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:43:21,775-Speed 3064.76 
samples/sec Loss 9.0727 LearningRate 0.0380 Epoch: 7 Global Step: 95240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:43:25,087-Speed 3092.10 samples/sec Loss 8.9377 LearningRate 0.0380 Epoch: 7 Global Step: 95250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:43:28,410-Speed 3083.14 samples/sec Loss 9.0534 LearningRate 0.0380 Epoch: 7 Global Step: 95260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:43:31,797-Speed 3023.41 samples/sec Loss 8.8795 LearningRate 0.0380 Epoch: 7 Global Step: 95270 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:35,200-Speed 3010.35 samples/sec Loss 9.0790 LearningRate 0.0380 Epoch: 7 Global Step: 95280 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:38,626-Speed 2989.14 samples/sec Loss 9.0168 LearningRate 0.0380 Epoch: 7 Global Step: 95290 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:42,081-Speed 2965.00 samples/sec Loss 8.9134 LearningRate 0.0380 Epoch: 7 Global Step: 95300 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:45,447-Speed 3042.98 samples/sec Loss 8.9751 LearningRate 0.0380 Epoch: 7 Global Step: 95310 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:48,922-Speed 2947.49 samples/sec Loss 9.0108 LearningRate 0.0380 Epoch: 7 Global Step: 95320 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:52,356-Speed 2982.81 samples/sec Loss 9.0234 LearningRate 0.0380 Epoch: 7 Global Step: 95330 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:55,732-Speed 3033.72 samples/sec Loss 8.9598 LearningRate 0.0380 Epoch: 7 Global Step: 95340 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:43:59,057-Speed 3080.07 samples/sec Loss 9.0239 LearningRate 0.0380 Epoch: 7 Global Step: 95350 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:44:02,466-Speed 3004.78 samples/sec Loss 9.0595 LearningRate 0.0380 Epoch: 7 
Global Step: 95360 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 10:44:05,890-Speed 2991.55 samples/sec Loss 8.9803 LearningRate 0.0380 Epoch: 7 Global Step: 95370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:09,235-Speed 3061.67 samples/sec Loss 9.0005 LearningRate 0.0380 Epoch: 7 Global Step: 95380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:12,697-Speed 2958.80 samples/sec Loss 9.0770 LearningRate 0.0379 Epoch: 7 Global Step: 95390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:16,042-Speed 3062.41 samples/sec Loss 9.0644 LearningRate 0.0379 Epoch: 7 Global Step: 95400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:19,468-Speed 2989.50 samples/sec Loss 9.0899 LearningRate 0.0379 Epoch: 7 Global Step: 95410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:22,869-Speed 3011.30 samples/sec Loss 9.0234 LearningRate 0.0379 Epoch: 7 Global Step: 95420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:26,212-Speed 3063.92 samples/sec Loss 8.9058 LearningRate 0.0379 Epoch: 7 Global Step: 95430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:29,592-Speed 3030.37 samples/sec Loss 8.9857 LearningRate 0.0379 Epoch: 7 Global Step: 95440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:33,049-Speed 2963.12 samples/sec Loss 8.9548 LearningRate 0.0379 Epoch: 7 Global Step: 95450 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:36,504-Speed 2964.60 samples/sec Loss 9.0498 LearningRate 0.0379 Epoch: 7 Global Step: 95460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:44:39,863-Speed 3049.52 samples/sec Loss 9.1856 LearningRate 0.0379 Epoch: 7 Global Step: 95470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:44:43,290-Speed 2988.70 samples/sec Loss 8.9760 LearningRate 0.0379 Epoch: 7 Global Step: 95480 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-27 10:44:46,692-Speed 3011.14 samples/sec Loss 9.1386 LearningRate 0.0379 Epoch: 7 Global Step: 95490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:44:50,144-Speed 2966.81 samples/sec Loss 8.9290 LearningRate 0.0379 Epoch: 7 Global Step: 95500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:44:53,569-Speed 2991.25 samples/sec Loss 9.0613 LearningRate 0.0379 Epoch: 7 Global Step: 95510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:44:56,941-Speed 3037.34 samples/sec Loss 9.0229 LearningRate 0.0379 Epoch: 7 Global Step: 95520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:00,352-Speed 3003.25 samples/sec Loss 9.0167 LearningRate 0.0379 Epoch: 7 Global Step: 95530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:03,803-Speed 2967.99 samples/sec Loss 8.9183 LearningRate 0.0379 Epoch: 7 Global Step: 95540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:07,305-Speed 2924.66 samples/sec Loss 8.9986 LearningRate 0.0379 Epoch: 7 Global Step: 95550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:10,747-Speed 2976.06 samples/sec Loss 8.9932 LearningRate 0.0379 Epoch: 7 Global Step: 95560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:14,107-Speed 3048.64 samples/sec Loss 9.0863 LearningRate 0.0379 Epoch: 7 Global Step: 95570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:17,409-Speed 3102.69 samples/sec Loss 9.1024 LearningRate 0.0379 Epoch: 7 Global Step: 95580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:20,864-Speed 2965.36 samples/sec Loss 8.9847 LearningRate 0.0378 Epoch: 7 Global Step: 95590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:24,282-Speed 2996.31 samples/sec Loss 9.0416 LearningRate 0.0378 Epoch: 7 Global Step: 95600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:27,642-Speed 3048.05 
samples/sec Loss 8.8679 LearningRate 0.0378 Epoch: 7 Global Step: 95610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:31,114-Speed 2950.41 samples/sec Loss 8.9841 LearningRate 0.0378 Epoch: 7 Global Step: 95620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:34,532-Speed 2996.56 samples/sec Loss 8.9684 LearningRate 0.0378 Epoch: 7 Global Step: 95630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:37,934-Speed 3010.96 samples/sec Loss 8.9954 LearningRate 0.0378 Epoch: 7 Global Step: 95640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:41,389-Speed 2964.43 samples/sec Loss 8.9827 LearningRate 0.0378 Epoch: 7 Global Step: 95650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:44,807-Speed 2996.39 samples/sec Loss 8.9527 LearningRate 0.0378 Epoch: 7 Global Step: 95660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:48,186-Speed 3031.70 samples/sec Loss 8.9224 LearningRate 0.0378 Epoch: 7 Global Step: 95670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:45:51,694-Speed 2920.05 samples/sec Loss 8.9731 LearningRate 0.0378 Epoch: 7 Global Step: 95680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:55,078-Speed 3026.12 samples/sec Loss 9.0709 LearningRate 0.0378 Epoch: 7 Global Step: 95690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:45:58,555-Speed 2945.86 samples/sec Loss 8.8825 LearningRate 0.0378 Epoch: 7 Global Step: 95700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:01,931-Speed 3034.74 samples/sec Loss 8.9375 LearningRate 0.0378 Epoch: 7 Global Step: 95710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:05,305-Speed 3035.61 samples/sec Loss 8.8990 LearningRate 0.0378 Epoch: 7 Global Step: 95720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:08,664-Speed 3049.01 samples/sec Loss 8.9831 LearningRate 0.0378 Epoch: 7 
Global Step: 95730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:12,049-Speed 3026.15 samples/sec Loss 9.0505 LearningRate 0.0378 Epoch: 7 Global Step: 95740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:15,498-Speed 2969.72 samples/sec Loss 9.0207 LearningRate 0.0378 Epoch: 7 Global Step: 95750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:18,951-Speed 2966.35 samples/sec Loss 9.0116 LearningRate 0.0378 Epoch: 7 Global Step: 95760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:22,287-Speed 3070.07 samples/sec Loss 8.8965 LearningRate 0.0378 Epoch: 7 Global Step: 95770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:25,701-Speed 3000.02 samples/sec Loss 8.9117 LearningRate 0.0378 Epoch: 7 Global Step: 95780 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:46:29,163-Speed 2958.85 samples/sec Loss 9.0363 LearningRate 0.0377 Epoch: 7 Global Step: 95790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:32,609-Speed 2972.67 samples/sec Loss 8.9679 LearningRate 0.0377 Epoch: 7 Global Step: 95800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:36,061-Speed 2966.92 samples/sec Loss 9.0483 LearningRate 0.0377 Epoch: 7 Global Step: 95810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:46:39,536-Speed 2948.14 samples/sec Loss 8.9975 LearningRate 0.0377 Epoch: 7 Global Step: 95820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:46:42,868-Speed 3074.38 samples/sec Loss 8.9927 LearningRate 0.0377 Epoch: 7 Global Step: 95830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:46:46,270-Speed 3010.45 samples/sec Loss 8.8690 LearningRate 0.0377 Epoch: 7 Global Step: 95840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:46:49,611-Speed 3066.28 samples/sec Loss 8.7889 LearningRate 0.0377 Epoch: 7 Global Step: 95850 Fp16 Grad Scale: 32768 Required: 14 
hours Training: 2022-04-27 10:46:52,953-Speed 3064.74 samples/sec Loss 8.9345 LearningRate 0.0377 Epoch: 7 Global Step: 95860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:46:56,334-Speed 3029.09 samples/sec Loss 8.8863 LearningRate 0.0377 Epoch: 7 Global Step: 95870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:46:59,763-Speed 2987.64 samples/sec Loss 8.8639 LearningRate 0.0377 Epoch: 7 Global Step: 95880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:47:03,178-Speed 2999.03 samples/sec Loss 8.9642 LearningRate 0.0377 Epoch: 7 Global Step: 95890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:47:06,514-Speed 3070.43 samples/sec Loss 9.0393 LearningRate 0.0377 Epoch: 7 Global Step: 95900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:47:09,912-Speed 3013.96 samples/sec Loss 8.9411 LearningRate 0.0377 Epoch: 7 Global Step: 95910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:47:13,371-Speed 2961.71 samples/sec Loss 8.8616 LearningRate 0.0377 Epoch: 7 Global Step: 95920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:16,733-Speed 3046.47 samples/sec Loss 9.0077 LearningRate 0.0377 Epoch: 7 Global Step: 95930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:20,098-Speed 3043.54 samples/sec Loss 8.9670 LearningRate 0.0377 Epoch: 7 Global Step: 95940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:23,426-Speed 3078.12 samples/sec Loss 9.0444 LearningRate 0.0377 Epoch: 7 Global Step: 95950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:26,758-Speed 3074.06 samples/sec Loss 9.0666 LearningRate 0.0377 Epoch: 7 Global Step: 95960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:30,126-Speed 3040.99 samples/sec Loss 9.0383 LearningRate 0.0377 Epoch: 7 Global Step: 95970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:33,477-Speed 3056.73 
samples/sec Loss 8.9632 LearningRate 0.0377 Epoch: 7 Global Step: 95980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:36,817-Speed 3066.47 samples/sec Loss 8.7649 LearningRate 0.0377 Epoch: 7 Global Step: 95990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:40,145-Speed 3078.19 samples/sec Loss 9.0547 LearningRate 0.0376 Epoch: 7 Global Step: 96000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:43,618-Speed 2949.05 samples/sec Loss 8.9745 LearningRate 0.0376 Epoch: 7 Global Step: 96010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:47,041-Speed 2991.99 samples/sec Loss 8.9052 LearningRate 0.0376 Epoch: 7 Global Step: 96020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:47:50,382-Speed 3066.29 samples/sec Loss 8.9863 LearningRate 0.0376 Epoch: 7 Global Step: 96030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:53,734-Speed 3055.93 samples/sec Loss 9.0658 LearningRate 0.0376 Epoch: 7 Global Step: 96040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:47:57,088-Speed 3053.36 samples/sec Loss 9.0012 LearningRate 0.0376 Epoch: 7 Global Step: 96050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:00,435-Speed 3060.61 samples/sec Loss 8.8561 LearningRate 0.0376 Epoch: 7 Global Step: 96060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:03,797-Speed 3046.67 samples/sec Loss 8.9812 LearningRate 0.0376 Epoch: 7 Global Step: 96070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:07,151-Speed 3053.72 samples/sec Loss 8.8853 LearningRate 0.0376 Epoch: 7 Global Step: 96080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:10,494-Speed 3064.01 samples/sec Loss 8.9358 LearningRate 0.0376 Epoch: 7 Global Step: 96090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:13,795-Speed 3103.68 samples/sec Loss 8.9579 LearningRate 0.0376 Epoch: 7 
Global Step: 96100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:17,221-Speed 2989.22 samples/sec Loss 8.9575 LearningRate 0.0376 Epoch: 7 Global Step: 96110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:20,627-Speed 3008.24 samples/sec Loss 8.9976 LearningRate 0.0376 Epoch: 7 Global Step: 96120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:24,057-Speed 2986.22 samples/sec Loss 8.8650 LearningRate 0.0376 Epoch: 7 Global Step: 96130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:48:27,516-Speed 2960.48 samples/sec Loss 8.7413 LearningRate 0.0376 Epoch: 7 Global Step: 96140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:48:30,904-Speed 3023.29 samples/sec Loss 8.8684 LearningRate 0.0376 Epoch: 7 Global Step: 96150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:48:34,355-Speed 2968.93 samples/sec Loss 8.9019 LearningRate 0.0376 Epoch: 7 Global Step: 96160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:48:37,729-Speed 3035.26 samples/sec Loss 8.9043 LearningRate 0.0376 Epoch: 7 Global Step: 96170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:48:41,073-Speed 3063.84 samples/sec Loss 8.9802 LearningRate 0.0376 Epoch: 7 Global Step: 96180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:48:44,429-Speed 3051.50 samples/sec Loss 8.8697 LearningRate 0.0376 Epoch: 7 Global Step: 96190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:48:47,755-Speed 3079.60 samples/sec Loss 8.9479 LearningRate 0.0375 Epoch: 7 Global Step: 96200 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:48:51,164-Speed 3004.80 samples/sec Loss 8.9505 LearningRate 0.0375 Epoch: 7 Global Step: 96210 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:48:54,502-Speed 3069.16 samples/sec Loss 8.9376 LearningRate 0.0375 Epoch: 7 Global Step: 96220 Fp16 Grad Scale: 32768 Required: 
14 hours Training: 2022-04-27 10:48:57,910-Speed 3005.37 samples/sec Loss 8.9089 LearningRate 0.0375 Epoch: 7 Global Step: 96230 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:49:01,312-Speed 3010.95 samples/sec Loss 8.9416 LearningRate 0.0375 Epoch: 7 Global Step: 96240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:49:04,704-Speed 3019.89 samples/sec Loss 9.0191 LearningRate 0.0375 Epoch: 7 Global Step: 96250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:49:08,113-Speed 3004.70 samples/sec Loss 8.8243 LearningRate 0.0375 Epoch: 7 Global Step: 96260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:49:11,516-Speed 3010.49 samples/sec Loss 8.9309 LearningRate 0.0375 Epoch: 7 Global Step: 96270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:49:14,890-Speed 3035.42 samples/sec Loss 8.9723 LearningRate 0.0375 Epoch: 7 Global Step: 96280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:18,284-Speed 3018.51 samples/sec Loss 9.0280 LearningRate 0.0375 Epoch: 7 Global Step: 96290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:21,707-Speed 2991.64 samples/sec Loss 8.8888 LearningRate 0.0375 Epoch: 7 Global Step: 96300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:25,136-Speed 2987.73 samples/sec Loss 9.0155 LearningRate 0.0375 Epoch: 7 Global Step: 96310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:28,495-Speed 3049.06 samples/sec Loss 8.9860 LearningRate 0.0375 Epoch: 7 Global Step: 96320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:31,873-Speed 3032.22 samples/sec Loss 8.9256 LearningRate 0.0375 Epoch: 7 Global Step: 96330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:35,269-Speed 3016.54 samples/sec Loss 8.9875 LearningRate 0.0375 Epoch: 7 Global Step: 96340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:38,699-Speed 
2985.74 samples/sec Loss 8.8903 LearningRate 0.0375 Epoch: 7 Global Step: 96350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:42,081-Speed 3029.33 samples/sec Loss 8.8658 LearningRate 0.0375 Epoch: 7 Global Step: 96360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:45,426-Speed 3061.81 samples/sec Loss 8.9962 LearningRate 0.0375 Epoch: 7 Global Step: 96370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:48,828-Speed 3010.99 samples/sec Loss 8.9236 LearningRate 0.0375 Epoch: 7 Global Step: 96380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:49:52,241-Speed 3000.05 samples/sec Loss 8.9304 LearningRate 0.0375 Epoch: 7 Global Step: 96390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:49:55,645-Speed 3009.50 samples/sec Loss 8.9523 LearningRate 0.0374 Epoch: 7 Global Step: 96400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:49:59,076-Speed 2984.95 samples/sec Loss 8.9291 LearningRate 0.0374 Epoch: 7 Global Step: 96410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:02,451-Speed 3034.85 samples/sec Loss 8.8429 LearningRate 0.0374 Epoch: 7 Global Step: 96420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:05,925-Speed 2948.81 samples/sec Loss 8.8831 LearningRate 0.0374 Epoch: 7 Global Step: 96430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:09,300-Speed 3035.28 samples/sec Loss 8.7889 LearningRate 0.0374 Epoch: 7 Global Step: 96440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:12,659-Speed 3049.07 samples/sec Loss 8.9814 LearningRate 0.0374 Epoch: 7 Global Step: 96450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:16,002-Speed 3064.11 samples/sec Loss 8.9721 LearningRate 0.0374 Epoch: 7 Global Step: 96460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:19,355-Speed 3055.06 samples/sec Loss 8.7968 LearningRate 0.0374 
Epoch: 7 Global Step: 96470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:22,777-Speed 2992.96 samples/sec Loss 8.9173 LearningRate 0.0374 Epoch: 7 Global Step: 96480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:26,145-Speed 3041.47 samples/sec Loss 8.8748 LearningRate 0.0374 Epoch: 7 Global Step: 96490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:29,473-Speed 3077.31 samples/sec Loss 9.0710 LearningRate 0.0374 Epoch: 7 Global Step: 96500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:32,790-Speed 3088.96 samples/sec Loss 8.8966 LearningRate 0.0374 Epoch: 7 Global Step: 96510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:36,119-Speed 3076.64 samples/sec Loss 8.8456 LearningRate 0.0374 Epoch: 7 Global Step: 96520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:39,458-Speed 3067.83 samples/sec Loss 9.0250 LearningRate 0.0374 Epoch: 7 Global Step: 96530 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:42,785-Speed 3079.18 samples/sec Loss 8.9951 LearningRate 0.0374 Epoch: 7 Global Step: 96540 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:46,120-Speed 3070.75 samples/sec Loss 8.8747 LearningRate 0.0374 Epoch: 7 Global Step: 96550 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:49,435-Speed 3090.02 samples/sec Loss 8.9655 LearningRate 0.0374 Epoch: 7 Global Step: 96560 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:52,832-Speed 3015.48 samples/sec Loss 8.9548 LearningRate 0.0374 Epoch: 7 Global Step: 96570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:50:56,249-Speed 2997.89 samples/sec Loss 8.9890 LearningRate 0.0374 Epoch: 7 Global Step: 96580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:50:59,680-Speed 2984.99 samples/sec Loss 8.9446 LearningRate 0.0374 Epoch: 7 Global Step: 96590 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-27 10:51:03,197-Speed 2912.92 samples/sec Loss 8.9811 LearningRate 0.0373 Epoch: 7 Global Step: 96600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:06,630-Speed 2983.05 samples/sec Loss 8.9563 LearningRate 0.0373 Epoch: 7 Global Step: 96610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:10,007-Speed 3033.30 samples/sec Loss 8.7699 LearningRate 0.0373 Epoch: 7 Global Step: 96620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:13,428-Speed 2994.13 samples/sec Loss 8.9158 LearningRate 0.0373 Epoch: 7 Global Step: 96630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:16,817-Speed 3021.72 samples/sec Loss 8.8886 LearningRate 0.0373 Epoch: 7 Global Step: 96640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:20,231-Speed 2999.97 samples/sec Loss 8.8094 LearningRate 0.0373 Epoch: 7 Global Step: 96650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:23,574-Speed 3064.24 samples/sec Loss 9.0598 LearningRate 0.0373 Epoch: 7 Global Step: 96660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:26,945-Speed 3038.58 samples/sec Loss 8.8583 LearningRate 0.0373 Epoch: 7 Global Step: 96670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:30,440-Speed 2930.78 samples/sec Loss 8.9227 LearningRate 0.0373 Epoch: 7 Global Step: 96680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:51:33,895-Speed 2964.54 samples/sec Loss 8.9018 LearningRate 0.0373 Epoch: 7 Global Step: 96690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:51:37,278-Speed 3028.11 samples/sec Loss 8.7476 LearningRate 0.0373 Epoch: 7 Global Step: 96700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:40,623-Speed 3062.26 samples/sec Loss 8.7290 LearningRate 0.0373 Epoch: 7 Global Step: 96710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
10:51:44,010-Speed 3023.57 samples/sec Loss 8.9035 LearningRate 0.0373 Epoch: 7 Global Step: 96720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:47,412-Speed 3010.88 samples/sec Loss 8.9672 LearningRate 0.0373 Epoch: 7 Global Step: 96730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:50,818-Speed 3007.13 samples/sec Loss 8.8593 LearningRate 0.0373 Epoch: 7 Global Step: 96740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:54,191-Speed 3037.00 samples/sec Loss 8.7474 LearningRate 0.0373 Epoch: 7 Global Step: 96750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:51:57,570-Speed 3030.93 samples/sec Loss 8.8330 LearningRate 0.0373 Epoch: 7 Global Step: 96760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:00,874-Speed 3100.26 samples/sec Loss 8.9399 LearningRate 0.0373 Epoch: 7 Global Step: 96770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:04,315-Speed 2976.71 samples/sec Loss 8.9105 LearningRate 0.0373 Epoch: 7 Global Step: 96780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:07,763-Speed 2970.82 samples/sec Loss 8.9022 LearningRate 0.0373 Epoch: 7 Global Step: 96790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:11,228-Speed 2956.63 samples/sec Loss 8.9744 LearningRate 0.0373 Epoch: 7 Global Step: 96800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:14,565-Speed 3069.60 samples/sec Loss 8.9629 LearningRate 0.0372 Epoch: 7 Global Step: 96810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:18,019-Speed 2965.38 samples/sec Loss 9.1618 LearningRate 0.0372 Epoch: 7 Global Step: 96820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:21,475-Speed 2963.63 samples/sec Loss 8.9659 LearningRate 0.0372 Epoch: 7 Global Step: 96830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:24,837-Speed 3048.53 samples/sec Loss 8.8015 
LearningRate 0.0372 Epoch: 7 Global Step: 96840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:28,310-Speed 2949.53 samples/sec Loss 8.9325 LearningRate 0.0372 Epoch: 7 Global Step: 96850 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:52:31,768-Speed 2961.58 samples/sec Loss 8.9026 LearningRate 0.0372 Epoch: 7 Global Step: 96860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:35,080-Speed 3092.50 samples/sec Loss 8.8764 LearningRate 0.0372 Epoch: 7 Global Step: 96870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:38,478-Speed 3015.11 samples/sec Loss 8.8995 LearningRate 0.0372 Epoch: 7 Global Step: 96880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:41,850-Speed 3036.75 samples/sec Loss 9.0692 LearningRate 0.0372 Epoch: 7 Global Step: 96890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:45,232-Speed 3029.05 samples/sec Loss 8.6988 LearningRate 0.0372 Epoch: 7 Global Step: 96900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:48,632-Speed 3012.89 samples/sec Loss 9.0782 LearningRate 0.0372 Epoch: 7 Global Step: 96910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:51,987-Speed 3053.02 samples/sec Loss 8.8400 LearningRate 0.0372 Epoch: 7 Global Step: 96920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:55,376-Speed 3021.92 samples/sec Loss 8.7990 LearningRate 0.0372 Epoch: 7 Global Step: 96930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:52:58,850-Speed 2948.98 samples/sec Loss 9.0187 LearningRate 0.0372 Epoch: 7 Global Step: 96940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:02,209-Speed 3049.06 samples/sec Loss 8.8986 LearningRate 0.0372 Epoch: 7 Global Step: 96950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:05,555-Speed 3060.89 samples/sec Loss 8.9260 LearningRate 0.0372 Epoch: 7 Global Step: 96960 Fp16 
Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:08,994-Speed 2978.06 samples/sec Loss 8.9248 LearningRate 0.0372 Epoch: 7 Global Step: 96970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:12,345-Speed 3056.67 samples/sec Loss 8.8723 LearningRate 0.0372 Epoch: 7 Global Step: 96980 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:15,719-Speed 3035.42 samples/sec Loss 8.8674 LearningRate 0.0372 Epoch: 7 Global Step: 96990 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:19,177-Speed 2962.72 samples/sec Loss 9.0461 LearningRate 0.0372 Epoch: 7 Global Step: 97000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:22,607-Speed 2985.74 samples/sec Loss 8.8007 LearningRate 0.0371 Epoch: 7 Global Step: 97010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:25,984-Speed 3033.18 samples/sec Loss 8.8801 LearningRate 0.0371 Epoch: 7 Global Step: 97020 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:29,436-Speed 2967.16 samples/sec Loss 9.0077 LearningRate 0.0371 Epoch: 7 Global Step: 97030 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:32,842-Speed 3007.25 samples/sec Loss 8.9414 LearningRate 0.0371 Epoch: 7 Global Step: 97040 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:53:36,270-Speed 2988.42 samples/sec Loss 8.8223 LearningRate 0.0371 Epoch: 7 Global Step: 97050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:39,689-Speed 2995.38 samples/sec Loss 8.8359 LearningRate 0.0371 Epoch: 7 Global Step: 97060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:43,097-Speed 3005.50 samples/sec Loss 8.8880 LearningRate 0.0371 Epoch: 7 Global Step: 97070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:46,478-Speed 3029.38 samples/sec Loss 8.8359 LearningRate 0.0371 Epoch: 7 Global Step: 97080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 10:53:49,891-Speed 3001.36 samples/sec Loss 8.9863 LearningRate 0.0371 Epoch: 7 Global Step: 97090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:53,278-Speed 3024.28 samples/sec Loss 8.9032 LearningRate 0.0371 Epoch: 7 Global Step: 97100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:53:56,689-Speed 3002.79 samples/sec Loss 8.8898 LearningRate 0.0371 Epoch: 7 Global Step: 97110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:00,124-Speed 2982.16 samples/sec Loss 8.9886 LearningRate 0.0371 Epoch: 7 Global Step: 97120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:03,466-Speed 3064.22 samples/sec Loss 8.7600 LearningRate 0.0371 Epoch: 7 Global Step: 97130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:06,818-Speed 3056.05 samples/sec Loss 8.9728 LearningRate 0.0371 Epoch: 7 Global Step: 97140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:10,229-Speed 3002.26 samples/sec Loss 8.8138 LearningRate 0.0371 Epoch: 7 Global Step: 97150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:54:13,641-Speed 3002.26 samples/sec Loss 8.7155 LearningRate 0.0371 Epoch: 7 Global Step: 97160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:54:17,001-Speed 3048.74 samples/sec Loss 8.9763 LearningRate 0.0371 Epoch: 7 Global Step: 97170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:20,390-Speed 3022.36 samples/sec Loss 8.7984 LearningRate 0.0371 Epoch: 7 Global Step: 97180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:23,767-Speed 3032.75 samples/sec Loss 8.8154 LearningRate 0.0371 Epoch: 7 Global Step: 97190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:27,097-Speed 3075.58 samples/sec Loss 8.8659 LearningRate 0.0371 Epoch: 7 Global Step: 97200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:30,495-Speed 3014.80 samples/sec 
Loss 8.8808 LearningRate 0.0370 Epoch: 7 Global Step: 97210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:33,844-Speed 3057.81 samples/sec Loss 8.8975 LearningRate 0.0370 Epoch: 7 Global Step: 97220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:37,293-Speed 2970.00 samples/sec Loss 8.8123 LearningRate 0.0370 Epoch: 7 Global Step: 97230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:40,763-Speed 2951.52 samples/sec Loss 8.9299 LearningRate 0.0370 Epoch: 7 Global Step: 97240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:44,148-Speed 3026.73 samples/sec Loss 8.7788 LearningRate 0.0370 Epoch: 7 Global Step: 97250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:47,562-Speed 3001.01 samples/sec Loss 8.9946 LearningRate 0.0370 Epoch: 7 Global Step: 97260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:54:50,891-Speed 3077.06 samples/sec Loss 8.9142 LearningRate 0.0370 Epoch: 7 Global Step: 97270 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:54:54,287-Speed 3015.63 samples/sec Loss 8.8197 LearningRate 0.0370 Epoch: 7 Global Step: 97280 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:54:57,702-Speed 2999.80 samples/sec Loss 9.0053 LearningRate 0.0370 Epoch: 7 Global Step: 97290 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:55:01,194-Speed 2934.31 samples/sec Loss 8.8058 LearningRate 0.0370 Epoch: 7 Global Step: 97300 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:55:04,565-Speed 3038.20 samples/sec Loss 8.8742 LearningRate 0.0370 Epoch: 7 Global Step: 97310 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:55:07,892-Speed 3078.05 samples/sec Loss 8.8328 LearningRate 0.0370 Epoch: 7 Global Step: 97320 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:55:11,245-Speed 3054.67 samples/sec Loss 8.7775 LearningRate 0.0370 Epoch: 7 Global 
Step: 97330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:55:14,666-Speed 2994.28 samples/sec Loss 8.6417 LearningRate 0.0370 Epoch: 7 Global Step: 97340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:55:18,044-Speed 3032.54 samples/sec Loss 8.7883 LearningRate 0.0370 Epoch: 7 Global Step: 97350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:55:21,400-Speed 3051.64 samples/sec Loss 8.9307 LearningRate 0.0370 Epoch: 7 Global Step: 97360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:55:24,837-Speed 2981.00 samples/sec Loss 8.8745 LearningRate 0.0370 Epoch: 7 Global Step: 97370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:55:28,237-Speed 3012.01 samples/sec Loss 8.7870 LearningRate 0.0370 Epoch: 7 Global Step: 97380 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:31,609-Speed 3038.05 samples/sec Loss 9.0396 LearningRate 0.0370 Epoch: 7 Global Step: 97390 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:35,047-Speed 2978.75 samples/sec Loss 8.8062 LearningRate 0.0370 Epoch: 7 Global Step: 97400 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:38,503-Speed 2963.88 samples/sec Loss 8.9212 LearningRate 0.0370 Epoch: 7 Global Step: 97410 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:41,928-Speed 2991.04 samples/sec Loss 8.9011 LearningRate 0.0369 Epoch: 7 Global Step: 97420 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:45,322-Speed 3018.26 samples/sec Loss 8.9534 LearningRate 0.0369 Epoch: 7 Global Step: 97430 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:48,697-Speed 3035.26 samples/sec Loss 8.9385 LearningRate 0.0369 Epoch: 7 Global Step: 97440 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:52,041-Speed 3064.00 samples/sec Loss 8.9726 LearningRate 0.0369 Epoch: 7 Global Step: 97450 Fp16 Grad Scale: 32768 Required: 14 hours 
Training: 2022-04-27 10:55:55,395-Speed 3053.07 samples/sec Loss 8.7924 LearningRate 0.0369 Epoch: 7 Global Step: 97460 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:55:58,779-Speed 3027.40 samples/sec Loss 8.8835 LearningRate 0.0369 Epoch: 7 Global Step: 97470 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:02,175-Speed 3016.51 samples/sec Loss 8.8698 LearningRate 0.0369 Epoch: 7 Global Step: 97480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:05,518-Speed 3064.04 samples/sec Loss 8.8000 LearningRate 0.0369 Epoch: 7 Global Step: 97490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:08,851-Speed 3073.15 samples/sec Loss 8.9321 LearningRate 0.0369 Epoch: 7 Global Step: 97500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:12,194-Speed 3064.00 samples/sec Loss 8.9502 LearningRate 0.0369 Epoch: 7 Global Step: 97510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:15,574-Speed 3029.76 samples/sec Loss 8.9073 LearningRate 0.0369 Epoch: 7 Global Step: 97520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:18,933-Speed 3049.66 samples/sec Loss 8.8988 LearningRate 0.0369 Epoch: 7 Global Step: 97530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:22,279-Speed 3061.80 samples/sec Loss 8.8264 LearningRate 0.0369 Epoch: 7 Global Step: 97540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:56:25,598-Speed 3085.25 samples/sec Loss 8.8081 LearningRate 0.0369 Epoch: 7 Global Step: 97550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:29,041-Speed 2975.03 samples/sec Loss 8.9016 LearningRate 0.0369 Epoch: 7 Global Step: 97560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:32,402-Speed 3047.96 samples/sec Loss 9.0321 LearningRate 0.0369 Epoch: 7 Global Step: 97570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:35,755-Speed 3054.25 
samples/sec Loss 8.8818 LearningRate 0.0369 Epoch: 7 Global Step: 97580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:39,169-Speed 3000.45 samples/sec Loss 8.8898 LearningRate 0.0369 Epoch: 7 Global Step: 97590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:42,603-Speed 2982.96 samples/sec Loss 8.7834 LearningRate 0.0369 Epoch: 7 Global Step: 97600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:45,942-Speed 3067.56 samples/sec Loss 8.6872 LearningRate 0.0369 Epoch: 7 Global Step: 97610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:49,324-Speed 3028.13 samples/sec Loss 8.9714 LearningRate 0.0368 Epoch: 7 Global Step: 97620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:52,780-Speed 2964.20 samples/sec Loss 8.8118 LearningRate 0.0368 Epoch: 7 Global Step: 97630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:56,110-Speed 3075.91 samples/sec Loss 8.8465 LearningRate 0.0368 Epoch: 7 Global Step: 97640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:56:59,485-Speed 3035.07 samples/sec Loss 8.8060 LearningRate 0.0368 Epoch: 7 Global Step: 97650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:02,817-Speed 3074.34 samples/sec Loss 8.8788 LearningRate 0.0368 Epoch: 7 Global Step: 97660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:06,174-Speed 3050.94 samples/sec Loss 8.8354 LearningRate 0.0368 Epoch: 7 Global Step: 97670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:09,608-Speed 2982.58 samples/sec Loss 8.8231 LearningRate 0.0368 Epoch: 7 Global Step: 97680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:13,039-Speed 2985.41 samples/sec Loss 8.9274 LearningRate 0.0368 Epoch: 7 Global Step: 97690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:16,487-Speed 2970.88 samples/sec Loss 8.8245 LearningRate 0.0368 Epoch: 7 
Global Step: 97700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:19,833-Speed 3061.26 samples/sec Loss 8.6798 LearningRate 0.0368 Epoch: 7 Global Step: 97710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:23,338-Speed 2922.18 samples/sec Loss 8.8842 LearningRate 0.0368 Epoch: 7 Global Step: 97720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:26,736-Speed 3014.26 samples/sec Loss 8.7079 LearningRate 0.0368 Epoch: 7 Global Step: 97730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:30,132-Speed 3016.22 samples/sec Loss 8.7655 LearningRate 0.0368 Epoch: 7 Global Step: 97740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:33,515-Speed 3027.99 samples/sec Loss 8.9700 LearningRate 0.0368 Epoch: 7 Global Step: 97750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:36,936-Speed 2994.17 samples/sec Loss 8.8206 LearningRate 0.0368 Epoch: 7 Global Step: 97760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:40,266-Speed 3076.47 samples/sec Loss 8.6962 LearningRate 0.0368 Epoch: 7 Global Step: 97770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:43,628-Speed 3046.70 samples/sec Loss 8.7418 LearningRate 0.0368 Epoch: 7 Global Step: 97780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:47,017-Speed 3022.85 samples/sec Loss 8.8790 LearningRate 0.0368 Epoch: 7 Global Step: 97790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 10:57:50,346-Speed 3076.99 samples/sec Loss 8.6843 LearningRate 0.0368 Epoch: 7 Global Step: 97800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:53,781-Speed 2981.80 samples/sec Loss 8.8219 LearningRate 0.0368 Epoch: 7 Global Step: 97810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:57:57,207-Speed 2989.74 samples/sec Loss 8.8333 LearningRate 0.0368 Epoch: 7 Global Step: 97820 Fp16 Grad Scale: 65536 Required: 14 
hours Training: 2022-04-27 10:58:00,565-Speed 3050.81 samples/sec Loss 8.8749 LearningRate 0.0367 Epoch: 7 Global Step: 97830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:03,937-Speed 3037.83 samples/sec Loss 8.7909 LearningRate 0.0367 Epoch: 7 Global Step: 97840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:07,367-Speed 2985.46 samples/sec Loss 8.8529 LearningRate 0.0367 Epoch: 7 Global Step: 97850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:10,750-Speed 3028.17 samples/sec Loss 8.6793 LearningRate 0.0367 Epoch: 7 Global Step: 97860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:14,193-Speed 2975.23 samples/sec Loss 8.8471 LearningRate 0.0367 Epoch: 7 Global Step: 97870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:17,552-Speed 3049.06 samples/sec Loss 8.7698 LearningRate 0.0367 Epoch: 7 Global Step: 97880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:20,891-Speed 3067.90 samples/sec Loss 8.8465 LearningRate 0.0367 Epoch: 7 Global Step: 97890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:24,283-Speed 3019.90 samples/sec Loss 8.7581 LearningRate 0.0367 Epoch: 7 Global Step: 97900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:27,721-Speed 2979.53 samples/sec Loss 8.9542 LearningRate 0.0367 Epoch: 7 Global Step: 97910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:31,133-Speed 3002.06 samples/sec Loss 8.6208 LearningRate 0.0367 Epoch: 7 Global Step: 97920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:34,523-Speed 3021.15 samples/sec Loss 8.8773 LearningRate 0.0367 Epoch: 7 Global Step: 97930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:37,902-Speed 3030.89 samples/sec Loss 8.9237 LearningRate 0.0367 Epoch: 7 Global Step: 97940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:41,402-Speed 
2927.03 samples/sec Loss 8.7250 LearningRate 0.0367 Epoch: 7 Global Step: 97950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:44,738-Speed 3070.58 samples/sec Loss 8.8156 LearningRate 0.0367 Epoch: 7 Global Step: 97960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:48,131-Speed 3018.41 samples/sec Loss 8.7263 LearningRate 0.0367 Epoch: 7 Global Step: 97970 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:58:51,452-Speed 3084.48 samples/sec Loss 8.8758 LearningRate 0.0367 Epoch: 7 Global Step: 97980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:54,890-Speed 2978.68 samples/sec Loss 8.9573 LearningRate 0.0367 Epoch: 7 Global Step: 97990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:58:58,286-Speed 3016.24 samples/sec Loss 8.8942 LearningRate 0.0367 Epoch: 7 Global Step: 98000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:01,656-Speed 3039.99 samples/sec Loss 8.9468 LearningRate 0.0367 Epoch: 7 Global Step: 98010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:05,011-Speed 3052.47 samples/sec Loss 8.8953 LearningRate 0.0367 Epoch: 7 Global Step: 98020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:08,489-Speed 2945.66 samples/sec Loss 8.8656 LearningRate 0.0366 Epoch: 7 Global Step: 98030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:11,933-Speed 2973.91 samples/sec Loss 8.9119 LearningRate 0.0366 Epoch: 7 Global Step: 98040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:15,325-Speed 3022.95 samples/sec Loss 8.7959 LearningRate 0.0366 Epoch: 7 Global Step: 98050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:18,745-Speed 2994.31 samples/sec Loss 8.8309 LearningRate 0.0366 Epoch: 7 Global Step: 98060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:22,190-Speed 2974.07 samples/sec Loss 8.7300 LearningRate 0.0366 
Epoch: 7 Global Step: 98070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:25,615-Speed 2989.76 samples/sec Loss 8.8800 LearningRate 0.0366 Epoch: 7 Global Step: 98080 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:59:29,111-Speed 2930.39 samples/sec Loss 8.8846 LearningRate 0.0366 Epoch: 7 Global Step: 98090 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:59:32,522-Speed 3002.89 samples/sec Loss 8.9373 LearningRate 0.0366 Epoch: 7 Global Step: 98100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:59:35,865-Speed 3063.93 samples/sec Loss 8.9864 LearningRate 0.0366 Epoch: 7 Global Step: 98110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:59:39,209-Speed 3063.73 samples/sec Loss 8.8678 LearningRate 0.0366 Epoch: 7 Global Step: 98120 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:59:42,565-Speed 3052.10 samples/sec Loss 8.8094 LearningRate 0.0366 Epoch: 7 Global Step: 98130 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 10:59:45,976-Speed 3002.16 samples/sec Loss 8.7852 LearningRate 0.0366 Epoch: 7 Global Step: 98140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:49,379-Speed 3010.42 samples/sec Loss 8.7659 LearningRate 0.0366 Epoch: 7 Global Step: 98150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:52,791-Speed 3001.99 samples/sec Loss 8.8008 LearningRate 0.0366 Epoch: 7 Global Step: 98160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:56,235-Speed 2973.94 samples/sec Loss 8.7948 LearningRate 0.0366 Epoch: 7 Global Step: 98170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 10:59:59,713-Speed 2944.77 samples/sec Loss 8.8538 LearningRate 0.0366 Epoch: 7 Global Step: 98180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:03,157-Speed 2974.31 samples/sec Loss 8.8353 LearningRate 0.0366 Epoch: 7 Global Step: 98190 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-27 11:00:06,551-Speed 3017.68 samples/sec Loss 8.8099 LearningRate 0.0366 Epoch: 7 Global Step: 98200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:09,901-Speed 3058.29 samples/sec Loss 8.7490 LearningRate 0.0366 Epoch: 7 Global Step: 98210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:13,373-Speed 2949.86 samples/sec Loss 8.8203 LearningRate 0.0366 Epoch: 7 Global Step: 98220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:16,765-Speed 3023.56 samples/sec Loss 8.7997 LearningRate 0.0366 Epoch: 7 Global Step: 98230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:20,121-Speed 3052.28 samples/sec Loss 8.9117 LearningRate 0.0365 Epoch: 7 Global Step: 98240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:23,545-Speed 2991.56 samples/sec Loss 8.8297 LearningRate 0.0365 Epoch: 7 Global Step: 98250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:26,995-Speed 2968.30 samples/sec Loss 8.7606 LearningRate 0.0365 Epoch: 7 Global Step: 98260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:30,397-Speed 3011.56 samples/sec Loss 8.7752 LearningRate 0.0365 Epoch: 7 Global Step: 98270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:33,782-Speed 3026.30 samples/sec Loss 8.8297 LearningRate 0.0365 Epoch: 7 Global Step: 98280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:37,138-Speed 3051.31 samples/sec Loss 8.7576 LearningRate 0.0365 Epoch: 7 Global Step: 98290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:40,449-Speed 3094.23 samples/sec Loss 8.7964 LearningRate 0.0365 Epoch: 7 Global Step: 98300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:43,863-Speed 2999.94 samples/sec Loss 8.8885 LearningRate 0.0365 Epoch: 7 Global Step: 98310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
11:00:47,311-Speed 2970.79 samples/sec Loss 8.8308 LearningRate 0.0365 Epoch: 7 Global Step: 98320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:50,743-Speed 2985.08 samples/sec Loss 8.8482 LearningRate 0.0365 Epoch: 7 Global Step: 98330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:00:54,141-Speed 3013.89 samples/sec Loss 8.7998 LearningRate 0.0365 Epoch: 7 Global Step: 98340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:00:57,580-Speed 2978.54 samples/sec Loss 8.7347 LearningRate 0.0365 Epoch: 7 Global Step: 98350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:00,897-Speed 3088.55 samples/sec Loss 8.8931 LearningRate 0.0365 Epoch: 7 Global Step: 98360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:04,250-Speed 3054.85 samples/sec Loss 8.6643 LearningRate 0.0365 Epoch: 7 Global Step: 98370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:07,621-Speed 3038.77 samples/sec Loss 8.8635 LearningRate 0.0365 Epoch: 7 Global Step: 98380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:10,973-Speed 3055.87 samples/sec Loss 8.8141 LearningRate 0.0365 Epoch: 7 Global Step: 98390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:14,293-Speed 3085.11 samples/sec Loss 8.6422 LearningRate 0.0365 Epoch: 7 Global Step: 98400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:17,748-Speed 2964.52 samples/sec Loss 8.7483 LearningRate 0.0365 Epoch: 7 Global Step: 98410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:21,198-Speed 2968.62 samples/sec Loss 8.8496 LearningRate 0.0365 Epoch: 7 Global Step: 98420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:24,527-Speed 3076.98 samples/sec Loss 8.7667 LearningRate 0.0365 Epoch: 7 Global Step: 98430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:27,851-Speed 3081.58 samples/sec Loss 8.5817 
LearningRate 0.0364 Epoch: 7 Global Step: 98440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:01:31,234-Speed 3027.85 samples/sec Loss 8.8284 LearningRate 0.0364 Epoch: 7 Global Step: 98450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:34,611-Speed 3032.94 samples/sec Loss 8.8021 LearningRate 0.0364 Epoch: 7 Global Step: 98460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:38,012-Speed 3012.05 samples/sec Loss 8.6987 LearningRate 0.0364 Epoch: 7 Global Step: 98470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:41,404-Speed 3019.70 samples/sec Loss 8.7935 LearningRate 0.0364 Epoch: 7 Global Step: 98480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:44,803-Speed 3013.40 samples/sec Loss 8.7207 LearningRate 0.0364 Epoch: 7 Global Step: 98490 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:48,137-Speed 3072.13 samples/sec Loss 8.7121 LearningRate 0.0364 Epoch: 7 Global Step: 98500 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:51,521-Speed 3026.88 samples/sec Loss 8.6753 LearningRate 0.0364 Epoch: 7 Global Step: 98510 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:54,907-Speed 3025.37 samples/sec Loss 8.7374 LearningRate 0.0364 Epoch: 7 Global Step: 98520 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:01:58,305-Speed 3014.56 samples/sec Loss 8.7676 LearningRate 0.0364 Epoch: 7 Global Step: 98530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:02:01,684-Speed 3031.18 samples/sec Loss 8.7443 LearningRate 0.0364 Epoch: 7 Global Step: 98540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:05,103-Speed 2995.37 samples/sec Loss 8.8289 LearningRate 0.0364 Epoch: 7 Global Step: 98550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:08,535-Speed 2984.79 samples/sec Loss 8.6730 LearningRate 0.0364 Epoch: 7 Global Step: 98560 
Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:11,893-Speed 3050.33 samples/sec Loss 8.8245 LearningRate 0.0364 Epoch: 7 Global Step: 98570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:15,237-Speed 3062.57 samples/sec Loss 8.6594 LearningRate 0.0364 Epoch: 7 Global Step: 98580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:18,662-Speed 2990.85 samples/sec Loss 8.8145 LearningRate 0.0364 Epoch: 7 Global Step: 98590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:22,142-Speed 2943.59 samples/sec Loss 8.6629 LearningRate 0.0364 Epoch: 7 Global Step: 98600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:25,577-Speed 2981.56 samples/sec Loss 8.7493 LearningRate 0.0364 Epoch: 7 Global Step: 98610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:28,961-Speed 3026.44 samples/sec Loss 8.8718 LearningRate 0.0364 Epoch: 7 Global Step: 98620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:32,363-Speed 3010.82 samples/sec Loss 8.7751 LearningRate 0.0364 Epoch: 7 Global Step: 98630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:35,805-Speed 2975.84 samples/sec Loss 8.9291 LearningRate 0.0364 Epoch: 7 Global Step: 98640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:02:39,194-Speed 3022.86 samples/sec Loss 8.8631 LearningRate 0.0363 Epoch: 7 Global Step: 98650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:02:42,623-Speed 2987.05 samples/sec Loss 8.8539 LearningRate 0.0363 Epoch: 7 Global Step: 98660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:02:46,002-Speed 3031.72 samples/sec Loss 8.8526 LearningRate 0.0363 Epoch: 7 Global Step: 98670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:02:49,451-Speed 2969.13 samples/sec Loss 8.7111 LearningRate 0.0363 Epoch: 7 Global Step: 98680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 11:02:52,843-Speed 3020.08 samples/sec Loss 8.7559 LearningRate 0.0363 Epoch: 7 Global Step: 98690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:56,243-Speed 3012.10 samples/sec Loss 8.8666 LearningRate 0.0363 Epoch: 7 Global Step: 98700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:02:59,673-Speed 2986.71 samples/sec Loss 8.8236 LearningRate 0.0363 Epoch: 7 Global Step: 98710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:03,061-Speed 3023.33 samples/sec Loss 8.8859 LearningRate 0.0363 Epoch: 7 Global Step: 98720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:06,463-Speed 3011.36 samples/sec Loss 8.7962 LearningRate 0.0363 Epoch: 7 Global Step: 98730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:09,770-Speed 3096.92 samples/sec Loss 8.7788 LearningRate 0.0363 Epoch: 7 Global Step: 98740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:13,103-Speed 3073.44 samples/sec Loss 8.7348 LearningRate 0.0363 Epoch: 7 Global Step: 98750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:16,462-Speed 3048.61 samples/sec Loss 8.9729 LearningRate 0.0363 Epoch: 7 Global Step: 98760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:19,822-Speed 3048.66 samples/sec Loss 8.8586 LearningRate 0.0363 Epoch: 7 Global Step: 98770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:23,176-Speed 3054.40 samples/sec Loss 8.6978 LearningRate 0.0363 Epoch: 7 Global Step: 98780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:03:26,561-Speed 3025.12 samples/sec Loss 8.6570 LearningRate 0.0363 Epoch: 7 Global Step: 98790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:29,953-Speed 3020.43 samples/sec Loss 8.7406 LearningRate 0.0363 Epoch: 7 Global Step: 98800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:33,337-Speed 3026.36 samples/sec Loss 
8.7189 LearningRate 0.0363 Epoch: 7 Global Step: 98810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:36,788-Speed 2968.27 samples/sec Loss 8.7848 LearningRate 0.0363 Epoch: 7 Global Step: 98820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:40,165-Speed 3033.03 samples/sec Loss 8.7555 LearningRate 0.0363 Epoch: 7 Global Step: 98830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:43,616-Speed 2968.13 samples/sec Loss 8.7859 LearningRate 0.0363 Epoch: 7 Global Step: 98840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:47,011-Speed 3016.65 samples/sec Loss 8.8081 LearningRate 0.0363 Epoch: 7 Global Step: 98850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:50,399-Speed 3023.37 samples/sec Loss 8.9326 LearningRate 0.0362 Epoch: 7 Global Step: 98860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:53,729-Speed 3076.01 samples/sec Loss 8.6891 LearningRate 0.0362 Epoch: 7 Global Step: 98870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:03:57,092-Speed 3045.67 samples/sec Loss 8.6904 LearningRate 0.0362 Epoch: 7 Global Step: 98880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:00,514-Speed 2993.35 samples/sec Loss 8.7428 LearningRate 0.0362 Epoch: 7 Global Step: 98890 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:04:03,911-Speed 3014.88 samples/sec Loss 8.7562 LearningRate 0.0362 Epoch: 7 Global Step: 98900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:07,345-Speed 2983.12 samples/sec Loss 8.8476 LearningRate 0.0362 Epoch: 7 Global Step: 98910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:10,724-Speed 3031.71 samples/sec Loss 8.8251 LearningRate 0.0362 Epoch: 7 Global Step: 98920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:14,089-Speed 3043.58 samples/sec Loss 8.8441 LearningRate 0.0362 Epoch: 7 Global Step: 98930 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:17,433-Speed 3063.38 samples/sec Loss 8.8471 LearningRate 0.0362 Epoch: 7 Global Step: 98940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:20,828-Speed 3016.98 samples/sec Loss 8.7348 LearningRate 0.0362 Epoch: 7 Global Step: 98950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:24,200-Speed 3037.47 samples/sec Loss 8.8005 LearningRate 0.0362 Epoch: 7 Global Step: 98960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:27,659-Speed 2960.76 samples/sec Loss 8.6458 LearningRate 0.0362 Epoch: 7 Global Step: 98970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:31,052-Speed 3019.14 samples/sec Loss 8.7909 LearningRate 0.0362 Epoch: 7 Global Step: 98980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:34,456-Speed 3008.95 samples/sec Loss 8.7020 LearningRate 0.0362 Epoch: 7 Global Step: 98990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:37,827-Speed 3038.79 samples/sec Loss 8.7750 LearningRate 0.0362 Epoch: 7 Global Step: 99000 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:04:41,198-Speed 3038.34 samples/sec Loss 8.7691 LearningRate 0.0362 Epoch: 7 Global Step: 99010 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:04:44,521-Speed 3083.19 samples/sec Loss 8.6457 LearningRate 0.0362 Epoch: 7 Global Step: 99020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:47,889-Speed 3041.27 samples/sec Loss 8.7752 LearningRate 0.0362 Epoch: 7 Global Step: 99030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:51,283-Speed 3017.60 samples/sec Loss 8.7709 LearningRate 0.0362 Epoch: 7 Global Step: 99040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:04:54,675-Speed 3019.59 samples/sec Loss 8.7951 LearningRate 0.0362 Epoch: 7 Global Step: 99050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 11:04:58,038-Speed 3046.71 samples/sec Loss 8.6726 LearningRate 0.0361 Epoch: 7 Global Step: 99060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:01,381-Speed 3062.97 samples/sec Loss 8.7520 LearningRate 0.0361 Epoch: 7 Global Step: 99070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:04,764-Speed 3028.06 samples/sec Loss 8.6848 LearningRate 0.0361 Epoch: 7 Global Step: 99080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:08,153-Speed 3022.69 samples/sec Loss 8.8206 LearningRate 0.0361 Epoch: 7 Global Step: 99090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:11,456-Speed 3101.22 samples/sec Loss 8.7614 LearningRate 0.0361 Epoch: 7 Global Step: 99100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:14,770-Speed 3090.47 samples/sec Loss 8.8022 LearningRate 0.0361 Epoch: 7 Global Step: 99110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:18,182-Speed 3001.97 samples/sec Loss 8.8407 LearningRate 0.0361 Epoch: 7 Global Step: 99120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:21,526-Speed 3063.66 samples/sec Loss 8.8280 LearningRate 0.0361 Epoch: 7 Global Step: 99130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:24,945-Speed 2995.75 samples/sec Loss 8.6800 LearningRate 0.0361 Epoch: 7 Global Step: 99140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:28,259-Speed 3090.34 samples/sec Loss 8.8605 LearningRate 0.0361 Epoch: 7 Global Step: 99150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:31,633-Speed 3036.10 samples/sec Loss 8.7610 LearningRate 0.0361 Epoch: 7 Global Step: 99160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:35,016-Speed 3027.80 samples/sec Loss 8.7074 LearningRate 0.0361 Epoch: 7 Global Step: 99170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:38,474-Speed 2961.81 samples/sec Loss 
8.8910 LearningRate 0.0361 Epoch: 7 Global Step: 99180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:41,911-Speed 2980.06 samples/sec Loss 8.7482 LearningRate 0.0361 Epoch: 7 Global Step: 99190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:45,269-Speed 3051.00 samples/sec Loss 8.5725 LearningRate 0.0361 Epoch: 7 Global Step: 99200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:48,705-Speed 2981.01 samples/sec Loss 8.7157 LearningRate 0.0361 Epoch: 7 Global Step: 99210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:52,045-Speed 3067.01 samples/sec Loss 8.7156 LearningRate 0.0361 Epoch: 7 Global Step: 99220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:55,417-Speed 3037.30 samples/sec Loss 8.7131 LearningRate 0.0361 Epoch: 7 Global Step: 99230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:05:58,834-Speed 2997.81 samples/sec Loss 8.6120 LearningRate 0.0361 Epoch: 7 Global Step: 99240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:02,224-Speed 3021.51 samples/sec Loss 8.8910 LearningRate 0.0361 Epoch: 7 Global Step: 99250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:05,588-Speed 3044.74 samples/sec Loss 8.6300 LearningRate 0.0361 Epoch: 7 Global Step: 99260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:08,920-Speed 3074.47 samples/sec Loss 8.7194 LearningRate 0.0360 Epoch: 7 Global Step: 99270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:12,307-Speed 3023.78 samples/sec Loss 8.7861 LearningRate 0.0360 Epoch: 7 Global Step: 99280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:15,666-Speed 3049.49 samples/sec Loss 8.6518 LearningRate 0.0360 Epoch: 7 Global Step: 99290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:19,131-Speed 2956.31 samples/sec Loss 8.5905 LearningRate 0.0360 Epoch: 7 Global Step: 99300 
Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:22,605-Speed 2948.21 samples/sec Loss 8.7057 LearningRate 0.0360 Epoch: 7 Global Step: 99310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:25,931-Speed 3079.82 samples/sec Loss 8.7123 LearningRate 0.0360 Epoch: 7 Global Step: 99320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:29,296-Speed 3043.75 samples/sec Loss 8.8262 LearningRate 0.0360 Epoch: 7 Global Step: 99330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:32,653-Speed 3050.88 samples/sec Loss 8.8175 LearningRate 0.0360 Epoch: 7 Global Step: 99340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:36,001-Speed 3060.04 samples/sec Loss 8.6552 LearningRate 0.0360 Epoch: 7 Global Step: 99350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:06:39,631-Speed 2821.74 samples/sec Loss 8.6464 LearningRate 0.0360 Epoch: 7 Global Step: 99360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:11,891-Speed 317.44 samples/sec Loss 8.3451 LearningRate 0.0360 Epoch: 8 Global Step: 99370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:15,531-Speed 2814.11 samples/sec Loss 7.1314 LearningRate 0.0360 Epoch: 8 Global Step: 99380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:19,174-Speed 2812.01 samples/sec Loss 7.3146 LearningRate 0.0360 Epoch: 8 Global Step: 99390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:22,530-Speed 3052.14 samples/sec Loss 7.1750 LearningRate 0.0360 Epoch: 8 Global Step: 99400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:25,929-Speed 3013.39 samples/sec Loss 7.1601 LearningRate 0.0360 Epoch: 8 Global Step: 99410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:29,287-Speed 3050.15 samples/sec Loss 7.2595 LearningRate 0.0360 Epoch: 8 Global Step: 99420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 
2022-04-27 11:07:32,630-Speed 3065.11 samples/sec Loss 7.1778 LearningRate 0.0360 Epoch: 8 Global Step: 99430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:07:36,031-Speed 3011.82 samples/sec Loss 7.1284 LearningRate 0.0360 Epoch: 8 Global Step: 99440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:07:39,590-Speed 2877.40 samples/sec Loss 7.2354 LearningRate 0.0360 Epoch: 8 Global Step: 99450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:07:42,989-Speed 3014.10 samples/sec Loss 7.3683 LearningRate 0.0360 Epoch: 8 Global Step: 99460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:46,330-Speed 3065.42 samples/sec Loss 7.2832 LearningRate 0.0360 Epoch: 8 Global Step: 99470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:49,744-Speed 3000.85 samples/sec Loss 7.3471 LearningRate 0.0359 Epoch: 8 Global Step: 99480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:53,128-Speed 3026.66 samples/sec Loss 7.2503 LearningRate 0.0359 Epoch: 8 Global Step: 99490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:56,463-Speed 3070.87 samples/sec Loss 7.3702 LearningRate 0.0359 Epoch: 8 Global Step: 99500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:07:59,779-Speed 3088.90 samples/sec Loss 7.2685 LearningRate 0.0359 Epoch: 8 Global Step: 99510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:03,146-Speed 3042.39 samples/sec Loss 7.3333 LearningRate 0.0359 Epoch: 8 Global Step: 99520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:06,468-Speed 3083.11 samples/sec Loss 7.1936 LearningRate 0.0359 Epoch: 8 Global Step: 99530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:09,853-Speed 3026.87 samples/sec Loss 7.2725 LearningRate 0.0359 Epoch: 8 Global Step: 99540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:13,201-Speed 3059.26 samples/sec 
Loss 7.2587 LearningRate 0.0359 Epoch: 8 Global Step: 99550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:16,509-Speed 3097.09 samples/sec Loss 7.3167 LearningRate 0.0359 Epoch: 8 Global Step: 99560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:19,806-Speed 3106.85 samples/sec Loss 7.4791 LearningRate 0.0359 Epoch: 8 Global Step: 99570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:23,128-Speed 3083.01 samples/sec Loss 7.3314 LearningRate 0.0359 Epoch: 8 Global Step: 99580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:26,447-Speed 3085.64 samples/sec Loss 7.3774 LearningRate 0.0359 Epoch: 8 Global Step: 99590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:29,812-Speed 3044.68 samples/sec Loss 7.4138 LearningRate 0.0359 Epoch: 8 Global Step: 99600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:33,206-Speed 3017.94 samples/sec Loss 7.3264 LearningRate 0.0359 Epoch: 8 Global Step: 99610 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:36,579-Speed 3036.46 samples/sec Loss 7.3207 LearningRate 0.0359 Epoch: 8 Global Step: 99620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:40,033-Speed 2965.78 samples/sec Loss 7.3822 LearningRate 0.0359 Epoch: 8 Global Step: 99630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:43,414-Speed 3030.13 samples/sec Loss 7.4612 LearningRate 0.0359 Epoch: 8 Global Step: 99640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:46,815-Speed 3011.88 samples/sec Loss 7.4116 LearningRate 0.0359 Epoch: 8 Global Step: 99650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:50,357-Speed 2891.82 samples/sec Loss 7.2968 LearningRate 0.0359 Epoch: 8 Global Step: 99660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:08:53,984-Speed 2824.60 samples/sec Loss 7.4792 LearningRate 0.0359 Epoch: 8 Global Step: 
99670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:08:57,449-Speed 2955.98 samples/sec Loss 7.3006 LearningRate 0.0358 Epoch: 8 Global Step: 99680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:00,824-Speed 3035.12 samples/sec Loss 7.3465 LearningRate 0.0358 Epoch: 8 Global Step: 99690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:04,208-Speed 3026.74 samples/sec Loss 7.3286 LearningRate 0.0358 Epoch: 8 Global Step: 99700 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:07,606-Speed 3014.46 samples/sec Loss 7.3858 LearningRate 0.0358 Epoch: 8 Global Step: 99710 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:11,027-Speed 2993.58 samples/sec Loss 7.3979 LearningRate 0.0358 Epoch: 8 Global Step: 99720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:14,446-Speed 2996.31 samples/sec Loss 7.3965 LearningRate 0.0358 Epoch: 8 Global Step: 99730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:17,821-Speed 3035.18 samples/sec Loss 7.3293 LearningRate 0.0358 Epoch: 8 Global Step: 99740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:21,264-Speed 2975.05 samples/sec Loss 7.3937 LearningRate 0.0358 Epoch: 8 Global Step: 99750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:24,631-Speed 3042.05 samples/sec Loss 7.3941 LearningRate 0.0358 Epoch: 8 Global Step: 99760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:28,024-Speed 3019.13 samples/sec Loss 7.4289 LearningRate 0.0358 Epoch: 8 Global Step: 99770 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:09:31,401-Speed 3033.80 samples/sec Loss 7.4566 LearningRate 0.0358 Epoch: 8 Global Step: 99780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:34,747-Speed 3061.11 samples/sec Loss 7.4410 LearningRate 0.0358 Epoch: 8 Global Step: 99790 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-27 11:09:38,101-Speed 3053.62 samples/sec Loss 7.4193 LearningRate 0.0358 Epoch: 8 Global Step: 99800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:41,409-Speed 3096.88 samples/sec Loss 7.5260 LearningRate 0.0358 Epoch: 8 Global Step: 99810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:44,729-Speed 3085.57 samples/sec Loss 7.4995 LearningRate 0.0358 Epoch: 8 Global Step: 99820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:48,848-Speed 2486.38 samples/sec Loss 7.4539 LearningRate 0.0358 Epoch: 8 Global Step: 99830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:52,184-Speed 3071.20 samples/sec Loss 7.4625 LearningRate 0.0358 Epoch: 8 Global Step: 99840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:55,542-Speed 3049.51 samples/sec Loss 7.4311 LearningRate 0.0358 Epoch: 8 Global Step: 99850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:09:58,891-Speed 3058.74 samples/sec Loss 7.4624 LearningRate 0.0358 Epoch: 8 Global Step: 99860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:02,214-Speed 3082.20 samples/sec Loss 7.5377 LearningRate 0.0358 Epoch: 8 Global Step: 99870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:05,591-Speed 3033.15 samples/sec Loss 7.3936 LearningRate 0.0358 Epoch: 8 Global Step: 99880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:08,968-Speed 3033.26 samples/sec Loss 7.4006 LearningRate 0.0357 Epoch: 8 Global Step: 99890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:12,352-Speed 3027.08 samples/sec Loss 7.5452 LearningRate 0.0357 Epoch: 8 Global Step: 99900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:15,716-Speed 3045.28 samples/sec Loss 7.5889 LearningRate 0.0357 Epoch: 8 Global Step: 99910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:19,058-Speed 3064.93 
samples/sec Loss 7.4154 LearningRate 0.0357 Epoch: 8 Global Step: 99920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:22,434-Speed 3034.05 samples/sec Loss 7.5875 LearningRate 0.0357 Epoch: 8 Global Step: 99930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:25,815-Speed 3029.37 samples/sec Loss 7.4723 LearningRate 0.0357 Epoch: 8 Global Step: 99940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:29,183-Speed 3041.17 samples/sec Loss 7.3787 LearningRate 0.0357 Epoch: 8 Global Step: 99950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:32,609-Speed 2990.40 samples/sec Loss 7.4771 LearningRate 0.0357 Epoch: 8 Global Step: 99960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:35,929-Speed 3084.98 samples/sec Loss 7.6808 LearningRate 0.0357 Epoch: 8 Global Step: 99970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:39,276-Speed 3060.15 samples/sec Loss 7.6132 LearningRate 0.0357 Epoch: 8 Global Step: 99980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:42,705-Speed 2987.24 samples/sec Loss 7.6285 LearningRate 0.0357 Epoch: 8 Global Step: 99990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:10:46,050-Speed 3062.24 samples/sec Loss 7.5130 LearningRate 0.0357 Epoch: 8 Global Step: 100000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:49,411-Speed 3048.08 samples/sec Loss 7.6380 LearningRate 0.0357 Epoch: 8 Global Step: 100010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:52,755-Speed 3062.45 samples/sec Loss 7.4507 LearningRate 0.0357 Epoch: 8 Global Step: 100020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:56,117-Speed 3047.07 samples/sec Loss 7.6197 LearningRate 0.0357 Epoch: 8 Global Step: 100030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:10:59,540-Speed 2992.72 samples/sec Loss 7.5550 LearningRate 0.0357 Epoch: 
8 Global Step: 100040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:02,866-Speed 3079.67 samples/sec Loss 7.5072 LearningRate 0.0357 Epoch: 8 Global Step: 100050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:06,311-Speed 2973.49 samples/sec Loss 7.5440 LearningRate 0.0357 Epoch: 8 Global Step: 100060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:09,699-Speed 3022.81 samples/sec Loss 7.5776 LearningRate 0.0357 Epoch: 8 Global Step: 100070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:13,101-Speed 3011.41 samples/sec Loss 7.7882 LearningRate 0.0357 Epoch: 8 Global Step: 100080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:16,457-Speed 3052.71 samples/sec Loss 7.6378 LearningRate 0.0357 Epoch: 8 Global Step: 100090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:19,816-Speed 3048.55 samples/sec Loss 7.6560 LearningRate 0.0356 Epoch: 8 Global Step: 100100 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:11:23,149-Speed 3073.90 samples/sec Loss 7.5470 LearningRate 0.0356 Epoch: 8 Global Step: 100110 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:11:26,503-Speed 3053.98 samples/sec Loss 7.5587 LearningRate 0.0356 Epoch: 8 Global Step: 100120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:29,854-Speed 3056.41 samples/sec Loss 7.6051 LearningRate 0.0356 Epoch: 8 Global Step: 100130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:33,233-Speed 3031.47 samples/sec Loss 7.4905 LearningRate 0.0356 Epoch: 8 Global Step: 100140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:36,612-Speed 3032.19 samples/sec Loss 7.6335 LearningRate 0.0356 Epoch: 8 Global Step: 100150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:39,971-Speed 3049.72 samples/sec Loss 7.6822 LearningRate 0.0356 Epoch: 8 Global Step: 100160 Fp16 Grad Scale: 
65536 Required: 14 hours Training: 2022-04-27 11:11:43,375-Speed 3009.35 samples/sec Loss 7.6212 LearningRate 0.0356 Epoch: 8 Global Step: 100170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:46,766-Speed 3020.52 samples/sec Loss 7.6544 LearningRate 0.0356 Epoch: 8 Global Step: 100180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:50,139-Speed 3036.64 samples/sec Loss 7.6865 LearningRate 0.0356 Epoch: 8 Global Step: 100190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:53,548-Speed 3004.54 samples/sec Loss 7.5816 LearningRate 0.0356 Epoch: 8 Global Step: 100200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:11:56,949-Speed 3012.03 samples/sec Loss 7.7824 LearningRate 0.0356 Epoch: 8 Global Step: 100210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:00,326-Speed 3032.50 samples/sec Loss 7.7136 LearningRate 0.0356 Epoch: 8 Global Step: 100220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:03,687-Speed 3048.89 samples/sec Loss 7.6310 LearningRate 0.0356 Epoch: 8 Global Step: 100230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:07,079-Speed 3019.51 samples/sec Loss 7.6648 LearningRate 0.0356 Epoch: 8 Global Step: 100240 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:10,513-Speed 2982.95 samples/sec Loss 7.7585 LearningRate 0.0356 Epoch: 8 Global Step: 100250 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:13,842-Speed 3076.68 samples/sec Loss 7.7105 LearningRate 0.0356 Epoch: 8 Global Step: 100260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:17,149-Speed 3097.54 samples/sec Loss 7.6160 LearningRate 0.0356 Epoch: 8 Global Step: 100270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:20,456-Speed 3096.39 samples/sec Loss 7.6795 LearningRate 0.0356 Epoch: 8 Global Step: 100280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 
2022-04-27 11:12:23,870-Speed 3000.61 samples/sec Loss 7.6807 LearningRate 0.0356 Epoch: 8 Global Step: 100290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:27,197-Speed 3078.94 samples/sec Loss 7.5552 LearningRate 0.0356 Epoch: 8 Global Step: 100300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:30,528-Speed 3075.52 samples/sec Loss 7.6675 LearningRate 0.0355 Epoch: 8 Global Step: 100310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:33,949-Speed 2993.81 samples/sec Loss 7.8786 LearningRate 0.0355 Epoch: 8 Global Step: 100320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:37,290-Speed 3065.95 samples/sec Loss 7.8891 LearningRate 0.0355 Epoch: 8 Global Step: 100330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:40,619-Speed 3077.46 samples/sec Loss 7.7168 LearningRate 0.0355 Epoch: 8 Global Step: 100340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:43,977-Speed 3050.87 samples/sec Loss 7.7196 LearningRate 0.0355 Epoch: 8 Global Step: 100350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:12:47,326-Speed 3057.89 samples/sec Loss 7.7625 LearningRate 0.0355 Epoch: 8 Global Step: 100360 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:50,708-Speed 3028.56 samples/sec Loss 7.7323 LearningRate 0.0355 Epoch: 8 Global Step: 100370 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:54,023-Speed 3090.63 samples/sec Loss 7.6857 LearningRate 0.0355 Epoch: 8 Global Step: 100380 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:12:57,451-Speed 2986.96 samples/sec Loss 7.7657 LearningRate 0.0355 Epoch: 8 Global Step: 100390 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:00,798-Speed 3060.38 samples/sec Loss 7.7763 LearningRate 0.0355 Epoch: 8 Global Step: 100400 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:04,208-Speed 3005.07 
samples/sec Loss 7.7927 LearningRate 0.0355 Epoch: 8 Global Step: 100410 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:07,602-Speed 3017.87 samples/sec Loss 7.8617 LearningRate 0.0355 Epoch: 8 Global Step: 100420 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:11,076-Speed 2948.20 samples/sec Loss 7.7348 LearningRate 0.0355 Epoch: 8 Global Step: 100430 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:14,390-Speed 3091.02 samples/sec Loss 7.8245 LearningRate 0.0355 Epoch: 8 Global Step: 100440 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:17,738-Speed 3059.35 samples/sec Loss 7.6511 LearningRate 0.0355 Epoch: 8 Global Step: 100450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:13:21,113-Speed 3035.24 samples/sec Loss 7.5906 LearningRate 0.0355 Epoch: 8 Global Step: 100460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:13:24,434-Speed 3084.77 samples/sec Loss 7.7297 LearningRate 0.0355 Epoch: 8 Global Step: 100470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:13:27,761-Speed 3078.41 samples/sec Loss 7.7054 LearningRate 0.0355 Epoch: 8 Global Step: 100480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:13:31,087-Speed 3079.99 samples/sec Loss 7.7374 LearningRate 0.0355 Epoch: 8 Global Step: 100490 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:34,498-Speed 3002.63 samples/sec Loss 8.0391 LearningRate 0.0355 Epoch: 8 Global Step: 100500 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:37,979-Speed 2941.77 samples/sec Loss 7.7674 LearningRate 0.0355 Epoch: 8 Global Step: 100510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:41,376-Speed 3016.27 samples/sec Loss 7.7639 LearningRate 0.0354 Epoch: 8 Global Step: 100520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:44,761-Speed 3025.44 samples/sec Loss 7.8283 LearningRate 
0.0354 Epoch: 8 Global Step: 100530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:48,155-Speed 3017.68 samples/sec Loss 7.8281 LearningRate 0.0354 Epoch: 8 Global Step: 100540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:51,570-Speed 2999.43 samples/sec Loss 7.7697 LearningRate 0.0354 Epoch: 8 Global Step: 100550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:54,951-Speed 3030.00 samples/sec Loss 7.9313 LearningRate 0.0354 Epoch: 8 Global Step: 100560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:13:58,280-Speed 3075.84 samples/sec Loss 7.8098 LearningRate 0.0354 Epoch: 8 Global Step: 100570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:14:01,674-Speed 3018.64 samples/sec Loss 7.7978 LearningRate 0.0354 Epoch: 8 Global Step: 100580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:14:05,120-Speed 2972.52 samples/sec Loss 7.8973 LearningRate 0.0354 Epoch: 8 Global Step: 100590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:08,490-Speed 3038.70 samples/sec Loss 7.7771 LearningRate 0.0354 Epoch: 8 Global Step: 100600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:11,823-Speed 3074.41 samples/sec Loss 7.8065 LearningRate 0.0354 Epoch: 8 Global Step: 100610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:15,139-Speed 3088.46 samples/sec Loss 7.9188 LearningRate 0.0354 Epoch: 8 Global Step: 100620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:18,468-Speed 3077.65 samples/sec Loss 7.8083 LearningRate 0.0354 Epoch: 8 Global Step: 100630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:21,821-Speed 3054.44 samples/sec Loss 7.6664 LearningRate 0.0354 Epoch: 8 Global Step: 100640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:25,123-Speed 3102.60 samples/sec Loss 7.7320 LearningRate 0.0354 Epoch: 8 Global Step: 100650 Fp16 
Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:28,578-Speed 2965.15 samples/sec Loss 7.8602 LearningRate 0.0354 Epoch: 8 Global Step: 100660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:31,988-Speed 3003.35 samples/sec Loss 7.8082 LearningRate 0.0354 Epoch: 8 Global Step: 100670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:35,369-Speed 3029.99 samples/sec Loss 7.9484 LearningRate 0.0354 Epoch: 8 Global Step: 100680 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:38,704-Speed 3071.58 samples/sec Loss 7.8321 LearningRate 0.0354 Epoch: 8 Global Step: 100690 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:14:42,153-Speed 2969.52 samples/sec Loss 7.8563 LearningRate 0.0354 Epoch: 8 Global Step: 100700 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:14:45,460-Speed 3097.80 samples/sec Loss 7.7318 LearningRate 0.0354 Epoch: 8 Global Step: 100710 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:14:48,773-Speed 3091.49 samples/sec Loss 8.0411 LearningRate 0.0353 Epoch: 8 Global Step: 100720 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:52,113-Speed 3066.33 samples/sec Loss 7.8553 LearningRate 0.0353 Epoch: 8 Global Step: 100730 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:55,502-Speed 3022.42 samples/sec Loss 7.8334 LearningRate 0.0353 Epoch: 8 Global Step: 100740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:14:58,959-Speed 2962.80 samples/sec Loss 7.8685 LearningRate 0.0353 Epoch: 8 Global Step: 100750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:02,350-Speed 3020.31 samples/sec Loss 7.9087 LearningRate 0.0353 Epoch: 8 Global Step: 100760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:05,676-Speed 3080.17 samples/sec Loss 7.8606 LearningRate 0.0353 Epoch: 8 Global Step: 100770 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-27 11:15:09,063-Speed 3024.57 samples/sec Loss 7.8402 LearningRate 0.0353 Epoch: 8 Global Step: 100780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:12,529-Speed 2955.11 samples/sec Loss 7.9247 LearningRate 0.0353 Epoch: 8 Global Step: 100790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:15,909-Speed 3030.76 samples/sec Loss 7.8795 LearningRate 0.0353 Epoch: 8 Global Step: 100800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:19,287-Speed 3032.38 samples/sec Loss 8.0003 LearningRate 0.0353 Epoch: 8 Global Step: 100810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:22,661-Speed 3035.83 samples/sec Loss 7.8783 LearningRate 0.0353 Epoch: 8 Global Step: 100820 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:15:26,035-Speed 3036.02 samples/sec Loss 7.9106 LearningRate 0.0353 Epoch: 8 Global Step: 100830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:29,373-Speed 3069.03 samples/sec Loss 7.8657 LearningRate 0.0353 Epoch: 8 Global Step: 100840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:32,725-Speed 3055.60 samples/sec Loss 7.8959 LearningRate 0.0353 Epoch: 8 Global Step: 100850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:36,050-Speed 3080.35 samples/sec Loss 7.9777 LearningRate 0.0353 Epoch: 8 Global Step: 100860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:39,447-Speed 3015.94 samples/sec Loss 7.9972 LearningRate 0.0353 Epoch: 8 Global Step: 100870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:42,796-Speed 3058.64 samples/sec Loss 7.8998 LearningRate 0.0353 Epoch: 8 Global Step: 100880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:46,234-Speed 2979.55 samples/sec Loss 7.8992 LearningRate 0.0353 Epoch: 8 Global Step: 100890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:49,652-Speed 
2996.65 samples/sec Loss 7.9697 LearningRate 0.0353 Epoch: 8 Global Step: 100900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:53,068-Speed 2998.36 samples/sec Loss 7.9922 LearningRate 0.0353 Epoch: 8 Global Step: 100910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:56,418-Speed 3057.21 samples/sec Loss 8.0528 LearningRate 0.0353 Epoch: 8 Global Step: 100920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:15:59,787-Speed 3040.22 samples/sec Loss 7.9242 LearningRate 0.0352 Epoch: 8 Global Step: 100930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:03,116-Speed 3077.15 samples/sec Loss 7.9446 LearningRate 0.0352 Epoch: 8 Global Step: 100940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:06,495-Speed 3031.72 samples/sec Loss 8.0086 LearningRate 0.0352 Epoch: 8 Global Step: 100950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:09,879-Speed 3026.72 samples/sec Loss 7.9094 LearningRate 0.0352 Epoch: 8 Global Step: 100960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:13,216-Speed 3069.66 samples/sec Loss 8.0628 LearningRate 0.0352 Epoch: 8 Global Step: 100970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:16,682-Speed 2954.95 samples/sec Loss 7.9397 LearningRate 0.0352 Epoch: 8 Global Step: 100980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:20,134-Speed 2967.11 samples/sec Loss 7.9410 LearningRate 0.0352 Epoch: 8 Global Step: 100990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:23,522-Speed 3023.52 samples/sec Loss 8.0036 LearningRate 0.0352 Epoch: 8 Global Step: 101000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:26,879-Speed 3050.59 samples/sec Loss 8.0455 LearningRate 0.0352 Epoch: 8 Global Step: 101010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:30,306-Speed 2989.01 samples/sec Loss 7.8875 
LearningRate 0.0352 Epoch: 8 Global Step: 101020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:33,647-Speed 3066.03 samples/sec Loss 8.0504 LearningRate 0.0352 Epoch: 8 Global Step: 101030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:16:37,083-Speed 2981.18 samples/sec Loss 7.9217 LearningRate 0.0352 Epoch: 8 Global Step: 101040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:40,512-Speed 2987.15 samples/sec Loss 7.9317 LearningRate 0.0352 Epoch: 8 Global Step: 101050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:43,904-Speed 3020.19 samples/sec Loss 7.8957 LearningRate 0.0352 Epoch: 8 Global Step: 101060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:47,246-Speed 3064.02 samples/sec Loss 7.9089 LearningRate 0.0352 Epoch: 8 Global Step: 101070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:50,646-Speed 3013.07 samples/sec Loss 8.0180 LearningRate 0.0352 Epoch: 8 Global Step: 101080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:54,057-Speed 3002.83 samples/sec Loss 7.9945 LearningRate 0.0352 Epoch: 8 Global Step: 101090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:16:57,459-Speed 3011.12 samples/sec Loss 7.9189 LearningRate 0.0352 Epoch: 8 Global Step: 101100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:00,868-Speed 3004.27 samples/sec Loss 7.9558 LearningRate 0.0352 Epoch: 8 Global Step: 101110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:04,222-Speed 3054.05 samples/sec Loss 7.9728 LearningRate 0.0352 Epoch: 8 Global Step: 101120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:07,711-Speed 2935.82 samples/sec Loss 7.9117 LearningRate 0.0352 Epoch: 8 Global Step: 101130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:11,088-Speed 3033.22 samples/sec Loss 7.9994 LearningRate 0.0351 Epoch: 8 Global Step: 
101140 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:14,548-Speed 2960.85 samples/sec Loss 8.0292 LearningRate 0.0351 Epoch: 8 Global Step: 101150 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:17,929-Speed 3029.90 samples/sec Loss 8.0912 LearningRate 0.0351 Epoch: 8 Global Step: 101160 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:21,306-Speed 3032.14 samples/sec Loss 8.1336 LearningRate 0.0351 Epoch: 8 Global Step: 101170 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:24,679-Speed 3037.46 samples/sec Loss 8.1369 LearningRate 0.0351 Epoch: 8 Global Step: 101180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:28,111-Speed 2984.37 samples/sec Loss 7.8505 LearningRate 0.0351 Epoch: 8 Global Step: 101190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:31,454-Speed 3064.13 samples/sec Loss 8.0211 LearningRate 0.0351 Epoch: 8 Global Step: 101200 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:34,874-Speed 2995.02 samples/sec Loss 8.0240 LearningRate 0.0351 Epoch: 8 Global Step: 101210 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:38,273-Speed 3014.05 samples/sec Loss 7.9739 LearningRate 0.0351 Epoch: 8 Global Step: 101220 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:41,730-Speed 2962.71 samples/sec Loss 7.9676 LearningRate 0.0351 Epoch: 8 Global Step: 101230 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:17:45,114-Speed 3026.98 samples/sec Loss 7.9761 LearningRate 0.0351 Epoch: 8 Global Step: 101240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:48,440-Speed 3079.31 samples/sec Loss 7.8668 LearningRate 0.0351 Epoch: 8 Global Step: 101250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:51,793-Speed 3054.86 samples/sec Loss 8.0201 LearningRate 0.0351 Epoch: 8 Global Step: 101260 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-27 11:17:55,161-Speed 3041.78 samples/sec Loss 8.0889 LearningRate 0.0351 Epoch: 8 Global Step: 101270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:17:58,539-Speed 3031.94 samples/sec Loss 8.0545 LearningRate 0.0351 Epoch: 8 Global Step: 101280 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:01,963-Speed 2991.65 samples/sec Loss 8.0656 LearningRate 0.0351 Epoch: 8 Global Step: 101290 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:05,363-Speed 3012.44 samples/sec Loss 8.1381 LearningRate 0.0351 Epoch: 8 Global Step: 101300 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:08,777-Speed 3000.51 samples/sec Loss 8.0252 LearningRate 0.0351 Epoch: 8 Global Step: 101310 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:12,164-Speed 3023.98 samples/sec Loss 8.0972 LearningRate 0.0351 Epoch: 8 Global Step: 101320 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:15,568-Speed 3008.77 samples/sec Loss 8.0957 LearningRate 0.0351 Epoch: 8 Global Step: 101330 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:19,034-Speed 2955.60 samples/sec Loss 8.1053 LearningRate 0.0351 Epoch: 8 Global Step: 101340 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:18:22,434-Speed 3012.40 samples/sec Loss 8.0719 LearningRate 0.0350 Epoch: 8 Global Step: 101350 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:18:25,781-Speed 3060.72 samples/sec Loss 8.2370 LearningRate 0.0350 Epoch: 8 Global Step: 101360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:29,200-Speed 2995.88 samples/sec Loss 7.9575 LearningRate 0.0350 Epoch: 8 Global Step: 101370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:32,556-Speed 3051.58 samples/sec Loss 8.1000 LearningRate 0.0350 Epoch: 8 Global Step: 101380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
11:18:35,951-Speed 3017.63 samples/sec Loss 8.0822 LearningRate 0.0350 Epoch: 8 Global Step: 101390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:39,323-Speed 3038.05 samples/sec Loss 8.2682 LearningRate 0.0350 Epoch: 8 Global Step: 101400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:42,761-Speed 2979.01 samples/sec Loss 8.0820 LearningRate 0.0350 Epoch: 8 Global Step: 101410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:46,209-Speed 2970.22 samples/sec Loss 8.0088 LearningRate 0.0350 Epoch: 8 Global Step: 101420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:49,605-Speed 3017.01 samples/sec Loss 8.2121 LearningRate 0.0350 Epoch: 8 Global Step: 101430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:53,010-Speed 3007.48 samples/sec Loss 7.9575 LearningRate 0.0350 Epoch: 8 Global Step: 101440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:56,443-Speed 2984.80 samples/sec Loss 8.0516 LearningRate 0.0350 Epoch: 8 Global Step: 101450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:18:59,891-Speed 2970.66 samples/sec Loss 8.3590 LearningRate 0.0350 Epoch: 8 Global Step: 101460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:19:03,244-Speed 3054.39 samples/sec Loss 8.1479 LearningRate 0.0350 Epoch: 8 Global Step: 101470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:06,636-Speed 3019.72 samples/sec Loss 7.9825 LearningRate 0.0350 Epoch: 8 Global Step: 101480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:10,033-Speed 3015.75 samples/sec Loss 8.1047 LearningRate 0.0350 Epoch: 8 Global Step: 101490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:13,418-Speed 3025.87 samples/sec Loss 8.2024 LearningRate 0.0350 Epoch: 8 Global Step: 101500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:16,831-Speed 3001.28 samples/sec 
Loss 8.2209 LearningRate 0.0350 Epoch: 8 Global Step: 101510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:20,180-Speed 3058.28 samples/sec Loss 8.1142 LearningRate 0.0350 Epoch: 8 Global Step: 101520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:23,501-Speed 3084.65 samples/sec Loss 8.0681 LearningRate 0.0350 Epoch: 8 Global Step: 101530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:26,838-Speed 3069.38 samples/sec Loss 8.2010 LearningRate 0.0350 Epoch: 8 Global Step: 101540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:30,338-Speed 2926.36 samples/sec Loss 8.1931 LearningRate 0.0350 Epoch: 8 Global Step: 101550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:33,792-Speed 2965.11 samples/sec Loss 8.1815 LearningRate 0.0349 Epoch: 8 Global Step: 101560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:37,289-Speed 2929.81 samples/sec Loss 8.0437 LearningRate 0.0349 Epoch: 8 Global Step: 101570 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:19:40,736-Speed 2971.52 samples/sec Loss 8.0222 LearningRate 0.0349 Epoch: 8 Global Step: 101580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:44,117-Speed 3029.27 samples/sec Loss 8.1585 LearningRate 0.0349 Epoch: 8 Global Step: 101590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:47,541-Speed 2991.77 samples/sec Loss 8.1066 LearningRate 0.0349 Epoch: 8 Global Step: 101600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:51,019-Speed 2945.35 samples/sec Loss 8.1146 LearningRate 0.0349 Epoch: 8 Global Step: 101610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:54,439-Speed 2994.64 samples/sec Loss 8.0734 LearningRate 0.0349 Epoch: 8 Global Step: 101620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:19:57,890-Speed 2968.36 samples/sec Loss 8.1661 LearningRate 0.0349 Epoch: 8 
Global Step: 101630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:01,304-Speed 3000.45 samples/sec Loss 7.9966 LearningRate 0.0349 Epoch: 8 Global Step: 101640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:04,675-Speed 3038.53 samples/sec Loss 8.2689 LearningRate 0.0349 Epoch: 8 Global Step: 101650 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:08,104-Speed 2987.19 samples/sec Loss 8.2822 LearningRate 0.0349 Epoch: 8 Global Step: 101660 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:11,481-Speed 3033.12 samples/sec Loss 8.1602 LearningRate 0.0349 Epoch: 8 Global Step: 101670 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:14,806-Speed 3081.48 samples/sec Loss 8.1678 LearningRate 0.0349 Epoch: 8 Global Step: 101680 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:20:18,139-Speed 3072.97 samples/sec Loss 8.1275 LearningRate 0.0349 Epoch: 8 Global Step: 101690 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:21,503-Speed 3044.72 samples/sec Loss 8.1537 LearningRate 0.0349 Epoch: 8 Global Step: 101700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:24,846-Speed 3064.03 samples/sec Loss 8.0904 LearningRate 0.0349 Epoch: 8 Global Step: 101710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:28,282-Speed 2980.96 samples/sec Loss 8.1441 LearningRate 0.0349 Epoch: 8 Global Step: 101720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:31,666-Speed 3027.00 samples/sec Loss 8.2530 LearningRate 0.0349 Epoch: 8 Global Step: 101730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:35,090-Speed 2991.36 samples/sec Loss 8.2235 LearningRate 0.0349 Epoch: 8 Global Step: 101740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:38,475-Speed 3026.64 samples/sec Loss 8.2056 LearningRate 0.0349 Epoch: 8 Global Step: 101750 Fp16 Grad Scale: 32768 
Required: 14 hours Training: 2022-04-27 11:20:41,905-Speed 2986.38 samples/sec Loss 8.1685 LearningRate 0.0349 Epoch: 8 Global Step: 101760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:45,307-Speed 3010.18 samples/sec Loss 8.1277 LearningRate 0.0348 Epoch: 8 Global Step: 101770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:48,737-Speed 2987.02 samples/sec Loss 8.2594 LearningRate 0.0348 Epoch: 8 Global Step: 101780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:52,103-Speed 3042.69 samples/sec Loss 8.3255 LearningRate 0.0348 Epoch: 8 Global Step: 101790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:20:55,460-Speed 3051.06 samples/sec Loss 8.0921 LearningRate 0.0348 Epoch: 8 Global Step: 101800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:20:58,894-Speed 2982.95 samples/sec Loss 8.0854 LearningRate 0.0348 Epoch: 8 Global Step: 101810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:02,228-Speed 3071.91 samples/sec Loss 8.2085 LearningRate 0.0348 Epoch: 8 Global Step: 101820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:05,585-Speed 3051.31 samples/sec Loss 8.1825 LearningRate 0.0348 Epoch: 8 Global Step: 101830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:08,995-Speed 3003.56 samples/sec Loss 8.3646 LearningRate 0.0348 Epoch: 8 Global Step: 101840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:12,458-Speed 2958.55 samples/sec Loss 8.2222 LearningRate 0.0348 Epoch: 8 Global Step: 101850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:15,830-Speed 3036.87 samples/sec Loss 8.2577 LearningRate 0.0348 Epoch: 8 Global Step: 101860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:19,207-Speed 3033.28 samples/sec Loss 8.2383 LearningRate 0.0348 Epoch: 8 Global Step: 101870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
11:21:22,571-Speed 3044.95 samples/sec Loss 8.2127 LearningRate 0.0348 Epoch: 8 Global Step: 101880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:25,901-Speed 3075.78 samples/sec Loss 8.1512 LearningRate 0.0348 Epoch: 8 Global Step: 101890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:29,314-Speed 3001.58 samples/sec Loss 8.1876 LearningRate 0.0348 Epoch: 8 Global Step: 101900 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:21:32,731-Speed 2997.20 samples/sec Loss 8.1462 LearningRate 0.0348 Epoch: 8 Global Step: 101910 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:21:36,082-Speed 3057.00 samples/sec Loss 8.2378 LearningRate 0.0348 Epoch: 8 Global Step: 101920 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:21:39,495-Speed 3001.01 samples/sec Loss 8.4164 LearningRate 0.0348 Epoch: 8 Global Step: 101930 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:21:42,869-Speed 3036.12 samples/sec Loss 8.1357 LearningRate 0.0348 Epoch: 8 Global Step: 101940 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:21:46,322-Speed 2967.07 samples/sec Loss 8.1619 LearningRate 0.0348 Epoch: 8 Global Step: 101950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:21:49,707-Speed 3025.90 samples/sec Loss 8.1727 LearningRate 0.0348 Epoch: 8 Global Step: 101960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:21:53,084-Speed 3033.20 samples/sec Loss 8.1749 LearningRate 0.0348 Epoch: 8 Global Step: 101970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:21:56,466-Speed 3028.96 samples/sec Loss 8.1836 LearningRate 0.0347 Epoch: 8 Global Step: 101980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:21:59,820-Speed 3054.63 samples/sec Loss 8.1694 LearningRate 0.0347 Epoch: 8 Global Step: 101990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:03,236-Speed 2998.52 samples/sec 
Loss 8.1769 LearningRate 0.0347 Epoch: 8 Global Step: 102000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:06,701-Speed 2957.03 samples/sec Loss 8.0280 LearningRate 0.0347 Epoch: 8 Global Step: 102010 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:10,129-Speed 2987.93 samples/sec Loss 8.1743 LearningRate 0.0347 Epoch: 8 Global Step: 102020 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:13,491-Speed 3046.58 samples/sec Loss 8.1273 LearningRate 0.0347 Epoch: 8 Global Step: 102030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:16,846-Speed 3053.14 samples/sec Loss 8.1346 LearningRate 0.0347 Epoch: 8 Global Step: 102040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:20,191-Speed 3062.11 samples/sec Loss 8.3165 LearningRate 0.0347 Epoch: 8 Global Step: 102050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:23,563-Speed 3037.82 samples/sec Loss 8.2700 LearningRate 0.0347 Epoch: 8 Global Step: 102060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:22:26,931-Speed 3040.86 samples/sec Loss 8.0485 LearningRate 0.0347 Epoch: 8 Global Step: 102070 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:22:30,266-Speed 3071.85 samples/sec Loss 8.0909 LearningRate 0.0347 Epoch: 8 Global Step: 102080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:22:33,632-Speed 3042.88 samples/sec Loss 8.2787 LearningRate 0.0347 Epoch: 8 Global Step: 102090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:22:37,007-Speed 3034.98 samples/sec Loss 8.2680 LearningRate 0.0347 Epoch: 8 Global Step: 102100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:40,456-Speed 2970.25 samples/sec Loss 8.2457 LearningRate 0.0347 Epoch: 8 Global Step: 102110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:43,851-Speed 3016.86 samples/sec Loss 8.2880 LearningRate 0.0347 Epoch: 8 
Global Step: 102120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:47,188-Speed 3069.44 samples/sec Loss 8.3259 LearningRate 0.0347 Epoch: 8 Global Step: 102130 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:50,574-Speed 3024.72 samples/sec Loss 8.3071 LearningRate 0.0347 Epoch: 8 Global Step: 102140 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:53,916-Speed 3065.65 samples/sec Loss 8.2635 LearningRate 0.0347 Epoch: 8 Global Step: 102150 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:22:57,285-Speed 3039.59 samples/sec Loss 8.2422 LearningRate 0.0347 Epoch: 8 Global Step: 102160 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:00,713-Speed 2988.32 samples/sec Loss 8.3085 LearningRate 0.0347 Epoch: 8 Global Step: 102170 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:04,108-Speed 3016.45 samples/sec Loss 8.2392 LearningRate 0.0347 Epoch: 8 Global Step: 102180 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:07,471-Speed 3045.95 samples/sec Loss 8.1586 LearningRate 0.0346 Epoch: 8 Global Step: 102190 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:10,839-Speed 3041.72 samples/sec Loss 8.2982 LearningRate 0.0346 Epoch: 8 Global Step: 102200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:14,220-Speed 3029.48 samples/sec Loss 8.1618 LearningRate 0.0346 Epoch: 8 Global Step: 102210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:17,552-Speed 3073.82 samples/sec Loss 8.2488 LearningRate 0.0346 Epoch: 8 Global Step: 102220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:20,995-Speed 2975.65 samples/sec Loss 8.3157 LearningRate 0.0346 Epoch: 8 Global Step: 102230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:24,428-Speed 2983.59 samples/sec Loss 8.2794 LearningRate 0.0346 Epoch: 8 Global Step: 102240 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-27 11:23:27,853-Speed 2990.32 samples/sec Loss 8.3177 LearningRate 0.0346 Epoch: 8 Global Step: 102250 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:31,274-Speed 2993.50 samples/sec Loss 8.2823 LearningRate 0.0346 Epoch: 8 Global Step: 102260 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:34,612-Speed 3069.26 samples/sec Loss 8.2192 LearningRate 0.0346 Epoch: 8 Global Step: 102270 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:23:37,950-Speed 3068.10 samples/sec Loss 8.3882 LearningRate 0.0346 Epoch: 8 Global Step: 102280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:41,340-Speed 3021.60 samples/sec Loss 8.2484 LearningRate 0.0346 Epoch: 8 Global Step: 102290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:44,768-Speed 2988.11 samples/sec Loss 8.2133 LearningRate 0.0346 Epoch: 8 Global Step: 102300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:48,165-Speed 3015.50 samples/sec Loss 8.3357 LearningRate 0.0346 Epoch: 8 Global Step: 102310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:51,655-Speed 2934.96 samples/sec Loss 8.3843 LearningRate 0.0346 Epoch: 8 Global Step: 102320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:55,101-Speed 2972.29 samples/sec Loss 8.4035 LearningRate 0.0346 Epoch: 8 Global Step: 102330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:23:58,547-Speed 2972.38 samples/sec Loss 8.2882 LearningRate 0.0346 Epoch: 8 Global Step: 102340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:24:01,965-Speed 2996.79 samples/sec Loss 8.2837 LearningRate 0.0346 Epoch: 8 Global Step: 102350 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:24:05,314-Speed 3058.11 samples/sec Loss 8.2095 LearningRate 0.0346 Epoch: 8 Global Step: 102360 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 
11:24:08,779-Speed 2956.60 samples/sec Loss 8.1861 LearningRate 0.0346 Epoch: 8 Global Step: 102370 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:24:12,178-Speed 3013.11 samples/sec Loss 8.2953 LearningRate 0.0346 Epoch: 8 Global Step: 102380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:15,517-Speed 3067.20 samples/sec Loss 8.2444 LearningRate 0.0346 Epoch: 8 Global Step: 102390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:18,905-Speed 3023.62 samples/sec Loss 8.3037 LearningRate 0.0346 Epoch: 8 Global Step: 102400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:22,214-Speed 3095.59 samples/sec Loss 8.2676 LearningRate 0.0345 Epoch: 8 Global Step: 102410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:25,528-Speed 3090.28 samples/sec Loss 8.2685 LearningRate 0.0345 Epoch: 8 Global Step: 102420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:28,867-Speed 3068.81 samples/sec Loss 8.4220 LearningRate 0.0345 Epoch: 8 Global Step: 102430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:32,168-Speed 3102.03 samples/sec Loss 8.2903 LearningRate 0.0345 Epoch: 8 Global Step: 102440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:35,605-Speed 2980.64 samples/sec Loss 8.3098 LearningRate 0.0345 Epoch: 8 Global Step: 102450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:39,016-Speed 3002.86 samples/sec Loss 8.3123 LearningRate 0.0345 Epoch: 8 Global Step: 102460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:42,447-Speed 2985.37 samples/sec Loss 8.2530 LearningRate 0.0345 Epoch: 8 Global Step: 102470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:45,781-Speed 3072.27 samples/sec Loss 8.2928 LearningRate 0.0345 Epoch: 8 Global Step: 102480 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:24:49,131-Speed 3057.29 samples/sec 
Loss 8.2725 LearningRate 0.0345 Epoch: 8 Global Step: 102490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:52,558-Speed 2989.36 samples/sec Loss 8.2933 LearningRate 0.0345 Epoch: 8 Global Step: 102500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:56,004-Speed 2972.06 samples/sec Loss 8.3472 LearningRate 0.0345 Epoch: 8 Global Step: 102510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:24:59,404-Speed 3012.88 samples/sec Loss 8.3800 LearningRate 0.0345 Epoch: 8 Global Step: 102520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:02,783-Speed 3030.79 samples/sec Loss 8.2165 LearningRate 0.0345 Epoch: 8 Global Step: 102530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:06,135-Speed 3056.31 samples/sec Loss 8.4148 LearningRate 0.0345 Epoch: 8 Global Step: 102540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:09,503-Speed 3040.85 samples/sec Loss 8.2725 LearningRate 0.0345 Epoch: 8 Global Step: 102550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:12,825-Speed 3083.65 samples/sec Loss 8.3369 LearningRate 0.0345 Epoch: 8 Global Step: 102560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:16,206-Speed 3028.90 samples/sec Loss 8.3541 LearningRate 0.0345 Epoch: 8 Global Step: 102570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:19,577-Speed 3038.74 samples/sec Loss 8.4367 LearningRate 0.0345 Epoch: 8 Global Step: 102580 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:22,920-Speed 3064.75 samples/sec Loss 8.3992 LearningRate 0.0345 Epoch: 8 Global Step: 102590 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:25:26,213-Speed 3109.94 samples/sec Loss 8.3213 LearningRate 0.0345 Epoch: 8 Global Step: 102600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:29,647-Speed 2982.47 samples/sec Loss 8.4049 LearningRate 0.0345 Epoch: 8 
Global Step: 102610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:25:32,998-Speed 3056.79 samples/sec Loss 8.3075 LearningRate 0.0344 Epoch: 8 Global Step: 102620 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:25:36,376-Speed 3032.11 samples/sec Loss 8.3383 LearningRate 0.0344 Epoch: 8 Global Step: 102630 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:25:39,764-Speed 3022.91 samples/sec Loss 8.3061 LearningRate 0.0344 Epoch: 8 Global Step: 102640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:25:43,117-Speed 3054.82 samples/sec Loss 8.3672 LearningRate 0.0344 Epoch: 8 Global Step: 102650 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:25:46,561-Speed 2974.22 samples/sec Loss 8.3413 LearningRate 0.0344 Epoch: 8 Global Step: 102660 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:25:50,003-Speed 2975.89 samples/sec Loss 8.3636 LearningRate 0.0344 Epoch: 8 Global Step: 102670 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:25:53,436-Speed 2983.32 samples/sec Loss 8.3880 LearningRate 0.0344 Epoch: 8 Global Step: 102680 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:25:56,815-Speed 3031.85 samples/sec Loss 8.2390 LearningRate 0.0344 Epoch: 8 Global Step: 102690 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:26:00,333-Speed 2911.49 samples/sec Loss 8.3847 LearningRate 0.0344 Epoch: 8 Global Step: 102700 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:26:03,679-Speed 3060.28 samples/sec Loss 8.3872 LearningRate 0.0344 Epoch: 8 Global Step: 102710 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:26:07,061-Speed 3029.25 samples/sec Loss 8.3923 LearningRate 0.0344 Epoch: 8 Global Step: 102720 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:26:10,375-Speed 3090.28 samples/sec Loss 8.2796 LearningRate 0.0344 Epoch: 8 Global Step: 102730 Fp16 Grad Scale: 16384 
Required: 14 hours Training: 2022-04-27 11:26:13,730-Speed 3053.33 samples/sec Loss 8.2465 LearningRate 0.0344 Epoch: 8 Global Step: 102740 Fp16 Grad Scale: 16384 Required: 14 hours Training: 2022-04-27 11:26:17,091-Speed 3047.69 samples/sec Loss 8.2492 LearningRate 0.0344 Epoch: 8 Global Step: 102750 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:20,537-Speed 2972.15 samples/sec Loss 8.3696 LearningRate 0.0344 Epoch: 8 Global Step: 102760 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:23,879-Speed 3064.98 samples/sec Loss 8.2492 LearningRate 0.0344 Epoch: 8 Global Step: 102770 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:27,300-Speed 2993.70 samples/sec Loss 8.3883 LearningRate 0.0344 Epoch: 8 Global Step: 102780 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:30,663-Speed 3045.99 samples/sec Loss 8.3468 LearningRate 0.0344 Epoch: 8 Global Step: 102790 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:34,014-Speed 3055.90 samples/sec Loss 8.2586 LearningRate 0.0344 Epoch: 8 Global Step: 102800 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:37,402-Speed 3023.76 samples/sec Loss 8.2845 LearningRate 0.0344 Epoch: 8 Global Step: 102810 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:40,786-Speed 3026.37 samples/sec Loss 8.2714 LearningRate 0.0344 Epoch: 8 Global Step: 102820 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:44,183-Speed 3015.64 samples/sec Loss 8.2611 LearningRate 0.0343 Epoch: 8 Global Step: 102830 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:47,564-Speed 3029.81 samples/sec Loss 8.3605 LearningRate 0.0343 Epoch: 8 Global Step: 102840 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:26:50,873-Speed 3095.10 samples/sec Loss 8.4252 LearningRate 0.0343 Epoch: 8 Global Step: 102850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
11:26:54,221-Speed 3059.77 samples/sec Loss 8.3168 LearningRate 0.0343 Epoch: 8 Global Step: 102860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:26:57,590-Speed 3040.20 samples/sec Loss 8.3159 LearningRate 0.0343 Epoch: 8 Global Step: 102870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:00,947-Speed 3051.80 samples/sec Loss 8.3022 LearningRate 0.0343 Epoch: 8 Global Step: 102880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:04,368-Speed 2994.08 samples/sec Loss 8.2961 LearningRate 0.0343 Epoch: 8 Global Step: 102890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:07,731-Speed 3045.91 samples/sec Loss 8.3447 LearningRate 0.0343 Epoch: 8 Global Step: 102900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:11,149-Speed 2996.43 samples/sec Loss 8.3945 LearningRate 0.0343 Epoch: 8 Global Step: 102910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:14,519-Speed 3039.26 samples/sec Loss 8.3415 LearningRate 0.0343 Epoch: 8 Global Step: 102920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:17,953-Speed 2983.06 samples/sec Loss 8.3377 LearningRate 0.0343 Epoch: 8 Global Step: 102930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:21,341-Speed 3022.93 samples/sec Loss 8.4296 LearningRate 0.0343 Epoch: 8 Global Step: 102940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:24,772-Speed 2985.56 samples/sec Loss 8.2287 LearningRate 0.0343 Epoch: 8 Global Step: 102950 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:27:28,133-Speed 3048.35 samples/sec Loss 8.3038 LearningRate 0.0343 Epoch: 8 Global Step: 102960 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:31,577-Speed 2973.78 samples/sec Loss 8.4366 LearningRate 0.0343 Epoch: 8 Global Step: 102970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:34,908-Speed 3074.77 samples/sec 
Loss 8.4450 LearningRate 0.0343 Epoch: 8 Global Step: 102980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:38,247-Speed 3068.94 samples/sec Loss 8.3484 LearningRate 0.0343 Epoch: 8 Global Step: 102990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:41,550-Speed 3101.07 samples/sec Loss 8.3170 LearningRate 0.0343 Epoch: 8 Global Step: 103000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:44,968-Speed 2996.42 samples/sec Loss 8.3047 LearningRate 0.0343 Epoch: 8 Global Step: 103010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:48,356-Speed 3023.39 samples/sec Loss 8.3631 LearningRate 0.0343 Epoch: 8 Global Step: 103020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:27:51,695-Speed 3067.84 samples/sec Loss 8.2420 LearningRate 0.0343 Epoch: 8 Global Step: 103030 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:27:55,065-Speed 3039.66 samples/sec Loss 8.1933 LearningRate 0.0342 Epoch: 8 Global Step: 103040 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:27:58,434-Speed 3040.16 samples/sec Loss 8.3451 LearningRate 0.0342 Epoch: 8 Global Step: 103050 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:01,780-Speed 3060.92 samples/sec Loss 8.2958 LearningRate 0.0342 Epoch: 8 Global Step: 103060 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:05,123-Speed 3064.20 samples/sec Loss 8.1775 LearningRate 0.0342 Epoch: 8 Global Step: 103070 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:08,461-Speed 3068.42 samples/sec Loss 8.2809 LearningRate 0.0342 Epoch: 8 Global Step: 103080 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:11,789-Speed 3078.95 samples/sec Loss 8.4287 LearningRate 0.0342 Epoch: 8 Global Step: 103090 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:15,139-Speed 3057.09 samples/sec Loss 8.3409 LearningRate 0.0342 Epoch: 8 
Global Step: 103100 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:18,460-Speed 3084.11 samples/sec Loss 8.3758 LearningRate 0.0342 Epoch: 8 Global Step: 103110 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:21,896-Speed 2981.18 samples/sec Loss 8.3588 LearningRate 0.0342 Epoch: 8 Global Step: 103120 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:28:25,338-Speed 2976.51 samples/sec Loss 8.2894 LearningRate 0.0342 Epoch: 8 Global Step: 103130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:28,781-Speed 2974.97 samples/sec Loss 8.3303 LearningRate 0.0342 Epoch: 8 Global Step: 103140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:32,096-Speed 3089.57 samples/sec Loss 8.2796 LearningRate 0.0342 Epoch: 8 Global Step: 103150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:35,453-Speed 3051.02 samples/sec Loss 8.4377 LearningRate 0.0342 Epoch: 8 Global Step: 103160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:38,791-Speed 3068.11 samples/sec Loss 8.5243 LearningRate 0.0342 Epoch: 8 Global Step: 103170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:42,176-Speed 3026.25 samples/sec Loss 8.4169 LearningRate 0.0342 Epoch: 8 Global Step: 103180 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:45,536-Speed 3048.44 samples/sec Loss 8.2953 LearningRate 0.0342 Epoch: 8 Global Step: 103190 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:48,914-Speed 3032.36 samples/sec Loss 8.3225 LearningRate 0.0342 Epoch: 8 Global Step: 103200 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:52,289-Speed 3034.73 samples/sec Loss 8.3509 LearningRate 0.0342 Epoch: 8 Global Step: 103210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:28:55,718-Speed 2986.71 samples/sec Loss 8.3476 LearningRate 0.0342 Epoch: 8 Global Step: 103220 Fp16 Grad Scale: 65536 
Required: 14 hours Training: 2022-04-27 11:28:59,069-Speed 3057.16 samples/sec Loss 8.4900 LearningRate 0.0342 Epoch: 8 Global Step: 103230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:29:02,543-Speed 2948.23 samples/sec Loss 8.3501 LearningRate 0.0342 Epoch: 8 Global Step: 103240 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:05,860-Speed 3087.44 samples/sec Loss 8.3797 LearningRate 0.0341 Epoch: 8 Global Step: 103250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:09,313-Speed 2966.66 samples/sec Loss 8.3535 LearningRate 0.0341 Epoch: 8 Global Step: 103260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:12,696-Speed 3027.42 samples/sec Loss 8.3077 LearningRate 0.0341 Epoch: 8 Global Step: 103270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:16,091-Speed 3017.14 samples/sec Loss 8.2474 LearningRate 0.0341 Epoch: 8 Global Step: 103280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:19,466-Speed 3035.61 samples/sec Loss 8.4115 LearningRate 0.0341 Epoch: 8 Global Step: 103290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:22,786-Speed 3084.90 samples/sec Loss 8.4038 LearningRate 0.0341 Epoch: 8 Global Step: 103300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:26,158-Speed 3036.93 samples/sec Loss 8.4699 LearningRate 0.0341 Epoch: 8 Global Step: 103310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:29,551-Speed 3019.34 samples/sec Loss 8.2388 LearningRate 0.0341 Epoch: 8 Global Step: 103320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:32,879-Speed 3077.70 samples/sec Loss 8.3126 LearningRate 0.0341 Epoch: 8 Global Step: 103330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:29:36,346-Speed 2954.63 samples/sec Loss 8.4372 LearningRate 0.0341 Epoch: 8 Global Step: 103340 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
11:29:39,808-Speed 2958.24 samples/sec Loss 8.2639 LearningRate 0.0341 Epoch: 8 Global Step: 103350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:29:43,229-Speed 2994.32 samples/sec Loss 8.2713 LearningRate 0.0341 Epoch: 8 Global Step: 103360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:29:46,583-Speed 3053.46 samples/sec Loss 8.3447 LearningRate 0.0341 Epoch: 8 Global Step: 103370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:29:49,996-Speed 3001.81 samples/sec Loss 8.4997 LearningRate 0.0341 Epoch: 8 Global Step: 103380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:29:53,412-Speed 2998.23 samples/sec Loss 8.4370 LearningRate 0.0341 Epoch: 8 Global Step: 103390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:29:56,801-Speed 3022.04 samples/sec Loss 8.4282 LearningRate 0.0341 Epoch: 8 Global Step: 103400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:00,192-Speed 3020.71 samples/sec Loss 8.3596 LearningRate 0.0341 Epoch: 8 Global Step: 103410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:03,564-Speed 3037.76 samples/sec Loss 8.3298 LearningRate 0.0341 Epoch: 8 Global Step: 103420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:06,937-Speed 3036.73 samples/sec Loss 8.4987 LearningRate 0.0341 Epoch: 8 Global Step: 103430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:10,331-Speed 3018.19 samples/sec Loss 8.3036 LearningRate 0.0341 Epoch: 8 Global Step: 103440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:13,705-Speed 3036.14 samples/sec Loss 8.4741 LearningRate 0.0341 Epoch: 8 Global Step: 103450 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:17,008-Speed 3100.75 samples/sec Loss 8.3621 LearningRate 0.0341 Epoch: 8 Global Step: 103460 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:20,330-Speed 3083.92 samples/sec Loss 
8.3195 LearningRate 0.0340 Epoch: 8 Global Step: 103470 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:23,696-Speed 3042.82 samples/sec Loss 8.3024 LearningRate 0.0340 Epoch: 8 Global Step: 103480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:27,063-Speed 3041.59 samples/sec Loss 8.3988 LearningRate 0.0340 Epoch: 8 Global Step: 103490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:30,434-Speed 3039.04 samples/sec Loss 8.5404 LearningRate 0.0340 Epoch: 8 Global Step: 103500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:30:33,807-Speed 3036.77 samples/sec Loss 8.4133 LearningRate 0.0340 Epoch: 8 Global Step: 103510 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:37,213-Speed 3007.33 samples/sec Loss 8.3633 LearningRate 0.0340 Epoch: 8 Global Step: 103520 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:40,575-Speed 3046.31 samples/sec Loss 8.3315 LearningRate 0.0340 Epoch: 8 Global Step: 103530 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:43,940-Speed 3044.14 samples/sec Loss 8.2543 LearningRate 0.0340 Epoch: 8 Global Step: 103540 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:47,273-Speed 3073.73 samples/sec Loss 8.3298 LearningRate 0.0340 Epoch: 8 Global Step: 103550 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:50,672-Speed 3013.13 samples/sec Loss 8.3395 LearningRate 0.0340 Epoch: 8 Global Step: 103560 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:54,080-Speed 3005.52 samples/sec Loss 8.3581 LearningRate 0.0340 Epoch: 8 Global Step: 103570 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:30:57,470-Speed 3021.34 samples/sec Loss 8.3681 LearningRate 0.0340 Epoch: 8 Global Step: 103580 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:00,800-Speed 3075.60 samples/sec Loss 8.1693 LearningRate 0.0340 Epoch: 8 Global 
Step: 103590 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:04,114-Speed 3090.98 samples/sec Loss 8.1778 LearningRate 0.0340 Epoch: 8 Global Step: 103600 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:07,488-Speed 3035.46 samples/sec Loss 8.3617 LearningRate 0.0340 Epoch: 8 Global Step: 103610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:31:10,812-Speed 3081.82 samples/sec Loss 8.2665 LearningRate 0.0340 Epoch: 8 Global Step: 103620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:31:14,152-Speed 3066.89 samples/sec Loss 8.4255 LearningRate 0.0340 Epoch: 8 Global Step: 103630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:31:17,485-Speed 3072.72 samples/sec Loss 8.3229 LearningRate 0.0340 Epoch: 8 Global Step: 103640 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:20,808-Speed 3082.52 samples/sec Loss 8.4612 LearningRate 0.0340 Epoch: 8 Global Step: 103650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:24,171-Speed 3045.85 samples/sec Loss 8.3678 LearningRate 0.0340 Epoch: 8 Global Step: 103660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:27,498-Speed 3078.67 samples/sec Loss 8.3221 LearningRate 0.0340 Epoch: 8 Global Step: 103670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:30,865-Speed 3042.88 samples/sec Loss 8.4352 LearningRate 0.0339 Epoch: 8 Global Step: 103680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:34,258-Speed 3018.49 samples/sec Loss 8.3112 LearningRate 0.0339 Epoch: 8 Global Step: 103690 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:37,602-Speed 3062.94 samples/sec Loss 8.2966 LearningRate 0.0339 Epoch: 8 Global Step: 103700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:40,974-Speed 3037.80 samples/sec Loss 8.3626 LearningRate 0.0339 Epoch: 8 Global Step: 103710 Fp16 Grad Scale: 32768 
Required: 14 hours Training: 2022-04-27 11:31:44,314-Speed 3066.38 samples/sec Loss 8.2993 LearningRate 0.0339 Epoch: 8 Global Step: 103720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:47,739-Speed 2990.17 samples/sec Loss 8.3725 LearningRate 0.0339 Epoch: 8 Global Step: 103730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:31:51,138-Speed 3013.17 samples/sec Loss 8.2950 LearningRate 0.0339 Epoch: 8 Global Step: 103740 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:31:54,461-Speed 3082.90 samples/sec Loss 8.4583 LearningRate 0.0339 Epoch: 8 Global Step: 103750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:31:57,814-Speed 3055.13 samples/sec Loss 8.3601 LearningRate 0.0339 Epoch: 8 Global Step: 103760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:01,273-Speed 2961.06 samples/sec Loss 8.3889 LearningRate 0.0339 Epoch: 8 Global Step: 103770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:04,714-Speed 2976.78 samples/sec Loss 8.3642 LearningRate 0.0339 Epoch: 8 Global Step: 103780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:08,074-Speed 3048.67 samples/sec Loss 8.3393 LearningRate 0.0339 Epoch: 8 Global Step: 103790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:11,450-Speed 3033.63 samples/sec Loss 8.4609 LearningRate 0.0339 Epoch: 8 Global Step: 103800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:14,813-Speed 3045.98 samples/sec Loss 8.4282 LearningRate 0.0339 Epoch: 8 Global Step: 103810 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:18,214-Speed 3012.61 samples/sec Loss 8.4043 LearningRate 0.0339 Epoch: 8 Global Step: 103820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:21,525-Speed 3093.80 samples/sec Loss 8.3497 LearningRate 0.0339 Epoch: 8 Global Step: 103830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 
11:32:24,859-Speed 3072.25 samples/sec Loss 8.4127 LearningRate 0.0339 Epoch: 8 Global Step: 103840 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:32:28,251-Speed 3019.34 samples/sec Loss 8.4132 LearningRate 0.0339 Epoch: 8 Global Step: 103850 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:32:31,731-Speed 2943.30 samples/sec Loss 8.3773 LearningRate 0.0339 Epoch: 8 Global Step: 103860 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:35,228-Speed 2929.49 samples/sec Loss 8.2965 LearningRate 0.0339 Epoch: 8 Global Step: 103870 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:38,636-Speed 3005.22 samples/sec Loss 8.4393 LearningRate 0.0339 Epoch: 8 Global Step: 103880 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:42,047-Speed 3003.48 samples/sec Loss 8.3729 LearningRate 0.0338 Epoch: 8 Global Step: 103890 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:45,474-Speed 2988.24 samples/sec Loss 8.3211 LearningRate 0.0338 Epoch: 8 Global Step: 103900 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:48,884-Speed 3003.93 samples/sec Loss 8.2000 LearningRate 0.0338 Epoch: 8 Global Step: 103910 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:52,322-Speed 2979.62 samples/sec Loss 8.3035 LearningRate 0.0338 Epoch: 8 Global Step: 103920 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:55,729-Speed 3006.38 samples/sec Loss 8.3976 LearningRate 0.0338 Epoch: 8 Global Step: 103930 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:32:59,132-Speed 3009.51 samples/sec Loss 8.3710 LearningRate 0.0338 Epoch: 8 Global Step: 103940 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:02,586-Speed 2965.64 samples/sec Loss 8.3440 LearningRate 0.0338 Epoch: 8 Global Step: 103950 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:05,981-Speed 3017.54 samples/sec 
Loss 8.4830 LearningRate 0.0338 Epoch: 8 Global Step: 103960 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:33:09,334-Speed 3053.88 samples/sec Loss 8.3957 LearningRate 0.0338 Epoch: 8 Global Step: 103970 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:12,697-Speed 3046.14 samples/sec Loss 8.4019 LearningRate 0.0338 Epoch: 8 Global Step: 103980 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:16,094-Speed 3015.36 samples/sec Loss 8.4672 LearningRate 0.0338 Epoch: 8 Global Step: 103990 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:19,404-Speed 3094.04 samples/sec Loss 8.3929 LearningRate 0.0338 Epoch: 8 Global Step: 104000 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:22,806-Speed 3011.47 samples/sec Loss 8.3760 LearningRate 0.0338 Epoch: 8 Global Step: 104010 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:26,213-Speed 3005.83 samples/sec Loss 8.4065 LearningRate 0.0338 Epoch: 8 Global Step: 104020 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:29,610-Speed 3015.17 samples/sec Loss 8.3463 LearningRate 0.0338 Epoch: 8 Global Step: 104030 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:32,951-Speed 3065.97 samples/sec Loss 8.2945 LearningRate 0.0338 Epoch: 8 Global Step: 104040 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:36,388-Speed 2980.03 samples/sec Loss 8.4185 LearningRate 0.0338 Epoch: 8 Global Step: 104050 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:39,750-Speed 3048.47 samples/sec Loss 8.4957 LearningRate 0.0338 Epoch: 8 Global Step: 104060 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:43,138-Speed 3022.85 samples/sec Loss 8.3054 LearningRate 0.0338 Epoch: 8 Global Step: 104070 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:33:46,513-Speed 3034.55 samples/sec Loss 8.3909 LearningRate 0.0338 Epoch: 8 
Global Step: 104080 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:49,907-Speed 3018.38 samples/sec Loss 8.3783 LearningRate 0.0338 Epoch: 8 Global Step: 104090 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:53,308-Speed 3011.79 samples/sec Loss 8.4518 LearningRate 0.0338 Epoch: 8 Global Step: 104100 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:33:56,696-Speed 3023.15 samples/sec Loss 8.3408 LearningRate 0.0337 Epoch: 8 Global Step: 104110 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:00,201-Speed 2923.16 samples/sec Loss 8.3400 LearningRate 0.0337 Epoch: 8 Global Step: 104120 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:03,549-Speed 3058.94 samples/sec Loss 8.3222 LearningRate 0.0337 Epoch: 8 Global Step: 104130 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:06,882-Speed 3073.60 samples/sec Loss 8.3237 LearningRate 0.0337 Epoch: 8 Global Step: 104140 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:10,286-Speed 3008.74 samples/sec Loss 8.3180 LearningRate 0.0337 Epoch: 8 Global Step: 104150 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:13,717-Speed 2985.21 samples/sec Loss 8.4130 LearningRate 0.0337 Epoch: 8 Global Step: 104160 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:17,112-Speed 3017.66 samples/sec Loss 8.2950 LearningRate 0.0337 Epoch: 8 Global Step: 104170 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:20,518-Speed 3007.40 samples/sec Loss 8.4364 LearningRate 0.0337 Epoch: 8 Global Step: 104180 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:34:23,878-Speed 3047.78 samples/sec Loss 8.3063 LearningRate 0.0337 Epoch: 8 Global Step: 104190 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:34:27,319-Speed 2976.87 samples/sec Loss 8.2708 LearningRate 0.0337 Epoch: 8 Global Step: 104200 Fp16 Grad Scale: 
131072 Required: 14 hours Training: 2022-04-27 11:34:30,699-Speed 3030.32 samples/sec Loss 8.3318 LearningRate 0.0337 Epoch: 8 Global Step: 104210 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:34,181-Speed 2942.48 samples/sec Loss 8.1946 LearningRate 0.0337 Epoch: 8 Global Step: 104220 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:37,542-Speed 3047.32 samples/sec Loss 8.4347 LearningRate 0.0337 Epoch: 8 Global Step: 104230 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:40,881-Speed 3067.92 samples/sec Loss 8.3593 LearningRate 0.0337 Epoch: 8 Global Step: 104240 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:34:44,270-Speed 3022.41 samples/sec Loss 8.3029 LearningRate 0.0337 Epoch: 8 Global Step: 104250 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:34:47,615-Speed 3062.03 samples/sec Loss 8.3705 LearningRate 0.0337 Epoch: 8 Global Step: 104260 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:34:51,013-Speed 3014.24 samples/sec Loss 8.3995 LearningRate 0.0337 Epoch: 8 Global Step: 104270 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:34:54,421-Speed 3005.40 samples/sec Loss 8.3593 LearningRate 0.0337 Epoch: 8 Global Step: 104280 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:34:57,836-Speed 2998.79 samples/sec Loss 8.3432 LearningRate 0.0337 Epoch: 8 Global Step: 104290 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:35:01,298-Speed 2959.16 samples/sec Loss 8.3815 LearningRate 0.0337 Epoch: 8 Global Step: 104300 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:35:04,654-Speed 3051.72 samples/sec Loss 8.4471 LearningRate 0.0337 Epoch: 8 Global Step: 104310 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:35:08,024-Speed 3039.58 samples/sec Loss 8.4124 LearningRate 0.0336 Epoch: 8 Global Step: 104320 Fp16 Grad Scale: 32768 Required: 14 hours Training: 
2022-04-27 11:35:11,364-Speed 3066.86 samples/sec Loss 8.3669 LearningRate 0.0336 Epoch: 8 Global Step: 104330 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:35:14,681-Speed 3087.53 samples/sec Loss 8.3089 LearningRate 0.0336 Epoch: 8 Global Step: 104340 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:35:18,072-Speed 3020.95 samples/sec Loss 8.3952 LearningRate 0.0336 Epoch: 8 Global Step: 104350 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:21,476-Speed 3008.92 samples/sec Loss 8.4204 LearningRate 0.0336 Epoch: 8 Global Step: 104360 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:24,893-Speed 2997.84 samples/sec Loss 8.5085 LearningRate 0.0336 Epoch: 8 Global Step: 104370 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:28,325-Speed 2985.17 samples/sec Loss 8.3412 LearningRate 0.0336 Epoch: 8 Global Step: 104380 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:31,756-Speed 2985.02 samples/sec Loss 8.3133 LearningRate 0.0336 Epoch: 8 Global Step: 104390 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:35,124-Speed 3040.96 samples/sec Loss 8.4328 LearningRate 0.0336 Epoch: 8 Global Step: 104400 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:38,496-Speed 3037.77 samples/sec Loss 8.3808 LearningRate 0.0336 Epoch: 8 Global Step: 104410 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:41,898-Speed 3011.66 samples/sec Loss 8.3856 LearningRate 0.0336 Epoch: 8 Global Step: 104420 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:45,263-Speed 3043.97 samples/sec Loss 8.4259 LearningRate 0.0336 Epoch: 8 Global Step: 104430 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:48,667-Speed 3008.77 samples/sec Loss 8.3449 LearningRate 0.0336 Epoch: 8 Global Step: 104440 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:35:52,048-Speed 3029.86 
samples/sec Loss 8.4418 LearningRate 0.0336 Epoch: 8 Global Step: 104450 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:35:55,478-Speed 2985.90 samples/sec Loss 8.2689 LearningRate 0.0336 Epoch: 8 Global Step: 104460 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:35:58,835-Speed 3050.74 samples/sec Loss 8.4356 LearningRate 0.0336 Epoch: 8 Global Step: 104470 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:36:02,145-Speed 3094.67 samples/sec Loss 8.3974 LearningRate 0.0336 Epoch: 8 Global Step: 104480 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:05,584-Speed 2979.04 samples/sec Loss 8.3510 LearningRate 0.0336 Epoch: 8 Global Step: 104490 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:08,980-Speed 3015.67 samples/sec Loss 8.4521 LearningRate 0.0336 Epoch: 8 Global Step: 104500 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:12,347-Speed 3042.83 samples/sec Loss 8.3690 LearningRate 0.0336 Epoch: 8 Global Step: 104510 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:15,739-Speed 3018.82 samples/sec Loss 8.3584 LearningRate 0.0336 Epoch: 8 Global Step: 104520 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:19,096-Speed 3051.68 samples/sec Loss 8.3761 LearningRate 0.0335 Epoch: 8 Global Step: 104530 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:22,505-Speed 3005.03 samples/sec Loss 8.3258 LearningRate 0.0335 Epoch: 8 Global Step: 104540 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:25,995-Speed 2934.56 samples/sec Loss 8.4083 LearningRate 0.0335 Epoch: 8 Global Step: 104550 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:29,429-Speed 2982.66 samples/sec Loss 8.4574 LearningRate 0.0335 Epoch: 8 Global Step: 104560 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:32,892-Speed 2958.40 samples/sec Loss 8.3775 LearningRate 
0.0335 Epoch: 8 Global Step: 104570 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:36,325-Speed 2983.71 samples/sec Loss 8.3405 LearningRate 0.0335 Epoch: 8 Global Step: 104580 Fp16 Grad Scale: 131072 Required: 14 hours Training: 2022-04-27 11:36:39,793-Speed 2953.94 samples/sec Loss 8.4439 LearningRate 0.0335 Epoch: 8 Global Step: 104590 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:43,246-Speed 2965.70 samples/sec Loss 8.3292 LearningRate 0.0335 Epoch: 8 Global Step: 104600 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:46,618-Speed 3037.59 samples/sec Loss 8.2283 LearningRate 0.0335 Epoch: 8 Global Step: 104610 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:50,075-Speed 2963.75 samples/sec Loss 8.2956 LearningRate 0.0335 Epoch: 8 Global Step: 104620 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:53,561-Speed 2937.92 samples/sec Loss 8.3085 LearningRate 0.0335 Epoch: 8 Global Step: 104630 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:36:56,999-Speed 2979.45 samples/sec Loss 8.3731 LearningRate 0.0335 Epoch: 8 Global Step: 104640 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:00,394-Speed 3016.98 samples/sec Loss 8.4992 LearningRate 0.0335 Epoch: 8 Global Step: 104650 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:03,733-Speed 3068.04 samples/sec Loss 8.3397 LearningRate 0.0335 Epoch: 8 Global Step: 104660 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:07,051-Speed 3086.69 samples/sec Loss 8.3424 LearningRate 0.0335 Epoch: 8 Global Step: 104670 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:10,435-Speed 3027.12 samples/sec Loss 8.2725 LearningRate 0.0335 Epoch: 8 Global Step: 104680 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:13,902-Speed 2954.58 samples/sec Loss 8.4453 LearningRate 0.0335 Epoch: 8 Global Step: 104690 Fp16 
Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:17,283-Speed 3029.39 samples/sec Loss 8.4584 LearningRate 0.0335 Epoch: 8 Global Step: 104700 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:20,683-Speed 3012.38 samples/sec Loss 8.3908 LearningRate 0.0335 Epoch: 8 Global Step: 104710 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:24,029-Speed 3061.62 samples/sec Loss 8.4931 LearningRate 0.0335 Epoch: 8 Global Step: 104720 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:27,425-Speed 3016.10 samples/sec Loss 8.3973 LearningRate 0.0335 Epoch: 8 Global Step: 104730 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:30,853-Speed 2987.68 samples/sec Loss 8.4040 LearningRate 0.0335 Epoch: 8 Global Step: 104740 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:37:34,307-Speed 2965.28 samples/sec Loss 8.4401 LearningRate 0.0334 Epoch: 8 Global Step: 104750 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:37,743-Speed 2981.09 samples/sec Loss 8.3098 LearningRate 0.0334 Epoch: 8 Global Step: 104760 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:41,149-Speed 3007.37 samples/sec Loss 8.3909 LearningRate 0.0334 Epoch: 8 Global Step: 104770 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:44,585-Speed 2981.74 samples/sec Loss 8.5689 LearningRate 0.0334 Epoch: 8 Global Step: 104780 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:48,052-Speed 2954.60 samples/sec Loss 8.4053 LearningRate 0.0334 Epoch: 8 Global Step: 104790 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:51,487-Speed 2981.58 samples/sec Loss 8.2511 LearningRate 0.0334 Epoch: 8 Global Step: 104800 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:37:54,844-Speed 3051.05 samples/sec Loss 8.2185 LearningRate 0.0334 Epoch: 8 Global Step: 104810 Fp16 Grad Scale: 65536 Required: 14 hours 
Training: 2022-04-27 11:37:58,254-Speed 3003.54 samples/sec Loss 8.3651 LearningRate 0.0334 Epoch: 8 Global Step: 104820 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:38:01,640-Speed 3025.42 samples/sec Loss 8.3026 LearningRate 0.0334 Epoch: 8 Global Step: 104830 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:38:04,977-Speed 3069.01 samples/sec Loss 8.4076 LearningRate 0.0334 Epoch: 8 Global Step: 104840 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:38:08,319-Speed 3065.37 samples/sec Loss 8.4816 LearningRate 0.0334 Epoch: 8 Global Step: 104850 Fp16 Grad Scale: 65536 Required: 14 hours Training: 2022-04-27 11:38:11,754-Speed 2981.94 samples/sec Loss 8.5012 LearningRate 0.0334 Epoch: 8 Global Step: 104860 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:15,053-Speed 3104.85 samples/sec Loss 8.3228 LearningRate 0.0334 Epoch: 8 Global Step: 104870 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:18,955-Speed 2625.25 samples/sec Loss 8.3778 LearningRate 0.0334 Epoch: 8 Global Step: 104880 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:22,412-Speed 2962.36 samples/sec Loss 8.4669 LearningRate 0.0334 Epoch: 8 Global Step: 104890 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:25,786-Speed 3037.13 samples/sec Loss 8.3416 LearningRate 0.0334 Epoch: 8 Global Step: 104900 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:30,439-Speed 2201.73 samples/sec Loss 8.4113 LearningRate 0.0334 Epoch: 8 Global Step: 104910 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:34,539-Speed 2497.67 samples/sec Loss 8.4957 LearningRate 0.0334 Epoch: 8 Global Step: 104920 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:38,023-Speed 2940.12 samples/sec Loss 8.3436 LearningRate 0.0334 Epoch: 8 Global Step: 104930 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:41,400-Speed 
3033.09 samples/sec Loss 8.4082 LearningRate 0.0334 Epoch: 8 Global Step: 104940 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:44,785-Speed 3026.28 samples/sec Loss 8.3296 LearningRate 0.0334 Epoch: 8 Global Step: 104950 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:48,288-Speed 2923.89 samples/sec Loss 8.1805 LearningRate 0.0333 Epoch: 8 Global Step: 104960 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:51,698-Speed 3004.25 samples/sec Loss 8.3375 LearningRate 0.0333 Epoch: 8 Global Step: 104970 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:55,068-Speed 3039.46 samples/sec Loss 8.3310 LearningRate 0.0333 Epoch: 8 Global Step: 104980 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:38:58,462-Speed 3017.41 samples/sec Loss 8.3095 LearningRate 0.0333 Epoch: 8 Global Step: 104990 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:39:01,821-Speed 3050.19 samples/sec Loss 8.5139 LearningRate 0.0333 Epoch: 8 Global Step: 105000 Fp16 Grad Scale: 32768 Required: 14 hours Training: 2022-04-27 11:39:05,161-Speed 3066.30 samples/sec Loss 8.4247 LearningRate 0.0333 Epoch: 8 Global Step: 105010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:08,540-Speed 3031.73 samples/sec Loss 8.4797 LearningRate 0.0333 Epoch: 8 Global Step: 105020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:11,905-Speed 3044.01 samples/sec Loss 8.2274 LearningRate 0.0333 Epoch: 8 Global Step: 105030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:15,280-Speed 3035.03 samples/sec Loss 8.3400 LearningRate 0.0333 Epoch: 8 Global Step: 105040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:18,589-Speed 3097.86 samples/sec Loss 8.3813 LearningRate 0.0333 Epoch: 8 Global Step: 105050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:21,943-Speed 3054.09 samples/sec Loss 8.2386 
LearningRate 0.0333 Epoch: 8 Global Step: 105060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:39:25,251-Speed 3096.88 samples/sec Loss 8.3594 LearningRate 0.0333 Epoch: 8 Global Step: 105070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:39:28,605-Speed 3053.61 samples/sec Loss 8.4718 LearningRate 0.0333 Epoch: 8 Global Step: 105080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:39:31,962-Speed 3051.15 samples/sec Loss 8.3518 LearningRate 0.0333 Epoch: 8 Global Step: 105090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:35,383-Speed 2994.56 samples/sec Loss 8.2172 LearningRate 0.0333 Epoch: 8 Global Step: 105100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:38,768-Speed 3025.38 samples/sec Loss 8.3437 LearningRate 0.0333 Epoch: 8 Global Step: 105110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:42,209-Speed 2977.34 samples/sec Loss 8.5053 LearningRate 0.0333 Epoch: 8 Global Step: 105120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:45,577-Speed 3040.83 samples/sec Loss 8.4245 LearningRate 0.0333 Epoch: 8 Global Step: 105130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:48,965-Speed 3023.31 samples/sec Loss 8.4332 LearningRate 0.0333 Epoch: 8 Global Step: 105140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:52,383-Speed 2996.99 samples/sec Loss 8.2948 LearningRate 0.0333 Epoch: 8 Global Step: 105150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:55,713-Speed 3075.34 samples/sec Loss 8.1726 LearningRate 0.0333 Epoch: 8 Global Step: 105160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:39:59,118-Speed 3009.67 samples/sec Loss 8.4554 LearningRate 0.0333 Epoch: 8 Global Step: 105170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:40:02,539-Speed 2993.76 samples/sec Loss 8.4310 LearningRate 0.0332 Epoch: 8 Global Step: 
105180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:40:05,837-Speed 3105.61 samples/sec Loss 8.3679 LearningRate 0.0332 Epoch: 8 Global Step: 105190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:09,196-Speed 3050.04 samples/sec Loss 8.4255 LearningRate 0.0332 Epoch: 8 Global Step: 105200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:12,619-Speed 2992.59 samples/sec Loss 8.3949 LearningRate 0.0332 Epoch: 8 Global Step: 105210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:16,071-Speed 2966.79 samples/sec Loss 8.3291 LearningRate 0.0332 Epoch: 8 Global Step: 105220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:19,515-Speed 2974.47 samples/sec Loss 8.2949 LearningRate 0.0332 Epoch: 8 Global Step: 105230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:22,859-Speed 3062.62 samples/sec Loss 8.4525 LearningRate 0.0332 Epoch: 8 Global Step: 105240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:26,162-Speed 3101.55 samples/sec Loss 8.3903 LearningRate 0.0332 Epoch: 8 Global Step: 105250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:29,585-Speed 2992.59 samples/sec Loss 8.3047 LearningRate 0.0332 Epoch: 8 Global Step: 105260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:34,175-Speed 2231.33 samples/sec Loss 8.3342 LearningRate 0.0332 Epoch: 8 Global Step: 105270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:37,506-Speed 3075.50 samples/sec Loss 8.3083 LearningRate 0.0332 Epoch: 8 Global Step: 105280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:40:40,816-Speed 3094.36 samples/sec Loss 8.3567 LearningRate 0.0332 Epoch: 8 Global Step: 105290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:40:44,266-Speed 2969.20 samples/sec Loss 8.3041 LearningRate 0.0332 Epoch: 8 Global Step: 105300 Fp16 Grad Scale: 32768 Required: 13 
hours Training: 2022-04-27 11:40:47,622-Speed 3051.70 samples/sec Loss 8.4367 LearningRate 0.0332 Epoch: 8 Global Step: 105310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:40:50,980-Speed 3050.60 samples/sec Loss 8.3533 LearningRate 0.0332 Epoch: 8 Global Step: 105320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:40:54,438-Speed 2962.82 samples/sec Loss 8.4140 LearningRate 0.0332 Epoch: 8 Global Step: 105330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:40:57,860-Speed 2993.22 samples/sec Loss 8.4350 LearningRate 0.0332 Epoch: 8 Global Step: 105340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:01,179-Speed 3085.30 samples/sec Loss 8.2361 LearningRate 0.0332 Epoch: 8 Global Step: 105350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:04,524-Speed 3062.92 samples/sec Loss 8.4529 LearningRate 0.0332 Epoch: 8 Global Step: 105360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:07,877-Speed 3054.74 samples/sec Loss 8.4754 LearningRate 0.0332 Epoch: 8 Global Step: 105370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:11,259-Speed 3028.64 samples/sec Loss 8.2687 LearningRate 0.0332 Epoch: 8 Global Step: 105380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:14,581-Speed 3083.14 samples/sec Loss 8.4799 LearningRate 0.0331 Epoch: 8 Global Step: 105390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:17,934-Speed 3055.09 samples/sec Loss 8.3817 LearningRate 0.0331 Epoch: 8 Global Step: 105400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:21,320-Speed 3024.97 samples/sec Loss 8.3879 LearningRate 0.0331 Epoch: 8 Global Step: 105410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:24,742-Speed 2993.15 samples/sec Loss 8.3133 LearningRate 0.0331 Epoch: 8 Global Step: 105420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
11:41:28,109-Speed 3041.83 samples/sec Loss 8.4969 LearningRate 0.0331 Epoch: 8 Global Step: 105430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:31,507-Speed 3014.70 samples/sec Loss 8.4011 LearningRate 0.0331 Epoch: 8 Global Step: 105440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:34,907-Speed 3012.45 samples/sec Loss 8.4398 LearningRate 0.0331 Epoch: 8 Global Step: 105450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:38,239-Speed 3074.85 samples/sec Loss 8.3299 LearningRate 0.0331 Epoch: 8 Global Step: 105460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:41,651-Speed 3001.94 samples/sec Loss 8.3589 LearningRate 0.0331 Epoch: 8 Global Step: 105470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:45,071-Speed 2995.33 samples/sec Loss 8.5656 LearningRate 0.0331 Epoch: 8 Global Step: 105480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:41:48,432-Speed 3047.53 samples/sec Loss 8.3936 LearningRate 0.0331 Epoch: 8 Global Step: 105490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:41:51,820-Speed 3022.57 samples/sec Loss 8.4175 LearningRate 0.0331 Epoch: 8 Global Step: 105500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:41:55,172-Speed 3056.57 samples/sec Loss 8.3557 LearningRate 0.0331 Epoch: 8 Global Step: 105510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:41:58,502-Speed 3075.32 samples/sec Loss 8.3498 LearningRate 0.0331 Epoch: 8 Global Step: 105520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:01,858-Speed 3052.57 samples/sec Loss 8.2326 LearningRate 0.0331 Epoch: 8 Global Step: 105530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:05,226-Speed 3041.12 samples/sec Loss 8.2428 LearningRate 0.0331 Epoch: 8 Global Step: 105540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:08,565-Speed 3067.49 samples/sec Loss 
8.2626 LearningRate 0.0331 Epoch: 8 Global Step: 105550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:11,955-Speed 3021.24 samples/sec Loss 8.3782 LearningRate 0.0331 Epoch: 8 Global Step: 105560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:15,399-Speed 2974.85 samples/sec Loss 8.4923 LearningRate 0.0331 Epoch: 8 Global Step: 105570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:18,884-Speed 2938.71 samples/sec Loss 8.3532 LearningRate 0.0331 Epoch: 8 Global Step: 105580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:22,416-Speed 2900.23 samples/sec Loss 8.2302 LearningRate 0.0331 Epoch: 8 Global Step: 105590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:42:25,810-Speed 3017.63 samples/sec Loss 8.4824 LearningRate 0.0331 Epoch: 8 Global Step: 105600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:42:29,287-Speed 2946.18 samples/sec Loss 8.4594 LearningRate 0.0330 Epoch: 8 Global Step: 105610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:42:32,705-Speed 2996.90 samples/sec Loss 8.3068 LearningRate 0.0330 Epoch: 8 Global Step: 105620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:42:36,094-Speed 3022.46 samples/sec Loss 8.2995 LearningRate 0.0330 Epoch: 8 Global Step: 105630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:42:39,553-Speed 2961.16 samples/sec Loss 8.2940 LearningRate 0.0330 Epoch: 8 Global Step: 105640 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:42:42,899-Speed 3061.40 samples/sec Loss 8.3750 LearningRate 0.0330 Epoch: 8 Global Step: 105650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:46,382-Speed 2940.89 samples/sec Loss 8.3362 LearningRate 0.0330 Epoch: 8 Global Step: 105660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:49,756-Speed 3035.98 samples/sec Loss 8.4181 LearningRate 0.0330 Epoch: 8 
Global Step: 105670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:53,084-Speed 3077.88 samples/sec Loss 8.4611 LearningRate 0.0330 Epoch: 8 Global Step: 105680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:56,419-Speed 3071.22 samples/sec Loss 8.4204 LearningRate 0.0330 Epoch: 8 Global Step: 105690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:42:59,840-Speed 2994.65 samples/sec Loss 8.4863 LearningRate 0.0330 Epoch: 8 Global Step: 105700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:43:03,243-Speed 3009.71 samples/sec Loss 8.2893 LearningRate 0.0330 Epoch: 8 Global Step: 105710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:43:06,713-Speed 2951.93 samples/sec Loss 8.3029 LearningRate 0.0330 Epoch: 8 Global Step: 105720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:43:10,137-Speed 2991.16 samples/sec Loss 8.4602 LearningRate 0.0330 Epoch: 8 Global Step: 105730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:43:13,505-Speed 3041.92 samples/sec Loss 8.4796 LearningRate 0.0330 Epoch: 8 Global Step: 105740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:43:16,902-Speed 3014.28 samples/sec Loss 8.5562 LearningRate 0.0330 Epoch: 8 Global Step: 105750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:43:20,275-Speed 3037.22 samples/sec Loss 8.3307 LearningRate 0.0330 Epoch: 8 Global Step: 105760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:43:23,645-Speed 3040.03 samples/sec Loss 8.3713 LearningRate 0.0330 Epoch: 8 Global Step: 105770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:27,084-Speed 2978.54 samples/sec Loss 8.4637 LearningRate 0.0330 Epoch: 8 Global Step: 105780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:30,488-Speed 3009.17 samples/sec Loss 8.3265 LearningRate 0.0330 Epoch: 8 Global Step: 105790 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 11:43:33,918-Speed 2986.06 samples/sec Loss 8.3541 LearningRate 0.0330 Epoch: 8 Global Step: 105800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:37,353-Speed 2982.43 samples/sec Loss 8.3154 LearningRate 0.0330 Epoch: 8 Global Step: 105810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:40,715-Speed 3045.99 samples/sec Loss 8.3044 LearningRate 0.0330 Epoch: 8 Global Step: 105820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:44,118-Speed 3010.04 samples/sec Loss 8.3685 LearningRate 0.0329 Epoch: 8 Global Step: 105830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:47,501-Speed 3027.44 samples/sec Loss 8.4660 LearningRate 0.0329 Epoch: 8 Global Step: 105840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:50,912-Speed 3002.92 samples/sec Loss 8.4000 LearningRate 0.0329 Epoch: 8 Global Step: 105850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:54,368-Speed 2963.85 samples/sec Loss 8.3763 LearningRate 0.0329 Epoch: 8 Global Step: 105860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:43:57,793-Speed 2990.98 samples/sec Loss 8.4040 LearningRate 0.0329 Epoch: 8 Global Step: 105870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:01,185-Speed 3019.48 samples/sec Loss 8.5079 LearningRate 0.0329 Epoch: 8 Global Step: 105880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:04,502-Speed 3087.51 samples/sec Loss 8.4770 LearningRate 0.0329 Epoch: 8 Global Step: 105890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:07,856-Speed 3054.89 samples/sec Loss 8.3890 LearningRate 0.0329 Epoch: 8 Global Step: 105900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:11,266-Speed 3004.03 samples/sec Loss 8.4746 LearningRate 0.0329 Epoch: 8 Global Step: 105910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
11:44:14,640-Speed 3036.00 samples/sec Loss 8.4222 LearningRate 0.0329 Epoch: 8 Global Step: 105920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:18,037-Speed 3014.56 samples/sec Loss 8.4480 LearningRate 0.0329 Epoch: 8 Global Step: 105930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:21,398-Speed 3047.80 samples/sec Loss 8.3017 LearningRate 0.0329 Epoch: 8 Global Step: 105940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:24,767-Speed 3040.00 samples/sec Loss 8.3152 LearningRate 0.0329 Epoch: 8 Global Step: 105950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:28,211-Speed 2974.27 samples/sec Loss 8.3072 LearningRate 0.0329 Epoch: 8 Global Step: 105960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:31,669-Speed 2961.94 samples/sec Loss 8.4262 LearningRate 0.0329 Epoch: 8 Global Step: 105970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:35,099-Speed 2986.75 samples/sec Loss 8.4315 LearningRate 0.0329 Epoch: 8 Global Step: 105980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:44:38,519-Speed 2994.75 samples/sec Loss 8.4790 LearningRate 0.0329 Epoch: 8 Global Step: 105990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:41,909-Speed 3021.11 samples/sec Loss 8.2829 LearningRate 0.0329 Epoch: 8 Global Step: 106000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:45,237-Speed 3078.19 samples/sec Loss 8.3115 LearningRate 0.0329 Epoch: 8 Global Step: 106010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:48,601-Speed 3045.29 samples/sec Loss 8.2486 LearningRate 0.0329 Epoch: 8 Global Step: 106020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:51,959-Speed 3050.16 samples/sec Loss 8.4155 LearningRate 0.0329 Epoch: 8 Global Step: 106030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:55,323-Speed 3045.10 samples/sec Loss 
8.2517 LearningRate 0.0328 Epoch: 8 Global Step: 106040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:44:58,657-Speed 3072.61 samples/sec Loss 8.3458 LearningRate 0.0328 Epoch: 8 Global Step: 106050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:45:02,132-Speed 2947.10 samples/sec Loss 8.3112 LearningRate 0.0328 Epoch: 8 Global Step: 106060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:45:05,492-Speed 3048.50 samples/sec Loss 8.3836 LearningRate 0.0328 Epoch: 8 Global Step: 106070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:45:08,837-Speed 3062.31 samples/sec Loss 8.4290 LearningRate 0.0328 Epoch: 8 Global Step: 106080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:45:12,171-Speed 3072.46 samples/sec Loss 8.5578 LearningRate 0.0328 Epoch: 8 Global Step: 106090 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:45:15,561-Speed 3021.07 samples/sec Loss 8.4265 LearningRate 0.0328 Epoch: 8 Global Step: 106100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:45:18,966-Speed 3008.61 samples/sec Loss 8.3535 LearningRate 0.0328 Epoch: 8 Global Step: 106110 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:45:22,325-Speed 3049.60 samples/sec Loss 8.5030 LearningRate 0.0328 Epoch: 8 Global Step: 106120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:45:25,722-Speed 3015.22 samples/sec Loss 8.3383 LearningRate 0.0328 Epoch: 8 Global Step: 106130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:45:29,144-Speed 2992.96 samples/sec Loss 8.3041 LearningRate 0.0328 Epoch: 8 Global Step: 106140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:45:32,515-Speed 3038.82 samples/sec Loss 8.4235 LearningRate 0.0328 Epoch: 8 Global Step: 106150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:45:35,885-Speed 3039.23 samples/sec Loss 8.2727 LearningRate 0.0328 Epoch: 8 
Global Step: 106160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:45:39,205-Speed 3085.62 samples/sec Loss 8.4072 LearningRate 0.0328 Epoch: 8 Global Step: 106170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:45:42,534-Speed 3076.22 samples/sec Loss 8.4614 LearningRate 0.0328 Epoch: 8 Global Step: 106180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:45:45,888-Speed 3053.93 samples/sec Loss 8.3970 LearningRate 0.0328 Epoch: 8 Global Step: 106190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:45:49,282-Speed 3018.36 samples/sec Loss 8.4768 LearningRate 0.0328 Epoch: 8 Global Step: 106200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:45:52,731-Speed 2969.48 samples/sec Loss 8.2469 LearningRate 0.0328 Epoch: 8 Global Step: 106210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:45:56,083-Speed 3056.07 samples/sec Loss 8.3956 LearningRate 0.0328 Epoch: 8 Global Step: 106220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:45:59,461-Speed 3031.70 samples/sec Loss 8.3123 LearningRate 0.0328 Epoch: 8 Global Step: 106230 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:02,853-Speed 3019.78 samples/sec Loss 8.2950 LearningRate 0.0328 Epoch: 8 Global Step: 106240 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:06,252-Speed 3013.72 samples/sec Loss 8.2799 LearningRate 0.0328 Epoch: 8 Global Step: 106250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:09,638-Speed 3024.99 samples/sec Loss 8.3159 LearningRate 0.0327 Epoch: 8 Global Step: 106260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:12,981-Speed 3064.43 samples/sec Loss 8.2547 LearningRate 0.0327 Epoch: 8 Global Step: 106270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:46:16,323-Speed 3064.64 samples/sec Loss 8.3135 LearningRate 0.0327 Epoch: 8 Global Step: 106280 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 11:46:19,679-Speed 3052.33 samples/sec Loss 8.3488 LearningRate 0.0327 Epoch: 8 Global Step: 106290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:23,066-Speed 3023.46 samples/sec Loss 8.1961 LearningRate 0.0327 Epoch: 8 Global Step: 106300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:26,448-Speed 3028.38 samples/sec Loss 8.2742 LearningRate 0.0327 Epoch: 8 Global Step: 106310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:29,815-Speed 3042.79 samples/sec Loss 8.4260 LearningRate 0.0327 Epoch: 8 Global Step: 106320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:33,191-Speed 3034.34 samples/sec Loss 8.4399 LearningRate 0.0327 Epoch: 8 Global Step: 106330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:36,519-Speed 3077.47 samples/sec Loss 8.3924 LearningRate 0.0327 Epoch: 8 Global Step: 106340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:39,863-Speed 3063.24 samples/sec Loss 8.3917 LearningRate 0.0327 Epoch: 8 Global Step: 106350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:43,216-Speed 3054.56 samples/sec Loss 8.3402 LearningRate 0.0327 Epoch: 8 Global Step: 106360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:46,623-Speed 3006.02 samples/sec Loss 8.2924 LearningRate 0.0327 Epoch: 8 Global Step: 106370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:49,950-Speed 3078.90 samples/sec Loss 8.3202 LearningRate 0.0327 Epoch: 8 Global Step: 106380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:53,343-Speed 3019.33 samples/sec Loss 8.2276 LearningRate 0.0327 Epoch: 8 Global Step: 106390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:46:56,724-Speed 3029.28 samples/sec Loss 8.3062 LearningRate 0.0327 Epoch: 8 Global Step: 106400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
11:47:00,122-Speed 3014.64 samples/sec Loss 8.3512 LearningRate 0.0327 Epoch: 8 Global Step: 106410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:03,511-Speed 3022.14 samples/sec Loss 8.4169 LearningRate 0.0327 Epoch: 8 Global Step: 106420 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:06,909-Speed 3014.40 samples/sec Loss 8.3839 LearningRate 0.0327 Epoch: 8 Global Step: 106430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:10,275-Speed 3043.14 samples/sec Loss 8.2129 LearningRate 0.0327 Epoch: 8 Global Step: 106440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:13,639-Speed 3044.99 samples/sec Loss 8.3329 LearningRate 0.0327 Epoch: 8 Global Step: 106450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:16,962-Speed 3081.58 samples/sec Loss 8.3344 LearningRate 0.0327 Epoch: 8 Global Step: 106460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:20,368-Speed 3007.31 samples/sec Loss 8.1591 LearningRate 0.0327 Epoch: 8 Global Step: 106470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:47:23,684-Speed 3089.17 samples/sec Loss 8.1150 LearningRate 0.0326 Epoch: 8 Global Step: 106480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:27,024-Speed 3066.37 samples/sec Loss 8.2457 LearningRate 0.0326 Epoch: 8 Global Step: 106490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:30,399-Speed 3035.12 samples/sec Loss 8.3155 LearningRate 0.0326 Epoch: 8 Global Step: 106500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:33,791-Speed 3020.12 samples/sec Loss 8.4462 LearningRate 0.0326 Epoch: 8 Global Step: 106510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:37,186-Speed 3016.66 samples/sec Loss 8.4687 LearningRate 0.0326 Epoch: 8 Global Step: 106520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:40,559-Speed 3037.05 samples/sec Loss 
8.4997 LearningRate 0.0326 Epoch: 8 Global Step: 106530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:43,872-Speed 3091.53 samples/sec Loss 8.2712 LearningRate 0.0326 Epoch: 8 Global Step: 106540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:47,233-Speed 3047.52 samples/sec Loss 8.2860 LearningRate 0.0326 Epoch: 8 Global Step: 106550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:50,589-Speed 3051.90 samples/sec Loss 8.4165 LearningRate 0.0326 Epoch: 8 Global Step: 106560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:53,995-Speed 3007.68 samples/sec Loss 8.4278 LearningRate 0.0326 Epoch: 8 Global Step: 106570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:47:57,420-Speed 2989.79 samples/sec Loss 8.4290 LearningRate 0.0326 Epoch: 8 Global Step: 106580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:00,868-Speed 2970.86 samples/sec Loss 8.2365 LearningRate 0.0326 Epoch: 8 Global Step: 106590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:04,213-Speed 3062.36 samples/sec Loss 8.2448 LearningRate 0.0326 Epoch: 8 Global Step: 106600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:07,680-Speed 2954.44 samples/sec Loss 8.3832 LearningRate 0.0326 Epoch: 8 Global Step: 106610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:11,027-Speed 3059.90 samples/sec Loss 8.4636 LearningRate 0.0326 Epoch: 8 Global Step: 106620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:14,410-Speed 3027.52 samples/sec Loss 8.3498 LearningRate 0.0326 Epoch: 8 Global Step: 106630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:17,763-Speed 3055.47 samples/sec Loss 8.4316 LearningRate 0.0326 Epoch: 8 Global Step: 106640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:21,077-Speed 3091.16 samples/sec Loss 8.2073 LearningRate 0.0326 Epoch: 8 
Global Step: 106650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:24,377-Speed 3103.82 samples/sec Loss 8.3756 LearningRate 0.0326 Epoch: 8 Global Step: 106660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:27,697-Speed 3085.38 samples/sec Loss 8.2070 LearningRate 0.0326 Epoch: 8 Global Step: 106670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:31,083-Speed 3025.04 samples/sec Loss 8.3273 LearningRate 0.0326 Epoch: 8 Global Step: 106680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:34,447-Speed 3044.95 samples/sec Loss 8.3373 LearningRate 0.0325 Epoch: 8 Global Step: 106690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:37,799-Speed 3055.82 samples/sec Loss 8.3629 LearningRate 0.0325 Epoch: 8 Global Step: 106700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:41,182-Speed 3027.62 samples/sec Loss 8.3752 LearningRate 0.0325 Epoch: 8 Global Step: 106710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:48:44,504-Speed 3083.71 samples/sec Loss 8.3325 LearningRate 0.0325 Epoch: 8 Global Step: 106720 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:47,871-Speed 3041.70 samples/sec Loss 8.2100 LearningRate 0.0325 Epoch: 8 Global Step: 106730 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:51,268-Speed 3015.75 samples/sec Loss 8.2116 LearningRate 0.0325 Epoch: 8 Global Step: 106740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:54,693-Speed 2990.27 samples/sec Loss 8.2430 LearningRate 0.0325 Epoch: 8 Global Step: 106750 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:48:58,144-Speed 2967.99 samples/sec Loss 8.2225 LearningRate 0.0325 Epoch: 8 Global Step: 106760 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:49:01,474-Speed 3076.02 samples/sec Loss 8.2855 LearningRate 0.0325 Epoch: 8 Global Step: 106770 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-04-27 11:49:04,868-Speed 3017.92 samples/sec Loss 8.3148 LearningRate 0.0325 Epoch: 8 Global Step: 106780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:08,214-Speed 3061.54 samples/sec Loss 8.3864 LearningRate 0.0325 Epoch: 8 Global Step: 106790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:11,584-Speed 3039.54 samples/sec Loss 8.2750 LearningRate 0.0325 Epoch: 8 Global Step: 106800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:14,965-Speed 3030.06 samples/sec Loss 8.3455 LearningRate 0.0325 Epoch: 8 Global Step: 106810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:18,326-Speed 3047.40 samples/sec Loss 8.3256 LearningRate 0.0325 Epoch: 8 Global Step: 106820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:21,756-Speed 2986.31 samples/sec Loss 8.3668 LearningRate 0.0325 Epoch: 8 Global Step: 106830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:25,130-Speed 3035.91 samples/sec Loss 8.3313 LearningRate 0.0325 Epoch: 8 Global Step: 106840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:28,537-Speed 3006.51 samples/sec Loss 8.3565 LearningRate 0.0325 Epoch: 8 Global Step: 106850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:31,852-Speed 3090.18 samples/sec Loss 8.3030 LearningRate 0.0325 Epoch: 8 Global Step: 106860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:35,231-Speed 3031.36 samples/sec Loss 8.3067 LearningRate 0.0325 Epoch: 8 Global Step: 106870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:49:38,583-Speed 3055.54 samples/sec Loss 8.1398 LearningRate 0.0325 Epoch: 8 Global Step: 106880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:49:41,964-Speed 3029.76 samples/sec Loss 8.1684 LearningRate 0.0325 Epoch: 8 Global Step: 106890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 
2022-04-27 11:49:45,336-Speed 3037.39 samples/sec Loss 8.4750 LearningRate 0.0325 Epoch: 8 Global Step: 106900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:49:48,775-Speed 2978.34 samples/sec Loss 8.2103 LearningRate 0.0324 Epoch: 8 Global Step: 106910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:49:52,091-Speed 3088.90 samples/sec Loss 8.3099 LearningRate 0.0324 Epoch: 8 Global Step: 106920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:55,493-Speed 3010.74 samples/sec Loss 8.2390 LearningRate 0.0324 Epoch: 8 Global Step: 106930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:49:58,819-Speed 3080.54 samples/sec Loss 8.2743 LearningRate 0.0324 Epoch: 8 Global Step: 106940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:02,240-Speed 2994.00 samples/sec Loss 8.1747 LearningRate 0.0324 Epoch: 8 Global Step: 106950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:05,619-Speed 3031.10 samples/sec Loss 8.3005 LearningRate 0.0324 Epoch: 8 Global Step: 106960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:08,984-Speed 3044.29 samples/sec Loss 8.2751 LearningRate 0.0324 Epoch: 8 Global Step: 106970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:12,337-Speed 3054.08 samples/sec Loss 8.2783 LearningRate 0.0324 Epoch: 8 Global Step: 106980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:15,744-Speed 3006.05 samples/sec Loss 8.3639 LearningRate 0.0324 Epoch: 8 Global Step: 106990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:19,112-Speed 3041.67 samples/sec Loss 8.1129 LearningRate 0.0324 Epoch: 8 Global Step: 107000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:22,474-Speed 3046.83 samples/sec Loss 8.4432 LearningRate 0.0324 Epoch: 8 Global Step: 107010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:25,808-Speed 3072.55 
samples/sec Loss 8.1291 LearningRate 0.0324 Epoch: 8 Global Step: 107020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:29,140-Speed 3074.79 samples/sec Loss 8.3487 LearningRate 0.0324 Epoch: 8 Global Step: 107030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:32,513-Speed 3036.63 samples/sec Loss 8.2880 LearningRate 0.0324 Epoch: 8 Global Step: 107040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:35,837-Speed 3081.13 samples/sec Loss 8.1055 LearningRate 0.0324 Epoch: 8 Global Step: 107050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:39,220-Speed 3027.39 samples/sec Loss 8.1859 LearningRate 0.0324 Epoch: 8 Global Step: 107060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:42,566-Speed 3061.22 samples/sec Loss 8.2138 LearningRate 0.0324 Epoch: 8 Global Step: 107070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:45,917-Speed 3056.70 samples/sec Loss 8.3923 LearningRate 0.0324 Epoch: 8 Global Step: 107080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:49,236-Speed 3086.01 samples/sec Loss 8.3020 LearningRate 0.0324 Epoch: 8 Global Step: 107090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:50:52,562-Speed 3079.74 samples/sec Loss 8.2845 LearningRate 0.0324 Epoch: 8 Global Step: 107100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:55,882-Speed 3085.15 samples/sec Loss 8.3217 LearningRate 0.0324 Epoch: 8 Global Step: 107110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:50:59,217-Speed 3071.84 samples/sec Loss 8.2860 LearningRate 0.0324 Epoch: 8 Global Step: 107120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:51:02,572-Speed 3053.14 samples/sec Loss 8.2920 LearningRate 0.0323 Epoch: 8 Global Step: 107130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:51:06,000-Speed 2987.64 samples/sec Loss 8.3685 LearningRate 0.0323 
Epoch: 8 Global Step: 107140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:51:09,370-Speed 3039.57 samples/sec Loss 8.3707 LearningRate 0.0323 Epoch: 8 Global Step: 107150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:51:12,689-Speed 3085.88 samples/sec Loss 8.3198 LearningRate 0.0323 Epoch: 8 Global Step: 107160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:51:16,063-Speed 3035.71 samples/sec Loss 8.3946 LearningRate 0.0323 Epoch: 8 Global Step: 107170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:51:19,452-Speed 3022.42 samples/sec Loss 8.3750 LearningRate 0.0323 Epoch: 8 Global Step: 107180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:51:22,855-Speed 3009.95 samples/sec Loss 8.3301 LearningRate 0.0323 Epoch: 8 Global Step: 107190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:51:26,226-Speed 3038.47 samples/sec Loss 8.1573 LearningRate 0.0323 Epoch: 8 Global Step: 107200 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:29,666-Speed 2978.08 samples/sec Loss 8.2520 LearningRate 0.0323 Epoch: 8 Global Step: 107210 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:33,052-Speed 3024.62 samples/sec Loss 8.2901 LearningRate 0.0323 Epoch: 8 Global Step: 107220 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:36,464-Speed 3002.29 samples/sec Loss 8.2845 LearningRate 0.0323 Epoch: 8 Global Step: 107230 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:39,825-Speed 3046.94 samples/sec Loss 8.4480 LearningRate 0.0323 Epoch: 8 Global Step: 107240 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:43,174-Speed 3060.69 samples/sec Loss 8.1762 LearningRate 0.0323 Epoch: 8 Global Step: 107250 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:46,558-Speed 3026.82 samples/sec Loss 8.2943 LearningRate 0.0323 Epoch: 8 Global Step: 107260 Fp16 Grad 
Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:49,913-Speed 3052.53 samples/sec Loss 8.2283 LearningRate 0.0323 Epoch: 8 Global Step: 107270 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:53,238-Speed 3080.74 samples/sec Loss 8.2214 LearningRate 0.0323 Epoch: 8 Global Step: 107280 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:56,642-Speed 3008.54 samples/sec Loss 8.3514 LearningRate 0.0323 Epoch: 8 Global Step: 107290 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 11:51:59,984-Speed 3065.48 samples/sec Loss 8.2806 LearningRate 0.0323 Epoch: 8 Global Step: 107300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:03,391-Speed 3006.03 samples/sec Loss 8.2026 LearningRate 0.0323 Epoch: 8 Global Step: 107310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:06,713-Speed 3083.11 samples/sec Loss 8.4055 LearningRate 0.0323 Epoch: 8 Global Step: 107320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:10,067-Speed 3054.05 samples/sec Loss 8.3627 LearningRate 0.0323 Epoch: 8 Global Step: 107330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:13,450-Speed 3027.66 samples/sec Loss 8.3472 LearningRate 0.0323 Epoch: 8 Global Step: 107340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:16,853-Speed 3010.43 samples/sec Loss 8.1917 LearningRate 0.0322 Epoch: 8 Global Step: 107350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:20,205-Speed 3055.67 samples/sec Loss 8.3397 LearningRate 0.0322 Epoch: 8 Global Step: 107360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:23,586-Speed 3029.62 samples/sec Loss 8.2820 LearningRate 0.0322 Epoch: 8 Global Step: 107370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:27,000-Speed 2999.68 samples/sec Loss 8.3853 LearningRate 0.0322 Epoch: 8 Global Step: 107380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 
2022-04-27 11:52:30,383-Speed 3028.00 samples/sec Loss 8.2871 LearningRate 0.0322 Epoch: 8 Global Step: 107390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:52:33,738-Speed 3052.87 samples/sec Loss 8.3361 LearningRate 0.0322 Epoch: 8 Global Step: 107400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:37,079-Speed 3065.45 samples/sec Loss 8.3488 LearningRate 0.0322 Epoch: 8 Global Step: 107410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:40,398-Speed 3086.41 samples/sec Loss 8.3841 LearningRate 0.0322 Epoch: 8 Global Step: 107420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:43,812-Speed 3000.66 samples/sec Loss 8.2051 LearningRate 0.0322 Epoch: 8 Global Step: 107430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:47,233-Speed 2993.34 samples/sec Loss 8.2928 LearningRate 0.0322 Epoch: 8 Global Step: 107440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:50,625-Speed 3019.91 samples/sec Loss 8.2754 LearningRate 0.0322 Epoch: 8 Global Step: 107450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:53,951-Speed 3080.06 samples/sec Loss 8.2567 LearningRate 0.0322 Epoch: 8 Global Step: 107460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:52:57,306-Speed 3052.22 samples/sec Loss 8.3264 LearningRate 0.0322 Epoch: 8 Global Step: 107470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:00,684-Speed 3032.76 samples/sec Loss 8.2015 LearningRate 0.0322 Epoch: 8 Global Step: 107480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:04,016-Speed 3074.20 samples/sec Loss 8.2890 LearningRate 0.0322 Epoch: 8 Global Step: 107490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:07,427-Speed 3002.45 samples/sec Loss 8.2957 LearningRate 0.0322 Epoch: 8 Global Step: 107500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:10,840-Speed 3000.99 
samples/sec Loss 8.2069 LearningRate 0.0322 Epoch: 8 Global Step: 107510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:14,261-Speed 2994.63 samples/sec Loss 8.2661 LearningRate 0.0322 Epoch: 8 Global Step: 107520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:17,640-Speed 3030.35 samples/sec Loss 8.2936 LearningRate 0.0322 Epoch: 8 Global Step: 107530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:21,081-Speed 2976.82 samples/sec Loss 8.3029 LearningRate 0.0322 Epoch: 8 Global Step: 107540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:24,514-Speed 2983.72 samples/sec Loss 8.1970 LearningRate 0.0322 Epoch: 8 Global Step: 107550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:27,912-Speed 3014.05 samples/sec Loss 8.2767 LearningRate 0.0322 Epoch: 8 Global Step: 107560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:31,245-Speed 3073.46 samples/sec Loss 8.1979 LearningRate 0.0321 Epoch: 8 Global Step: 107570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:53:34,585-Speed 3066.54 samples/sec Loss 8.3397 LearningRate 0.0321 Epoch: 8 Global Step: 107580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:37,935-Speed 3058.20 samples/sec Loss 8.3029 LearningRate 0.0321 Epoch: 8 Global Step: 107590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:41,298-Speed 3045.45 samples/sec Loss 8.2531 LearningRate 0.0321 Epoch: 8 Global Step: 107600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:44,688-Speed 3022.05 samples/sec Loss 8.3065 LearningRate 0.0321 Epoch: 8 Global Step: 107610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:48,058-Speed 3039.19 samples/sec Loss 8.1721 LearningRate 0.0321 Epoch: 8 Global Step: 107620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:51,578-Speed 2910.22 samples/sec Loss 8.2839 LearningRate 0.0321 
Epoch: 8 Global Step: 107630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:55,010-Speed 2984.71 samples/sec Loss 8.2582 LearningRate 0.0321 Epoch: 8 Global Step: 107640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:53:58,422-Speed 3002.42 samples/sec Loss 8.3714 LearningRate 0.0321 Epoch: 8 Global Step: 107650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:01,863-Speed 2977.19 samples/sec Loss 8.2494 LearningRate 0.0321 Epoch: 8 Global Step: 107660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:05,211-Speed 3058.85 samples/sec Loss 8.2747 LearningRate 0.0321 Epoch: 8 Global Step: 107670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:08,554-Speed 3064.34 samples/sec Loss 8.3767 LearningRate 0.0321 Epoch: 8 Global Step: 107680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:11,909-Speed 3052.97 samples/sec Loss 8.2489 LearningRate 0.0321 Epoch: 8 Global Step: 107690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:15,249-Speed 3067.41 samples/sec Loss 8.1264 LearningRate 0.0321 Epoch: 8 Global Step: 107700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:18,607-Speed 3050.24 samples/sec Loss 8.2338 LearningRate 0.0321 Epoch: 8 Global Step: 107710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:21,965-Speed 3049.97 samples/sec Loss 8.2076 LearningRate 0.0321 Epoch: 8 Global Step: 107720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:25,307-Speed 3065.07 samples/sec Loss 8.3363 LearningRate 0.0321 Epoch: 8 Global Step: 107730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:28,685-Speed 3032.31 samples/sec Loss 8.3494 LearningRate 0.0321 Epoch: 8 Global Step: 107740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:32,075-Speed 3021.22 samples/sec Loss 8.1879 LearningRate 0.0321 Epoch: 8 Global Step: 107750 Fp16 Grad 
Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:35,464-Speed 3022.86 samples/sec Loss 8.2249 LearningRate 0.0321 Epoch: 8 Global Step: 107760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:38,789-Speed 3080.42 samples/sec Loss 8.2947 LearningRate 0.0321 Epoch: 8 Global Step: 107770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:42,136-Speed 3059.87 samples/sec Loss 8.2756 LearningRate 0.0321 Epoch: 8 Global Step: 107780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:54:45,510-Speed 3036.07 samples/sec Loss 8.3645 LearningRate 0.0320 Epoch: 8 Global Step: 107790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:48,944-Speed 2983.09 samples/sec Loss 8.1605 LearningRate 0.0320 Epoch: 8 Global Step: 107800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:52,278-Speed 3071.64 samples/sec Loss 8.2754 LearningRate 0.0320 Epoch: 8 Global Step: 107810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:55,596-Speed 3087.37 samples/sec Loss 8.3044 LearningRate 0.0320 Epoch: 8 Global Step: 107820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:54:58,941-Speed 3062.32 samples/sec Loss 8.3037 LearningRate 0.0320 Epoch: 8 Global Step: 107830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:02,303-Speed 3045.91 samples/sec Loss 8.2464 LearningRate 0.0320 Epoch: 8 Global Step: 107840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:05,647-Speed 3063.15 samples/sec Loss 8.3117 LearningRate 0.0320 Epoch: 8 Global Step: 107850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:09,000-Speed 3055.31 samples/sec Loss 8.3376 LearningRate 0.0320 Epoch: 8 Global Step: 107860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:12,327-Speed 3078.67 samples/sec Loss 8.2461 LearningRate 0.0320 Epoch: 8 Global Step: 107870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-27 11:55:15,718-Speed 3020.14 samples/sec Loss 8.1526 LearningRate 0.0320 Epoch: 8 Global Step: 107880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:19,110-Speed 3020.03 samples/sec Loss 8.2219 LearningRate 0.0320 Epoch: 8 Global Step: 107890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:22,527-Speed 2998.01 samples/sec Loss 8.0381 LearningRate 0.0320 Epoch: 8 Global Step: 107900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:26,006-Speed 2943.48 samples/sec Loss 8.1331 LearningRate 0.0320 Epoch: 8 Global Step: 107910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:29,363-Speed 3051.73 samples/sec Loss 8.2536 LearningRate 0.0320 Epoch: 8 Global Step: 107920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:32,737-Speed 3035.67 samples/sec Loss 8.3082 LearningRate 0.0320 Epoch: 8 Global Step: 107930 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:36,059-Speed 3083.66 samples/sec Loss 8.3433 LearningRate 0.0320 Epoch: 8 Global Step: 107940 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:39,532-Speed 2948.85 samples/sec Loss 8.2117 LearningRate 0.0320 Epoch: 8 Global Step: 107950 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:42,963-Speed 2985.32 samples/sec Loss 8.2906 LearningRate 0.0320 Epoch: 8 Global Step: 107960 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:55:46,321-Speed 3050.14 samples/sec Loss 8.1570 LearningRate 0.0320 Epoch: 8 Global Step: 107970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:49,710-Speed 3022.60 samples/sec Loss 8.2042 LearningRate 0.0320 Epoch: 8 Global Step: 107980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:53,090-Speed 3030.08 samples/sec Loss 8.2546 LearningRate 0.0320 Epoch: 8 Global Step: 107990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:56,460-Speed 
3039.76 samples/sec Loss 8.2044 LearningRate 0.0320 Epoch: 8 Global Step: 108000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:55:59,837-Speed 3033.51 samples/sec Loss 8.3937 LearningRate 0.0319 Epoch: 8 Global Step: 108010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:03,248-Speed 3002.68 samples/sec Loss 8.3429 LearningRate 0.0319 Epoch: 8 Global Step: 108020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:06,661-Speed 3000.84 samples/sec Loss 8.1239 LearningRate 0.0319 Epoch: 8 Global Step: 108030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:10,103-Speed 2975.35 samples/sec Loss 8.2066 LearningRate 0.0319 Epoch: 8 Global Step: 108040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:13,490-Speed 3024.46 samples/sec Loss 8.2326 LearningRate 0.0319 Epoch: 8 Global Step: 108050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:16,947-Speed 2962.79 samples/sec Loss 8.3221 LearningRate 0.0319 Epoch: 8 Global Step: 108060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:20,351-Speed 3009.09 samples/sec Loss 8.2082 LearningRate 0.0319 Epoch: 8 Global Step: 108070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:23,727-Speed 3033.66 samples/sec Loss 8.2116 LearningRate 0.0319 Epoch: 8 Global Step: 108080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:27,060-Speed 3073.01 samples/sec Loss 8.1529 LearningRate 0.0319 Epoch: 8 Global Step: 108090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:30,421-Speed 3048.18 samples/sec Loss 8.2950 LearningRate 0.0319 Epoch: 8 Global Step: 108100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:33,772-Speed 3056.27 samples/sec Loss 8.2765 LearningRate 0.0319 Epoch: 8 Global Step: 108110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:37,159-Speed 3023.97 samples/sec Loss 8.3084 
LearningRate 0.0319 Epoch: 8 Global Step: 108120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:40,597-Speed 2979.89 samples/sec Loss 8.1905 LearningRate 0.0319 Epoch: 8 Global Step: 108130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:56:44,071-Speed 2948.53 samples/sec Loss 8.2921 LearningRate 0.0319 Epoch: 8 Global Step: 108140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:47,399-Speed 3077.14 samples/sec Loss 8.2492 LearningRate 0.0319 Epoch: 8 Global Step: 108150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:50,792-Speed 3018.92 samples/sec Loss 8.2467 LearningRate 0.0319 Epoch: 8 Global Step: 108160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:54,218-Speed 2990.47 samples/sec Loss 8.1547 LearningRate 0.0319 Epoch: 8 Global Step: 108170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:56:57,595-Speed 3032.68 samples/sec Loss 8.1989 LearningRate 0.0319 Epoch: 8 Global Step: 108180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:01,025-Speed 2986.64 samples/sec Loss 8.1249 LearningRate 0.0319 Epoch: 8 Global Step: 108190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:04,406-Speed 3029.19 samples/sec Loss 8.1214 LearningRate 0.0319 Epoch: 8 Global Step: 108200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:07,785-Speed 3031.47 samples/sec Loss 8.3171 LearningRate 0.0319 Epoch: 8 Global Step: 108210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:11,213-Speed 2988.06 samples/sec Loss 8.1367 LearningRate 0.0319 Epoch: 8 Global Step: 108220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:14,554-Speed 3066.22 samples/sec Loss 8.3665 LearningRate 0.0318 Epoch: 8 Global Step: 108230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:17,910-Speed 3052.36 samples/sec Loss 8.3566 LearningRate 0.0318 Epoch: 8 Global Step: 
108240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:57:21,233-Speed 3081.93 samples/sec Loss 8.0752 LearningRate 0.0318 Epoch: 8 Global Step: 108250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:57:24,581-Speed 3059.20 samples/sec Loss 8.1850 LearningRate 0.0318 Epoch: 8 Global Step: 108260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:57:27,942-Speed 3048.18 samples/sec Loss 8.1653 LearningRate 0.0318 Epoch: 8 Global Step: 108270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:57:31,322-Speed 3030.14 samples/sec Loss 8.2574 LearningRate 0.0318 Epoch: 8 Global Step: 108280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:57:34,665-Speed 3064.09 samples/sec Loss 8.3527 LearningRate 0.0318 Epoch: 8 Global Step: 108290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:57:38,081-Speed 2999.03 samples/sec Loss 8.3117 LearningRate 0.0318 Epoch: 8 Global Step: 108300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:41,469-Speed 3022.85 samples/sec Loss 8.2450 LearningRate 0.0318 Epoch: 8 Global Step: 108310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:44,810-Speed 3066.01 samples/sec Loss 8.2962 LearningRate 0.0318 Epoch: 8 Global Step: 108320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:48,158-Speed 3059.07 samples/sec Loss 8.2579 LearningRate 0.0318 Epoch: 8 Global Step: 108330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:51,580-Speed 2993.07 samples/sec Loss 8.2389 LearningRate 0.0318 Epoch: 8 Global Step: 108340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:55,036-Speed 2964.10 samples/sec Loss 8.1075 LearningRate 0.0318 Epoch: 8 Global Step: 108350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:57:58,422-Speed 3025.06 samples/sec Loss 8.2285 LearningRate 0.0318 Epoch: 8 Global Step: 108360 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-27 11:58:01,770-Speed 3059.37 samples/sec Loss 8.2285 LearningRate 0.0318 Epoch: 8 Global Step: 108370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:05,091-Speed 3084.45 samples/sec Loss 8.2509 LearningRate 0.0318 Epoch: 8 Global Step: 108380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:08,481-Speed 3021.37 samples/sec Loss 8.2481 LearningRate 0.0318 Epoch: 8 Global Step: 108390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:11,917-Speed 2981.11 samples/sec Loss 8.3313 LearningRate 0.0318 Epoch: 8 Global Step: 108400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:58:15,326-Speed 3004.96 samples/sec Loss 8.2676 LearningRate 0.0318 Epoch: 8 Global Step: 108410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:58:18,774-Speed 2970.89 samples/sec Loss 8.3374 LearningRate 0.0318 Epoch: 8 Global Step: 108420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:58:22,107-Speed 3073.40 samples/sec Loss 8.2802 LearningRate 0.0318 Epoch: 8 Global Step: 108430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:25,412-Speed 3098.85 samples/sec Loss 8.2817 LearningRate 0.0318 Epoch: 8 Global Step: 108440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:28,784-Speed 3037.88 samples/sec Loss 8.0996 LearningRate 0.0317 Epoch: 8 Global Step: 108450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:32,153-Speed 3040.32 samples/sec Loss 8.1778 LearningRate 0.0317 Epoch: 8 Global Step: 108460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:35,643-Speed 2935.26 samples/sec Loss 8.2580 LearningRate 0.0317 Epoch: 8 Global Step: 108470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:38,952-Speed 3095.28 samples/sec Loss 8.2023 LearningRate 0.0317 Epoch: 8 Global Step: 108480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
11:58:42,280-Speed 3077.91 samples/sec Loss 8.1966 LearningRate 0.0317 Epoch: 8 Global Step: 108490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:45,613-Speed 3073.12 samples/sec Loss 8.1138 LearningRate 0.0317 Epoch: 8 Global Step: 108500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:49,007-Speed 3017.51 samples/sec Loss 8.1542 LearningRate 0.0317 Epoch: 8 Global Step: 108510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:52,363-Speed 3052.03 samples/sec Loss 8.1771 LearningRate 0.0317 Epoch: 8 Global Step: 108520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:58:55,771-Speed 3005.76 samples/sec Loss 8.3994 LearningRate 0.0317 Epoch: 8 Global Step: 108530 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:58:59,130-Speed 3050.20 samples/sec Loss 8.1788 LearningRate 0.0317 Epoch: 8 Global Step: 108540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:59:02,455-Speed 3080.35 samples/sec Loss 8.2618 LearningRate 0.0317 Epoch: 8 Global Step: 108550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 11:59:05,790-Speed 3071.38 samples/sec Loss 8.2395 LearningRate 0.0317 Epoch: 8 Global Step: 108560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:59:09,153-Speed 3045.40 samples/sec Loss 8.1553 LearningRate 0.0317 Epoch: 8 Global Step: 108570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:59:12,553-Speed 3012.88 samples/sec Loss 8.2405 LearningRate 0.0317 Epoch: 8 Global Step: 108580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:15,865-Speed 3092.73 samples/sec Loss 8.2335 LearningRate 0.0317 Epoch: 8 Global Step: 108590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:19,269-Speed 3009.27 samples/sec Loss 8.3411 LearningRate 0.0317 Epoch: 8 Global Step: 108600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:22,589-Speed 3084.91 samples/sec 
Loss 8.1903 LearningRate 0.0317 Epoch: 8 Global Step: 108610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:26,046-Speed 2962.83 samples/sec Loss 8.2113 LearningRate 0.0317 Epoch: 8 Global Step: 108620 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:29,477-Speed 2985.16 samples/sec Loss 8.0401 LearningRate 0.0317 Epoch: 8 Global Step: 108630 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:32,810-Speed 3073.17 samples/sec Loss 8.2723 LearningRate 0.0317 Epoch: 8 Global Step: 108640 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:36,214-Speed 3009.79 samples/sec Loss 8.1459 LearningRate 0.0317 Epoch: 8 Global Step: 108650 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:39,647-Speed 2983.40 samples/sec Loss 8.1232 LearningRate 0.0317 Epoch: 8 Global Step: 108660 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:43,066-Speed 2995.54 samples/sec Loss 8.1251 LearningRate 0.0316 Epoch: 8 Global Step: 108670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 11:59:46,506-Speed 2977.53 samples/sec Loss 8.1670 LearningRate 0.0316 Epoch: 8 Global Step: 108680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:59:49,984-Speed 2945.29 samples/sec Loss 8.1454 LearningRate 0.0316 Epoch: 8 Global Step: 108690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:59:53,450-Speed 2954.94 samples/sec Loss 8.1132 LearningRate 0.0316 Epoch: 8 Global Step: 108700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 11:59:56,766-Speed 3088.42 samples/sec Loss 8.2254 LearningRate 0.0316 Epoch: 8 Global Step: 108710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:00,114-Speed 3060.32 samples/sec Loss 8.1794 LearningRate 0.0316 Epoch: 8 Global Step: 108720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:03,547-Speed 2983.54 samples/sec Loss 8.1295 LearningRate 0.0316 Epoch: 8 
Global Step: 108730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:06,935-Speed 3022.91 samples/sec Loss 8.0872 LearningRate 0.0316 Epoch: 8 Global Step: 108740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:10,358-Speed 2992.57 samples/sec Loss 8.2341 LearningRate 0.0316 Epoch: 8 Global Step: 108750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:13,821-Speed 2957.71 samples/sec Loss 8.2089 LearningRate 0.0316 Epoch: 8 Global Step: 108760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:17,193-Speed 3037.33 samples/sec Loss 8.1327 LearningRate 0.0316 Epoch: 8 Global Step: 108770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:20,481-Speed 3116.23 samples/sec Loss 8.3017 LearningRate 0.0316 Epoch: 8 Global Step: 108780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:23,921-Speed 2977.11 samples/sec Loss 8.1711 LearningRate 0.0316 Epoch: 8 Global Step: 108790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:27,304-Speed 3028.05 samples/sec Loss 8.2164 LearningRate 0.0316 Epoch: 8 Global Step: 108800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:30,630-Speed 3079.36 samples/sec Loss 8.1870 LearningRate 0.0316 Epoch: 8 Global Step: 108810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:34,088-Speed 2962.09 samples/sec Loss 8.2363 LearningRate 0.0316 Epoch: 8 Global Step: 108820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:37,454-Speed 3042.86 samples/sec Loss 8.1180 LearningRate 0.0316 Epoch: 8 Global Step: 108830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:40,798-Speed 3063.32 samples/sec Loss 8.0705 LearningRate 0.0316 Epoch: 8 Global Step: 108840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:44,198-Speed 3012.50 samples/sec Loss 8.1895 LearningRate 0.0316 Epoch: 8 Global Step: 108850 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-27 12:00:47,654-Speed 2963.92 samples/sec Loss 8.2076 LearningRate 0.0316 Epoch: 8 Global Step: 108860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:51,071-Speed 2998.25 samples/sec Loss 8.1837 LearningRate 0.0316 Epoch: 8 Global Step: 108870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:00:54,442-Speed 3037.73 samples/sec Loss 8.1658 LearningRate 0.0316 Epoch: 8 Global Step: 108880 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:00:57,829-Speed 3024.99 samples/sec Loss 8.2405 LearningRate 0.0315 Epoch: 8 Global Step: 108890 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:01:01,157-Speed 3077.97 samples/sec Loss 8.0409 LearningRate 0.0315 Epoch: 8 Global Step: 108900 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:01:04,531-Speed 3035.75 samples/sec Loss 8.3393 LearningRate 0.0315 Epoch: 8 Global Step: 108910 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:01:07,919-Speed 3023.10 samples/sec Loss 8.1264 LearningRate 0.0315 Epoch: 8 Global Step: 108920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:11,312-Speed 3018.56 samples/sec Loss 8.3057 LearningRate 0.0315 Epoch: 8 Global Step: 108930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:14,711-Speed 3013.85 samples/sec Loss 8.3130 LearningRate 0.0315 Epoch: 8 Global Step: 108940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:18,175-Speed 2957.10 samples/sec Loss 8.2936 LearningRate 0.0315 Epoch: 8 Global Step: 108950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:21,565-Speed 3021.31 samples/sec Loss 8.2283 LearningRate 0.0315 Epoch: 8 Global Step: 108960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:25,014-Speed 2970.49 samples/sec Loss 8.1998 LearningRate 0.0315 Epoch: 8 Global Step: 108970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
12:01:28,389-Speed 3034.67 samples/sec Loss 8.0439 LearningRate 0.0315 Epoch: 8 Global Step: 108980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:31,697-Speed 3096.77 samples/sec Loss 8.2589 LearningRate 0.0315 Epoch: 8 Global Step: 108990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:35,028-Speed 3074.53 samples/sec Loss 8.1849 LearningRate 0.0315 Epoch: 8 Global Step: 109000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:38,430-Speed 3010.89 samples/sec Loss 8.3067 LearningRate 0.0315 Epoch: 8 Global Step: 109010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:41,742-Speed 3092.27 samples/sec Loss 8.1906 LearningRate 0.0315 Epoch: 8 Global Step: 109020 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:01:45,133-Speed 3020.84 samples/sec Loss 8.3072 LearningRate 0.0315 Epoch: 8 Global Step: 109030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:01:48,475-Speed 3065.57 samples/sec Loss 8.2530 LearningRate 0.0315 Epoch: 8 Global Step: 109040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:51,791-Speed 3088.64 samples/sec Loss 8.2116 LearningRate 0.0315 Epoch: 8 Global Step: 109050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:55,101-Speed 3094.03 samples/sec Loss 8.1133 LearningRate 0.0315 Epoch: 8 Global Step: 109060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:01:58,414-Speed 3092.80 samples/sec Loss 8.1817 LearningRate 0.0315 Epoch: 8 Global Step: 109070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:01,850-Speed 2980.93 samples/sec Loss 8.1815 LearningRate 0.0315 Epoch: 8 Global Step: 109080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:05,210-Speed 3048.48 samples/sec Loss 8.1759 LearningRate 0.0315 Epoch: 8 Global Step: 109090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:08,628-Speed 2996.69 samples/sec 
Loss 8.2069 LearningRate 0.0315 Epoch: 8 Global Step: 109100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:11,973-Speed 3061.86 samples/sec Loss 8.2244 LearningRate 0.0314 Epoch: 8 Global Step: 109110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:15,399-Speed 2989.18 samples/sec Loss 8.1934 LearningRate 0.0314 Epoch: 8 Global Step: 109120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:18,764-Speed 3044.43 samples/sec Loss 8.1290 LearningRate 0.0314 Epoch: 8 Global Step: 109130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:22,131-Speed 3042.63 samples/sec Loss 8.0436 LearningRate 0.0314 Epoch: 8 Global Step: 109140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:25,494-Speed 3044.73 samples/sec Loss 8.1833 LearningRate 0.0314 Epoch: 8 Global Step: 109150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:28,907-Speed 3001.23 samples/sec Loss 8.2759 LearningRate 0.0314 Epoch: 8 Global Step: 109160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:32,330-Speed 2993.00 samples/sec Loss 8.2526 LearningRate 0.0314 Epoch: 8 Global Step: 109170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:35,724-Speed 3017.66 samples/sec Loss 8.1957 LearningRate 0.0314 Epoch: 8 Global Step: 109180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:39,182-Speed 2961.86 samples/sec Loss 8.0448 LearningRate 0.0314 Epoch: 8 Global Step: 109190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:42,563-Speed 3029.86 samples/sec Loss 8.1450 LearningRate 0.0314 Epoch: 8 Global Step: 109200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:45,934-Speed 3037.99 samples/sec Loss 8.1359 LearningRate 0.0314 Epoch: 8 Global Step: 109210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:49,298-Speed 3044.68 samples/sec Loss 8.0537 LearningRate 0.0314 Epoch: 8 
Global Step: 109220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:52,710-Speed 3002.20 samples/sec Loss 8.1499 LearningRate 0.0314 Epoch: 8 Global Step: 109230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:02:56,123-Speed 3000.74 samples/sec Loss 8.2720 LearningRate 0.0314 Epoch: 8 Global Step: 109240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:02:59,456-Speed 3074.02 samples/sec Loss 8.1400 LearningRate 0.0314 Epoch: 8 Global Step: 109250 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:02,807-Speed 3056.68 samples/sec Loss 8.1902 LearningRate 0.0314 Epoch: 8 Global Step: 109260 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:06,215-Speed 3005.26 samples/sec Loss 8.2738 LearningRate 0.0314 Epoch: 8 Global Step: 109270 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:09,611-Speed 3016.57 samples/sec Loss 8.2683 LearningRate 0.0314 Epoch: 8 Global Step: 109280 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:13,064-Speed 2966.13 samples/sec Loss 8.2098 LearningRate 0.0314 Epoch: 8 Global Step: 109290 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:16,456-Speed 3020.26 samples/sec Loss 8.0632 LearningRate 0.0314 Epoch: 8 Global Step: 109300 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:19,816-Speed 3048.14 samples/sec Loss 8.1074 LearningRate 0.0314 Epoch: 8 Global Step: 109310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:23,273-Speed 2963.45 samples/sec Loss 8.0976 LearningRate 0.0314 Epoch: 8 Global Step: 109320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:26,676-Speed 3008.96 samples/sec Loss 8.0945 LearningRate 0.0313 Epoch: 8 Global Step: 109330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:30,051-Speed 3035.83 samples/sec Loss 8.0309 LearningRate 0.0313 Epoch: 8 Global Step: 109340 Fp16 Grad Scale: 
65536 Required: 13 hours Training: 2022-04-27 12:03:33,472-Speed 2994.28 samples/sec Loss 8.2040 LearningRate 0.0313 Epoch: 8 Global Step: 109350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:36,877-Speed 3007.80 samples/sec Loss 8.1612 LearningRate 0.0313 Epoch: 8 Global Step: 109360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:40,399-Speed 2908.78 samples/sec Loss 8.2387 LearningRate 0.0313 Epoch: 8 Global Step: 109370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:43,736-Speed 3068.90 samples/sec Loss 8.1002 LearningRate 0.0313 Epoch: 8 Global Step: 109380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:47,162-Speed 2989.93 samples/sec Loss 8.1752 LearningRate 0.0313 Epoch: 8 Global Step: 109390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:50,604-Speed 2976.81 samples/sec Loss 8.2513 LearningRate 0.0313 Epoch: 8 Global Step: 109400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:03:54,020-Speed 2997.53 samples/sec Loss 8.2073 LearningRate 0.0313 Epoch: 8 Global Step: 109410 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:03:57,428-Speed 3005.51 samples/sec Loss 8.2160 LearningRate 0.0313 Epoch: 8 Global Step: 109420 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:04:00,810-Speed 3029.24 samples/sec Loss 8.1156 LearningRate 0.0313 Epoch: 8 Global Step: 109430 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:04:04,253-Speed 2974.51 samples/sec Loss 8.0175 LearningRate 0.0313 Epoch: 8 Global Step: 109440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:07,683-Speed 2986.40 samples/sec Loss 8.2908 LearningRate 0.0313 Epoch: 8 Global Step: 109450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:11,073-Speed 3021.51 samples/sec Loss 8.3230 LearningRate 0.0313 Epoch: 8 Global Step: 109460 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-27 12:04:14,518-Speed 2972.93 samples/sec Loss 8.2331 LearningRate 0.0313 Epoch: 8 Global Step: 109470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:17,918-Speed 3012.73 samples/sec Loss 8.0439 LearningRate 0.0313 Epoch: 8 Global Step: 109480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:21,329-Speed 3003.55 samples/sec Loss 8.1279 LearningRate 0.0313 Epoch: 8 Global Step: 109490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:24,684-Speed 3052.98 samples/sec Loss 8.2297 LearningRate 0.0313 Epoch: 8 Global Step: 109500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:28,113-Speed 2987.18 samples/sec Loss 8.0766 LearningRate 0.0313 Epoch: 8 Global Step: 109510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:31,482-Speed 3039.90 samples/sec Loss 8.2408 LearningRate 0.0313 Epoch: 8 Global Step: 109520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:34,885-Speed 3010.54 samples/sec Loss 8.1156 LearningRate 0.0313 Epoch: 8 Global Step: 109530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:38,288-Speed 3010.23 samples/sec Loss 8.1300 LearningRate 0.0313 Epoch: 8 Global Step: 109540 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:04:41,736-Speed 2970.18 samples/sec Loss 8.1475 LearningRate 0.0312 Epoch: 8 Global Step: 109550 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:04:45,172-Speed 2981.52 samples/sec Loss 8.2291 LearningRate 0.0312 Epoch: 8 Global Step: 109560 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:04:48,521-Speed 3058.96 samples/sec Loss 8.0798 LearningRate 0.0312 Epoch: 8 Global Step: 109570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:04:51,884-Speed 3045.12 samples/sec Loss 8.1797 LearningRate 0.0312 Epoch: 8 Global Step: 109580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:55,212-Speed 3078.24 
samples/sec Loss 8.0720 LearningRate 0.0312 Epoch: 8 Global Step: 109590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:04:58,544-Speed 3074.41 samples/sec Loss 8.1914 LearningRate 0.0312 Epoch: 8 Global Step: 109600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:01,955-Speed 3002.67 samples/sec Loss 8.2060 LearningRate 0.0312 Epoch: 8 Global Step: 109610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:05,301-Speed 3061.11 samples/sec Loss 8.1486 LearningRate 0.0312 Epoch: 8 Global Step: 109620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:08,695-Speed 3018.38 samples/sec Loss 8.0677 LearningRate 0.0312 Epoch: 8 Global Step: 109630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:12,058-Speed 3045.07 samples/sec Loss 8.2022 LearningRate 0.0312 Epoch: 8 Global Step: 109640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:15,428-Speed 3039.67 samples/sec Loss 8.0908 LearningRate 0.0312 Epoch: 8 Global Step: 109650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:18,783-Speed 3053.44 samples/sec Loss 8.1096 LearningRate 0.0312 Epoch: 8 Global Step: 109660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:22,186-Speed 3009.43 samples/sec Loss 8.2518 LearningRate 0.0312 Epoch: 8 Global Step: 109670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:25,640-Speed 2965.54 samples/sec Loss 8.2007 LearningRate 0.0312 Epoch: 8 Global Step: 109680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:05:29,073-Speed 2983.77 samples/sec Loss 8.2880 LearningRate 0.0312 Epoch: 8 Global Step: 109690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:05:32,397-Speed 3080.85 samples/sec Loss 8.0758 LearningRate 0.0312 Epoch: 8 Global Step: 109700 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:05:35,775-Speed 3032.51 samples/sec Loss 8.1534 LearningRate 
0.0312 Epoch: 8 Global Step: 109710 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:05:39,148-Speed 3037.03 samples/sec Loss 8.2665 LearningRate 0.0312 Epoch: 8 Global Step: 109720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:42,481-Speed 3072.38 samples/sec Loss 8.1720 LearningRate 0.0312 Epoch: 8 Global Step: 109730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:45,800-Speed 3086.40 samples/sec Loss 8.2011 LearningRate 0.0312 Epoch: 8 Global Step: 109740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:49,155-Speed 3053.60 samples/sec Loss 7.9432 LearningRate 0.0312 Epoch: 8 Global Step: 109750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:52,531-Speed 3033.48 samples/sec Loss 8.1967 LearningRate 0.0312 Epoch: 8 Global Step: 109760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:55,879-Speed 3059.35 samples/sec Loss 8.0646 LearningRate 0.0312 Epoch: 8 Global Step: 109770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:05:59,301-Speed 2993.26 samples/sec Loss 8.0802 LearningRate 0.0311 Epoch: 8 Global Step: 109780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:02,712-Speed 3002.84 samples/sec Loss 8.0553 LearningRate 0.0311 Epoch: 8 Global Step: 109790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:06,149-Speed 2980.21 samples/sec Loss 8.1986 LearningRate 0.0311 Epoch: 8 Global Step: 109800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:09,591-Speed 2975.60 samples/sec Loss 8.1971 LearningRate 0.0311 Epoch: 8 Global Step: 109810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:12,986-Speed 3017.19 samples/sec Loss 8.1867 LearningRate 0.0311 Epoch: 8 Global Step: 109820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:16,442-Speed 2963.82 samples/sec Loss 8.2094 LearningRate 0.0311 Epoch: 8 Global Step: 109830 Fp16 
Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:19,852-Speed 3003.63 samples/sec Loss 8.0608 LearningRate 0.0311 Epoch: 8 Global Step: 109840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:23,214-Speed 3046.97 samples/sec Loss 8.1491 LearningRate 0.0311 Epoch: 8 Global Step: 109850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:26,612-Speed 3013.79 samples/sec Loss 8.3260 LearningRate 0.0311 Epoch: 8 Global Step: 109860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:29,985-Speed 3037.53 samples/sec Loss 8.1187 LearningRate 0.0311 Epoch: 8 Global Step: 109870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:33,343-Speed 3049.72 samples/sec Loss 8.1388 LearningRate 0.0311 Epoch: 8 Global Step: 109880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:36,683-Speed 3067.41 samples/sec Loss 8.1354 LearningRate 0.0311 Epoch: 8 Global Step: 109890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:40,124-Speed 2976.63 samples/sec Loss 8.1784 LearningRate 0.0311 Epoch: 8 Global Step: 109900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:43,504-Speed 3030.11 samples/sec Loss 8.1546 LearningRate 0.0311 Epoch: 8 Global Step: 109910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:46,932-Speed 2988.06 samples/sec Loss 8.1619 LearningRate 0.0311 Epoch: 8 Global Step: 109920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:06:50,344-Speed 3002.52 samples/sec Loss 8.1707 LearningRate 0.0311 Epoch: 8 Global Step: 109930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:53,766-Speed 2992.68 samples/sec Loss 8.1945 LearningRate 0.0311 Epoch: 8 Global Step: 109940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:06:57,250-Speed 2940.20 samples/sec Loss 8.0350 LearningRate 0.0311 Epoch: 8 Global Step: 109950 Fp16 Grad Scale: 65536 Required: 13 hours 
Training: 2022-04-27 12:07:00,672-Speed 2992.93 samples/sec Loss 8.0924 LearningRate 0.0311 Epoch: 8 Global Step: 109960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:04,030-Speed 3050.46 samples/sec Loss 8.0791 LearningRate 0.0311 Epoch: 8 Global Step: 109970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:07,467-Speed 2979.81 samples/sec Loss 8.2240 LearningRate 0.0311 Epoch: 8 Global Step: 109980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:10,946-Speed 2944.77 samples/sec Loss 8.1931 LearningRate 0.0311 Epoch: 8 Global Step: 109990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:14,345-Speed 3013.45 samples/sec Loss 8.0224 LearningRate 0.0310 Epoch: 8 Global Step: 110000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:17,791-Speed 2972.25 samples/sec Loss 8.1256 LearningRate 0.0310 Epoch: 8 Global Step: 110010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:21,256-Speed 2955.35 samples/sec Loss 8.1371 LearningRate 0.0310 Epoch: 8 Global Step: 110020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:24,694-Speed 2979.68 samples/sec Loss 8.2226 LearningRate 0.0310 Epoch: 8 Global Step: 110030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:07:28,141-Speed 2971.89 samples/sec Loss 8.0085 LearningRate 0.0310 Epoch: 8 Global Step: 110040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:31,504-Speed 3045.95 samples/sec Loss 8.1728 LearningRate 0.0310 Epoch: 8 Global Step: 110050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:34,873-Speed 3040.40 samples/sec Loss 8.1275 LearningRate 0.0310 Epoch: 8 Global Step: 110060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:38,292-Speed 2996.03 samples/sec Loss 8.2269 LearningRate 0.0310 Epoch: 8 Global Step: 110070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:41,696-Speed 
3009.02 samples/sec Loss 8.0528 LearningRate 0.0310 Epoch: 8 Global Step: 110080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:45,020-Speed 3081.24 samples/sec Loss 8.1314 LearningRate 0.0310 Epoch: 8 Global Step: 110090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:48,419-Speed 3013.08 samples/sec Loss 8.0552 LearningRate 0.0310 Epoch: 8 Global Step: 110100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:51,799-Speed 3030.40 samples/sec Loss 8.1933 LearningRate 0.0310 Epoch: 8 Global Step: 110110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:55,228-Speed 2987.18 samples/sec Loss 8.1838 LearningRate 0.0310 Epoch: 8 Global Step: 110120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:07:58,673-Speed 2973.33 samples/sec Loss 8.0770 LearningRate 0.0310 Epoch: 8 Global Step: 110130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:02,115-Speed 2976.48 samples/sec Loss 8.0166 LearningRate 0.0310 Epoch: 8 Global Step: 110140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:08:05,565-Speed 2968.57 samples/sec Loss 8.0490 LearningRate 0.0310 Epoch: 8 Global Step: 110150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:08:08,924-Speed 3049.68 samples/sec Loss 7.9407 LearningRate 0.0310 Epoch: 8 Global Step: 110160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:12,393-Speed 2952.95 samples/sec Loss 7.9776 LearningRate 0.0310 Epoch: 8 Global Step: 110170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:15,871-Speed 2944.42 samples/sec Loss 8.0818 LearningRate 0.0310 Epoch: 8 Global Step: 110180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:19,249-Speed 3032.78 samples/sec Loss 8.2472 LearningRate 0.0310 Epoch: 8 Global Step: 110190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:22,648-Speed 3013.64 samples/sec Loss 7.9958 
LearningRate 0.0310 Epoch: 8 Global Step: 110200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:26,042-Speed 3017.03 samples/sec Loss 8.1105 LearningRate 0.0310 Epoch: 8 Global Step: 110210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:29,427-Speed 3026.81 samples/sec Loss 8.0934 LearningRate 0.0309 Epoch: 8 Global Step: 110220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:32,869-Speed 2975.46 samples/sec Loss 8.0059 LearningRate 0.0309 Epoch: 8 Global Step: 110230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:36,303-Speed 2982.54 samples/sec Loss 7.9966 LearningRate 0.0309 Epoch: 8 Global Step: 110240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:08:39,701-Speed 3014.56 samples/sec Loss 8.2429 LearningRate 0.0309 Epoch: 8 Global Step: 110250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:08:43,123-Speed 2993.83 samples/sec Loss 7.9953 LearningRate 0.0309 Epoch: 8 Global Step: 110260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:08:46,520-Speed 3014.89 samples/sec Loss 8.0036 LearningRate 0.0309 Epoch: 8 Global Step: 110270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:08:49,929-Speed 3004.53 samples/sec Loss 8.0708 LearningRate 0.0309 Epoch: 8 Global Step: 110280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:08:53,317-Speed 3023.52 samples/sec Loss 8.0316 LearningRate 0.0309 Epoch: 8 Global Step: 110290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:08:56,635-Speed 3086.86 samples/sec Loss 8.1680 LearningRate 0.0309 Epoch: 8 Global Step: 110300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:08:59,998-Speed 3046.02 samples/sec Loss 8.0713 LearningRate 0.0309 Epoch: 8 Global Step: 110310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:09:03,418-Speed 2995.18 samples/sec Loss 8.1844 LearningRate 0.0309 Epoch: 8 Global Step: 
110320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:09:06,768-Speed 3057.70 samples/sec Loss 8.0076 LearningRate 0.0309 Epoch: 8 Global Step: 110330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:09:10,156-Speed 3023.41 samples/sec Loss 8.1270 LearningRate 0.0309 Epoch: 8 Global Step: 110340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:09:13,564-Speed 3005.48 samples/sec Loss 8.3112 LearningRate 0.0309 Epoch: 8 Global Step: 110350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:17,070-Speed 2921.21 samples/sec Loss 8.0873 LearningRate 0.0309 Epoch: 8 Global Step: 110360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:20,497-Speed 2988.80 samples/sec Loss 8.0790 LearningRate 0.0309 Epoch: 8 Global Step: 110370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:23,990-Speed 2932.67 samples/sec Loss 8.1126 LearningRate 0.0309 Epoch: 8 Global Step: 110380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:27,380-Speed 3021.40 samples/sec Loss 8.1722 LearningRate 0.0309 Epoch: 8 Global Step: 110390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:30,767-Speed 3024.76 samples/sec Loss 8.1042 LearningRate 0.0309 Epoch: 8 Global Step: 110400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:34,189-Speed 2992.98 samples/sec Loss 8.1032 LearningRate 0.0309 Epoch: 8 Global Step: 110410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:37,585-Speed 3016.06 samples/sec Loss 8.1527 LearningRate 0.0309 Epoch: 8 Global Step: 110420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:41,014-Speed 2987.67 samples/sec Loss 8.0768 LearningRate 0.0309 Epoch: 8 Global Step: 110430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:09:44,420-Speed 3007.08 samples/sec Loss 8.1247 LearningRate 0.0309 Epoch: 8 Global Step: 110440 Fp16 Grad Scale: 65536 Required: 13 
hours Training: 2022-04-27 12:09:47,826-Speed 3007.45 samples/sec Loss 8.2074 LearningRate 0.0308 Epoch: 8 Global Step: 110450 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:09:51,222-Speed 3016.10 samples/sec Loss 8.0500 LearningRate 0.0308 Epoch: 8 Global Step: 110460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:09:54,589-Speed 3041.84 samples/sec Loss 8.1085 LearningRate 0.0308 Epoch: 8 Global Step: 110470 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:09:57,893-Speed 3100.72 samples/sec Loss 8.0721 LearningRate 0.0308 Epoch: 8 Global Step: 110480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:01,286-Speed 3019.69 samples/sec Loss 8.0364 LearningRate 0.0308 Epoch: 8 Global Step: 110490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:04,731-Speed 2972.88 samples/sec Loss 8.0813 LearningRate 0.0308 Epoch: 8 Global Step: 110500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:08,130-Speed 3013.81 samples/sec Loss 8.1887 LearningRate 0.0308 Epoch: 8 Global Step: 110510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:11,452-Speed 3083.43 samples/sec Loss 8.2213 LearningRate 0.0308 Epoch: 8 Global Step: 110520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:14,825-Speed 3036.66 samples/sec Loss 8.1382 LearningRate 0.0308 Epoch: 8 Global Step: 110530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:18,267-Speed 2975.79 samples/sec Loss 8.0192 LearningRate 0.0308 Epoch: 8 Global Step: 110540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:21,683-Speed 2998.86 samples/sec Loss 8.0497 LearningRate 0.0308 Epoch: 8 Global Step: 110550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:25,109-Speed 2989.88 samples/sec Loss 8.0224 LearningRate 0.0308 Epoch: 8 Global Step: 110560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
12:10:28,493-Speed 3026.00 samples/sec Loss 7.9625 LearningRate 0.0308 Epoch: 8 Global Step: 110570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:31,905-Speed 3002.78 samples/sec Loss 8.0571 LearningRate 0.0308 Epoch: 8 Global Step: 110580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:35,267-Speed 3045.94 samples/sec Loss 8.1288 LearningRate 0.0308 Epoch: 8 Global Step: 110590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:38,599-Speed 3074.67 samples/sec Loss 8.0902 LearningRate 0.0308 Epoch: 8 Global Step: 110600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:42,059-Speed 2960.53 samples/sec Loss 8.1531 LearningRate 0.0308 Epoch: 8 Global Step: 110610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:45,390-Speed 3075.18 samples/sec Loss 8.0863 LearningRate 0.0308 Epoch: 8 Global Step: 110620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:48,738-Speed 3059.32 samples/sec Loss 8.0816 LearningRate 0.0308 Epoch: 8 Global Step: 110630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:52,052-Speed 3090.61 samples/sec Loss 8.0344 LearningRate 0.0308 Epoch: 8 Global Step: 110640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:55,407-Speed 3053.14 samples/sec Loss 8.0679 LearningRate 0.0308 Epoch: 8 Global Step: 110650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:10:58,737-Speed 3076.00 samples/sec Loss 8.0655 LearningRate 0.0308 Epoch: 8 Global Step: 110660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:02,177-Speed 2977.59 samples/sec Loss 8.1217 LearningRate 0.0307 Epoch: 8 Global Step: 110670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:05,470-Speed 3110.28 samples/sec Loss 8.0534 LearningRate 0.0307 Epoch: 8 Global Step: 110680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:08,804-Speed 3072.50 samples/sec Loss 
8.0304 LearningRate 0.0307 Epoch: 8 Global Step: 110690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:12,131-Speed 3078.88 samples/sec Loss 8.1020 LearningRate 0.0307 Epoch: 8 Global Step: 110700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:15,466-Speed 3071.06 samples/sec Loss 8.0897 LearningRate 0.0307 Epoch: 8 Global Step: 110710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:18,885-Speed 2995.81 samples/sec Loss 8.1095 LearningRate 0.0307 Epoch: 8 Global Step: 110720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:22,201-Speed 3088.67 samples/sec Loss 7.9748 LearningRate 0.0307 Epoch: 8 Global Step: 110730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:25,513-Speed 3092.86 samples/sec Loss 8.1767 LearningRate 0.0307 Epoch: 8 Global Step: 110740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:28,826-Speed 3091.25 samples/sec Loss 8.0675 LearningRate 0.0307 Epoch: 8 Global Step: 110750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:32,163-Speed 3071.49 samples/sec Loss 8.1230 LearningRate 0.0307 Epoch: 8 Global Step: 110760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:35,497-Speed 3071.85 samples/sec Loss 8.0073 LearningRate 0.0307 Epoch: 8 Global Step: 110770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:38,784-Speed 3116.51 samples/sec Loss 8.1620 LearningRate 0.0307 Epoch: 8 Global Step: 110780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:42,117-Speed 3073.65 samples/sec Loss 8.2410 LearningRate 0.0307 Epoch: 8 Global Step: 110790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:11:45,488-Speed 3038.33 samples/sec Loss 8.0068 LearningRate 0.0307 Epoch: 8 Global Step: 110800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:11:48,854-Speed 3043.41 samples/sec Loss 7.8945 LearningRate 0.0307 Epoch: 8 Global 
Step: 110810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:11:52,203-Speed 3058.15 samples/sec Loss 8.0430 LearningRate 0.0307 Epoch: 8 Global Step: 110820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:11:55,562-Speed 3048.75 samples/sec Loss 8.0339 LearningRate 0.0307 Epoch: 8 Global Step: 110830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:11:58,934-Speed 3037.61 samples/sec Loss 8.0478 LearningRate 0.0307 Epoch: 8 Global Step: 110840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:02,262-Speed 3078.25 samples/sec Loss 8.0693 LearningRate 0.0307 Epoch: 8 Global Step: 110850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:05,620-Speed 3049.97 samples/sec Loss 8.0855 LearningRate 0.0307 Epoch: 8 Global Step: 110860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:09,045-Speed 2990.80 samples/sec Loss 8.1603 LearningRate 0.0307 Epoch: 8 Global Step: 110870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:12,455-Speed 3004.53 samples/sec Loss 8.0106 LearningRate 0.0307 Epoch: 8 Global Step: 110880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:15,889-Speed 2982.20 samples/sec Loss 8.0984 LearningRate 0.0306 Epoch: 8 Global Step: 110890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:19,312-Speed 2992.14 samples/sec Loss 8.0774 LearningRate 0.0306 Epoch: 8 Global Step: 110900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:12:22,699-Speed 3024.12 samples/sec Loss 8.1820 LearningRate 0.0306 Epoch: 8 Global Step: 110910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:12:25,998-Speed 3105.18 samples/sec Loss 8.0370 LearningRate 0.0306 Epoch: 8 Global Step: 110920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:29,362-Speed 3045.30 samples/sec Loss 8.0154 LearningRate 0.0306 Epoch: 8 Global Step: 110930 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 12:12:32,718-Speed 3052.07 samples/sec Loss 8.0205 LearningRate 0.0306 Epoch: 8 Global Step: 110940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:36,133-Speed 2998.67 samples/sec Loss 8.1616 LearningRate 0.0306 Epoch: 8 Global Step: 110950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:39,562-Speed 2987.23 samples/sec Loss 8.1235 LearningRate 0.0306 Epoch: 8 Global Step: 110960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:43,036-Speed 2949.24 samples/sec Loss 8.1823 LearningRate 0.0306 Epoch: 8 Global Step: 110970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:46,455-Speed 2996.18 samples/sec Loss 8.1428 LearningRate 0.0306 Epoch: 8 Global Step: 110980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:49,967-Speed 2915.64 samples/sec Loss 8.0269 LearningRate 0.0306 Epoch: 8 Global Step: 110990 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:53,370-Speed 3010.85 samples/sec Loss 8.0658 LearningRate 0.0306 Epoch: 8 Global Step: 111000 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:12:56,793-Speed 2992.13 samples/sec Loss 7.9771 LearningRate 0.0306 Epoch: 8 Global Step: 111010 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:13:00,217-Speed 2991.86 samples/sec Loss 8.0327 LearningRate 0.0306 Epoch: 8 Global Step: 111020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:03,587-Speed 3039.09 samples/sec Loss 8.0427 LearningRate 0.0306 Epoch: 8 Global Step: 111030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:06,926-Speed 3068.18 samples/sec Loss 8.0107 LearningRate 0.0306 Epoch: 8 Global Step: 111040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:10,305-Speed 3031.11 samples/sec Loss 7.9520 LearningRate 0.0306 Epoch: 8 Global Step: 111050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
12:13:13,676-Speed 3038.31 samples/sec Loss 8.1454 LearningRate 0.0306 Epoch: 8 Global Step: 111060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:17,037-Speed 3047.57 samples/sec Loss 8.1288 LearningRate 0.0306 Epoch: 8 Global Step: 111070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:20,433-Speed 3016.60 samples/sec Loss 8.0885 LearningRate 0.0306 Epoch: 8 Global Step: 111080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:23,755-Speed 3082.65 samples/sec Loss 7.9836 LearningRate 0.0306 Epoch: 8 Global Step: 111090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:27,162-Speed 3007.09 samples/sec Loss 8.0026 LearningRate 0.0306 Epoch: 8 Global Step: 111100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:30,563-Speed 3011.46 samples/sec Loss 8.0870 LearningRate 0.0306 Epoch: 8 Global Step: 111110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:34,066-Speed 2923.69 samples/sec Loss 7.8915 LearningRate 0.0305 Epoch: 8 Global Step: 111120 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:13:37,414-Speed 3059.08 samples/sec Loss 8.0934 LearningRate 0.0305 Epoch: 8 Global Step: 111130 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:13:40,818-Speed 3009.66 samples/sec Loss 8.1175 LearningRate 0.0305 Epoch: 8 Global Step: 111140 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:13:44,209-Speed 3020.41 samples/sec Loss 8.1310 LearningRate 0.0305 Epoch: 8 Global Step: 111150 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:13:47,574-Speed 3043.88 samples/sec Loss 8.1166 LearningRate 0.0305 Epoch: 8 Global Step: 111160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:51,042-Speed 2953.35 samples/sec Loss 8.0664 LearningRate 0.0305 Epoch: 8 Global Step: 111170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:54,523-Speed 2942.37 samples/sec 
Loss 8.1278 LearningRate 0.0305 Epoch: 8 Global Step: 111180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:13:57,970-Speed 2972.03 samples/sec Loss 8.0791 LearningRate 0.0305 Epoch: 8 Global Step: 111190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:01,403-Speed 2983.39 samples/sec Loss 7.9122 LearningRate 0.0305 Epoch: 8 Global Step: 111200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:04,774-Speed 3038.60 samples/sec Loss 8.1115 LearningRate 0.0305 Epoch: 8 Global Step: 111210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:08,160-Speed 3025.04 samples/sec Loss 8.0942 LearningRate 0.0305 Epoch: 8 Global Step: 111220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:11,516-Speed 3052.67 samples/sec Loss 8.1465 LearningRate 0.0305 Epoch: 8 Global Step: 111230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:14,843-Speed 3078.01 samples/sec Loss 8.0661 LearningRate 0.0305 Epoch: 8 Global Step: 111240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:18,249-Speed 3008.03 samples/sec Loss 8.0275 LearningRate 0.0305 Epoch: 8 Global Step: 111250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:21,601-Speed 3055.66 samples/sec Loss 8.0060 LearningRate 0.0305 Epoch: 8 Global Step: 111260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:24,992-Speed 3020.21 samples/sec Loss 8.0931 LearningRate 0.0305 Epoch: 8 Global Step: 111270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:28,381-Speed 3022.79 samples/sec Loss 7.9818 LearningRate 0.0305 Epoch: 8 Global Step: 111280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:31,800-Speed 2995.62 samples/sec Loss 8.1519 LearningRate 0.0305 Epoch: 8 Global Step: 111290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:35,151-Speed 3055.99 samples/sec Loss 8.1278 LearningRate 0.0305 Epoch: 8 
Global Step: 111300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:38,573-Speed 2993.78 samples/sec Loss 8.0172 LearningRate 0.0305 Epoch: 8 Global Step: 111310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:42,025-Speed 2967.75 samples/sec Loss 7.8189 LearningRate 0.0305 Epoch: 8 Global Step: 111320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:45,489-Speed 2956.87 samples/sec Loss 8.0960 LearningRate 0.0305 Epoch: 8 Global Step: 111330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:48,818-Speed 3076.48 samples/sec Loss 8.0732 LearningRate 0.0304 Epoch: 8 Global Step: 111340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:52,183-Speed 3044.46 samples/sec Loss 8.2099 LearningRate 0.0304 Epoch: 8 Global Step: 111350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:14:55,511-Speed 3077.70 samples/sec Loss 8.0162 LearningRate 0.0304 Epoch: 8 Global Step: 111360 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:14:58,919-Speed 3006.13 samples/sec Loss 8.0258 LearningRate 0.0304 Epoch: 8 Global Step: 111370 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:02,329-Speed 3003.31 samples/sec Loss 7.9254 LearningRate 0.0304 Epoch: 8 Global Step: 111380 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:05,767-Speed 2979.08 samples/sec Loss 8.0804 LearningRate 0.0304 Epoch: 8 Global Step: 111390 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:09,176-Speed 3004.74 samples/sec Loss 8.0447 LearningRate 0.0304 Epoch: 8 Global Step: 111400 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:12,576-Speed 3013.09 samples/sec Loss 8.0326 LearningRate 0.0304 Epoch: 8 Global Step: 111410 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:15,970-Speed 3018.02 samples/sec Loss 7.9991 LearningRate 0.0304 Epoch: 8 Global Step: 111420 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 12:15:19,420-Speed 2968.56 samples/sec Loss 8.0413 LearningRate 0.0304 Epoch: 8 Global Step: 111430 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:22,847-Speed 2989.34 samples/sec Loss 7.9538 LearningRate 0.0304 Epoch: 8 Global Step: 111440 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:26,245-Speed 3014.20 samples/sec Loss 8.0972 LearningRate 0.0304 Epoch: 8 Global Step: 111450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:29,570-Speed 3080.53 samples/sec Loss 8.0895 LearningRate 0.0304 Epoch: 8 Global Step: 111460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:32,955-Speed 3026.05 samples/sec Loss 8.1670 LearningRate 0.0304 Epoch: 8 Global Step: 111470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:36,377-Speed 2992.40 samples/sec Loss 8.1307 LearningRate 0.0304 Epoch: 8 Global Step: 111480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:39,770-Speed 3019.11 samples/sec Loss 8.0239 LearningRate 0.0304 Epoch: 8 Global Step: 111490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:43,161-Speed 3021.13 samples/sec Loss 8.0311 LearningRate 0.0304 Epoch: 8 Global Step: 111500 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:46,539-Speed 3031.34 samples/sec Loss 8.0936 LearningRate 0.0304 Epoch: 8 Global Step: 111510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:49,870-Speed 3075.89 samples/sec Loss 7.9892 LearningRate 0.0304 Epoch: 8 Global Step: 111520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:53,298-Speed 2987.89 samples/sec Loss 8.0007 LearningRate 0.0304 Epoch: 8 Global Step: 111530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:15:56,678-Speed 3029.82 samples/sec Loss 7.9842 LearningRate 0.0304 Epoch: 8 Global Step: 111540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
12:16:00,083-Speed 3009.14 samples/sec Loss 8.0385 LearningRate 0.0304 Epoch: 8 Global Step: 111550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:16:03,478-Speed 3017.11 samples/sec Loss 8.0918 LearningRate 0.0304 Epoch: 8 Global Step: 111560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:06,835-Speed 3051.26 samples/sec Loss 7.9576 LearningRate 0.0303 Epoch: 8 Global Step: 111570 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:10,205-Speed 3039.33 samples/sec Loss 8.0437 LearningRate 0.0303 Epoch: 8 Global Step: 111580 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:13,580-Speed 3035.59 samples/sec Loss 8.1585 LearningRate 0.0303 Epoch: 8 Global Step: 111590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:16,909-Speed 3076.59 samples/sec Loss 7.8537 LearningRate 0.0303 Epoch: 8 Global Step: 111600 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:20,262-Speed 3054.62 samples/sec Loss 7.9779 LearningRate 0.0303 Epoch: 8 Global Step: 111610 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:23,683-Speed 2994.27 samples/sec Loss 7.9636 LearningRate 0.0303 Epoch: 8 Global Step: 111620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:27,173-Speed 2934.50 samples/sec Loss 7.8973 LearningRate 0.0303 Epoch: 8 Global Step: 111630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:30,623-Speed 2969.52 samples/sec Loss 8.0664 LearningRate 0.0303 Epoch: 8 Global Step: 111640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:33,947-Speed 3080.99 samples/sec Loss 8.1076 LearningRate 0.0303 Epoch: 8 Global Step: 111650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:37,272-Speed 3080.55 samples/sec Loss 8.0786 LearningRate 0.0303 Epoch: 8 Global Step: 111660 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:16:40,610-Speed 3068.77 samples/sec 
Loss 8.1555 LearningRate 0.0303 Epoch: 8 Global Step: 111670 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:16:44,029-Speed 2995.63 samples/sec Loss 7.8890 LearningRate 0.0303 Epoch: 8 Global Step: 111680 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:16:47,346-Speed 3088.41 samples/sec Loss 7.9437 LearningRate 0.0303 Epoch: 8 Global Step: 111690 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:16:50,751-Speed 3009.38 samples/sec Loss 7.9943 LearningRate 0.0303 Epoch: 8 Global Step: 111700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:54,077-Speed 3078.72 samples/sec Loss 8.0499 LearningRate 0.0303 Epoch: 8 Global Step: 111710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:16:57,385-Speed 3096.55 samples/sec Loss 7.9863 LearningRate 0.0303 Epoch: 8 Global Step: 111720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:00,783-Speed 3014.17 samples/sec Loss 8.0515 LearningRate 0.0303 Epoch: 8 Global Step: 111730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:04,174-Speed 3021.16 samples/sec Loss 7.9340 LearningRate 0.0303 Epoch: 8 Global Step: 111740 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:07,619-Speed 2972.59 samples/sec Loss 7.8249 LearningRate 0.0303 Epoch: 8 Global Step: 111750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:11,090-Speed 2951.71 samples/sec Loss 8.0300 LearningRate 0.0303 Epoch: 8 Global Step: 111760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:14,501-Speed 3003.02 samples/sec Loss 7.9893 LearningRate 0.0303 Epoch: 8 Global Step: 111770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:18,178-Speed 2785.43 samples/sec Loss 8.0795 LearningRate 0.0303 Epoch: 8 Global Step: 111780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:50,148-Speed 320.31 samples/sec Loss 7.9396 LearningRate 0.0302 Epoch: 9 
Global Step: 111790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:17:53,635-Speed 2937.50 samples/sec Loss 6.4272 LearningRate 0.0302 Epoch: 9 Global Step: 111800 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:17:57,382-Speed 2735.05 samples/sec Loss 6.5976 LearningRate 0.0302 Epoch: 9 Global Step: 111810 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:18:00,821-Speed 2978.19 samples/sec Loss 6.4675 LearningRate 0.0302 Epoch: 9 Global Step: 111820 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:18:04,228-Speed 3006.77 samples/sec Loss 6.5378 LearningRate 0.0302 Epoch: 9 Global Step: 111830 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:18:07,727-Speed 2927.79 samples/sec Loss 6.4939 LearningRate 0.0302 Epoch: 9 Global Step: 111840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:18:11,404-Speed 2785.68 samples/sec Loss 6.7012 LearningRate 0.0302 Epoch: 9 Global Step: 111850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:18:14,806-Speed 3010.29 samples/sec Loss 6.5154 LearningRate 0.0302 Epoch: 9 Global Step: 111860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:18,190-Speed 3026.92 samples/sec Loss 6.3353 LearningRate 0.0302 Epoch: 9 Global Step: 111870 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:21,617-Speed 2989.91 samples/sec Loss 6.6473 LearningRate 0.0302 Epoch: 9 Global Step: 111880 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:24,937-Speed 3085.11 samples/sec Loss 6.6460 LearningRate 0.0302 Epoch: 9 Global Step: 111890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:28,430-Speed 2932.28 samples/sec Loss 6.6492 LearningRate 0.0302 Epoch: 9 Global Step: 111900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:31,841-Speed 3003.39 samples/sec Loss 6.4821 LearningRate 0.0302 Epoch: 9 Global Step: 111910 Fp16 Grad Scale: 
32768 Required: 13 hours Training: 2022-04-27 12:18:35,216-Speed 3034.68 samples/sec Loss 6.4642 LearningRate 0.0302 Epoch: 9 Global Step: 111920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:38,718-Speed 2924.51 samples/sec Loss 6.5693 LearningRate 0.0302 Epoch: 9 Global Step: 111930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:42,133-Speed 3000.34 samples/sec Loss 6.4288 LearningRate 0.0302 Epoch: 9 Global Step: 111940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:45,562-Speed 2986.23 samples/sec Loss 6.5631 LearningRate 0.0302 Epoch: 9 Global Step: 111950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:18:48,970-Speed 3005.98 samples/sec Loss 6.5111 LearningRate 0.0302 Epoch: 9 Global Step: 111960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:18:52,438-Speed 2953.79 samples/sec Loss 6.5285 LearningRate 0.0302 Epoch: 9 Global Step: 111970 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:18:55,855-Speed 2997.40 samples/sec Loss 6.6465 LearningRate 0.0302 Epoch: 9 Global Step: 111980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:18:59,179-Speed 3081.67 samples/sec Loss 6.4550 LearningRate 0.0302 Epoch: 9 Global Step: 111990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:19:02,523-Speed 3063.09 samples/sec Loss 6.5402 LearningRate 0.0302 Epoch: 9 Global Step: 112000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:19:05,879-Speed 3052.04 samples/sec Loss 6.4757 LearningRate 0.0302 Epoch: 9 Global Step: 112010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:19:09,320-Speed 2976.79 samples/sec Loss 6.5689 LearningRate 0.0301 Epoch: 9 Global Step: 112020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:19:12,738-Speed 2998.15 samples/sec Loss 6.5343 LearningRate 0.0301 Epoch: 9 Global Step: 112030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-27 12:19:16,171-Speed 2983.52 samples/sec Loss 6.6244 LearningRate 0.0301 Epoch: 9 Global Step: 112040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:19:19,566-Speed 3016.98 samples/sec Loss 6.5650 LearningRate 0.0301 Epoch: 9 Global Step: 112050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:19:22,935-Speed 3040.51 samples/sec Loss 6.6300 LearningRate 0.0301 Epoch: 9 Global Step: 112060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:26,286-Speed 3057.69 samples/sec Loss 6.7034 LearningRate 0.0301 Epoch: 9 Global Step: 112070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:29,698-Speed 3002.29 samples/sec Loss 6.6716 LearningRate 0.0301 Epoch: 9 Global Step: 112080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:33,073-Speed 3035.00 samples/sec Loss 6.8011 LearningRate 0.0301 Epoch: 9 Global Step: 112090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:36,696-Speed 2827.15 samples/sec Loss 6.7745 LearningRate 0.0301 Epoch: 9 Global Step: 112100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:40,135-Speed 2978.66 samples/sec Loss 6.6185 LearningRate 0.0301 Epoch: 9 Global Step: 112110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:43,490-Speed 3052.84 samples/sec Loss 6.6035 LearningRate 0.0301 Epoch: 9 Global Step: 112120 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:46,825-Speed 3071.27 samples/sec Loss 6.6305 LearningRate 0.0301 Epoch: 9 Global Step: 112130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:50,192-Speed 3042.63 samples/sec Loss 6.7928 LearningRate 0.0301 Epoch: 9 Global Step: 112140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:53,567-Speed 3034.46 samples/sec Loss 6.6154 LearningRate 0.0301 Epoch: 9 Global Step: 112150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:19:56,928-Speed 3047.83 
samples/sec Loss 6.8104 LearningRate 0.0301 Epoch: 9 Global Step: 112160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:00,374-Speed 2972.77 samples/sec Loss 6.6267 LearningRate 0.0301 Epoch: 9 Global Step: 112170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:03,820-Speed 2972.48 samples/sec Loss 6.7334 LearningRate 0.0301 Epoch: 9 Global Step: 112180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:07,201-Speed 3029.16 samples/sec Loss 6.6279 LearningRate 0.0301 Epoch: 9 Global Step: 112190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:10,554-Speed 3055.80 samples/sec Loss 6.6329 LearningRate 0.0301 Epoch: 9 Global Step: 112200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:13,947-Speed 3018.21 samples/sec Loss 6.7617 LearningRate 0.0301 Epoch: 9 Global Step: 112210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:17,303-Speed 3052.42 samples/sec Loss 6.7775 LearningRate 0.0301 Epoch: 9 Global Step: 112220 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:20,639-Speed 3070.99 samples/sec Loss 6.7726 LearningRate 0.0301 Epoch: 9 Global Step: 112230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:24,083-Speed 2973.77 samples/sec Loss 6.7819 LearningRate 0.0301 Epoch: 9 Global Step: 112240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:27,445-Speed 3046.42 samples/sec Loss 6.6077 LearningRate 0.0300 Epoch: 9 Global Step: 112250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:30,800-Speed 3053.21 samples/sec Loss 6.6535 LearningRate 0.0300 Epoch: 9 Global Step: 112260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:34,183-Speed 3027.96 samples/sec Loss 6.8600 LearningRate 0.0300 Epoch: 9 Global Step: 112270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:37,512-Speed 3076.59 samples/sec Loss 6.8256 LearningRate 0.0300 
Epoch: 9 Global Step: 112280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:40,890-Speed 3032.75 samples/sec Loss 6.6725 LearningRate 0.0300 Epoch: 9 Global Step: 112290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:44,230-Speed 3066.65 samples/sec Loss 6.7984 LearningRate 0.0300 Epoch: 9 Global Step: 112300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:47,575-Speed 3062.37 samples/sec Loss 6.7176 LearningRate 0.0300 Epoch: 9 Global Step: 112310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:50,987-Speed 3001.71 samples/sec Loss 6.7263 LearningRate 0.0300 Epoch: 9 Global Step: 112320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:54,380-Speed 3018.88 samples/sec Loss 6.7198 LearningRate 0.0300 Epoch: 9 Global Step: 112330 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:20:57,758-Speed 3032.74 samples/sec Loss 6.7502 LearningRate 0.0300 Epoch: 9 Global Step: 112340 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:01,169-Speed 3002.68 samples/sec Loss 6.7038 LearningRate 0.0300 Epoch: 9 Global Step: 112350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:04,485-Speed 3089.42 samples/sec Loss 6.8808 LearningRate 0.0300 Epoch: 9 Global Step: 112360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:07,915-Speed 2986.51 samples/sec Loss 6.8269 LearningRate 0.0300 Epoch: 9 Global Step: 112370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:11,322-Speed 3006.79 samples/sec Loss 6.6608 LearningRate 0.0300 Epoch: 9 Global Step: 112380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:14,730-Speed 3005.48 samples/sec Loss 6.7673 LearningRate 0.0300 Epoch: 9 Global Step: 112390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:18,182-Speed 2966.96 samples/sec Loss 6.7637 LearningRate 0.0300 Epoch: 9 Global Step: 112400 Fp16 Grad 
Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:21,645-Speed 2958.38 samples/sec Loss 6.8654 LearningRate 0.0300 Epoch: 9 Global Step: 112410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:25,080-Speed 2981.60 samples/sec Loss 6.7682 LearningRate 0.0300 Epoch: 9 Global Step: 112420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:28,391-Speed 3094.00 samples/sec Loss 6.6873 LearningRate 0.0300 Epoch: 9 Global Step: 112430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:31,793-Speed 3011.35 samples/sec Loss 6.8076 LearningRate 0.0300 Epoch: 9 Global Step: 112440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:35,180-Speed 3024.00 samples/sec Loss 6.9556 LearningRate 0.0300 Epoch: 9 Global Step: 112450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:38,523-Speed 3064.37 samples/sec Loss 6.8125 LearningRate 0.0300 Epoch: 9 Global Step: 112460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:21:41,900-Speed 3033.62 samples/sec Loss 6.9509 LearningRate 0.0299 Epoch: 9 Global Step: 112470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:45,254-Speed 3053.83 samples/sec Loss 6.8179 LearningRate 0.0299 Epoch: 9 Global Step: 112480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:48,678-Speed 2991.30 samples/sec Loss 6.9104 LearningRate 0.0299 Epoch: 9 Global Step: 112490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:52,061-Speed 3028.27 samples/sec Loss 6.9144 LearningRate 0.0299 Epoch: 9 Global Step: 112500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:55,420-Speed 3048.48 samples/sec Loss 6.7451 LearningRate 0.0299 Epoch: 9 Global Step: 112510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:21:58,746-Speed 3080.19 samples/sec Loss 6.7879 LearningRate 0.0299 Epoch: 9 Global Step: 112520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 
2022-04-27 12:22:02,094-Speed 3059.15 samples/sec Loss 6.7976 LearningRate 0.0299 Epoch: 9 Global Step: 112530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:05,522-Speed 2996.76 samples/sec Loss 6.8640 LearningRate 0.0299 Epoch: 9 Global Step: 112540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:08,890-Speed 3041.89 samples/sec Loss 6.9808 LearningRate 0.0299 Epoch: 9 Global Step: 112550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:12,203-Speed 3091.56 samples/sec Loss 6.8999 LearningRate 0.0299 Epoch: 9 Global Step: 112560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:15,559-Speed 3051.63 samples/sec Loss 6.9137 LearningRate 0.0299 Epoch: 9 Global Step: 112570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:18,951-Speed 3019.93 samples/sec Loss 6.9075 LearningRate 0.0299 Epoch: 9 Global Step: 112580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:22,335-Speed 3026.63 samples/sec Loss 6.8338 LearningRate 0.0299 Epoch: 9 Global Step: 112590 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:25,693-Speed 3050.78 samples/sec Loss 6.9488 LearningRate 0.0299 Epoch: 9 Global Step: 112600 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:29,079-Speed 3024.72 samples/sec Loss 6.9399 LearningRate 0.0299 Epoch: 9 Global Step: 112610 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:32,557-Speed 2945.57 samples/sec Loss 6.8711 LearningRate 0.0299 Epoch: 9 Global Step: 112620 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:35,884-Speed 3078.82 samples/sec Loss 6.8919 LearningRate 0.0299 Epoch: 9 Global Step: 112630 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:22:39,221-Speed 3069.61 samples/sec Loss 6.9398 LearningRate 0.0299 Epoch: 9 Global Step: 112640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:42,587-Speed 
3043.21 samples/sec Loss 6.7586 LearningRate 0.0299 Epoch: 9 Global Step: 112650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:45,939-Speed 3054.85 samples/sec Loss 6.8874 LearningRate 0.0299 Epoch: 9 Global Step: 112660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:49,309-Speed 3040.27 samples/sec Loss 6.9959 LearningRate 0.0299 Epoch: 9 Global Step: 112670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:52,696-Speed 3024.04 samples/sec Loss 6.9910 LearningRate 0.0299 Epoch: 9 Global Step: 112680 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:56,075-Speed 3031.00 samples/sec Loss 6.8797 LearningRate 0.0299 Epoch: 9 Global Step: 112690 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:22:59,420-Speed 3062.83 samples/sec Loss 6.9339 LearningRate 0.0298 Epoch: 9 Global Step: 112700 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:23:02,750-Speed 3076.13 samples/sec Loss 6.8918 LearningRate 0.0298 Epoch: 9 Global Step: 112710 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:23:06,096-Speed 3060.77 samples/sec Loss 6.9017 LearningRate 0.0298 Epoch: 9 Global Step: 112720 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:23:09,419-Speed 3083.11 samples/sec Loss 6.9787 LearningRate 0.0298 Epoch: 9 Global Step: 112730 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:23:12,882-Speed 2957.84 samples/sec Loss 7.0670 LearningRate 0.0298 Epoch: 9 Global Step: 112740 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:23:16,243-Speed 3047.44 samples/sec Loss 7.0803 LearningRate 0.0298 Epoch: 9 Global Step: 112750 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:23:19,678-Speed 2981.58 samples/sec Loss 6.9897 LearningRate 0.0298 Epoch: 9 Global Step: 112760 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:23:23,074-Speed 3016.36 samples/sec Loss 6.9837 
LearningRate 0.0298 Epoch: 9 Global Step: 112770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:26,417-Speed 3064.17 samples/sec Loss 6.8802 LearningRate 0.0298 Epoch: 9 Global Step: 112780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:29,764-Speed 3059.99 samples/sec Loss 7.0078 LearningRate 0.0298 Epoch: 9 Global Step: 112790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:33,136-Speed 3037.44 samples/sec Loss 6.9307 LearningRate 0.0298 Epoch: 9 Global Step: 112800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:36,542-Speed 3007.93 samples/sec Loss 7.0503 LearningRate 0.0298 Epoch: 9 Global Step: 112810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:40,018-Speed 2946.19 samples/sec Loss 7.0000 LearningRate 0.0298 Epoch: 9 Global Step: 112820 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:43,427-Speed 3004.80 samples/sec Loss 6.9267 LearningRate 0.0298 Epoch: 9 Global Step: 112830 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:46,834-Speed 3006.43 samples/sec Loss 6.9228 LearningRate 0.0298 Epoch: 9 Global Step: 112840 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:50,249-Speed 2999.84 samples/sec Loss 7.0665 LearningRate 0.0298 Epoch: 9 Global Step: 112850 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:53,697-Speed 2970.54 samples/sec Loss 6.9871 LearningRate 0.0298 Epoch: 9 Global Step: 112860 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:23:57,150-Speed 2966.28 samples/sec Loss 6.9001 LearningRate 0.0298 Epoch: 9 Global Step: 112870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:00,571-Speed 2994.26 samples/sec Loss 7.0508 LearningRate 0.0298 Epoch: 9 Global Step: 112880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:03,955-Speed 3027.18 samples/sec Loss 6.9928 LearningRate 0.0298 Epoch: 9 Global Step: 
112890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:07,333-Speed 3032.35 samples/sec Loss 6.8324 LearningRate 0.0298 Epoch: 9 Global Step: 112900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:10,823-Speed 2935.53 samples/sec Loss 6.9922 LearningRate 0.0298 Epoch: 9 Global Step: 112910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:14,162-Speed 3067.28 samples/sec Loss 7.0856 LearningRate 0.0298 Epoch: 9 Global Step: 112920 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:17,503-Speed 3066.16 samples/sec Loss 7.0554 LearningRate 0.0297 Epoch: 9 Global Step: 112930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:20,928-Speed 2990.16 samples/sec Loss 6.9404 LearningRate 0.0297 Epoch: 9 Global Step: 112940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:24,276-Speed 3059.79 samples/sec Loss 7.0594 LearningRate 0.0297 Epoch: 9 Global Step: 112950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:27,632-Speed 3052.32 samples/sec Loss 6.9880 LearningRate 0.0297 Epoch: 9 Global Step: 112960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:31,065-Speed 2983.00 samples/sec Loss 7.0597 LearningRate 0.0297 Epoch: 9 Global Step: 112970 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:24:34,442-Speed 3033.81 samples/sec Loss 7.0925 LearningRate 0.0297 Epoch: 9 Global Step: 112980 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:24:37,814-Speed 3037.36 samples/sec Loss 6.9986 LearningRate 0.0297 Epoch: 9 Global Step: 112990 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:24:41,160-Speed 3061.47 samples/sec Loss 6.9859 LearningRate 0.0297 Epoch: 9 Global Step: 113000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:44,535-Speed 3035.23 samples/sec Loss 7.0399 LearningRate 0.0297 Epoch: 9 Global Step: 113010 Fp16 Grad Scale: 65536 Required: 
13 hours Training: 2022-04-27 12:24:47,853-Speed 3087.58 samples/sec Loss 7.1107 LearningRate 0.0297 Epoch: 9 Global Step: 113020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:51,202-Speed 3058.35 samples/sec Loss 7.0566 LearningRate 0.0297 Epoch: 9 Global Step: 113030 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:54,539-Speed 3069.84 samples/sec Loss 7.1117 LearningRate 0.0297 Epoch: 9 Global Step: 113040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:24:57,978-Speed 2978.48 samples/sec Loss 7.1408 LearningRate 0.0297 Epoch: 9 Global Step: 113050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:01,397-Speed 2996.25 samples/sec Loss 7.0490 LearningRate 0.0297 Epoch: 9 Global Step: 113060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:04,730-Speed 3072.63 samples/sec Loss 7.1159 LearningRate 0.0297 Epoch: 9 Global Step: 113070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:08,158-Speed 2988.29 samples/sec Loss 7.1159 LearningRate 0.0297 Epoch: 9 Global Step: 113080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:11,522-Speed 3046.13 samples/sec Loss 7.0669 LearningRate 0.0297 Epoch: 9 Global Step: 113090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:14,950-Speed 2987.94 samples/sec Loss 7.1646 LearningRate 0.0297 Epoch: 9 Global Step: 113100 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:25:18,278-Speed 3078.31 samples/sec Loss 7.1250 LearningRate 0.0297 Epoch: 9 Global Step: 113110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:21,644-Speed 3043.03 samples/sec Loss 7.0452 LearningRate 0.0297 Epoch: 9 Global Step: 113120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:25,086-Speed 2975.60 samples/sec Loss 6.9864 LearningRate 0.0297 Epoch: 9 Global Step: 113130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
12:25:28,507-Speed 2994.17 samples/sec Loss 7.0119 LearningRate 0.0297 Epoch: 9 Global Step: 113140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:31,907-Speed 3013.39 samples/sec Loss 7.1296 LearningRate 0.0297 Epoch: 9 Global Step: 113150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:35,237-Speed 3075.89 samples/sec Loss 7.0273 LearningRate 0.0296 Epoch: 9 Global Step: 113160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:38,597-Speed 3048.71 samples/sec Loss 7.0722 LearningRate 0.0296 Epoch: 9 Global Step: 113170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:41,933-Speed 3070.43 samples/sec Loss 7.1007 LearningRate 0.0296 Epoch: 9 Global Step: 113180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:45,324-Speed 3020.16 samples/sec Loss 6.9916 LearningRate 0.0296 Epoch: 9 Global Step: 113190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:48,712-Speed 3023.59 samples/sec Loss 7.0803 LearningRate 0.0296 Epoch: 9 Global Step: 113200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:25:52,038-Speed 3079.52 samples/sec Loss 7.1766 LearningRate 0.0296 Epoch: 9 Global Step: 113210 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:25:55,360-Speed 3083.44 samples/sec Loss 7.1478 LearningRate 0.0296 Epoch: 9 Global Step: 113220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:25:58,736-Speed 3034.25 samples/sec Loss 7.0586 LearningRate 0.0296 Epoch: 9 Global Step: 113230 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:26:02,078-Speed 3065.10 samples/sec Loss 7.1072 LearningRate 0.0296 Epoch: 9 Global Step: 113240 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:26:05,499-Speed 2993.99 samples/sec Loss 7.0758 LearningRate 0.0296 Epoch: 9 Global Step: 113250 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:08,916-Speed 2998.06 samples/sec 
Loss 7.1129 LearningRate 0.0296 Epoch: 9 Global Step: 113260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:12,310-Speed 3017.63 samples/sec Loss 7.2428 LearningRate 0.0296 Epoch: 9 Global Step: 113270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:15,679-Speed 3039.95 samples/sec Loss 7.0671 LearningRate 0.0296 Epoch: 9 Global Step: 113280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:19,080-Speed 3012.08 samples/sec Loss 7.1938 LearningRate 0.0296 Epoch: 9 Global Step: 113290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:22,534-Speed 2965.56 samples/sec Loss 7.1532 LearningRate 0.0296 Epoch: 9 Global Step: 113300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:25,930-Speed 3016.08 samples/sec Loss 7.2318 LearningRate 0.0296 Epoch: 9 Global Step: 113310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:29,317-Speed 3024.48 samples/sec Loss 7.0883 LearningRate 0.0296 Epoch: 9 Global Step: 113320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:32,766-Speed 2969.84 samples/sec Loss 7.0965 LearningRate 0.0296 Epoch: 9 Global Step: 113330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:36,191-Speed 2989.83 samples/sec Loss 7.0632 LearningRate 0.0296 Epoch: 9 Global Step: 113340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:26:39,534-Speed 3064.42 samples/sec Loss 7.1352 LearningRate 0.0296 Epoch: 9 Global Step: 113350 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:26:42,962-Speed 2988.24 samples/sec Loss 7.2351 LearningRate 0.0296 Epoch: 9 Global Step: 113360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:26:46,348-Speed 3025.39 samples/sec Loss 7.1162 LearningRate 0.0296 Epoch: 9 Global Step: 113370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:26:49,711-Speed 3045.26 samples/sec Loss 7.1663 LearningRate 0.0295 Epoch: 9 
Global Step: 113380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:26:53,142-Speed 2985.80 samples/sec Loss 7.0081 LearningRate 0.0295 Epoch: 9 Global Step: 113390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:26:56,509-Speed 3041.83 samples/sec Loss 7.0612 LearningRate 0.0295 Epoch: 9 Global Step: 113400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:26:59,888-Speed 3031.06 samples/sec Loss 7.1036 LearningRate 0.0295 Epoch: 9 Global Step: 113410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:27:03,266-Speed 3033.32 samples/sec Loss 7.1947 LearningRate 0.0295 Epoch: 9 Global Step: 113420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:27:06,664-Speed 3013.53 samples/sec Loss 7.1838 LearningRate 0.0295 Epoch: 9 Global Step: 113430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:27:10,091-Speed 2988.84 samples/sec Loss 7.1192 LearningRate 0.0295 Epoch: 9 Global Step: 113440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:27:13,550-Speed 2962.36 samples/sec Loss 7.1569 LearningRate 0.0295 Epoch: 9 Global Step: 113450 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:16,946-Speed 3016.56 samples/sec Loss 7.1310 LearningRate 0.0295 Epoch: 9 Global Step: 113460 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:20,295-Speed 3058.48 samples/sec Loss 7.1263 LearningRate 0.0295 Epoch: 9 Global Step: 113470 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:23,730-Speed 2982.52 samples/sec Loss 7.3420 LearningRate 0.0295 Epoch: 9 Global Step: 113480 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:27,109-Speed 3031.12 samples/sec Loss 7.2285 LearningRate 0.0295 Epoch: 9 Global Step: 113490 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:30,576-Speed 2954.20 samples/sec Loss 7.3348 LearningRate 0.0295 Epoch: 9 Global Step: 113500 Fp16 Grad Scale: 32768 
Required: 13 hours Training: 2022-04-27 12:27:34,018-Speed 2975.39 samples/sec Loss 7.1156 LearningRate 0.0295 Epoch: 9 Global Step: 113510 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:37,435-Speed 2997.74 samples/sec Loss 7.1829 LearningRate 0.0295 Epoch: 9 Global Step: 113520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:40,771-Speed 3070.56 samples/sec Loss 7.2055 LearningRate 0.0295 Epoch: 9 Global Step: 113530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:44,197-Speed 2989.80 samples/sec Loss 7.1175 LearningRate 0.0295 Epoch: 9 Global Step: 113540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:27:47,517-Speed 3085.72 samples/sec Loss 7.0734 LearningRate 0.0295 Epoch: 9 Global Step: 113550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:27:50,847-Speed 3075.70 samples/sec Loss 7.1051 LearningRate 0.0295 Epoch: 9 Global Step: 113560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:27:54,236-Speed 3022.88 samples/sec Loss 7.2508 LearningRate 0.0295 Epoch: 9 Global Step: 113570 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:27:57,605-Speed 3040.46 samples/sec Loss 7.2260 LearningRate 0.0295 Epoch: 9 Global Step: 113580 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:00,954-Speed 3058.81 samples/sec Loss 7.2640 LearningRate 0.0295 Epoch: 9 Global Step: 113590 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:04,326-Speed 3037.31 samples/sec Loss 7.1846 LearningRate 0.0295 Epoch: 9 Global Step: 113600 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:07,716-Speed 3021.69 samples/sec Loss 7.2019 LearningRate 0.0294 Epoch: 9 Global Step: 113610 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:11,113-Speed 3015.43 samples/sec Loss 7.3244 LearningRate 0.0294 Epoch: 9 Global Step: 113620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 
12:28:14,478-Speed 3043.87 samples/sec Loss 7.2249 LearningRate 0.0294 Epoch: 9 Global Step: 113630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:17,870-Speed 3019.50 samples/sec Loss 7.1850 LearningRate 0.0294 Epoch: 9 Global Step: 113640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:21,308-Speed 2979.57 samples/sec Loss 7.2390 LearningRate 0.0294 Epoch: 9 Global Step: 113650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:24,665-Speed 3050.91 samples/sec Loss 7.3730 LearningRate 0.0294 Epoch: 9 Global Step: 113660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:28:28,074-Speed 3004.78 samples/sec Loss 7.1443 LearningRate 0.0294 Epoch: 9 Global Step: 113670 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:31,454-Speed 3030.90 samples/sec Loss 7.3553 LearningRate 0.0294 Epoch: 9 Global Step: 113680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:34,896-Speed 2975.67 samples/sec Loss 7.2832 LearningRate 0.0294 Epoch: 9 Global Step: 113690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:38,247-Speed 3057.03 samples/sec Loss 7.2886 LearningRate 0.0294 Epoch: 9 Global Step: 113700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:41,654-Speed 3006.32 samples/sec Loss 7.2738 LearningRate 0.0294 Epoch: 9 Global Step: 113710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:45,021-Speed 3041.92 samples/sec Loss 7.1856 LearningRate 0.0294 Epoch: 9 Global Step: 113720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:48,431-Speed 3004.12 samples/sec Loss 7.2699 LearningRate 0.0294 Epoch: 9 Global Step: 113730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:51,806-Speed 3034.76 samples/sec Loss 7.1124 LearningRate 0.0294 Epoch: 9 Global Step: 113740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:55,163-Speed 3051.50 samples/sec Loss 
7.2789 LearningRate 0.0294 Epoch: 9 Global Step: 113750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:28:58,558-Speed 3016.83 samples/sec Loss 7.2744 LearningRate 0.0294 Epoch: 9 Global Step: 113760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:29:01,981-Speed 2993.23 samples/sec Loss 7.3472 LearningRate 0.0294 Epoch: 9 Global Step: 113770 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:05,359-Speed 3032.23 samples/sec Loss 7.2867 LearningRate 0.0294 Epoch: 9 Global Step: 113780 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:08,763-Speed 3008.38 samples/sec Loss 7.2985 LearningRate 0.0294 Epoch: 9 Global Step: 113790 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:12,118-Speed 3053.98 samples/sec Loss 7.2076 LearningRate 0.0294 Epoch: 9 Global Step: 113800 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:15,492-Speed 3035.83 samples/sec Loss 7.3776 LearningRate 0.0294 Epoch: 9 Global Step: 113810 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:18,883-Speed 3020.79 samples/sec Loss 7.2823 LearningRate 0.0294 Epoch: 9 Global Step: 113820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:22,336-Speed 2966.14 samples/sec Loss 7.2851 LearningRate 0.0294 Epoch: 9 Global Step: 113830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:25,696-Speed 3048.39 samples/sec Loss 7.2086 LearningRate 0.0293 Epoch: 9 Global Step: 113840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:29,095-Speed 3013.80 samples/sec Loss 7.2723 LearningRate 0.0293 Epoch: 9 Global Step: 113850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:32,538-Speed 2975.01 samples/sec Loss 7.2963 LearningRate 0.0293 Epoch: 9 Global Step: 113860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:35,934-Speed 3015.70 samples/sec Loss 7.2096 LearningRate 0.0293 Epoch: 9 Global 
Step: 113870 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:29:39,388-Speed 2965.47 samples/sec Loss 7.3285 LearningRate 0.0293 Epoch: 9 Global Step: 113880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:29:42,755-Speed 3043.06 samples/sec Loss 7.3431 LearningRate 0.0293 Epoch: 9 Global Step: 113890 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:29:46,095-Speed 3066.21 samples/sec Loss 7.3955 LearningRate 0.0293 Epoch: 9 Global Step: 113900 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:29:49,488-Speed 3019.10 samples/sec Loss 7.2447 LearningRate 0.0293 Epoch: 9 Global Step: 113910 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:29:52,977-Speed 2935.90 samples/sec Loss 7.3232 LearningRate 0.0293 Epoch: 9 Global Step: 113920 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:29:56,373-Speed 3015.54 samples/sec Loss 7.1996 LearningRate 0.0293 Epoch: 9 Global Step: 113930 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:29:59,785-Speed 3002.34 samples/sec Loss 7.3209 LearningRate 0.0293 Epoch: 9 Global Step: 113940 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:03,229-Speed 2975.26 samples/sec Loss 7.4672 LearningRate 0.0293 Epoch: 9 Global Step: 113950 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:06,592-Speed 3045.71 samples/sec Loss 7.3792 LearningRate 0.0293 Epoch: 9 Global Step: 113960 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:10,023-Speed 2985.62 samples/sec Loss 7.3477 LearningRate 0.0293 Epoch: 9 Global Step: 113970 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:13,400-Speed 3032.97 samples/sec Loss 7.3796 LearningRate 0.0293 Epoch: 9 Global Step: 113980 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:16,800-Speed 3013.32 samples/sec Loss 7.2766 LearningRate 0.0293 Epoch: 9 Global Step: 113990 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-27 12:30:20,223-Speed 2992.03 samples/sec Loss 7.3621 LearningRate 0.0293 Epoch: 9 Global Step: 114000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:30:23,600-Speed 3033.60 samples/sec Loss 7.2371 LearningRate 0.0293 Epoch: 9 Global Step: 114010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:30:26,996-Speed 3015.49 samples/sec Loss 7.2809 LearningRate 0.0293 Epoch: 9 Global Step: 114020 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:30,423-Speed 2989.66 samples/sec Loss 7.2994 LearningRate 0.0293 Epoch: 9 Global Step: 114030 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:33,814-Speed 3019.93 samples/sec Loss 7.4804 LearningRate 0.0293 Epoch: 9 Global Step: 114040 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:37,212-Speed 3014.98 samples/sec Loss 7.4406 LearningRate 0.0293 Epoch: 9 Global Step: 114050 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:40,669-Speed 2962.63 samples/sec Loss 7.3907 LearningRate 0.0293 Epoch: 9 Global Step: 114060 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:44,044-Speed 3034.94 samples/sec Loss 7.2428 LearningRate 0.0292 Epoch: 9 Global Step: 114070 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:47,553-Speed 2919.61 samples/sec Loss 7.3193 LearningRate 0.0292 Epoch: 9 Global Step: 114080 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:50,904-Speed 3056.82 samples/sec Loss 7.3994 LearningRate 0.0292 Epoch: 9 Global Step: 114090 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:54,359-Speed 2963.91 samples/sec Loss 7.3426 LearningRate 0.0292 Epoch: 9 Global Step: 114100 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:30:57,756-Speed 3015.27 samples/sec Loss 7.3545 LearningRate 0.0292 Epoch: 9 Global Step: 114110 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
12:31:01,197-Speed 2977.25 samples/sec Loss 7.4623 LearningRate 0.0292 Epoch: 9 Global Step: 114120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:04,663-Speed 2954.72 samples/sec Loss 7.2393 LearningRate 0.0292 Epoch: 9 Global Step: 114130 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:08,082-Speed 2995.65 samples/sec Loss 7.4920 LearningRate 0.0292 Epoch: 9 Global Step: 114140 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:11,406-Speed 3082.52 samples/sec Loss 7.3044 LearningRate 0.0292 Epoch: 9 Global Step: 114150 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:14,849-Speed 2974.99 samples/sec Loss 7.3484 LearningRate 0.0292 Epoch: 9 Global Step: 114160 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:18,213-Speed 3045.44 samples/sec Loss 7.3410 LearningRate 0.0292 Epoch: 9 Global Step: 114170 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:21,555-Speed 3065.28 samples/sec Loss 7.4940 LearningRate 0.0292 Epoch: 9 Global Step: 114180 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:24,959-Speed 3008.96 samples/sec Loss 7.3464 LearningRate 0.0292 Epoch: 9 Global Step: 114190 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:28,328-Speed 3040.28 samples/sec Loss 7.2180 LearningRate 0.0292 Epoch: 9 Global Step: 114200 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:31,746-Speed 2997.23 samples/sec Loss 7.3800 LearningRate 0.0292 Epoch: 9 Global Step: 114210 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:35,132-Speed 3024.32 samples/sec Loss 7.4033 LearningRate 0.0292 Epoch: 9 Global Step: 114220 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:31:38,532-Speed 3013.01 samples/sec Loss 7.3566 LearningRate 0.0292 Epoch: 9 Global Step: 114230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:41,952-Speed 2995.26 samples/sec 
Loss 7.4440 LearningRate 0.0292 Epoch: 9 Global Step: 114240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:45,352-Speed 3012.17 samples/sec Loss 7.3293 LearningRate 0.0292 Epoch: 9 Global Step: 114250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:31:48,719-Speed 3041.73 samples/sec Loss 7.4573 LearningRate 0.0292 Epoch: 9 Global Step: 114260 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:31:52,057-Speed 3069.23 samples/sec Loss 7.3712 LearningRate 0.0292 Epoch: 9 Global Step: 114270 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:31:55,386-Speed 3076.36 samples/sec Loss 7.4564 LearningRate 0.0292 Epoch: 9 Global Step: 114280 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:31:58,813-Speed 2989.54 samples/sec Loss 7.4360 LearningRate 0.0292 Epoch: 9 Global Step: 114290 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:02,173-Speed 3049.23 samples/sec Loss 7.3503 LearningRate 0.0291 Epoch: 9 Global Step: 114300 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:05,571-Speed 3013.51 samples/sec Loss 7.4036 LearningRate 0.0291 Epoch: 9 Global Step: 114310 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:08,973-Speed 3011.07 samples/sec Loss 7.3571 LearningRate 0.0291 Epoch: 9 Global Step: 114320 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:12,415-Speed 2976.50 samples/sec Loss 7.4070 LearningRate 0.0291 Epoch: 9 Global Step: 114330 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:15,781-Speed 3042.42 samples/sec Loss 7.2874 LearningRate 0.0291 Epoch: 9 Global Step: 114340 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:19,180-Speed 3014.14 samples/sec Loss 7.3887 LearningRate 0.0291 Epoch: 9 Global Step: 114350 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:32:22,520-Speed 3066.95 samples/sec Loss 7.4442 LearningRate 0.0291 Epoch: 9 
Global Step: 114360 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:25,913-Speed 3018.53 samples/sec Loss 7.4158 LearningRate 0.0291 Epoch: 9 Global Step: 114370 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:29,282-Speed 3040.65 samples/sec Loss 7.4356 LearningRate 0.0291 Epoch: 9 Global Step: 114380 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:32,640-Speed 3049.66 samples/sec Loss 7.4283 LearningRate 0.0291 Epoch: 9 Global Step: 114390 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:36,144-Speed 2923.49 samples/sec Loss 7.4034 LearningRate 0.0291 Epoch: 9 Global Step: 114400 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:39,532-Speed 3023.52 samples/sec Loss 7.5261 LearningRate 0.0291 Epoch: 9 Global Step: 114410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:42,937-Speed 3008.22 samples/sec Loss 7.4379 LearningRate 0.0291 Epoch: 9 Global Step: 114420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:46,355-Speed 2996.63 samples/sec Loss 7.2643 LearningRate 0.0291 Epoch: 9 Global Step: 114430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:49,817-Speed 2958.30 samples/sec Loss 7.5494 LearningRate 0.0291 Epoch: 9 Global Step: 114440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:53,271-Speed 2965.58 samples/sec Loss 7.4718 LearningRate 0.0291 Epoch: 9 Global Step: 114450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:32:56,678-Speed 3006.68 samples/sec Loss 7.4253 LearningRate 0.0291 Epoch: 9 Global Step: 114460 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:33:00,083-Speed 3008.87 samples/sec Loss 7.6373 LearningRate 0.0291 Epoch: 9 Global Step: 114470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:03,470-Speed 3024.17 samples/sec Loss 7.4461 LearningRate 0.0291 Epoch: 9 Global Step: 114480 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-27 12:33:06,903-Speed 2983.38 samples/sec Loss 7.3708 LearningRate 0.0291 Epoch: 9 Global Step: 114490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:10,300-Speed 3015.37 samples/sec Loss 7.3815 LearningRate 0.0291 Epoch: 9 Global Step: 114500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:13,745-Speed 2972.96 samples/sec Loss 7.4346 LearningRate 0.0291 Epoch: 9 Global Step: 114510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:17,192-Speed 2971.79 samples/sec Loss 7.3959 LearningRate 0.0291 Epoch: 9 Global Step: 114520 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:20,602-Speed 3003.84 samples/sec Loss 7.3291 LearningRate 0.0290 Epoch: 9 Global Step: 114530 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:23,916-Speed 3090.70 samples/sec Loss 7.5401 LearningRate 0.0290 Epoch: 9 Global Step: 114540 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:27,342-Speed 2989.88 samples/sec Loss 7.5886 LearningRate 0.0290 Epoch: 9 Global Step: 114550 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:31,581-Speed 2416.19 samples/sec Loss 7.3909 LearningRate 0.0290 Epoch: 9 Global Step: 114560 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:34,999-Speed 2996.78 samples/sec Loss 7.6059 LearningRate 0.0290 Epoch: 9 Global Step: 114570 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:33:38,440-Speed 2976.40 samples/sec Loss 7.4732 LearningRate 0.0290 Epoch: 9 Global Step: 114580 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:33:41,751-Speed 3093.02 samples/sec Loss 7.5358 LearningRate 0.0290 Epoch: 9 Global Step: 114590 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:33:45,094-Speed 3064.85 samples/sec Loss 7.5576 LearningRate 0.0290 Epoch: 9 Global Step: 114600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
12:33:48,516-Speed 2992.87 samples/sec Loss 7.3029 LearningRate 0.0290 Epoch: 9 Global Step: 114610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:33:51,917-Speed 3012.38 samples/sec Loss 7.4659 LearningRate 0.0290 Epoch: 9 Global Step: 114620 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:33:55,294-Speed 3032.54 samples/sec Loss 7.4578 LearningRate 0.0290 Epoch: 9 Global Step: 114630 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:33:58,727-Speed 2984.27 samples/sec Loss 7.3146 LearningRate 0.0290 Epoch: 9 Global Step: 114640 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:02,150-Speed 2992.06 samples/sec Loss 7.4248 LearningRate 0.0290 Epoch: 9 Global Step: 114650 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:05,565-Speed 2999.95 samples/sec Loss 7.3980 LearningRate 0.0290 Epoch: 9 Global Step: 114660 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:09,042-Speed 2945.46 samples/sec Loss 7.4827 LearningRate 0.0290 Epoch: 9 Global Step: 114670 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:12,384-Speed 3064.95 samples/sec Loss 7.5227 LearningRate 0.0290 Epoch: 9 Global Step: 114680 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:15,859-Speed 2947.21 samples/sec Loss 7.4786 LearningRate 0.0290 Epoch: 9 Global Step: 114690 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:19,222-Speed 3046.81 samples/sec Loss 7.4632 LearningRate 0.0290 Epoch: 9 Global Step: 114700 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:22,612-Speed 3021.24 samples/sec Loss 7.4379 LearningRate 0.0290 Epoch: 9 Global Step: 114710 Fp16 Grad Scale: 16384 Required: 13 hours Training: 2022-04-27 12:34:25,977-Speed 3043.78 samples/sec Loss 7.5226 LearningRate 0.0290 Epoch: 9 Global Step: 114720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:29,432-Speed 2965.06 samples/sec Loss 
7.3126 LearningRate 0.0290 Epoch: 9 Global Step: 114730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:32,858-Speed 2989.72 samples/sec Loss 7.5176 LearningRate 0.0290 Epoch: 9 Global Step: 114740 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:36,270-Speed 3001.25 samples/sec Loss 7.5849 LearningRate 0.0290 Epoch: 9 Global Step: 114750 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:39,680-Speed 3004.84 samples/sec Loss 7.2877 LearningRate 0.0289 Epoch: 9 Global Step: 114760 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:43,081-Speed 3013.47 samples/sec Loss 7.4654 LearningRate 0.0289 Epoch: 9 Global Step: 114770 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:46,498-Speed 2997.75 samples/sec Loss 7.5481 LearningRate 0.0289 Epoch: 9 Global Step: 114780 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:49,855-Speed 3051.15 samples/sec Loss 7.5586 LearningRate 0.0289 Epoch: 9 Global Step: 114790 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:53,326-Speed 2951.52 samples/sec Loss 7.3615 LearningRate 0.0289 Epoch: 9 Global Step: 114800 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:34:56,689-Speed 3045.17 samples/sec Loss 7.4024 LearningRate 0.0289 Epoch: 9 Global Step: 114810 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:35:00,051-Speed 3047.61 samples/sec Loss 7.3972 LearningRate 0.0289 Epoch: 9 Global Step: 114820 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:03,445-Speed 3017.82 samples/sec Loss 7.4529 LearningRate 0.0289 Epoch: 9 Global Step: 114830 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:06,877-Speed 2984.26 samples/sec Loss 7.3257 LearningRate 0.0289 Epoch: 9 Global Step: 114840 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:10,349-Speed 2950.92 samples/sec Loss 7.4385 LearningRate 0.0289 Epoch: 9 Global 
Step: 114850 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:13,802-Speed 2965.64 samples/sec Loss 7.5391 LearningRate 0.0289 Epoch: 9 Global Step: 114860 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:17,211-Speed 3004.80 samples/sec Loss 7.5361 LearningRate 0.0289 Epoch: 9 Global Step: 114870 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:20,588-Speed 3033.37 samples/sec Loss 7.5166 LearningRate 0.0289 Epoch: 9 Global Step: 114880 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:23,942-Speed 3053.53 samples/sec Loss 7.5493 LearningRate 0.0289 Epoch: 9 Global Step: 114890 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:27,308-Speed 3043.30 samples/sec Loss 7.4521 LearningRate 0.0289 Epoch: 9 Global Step: 114900 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:30,667-Speed 3052.01 samples/sec Loss 7.4100 LearningRate 0.0289 Epoch: 9 Global Step: 114910 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:34,067-Speed 3012.47 samples/sec Loss 7.4143 LearningRate 0.0289 Epoch: 9 Global Step: 114920 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:35:37,471-Speed 3008.84 samples/sec Loss 7.5439 LearningRate 0.0289 Epoch: 9 Global Step: 114930 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:40,867-Speed 3016.65 samples/sec Loss 7.5565 LearningRate 0.0289 Epoch: 9 Global Step: 114940 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:44,233-Speed 3042.88 samples/sec Loss 7.4772 LearningRate 0.0289 Epoch: 9 Global Step: 114950 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:47,674-Speed 2976.74 samples/sec Loss 7.4859 LearningRate 0.0289 Epoch: 9 Global Step: 114960 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:51,115-Speed 2976.36 samples/sec Loss 7.5385 LearningRate 0.0289 Epoch: 9 Global Step: 114970 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-27 12:35:54,496-Speed 3030.37 samples/sec Loss 7.5052 LearningRate 0.0289 Epoch: 9 Global Step: 114980 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:35:57,954-Speed 2961.52 samples/sec Loss 7.4298 LearningRate 0.0288 Epoch: 9 Global Step: 114990 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:01,450-Speed 2929.90 samples/sec Loss 7.4202 LearningRate 0.0288 Epoch: 9 Global Step: 115000 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:04,902-Speed 2967.76 samples/sec Loss 7.5140 LearningRate 0.0288 Epoch: 9 Global Step: 115010 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:08,333-Speed 2984.53 samples/sec Loss 7.6452 LearningRate 0.0288 Epoch: 9 Global Step: 115020 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:11,728-Speed 3017.36 samples/sec Loss 7.5374 LearningRate 0.0288 Epoch: 9 Global Step: 115030 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:36:15,055-Speed 3078.89 samples/sec Loss 7.4632 LearningRate 0.0288 Epoch: 9 Global Step: 115040 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:18,468-Speed 3000.96 samples/sec Loss 7.5390 LearningRate 0.0288 Epoch: 9 Global Step: 115050 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:21,885-Speed 2998.05 samples/sec Loss 7.5682 LearningRate 0.0288 Epoch: 9 Global Step: 115060 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:25,333-Speed 2970.67 samples/sec Loss 7.5788 LearningRate 0.0288 Epoch: 9 Global Step: 115070 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:28,767-Speed 2982.22 samples/sec Loss 7.6619 LearningRate 0.0288 Epoch: 9 Global Step: 115080 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:32,192-Speed 2991.45 samples/sec Loss 7.5350 LearningRate 0.0288 Epoch: 9 Global Step: 115090 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 
12:36:35,588-Speed 3016.00 samples/sec Loss 7.5774 LearningRate 0.0288 Epoch: 9 Global Step: 115100 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:39,005-Speed 2996.87 samples/sec Loss 7.3629 LearningRate 0.0288 Epoch: 9 Global Step: 115110 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:42,424-Speed 2996.31 samples/sec Loss 7.5341 LearningRate 0.0288 Epoch: 9 Global Step: 115120 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:36:45,758-Speed 3072.08 samples/sec Loss 7.5562 LearningRate 0.0288 Epoch: 9 Global Step: 115130 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:36:49,246-Speed 2936.98 samples/sec Loss 7.5071 LearningRate 0.0288 Epoch: 9 Global Step: 115140 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:36:52,607-Speed 3047.38 samples/sec Loss 7.4293 LearningRate 0.0288 Epoch: 9 Global Step: 115150 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:36:56,025-Speed 2996.98 samples/sec Loss 7.5273 LearningRate 0.0288 Epoch: 9 Global Step: 115160 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:36:59,449-Speed 2991.58 samples/sec Loss 7.5251 LearningRate 0.0288 Epoch: 9 Global Step: 115170 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:37:02,880-Speed 2985.10 samples/sec Loss 7.4382 LearningRate 0.0288 Epoch: 9 Global Step: 115180 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:37:06,255-Speed 3034.65 samples/sec Loss 7.5936 LearningRate 0.0288 Epoch: 9 Global Step: 115190 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:37:09,635-Speed 3030.57 samples/sec Loss 7.4587 LearningRate 0.0288 Epoch: 9 Global Step: 115200 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:37:13,041-Speed 3007.33 samples/sec Loss 7.4953 LearningRate 0.0288 Epoch: 9 Global Step: 115210 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:37:16,478-Speed 2980.45 samples/sec Loss 
7.5511 LearningRate 0.0287 Epoch: 9 Global Step: 115220 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:37:19,910-Speed 2984.54 samples/sec Loss 7.4222 LearningRate 0.0287 Epoch: 9 Global Step: 115230 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:23,365-Speed 2964.69 samples/sec Loss 7.6536 LearningRate 0.0287 Epoch: 9 Global Step: 115240 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:26,827-Speed 2958.65 samples/sec Loss 7.5147 LearningRate 0.0287 Epoch: 9 Global Step: 115250 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:30,253-Speed 2990.32 samples/sec Loss 7.5128 LearningRate 0.0287 Epoch: 9 Global Step: 115260 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:33,664-Speed 3002.56 samples/sec Loss 7.5413 LearningRate 0.0287 Epoch: 9 Global Step: 115270 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:37,020-Speed 3052.32 samples/sec Loss 7.5670 LearningRate 0.0287 Epoch: 9 Global Step: 115280 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:40,409-Speed 3021.88 samples/sec Loss 7.6118 LearningRate 0.0287 Epoch: 9 Global Step: 115290 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:43,795-Speed 3025.05 samples/sec Loss 7.6085 LearningRate 0.0287 Epoch: 9 Global Step: 115300 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:47,208-Speed 3001.17 samples/sec Loss 7.5665 LearningRate 0.0287 Epoch: 9 Global Step: 115310 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:50,612-Speed 3009.34 samples/sec Loss 7.5703 LearningRate 0.0287 Epoch: 9 Global Step: 115320 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:37:54,048-Speed 2981.33 samples/sec Loss 7.5029 LearningRate 0.0287 Epoch: 9 Global Step: 115330 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:37:57,396-Speed 3059.26 samples/sec Loss 7.4526 LearningRate 0.0287 Epoch: 9 Global 
Step: 115340 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:00,760-Speed 3045.22 samples/sec Loss 7.5613 LearningRate 0.0287 Epoch: 9 Global Step: 115350 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:04,219-Speed 2960.55 samples/sec Loss 7.6688 LearningRate 0.0287 Epoch: 9 Global Step: 115360 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:07,635-Speed 2998.23 samples/sec Loss 7.6232 LearningRate 0.0287 Epoch: 9 Global Step: 115370 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:11,153-Speed 2911.63 samples/sec Loss 7.4790 LearningRate 0.0287 Epoch: 9 Global Step: 115380 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:14,520-Speed 3042.38 samples/sec Loss 7.6894 LearningRate 0.0287 Epoch: 9 Global Step: 115390 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:17,935-Speed 2999.69 samples/sec Loss 7.5081 LearningRate 0.0287 Epoch: 9 Global Step: 115400 Fp16 Grad Scale: 131072 Required: 13 hours Training: 2022-04-27 12:38:21,278-Speed 3063.46 samples/sec Loss 7.4783 LearningRate 0.0287 Epoch: 9 Global Step: 115410 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:24,633-Speed 3053.49 samples/sec Loss 7.6634 LearningRate 0.0287 Epoch: 9 Global Step: 115420 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:28,031-Speed 3013.84 samples/sec Loss 7.4778 LearningRate 0.0287 Epoch: 9 Global Step: 115430 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:31,452-Speed 2994.03 samples/sec Loss 7.5237 LearningRate 0.0287 Epoch: 9 Global Step: 115440 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:34,764-Speed 3093.66 samples/sec Loss 7.4993 LearningRate 0.0287 Epoch: 9 Global Step: 115450 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:38,082-Speed 3086.69 samples/sec Loss 7.5522 LearningRate 0.0286 Epoch: 9 Global Step: 115460 Fp16 Grad Scale: 65536 
Required: 13 hours Training: 2022-04-27 12:38:41,523-Speed 2976.51 samples/sec Loss 7.5156 LearningRate 0.0286 Epoch: 9 Global Step: 115470 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:45,020-Speed 2929.21 samples/sec Loss 7.4734 LearningRate 0.0286 Epoch: 9 Global Step: 115480 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:48,396-Speed 3034.22 samples/sec Loss 7.6490 LearningRate 0.0286 Epoch: 9 Global Step: 115490 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:51,757-Speed 3048.18 samples/sec Loss 7.6022 LearningRate 0.0286 Epoch: 9 Global Step: 115500 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:55,205-Speed 2969.83 samples/sec Loss 7.5261 LearningRate 0.0286 Epoch: 9 Global Step: 115510 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:38:58,687-Speed 2942.70 samples/sec Loss 7.5002 LearningRate 0.0286 Epoch: 9 Global Step: 115520 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:02,084-Speed 3014.90 samples/sec Loss 7.6495 LearningRate 0.0286 Epoch: 9 Global Step: 115530 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:05,508-Speed 2991.71 samples/sec Loss 7.5719 LearningRate 0.0286 Epoch: 9 Global Step: 115540 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:08,855-Speed 3060.68 samples/sec Loss 7.6938 LearningRate 0.0286 Epoch: 9 Global Step: 115550 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:12,172-Speed 3087.75 samples/sec Loss 7.4765 LearningRate 0.0286 Epoch: 9 Global Step: 115560 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:15,529-Speed 3051.12 samples/sec Loss 7.6321 LearningRate 0.0286 Epoch: 9 Global Step: 115570 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:18,896-Speed 3042.36 samples/sec Loss 7.5952 LearningRate 0.0286 Epoch: 9 Global Step: 115580 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 
12:39:22,305-Speed 3004.79 samples/sec Loss 7.4861 LearningRate 0.0286 Epoch: 9 Global Step: 115590 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:25,745-Speed 2977.18 samples/sec Loss 7.7208 LearningRate 0.0286 Epoch: 9 Global Step: 115600 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:29,106-Speed 3047.69 samples/sec Loss 7.5618 LearningRate 0.0286 Epoch: 9 Global Step: 115610 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:32,556-Speed 2970.10 samples/sec Loss 7.4910 LearningRate 0.0286 Epoch: 9 Global Step: 115620 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:39:35,957-Speed 3011.19 samples/sec Loss 7.5717 LearningRate 0.0286 Epoch: 9 Global Step: 115630 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:39:39,375-Speed 2996.78 samples/sec Loss 7.6996 LearningRate 0.0286 Epoch: 9 Global Step: 115640 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:39:42,789-Speed 3000.23 samples/sec Loss 7.5627 LearningRate 0.0286 Epoch: 9 Global Step: 115650 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:39:46,180-Speed 3020.39 samples/sec Loss 7.6010 LearningRate 0.0286 Epoch: 9 Global Step: 115660 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:39:49,688-Speed 2920.32 samples/sec Loss 7.6363 LearningRate 0.0286 Epoch: 9 Global Step: 115670 Fp16 Grad Scale: 65536 Required: 13 hours Training: 2022-04-27 12:39:53,136-Speed 2971.05 samples/sec Loss 7.6198 LearningRate 0.0286 Epoch: 9 Global Step: 115680 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:56,583-Speed 2971.00 samples/sec Loss 7.5398 LearningRate 0.0285 Epoch: 9 Global Step: 115690 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:39:59,997-Speed 3000.46 samples/sec Loss 7.4595 LearningRate 0.0285 Epoch: 9 Global Step: 115700 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:40:03,388-Speed 3020.90 samples/sec Loss 
7.6483 LearningRate 0.0285 Epoch: 9 Global Step: 115710 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:40:06,931-Speed 2890.87 samples/sec Loss 7.6759 LearningRate 0.0285 Epoch: 9 Global Step: 115720 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:40:10,250-Speed 3087.02 samples/sec Loss 7.5388 LearningRate 0.0285 Epoch: 9 Global Step: 115730 Fp16 Grad Scale: 32768 Required: 13 hours Training: 2022-04-27 12:40:13,607-Speed 3050.87 samples/sec Loss 7.7666 LearningRate 0.0285 Epoch: 9 Global Step: 115740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:40:17,023-Speed 2997.95 samples/sec Loss 7.5569 LearningRate 0.0285 Epoch: 9 Global Step: 115750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:40:20,504-Speed 2942.29 samples/sec Loss 7.7426 LearningRate 0.0285 Epoch: 9 Global Step: 115760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:40:23,852-Speed 3060.01 samples/sec Loss 7.5081 LearningRate 0.0285 Epoch: 9 Global Step: 115770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:40:27,201-Speed 3058.13 samples/sec Loss 7.5919 LearningRate 0.0285 Epoch: 9 Global Step: 115780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:30,553-Speed 3055.69 samples/sec Loss 7.6290 LearningRate 0.0285 Epoch: 9 Global Step: 115790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:33,967-Speed 3000.21 samples/sec Loss 7.4776 LearningRate 0.0285 Epoch: 9 Global Step: 115800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:37,373-Speed 3007.62 samples/sec Loss 7.6503 LearningRate 0.0285 Epoch: 9 Global Step: 115810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:40,690-Speed 3088.09 samples/sec Loss 7.4770 LearningRate 0.0285 Epoch: 9 Global Step: 115820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:44,026-Speed 3070.19 samples/sec Loss 7.6820 LearningRate 0.0285 Epoch: 9 Global 
Step: 115830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:47,417-Speed 3020.78 samples/sec Loss 7.5064 LearningRate 0.0285 Epoch: 9 Global Step: 115840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:50,843-Speed 2989.56 samples/sec Loss 7.5031 LearningRate 0.0285 Epoch: 9 Global Step: 115850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:54,156-Speed 3091.80 samples/sec Loss 7.4759 LearningRate 0.0285 Epoch: 9 Global Step: 115860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:40:57,490-Speed 3072.71 samples/sec Loss 7.5422 LearningRate 0.0285 Epoch: 9 Global Step: 115870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:00,842-Speed 3056.02 samples/sec Loss 7.7645 LearningRate 0.0285 Epoch: 9 Global Step: 115880 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:41:04,145-Speed 3101.07 samples/sec Loss 7.6008 LearningRate 0.0285 Epoch: 9 Global Step: 115890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:07,496-Speed 3056.12 samples/sec Loss 7.4730 LearningRate 0.0285 Epoch: 9 Global Step: 115900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:10,815-Speed 3086.80 samples/sec Loss 7.5495 LearningRate 0.0285 Epoch: 9 Global Step: 115910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:14,133-Speed 3086.80 samples/sec Loss 7.7359 LearningRate 0.0284 Epoch: 9 Global Step: 115920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:17,475-Speed 3064.99 samples/sec Loss 7.6152 LearningRate 0.0284 Epoch: 9 Global Step: 115930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:20,834-Speed 3049.58 samples/sec Loss 7.5375 LearningRate 0.0284 Epoch: 9 Global Step: 115940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:24,167-Speed 3073.42 samples/sec Loss 7.5601 LearningRate 0.0284 Epoch: 9 Global Step: 115950 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 12:41:27,536-Speed 3040.37 samples/sec Loss 7.5072 LearningRate 0.0284 Epoch: 9 Global Step: 115960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:30,940-Speed 3008.52 samples/sec Loss 7.7302 LearningRate 0.0284 Epoch: 9 Global Step: 115970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:34,257-Speed 3088.74 samples/sec Loss 7.6426 LearningRate 0.0284 Epoch: 9 Global Step: 115980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:41:37,582-Speed 3080.30 samples/sec Loss 7.5630 LearningRate 0.0284 Epoch: 9 Global Step: 115990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:40,940-Speed 3050.24 samples/sec Loss 7.5837 LearningRate 0.0284 Epoch: 9 Global Step: 116000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:44,277-Speed 3069.79 samples/sec Loss 7.5065 LearningRate 0.0284 Epoch: 9 Global Step: 116010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:47,678-Speed 3012.03 samples/sec Loss 7.6793 LearningRate 0.0284 Epoch: 9 Global Step: 116020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:51,029-Speed 3056.59 samples/sec Loss 7.6261 LearningRate 0.0284 Epoch: 9 Global Step: 116030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:54,448-Speed 2995.44 samples/sec Loss 7.5445 LearningRate 0.0284 Epoch: 9 Global Step: 116040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:41:57,796-Speed 3059.49 samples/sec Loss 7.5604 LearningRate 0.0284 Epoch: 9 Global Step: 116050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:01,218-Speed 2993.58 samples/sec Loss 7.6607 LearningRate 0.0284 Epoch: 9 Global Step: 116060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:04,545-Speed 3078.04 samples/sec Loss 7.5325 LearningRate 0.0284 Epoch: 9 Global Step: 116070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
12:42:07,892-Speed 3059.92 samples/sec Loss 7.6212 LearningRate 0.0284 Epoch: 9 Global Step: 116080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:11,227-Speed 3072.26 samples/sec Loss 7.5436 LearningRate 0.0284 Epoch: 9 Global Step: 116090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:42:14,601-Speed 3035.34 samples/sec Loss 7.5809 LearningRate 0.0284 Epoch: 9 Global Step: 116100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:18,042-Speed 2976.66 samples/sec Loss 7.7324 LearningRate 0.0284 Epoch: 9 Global Step: 116110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:21,424-Speed 3028.42 samples/sec Loss 7.5974 LearningRate 0.0284 Epoch: 9 Global Step: 116120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:24,822-Speed 3014.40 samples/sec Loss 7.6352 LearningRate 0.0284 Epoch: 9 Global Step: 116130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:28,254-Speed 2985.29 samples/sec Loss 7.6153 LearningRate 0.0284 Epoch: 9 Global Step: 116140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:31,621-Speed 3042.47 samples/sec Loss 7.6431 LearningRate 0.0283 Epoch: 9 Global Step: 116150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:35,032-Speed 3002.97 samples/sec Loss 7.5924 LearningRate 0.0283 Epoch: 9 Global Step: 116160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:38,445-Speed 3000.24 samples/sec Loss 7.7274 LearningRate 0.0283 Epoch: 9 Global Step: 116170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:41,821-Speed 3034.45 samples/sec Loss 7.4289 LearningRate 0.0283 Epoch: 9 Global Step: 116180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:45,277-Speed 2963.89 samples/sec Loss 7.5524 LearningRate 0.0283 Epoch: 9 Global Step: 116190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:48,689-Speed 3001.93 samples/sec 
Loss 7.4648 LearningRate 0.0283 Epoch: 9 Global Step: 116200 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:42:52,120-Speed 2985.57 samples/sec Loss 7.7213 LearningRate 0.0283 Epoch: 9 Global Step: 116210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:55,601-Speed 2942.22 samples/sec Loss 7.4726 LearningRate 0.0283 Epoch: 9 Global Step: 116220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:42:59,058-Speed 2963.39 samples/sec Loss 7.6939 LearningRate 0.0283 Epoch: 9 Global Step: 116230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:02,443-Speed 3025.96 samples/sec Loss 7.6276 LearningRate 0.0283 Epoch: 9 Global Step: 116240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:05,945-Speed 2925.17 samples/sec Loss 7.6060 LearningRate 0.0283 Epoch: 9 Global Step: 116250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:09,358-Speed 3000.71 samples/sec Loss 7.6511 LearningRate 0.0283 Epoch: 9 Global Step: 116260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:12,723-Speed 3043.64 samples/sec Loss 7.5036 LearningRate 0.0283 Epoch: 9 Global Step: 116270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:16,162-Speed 2979.02 samples/sec Loss 7.4990 LearningRate 0.0283 Epoch: 9 Global Step: 116280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:19,586-Speed 2991.75 samples/sec Loss 7.6400 LearningRate 0.0283 Epoch: 9 Global Step: 116290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:22,914-Speed 3076.90 samples/sec Loss 7.6830 LearningRate 0.0283 Epoch: 9 Global Step: 116300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:26,342-Speed 2988.34 samples/sec Loss 7.4814 LearningRate 0.0283 Epoch: 9 Global Step: 116310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:29,790-Speed 2971.00 samples/sec Loss 7.5914 LearningRate 0.0283 Epoch: 9 
Global Step: 116320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:33,285-Speed 2930.53 samples/sec Loss 7.6338 LearningRate 0.0283 Epoch: 9 Global Step: 116330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:36,805-Speed 2909.80 samples/sec Loss 7.6007 LearningRate 0.0283 Epoch: 9 Global Step: 116340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:40,157-Speed 3056.06 samples/sec Loss 7.6368 LearningRate 0.0283 Epoch: 9 Global Step: 116350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:43,515-Speed 3049.95 samples/sec Loss 7.6782 LearningRate 0.0283 Epoch: 9 Global Step: 116360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:46,884-Speed 3040.33 samples/sec Loss 7.6299 LearningRate 0.0283 Epoch: 9 Global Step: 116370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:50,261-Speed 3033.46 samples/sec Loss 7.6061 LearningRate 0.0283 Epoch: 9 Global Step: 116380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:53,689-Speed 2988.59 samples/sec Loss 7.6527 LearningRate 0.0282 Epoch: 9 Global Step: 116390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:43:57,160-Speed 2951.02 samples/sec Loss 7.5860 LearningRate 0.0282 Epoch: 9 Global Step: 116400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:00,573-Speed 3001.06 samples/sec Loss 7.5434 LearningRate 0.0282 Epoch: 9 Global Step: 116410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:03,929-Speed 3051.83 samples/sec Loss 7.6841 LearningRate 0.0282 Epoch: 9 Global Step: 116420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:07,409-Speed 2944.01 samples/sec Loss 7.7070 LearningRate 0.0282 Epoch: 9 Global Step: 116430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:10,909-Speed 2926.24 samples/sec Loss 7.5625 LearningRate 0.0282 Epoch: 9 Global Step: 116440 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-27 12:44:14,282-Speed 3036.21 samples/sec Loss 7.5819 LearningRate 0.0282 Epoch: 9 Global Step: 116450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:17,738-Speed 2964.28 samples/sec Loss 7.7852 LearningRate 0.0282 Epoch: 9 Global Step: 116460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:21,102-Speed 3045.21 samples/sec Loss 7.6605 LearningRate 0.0282 Epoch: 9 Global Step: 116470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:24,502-Speed 3012.56 samples/sec Loss 7.7795 LearningRate 0.0282 Epoch: 9 Global Step: 116480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:27,911-Speed 3004.85 samples/sec Loss 7.6994 LearningRate 0.0282 Epoch: 9 Global Step: 116490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:31,289-Speed 3031.70 samples/sec Loss 7.5809 LearningRate 0.0282 Epoch: 9 Global Step: 116500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:44:34,676-Speed 3023.87 samples/sec Loss 7.5692 LearningRate 0.0282 Epoch: 9 Global Step: 116510 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:44:38,103-Speed 2989.00 samples/sec Loss 7.6302 LearningRate 0.0282 Epoch: 9 Global Step: 116520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:44:41,498-Speed 3017.41 samples/sec Loss 7.5163 LearningRate 0.0282 Epoch: 9 Global Step: 116530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:44:44,895-Speed 3015.48 samples/sec Loss 7.4592 LearningRate 0.0282 Epoch: 9 Global Step: 116540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:44:48,380-Speed 2938.83 samples/sec Loss 7.6281 LearningRate 0.0282 Epoch: 9 Global Step: 116550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:44:51,857-Speed 2946.20 samples/sec Loss 7.6072 LearningRate 0.0282 Epoch: 9 Global Step: 116560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
12:44:55,243-Speed 3025.15 samples/sec Loss 7.6185 LearningRate 0.0282 Epoch: 9 Global Step: 116570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:44:58,625-Speed 3028.87 samples/sec Loss 7.6542 LearningRate 0.0282 Epoch: 9 Global Step: 116580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:02,065-Speed 2977.45 samples/sec Loss 7.6900 LearningRate 0.0282 Epoch: 9 Global Step: 116590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:05,474-Speed 3004.96 samples/sec Loss 7.6023 LearningRate 0.0282 Epoch: 9 Global Step: 116600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:08,931-Speed 2962.61 samples/sec Loss 7.6077 LearningRate 0.0282 Epoch: 9 Global Step: 116610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:12,405-Speed 2948.70 samples/sec Loss 7.6234 LearningRate 0.0281 Epoch: 9 Global Step: 116620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:45:15,730-Speed 3080.74 samples/sec Loss 7.6313 LearningRate 0.0281 Epoch: 9 Global Step: 116630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:45:19,067-Speed 3069.62 samples/sec Loss 7.5599 LearningRate 0.0281 Epoch: 9 Global Step: 116640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:22,483-Speed 2998.13 samples/sec Loss 7.6723 LearningRate 0.0281 Epoch: 9 Global Step: 116650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:25,814-Speed 3075.22 samples/sec Loss 7.5754 LearningRate 0.0281 Epoch: 9 Global Step: 116660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:29,204-Speed 3021.26 samples/sec Loss 7.7361 LearningRate 0.0281 Epoch: 9 Global Step: 116670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:32,566-Speed 3047.16 samples/sec Loss 7.6971 LearningRate 0.0281 Epoch: 9 Global Step: 116680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:35,935-Speed 3040.39 samples/sec Loss 
7.6643 LearningRate 0.0281 Epoch: 9 Global Step: 116690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:39,309-Speed 3035.90 samples/sec Loss 7.6445 LearningRate 0.0281 Epoch: 9 Global Step: 116700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:42,735-Speed 2989.07 samples/sec Loss 7.6354 LearningRate 0.0281 Epoch: 9 Global Step: 116710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:46,096-Speed 3048.28 samples/sec Loss 7.7324 LearningRate 0.0281 Epoch: 9 Global Step: 116720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:49,510-Speed 2999.89 samples/sec Loss 7.6738 LearningRate 0.0281 Epoch: 9 Global Step: 116730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:52,940-Speed 2986.59 samples/sec Loss 7.8015 LearningRate 0.0281 Epoch: 9 Global Step: 116740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:45:56,352-Speed 3001.85 samples/sec Loss 7.6082 LearningRate 0.0281 Epoch: 9 Global Step: 116750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:45:59,767-Speed 2999.32 samples/sec Loss 7.6603 LearningRate 0.0281 Epoch: 9 Global Step: 116760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:03,174-Speed 3006.55 samples/sec Loss 7.8193 LearningRate 0.0281 Epoch: 9 Global Step: 116770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:06,559-Speed 3025.59 samples/sec Loss 7.5963 LearningRate 0.0281 Epoch: 9 Global Step: 116780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:10,039-Speed 2943.84 samples/sec Loss 7.5668 LearningRate 0.0281 Epoch: 9 Global Step: 116790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:13,381-Speed 3065.06 samples/sec Loss 7.6257 LearningRate 0.0281 Epoch: 9 Global Step: 116800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:16,791-Speed 3003.43 samples/sec Loss 7.6076 LearningRate 0.0281 Epoch: 9 Global 
Step: 116810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:20,184-Speed 3018.81 samples/sec Loss 7.5784 LearningRate 0.0281 Epoch: 9 Global Step: 116820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:23,531-Speed 3060.82 samples/sec Loss 7.6157 LearningRate 0.0281 Epoch: 9 Global Step: 116830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:26,860-Speed 3076.20 samples/sec Loss 7.7553 LearningRate 0.0281 Epoch: 9 Global Step: 116840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:46:30,299-Speed 2978.85 samples/sec Loss 7.5529 LearningRate 0.0281 Epoch: 9 Global Step: 116850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:33,675-Speed 3034.33 samples/sec Loss 7.7064 LearningRate 0.0280 Epoch: 9 Global Step: 116860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:37,086-Speed 3002.19 samples/sec Loss 7.7132 LearningRate 0.0280 Epoch: 9 Global Step: 116870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:40,471-Speed 3026.82 samples/sec Loss 7.6473 LearningRate 0.0280 Epoch: 9 Global Step: 116880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:43,897-Speed 2989.45 samples/sec Loss 7.4982 LearningRate 0.0280 Epoch: 9 Global Step: 116890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:47,291-Speed 3018.13 samples/sec Loss 7.5758 LearningRate 0.0280 Epoch: 9 Global Step: 116900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:50,692-Speed 3011.24 samples/sec Loss 7.5703 LearningRate 0.0280 Epoch: 9 Global Step: 116910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:54,036-Speed 3063.33 samples/sec Loss 7.6517 LearningRate 0.0280 Epoch: 9 Global Step: 116920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:46:57,452-Speed 2999.55 samples/sec Loss 7.5050 LearningRate 0.0280 Epoch: 9 Global Step: 116930 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-27 12:47:00,905-Speed 2966.14 samples/sec Loss 7.7223 LearningRate 0.0280 Epoch: 9 Global Step: 116940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:04,271-Speed 3042.99 samples/sec Loss 7.5266 LearningRate 0.0280 Epoch: 9 Global Step: 116950 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:47:07,699-Speed 2987.59 samples/sec Loss 7.5510 LearningRate 0.0280 Epoch: 9 Global Step: 116960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:11,039-Speed 3067.28 samples/sec Loss 7.6063 LearningRate 0.0280 Epoch: 9 Global Step: 116970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:14,435-Speed 3015.73 samples/sec Loss 7.5880 LearningRate 0.0280 Epoch: 9 Global Step: 116980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:17,838-Speed 3010.85 samples/sec Loss 7.5189 LearningRate 0.0280 Epoch: 9 Global Step: 116990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:21,184-Speed 3060.62 samples/sec Loss 7.5856 LearningRate 0.0280 Epoch: 9 Global Step: 117000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:24,529-Speed 3062.37 samples/sec Loss 7.5810 LearningRate 0.0280 Epoch: 9 Global Step: 117010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:27,948-Speed 2995.77 samples/sec Loss 7.5489 LearningRate 0.0280 Epoch: 9 Global Step: 117020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:31,328-Speed 3030.27 samples/sec Loss 7.7158 LearningRate 0.0280 Epoch: 9 Global Step: 117030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:34,753-Speed 2990.48 samples/sec Loss 7.6682 LearningRate 0.0280 Epoch: 9 Global Step: 117040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:38,150-Speed 3015.70 samples/sec Loss 7.6234 LearningRate 0.0280 Epoch: 9 Global Step: 117050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
12:47:41,553-Speed 3010.49 samples/sec Loss 7.6239 LearningRate 0.0280 Epoch: 9 Global Step: 117060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:47:44,917-Speed 3044.64 samples/sec Loss 7.6625 LearningRate 0.0280 Epoch: 9 Global Step: 117070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:48,425-Speed 2919.80 samples/sec Loss 7.5511 LearningRate 0.0280 Epoch: 9 Global Step: 117080 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:51,775-Speed 3056.96 samples/sec Loss 7.6183 LearningRate 0.0279 Epoch: 9 Global Step: 117090 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:55,124-Speed 3059.42 samples/sec Loss 7.6547 LearningRate 0.0279 Epoch: 9 Global Step: 117100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:47:58,477-Speed 3054.03 samples/sec Loss 7.5650 LearningRate 0.0279 Epoch: 9 Global Step: 117110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:01,931-Speed 2965.31 samples/sec Loss 7.6946 LearningRate 0.0279 Epoch: 9 Global Step: 117120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:05,297-Speed 3043.74 samples/sec Loss 7.6623 LearningRate 0.0279 Epoch: 9 Global Step: 117130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:08,706-Speed 3004.68 samples/sec Loss 7.7446 LearningRate 0.0279 Epoch: 9 Global Step: 117140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:12,168-Speed 2957.69 samples/sec Loss 7.5971 LearningRate 0.0279 Epoch: 9 Global Step: 117150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:15,598-Speed 2986.77 samples/sec Loss 7.6868 LearningRate 0.0279 Epoch: 9 Global Step: 117160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:19,031-Speed 2983.39 samples/sec Loss 7.7148 LearningRate 0.0279 Epoch: 9 Global Step: 117170 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:48:22,384-Speed 3055.24 samples/sec 
Loss 7.6418 LearningRate 0.0279 Epoch: 9 Global Step: 117180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:25,725-Speed 3066.35 samples/sec Loss 7.7854 LearningRate 0.0279 Epoch: 9 Global Step: 117190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:29,143-Speed 2996.67 samples/sec Loss 7.6134 LearningRate 0.0279 Epoch: 9 Global Step: 117200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:32,575-Speed 2984.71 samples/sec Loss 7.5048 LearningRate 0.0279 Epoch: 9 Global Step: 117210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:35,997-Speed 2993.21 samples/sec Loss 7.5166 LearningRate 0.0279 Epoch: 9 Global Step: 117220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:39,343-Speed 3061.27 samples/sec Loss 7.8287 LearningRate 0.0279 Epoch: 9 Global Step: 117230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:42,648-Speed 3099.48 samples/sec Loss 7.6498 LearningRate 0.0279 Epoch: 9 Global Step: 117240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:46,049-Speed 3011.60 samples/sec Loss 7.4910 LearningRate 0.0279 Epoch: 9 Global Step: 117250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:49,462-Speed 3000.62 samples/sec Loss 7.6748 LearningRate 0.0279 Epoch: 9 Global Step: 117260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:52,866-Speed 3009.38 samples/sec Loss 7.6625 LearningRate 0.0279 Epoch: 9 Global Step: 117270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:48:56,236-Speed 3038.78 samples/sec Loss 7.6463 LearningRate 0.0279 Epoch: 9 Global Step: 117280 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:48:59,546-Speed 3094.97 samples/sec Loss 7.7634 LearningRate 0.0279 Epoch: 9 Global Step: 117290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:02,922-Speed 3034.41 samples/sec Loss 7.6464 LearningRate 0.0279 Epoch: 9 
Global Step: 117300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:06,254-Speed 3075.24 samples/sec Loss 7.6434 LearningRate 0.0279 Epoch: 9 Global Step: 117310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:09,649-Speed 3016.69 samples/sec Loss 7.5391 LearningRate 0.0279 Epoch: 9 Global Step: 117320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:13,064-Speed 2999.30 samples/sec Loss 7.6903 LearningRate 0.0278 Epoch: 9 Global Step: 117330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:16,493-Speed 2986.73 samples/sec Loss 7.7545 LearningRate 0.0278 Epoch: 9 Global Step: 117340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:19,885-Speed 3020.20 samples/sec Loss 7.7480 LearningRate 0.0278 Epoch: 9 Global Step: 117350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:23,194-Speed 3096.10 samples/sec Loss 7.6862 LearningRate 0.0278 Epoch: 9 Global Step: 117360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:26,527-Speed 3072.78 samples/sec Loss 7.7737 LearningRate 0.0278 Epoch: 9 Global Step: 117370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:29,863-Speed 3070.05 samples/sec Loss 7.5916 LearningRate 0.0278 Epoch: 9 Global Step: 117380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:49:33,256-Speed 3019.26 samples/sec Loss 7.6677 LearningRate 0.0278 Epoch: 9 Global Step: 117390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:49:36,684-Speed 2987.63 samples/sec Loss 7.5109 LearningRate 0.0278 Epoch: 9 Global Step: 117400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:49:40,059-Speed 3034.92 samples/sec Loss 7.6554 LearningRate 0.0278 Epoch: 9 Global Step: 117410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:49:43,451-Speed 3020.47 samples/sec Loss 7.5748 LearningRate 0.0278 Epoch: 9 Global Step: 117420 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-27 12:49:46,815-Speed 3044.58 samples/sec Loss 7.6320 LearningRate 0.0278 Epoch: 9 Global Step: 117430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:49:50,187-Speed 3037.46 samples/sec Loss 7.6227 LearningRate 0.0278 Epoch: 9 Global Step: 117440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:49:53,613-Speed 2989.22 samples/sec Loss 7.6266 LearningRate 0.0278 Epoch: 9 Global Step: 117450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:49:57,047-Speed 2982.85 samples/sec Loss 7.7205 LearningRate 0.0278 Epoch: 9 Global Step: 117460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:00,540-Speed 2939.46 samples/sec Loss 7.6887 LearningRate 0.0278 Epoch: 9 Global Step: 117470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:04,048-Speed 2919.83 samples/sec Loss 7.5951 LearningRate 0.0278 Epoch: 9 Global Step: 117480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:07,406-Speed 3050.41 samples/sec Loss 7.6688 LearningRate 0.0278 Epoch: 9 Global Step: 117490 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:50:10,834-Speed 2988.23 samples/sec Loss 7.7067 LearningRate 0.0278 Epoch: 9 Global Step: 117500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:14,231-Speed 3015.01 samples/sec Loss 7.6183 LearningRate 0.0278 Epoch: 9 Global Step: 117510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:17,606-Speed 3034.71 samples/sec Loss 7.6541 LearningRate 0.0278 Epoch: 9 Global Step: 117520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:21,031-Speed 2990.86 samples/sec Loss 7.6181 LearningRate 0.0278 Epoch: 9 Global Step: 117530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:24,393-Speed 3048.09 samples/sec Loss 7.5650 LearningRate 0.0278 Epoch: 9 Global Step: 117540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
12:50:27,900-Speed 2920.17 samples/sec Loss 7.6351 LearningRate 0.0278 Epoch: 9 Global Step: 117550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:31,263-Speed 3046.58 samples/sec Loss 7.6610 LearningRate 0.0277 Epoch: 9 Global Step: 117560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:34,604-Speed 3065.09 samples/sec Loss 7.4679 LearningRate 0.0277 Epoch: 9 Global Step: 117570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:38,064-Speed 2960.79 samples/sec Loss 7.6599 LearningRate 0.0277 Epoch: 9 Global Step: 117580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:41,478-Speed 3000.22 samples/sec Loss 7.6914 LearningRate 0.0277 Epoch: 9 Global Step: 117590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:44,943-Speed 2956.24 samples/sec Loss 7.5083 LearningRate 0.0277 Epoch: 9 Global Step: 117600 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:50:48,336-Speed 3019.11 samples/sec Loss 7.5216 LearningRate 0.0277 Epoch: 9 Global Step: 117610 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:50:51,771-Speed 2982.09 samples/sec Loss 7.7115 LearningRate 0.0277 Epoch: 9 Global Step: 117620 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:50:55,109-Speed 3068.49 samples/sec Loss 7.5492 LearningRate 0.0277 Epoch: 9 Global Step: 117630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:50:58,488-Speed 3031.38 samples/sec Loss 7.5882 LearningRate 0.0277 Epoch: 9 Global Step: 117640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:01,880-Speed 3019.29 samples/sec Loss 7.6327 LearningRate 0.0277 Epoch: 9 Global Step: 117650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:05,226-Speed 3061.26 samples/sec Loss 7.5475 LearningRate 0.0277 Epoch: 9 Global Step: 117660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:08,596-Speed 3039.83 samples/sec 
Loss 7.6398 LearningRate 0.0277 Epoch: 9 Global Step: 117670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:11,999-Speed 3010.10 samples/sec Loss 7.5838 LearningRate 0.0277 Epoch: 9 Global Step: 117680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:15,449-Speed 2968.91 samples/sec Loss 7.6666 LearningRate 0.0277 Epoch: 9 Global Step: 117690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:18,864-Speed 2999.06 samples/sec Loss 7.7421 LearningRate 0.0277 Epoch: 9 Global Step: 117700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:22,276-Speed 3002.34 samples/sec Loss 7.7686 LearningRate 0.0277 Epoch: 9 Global Step: 117710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:25,744-Speed 2953.37 samples/sec Loss 7.6411 LearningRate 0.0277 Epoch: 9 Global Step: 117720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:29,136-Speed 3019.59 samples/sec Loss 7.7123 LearningRate 0.0277 Epoch: 9 Global Step: 117730 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:51:32,529-Speed 3019.30 samples/sec Loss 7.5190 LearningRate 0.0277 Epoch: 9 Global Step: 117740 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:51:35,977-Speed 2970.37 samples/sec Loss 7.6797 LearningRate 0.0277 Epoch: 9 Global Step: 117750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:39,320-Speed 3064.11 samples/sec Loss 7.4757 LearningRate 0.0277 Epoch: 9 Global Step: 117760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:42,728-Speed 3005.57 samples/sec Loss 7.6751 LearningRate 0.0277 Epoch: 9 Global Step: 117770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:46,174-Speed 2972.50 samples/sec Loss 7.6468 LearningRate 0.0277 Epoch: 9 Global Step: 117780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:49,565-Speed 3020.02 samples/sec Loss 7.4841 LearningRate 0.0277 Epoch: 9 
Global Step: 117790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:52,965-Speed 3012.65 samples/sec Loss 7.5317 LearningRate 0.0276 Epoch: 9 Global Step: 117800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:56,401-Speed 2981.23 samples/sec Loss 7.4887 LearningRate 0.0276 Epoch: 9 Global Step: 117810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:51:59,808-Speed 3005.91 samples/sec Loss 7.6333 LearningRate 0.0276 Epoch: 9 Global Step: 117820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:03,276-Speed 2954.34 samples/sec Loss 7.7171 LearningRate 0.0276 Epoch: 9 Global Step: 117830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:06,642-Speed 3042.23 samples/sec Loss 7.6890 LearningRate 0.0276 Epoch: 9 Global Step: 117840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:10,026-Speed 3027.27 samples/sec Loss 7.5477 LearningRate 0.0276 Epoch: 9 Global Step: 117850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:13,389-Speed 3046.18 samples/sec Loss 7.7169 LearningRate 0.0276 Epoch: 9 Global Step: 117860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:16,752-Speed 3045.47 samples/sec Loss 7.6091 LearningRate 0.0276 Epoch: 9 Global Step: 117870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:20,124-Speed 3037.17 samples/sec Loss 7.6126 LearningRate 0.0276 Epoch: 9 Global Step: 117880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:23,513-Speed 3023.07 samples/sec Loss 7.6595 LearningRate 0.0276 Epoch: 9 Global Step: 117890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:26,948-Speed 2981.41 samples/sec Loss 7.6168 LearningRate 0.0276 Epoch: 9 Global Step: 117900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:30,334-Speed 3025.74 samples/sec Loss 7.6854 LearningRate 0.0276 Epoch: 9 Global Step: 117910 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 12:52:33,693-Speed 3049.41 samples/sec Loss 7.6595 LearningRate 0.0276 Epoch: 9 Global Step: 117920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:37,092-Speed 3013.12 samples/sec Loss 7.4997 LearningRate 0.0276 Epoch: 9 Global Step: 117930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:40,485-Speed 3018.98 samples/sec Loss 7.7680 LearningRate 0.0276 Epoch: 9 Global Step: 117940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:43,949-Speed 2956.76 samples/sec Loss 7.4602 LearningRate 0.0276 Epoch: 9 Global Step: 117950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:47,384-Speed 2982.38 samples/sec Loss 7.6457 LearningRate 0.0276 Epoch: 9 Global Step: 117960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:52:50,841-Speed 2962.64 samples/sec Loss 7.6053 LearningRate 0.0276 Epoch: 9 Global Step: 117970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:54,244-Speed 3009.52 samples/sec Loss 7.7623 LearningRate 0.0276 Epoch: 9 Global Step: 117980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:52:57,593-Speed 3059.42 samples/sec Loss 7.6400 LearningRate 0.0276 Epoch: 9 Global Step: 117990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:01,000-Speed 3006.45 samples/sec Loss 7.5207 LearningRate 0.0276 Epoch: 9 Global Step: 118000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:04,368-Speed 3041.48 samples/sec Loss 7.6789 LearningRate 0.0276 Epoch: 9 Global Step: 118010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:07,729-Speed 3047.47 samples/sec Loss 7.7738 LearningRate 0.0276 Epoch: 9 Global Step: 118020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:11,040-Speed 3094.25 samples/sec Loss 7.4863 LearningRate 0.0275 Epoch: 9 Global Step: 118030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
12:53:14,394-Speed 3052.97 samples/sec Loss 7.6985 LearningRate 0.0275 Epoch: 9 Global Step: 118040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:17,750-Speed 3052.29 samples/sec Loss 7.6033 LearningRate 0.0275 Epoch: 9 Global Step: 118050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:21,165-Speed 2999.29 samples/sec Loss 7.7232 LearningRate 0.0275 Epoch: 9 Global Step: 118060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:24,585-Speed 2995.18 samples/sec Loss 7.6001 LearningRate 0.0275 Epoch: 9 Global Step: 118070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:27,947-Speed 3046.53 samples/sec Loss 7.7101 LearningRate 0.0275 Epoch: 9 Global Step: 118080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:31,370-Speed 2991.94 samples/sec Loss 7.6455 LearningRate 0.0275 Epoch: 9 Global Step: 118090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:34,722-Speed 3055.83 samples/sec Loss 7.6072 LearningRate 0.0275 Epoch: 9 Global Step: 118100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:38,059-Speed 3070.07 samples/sec Loss 7.5820 LearningRate 0.0275 Epoch: 9 Global Step: 118110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:41,526-Speed 2954.18 samples/sec Loss 7.7655 LearningRate 0.0275 Epoch: 9 Global Step: 118120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:44,858-Speed 3074.93 samples/sec Loss 7.6253 LearningRate 0.0275 Epoch: 9 Global Step: 118130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:48,231-Speed 3037.18 samples/sec Loss 7.6709 LearningRate 0.0275 Epoch: 9 Global Step: 118140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:53:51,575-Speed 3063.23 samples/sec Loss 7.4893 LearningRate 0.0275 Epoch: 9 Global Step: 118150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:55,039-Speed 2956.32 samples/sec Loss 
7.6088 LearningRate 0.0275 Epoch: 9 Global Step: 118160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:53:58,470-Speed 2985.70 samples/sec Loss 7.5578 LearningRate 0.0275 Epoch: 9 Global Step: 118170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:01,874-Speed 3008.87 samples/sec Loss 7.7160 LearningRate 0.0275 Epoch: 9 Global Step: 118180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:05,210-Speed 3070.42 samples/sec Loss 7.5795 LearningRate 0.0275 Epoch: 9 Global Step: 118190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:08,638-Speed 2989.28 samples/sec Loss 7.5978 LearningRate 0.0275 Epoch: 9 Global Step: 118200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:11,983-Speed 3062.14 samples/sec Loss 7.6149 LearningRate 0.0275 Epoch: 9 Global Step: 118210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:15,332-Speed 3057.97 samples/sec Loss 7.5084 LearningRate 0.0275 Epoch: 9 Global Step: 118220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:18,721-Speed 3022.33 samples/sec Loss 7.5288 LearningRate 0.0275 Epoch: 9 Global Step: 118230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:22,084-Speed 3046.54 samples/sec Loss 7.5794 LearningRate 0.0275 Epoch: 9 Global Step: 118240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:25,468-Speed 3026.04 samples/sec Loss 7.6582 LearningRate 0.0275 Epoch: 9 Global Step: 118250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:28,887-Speed 2996.06 samples/sec Loss 7.6240 LearningRate 0.0275 Epoch: 9 Global Step: 118260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:32,307-Speed 2994.90 samples/sec Loss 7.6451 LearningRate 0.0274 Epoch: 9 Global Step: 118270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:54:35,679-Speed 3037.84 samples/sec Loss 7.5923 LearningRate 0.0274 Epoch: 9 Global 
Step: 118280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:39,012-Speed 3073.11 samples/sec Loss 7.4606 LearningRate 0.0274 Epoch: 9 Global Step: 118290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:42,479-Speed 2954.49 samples/sec Loss 7.5463 LearningRate 0.0274 Epoch: 9 Global Step: 118300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:45,796-Speed 3087.89 samples/sec Loss 7.6263 LearningRate 0.0274 Epoch: 9 Global Step: 118310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:49,188-Speed 3019.92 samples/sec Loss 7.7147 LearningRate 0.0274 Epoch: 9 Global Step: 118320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:52,525-Speed 3069.34 samples/sec Loss 7.6236 LearningRate 0.0274 Epoch: 9 Global Step: 118330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:55,921-Speed 3015.95 samples/sec Loss 7.8210 LearningRate 0.0274 Epoch: 9 Global Step: 118340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:54:59,239-Speed 3086.92 samples/sec Loss 7.6602 LearningRate 0.0274 Epoch: 9 Global Step: 118350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:02,554-Speed 3089.77 samples/sec Loss 7.5522 LearningRate 0.0274 Epoch: 9 Global Step: 118360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:06,027-Speed 2949.08 samples/sec Loss 7.5320 LearningRate 0.0274 Epoch: 9 Global Step: 118370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:09,466-Speed 2979.00 samples/sec Loss 7.5636 LearningRate 0.0274 Epoch: 9 Global Step: 118380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:12,850-Speed 3027.07 samples/sec Loss 7.5971 LearningRate 0.0274 Epoch: 9 Global Step: 118390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:16,252-Speed 3010.25 samples/sec Loss 7.5989 LearningRate 0.0274 Epoch: 9 Global Step: 118400 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-27 12:55:19,610-Speed 3050.66 samples/sec Loss 7.6519 LearningRate 0.0274 Epoch: 9 Global Step: 118410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:23,092-Speed 2942.71 samples/sec Loss 7.6493 LearningRate 0.0274 Epoch: 9 Global Step: 118420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:26,468-Speed 3034.01 samples/sec Loss 7.6327 LearningRate 0.0274 Epoch: 9 Global Step: 118430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:55:29,805-Speed 3069.19 samples/sec Loss 7.5701 LearningRate 0.0274 Epoch: 9 Global Step: 118440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:33,147-Speed 3065.71 samples/sec Loss 7.5165 LearningRate 0.0274 Epoch: 9 Global Step: 118450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:36,540-Speed 3018.70 samples/sec Loss 7.5410 LearningRate 0.0274 Epoch: 9 Global Step: 118460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:39,859-Speed 3085.75 samples/sec Loss 7.5837 LearningRate 0.0274 Epoch: 9 Global Step: 118470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:43,189-Speed 3076.14 samples/sec Loss 7.6447 LearningRate 0.0274 Epoch: 9 Global Step: 118480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:46,566-Speed 3033.98 samples/sec Loss 7.5365 LearningRate 0.0274 Epoch: 9 Global Step: 118490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:50,005-Speed 2978.29 samples/sec Loss 7.5569 LearningRate 0.0274 Epoch: 9 Global Step: 118500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:53,400-Speed 3016.61 samples/sec Loss 7.5844 LearningRate 0.0273 Epoch: 9 Global Step: 118510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:55:56,815-Speed 2999.46 samples/sec Loss 7.5382 LearningRate 0.0273 Epoch: 9 Global Step: 118520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
12:56:00,223-Speed 3005.80 samples/sec Loss 7.7465 LearningRate 0.0273 Epoch: 9 Global Step: 118530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:03,558-Speed 3071.18 samples/sec Loss 7.7561 LearningRate 0.0273 Epoch: 9 Global Step: 118540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:56:06,914-Speed 3052.26 samples/sec Loss 7.6537 LearningRate 0.0273 Epoch: 9 Global Step: 118550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:56:10,253-Speed 3068.18 samples/sec Loss 7.7011 LearningRate 0.0273 Epoch: 9 Global Step: 118560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:56:13,604-Speed 3055.90 samples/sec Loss 7.6981 LearningRate 0.0273 Epoch: 9 Global Step: 118570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:56:16,964-Speed 3048.08 samples/sec Loss 7.5752 LearningRate 0.0273 Epoch: 9 Global Step: 118580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:20,317-Speed 3055.64 samples/sec Loss 7.6200 LearningRate 0.0273 Epoch: 9 Global Step: 118590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:23,645-Speed 3077.47 samples/sec Loss 7.5940 LearningRate 0.0273 Epoch: 9 Global Step: 118600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:27,006-Speed 3047.84 samples/sec Loss 7.5614 LearningRate 0.0273 Epoch: 9 Global Step: 118610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:30,387-Speed 3029.48 samples/sec Loss 7.5195 LearningRate 0.0273 Epoch: 9 Global Step: 118620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:33,718-Speed 3074.95 samples/sec Loss 7.6294 LearningRate 0.0273 Epoch: 9 Global Step: 118630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:37,097-Speed 3030.95 samples/sec Loss 7.6419 LearningRate 0.0273 Epoch: 9 Global Step: 118640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:40,414-Speed 3088.36 samples/sec Loss 
7.5894 LearningRate 0.0273 Epoch: 9 Global Step: 118650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:43,723-Speed 3095.57 samples/sec Loss 7.7185 LearningRate 0.0273 Epoch: 9 Global Step: 118660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:47,075-Speed 3055.05 samples/sec Loss 7.5643 LearningRate 0.0273 Epoch: 9 Global Step: 118670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:56:50,428-Speed 3055.50 samples/sec Loss 7.6278 LearningRate 0.0273 Epoch: 9 Global Step: 118680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:56:53,817-Speed 3023.15 samples/sec Loss 7.5531 LearningRate 0.0273 Epoch: 9 Global Step: 118690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:56:57,203-Speed 3024.96 samples/sec Loss 7.6530 LearningRate 0.0273 Epoch: 9 Global Step: 118700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:00,593-Speed 3020.98 samples/sec Loss 7.5282 LearningRate 0.0273 Epoch: 9 Global Step: 118710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:03,944-Speed 3056.82 samples/sec Loss 7.7550 LearningRate 0.0273 Epoch: 9 Global Step: 118720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:07,311-Speed 3041.90 samples/sec Loss 7.6462 LearningRate 0.0273 Epoch: 9 Global Step: 118730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:10,714-Speed 3010.67 samples/sec Loss 7.6458 LearningRate 0.0273 Epoch: 9 Global Step: 118740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:14,052-Speed 3068.11 samples/sec Loss 7.5665 LearningRate 0.0272 Epoch: 9 Global Step: 118750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:17,361-Speed 3095.70 samples/sec Loss 7.6641 LearningRate 0.0272 Epoch: 9 Global Step: 118760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:20,824-Speed 2957.66 samples/sec Loss 7.6128 LearningRate 0.0272 Epoch: 9 Global 
Step: 118770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:24,268-Speed 2974.11 samples/sec Loss 7.6219 LearningRate 0.0272 Epoch: 9 Global Step: 118780 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 12:57:27,712-Speed 2974.79 samples/sec Loss 7.6800 LearningRate 0.0272 Epoch: 9 Global Step: 118790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:31,128-Speed 2997.66 samples/sec Loss 7.5929 LearningRate 0.0272 Epoch: 9 Global Step: 118800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:57:34,508-Speed 3030.23 samples/sec Loss 7.6978 LearningRate 0.0272 Epoch: 9 Global Step: 118810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:37,888-Speed 3031.23 samples/sec Loss 7.6264 LearningRate 0.0272 Epoch: 9 Global Step: 118820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:41,208-Speed 3084.58 samples/sec Loss 7.5286 LearningRate 0.0272 Epoch: 9 Global Step: 118830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:44,639-Speed 2985.56 samples/sec Loss 7.5743 LearningRate 0.0272 Epoch: 9 Global Step: 118840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:48,019-Speed 3030.79 samples/sec Loss 7.5981 LearningRate 0.0272 Epoch: 9 Global Step: 118850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:51,396-Speed 3033.25 samples/sec Loss 7.5788 LearningRate 0.0272 Epoch: 9 Global Step: 118860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:54,794-Speed 3014.43 samples/sec Loss 7.6045 LearningRate 0.0272 Epoch: 9 Global Step: 118870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:57:58,150-Speed 3052.51 samples/sec Loss 7.5078 LearningRate 0.0272 Epoch: 9 Global Step: 118880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:01,515-Speed 3043.68 samples/sec Loss 7.6536 LearningRate 0.0272 Epoch: 9 Global Step: 118890 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 12:58:04,854-Speed 3067.83 samples/sec Loss 7.6591 LearningRate 0.0272 Epoch: 9 Global Step: 118900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:08,167-Speed 3091.60 samples/sec Loss 7.6314 LearningRate 0.0272 Epoch: 9 Global Step: 118910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:58:11,571-Speed 3009.39 samples/sec Loss 7.5522 LearningRate 0.0272 Epoch: 9 Global Step: 118920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:58:15,022-Speed 2967.35 samples/sec Loss 7.4822 LearningRate 0.0272 Epoch: 9 Global Step: 118930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:58:18,428-Speed 3007.85 samples/sec Loss 7.5454 LearningRate 0.0272 Epoch: 9 Global Step: 118940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:58:21,810-Speed 3028.53 samples/sec Loss 7.5817 LearningRate 0.0272 Epoch: 9 Global Step: 118950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:58:25,133-Speed 3082.09 samples/sec Loss 7.5560 LearningRate 0.0272 Epoch: 9 Global Step: 118960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 12:58:28,439-Speed 3098.66 samples/sec Loss 7.6358 LearningRate 0.0272 Epoch: 9 Global Step: 118970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:31,772-Speed 3073.04 samples/sec Loss 7.6478 LearningRate 0.0271 Epoch: 9 Global Step: 118980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:35,133-Speed 3047.76 samples/sec Loss 7.4202 LearningRate 0.0271 Epoch: 9 Global Step: 118990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:38,568-Speed 2981.68 samples/sec Loss 7.5984 LearningRate 0.0271 Epoch: 9 Global Step: 119000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:42,017-Speed 2970.32 samples/sec Loss 7.5262 LearningRate 0.0271 Epoch: 9 Global Step: 119010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
12:58:45,382-Speed 3043.19 samples/sec Loss 7.6818 LearningRate 0.0271 Epoch: 9 Global Step: 119020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:48,749-Speed 3042.88 samples/sec Loss 7.5507 LearningRate 0.0271 Epoch: 9 Global Step: 119030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:58:52,165-Speed 2997.81 samples/sec Loss 7.5947 LearningRate 0.0271 Epoch: 9 Global Step: 119040 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:58:55,624-Speed 2961.64 samples/sec Loss 7.5916 LearningRate 0.0271 Epoch: 9 Global Step: 119050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:58:59,000-Speed 3033.50 samples/sec Loss 7.6106 LearningRate 0.0271 Epoch: 9 Global Step: 119060 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:02,369-Speed 3040.64 samples/sec Loss 7.5770 LearningRate 0.0271 Epoch: 9 Global Step: 119070 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:05,726-Speed 3051.30 samples/sec Loss 7.6353 LearningRate 0.0271 Epoch: 9 Global Step: 119080 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:09,158-Speed 2984.28 samples/sec Loss 7.5589 LearningRate 0.0271 Epoch: 9 Global Step: 119090 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:12,511-Speed 3054.81 samples/sec Loss 7.6239 LearningRate 0.0271 Epoch: 9 Global Step: 119100 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:15,889-Speed 3032.59 samples/sec Loss 7.6145 LearningRate 0.0271 Epoch: 9 Global Step: 119110 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:19,271-Speed 3028.64 samples/sec Loss 7.5840 LearningRate 0.0271 Epoch: 9 Global Step: 119120 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:22,734-Speed 2957.97 samples/sec Loss 7.7567 LearningRate 0.0271 Epoch: 9 Global Step: 119130 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 12:59:26,073-Speed 3067.03 samples/sec Loss 
7.6430 LearningRate 0.0271 Epoch: 9 Global Step: 119140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:29,460-Speed 3024.68 samples/sec Loss 7.6020 LearningRate 0.0271 Epoch: 9 Global Step: 119150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:32,915-Speed 2964.67 samples/sec Loss 7.4362 LearningRate 0.0271 Epoch: 9 Global Step: 119160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:36,343-Speed 2988.11 samples/sec Loss 7.5708 LearningRate 0.0271 Epoch: 9 Global Step: 119170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:39,756-Speed 3000.96 samples/sec Loss 7.5104 LearningRate 0.0271 Epoch: 9 Global Step: 119180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:43,164-Speed 3005.76 samples/sec Loss 7.7098 LearningRate 0.0271 Epoch: 9 Global Step: 119190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:46,544-Speed 3030.37 samples/sec Loss 7.5240 LearningRate 0.0271 Epoch: 9 Global Step: 119200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:49,954-Speed 3004.21 samples/sec Loss 7.6534 LearningRate 0.0271 Epoch: 9 Global Step: 119210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:53,271-Speed 3087.45 samples/sec Loss 7.5094 LearningRate 0.0270 Epoch: 9 Global Step: 119220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 12:59:56,635-Speed 3044.73 samples/sec Loss 7.6250 LearningRate 0.0270 Epoch: 9 Global Step: 119230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:00,023-Speed 3023.29 samples/sec Loss 7.5294 LearningRate 0.0270 Epoch: 9 Global Step: 119240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:03,390-Speed 3042.25 samples/sec Loss 7.4600 LearningRate 0.0270 Epoch: 9 Global Step: 119250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:06,760-Speed 3040.58 samples/sec Loss 7.6760 LearningRate 0.0270 Epoch: 9 Global 
Step: 119260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:10,149-Speed 3022.50 samples/sec Loss 7.6479 LearningRate 0.0270 Epoch: 9 Global Step: 119270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:13,603-Speed 2965.02 samples/sec Loss 7.5456 LearningRate 0.0270 Epoch: 9 Global Step: 119280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:17,030-Speed 2989.45 samples/sec Loss 7.6312 LearningRate 0.0270 Epoch: 9 Global Step: 119290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:20,380-Speed 3057.78 samples/sec Loss 7.6820 LearningRate 0.0270 Epoch: 9 Global Step: 119300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:23,741-Speed 3046.90 samples/sec Loss 7.5219 LearningRate 0.0270 Epoch: 9 Global Step: 119310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:27,103-Speed 3047.22 samples/sec Loss 7.5402 LearningRate 0.0270 Epoch: 9 Global Step: 119320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:30,565-Speed 2958.88 samples/sec Loss 7.5652 LearningRate 0.0270 Epoch: 9 Global Step: 119330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:00:33,930-Speed 3043.54 samples/sec Loss 7.5639 LearningRate 0.0270 Epoch: 9 Global Step: 119340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:37,323-Speed 3019.22 samples/sec Loss 7.5793 LearningRate 0.0270 Epoch: 9 Global Step: 119350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:40,666-Speed 3063.89 samples/sec Loss 7.6652 LearningRate 0.0270 Epoch: 9 Global Step: 119360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:44,015-Speed 3058.00 samples/sec Loss 7.6793 LearningRate 0.0270 Epoch: 9 Global Step: 119370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:47,368-Speed 3054.96 samples/sec Loss 7.6506 LearningRate 0.0270 Epoch: 9 Global Step: 119380 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 13:00:50,778-Speed 3004.14 samples/sec Loss 7.6186 LearningRate 0.0270 Epoch: 9 Global Step: 119390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:54,156-Speed 3031.71 samples/sec Loss 7.5650 LearningRate 0.0270 Epoch: 9 Global Step: 119400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:00:57,519-Speed 3046.02 samples/sec Loss 7.6451 LearningRate 0.0270 Epoch: 9 Global Step: 119410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:00,907-Speed 3023.18 samples/sec Loss 7.5531 LearningRate 0.0270 Epoch: 9 Global Step: 119420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:04,258-Speed 3057.37 samples/sec Loss 7.5176 LearningRate 0.0270 Epoch: 9 Global Step: 119430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:07,627-Speed 3039.91 samples/sec Loss 7.4437 LearningRate 0.0270 Epoch: 9 Global Step: 119440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:11,035-Speed 3005.79 samples/sec Loss 7.5614 LearningRate 0.0270 Epoch: 9 Global Step: 119450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:14,447-Speed 3002.59 samples/sec Loss 7.4502 LearningRate 0.0269 Epoch: 9 Global Step: 119460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:17,763-Speed 3088.30 samples/sec Loss 7.5495 LearningRate 0.0269 Epoch: 9 Global Step: 119470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:21,180-Speed 2997.90 samples/sec Loss 7.6460 LearningRate 0.0269 Epoch: 9 Global Step: 119480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:24,564-Speed 3026.49 samples/sec Loss 7.5743 LearningRate 0.0269 Epoch: 9 Global Step: 119490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:27,933-Speed 3040.77 samples/sec Loss 7.5330 LearningRate 0.0269 Epoch: 9 Global Step: 119500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
13:01:31,312-Speed 3031.08 samples/sec Loss 7.4597 LearningRate 0.0269 Epoch: 9 Global Step: 119510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:34,771-Speed 2960.94 samples/sec Loss 7.5836 LearningRate 0.0269 Epoch: 9 Global Step: 119520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:38,126-Speed 3052.99 samples/sec Loss 7.4307 LearningRate 0.0269 Epoch: 9 Global Step: 119530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:41,484-Speed 3051.20 samples/sec Loss 7.5140 LearningRate 0.0269 Epoch: 9 Global Step: 119540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:01:44,834-Speed 3057.37 samples/sec Loss 7.6195 LearningRate 0.0269 Epoch: 9 Global Step: 119550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:48,238-Speed 3009.04 samples/sec Loss 7.6064 LearningRate 0.0269 Epoch: 9 Global Step: 119560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:51,691-Speed 2966.05 samples/sec Loss 7.5532 LearningRate 0.0269 Epoch: 9 Global Step: 119570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:55,121-Speed 2986.41 samples/sec Loss 7.6393 LearningRate 0.0269 Epoch: 9 Global Step: 119580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:01:58,541-Speed 2995.19 samples/sec Loss 7.6773 LearningRate 0.0269 Epoch: 9 Global Step: 119590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:01,949-Speed 3005.25 samples/sec Loss 7.5478 LearningRate 0.0269 Epoch: 9 Global Step: 119600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:05,276-Speed 3078.89 samples/sec Loss 7.5775 LearningRate 0.0269 Epoch: 9 Global Step: 119610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:08,613-Speed 3069.09 samples/sec Loss 7.5452 LearningRate 0.0269 Epoch: 9 Global Step: 119620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:11,945-Speed 3074.63 samples/sec Loss 
7.5533 LearningRate 0.0269 Epoch: 9 Global Step: 119630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:15,323-Speed 3032.60 samples/sec Loss 7.5817 LearningRate 0.0269 Epoch: 9 Global Step: 119640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:18,706-Speed 3027.86 samples/sec Loss 7.6127 LearningRate 0.0269 Epoch: 9 Global Step: 119650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:22,088-Speed 3028.70 samples/sec Loss 7.5665 LearningRate 0.0269 Epoch: 9 Global Step: 119660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:25,451-Speed 3046.27 samples/sec Loss 7.5634 LearningRate 0.0269 Epoch: 9 Global Step: 119670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:28,833-Speed 3028.36 samples/sec Loss 7.7119 LearningRate 0.0269 Epoch: 9 Global Step: 119680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:32,169-Speed 3070.56 samples/sec Loss 7.5836 LearningRate 0.0269 Epoch: 9 Global Step: 119690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:35,557-Speed 3023.27 samples/sec Loss 7.6129 LearningRate 0.0268 Epoch: 9 Global Step: 119700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:38,967-Speed 3003.22 samples/sec Loss 7.6266 LearningRate 0.0268 Epoch: 9 Global Step: 119710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:42,321-Speed 3054.59 samples/sec Loss 7.4703 LearningRate 0.0268 Epoch: 9 Global Step: 119720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:45,688-Speed 3041.79 samples/sec Loss 7.6874 LearningRate 0.0268 Epoch: 9 Global Step: 119730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:49,064-Speed 3034.18 samples/sec Loss 7.6840 LearningRate 0.0268 Epoch: 9 Global Step: 119740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:02:52,404-Speed 3066.38 samples/sec Loss 7.4761 LearningRate 0.0268 Epoch: 9 Global 
Step: 119750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:02:55,786-Speed 3029.06 samples/sec Loss 7.6535 LearningRate 0.0268 Epoch: 9 Global Step: 119760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:02:59,227-Speed 2976.90 samples/sec Loss 7.6884 LearningRate 0.0268 Epoch: 9 Global Step: 119770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:02,624-Speed 3015.24 samples/sec Loss 7.4887 LearningRate 0.0268 Epoch: 9 Global Step: 119780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:06,015-Speed 3020.36 samples/sec Loss 7.6104 LearningRate 0.0268 Epoch: 9 Global Step: 119790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:09,387-Speed 3037.46 samples/sec Loss 7.5930 LearningRate 0.0268 Epoch: 9 Global Step: 119800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:12,759-Speed 3037.70 samples/sec Loss 7.7115 LearningRate 0.0268 Epoch: 9 Global Step: 119810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:16,080-Speed 3084.05 samples/sec Loss 7.5885 LearningRate 0.0268 Epoch: 9 Global Step: 119820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:19,428-Speed 3059.70 samples/sec Loss 7.4473 LearningRate 0.0268 Epoch: 9 Global Step: 119830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:22,812-Speed 3026.90 samples/sec Loss 7.5861 LearningRate 0.0268 Epoch: 9 Global Step: 119840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:26,184-Speed 3037.26 samples/sec Loss 7.4608 LearningRate 0.0268 Epoch: 9 Global Step: 119850 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:03:29,534-Speed 3057.86 samples/sec Loss 7.5567 LearningRate 0.0268 Epoch: 9 Global Step: 119860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:32,927-Speed 3018.82 samples/sec Loss 7.5656 LearningRate 0.0268 Epoch: 9 Global Step: 119870 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-27 13:03:36,284-Speed 3051.49 samples/sec Loss 7.5748 LearningRate 0.0268 Epoch: 9 Global Step: 119880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:39,666-Speed 3028.12 samples/sec Loss 7.5294 LearningRate 0.0268 Epoch: 9 Global Step: 119890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:43,038-Speed 3037.84 samples/sec Loss 7.4209 LearningRate 0.0268 Epoch: 9 Global Step: 119900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:46,376-Speed 3068.58 samples/sec Loss 7.4964 LearningRate 0.0268 Epoch: 9 Global Step: 119910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:49,767-Speed 3020.42 samples/sec Loss 7.5258 LearningRate 0.0268 Epoch: 9 Global Step: 119920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:53,095-Speed 3078.35 samples/sec Loss 7.4511 LearningRate 0.0268 Epoch: 9 Global Step: 119930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:56,465-Speed 3039.31 samples/sec Loss 7.4799 LearningRate 0.0267 Epoch: 9 Global Step: 119940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:03:59,861-Speed 3015.75 samples/sec Loss 7.6491 LearningRate 0.0267 Epoch: 9 Global Step: 119950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:03,212-Speed 3057.24 samples/sec Loss 7.5803 LearningRate 0.0267 Epoch: 9 Global Step: 119960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:06,581-Speed 3039.69 samples/sec Loss 7.4944 LearningRate 0.0267 Epoch: 9 Global Step: 119970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:09,936-Speed 3053.16 samples/sec Loss 7.5834 LearningRate 0.0267 Epoch: 9 Global Step: 119980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:13,356-Speed 2995.01 samples/sec Loss 7.6002 LearningRate 0.0267 Epoch: 9 Global Step: 119990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
13:04:16,768-Speed 3002.18 samples/sec Loss 7.4197 LearningRate 0.0267 Epoch: 9 Global Step: 120000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:20,174-Speed 3007.07 samples/sec Loss 7.4269 LearningRate 0.0267 Epoch: 9 Global Step: 120010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:23,543-Speed 3040.33 samples/sec Loss 7.5622 LearningRate 0.0267 Epoch: 9 Global Step: 120020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:26,921-Speed 3032.40 samples/sec Loss 7.5197 LearningRate 0.0267 Epoch: 9 Global Step: 120030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:30,249-Speed 3078.22 samples/sec Loss 7.4830 LearningRate 0.0267 Epoch: 9 Global Step: 120040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:33,682-Speed 2983.16 samples/sec Loss 7.6461 LearningRate 0.0267 Epoch: 9 Global Step: 120050 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:04:37,038-Speed 3051.71 samples/sec Loss 7.5885 LearningRate 0.0267 Epoch: 9 Global Step: 120060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:04:40,445-Speed 3007.06 samples/sec Loss 7.4667 LearningRate 0.0267 Epoch: 9 Global Step: 120070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:04:43,817-Speed 3037.72 samples/sec Loss 7.4490 LearningRate 0.0267 Epoch: 9 Global Step: 120080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:47,188-Speed 3038.02 samples/sec Loss 7.4519 LearningRate 0.0267 Epoch: 9 Global Step: 120090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:50,578-Speed 3021.50 samples/sec Loss 7.5686 LearningRate 0.0267 Epoch: 9 Global Step: 120100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:53,977-Speed 3014.08 samples/sec Loss 7.5802 LearningRate 0.0267 Epoch: 9 Global Step: 120110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:04:57,404-Speed 2988.86 samples/sec Loss 
7.5683 LearningRate 0.0267 Epoch: 9 Global Step: 120120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:00,749-Speed 3062.31 samples/sec Loss 7.4613 LearningRate 0.0267 Epoch: 9 Global Step: 120130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:04,192-Speed 2974.30 samples/sec Loss 7.6408 LearningRate 0.0267 Epoch: 9 Global Step: 120140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:07,539-Speed 3060.89 samples/sec Loss 7.4451 LearningRate 0.0267 Epoch: 9 Global Step: 120150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:10,898-Speed 3049.12 samples/sec Loss 7.7112 LearningRate 0.0267 Epoch: 9 Global Step: 120160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:14,209-Speed 3093.52 samples/sec Loss 7.5948 LearningRate 0.0267 Epoch: 9 Global Step: 120170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:17,574-Speed 3043.73 samples/sec Loss 7.5236 LearningRate 0.0266 Epoch: 9 Global Step: 120180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:20,994-Speed 2995.63 samples/sec Loss 7.3801 LearningRate 0.0266 Epoch: 9 Global Step: 120190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:24,383-Speed 3021.62 samples/sec Loss 7.5062 LearningRate 0.0266 Epoch: 9 Global Step: 120200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:27,736-Speed 3055.08 samples/sec Loss 7.4839 LearningRate 0.0266 Epoch: 9 Global Step: 120210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:31,091-Speed 3053.35 samples/sec Loss 7.4254 LearningRate 0.0266 Epoch: 9 Global Step: 120220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:34,464-Speed 3036.85 samples/sec Loss 7.5951 LearningRate 0.0266 Epoch: 9 Global Step: 120230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:37,827-Speed 3045.42 samples/sec Loss 7.6911 LearningRate 0.0266 Epoch: 9 Global 
Step: 120240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:41,270-Speed 2975.19 samples/sec Loss 7.4662 LearningRate 0.0266 Epoch: 9 Global Step: 120250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:05:44,636-Speed 3042.88 samples/sec Loss 7.5039 LearningRate 0.0266 Epoch: 9 Global Step: 120260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:48,041-Speed 3008.31 samples/sec Loss 7.5131 LearningRate 0.0266 Epoch: 9 Global Step: 120270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:51,489-Speed 2970.84 samples/sec Loss 7.5580 LearningRate 0.0266 Epoch: 9 Global Step: 120280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:54,837-Speed 3058.39 samples/sec Loss 7.4277 LearningRate 0.0266 Epoch: 9 Global Step: 120290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:05:58,219-Speed 3028.82 samples/sec Loss 7.5269 LearningRate 0.0266 Epoch: 9 Global Step: 120300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:06:01,663-Speed 2974.81 samples/sec Loss 7.4975 LearningRate 0.0266 Epoch: 9 Global Step: 120310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:06:04,967-Speed 3099.06 samples/sec Loss 7.4333 LearningRate 0.0266 Epoch: 9 Global Step: 120320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:06:08,340-Speed 3037.39 samples/sec Loss 7.5262 LearningRate 0.0266 Epoch: 9 Global Step: 120330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:06:11,769-Speed 2986.79 samples/sec Loss 7.4192 LearningRate 0.0266 Epoch: 9 Global Step: 120340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:06:15,159-Speed 3021.21 samples/sec Loss 7.4288 LearningRate 0.0266 Epoch: 9 Global Step: 120350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:06:18,597-Speed 2979.86 samples/sec Loss 7.5796 LearningRate 0.0266 Epoch: 9 Global Step: 120360 Fp16 Grad Scale: 65536 
Required: 12 hours Training: 2022-04-27 13:06:21,939-Speed 3064.41 samples/sec Loss 7.5737 LearningRate 0.0266 Epoch: 9 Global Step: 120370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:25,257-Speed 3086.97 samples/sec Loss 7.4685 LearningRate 0.0266 Epoch: 9 Global Step: 120380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:28,649-Speed 3020.08 samples/sec Loss 7.4935 LearningRate 0.0266 Epoch: 9 Global Step: 120390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:32,038-Speed 3021.93 samples/sec Loss 7.5123 LearningRate 0.0266 Epoch: 9 Global Step: 120400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:35,391-Speed 3054.96 samples/sec Loss 7.4976 LearningRate 0.0266 Epoch: 9 Global Step: 120410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:38,774-Speed 3027.88 samples/sec Loss 7.5744 LearningRate 0.0265 Epoch: 9 Global Step: 120420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:42,172-Speed 3014.24 samples/sec Loss 7.5634 LearningRate 0.0265 Epoch: 9 Global Step: 120430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:45,595-Speed 2992.61 samples/sec Loss 7.5963 LearningRate 0.0265 Epoch: 9 Global Step: 120440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:49,030-Speed 2981.52 samples/sec Loss 7.5079 LearningRate 0.0265 Epoch: 9 Global Step: 120450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:06:52,493-Speed 2958.05 samples/sec Loss 7.5071 LearningRate 0.0265 Epoch: 9 Global Step: 120460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:06:55,846-Speed 3055.31 samples/sec Loss 7.5291 LearningRate 0.0265 Epoch: 9 Global Step: 120470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:06:59,245-Speed 3013.70 samples/sec Loss 7.5113 LearningRate 0.0265 Epoch: 9 Global Step: 120480 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 
13:07:02,623-Speed 3032.05 samples/sec Loss 7.5392 LearningRate 0.0265 Epoch: 9 Global Step: 120490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:07:06,076-Speed 2966.38 samples/sec Loss 7.5819 LearningRate 0.0265 Epoch: 9 Global Step: 120500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:07:09,468-Speed 3019.38 samples/sec Loss 7.5787 LearningRate 0.0265 Epoch: 9 Global Step: 120510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:12,912-Speed 2974.53 samples/sec Loss 7.5808 LearningRate 0.0265 Epoch: 9 Global Step: 120520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:16,267-Speed 3052.99 samples/sec Loss 7.6522 LearningRate 0.0265 Epoch: 9 Global Step: 120530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:19,661-Speed 3018.38 samples/sec Loss 7.4786 LearningRate 0.0265 Epoch: 9 Global Step: 120540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:23,034-Speed 3036.17 samples/sec Loss 7.4607 LearningRate 0.0265 Epoch: 9 Global Step: 120550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:26,389-Speed 3053.38 samples/sec Loss 7.5221 LearningRate 0.0265 Epoch: 9 Global Step: 120560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:29,765-Speed 3034.24 samples/sec Loss 7.3870 LearningRate 0.0265 Epoch: 9 Global Step: 120570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:33,124-Speed 3049.18 samples/sec Loss 7.4908 LearningRate 0.0265 Epoch: 9 Global Step: 120580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:36,555-Speed 2984.66 samples/sec Loss 7.4504 LearningRate 0.0265 Epoch: 9 Global Step: 120590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:39,947-Speed 3019.88 samples/sec Loss 7.4802 LearningRate 0.0265 Epoch: 9 Global Step: 120600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:07:43,349-Speed 3011.29 samples/sec Loss 
7.5588 LearningRate 0.0265 Epoch: 9 Global Step: 120610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:07:46,872-Speed 2907.13 samples/sec Loss 7.3641 LearningRate 0.0265 Epoch: 9 Global Step: 120620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:07:50,205-Speed 3072.69 samples/sec Loss 7.4879 LearningRate 0.0265 Epoch: 9 Global Step: 120630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:07:53,553-Speed 3059.55 samples/sec Loss 7.6514 LearningRate 0.0265 Epoch: 9 Global Step: 120640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:07:56,927-Speed 3035.69 samples/sec Loss 7.5151 LearningRate 0.0265 Epoch: 9 Global Step: 120650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:00,338-Speed 3003.24 samples/sec Loss 7.4638 LearningRate 0.0264 Epoch: 9 Global Step: 120660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:03,707-Speed 3040.17 samples/sec Loss 7.4871 LearningRate 0.0264 Epoch: 9 Global Step: 120670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:07,098-Speed 3019.83 samples/sec Loss 7.3654 LearningRate 0.0264 Epoch: 9 Global Step: 120680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:10,466-Speed 3042.01 samples/sec Loss 7.4451 LearningRate 0.0264 Epoch: 9 Global Step: 120690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:13,843-Speed 3033.25 samples/sec Loss 7.5241 LearningRate 0.0264 Epoch: 9 Global Step: 120700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:17,222-Speed 3030.48 samples/sec Loss 7.4685 LearningRate 0.0264 Epoch: 9 Global Step: 120710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:20,576-Speed 3054.74 samples/sec Loss 7.5651 LearningRate 0.0264 Epoch: 9 Global Step: 120720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:23,910-Speed 3071.46 samples/sec Loss 7.5038 LearningRate 0.0264 Epoch: 9 Global 
Step: 120730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:27,400-Speed 2935.15 samples/sec Loss 7.5174 LearningRate 0.0264 Epoch: 9 Global Step: 120740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:30,821-Speed 2993.41 samples/sec Loss 7.5125 LearningRate 0.0264 Epoch: 9 Global Step: 120750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:08:34,206-Speed 3026.21 samples/sec Loss 7.4496 LearningRate 0.0264 Epoch: 9 Global Step: 120760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:08:37,532-Speed 3079.93 samples/sec Loss 7.4812 LearningRate 0.0264 Epoch: 9 Global Step: 120770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:08:40,885-Speed 3054.59 samples/sec Loss 7.4406 LearningRate 0.0264 Epoch: 9 Global Step: 120780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:08:44,213-Speed 3078.11 samples/sec Loss 7.6511 LearningRate 0.0264 Epoch: 9 Global Step: 120790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:08:47,554-Speed 3065.64 samples/sec Loss 7.4651 LearningRate 0.0264 Epoch: 9 Global Step: 120800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:50,993-Speed 2978.47 samples/sec Loss 7.4636 LearningRate 0.0264 Epoch: 9 Global Step: 120810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:54,391-Speed 3014.30 samples/sec Loss 7.5041 LearningRate 0.0264 Epoch: 9 Global Step: 120820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:08:57,748-Speed 3051.41 samples/sec Loss 7.5855 LearningRate 0.0264 Epoch: 9 Global Step: 120830 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:09:01,149-Speed 3011.78 samples/sec Loss 7.5056 LearningRate 0.0264 Epoch: 9 Global Step: 120840 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:09:04,517-Speed 3041.12 samples/sec Loss 7.5257 LearningRate 0.0264 Epoch: 9 Global Step: 120850 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 13:09:07,933-Speed 2998.82 samples/sec Loss 7.4678 LearningRate 0.0264 Epoch: 9 Global Step: 120860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:09:11,257-Speed 3081.03 samples/sec Loss 7.4252 LearningRate 0.0264 Epoch: 9 Global Step: 120870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:09:14,639-Speed 3028.81 samples/sec Loss 7.5718 LearningRate 0.0264 Epoch: 9 Global Step: 120880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:09:17,955-Speed 3089.45 samples/sec Loss 7.5012 LearningRate 0.0264 Epoch: 9 Global Step: 120890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:09:21,260-Speed 3099.11 samples/sec Loss 7.5694 LearningRate 0.0264 Epoch: 9 Global Step: 120900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:24,629-Speed 3040.32 samples/sec Loss 7.3880 LearningRate 0.0263 Epoch: 9 Global Step: 120910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:28,069-Speed 2977.12 samples/sec Loss 7.4762 LearningRate 0.0263 Epoch: 9 Global Step: 120920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:31,475-Speed 3007.87 samples/sec Loss 7.5407 LearningRate 0.0263 Epoch: 9 Global Step: 120930 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:34,852-Speed 3033.26 samples/sec Loss 7.5118 LearningRate 0.0263 Epoch: 9 Global Step: 120940 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:38,330-Speed 2944.69 samples/sec Loss 7.4993 LearningRate 0.0263 Epoch: 9 Global Step: 120950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:41,754-Speed 2991.60 samples/sec Loss 7.4838 LearningRate 0.0263 Epoch: 9 Global Step: 120960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:45,134-Speed 3030.51 samples/sec Loss 7.4238 LearningRate 0.0263 Epoch: 9 Global Step: 120970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
13:09:48,502-Speed 3041.37 samples/sec Loss 7.4448 LearningRate 0.0263 Epoch: 9 Global Step: 120980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:51,880-Speed 3032.22 samples/sec Loss 7.5185 LearningRate 0.0263 Epoch: 9 Global Step: 120990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:09:55,294-Speed 3000.26 samples/sec Loss 7.5407 LearningRate 0.0263 Epoch: 9 Global Step: 121000 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:09:58,641-Speed 3060.23 samples/sec Loss 7.4593 LearningRate 0.0263 Epoch: 9 Global Step: 121010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:02,012-Speed 3038.34 samples/sec Loss 7.4822 LearningRate 0.0263 Epoch: 9 Global Step: 121020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:05,364-Speed 3056.53 samples/sec Loss 7.4325 LearningRate 0.0263 Epoch: 9 Global Step: 121030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:08,772-Speed 3005.25 samples/sec Loss 7.5354 LearningRate 0.0263 Epoch: 9 Global Step: 121040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:12,119-Speed 3060.00 samples/sec Loss 7.4410 LearningRate 0.0263 Epoch: 9 Global Step: 121050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:15,470-Speed 3056.67 samples/sec Loss 7.5258 LearningRate 0.0263 Epoch: 9 Global Step: 121060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:18,814-Speed 3062.99 samples/sec Loss 7.5896 LearningRate 0.0263 Epoch: 9 Global Step: 121070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:22,241-Speed 2989.42 samples/sec Loss 7.5575 LearningRate 0.0263 Epoch: 9 Global Step: 121080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:25,550-Speed 3095.65 samples/sec Loss 7.5060 LearningRate 0.0263 Epoch: 9 Global Step: 121090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:28,954-Speed 3008.60 samples/sec 
Loss 7.4529 LearningRate 0.0263 Epoch: 9 Global Step: 121100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:10:32,399-Speed 2973.63 samples/sec Loss 7.5414 LearningRate 0.0263 Epoch: 9 Global Step: 121110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:35,785-Speed 3024.48 samples/sec Loss 7.5746 LearningRate 0.0263 Epoch: 9 Global Step: 121120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:39,150-Speed 3043.72 samples/sec Loss 7.3921 LearningRate 0.0263 Epoch: 9 Global Step: 121130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:42,495-Speed 3062.37 samples/sec Loss 7.4266 LearningRate 0.0263 Epoch: 9 Global Step: 121140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:45,898-Speed 3009.80 samples/sec Loss 7.5072 LearningRate 0.0262 Epoch: 9 Global Step: 121150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:49,214-Speed 3088.83 samples/sec Loss 7.4269 LearningRate 0.0262 Epoch: 9 Global Step: 121160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:52,620-Speed 3008.13 samples/sec Loss 7.4368 LearningRate 0.0262 Epoch: 9 Global Step: 121170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:55,973-Speed 3054.60 samples/sec Loss 7.4867 LearningRate 0.0262 Epoch: 9 Global Step: 121180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:10:59,358-Speed 3026.12 samples/sec Loss 7.4305 LearningRate 0.0262 Epoch: 9 Global Step: 121190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:11:02,761-Speed 3009.41 samples/sec Loss 7.5175 LearningRate 0.0262 Epoch: 9 Global Step: 121200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:11:06,165-Speed 3009.12 samples/sec Loss 7.4522 LearningRate 0.0262 Epoch: 9 Global Step: 121210 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:11:09,535-Speed 3039.60 samples/sec Loss 7.5364 LearningRate 0.0262 Epoch: 9 
Global Step: 121220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:11:13,557-Speed 2546.70 samples/sec Loss 7.3703 LearningRate 0.0262 Epoch: 9 Global Step: 121230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:11:16,927-Speed 3039.87 samples/sec Loss 7.3743 LearningRate 0.0262 Epoch: 9 Global Step: 121240 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:11:20,352-Speed 2990.95 samples/sec Loss 7.5928 LearningRate 0.0262 Epoch: 9 Global Step: 121250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:11:24,391-Speed 2535.23 samples/sec Loss 7.5997 LearningRate 0.0262 Epoch: 9 Global Step: 121260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:27,826-Speed 2982.48 samples/sec Loss 7.4407 LearningRate 0.0262 Epoch: 9 Global Step: 121270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:32,349-Speed 2264.49 samples/sec Loss 7.5550 LearningRate 0.0262 Epoch: 9 Global Step: 121280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:35,797-Speed 2970.29 samples/sec Loss 7.4438 LearningRate 0.0262 Epoch: 9 Global Step: 121290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:39,234-Speed 2980.37 samples/sec Loss 7.5111 LearningRate 0.0262 Epoch: 9 Global Step: 121300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:42,619-Speed 3025.57 samples/sec Loss 7.5153 LearningRate 0.0262 Epoch: 9 Global Step: 121310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:46,037-Speed 2996.73 samples/sec Loss 7.4734 LearningRate 0.0262 Epoch: 9 Global Step: 121320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:49,439-Speed 3010.78 samples/sec Loss 7.4208 LearningRate 0.0262 Epoch: 9 Global Step: 121330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:52,808-Speed 3040.33 samples/sec Loss 7.6382 LearningRate 0.0262 Epoch: 9 Global Step: 121340 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 13:11:56,194-Speed 3025.53 samples/sec Loss 7.3132 LearningRate 0.0262 Epoch: 9 Global Step: 121350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:11:59,608-Speed 2999.87 samples/sec Loss 7.5179 LearningRate 0.0262 Epoch: 9 Global Step: 121360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:02,939-Speed 3075.46 samples/sec Loss 7.5468 LearningRate 0.0262 Epoch: 9 Global Step: 121370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:06,276-Speed 3068.96 samples/sec Loss 7.5599 LearningRate 0.0262 Epoch: 9 Global Step: 121380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:09,657-Speed 3029.92 samples/sec Loss 7.4458 LearningRate 0.0261 Epoch: 9 Global Step: 121390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:13,004-Speed 3059.90 samples/sec Loss 7.5253 LearningRate 0.0261 Epoch: 9 Global Step: 121400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:16,348-Speed 3062.57 samples/sec Loss 7.5246 LearningRate 0.0261 Epoch: 9 Global Step: 121410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:19,845-Speed 2929.81 samples/sec Loss 7.4687 LearningRate 0.0261 Epoch: 9 Global Step: 121420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:23,217-Speed 3037.73 samples/sec Loss 7.4656 LearningRate 0.0261 Epoch: 9 Global Step: 121430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:26,587-Speed 3038.72 samples/sec Loss 7.4055 LearningRate 0.0261 Epoch: 9 Global Step: 121440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:30,032-Speed 2973.25 samples/sec Loss 7.5834 LearningRate 0.0261 Epoch: 9 Global Step: 121450 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:33,404-Speed 3037.86 samples/sec Loss 7.4779 LearningRate 0.0261 Epoch: 9 Global Step: 121460 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 
13:12:36,780-Speed 3033.97 samples/sec Loss 7.4383 LearningRate 0.0261 Epoch: 9 Global Step: 121470 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:12:40,153-Speed 3037.08 samples/sec Loss 7.3726 LearningRate 0.0261 Epoch: 9 Global Step: 121480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:43,504-Speed 3056.05 samples/sec Loss 7.4617 LearningRate 0.0261 Epoch: 9 Global Step: 121490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:46,829-Speed 3080.90 samples/sec Loss 7.4557 LearningRate 0.0261 Epoch: 9 Global Step: 121500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:50,249-Speed 2995.14 samples/sec Loss 7.3604 LearningRate 0.0261 Epoch: 9 Global Step: 121510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:53,664-Speed 2999.50 samples/sec Loss 7.4350 LearningRate 0.0261 Epoch: 9 Global Step: 121520 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:12:57,117-Speed 2965.96 samples/sec Loss 7.4075 LearningRate 0.0261 Epoch: 9 Global Step: 121530 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:00,539-Speed 2994.15 samples/sec Loss 7.4147 LearningRate 0.0261 Epoch: 9 Global Step: 121540 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:03,933-Speed 3017.66 samples/sec Loss 7.4819 LearningRate 0.0261 Epoch: 9 Global Step: 121550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:07,384-Speed 2968.19 samples/sec Loss 7.5615 LearningRate 0.0261 Epoch: 9 Global Step: 121560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:10,799-Speed 2998.98 samples/sec Loss 7.4739 LearningRate 0.0261 Epoch: 9 Global Step: 121570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:14,200-Speed 3011.63 samples/sec Loss 7.4187 LearningRate 0.0261 Epoch: 9 Global Step: 121580 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:13:17,545-Speed 3063.16 samples/sec 
Loss 7.5210 LearningRate 0.0261 Epoch: 9 Global Step: 121590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:20,969-Speed 2992.31 samples/sec Loss 7.3439 LearningRate 0.0261 Epoch: 9 Global Step: 121600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:24,318-Speed 3058.17 samples/sec Loss 7.3858 LearningRate 0.0261 Epoch: 9 Global Step: 121610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:13:28,302-Speed 2571.31 samples/sec Loss 7.3497 LearningRate 0.0261 Epoch: 9 Global Step: 121620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:32,255-Speed 2590.84 samples/sec Loss 7.4215 LearningRate 0.0260 Epoch: 9 Global Step: 121630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:35,704-Speed 2970.76 samples/sec Loss 7.4929 LearningRate 0.0260 Epoch: 9 Global Step: 121640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:39,140-Speed 2980.57 samples/sec Loss 7.4510 LearningRate 0.0260 Epoch: 9 Global Step: 121650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:42,527-Speed 3023.93 samples/sec Loss 7.4196 LearningRate 0.0260 Epoch: 9 Global Step: 121660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:45,944-Speed 2997.79 samples/sec Loss 7.4032 LearningRate 0.0260 Epoch: 9 Global Step: 121670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:49,362-Speed 2996.99 samples/sec Loss 7.3949 LearningRate 0.0260 Epoch: 9 Global Step: 121680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:52,765-Speed 3010.11 samples/sec Loss 7.4333 LearningRate 0.0260 Epoch: 9 Global Step: 121690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:56,119-Speed 3054.01 samples/sec Loss 7.4038 LearningRate 0.0260 Epoch: 9 Global Step: 121700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:13:59,497-Speed 3032.06 samples/sec Loss 7.4045 LearningRate 0.0260 Epoch: 9 
Global Step: 121710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:14:02,892-Speed 3017.43 samples/sec Loss 7.2876 LearningRate 0.0260 Epoch: 9 Global Step: 121720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:06,238-Speed 3060.85 samples/sec Loss 7.2997 LearningRate 0.0260 Epoch: 9 Global Step: 121730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:09,694-Speed 2963.97 samples/sec Loss 7.4372 LearningRate 0.0260 Epoch: 9 Global Step: 121740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:13,053-Speed 3049.19 samples/sec Loss 7.4483 LearningRate 0.0260 Epoch: 9 Global Step: 121750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:16,474-Speed 2994.90 samples/sec Loss 7.4274 LearningRate 0.0260 Epoch: 9 Global Step: 121760 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:19,979-Speed 2922.08 samples/sec Loss 7.5073 LearningRate 0.0260 Epoch: 9 Global Step: 121770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:23,331-Speed 3055.26 samples/sec Loss 7.4457 LearningRate 0.0260 Epoch: 9 Global Step: 121780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:26,736-Speed 3008.79 samples/sec Loss 7.4341 LearningRate 0.0260 Epoch: 9 Global Step: 121790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:30,167-Speed 2984.72 samples/sec Loss 7.3115 LearningRate 0.0260 Epoch: 9 Global Step: 121800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:33,550-Speed 3028.37 samples/sec Loss 7.4846 LearningRate 0.0260 Epoch: 9 Global Step: 121810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:36,988-Speed 2979.00 samples/sec Loss 7.5209 LearningRate 0.0260 Epoch: 9 Global Step: 121820 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:14:40,401-Speed 3001.55 samples/sec Loss 7.4679 LearningRate 0.0260 Epoch: 9 Global Step: 121830 Fp16 Grad Scale: 
131072 Required: 12 hours Training: 2022-04-27 13:14:43,855-Speed 2965.32 samples/sec Loss 7.3676 LearningRate 0.0260 Epoch: 9 Global Step: 121840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:14:47,208-Speed 3054.88 samples/sec Loss 7.4375 LearningRate 0.0260 Epoch: 9 Global Step: 121850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:14:50,564-Speed 3051.93 samples/sec Loss 7.3536 LearningRate 0.0260 Epoch: 9 Global Step: 121860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:14:53,897-Speed 3073.70 samples/sec Loss 7.4171 LearningRate 0.0260 Epoch: 9 Global Step: 121870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:14:57,230-Speed 3072.76 samples/sec Loss 7.3700 LearningRate 0.0259 Epoch: 9 Global Step: 121880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:00,597-Speed 3042.40 samples/sec Loss 7.3495 LearningRate 0.0259 Epoch: 9 Global Step: 121890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:03,980-Speed 3027.50 samples/sec Loss 7.3664 LearningRate 0.0259 Epoch: 9 Global Step: 121900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:07,337-Speed 3051.79 samples/sec Loss 7.4877 LearningRate 0.0259 Epoch: 9 Global Step: 121910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:10,766-Speed 2986.58 samples/sec Loss 7.3548 LearningRate 0.0259 Epoch: 9 Global Step: 121920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:14,152-Speed 3025.17 samples/sec Loss 7.4411 LearningRate 0.0259 Epoch: 9 Global Step: 121930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:17,500-Speed 3059.36 samples/sec Loss 7.4487 LearningRate 0.0259 Epoch: 9 Global Step: 121940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:15:20,909-Speed 3005.44 samples/sec Loss 7.5398 LearningRate 0.0259 Epoch: 9 Global Step: 121950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 
2022-04-27 13:15:24,256-Speed 3060.08 samples/sec Loss 7.3629 LearningRate 0.0259 Epoch: 9 Global Step: 121960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:27,614-Speed 3050.79 samples/sec Loss 7.3696 LearningRate 0.0259 Epoch: 9 Global Step: 121970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:30,940-Speed 3079.48 samples/sec Loss 7.4552 LearningRate 0.0259 Epoch: 9 Global Step: 121980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:34,445-Speed 2922.55 samples/sec Loss 7.5789 LearningRate 0.0259 Epoch: 9 Global Step: 121990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:37,898-Speed 2966.09 samples/sec Loss 7.4263 LearningRate 0.0259 Epoch: 9 Global Step: 122000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:41,314-Speed 2998.90 samples/sec Loss 7.5680 LearningRate 0.0259 Epoch: 9 Global Step: 122010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:44,701-Speed 3023.55 samples/sec Loss 7.4563 LearningRate 0.0259 Epoch: 9 Global Step: 122020 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:48,139-Speed 2979.05 samples/sec Loss 7.3807 LearningRate 0.0259 Epoch: 9 Global Step: 122030 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:51,584-Speed 2973.56 samples/sec Loss 7.4159 LearningRate 0.0259 Epoch: 9 Global Step: 122040 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:15:54,911-Speed 3078.65 samples/sec Loss 7.3866 LearningRate 0.0259 Epoch: 9 Global Step: 122050 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:15:58,253-Speed 3067.17 samples/sec Loss 7.3736 LearningRate 0.0259 Epoch: 9 Global Step: 122060 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:16:01,607-Speed 3054.27 samples/sec Loss 7.3361 LearningRate 0.0259 Epoch: 9 Global Step: 122070 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:16:05,015-Speed 3005.87 
samples/sec Loss 7.5054 LearningRate 0.0259 Epoch: 9 Global Step: 122080 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:16:08,444-Speed 2987.16 samples/sec Loss 7.3960 LearningRate 0.0259 Epoch: 9 Global Step: 122090 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:16:11,811-Speed 3042.13 samples/sec Loss 7.4745 LearningRate 0.0259 Epoch: 9 Global Step: 122100 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:16:15,151-Speed 3067.32 samples/sec Loss 7.4531 LearningRate 0.0259 Epoch: 9 Global Step: 122110 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:16:18,471-Speed 3085.28 samples/sec Loss 7.4288 LearningRate 0.0258 Epoch: 9 Global Step: 122120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:16:21,831-Speed 3048.05 samples/sec Loss 7.3546 LearningRate 0.0258 Epoch: 9 Global Step: 122130 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:16:25,207-Speed 3034.26 samples/sec Loss 7.4562 LearningRate 0.0258 Epoch: 9 Global Step: 122140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:16:28,643-Speed 2980.77 samples/sec Loss 7.3256 LearningRate 0.0258 Epoch: 9 Global Step: 122150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:32,057-Speed 3000.39 samples/sec Loss 7.4595 LearningRate 0.0258 Epoch: 9 Global Step: 122160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:35,410-Speed 3055.72 samples/sec Loss 7.4002 LearningRate 0.0258 Epoch: 9 Global Step: 122170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:38,862-Speed 2966.68 samples/sec Loss 7.3448 LearningRate 0.0258 Epoch: 9 Global Step: 122180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:42,289-Speed 2988.67 samples/sec Loss 7.4952 LearningRate 0.0258 Epoch: 9 Global Step: 122190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:45,676-Speed 3023.94 samples/sec Loss 7.3411 LearningRate 
0.0258 Epoch: 9 Global Step: 122200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:49,079-Speed 3010.10 samples/sec Loss 7.3972 LearningRate 0.0258 Epoch: 9 Global Step: 122210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:52,471-Speed 3020.25 samples/sec Loss 7.4438 LearningRate 0.0258 Epoch: 9 Global Step: 122220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:55,795-Speed 3080.86 samples/sec Loss 7.3761 LearningRate 0.0258 Epoch: 9 Global Step: 122230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:16:59,151-Speed 3052.16 samples/sec Loss 7.4000 LearningRate 0.0258 Epoch: 9 Global Step: 122240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:02,572-Speed 2994.88 samples/sec Loss 7.4318 LearningRate 0.0258 Epoch: 9 Global Step: 122250 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:17:06,115-Speed 2890.43 samples/sec Loss 7.3413 LearningRate 0.0258 Epoch: 9 Global Step: 122260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:17:09,439-Speed 3081.61 samples/sec Loss 7.3860 LearningRate 0.0258 Epoch: 9 Global Step: 122270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:12,775-Speed 3070.31 samples/sec Loss 7.4650 LearningRate 0.0258 Epoch: 9 Global Step: 122280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:16,177-Speed 3010.94 samples/sec Loss 7.3578 LearningRate 0.0258 Epoch: 9 Global Step: 122290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:19,502-Speed 3080.54 samples/sec Loss 7.4428 LearningRate 0.0258 Epoch: 9 Global Step: 122300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:22,826-Speed 3081.08 samples/sec Loss 7.3219 LearningRate 0.0258 Epoch: 9 Global Step: 122310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:26,181-Speed 3052.95 samples/sec Loss 7.4929 LearningRate 0.0258 Epoch: 9 Global Step: 122320 Fp16 
Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:29,603-Speed 2993.49 samples/sec Loss 7.3116 LearningRate 0.0258 Epoch: 9 Global Step: 122330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:33,035-Speed 2984.95 samples/sec Loss 7.3509 LearningRate 0.0258 Epoch: 9 Global Step: 122340 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:36,397-Speed 3045.81 samples/sec Loss 7.4763 LearningRate 0.0258 Epoch: 9 Global Step: 122350 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:39,866-Speed 2952.99 samples/sec Loss 7.5584 LearningRate 0.0258 Epoch: 9 Global Step: 122360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:43,331-Speed 2955.65 samples/sec Loss 7.4081 LearningRate 0.0257 Epoch: 9 Global Step: 122370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:46,691-Speed 3048.28 samples/sec Loss 7.4557 LearningRate 0.0257 Epoch: 9 Global Step: 122380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:50,075-Speed 3027.26 samples/sec Loss 7.3921 LearningRate 0.0257 Epoch: 9 Global Step: 122390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:53,404-Speed 3076.45 samples/sec Loss 7.4035 LearningRate 0.0257 Epoch: 9 Global Step: 122400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:17:56,824-Speed 2995.45 samples/sec Loss 7.3856 LearningRate 0.0257 Epoch: 9 Global Step: 122410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:00,197-Speed 3036.29 samples/sec Loss 7.3456 LearningRate 0.0257 Epoch: 9 Global Step: 122420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:03,561-Speed 3045.35 samples/sec Loss 7.3120 LearningRate 0.0257 Epoch: 9 Global Step: 122430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:06,915-Speed 3053.57 samples/sec Loss 7.2978 LearningRate 0.0257 Epoch: 9 Global Step: 122440 Fp16 Grad Scale: 32768 Required: 12 hours 
Training: 2022-04-27 13:18:10,248-Speed 3073.39 samples/sec Loss 7.4067 LearningRate 0.0257 Epoch: 9 Global Step: 122450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:13,704-Speed 2963.61 samples/sec Loss 7.3342 LearningRate 0.0257 Epoch: 9 Global Step: 122460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:17,158-Speed 2965.61 samples/sec Loss 7.3176 LearningRate 0.0257 Epoch: 9 Global Step: 122470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:18:20,547-Speed 3021.84 samples/sec Loss 7.4570 LearningRate 0.0257 Epoch: 9 Global Step: 122480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:18:24,003-Speed 2964.22 samples/sec Loss 7.3492 LearningRate 0.0257 Epoch: 9 Global Step: 122490 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:18:27,354-Speed 3056.44 samples/sec Loss 7.3873 LearningRate 0.0257 Epoch: 9 Global Step: 122500 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:18:30,701-Speed 3060.03 samples/sec Loss 7.4861 LearningRate 0.0257 Epoch: 9 Global Step: 122510 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:18:34,041-Speed 3067.24 samples/sec Loss 7.3383 LearningRate 0.0257 Epoch: 9 Global Step: 122520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:37,385-Speed 3062.64 samples/sec Loss 7.4270 LearningRate 0.0257 Epoch: 9 Global Step: 122530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:40,749-Speed 3044.74 samples/sec Loss 7.3330 LearningRate 0.0257 Epoch: 9 Global Step: 122540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:44,213-Speed 2957.36 samples/sec Loss 7.3430 LearningRate 0.0257 Epoch: 9 Global Step: 122550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:47,632-Speed 2996.12 samples/sec Loss 7.3961 LearningRate 0.0257 Epoch: 9 Global Step: 122560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:51,046-Speed 
3000.71 samples/sec Loss 7.4701 LearningRate 0.0257 Epoch: 9 Global Step: 122570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:54,505-Speed 2960.70 samples/sec Loss 7.4281 LearningRate 0.0257 Epoch: 9 Global Step: 122580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:18:57,885-Speed 3030.78 samples/sec Loss 7.4466 LearningRate 0.0257 Epoch: 9 Global Step: 122590 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:01,221-Speed 3070.56 samples/sec Loss 7.2763 LearningRate 0.0257 Epoch: 9 Global Step: 122600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:04,670-Speed 2969.80 samples/sec Loss 7.3475 LearningRate 0.0256 Epoch: 9 Global Step: 122610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:08,065-Speed 3017.06 samples/sec Loss 7.3335 LearningRate 0.0256 Epoch: 9 Global Step: 122620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:11,428-Speed 3046.34 samples/sec Loss 7.3631 LearningRate 0.0256 Epoch: 9 Global Step: 122630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:14,768-Speed 3066.53 samples/sec Loss 7.4008 LearningRate 0.0256 Epoch: 9 Global Step: 122640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:18,075-Speed 3097.57 samples/sec Loss 7.3791 LearningRate 0.0256 Epoch: 9 Global Step: 122650 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:21,453-Speed 3031.99 samples/sec Loss 7.3721 LearningRate 0.0256 Epoch: 9 Global Step: 122660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:24,823-Speed 3039.92 samples/sec Loss 7.4172 LearningRate 0.0256 Epoch: 9 Global Step: 122670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:28,188-Speed 3043.55 samples/sec Loss 7.2733 LearningRate 0.0256 Epoch: 9 Global Step: 122680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:31,559-Speed 3038.13 samples/sec Loss 7.2225 
LearningRate 0.0256 Epoch: 9 Global Step: 122690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:19:34,867-Speed 3096.52 samples/sec Loss 7.5711 LearningRate 0.0256 Epoch: 9 Global Step: 122700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:38,199-Speed 3074.05 samples/sec Loss 7.3219 LearningRate 0.0256 Epoch: 9 Global Step: 122710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:41,573-Speed 3035.99 samples/sec Loss 7.3048 LearningRate 0.0256 Epoch: 9 Global Step: 122720 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:44,901-Speed 3077.42 samples/sec Loss 7.3425 LearningRate 0.0256 Epoch: 9 Global Step: 122730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:48,336-Speed 2982.11 samples/sec Loss 7.3394 LearningRate 0.0256 Epoch: 9 Global Step: 122740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:51,666-Speed 3075.66 samples/sec Loss 7.4059 LearningRate 0.0256 Epoch: 9 Global Step: 122750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:55,103-Speed 2980.44 samples/sec Loss 7.3734 LearningRate 0.0256 Epoch: 9 Global Step: 122760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:19:58,513-Speed 3003.95 samples/sec Loss 7.4657 LearningRate 0.0256 Epoch: 9 Global Step: 122770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:01,946-Speed 2983.91 samples/sec Loss 7.3992 LearningRate 0.0256 Epoch: 9 Global Step: 122780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:05,349-Speed 3010.68 samples/sec Loss 7.4426 LearningRate 0.0256 Epoch: 9 Global Step: 122790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:08,693-Speed 3062.88 samples/sec Loss 7.3249 LearningRate 0.0256 Epoch: 9 Global Step: 122800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:20:12,083-Speed 3021.53 samples/sec Loss 7.4599 LearningRate 0.0256 Epoch: 9 Global Step: 
122810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:20:15,408-Speed 3079.66 samples/sec Loss 7.3819 LearningRate 0.0256 Epoch: 9 Global Step: 122820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:20:18,784-Speed 3034.77 samples/sec Loss 7.3528 LearningRate 0.0256 Epoch: 9 Global Step: 122830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:20:22,174-Speed 3020.94 samples/sec Loss 7.3521 LearningRate 0.0256 Epoch: 9 Global Step: 122840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:20:25,484-Speed 3094.57 samples/sec Loss 7.4899 LearningRate 0.0256 Epoch: 9 Global Step: 122850 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:28,872-Speed 3023.43 samples/sec Loss 7.3455 LearningRate 0.0255 Epoch: 9 Global Step: 122860 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:32,217-Speed 3061.63 samples/sec Loss 7.2860 LearningRate 0.0255 Epoch: 9 Global Step: 122870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:35,542-Speed 3080.94 samples/sec Loss 7.3344 LearningRate 0.0255 Epoch: 9 Global Step: 122880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:38,999-Speed 2963.26 samples/sec Loss 7.4333 LearningRate 0.0255 Epoch: 9 Global Step: 122890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:42,341-Speed 3064.84 samples/sec Loss 7.4309 LearningRate 0.0255 Epoch: 9 Global Step: 122900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:45,719-Speed 3031.72 samples/sec Loss 7.3093 LearningRate 0.0255 Epoch: 9 Global Step: 122910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:49,037-Speed 3087.70 samples/sec Loss 7.3169 LearningRate 0.0255 Epoch: 9 Global Step: 122920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:52,414-Speed 3033.27 samples/sec Loss 7.3923 LearningRate 0.0255 Epoch: 9 Global Step: 122930 Fp16 Grad Scale: 32768 Required: 12 
hours Training: 2022-04-27 13:20:55,779-Speed 3042.93 samples/sec Loss 7.5699 LearningRate 0.0255 Epoch: 9 Global Step: 122940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:20:59,122-Speed 3064.37 samples/sec Loss 7.4403 LearningRate 0.0255 Epoch: 9 Global Step: 122950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:21:02,414-Speed 3111.88 samples/sec Loss 7.3818 LearningRate 0.0255 Epoch: 9 Global Step: 122960 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:05,759-Speed 3061.90 samples/sec Loss 7.2818 LearningRate 0.0255 Epoch: 9 Global Step: 122970 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:09,132-Speed 3036.79 samples/sec Loss 7.3625 LearningRate 0.0255 Epoch: 9 Global Step: 122980 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:12,573-Speed 2976.84 samples/sec Loss 7.2928 LearningRate 0.0255 Epoch: 9 Global Step: 122990 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:15,955-Speed 3028.26 samples/sec Loss 7.3870 LearningRate 0.0255 Epoch: 9 Global Step: 123000 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:19,368-Speed 3001.65 samples/sec Loss 7.2922 LearningRate 0.0255 Epoch: 9 Global Step: 123010 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:22,816-Speed 2970.32 samples/sec Loss 7.3863 LearningRate 0.0255 Epoch: 9 Global Step: 123020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:26,183-Speed 3042.30 samples/sec Loss 7.3237 LearningRate 0.0255 Epoch: 9 Global Step: 123030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:29,529-Speed 3060.75 samples/sec Loss 7.4127 LearningRate 0.0255 Epoch: 9 Global Step: 123040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:32,924-Speed 3016.95 samples/sec Loss 7.5166 LearningRate 0.0255 Epoch: 9 Global Step: 123050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
13:21:36,260-Speed 3070.43 samples/sec Loss 7.3570 LearningRate 0.0255 Epoch: 9 Global Step: 123060 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:21:39,627-Speed 3042.71 samples/sec Loss 7.2054 LearningRate 0.0255 Epoch: 9 Global Step: 123070 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:21:42,970-Speed 3063.90 samples/sec Loss 7.3648 LearningRate 0.0255 Epoch: 9 Global Step: 123080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:46,318-Speed 3059.89 samples/sec Loss 7.3860 LearningRate 0.0255 Epoch: 9 Global Step: 123090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:49,779-Speed 2959.23 samples/sec Loss 7.4268 LearningRate 0.0254 Epoch: 9 Global Step: 123100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:53,194-Speed 2998.89 samples/sec Loss 7.3007 LearningRate 0.0254 Epoch: 9 Global Step: 123110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:56,500-Speed 3098.74 samples/sec Loss 7.2840 LearningRate 0.0254 Epoch: 9 Global Step: 123120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:21:59,884-Speed 3027.05 samples/sec Loss 7.5576 LearningRate 0.0254 Epoch: 9 Global Step: 123130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:03,216-Speed 3074.07 samples/sec Loss 7.3174 LearningRate 0.0254 Epoch: 9 Global Step: 123140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:06,567-Speed 3056.61 samples/sec Loss 7.3198 LearningRate 0.0254 Epoch: 9 Global Step: 123150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:09,909-Speed 3064.04 samples/sec Loss 7.2368 LearningRate 0.0254 Epoch: 9 Global Step: 123160 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:13,309-Speed 3013.29 samples/sec Loss 7.3079 LearningRate 0.0254 Epoch: 9 Global Step: 123170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:16,723-Speed 2999.94 samples/sec Loss 
7.1740 LearningRate 0.0254 Epoch: 9 Global Step: 123180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:22:20,087-Speed 3044.39 samples/sec Loss 7.4638 LearningRate 0.0254 Epoch: 9 Global Step: 123190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:22:23,440-Speed 3054.88 samples/sec Loss 7.3274 LearningRate 0.0254 Epoch: 9 Global Step: 123200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:22:26,799-Speed 3049.77 samples/sec Loss 7.3911 LearningRate 0.0254 Epoch: 9 Global Step: 123210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:22:30,199-Speed 3012.89 samples/sec Loss 7.3589 LearningRate 0.0254 Epoch: 9 Global Step: 123220 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:22:33,579-Speed 3030.25 samples/sec Loss 7.3704 LearningRate 0.0254 Epoch: 9 Global Step: 123230 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:22:36,959-Speed 3030.66 samples/sec Loss 7.5114 LearningRate 0.0254 Epoch: 9 Global Step: 123240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:40,381-Speed 2992.89 samples/sec Loss 7.4002 LearningRate 0.0254 Epoch: 9 Global Step: 123250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:43,856-Speed 2947.89 samples/sec Loss 7.4053 LearningRate 0.0254 Epoch: 9 Global Step: 123260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:47,308-Speed 2967.10 samples/sec Loss 7.4464 LearningRate 0.0254 Epoch: 9 Global Step: 123270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:50,739-Speed 2985.39 samples/sec Loss 7.3301 LearningRate 0.0254 Epoch: 9 Global Step: 123280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:54,080-Speed 3065.53 samples/sec Loss 7.3332 LearningRate 0.0254 Epoch: 9 Global Step: 123290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:22:57,452-Speed 3037.76 samples/sec Loss 7.2702 LearningRate 0.0254 Epoch: 9 Global 
Step: 123300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:00,818-Speed 3042.49 samples/sec Loss 7.4130 LearningRate 0.0254 Epoch: 9 Global Step: 123310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:04,152-Speed 3072.19 samples/sec Loss 7.3179 LearningRate 0.0254 Epoch: 9 Global Step: 123320 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:07,506-Speed 3053.67 samples/sec Loss 7.4339 LearningRate 0.0254 Epoch: 9 Global Step: 123330 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:10,839-Speed 3073.55 samples/sec Loss 7.3201 LearningRate 0.0254 Epoch: 9 Global Step: 123340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:23:14,190-Speed 3056.13 samples/sec Loss 7.3162 LearningRate 0.0253 Epoch: 9 Global Step: 123350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:23:17,618-Speed 2988.23 samples/sec Loss 7.3833 LearningRate 0.0253 Epoch: 9 Global Step: 123360 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:21,030-Speed 3002.52 samples/sec Loss 7.2543 LearningRate 0.0253 Epoch: 9 Global Step: 123370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:24,469-Speed 2978.43 samples/sec Loss 7.3465 LearningRate 0.0253 Epoch: 9 Global Step: 123380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:27,860-Speed 3020.14 samples/sec Loss 7.3594 LearningRate 0.0253 Epoch: 9 Global Step: 123390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:31,227-Speed 3042.62 samples/sec Loss 7.2626 LearningRate 0.0253 Epoch: 9 Global Step: 123400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:34,599-Speed 3036.69 samples/sec Loss 7.2742 LearningRate 0.0253 Epoch: 9 Global Step: 123410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:37,995-Speed 3016.56 samples/sec Loss 7.3824 LearningRate 0.0253 Epoch: 9 Global Step: 123420 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 13:23:41,426-Speed 2985.46 samples/sec Loss 7.3186 LearningRate 0.0253 Epoch: 9 Global Step: 123430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:44,793-Speed 3042.37 samples/sec Loss 7.4294 LearningRate 0.0253 Epoch: 9 Global Step: 123440 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:48,155-Speed 3046.58 samples/sec Loss 7.1149 LearningRate 0.0253 Epoch: 9 Global Step: 123450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:23:51,601-Speed 2971.93 samples/sec Loss 7.1830 LearningRate 0.0253 Epoch: 9 Global Step: 123460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:23:55,064-Speed 2957.55 samples/sec Loss 7.3452 LearningRate 0.0253 Epoch: 9 Global Step: 123470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:23:58,434-Speed 3039.65 samples/sec Loss 7.3031 LearningRate 0.0253 Epoch: 9 Global Step: 123480 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:24:01,816-Speed 3029.07 samples/sec Loss 7.3669 LearningRate 0.0253 Epoch: 9 Global Step: 123490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:05,219-Speed 3009.92 samples/sec Loss 7.3587 LearningRate 0.0253 Epoch: 9 Global Step: 123500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:08,566-Speed 3060.43 samples/sec Loss 7.3324 LearningRate 0.0253 Epoch: 9 Global Step: 123510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:11,954-Speed 3022.84 samples/sec Loss 7.3629 LearningRate 0.0253 Epoch: 9 Global Step: 123520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:15,305-Speed 3056.72 samples/sec Loss 7.4549 LearningRate 0.0253 Epoch: 9 Global Step: 123530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:18,780-Speed 2948.17 samples/sec Loss 7.3566 LearningRate 0.0253 Epoch: 9 Global Step: 123540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
13:24:22,115-Speed 3071.40 samples/sec Loss 7.2437 LearningRate 0.0253 Epoch: 9 Global Step: 123550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:25,570-Speed 2964.30 samples/sec Loss 7.1887 LearningRate 0.0253 Epoch: 9 Global Step: 123560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:29,006-Speed 2980.93 samples/sec Loss 7.3371 LearningRate 0.0253 Epoch: 9 Global Step: 123570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:32,438-Speed 2984.39 samples/sec Loss 7.1713 LearningRate 0.0253 Epoch: 9 Global Step: 123580 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:35,965-Speed 2903.56 samples/sec Loss 7.4069 LearningRate 0.0253 Epoch: 9 Global Step: 123590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:24:39,489-Speed 2906.96 samples/sec Loss 7.4439 LearningRate 0.0252 Epoch: 9 Global Step: 123600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:24:42,876-Speed 3024.32 samples/sec Loss 7.4714 LearningRate 0.0252 Epoch: 9 Global Step: 123610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:24:46,361-Speed 2938.47 samples/sec Loss 7.2377 LearningRate 0.0252 Epoch: 9 Global Step: 123620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:49,711-Speed 3058.41 samples/sec Loss 7.2622 LearningRate 0.0252 Epoch: 9 Global Step: 123630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:53,069-Speed 3050.11 samples/sec Loss 7.3118 LearningRate 0.0252 Epoch: 9 Global Step: 123640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:56,431-Speed 3045.90 samples/sec Loss 7.4258 LearningRate 0.0252 Epoch: 9 Global Step: 123650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:24:59,752-Speed 3085.28 samples/sec Loss 7.2218 LearningRate 0.0252 Epoch: 9 Global Step: 123660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:03,112-Speed 3047.74 samples/sec Loss 
7.2986 LearningRate 0.0252 Epoch: 9 Global Step: 123670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:06,440-Speed 3078.17 samples/sec Loss 7.3396 LearningRate 0.0252 Epoch: 9 Global Step: 123680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:09,785-Speed 3062.48 samples/sec Loss 7.2223 LearningRate 0.0252 Epoch: 9 Global Step: 123690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:13,166-Speed 3028.97 samples/sec Loss 7.1727 LearningRate 0.0252 Epoch: 9 Global Step: 123700 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:16,581-Speed 2999.61 samples/sec Loss 7.1815 LearningRate 0.0252 Epoch: 9 Global Step: 123710 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:19,936-Speed 3052.73 samples/sec Loss 7.3220 LearningRate 0.0252 Epoch: 9 Global Step: 123720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:25:23,285-Speed 3058.82 samples/sec Loss 7.3545 LearningRate 0.0252 Epoch: 9 Global Step: 123730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:26,696-Speed 3003.02 samples/sec Loss 7.2819 LearningRate 0.0252 Epoch: 9 Global Step: 123740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:30,099-Speed 3009.90 samples/sec Loss 7.3165 LearningRate 0.0252 Epoch: 9 Global Step: 123750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:33,488-Speed 3021.93 samples/sec Loss 7.3246 LearningRate 0.0252 Epoch: 9 Global Step: 123760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:25:36,869-Speed 3029.91 samples/sec Loss 7.2888 LearningRate 0.0252 Epoch: 9 Global Step: 123770 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:25:40,286-Speed 2997.70 samples/sec Loss 7.3207 LearningRate 0.0252 Epoch: 9 Global Step: 123780 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:25:43,693-Speed 3006.51 samples/sec Loss 7.3604 LearningRate 0.0252 Epoch: 9 Global 
Step: 123790 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:25:47,101-Speed 3005.56 samples/sec Loss 7.2287 LearningRate 0.0252 Epoch: 9 Global Step: 123800 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:25:50,506-Speed 3008.40 samples/sec Loss 7.3708 LearningRate 0.0252 Epoch: 9 Global Step: 123810 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:25:53,836-Speed 3076.03 samples/sec Loss 7.2428 LearningRate 0.0252 Epoch: 9 Global Step: 123820 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:25:57,186-Speed 3056.65 samples/sec Loss 7.4626 LearningRate 0.0252 Epoch: 9 Global Step: 123830 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:00,560-Speed 3035.82 samples/sec Loss 7.4288 LearningRate 0.0251 Epoch: 9 Global Step: 123840 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:03,895-Speed 3071.87 samples/sec Loss 7.3675 LearningRate 0.0251 Epoch: 9 Global Step: 123850 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:07,290-Speed 3016.46 samples/sec Loss 7.3168 LearningRate 0.0251 Epoch: 9 Global Step: 123860 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:10,715-Speed 2990.71 samples/sec Loss 7.3129 LearningRate 0.0251 Epoch: 9 Global Step: 123870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:14,064-Speed 3058.59 samples/sec Loss 7.3151 LearningRate 0.0251 Epoch: 9 Global Step: 123880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:17,485-Speed 2993.62 samples/sec Loss 7.3599 LearningRate 0.0251 Epoch: 9 Global Step: 123890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:20,923-Speed 2979.28 samples/sec Loss 7.3008 LearningRate 0.0251 Epoch: 9 Global Step: 123900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:24,323-Speed 3012.57 samples/sec Loss 7.3195 LearningRate 0.0251 Epoch: 9 Global Step: 123910 Fp16 Grad Scale: 32768 
Required: 12 hours Training: 2022-04-27 13:26:27,711-Speed 3024.11 samples/sec Loss 7.3011 LearningRate 0.0251 Epoch: 9 Global Step: 123920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:31,100-Speed 3022.30 samples/sec Loss 7.3026 LearningRate 0.0251 Epoch: 9 Global Step: 123930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:34,448-Speed 3059.49 samples/sec Loss 7.4222 LearningRate 0.0251 Epoch: 9 Global Step: 123940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:37,795-Speed 3060.71 samples/sec Loss 7.3608 LearningRate 0.0251 Epoch: 9 Global Step: 123950 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:26:41,156-Speed 3047.35 samples/sec Loss 7.2264 LearningRate 0.0251 Epoch: 9 Global Step: 123960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:44,539-Speed 3028.05 samples/sec Loss 7.3088 LearningRate 0.0251 Epoch: 9 Global Step: 123970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:47,886-Speed 3060.27 samples/sec Loss 7.3098 LearningRate 0.0251 Epoch: 9 Global Step: 123980 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:51,319-Speed 2983.29 samples/sec Loss 7.2886 LearningRate 0.0251 Epoch: 9 Global Step: 123990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:54,674-Speed 3053.18 samples/sec Loss 7.2117 LearningRate 0.0251 Epoch: 9 Global Step: 124000 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:26:58,018-Speed 3062.66 samples/sec Loss 7.3119 LearningRate 0.0251 Epoch: 9 Global Step: 124010 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:27:01,386-Speed 3041.13 samples/sec Loss 7.1883 LearningRate 0.0251 Epoch: 9 Global Step: 124020 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:27:04,748-Speed 3046.89 samples/sec Loss 7.3317 LearningRate 0.0251 Epoch: 9 Global Step: 124030 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 
13:27:08,055-Speed 3097.76 samples/sec Loss 7.2811 LearningRate 0.0251 Epoch: 9 Global Step: 124040 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:27:11,371-Speed 3088.80 samples/sec Loss 7.3030 LearningRate 0.0251 Epoch: 9 Global Step: 124050 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:27:14,682-Speed 3093.60 samples/sec Loss 7.3426 LearningRate 0.0251 Epoch: 9 Global Step: 124060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:18,069-Speed 3024.54 samples/sec Loss 7.2543 LearningRate 0.0251 Epoch: 9 Global Step: 124070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:21,446-Speed 3032.82 samples/sec Loss 7.2893 LearningRate 0.0251 Epoch: 9 Global Step: 124080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:24,753-Speed 3097.59 samples/sec Loss 7.3779 LearningRate 0.0250 Epoch: 9 Global Step: 124090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:28,142-Speed 3021.84 samples/sec Loss 7.2138 LearningRate 0.0250 Epoch: 9 Global Step: 124100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:31,593-Speed 2968.26 samples/sec Loss 7.3504 LearningRate 0.0250 Epoch: 9 Global Step: 124110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:34,993-Speed 3012.65 samples/sec Loss 7.2111 LearningRate 0.0250 Epoch: 9 Global Step: 124120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:38,345-Speed 3056.23 samples/sec Loss 7.2713 LearningRate 0.0250 Epoch: 9 Global Step: 124130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:41,783-Speed 2979.00 samples/sec Loss 7.3572 LearningRate 0.0250 Epoch: 9 Global Step: 124140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:45,158-Speed 3035.03 samples/sec Loss 7.3368 LearningRate 0.0250 Epoch: 9 Global Step: 124150 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:27:48,514-Speed 3052.37 samples/sec Loss 
7.2942 LearningRate 0.0250 Epoch: 9 Global Step: 124160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:27:51,948-Speed 2982.35 samples/sec Loss 7.3068 LearningRate 0.0250 Epoch: 9 Global Step: 124170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:27:55,440-Speed 2933.67 samples/sec Loss 7.2917 LearningRate 0.0250 Epoch: 9 Global Step: 124180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:27:58,799-Speed 3049.68 samples/sec Loss 7.2394 LearningRate 0.0250 Epoch: 9 Global Step: 124190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:28:02,546-Speed 2733.53 samples/sec Loss 7.3207 LearningRate 0.0250 Epoch: 9 Global Step: 124200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:28:05,995-Speed 2969.07 samples/sec Loss 7.2968 LearningRate 0.0250 Epoch: 9 Global Step: 124210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:28:38,848-Speed 311.71 samples/sec Loss 5.8874 LearningRate 0.0250 Epoch: 10 Global Step: 124220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:28:42,382-Speed 2899.57 samples/sec Loss 5.7777 LearningRate 0.0250 Epoch: 10 Global Step: 124230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:28:45,774-Speed 3019.88 samples/sec Loss 5.7393 LearningRate 0.0250 Epoch: 10 Global Step: 124240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:28:49,223-Speed 2969.93 samples/sec Loss 5.8783 LearningRate 0.0250 Epoch: 10 Global Step: 124250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:28:52,601-Speed 3032.27 samples/sec Loss 5.8722 LearningRate 0.0250 Epoch: 10 Global Step: 124260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:28:55,993-Speed 3019.37 samples/sec Loss 5.7411 LearningRate 0.0250 Epoch: 10 Global Step: 124270 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:28:59,333-Speed 3066.76 samples/sec Loss 5.8612 LearningRate 0.0250 Epoch: 10 
Global Step: 124280 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:29:02,739-Speed 3008.10 samples/sec Loss 5.8956 LearningRate 0.0250 Epoch: 10 Global Step: 124290 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:29:06,140-Speed 3011.22 samples/sec Loss 5.8281 LearningRate 0.0250 Epoch: 10 Global Step: 124300 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:29:09,573-Speed 2984.69 samples/sec Loss 5.8799 LearningRate 0.0250 Epoch: 10 Global Step: 124310 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:29:12,971-Speed 3013.70 samples/sec Loss 5.9513 LearningRate 0.0250 Epoch: 10 Global Step: 124320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:16,540-Speed 2869.99 samples/sec Loss 5.9344 LearningRate 0.0250 Epoch: 10 Global Step: 124330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:19,938-Speed 3014.53 samples/sec Loss 5.8734 LearningRate 0.0249 Epoch: 10 Global Step: 124340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:23,280-Speed 3064.65 samples/sec Loss 5.8706 LearningRate 0.0249 Epoch: 10 Global Step: 124350 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:26,746-Speed 2955.11 samples/sec Loss 5.9110 LearningRate 0.0249 Epoch: 10 Global Step: 124360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:30,437-Speed 2775.58 samples/sec Loss 5.7492 LearningRate 0.0249 Epoch: 10 Global Step: 124370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:33,803-Speed 3043.07 samples/sec Loss 5.8033 LearningRate 0.0249 Epoch: 10 Global Step: 124380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:37,220-Speed 2997.58 samples/sec Loss 5.9446 LearningRate 0.0249 Epoch: 10 Global Step: 124390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:40,784-Speed 2875.13 samples/sec Loss 5.8064 LearningRate 0.0249 Epoch: 10 Global Step: 124400 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:44,106-Speed 3084.16 samples/sec Loss 5.8992 LearningRate 0.0249 Epoch: 10 Global Step: 124410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:29:47,463-Speed 3051.29 samples/sec Loss 5.7588 LearningRate 0.0249 Epoch: 10 Global Step: 124420 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:29:50,793-Speed 3075.57 samples/sec Loss 5.9502 LearningRate 0.0249 Epoch: 10 Global Step: 124430 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:29:54,125-Speed 3073.83 samples/sec Loss 5.9151 LearningRate 0.0249 Epoch: 10 Global Step: 124440 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:29:57,500-Speed 3035.35 samples/sec Loss 5.9457 LearningRate 0.0249 Epoch: 10 Global Step: 124450 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:30:00,902-Speed 3010.89 samples/sec Loss 6.0079 LearningRate 0.0249 Epoch: 10 Global Step: 124460 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:30:04,238-Speed 3070.65 samples/sec Loss 5.9348 LearningRate 0.0249 Epoch: 10 Global Step: 124470 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:30:07,634-Speed 3016.15 samples/sec Loss 5.7925 LearningRate 0.0249 Epoch: 10 Global Step: 124480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:11,119-Speed 2939.24 samples/sec Loss 5.9380 LearningRate 0.0249 Epoch: 10 Global Step: 124490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:14,496-Speed 3033.31 samples/sec Loss 5.9191 LearningRate 0.0249 Epoch: 10 Global Step: 124500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:17,898-Speed 3010.80 samples/sec Loss 6.0102 LearningRate 0.0249 Epoch: 10 Global Step: 124510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:21,261-Speed 3046.63 samples/sec Loss 5.9246 LearningRate 0.0249 Epoch: 10 Global Step: 124520 Fp16 Grad Scale: 32768 Required: 12 
hours Training: 2022-04-27 13:30:24,579-Speed 3087.19 samples/sec Loss 5.8591 LearningRate 0.0249 Epoch: 10 Global Step: 124530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:28,003-Speed 2991.30 samples/sec Loss 5.8342 LearningRate 0.0249 Epoch: 10 Global Step: 124540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:31,404-Speed 3012.81 samples/sec Loss 6.0412 LearningRate 0.0249 Epoch: 10 Global Step: 124550 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:34,815-Speed 3002.15 samples/sec Loss 5.9893 LearningRate 0.0249 Epoch: 10 Global Step: 124560 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:38,216-Speed 3012.37 samples/sec Loss 5.9411 LearningRate 0.0249 Epoch: 10 Global Step: 124570 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:41,564-Speed 3058.97 samples/sec Loss 6.0171 LearningRate 0.0249 Epoch: 10 Global Step: 124580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:30:44,955-Speed 3020.62 samples/sec Loss 6.0004 LearningRate 0.0248 Epoch: 10 Global Step: 124590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:30:48,333-Speed 3032.68 samples/sec Loss 5.9817 LearningRate 0.0248 Epoch: 10 Global Step: 124600 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:51,866-Speed 2899.26 samples/sec Loss 5.8711 LearningRate 0.0248 Epoch: 10 Global Step: 124610 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:55,308-Speed 2976.11 samples/sec Loss 5.8815 LearningRate 0.0248 Epoch: 10 Global Step: 124620 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:30:58,681-Speed 3036.95 samples/sec Loss 5.8970 LearningRate 0.0248 Epoch: 10 Global Step: 124630 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:02,035-Speed 3053.32 samples/sec Loss 6.0216 LearningRate 0.0248 Epoch: 10 Global Step: 124640 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
13:31:05,361-Speed 3079.52 samples/sec Loss 5.9575 LearningRate 0.0248 Epoch: 10 Global Step: 124650 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:08,840-Speed 2944.63 samples/sec Loss 6.0967 LearningRate 0.0248 Epoch: 10 Global Step: 124660 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:12,251-Speed 3003.33 samples/sec Loss 5.9831 LearningRate 0.0248 Epoch: 10 Global Step: 124670 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:15,658-Speed 3006.50 samples/sec Loss 6.0232 LearningRate 0.0248 Epoch: 10 Global Step: 124680 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:19,059-Speed 3011.81 samples/sec Loss 5.9658 LearningRate 0.0248 Epoch: 10 Global Step: 124690 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:22,495-Speed 2980.76 samples/sec Loss 6.0858 LearningRate 0.0248 Epoch: 10 Global Step: 124700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:31:25,874-Speed 3031.97 samples/sec Loss 5.9775 LearningRate 0.0248 Epoch: 10 Global Step: 124710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:31:29,239-Speed 3043.58 samples/sec Loss 5.9859 LearningRate 0.0248 Epoch: 10 Global Step: 124720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:31:32,593-Speed 3054.31 samples/sec Loss 6.1135 LearningRate 0.0248 Epoch: 10 Global Step: 124730 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:35,917-Speed 3080.99 samples/sec Loss 5.9524 LearningRate 0.0248 Epoch: 10 Global Step: 124740 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:39,272-Speed 3054.12 samples/sec Loss 6.0654 LearningRate 0.0248 Epoch: 10 Global Step: 124750 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:42,642-Speed 3040.22 samples/sec Loss 6.1252 LearningRate 0.0248 Epoch: 10 Global Step: 124760 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:46,006-Speed 3044.79 
samples/sec Loss 6.0978 LearningRate 0.0248 Epoch: 10 Global Step: 124770 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:49,371-Speed 3043.90 samples/sec Loss 6.0929 LearningRate 0.0248 Epoch: 10 Global Step: 124780 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:52,844-Speed 2948.80 samples/sec Loss 6.0671 LearningRate 0.0248 Epoch: 10 Global Step: 124790 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:56,214-Speed 3039.79 samples/sec Loss 6.0208 LearningRate 0.0248 Epoch: 10 Global Step: 124800 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:31:59,578-Speed 3045.00 samples/sec Loss 6.0429 LearningRate 0.0248 Epoch: 10 Global Step: 124810 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:32:02,922-Speed 3062.68 samples/sec Loss 5.9874 LearningRate 0.0248 Epoch: 10 Global Step: 124820 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:32:06,329-Speed 3006.50 samples/sec Loss 6.0470 LearningRate 0.0248 Epoch: 10 Global Step: 124830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:09,687-Speed 3050.56 samples/sec Loss 5.9791 LearningRate 0.0247 Epoch: 10 Global Step: 124840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:13,089-Speed 3011.09 samples/sec Loss 6.0646 LearningRate 0.0247 Epoch: 10 Global Step: 124850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:16,436-Speed 3060.32 samples/sec Loss 6.0488 LearningRate 0.0247 Epoch: 10 Global Step: 124860 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:19,841-Speed 3007.80 samples/sec Loss 5.9191 LearningRate 0.0247 Epoch: 10 Global Step: 124870 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:23,233-Speed 3019.90 samples/sec Loss 6.0492 LearningRate 0.0247 Epoch: 10 Global Step: 124880 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:26,682-Speed 2969.63 samples/sec Loss 6.0458 
LearningRate 0.0247 Epoch: 10 Global Step: 124890 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:30,076-Speed 3018.63 samples/sec Loss 6.1075 LearningRate 0.0247 Epoch: 10 Global Step: 124900 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:33,432-Speed 3051.59 samples/sec Loss 6.0945 LearningRate 0.0247 Epoch: 10 Global Step: 124910 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:36,818-Speed 3025.67 samples/sec Loss 6.0958 LearningRate 0.0247 Epoch: 10 Global Step: 124920 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:40,172-Speed 3054.08 samples/sec Loss 6.0867 LearningRate 0.0247 Epoch: 10 Global Step: 124930 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:32:43,530-Speed 3050.45 samples/sec Loss 6.1933 LearningRate 0.0247 Epoch: 10 Global Step: 124940 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:32:46,903-Speed 3036.32 samples/sec Loss 6.0523 LearningRate 0.0247 Epoch: 10 Global Step: 124950 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:50,265-Speed 3046.40 samples/sec Loss 6.0920 LearningRate 0.0247 Epoch: 10 Global Step: 124960 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:53,726-Speed 2959.52 samples/sec Loss 6.1776 LearningRate 0.0247 Epoch: 10 Global Step: 124970 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:32:57,139-Speed 3002.24 samples/sec Loss 6.0994 LearningRate 0.0247 Epoch: 10 Global Step: 124980 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:00,571-Speed 2985.18 samples/sec Loss 6.1097 LearningRate 0.0247 Epoch: 10 Global Step: 124990 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:03,933-Speed 3046.65 samples/sec Loss 6.2016 LearningRate 0.0247 Epoch: 10 Global Step: 125000 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:07,316-Speed 3028.19 samples/sec Loss 6.2573 LearningRate 0.0247 Epoch: 10 
Global Step: 125010 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:10,736-Speed 2995.57 samples/sec Loss 6.1095 LearningRate 0.0247 Epoch: 10 Global Step: 125020 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:14,168-Speed 2984.57 samples/sec Loss 6.0694 LearningRate 0.0247 Epoch: 10 Global Step: 125030 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:17,625-Speed 2962.87 samples/sec Loss 6.2039 LearningRate 0.0247 Epoch: 10 Global Step: 125040 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:20,963-Speed 3068.49 samples/sec Loss 6.0642 LearningRate 0.0247 Epoch: 10 Global Step: 125050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:24,327-Speed 3044.66 samples/sec Loss 6.1769 LearningRate 0.0247 Epoch: 10 Global Step: 125060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:27,736-Speed 3004.79 samples/sec Loss 6.1745 LearningRate 0.0247 Epoch: 10 Global Step: 125070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:31,108-Speed 3038.15 samples/sec Loss 6.0198 LearningRate 0.0247 Epoch: 10 Global Step: 125080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:34,435-Speed 3078.88 samples/sec Loss 6.1075 LearningRate 0.0246 Epoch: 10 Global Step: 125090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:37,849-Speed 3000.39 samples/sec Loss 6.1681 LearningRate 0.0246 Epoch: 10 Global Step: 125100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:41,181-Speed 3073.88 samples/sec Loss 6.1760 LearningRate 0.0246 Epoch: 10 Global Step: 125110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:33:44,625-Speed 2974.56 samples/sec Loss 6.1804 LearningRate 0.0246 Epoch: 10 Global Step: 125120 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:48,068-Speed 2974.86 samples/sec Loss 6.1729 LearningRate 0.0246 Epoch: 10 Global Step: 125130 Fp16 Grad 
Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:51,502-Speed 2983.05 samples/sec Loss 6.1693 LearningRate 0.0246 Epoch: 10 Global Step: 125140 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:54,995-Speed 2932.12 samples/sec Loss 6.1406 LearningRate 0.0246 Epoch: 10 Global Step: 125150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:33:58,538-Speed 2891.69 samples/sec Loss 6.1960 LearningRate 0.0246 Epoch: 10 Global Step: 125160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:01,916-Speed 3031.84 samples/sec Loss 6.1558 LearningRate 0.0246 Epoch: 10 Global Step: 125170 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:05,406-Speed 2934.72 samples/sec Loss 6.1923 LearningRate 0.0246 Epoch: 10 Global Step: 125180 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:08,840-Speed 2983.23 samples/sec Loss 6.2315 LearningRate 0.0246 Epoch: 10 Global Step: 125190 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:12,219-Speed 3031.23 samples/sec Loss 6.1924 LearningRate 0.0246 Epoch: 10 Global Step: 125200 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:15,603-Speed 3027.39 samples/sec Loss 6.1800 LearningRate 0.0246 Epoch: 10 Global Step: 125210 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:19,035-Speed 2983.62 samples/sec Loss 6.1174 LearningRate 0.0246 Epoch: 10 Global Step: 125220 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:34:22,493-Speed 2962.88 samples/sec Loss 6.2243 LearningRate 0.0246 Epoch: 10 Global Step: 125230 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:34:25,952-Speed 2961.12 samples/sec Loss 6.0815 LearningRate 0.0246 Epoch: 10 Global Step: 125240 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:34:29,307-Speed 3054.52 samples/sec Loss 6.2996 LearningRate 0.0246 Epoch: 10 Global Step: 125250 Fp16 Grad Scale: 65536 Required: 12 
hours Training: 2022-04-27 13:34:32,761-Speed 2964.96 samples/sec Loss 6.1384 LearningRate 0.0246 Epoch: 10 Global Step: 125260 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:36,157-Speed 3016.07 samples/sec Loss 6.2461 LearningRate 0.0246 Epoch: 10 Global Step: 125270 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:39,492-Speed 3071.94 samples/sec Loss 6.2082 LearningRate 0.0246 Epoch: 10 Global Step: 125280 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:42,877-Speed 3026.17 samples/sec Loss 6.3640 LearningRate 0.0246 Epoch: 10 Global Step: 125290 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:46,248-Speed 3037.89 samples/sec Loss 6.3313 LearningRate 0.0246 Epoch: 10 Global Step: 125300 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:49,705-Speed 2963.73 samples/sec Loss 6.2012 LearningRate 0.0246 Epoch: 10 Global Step: 125310 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:53,133-Speed 2987.62 samples/sec Loss 6.2326 LearningRate 0.0246 Epoch: 10 Global Step: 125320 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:56,552-Speed 2996.10 samples/sec Loss 6.0802 LearningRate 0.0246 Epoch: 10 Global Step: 125330 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:34:59,970-Speed 2996.56 samples/sec Loss 6.1814 LearningRate 0.0245 Epoch: 10 Global Step: 125340 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:03,385-Speed 2999.73 samples/sec Loss 6.3097 LearningRate 0.0245 Epoch: 10 Global Step: 125350 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:35:06,736-Speed 3056.24 samples/sec Loss 6.1450 LearningRate 0.0245 Epoch: 10 Global Step: 125360 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:10,130-Speed 3018.46 samples/sec Loss 6.2153 LearningRate 0.0245 Epoch: 10 Global Step: 125370 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 
13:35:13,613-Speed 2941.28 samples/sec Loss 6.3516 LearningRate 0.0245 Epoch: 10 Global Step: 125380 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:17,025-Speed 3001.79 samples/sec Loss 6.2831 LearningRate 0.0245 Epoch: 10 Global Step: 125390 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:20,458-Speed 2983.61 samples/sec Loss 6.2866 LearningRate 0.0245 Epoch: 10 Global Step: 125400 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:23,817-Speed 3049.25 samples/sec Loss 6.2193 LearningRate 0.0245 Epoch: 10 Global Step: 125410 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:27,230-Speed 3001.48 samples/sec Loss 6.2311 LearningRate 0.0245 Epoch: 10 Global Step: 125420 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:30,615-Speed 3026.50 samples/sec Loss 6.1877 LearningRate 0.0245 Epoch: 10 Global Step: 125430 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:33,967-Speed 3055.89 samples/sec Loss 6.2729 LearningRate 0.0245 Epoch: 10 Global Step: 125440 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:35:37,379-Speed 3002.67 samples/sec Loss 6.3658 LearningRate 0.0245 Epoch: 10 Global Step: 125450 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:35:40,832-Speed 2966.62 samples/sec Loss 6.2781 LearningRate 0.0245 Epoch: 10 Global Step: 125460 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:35:44,221-Speed 3022.53 samples/sec Loss 6.3134 LearningRate 0.0245 Epoch: 10 Global Step: 125470 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:35:47,613-Speed 3019.96 samples/sec Loss 6.2909 LearningRate 0.0245 Epoch: 10 Global Step: 125480 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:35:51,092-Speed 2944.35 samples/sec Loss 6.2581 LearningRate 0.0245 Epoch: 10 Global Step: 125490 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:35:54,495-Speed 3009.55 
samples/sec Loss 6.3540 LearningRate 0.0245 Epoch: 10 Global Step: 125500 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:35:57,850-Speed 3053.11 samples/sec Loss 6.2584 LearningRate 0.0245 Epoch: 10 Global Step: 125510 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:36:01,230-Speed 3031.04 samples/sec Loss 6.3550 LearningRate 0.0245 Epoch: 10 Global Step: 125520 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:36:04,694-Speed 2956.37 samples/sec Loss 6.3952 LearningRate 0.0245 Epoch: 10 Global Step: 125530 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:36:08,030-Speed 3071.15 samples/sec Loss 6.3489 LearningRate 0.0245 Epoch: 10 Global Step: 125540 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:36:11,412-Speed 3028.35 samples/sec Loss 6.2560 LearningRate 0.0245 Epoch: 10 Global Step: 125550 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:14,844-Speed 2984.28 samples/sec Loss 6.3241 LearningRate 0.0245 Epoch: 10 Global Step: 125560 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:18,265-Speed 2994.47 samples/sec Loss 6.2676 LearningRate 0.0245 Epoch: 10 Global Step: 125570 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:21,683-Speed 2996.48 samples/sec Loss 6.2978 LearningRate 0.0245 Epoch: 10 Global Step: 125580 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:25,048-Speed 3044.33 samples/sec Loss 6.2493 LearningRate 0.0244 Epoch: 10 Global Step: 125590 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:28,402-Speed 3054.33 samples/sec Loss 6.2874 LearningRate 0.0244 Epoch: 10 Global Step: 125600 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:31,757-Speed 3052.34 samples/sec Loss 6.3756 LearningRate 0.0244 Epoch: 10 Global Step: 125610 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:35,174-Speed 2997.53 samples/sec Loss 6.2594 
LearningRate 0.0244 Epoch: 10 Global Step: 125620 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:38,554-Speed 3031.21 samples/sec Loss 6.3522 LearningRate 0.0244 Epoch: 10 Global Step: 125630 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:41,893-Speed 3067.54 samples/sec Loss 6.3094 LearningRate 0.0244 Epoch: 10 Global Step: 125640 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:45,309-Speed 2997.81 samples/sec Loss 6.3958 LearningRate 0.0244 Epoch: 10 Global Step: 125650 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:36:48,659-Speed 3058.56 samples/sec Loss 6.4275 LearningRate 0.0244 Epoch: 10 Global Step: 125660 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:52,033-Speed 3035.87 samples/sec Loss 6.2436 LearningRate 0.0244 Epoch: 10 Global Step: 125670 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:55,438-Speed 3008.36 samples/sec Loss 6.3731 LearningRate 0.0244 Epoch: 10 Global Step: 125680 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:36:58,825-Speed 3023.84 samples/sec Loss 6.3721 LearningRate 0.0244 Epoch: 10 Global Step: 125690 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:02,158-Speed 3073.23 samples/sec Loss 6.3447 LearningRate 0.0244 Epoch: 10 Global Step: 125700 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:05,579-Speed 2993.81 samples/sec Loss 6.3056 LearningRate 0.0244 Epoch: 10 Global Step: 125710 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:08,995-Speed 2998.85 samples/sec Loss 6.4091 LearningRate 0.0244 Epoch: 10 Global Step: 125720 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:12,343-Speed 3059.88 samples/sec Loss 6.3936 LearningRate 0.0244 Epoch: 10 Global Step: 125730 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:15,776-Speed 2982.67 samples/sec Loss 6.3175 LearningRate 0.0244 Epoch: 10 
Global Step: 125740 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:19,217-Speed 2976.98 samples/sec Loss 6.3973 LearningRate 0.0244 Epoch: 10 Global Step: 125750 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:22,625-Speed 3005.93 samples/sec Loss 6.4568 LearningRate 0.0244 Epoch: 10 Global Step: 125760 Fp16 Grad Scale: 131072 Required: 12 hours Training: 2022-04-27 13:37:25,997-Speed 3037.25 samples/sec Loss 6.3497 LearningRate 0.0244 Epoch: 10 Global Step: 125770 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:29,377-Speed 3030.67 samples/sec Loss 6.4748 LearningRate 0.0244 Epoch: 10 Global Step: 125780 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:32,789-Speed 3001.79 samples/sec Loss 6.3802 LearningRate 0.0244 Epoch: 10 Global Step: 125790 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:36,127-Speed 3068.38 samples/sec Loss 6.3005 LearningRate 0.0244 Epoch: 10 Global Step: 125800 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:39,505-Speed 3033.44 samples/sec Loss 6.4440 LearningRate 0.0244 Epoch: 10 Global Step: 125810 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:42,845-Speed 3066.34 samples/sec Loss 6.4619 LearningRate 0.0244 Epoch: 10 Global Step: 125820 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:46,170-Speed 3080.32 samples/sec Loss 6.4154 LearningRate 0.0244 Epoch: 10 Global Step: 125830 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:49,568-Speed 3015.07 samples/sec Loss 6.5545 LearningRate 0.0243 Epoch: 10 Global Step: 125840 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:53,034-Speed 2955.63 samples/sec Loss 6.4110 LearningRate 0.0243 Epoch: 10 Global Step: 125850 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:37:56,355-Speed 3083.97 samples/sec Loss 6.3326 LearningRate 0.0243 Epoch: 10 Global Step: 125860 Fp16 Grad 
Scale: 32768 Required: 12 hours Training: 2022-04-27 13:37:59,732-Speed 3033.28 samples/sec Loss 6.3975 LearningRate 0.0243 Epoch: 10 Global Step: 125870 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:03,079-Speed 3060.97 samples/sec Loss 6.3931 LearningRate 0.0243 Epoch: 10 Global Step: 125880 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:06,424-Speed 3061.95 samples/sec Loss 6.4033 LearningRate 0.0243 Epoch: 10 Global Step: 125890 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:09,735-Speed 3093.57 samples/sec Loss 6.4396 LearningRate 0.0243 Epoch: 10 Global Step: 125900 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:13,075-Speed 3067.04 samples/sec Loss 6.4191 LearningRate 0.0243 Epoch: 10 Global Step: 125910 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:16,397-Speed 3083.32 samples/sec Loss 6.4125 LearningRate 0.0243 Epoch: 10 Global Step: 125920 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:19,776-Speed 3031.56 samples/sec Loss 6.4984 LearningRate 0.0243 Epoch: 10 Global Step: 125930 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:23,131-Speed 3052.44 samples/sec Loss 6.3679 LearningRate 0.0243 Epoch: 10 Global Step: 125940 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:38:26,425-Speed 3109.94 samples/sec Loss 6.4735 LearningRate 0.0243 Epoch: 10 Global Step: 125950 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:29,827-Speed 3010.48 samples/sec Loss 6.3641 LearningRate 0.0243 Epoch: 10 Global Step: 125960 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:33,141-Speed 3090.93 samples/sec Loss 6.4499 LearningRate 0.0243 Epoch: 10 Global Step: 125970 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:36,448-Speed 3097.35 samples/sec Loss 6.3175 LearningRate 0.0243 Epoch: 10 Global Step: 125980 Fp16 Grad Scale: 16384 Required: 12 hours 
Training: 2022-04-27 13:38:39,796-Speed 3059.99 samples/sec Loss 6.4741 LearningRate 0.0243 Epoch: 10 Global Step: 125990 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:43,142-Speed 3060.70 samples/sec Loss 6.4282 LearningRate 0.0243 Epoch: 10 Global Step: 126000 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:46,544-Speed 3010.37 samples/sec Loss 6.3851 LearningRate 0.0243 Epoch: 10 Global Step: 126010 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:49,878-Speed 3072.85 samples/sec Loss 6.4208 LearningRate 0.0243 Epoch: 10 Global Step: 126020 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:53,244-Speed 3043.03 samples/sec Loss 6.3919 LearningRate 0.0243 Epoch: 10 Global Step: 126030 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:56,546-Speed 3101.64 samples/sec Loss 6.4096 LearningRate 0.0243 Epoch: 10 Global Step: 126040 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:38:59,913-Speed 3042.53 samples/sec Loss 6.4717 LearningRate 0.0243 Epoch: 10 Global Step: 126050 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:03,229-Speed 3089.78 samples/sec Loss 6.5060 LearningRate 0.0243 Epoch: 10 Global Step: 126060 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:06,598-Speed 3039.52 samples/sec Loss 6.3286 LearningRate 0.0243 Epoch: 10 Global Step: 126070 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:09,941-Speed 3064.56 samples/sec Loss 6.4110 LearningRate 0.0243 Epoch: 10 Global Step: 126080 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:13,316-Speed 3034.86 samples/sec Loss 6.4446 LearningRate 0.0242 Epoch: 10 Global Step: 126090 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:16,669-Speed 3055.34 samples/sec Loss 6.4652 LearningRate 0.0242 Epoch: 10 Global Step: 126100 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 
13:39:20,097-Speed 2988.08 samples/sec Loss 6.4384 LearningRate 0.0242 Epoch: 10 Global Step: 126110 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:23,478-Speed 3029.38 samples/sec Loss 6.4504 LearningRate 0.0242 Epoch: 10 Global Step: 126120 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:26,872-Speed 3018.65 samples/sec Loss 6.4889 LearningRate 0.0242 Epoch: 10 Global Step: 126130 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:30,330-Speed 2962.33 samples/sec Loss 6.4236 LearningRate 0.0242 Epoch: 10 Global Step: 126140 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:33,780-Speed 2969.00 samples/sec Loss 6.4950 LearningRate 0.0242 Epoch: 10 Global Step: 126150 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:39:37,221-Speed 2976.12 samples/sec Loss 6.4549 LearningRate 0.0242 Epoch: 10 Global Step: 126160 Fp16 Grad Scale: 65536 Required: 12 hours Training: 2022-04-27 13:39:40,626-Speed 3008.97 samples/sec Loss 6.3724 LearningRate 0.0242 Epoch: 10 Global Step: 126170 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:44,065-Speed 2978.00 samples/sec Loss 6.4465 LearningRate 0.0242 Epoch: 10 Global Step: 126180 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:47,482-Speed 3000.73 samples/sec Loss 6.5340 LearningRate 0.0242 Epoch: 10 Global Step: 126190 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:50,830-Speed 3059.68 samples/sec Loss 6.5067 LearningRate 0.0242 Epoch: 10 Global Step: 126200 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:54,246-Speed 2998.00 samples/sec Loss 6.4610 LearningRate 0.0242 Epoch: 10 Global Step: 126210 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:39:57,604-Speed 3050.80 samples/sec Loss 6.4372 LearningRate 0.0242 Epoch: 10 Global Step: 126220 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:00,960-Speed 3052.20 
samples/sec Loss 6.4716 LearningRate 0.0242 Epoch: 10 Global Step: 126230 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:04,330-Speed 3038.50 samples/sec Loss 6.5408 LearningRate 0.0242 Epoch: 10 Global Step: 126240 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:07,661-Speed 3075.56 samples/sec Loss 6.4752 LearningRate 0.0242 Epoch: 10 Global Step: 126250 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:11,061-Speed 3012.58 samples/sec Loss 6.5862 LearningRate 0.0242 Epoch: 10 Global Step: 126260 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:14,473-Speed 3001.87 samples/sec Loss 6.3982 LearningRate 0.0242 Epoch: 10 Global Step: 126270 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:17,813-Speed 3066.78 samples/sec Loss 6.4353 LearningRate 0.0242 Epoch: 10 Global Step: 126280 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:21,195-Speed 3029.53 samples/sec Loss 6.6039 LearningRate 0.0242 Epoch: 10 Global Step: 126290 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:24,556-Speed 3047.35 samples/sec Loss 6.4683 LearningRate 0.0242 Epoch: 10 Global Step: 126300 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:27,948-Speed 3019.85 samples/sec Loss 6.4381 LearningRate 0.0242 Epoch: 10 Global Step: 126310 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:31,328-Speed 3030.73 samples/sec Loss 6.4479 LearningRate 0.0242 Epoch: 10 Global Step: 126320 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:34,712-Speed 3026.90 samples/sec Loss 6.6580 LearningRate 0.0242 Epoch: 10 Global Step: 126330 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:38,059-Speed 3060.18 samples/sec Loss 6.4765 LearningRate 0.0241 Epoch: 10 Global Step: 126340 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:41,450-Speed 3020.83 samples/sec Loss 6.4192 
LearningRate 0.0241 Epoch: 10 Global Step: 126350 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:44,763-Speed 3091.58 samples/sec Loss 6.6319 LearningRate 0.0241 Epoch: 10 Global Step: 126360 Fp16 Grad Scale: 16384 Required: 12 hours Training: 2022-04-27 13:40:48,139-Speed 3034.11 samples/sec Loss 6.4668 LearningRate 0.0241 Epoch: 10 Global Step: 126370 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:51,499-Speed 3048.39 samples/sec Loss 6.5178 LearningRate 0.0241 Epoch: 10 Global Step: 126380 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:54,862-Speed 3046.13 samples/sec Loss 6.5669 LearningRate 0.0241 Epoch: 10 Global Step: 126390 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:40:58,217-Speed 3052.75 samples/sec Loss 6.5568 LearningRate 0.0241 Epoch: 10 Global Step: 126400 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:41:01,606-Speed 3023.17 samples/sec Loss 6.5423 LearningRate 0.0241 Epoch: 10 Global Step: 126410 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:41:04,963-Speed 3050.70 samples/sec Loss 6.5376 LearningRate 0.0241 Epoch: 10 Global Step: 126420 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:41:08,370-Speed 3006.26 samples/sec Loss 6.5064 LearningRate 0.0241 Epoch: 10 Global Step: 126430 Fp16 Grad Scale: 32768 Required: 12 hours Training: 2022-04-27 13:41:11,798-Speed 2988.72 samples/sec Loss 6.6052 LearningRate 0.0241 Epoch: 10 Global Step: 126440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:41:15,274-Speed 2945.77 samples/sec Loss 6.4939 LearningRate 0.0241 Epoch: 10 Global Step: 126450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:41:18,731-Speed 2963.84 samples/sec Loss 6.5456 LearningRate 0.0241 Epoch: 10 Global Step: 126460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:41:22,136-Speed 3008.22 samples/sec Loss 6.5709 LearningRate 0.0241 Epoch: 10 
Global Step: 126470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:25,525-Speed 3022.42 samples/sec Loss 6.5668 LearningRate 0.0241 Epoch: 10 Global Step: 126480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:29,000-Speed 2947.36 samples/sec Loss 6.6277 LearningRate 0.0241 Epoch: 10 Global Step: 126490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:32,342-Speed 3064.76 samples/sec Loss 6.4183 LearningRate 0.0241 Epoch: 10 Global Step: 126500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:35,695-Speed 3055.75 samples/sec Loss 6.5807 LearningRate 0.0241 Epoch: 10 Global Step: 126510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:39,057-Speed 3046.58 samples/sec Loss 6.6678 LearningRate 0.0241 Epoch: 10 Global Step: 126520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:42,472-Speed 2999.19 samples/sec Loss 6.5791 LearningRate 0.0241 Epoch: 10 Global Step: 126530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:45,790-Speed 3086.88 samples/sec Loss 6.5730 LearningRate 0.0241 Epoch: 10 Global Step: 126540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:49,196-Speed 3007.11 samples/sec Loss 6.4837 LearningRate 0.0241 Epoch: 10 Global Step: 126550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:52,555-Speed 3049.62 samples/sec Loss 6.5307 LearningRate 0.0241 Epoch: 10 Global Step: 126560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:41:55,902-Speed 3061.10 samples/sec Loss 6.5437 LearningRate 0.0241 Epoch: 10 Global Step: 126570 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:41:59,249-Speed 3060.02 samples/sec Loss 6.6032 LearningRate 0.0241 Epoch: 10 Global Step: 126580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:02,580-Speed 3075.45 samples/sec Loss 6.5684 LearningRate 0.0241 Epoch: 10 Global Step: 126590 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:05,911-Speed 3075.78 samples/sec Loss 6.5412 LearningRate 0.0240 Epoch: 10 Global Step: 126600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:09,419-Speed 2919.20 samples/sec Loss 6.6004 LearningRate 0.0240 Epoch: 10 Global Step: 126610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:12,781-Speed 3047.87 samples/sec Loss 6.5816 LearningRate 0.0240 Epoch: 10 Global Step: 126620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:16,124-Speed 3063.81 samples/sec Loss 6.6217 LearningRate 0.0240 Epoch: 10 Global Step: 126630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:19,617-Speed 2932.95 samples/sec Loss 6.4294 LearningRate 0.0240 Epoch: 10 Global Step: 126640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:22,942-Speed 3080.50 samples/sec Loss 6.5965 LearningRate 0.0240 Epoch: 10 Global Step: 126650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:26,365-Speed 2992.67 samples/sec Loss 6.5677 LearningRate 0.0240 Epoch: 10 Global Step: 126660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:29,729-Speed 3045.15 samples/sec Loss 6.5721 LearningRate 0.0240 Epoch: 10 Global Step: 126670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:33,063-Speed 3072.66 samples/sec Loss 6.4920 LearningRate 0.0240 Epoch: 10 Global Step: 126680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:36,448-Speed 3025.16 samples/sec Loss 6.6268 LearningRate 0.0240 Epoch: 10 Global Step: 126690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:39,840-Speed 3020.05 samples/sec Loss 6.6050 LearningRate 0.0240 Epoch: 10 Global Step: 126700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:43,264-Speed 2992.11 samples/sec Loss 6.6251 LearningRate 0.0240 Epoch: 10 Global Step: 126710 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 13:42:46,698-Speed 2982.67 samples/sec Loss 6.4315 LearningRate 0.0240 Epoch: 10 Global Step: 126720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:42:50,044-Speed 3061.73 samples/sec Loss 6.6307 LearningRate 0.0240 Epoch: 10 Global Step: 126730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:53,439-Speed 3017.06 samples/sec Loss 6.5870 LearningRate 0.0240 Epoch: 10 Global Step: 126740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:42:56,874-Speed 2981.46 samples/sec Loss 6.5051 LearningRate 0.0240 Epoch: 10 Global Step: 126750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:00,315-Speed 2976.72 samples/sec Loss 6.5730 LearningRate 0.0240 Epoch: 10 Global Step: 126760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:03,755-Speed 2978.08 samples/sec Loss 6.6559 LearningRate 0.0240 Epoch: 10 Global Step: 126770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:07,165-Speed 3003.68 samples/sec Loss 6.6543 LearningRate 0.0240 Epoch: 10 Global Step: 126780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:10,683-Speed 2911.16 samples/sec Loss 6.5950 LearningRate 0.0240 Epoch: 10 Global Step: 126790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:14,130-Speed 2972.08 samples/sec Loss 6.5930 LearningRate 0.0240 Epoch: 10 Global Step: 126800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:17,553-Speed 2991.69 samples/sec Loss 6.7334 LearningRate 0.0240 Epoch: 10 Global Step: 126810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:20,907-Speed 3053.93 samples/sec Loss 6.5302 LearningRate 0.0240 Epoch: 10 Global Step: 126820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:43:24,278-Speed 3038.79 samples/sec Loss 6.6522 LearningRate 0.0240 Epoch: 10 Global Step: 126830 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 
13:43:27,622-Speed 3063.65 samples/sec Loss 6.5998 LearningRate 0.0240 Epoch: 10 Global Step: 126840 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:43:31,039-Speed 2997.77 samples/sec Loss 6.5327 LearningRate 0.0239 Epoch: 10 Global Step: 126850 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:43:34,471-Speed 2984.30 samples/sec Loss 6.6503 LearningRate 0.0239 Epoch: 10 Global Step: 126860 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:43:37,894-Speed 2992.81 samples/sec Loss 6.6288 LearningRate 0.0239 Epoch: 10 Global Step: 126870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:43:41,227-Speed 3073.34 samples/sec Loss 6.5731 LearningRate 0.0239 Epoch: 10 Global Step: 126880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:43:44,570-Speed 3064.53 samples/sec Loss 6.6108 LearningRate 0.0239 Epoch: 10 Global Step: 126890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:43:47,945-Speed 3034.91 samples/sec Loss 6.5334 LearningRate 0.0239 Epoch: 10 Global Step: 126900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:43:51,441-Speed 2930.12 samples/sec Loss 6.6256 LearningRate 0.0239 Epoch: 10 Global Step: 126910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:43:54,865-Speed 2991.61 samples/sec Loss 6.6706 LearningRate 0.0239 Epoch: 10 Global Step: 126920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:43:58,360-Speed 2930.63 samples/sec Loss 6.6726 LearningRate 0.0239 Epoch: 10 Global Step: 126930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:44:01,801-Speed 2976.82 samples/sec Loss 6.5860 LearningRate 0.0239 Epoch: 10 Global Step: 126940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:44:05,183-Speed 3028.81 samples/sec Loss 6.6217 LearningRate 0.0239 Epoch: 10 Global Step: 126950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:44:08,622-Speed 2978.25 
samples/sec Loss 6.6440 LearningRate 0.0239 Epoch: 10 Global Step: 126960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:44:12,055-Speed 2983.59 samples/sec Loss 6.6083 LearningRate 0.0239 Epoch: 10 Global Step: 126970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:15,456-Speed 3012.12 samples/sec Loss 6.6211 LearningRate 0.0239 Epoch: 10 Global Step: 126980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:18,857-Speed 3011.94 samples/sec Loss 6.6406 LearningRate 0.0239 Epoch: 10 Global Step: 126990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:22,285-Speed 2987.37 samples/sec Loss 6.6422 LearningRate 0.0239 Epoch: 10 Global Step: 127000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:25,655-Speed 3039.62 samples/sec Loss 6.7185 LearningRate 0.0239 Epoch: 10 Global Step: 127010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:29,075-Speed 2995.49 samples/sec Loss 6.6729 LearningRate 0.0239 Epoch: 10 Global Step: 127020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:32,507-Speed 2983.70 samples/sec Loss 6.7529 LearningRate 0.0239 Epoch: 10 Global Step: 127030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:35,860-Speed 3054.77 samples/sec Loss 6.7620 LearningRate 0.0239 Epoch: 10 Global Step: 127040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:39,251-Speed 3021.36 samples/sec Loss 6.7446 LearningRate 0.0239 Epoch: 10 Global Step: 127050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:42,667-Speed 2998.58 samples/sec Loss 6.7133 LearningRate 0.0239 Epoch: 10 Global Step: 127060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:46,156-Speed 2935.39 samples/sec Loss 6.6538 LearningRate 0.0239 Epoch: 10 Global Step: 127070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:44:49,522-Speed 3043.22 samples/sec Loss 6.6232 
LearningRate 0.0239 Epoch: 10 Global Step: 127080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:52,865-Speed 3063.94 samples/sec Loss 6.7325 LearningRate 0.0239 Epoch: 10 Global Step: 127090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:56,332-Speed 2954.31 samples/sec Loss 6.7782 LearningRate 0.0239 Epoch: 10 Global Step: 127100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:44:59,730-Speed 3014.37 samples/sec Loss 6.7065 LearningRate 0.0238 Epoch: 10 Global Step: 127110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:03,205-Speed 2948.20 samples/sec Loss 6.6192 LearningRate 0.0238 Epoch: 10 Global Step: 127120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:06,596-Speed 3020.27 samples/sec Loss 6.6021 LearningRate 0.0238 Epoch: 10 Global Step: 127130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:09,997-Speed 3011.77 samples/sec Loss 6.7369 LearningRate 0.0238 Epoch: 10 Global Step: 127140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:13,468-Speed 2950.96 samples/sec Loss 6.5920 LearningRate 0.0238 Epoch: 10 Global Step: 127150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:16,838-Speed 3039.94 samples/sec Loss 6.7874 LearningRate 0.0238 Epoch: 10 Global Step: 127160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:20,197-Speed 3049.83 samples/sec Loss 6.7061 LearningRate 0.0238 Epoch: 10 Global Step: 127170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:23,686-Speed 2935.57 samples/sec Loss 6.5805 LearningRate 0.0238 Epoch: 10 Global Step: 127180 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:45:27,086-Speed 3012.64 samples/sec Loss 6.6558 LearningRate 0.0238 Epoch: 10 Global Step: 127190 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:45:30,430-Speed 3063.01 samples/sec Loss 6.7513 LearningRate 0.0238 Epoch: 10 
Global Step: 127200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:33,883-Speed 2966.97 samples/sec Loss 6.6749 LearningRate 0.0238 Epoch: 10 Global Step: 127210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:37,275-Speed 3019.96 samples/sec Loss 6.6565 LearningRate 0.0238 Epoch: 10 Global Step: 127220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:40,629-Speed 3053.90 samples/sec Loss 6.6621 LearningRate 0.0238 Epoch: 10 Global Step: 127230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:44,030-Speed 3011.17 samples/sec Loss 6.7945 LearningRate 0.0238 Epoch: 10 Global Step: 127240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:47,382-Speed 3055.73 samples/sec Loss 6.5860 LearningRate 0.0238 Epoch: 10 Global Step: 127250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:50,782-Speed 3013.07 samples/sec Loss 6.7285 LearningRate 0.0238 Epoch: 10 Global Step: 127260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:54,176-Speed 3017.86 samples/sec Loss 6.6718 LearningRate 0.0238 Epoch: 10 Global Step: 127270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:45:57,564-Speed 3023.20 samples/sec Loss 6.6557 LearningRate 0.0238 Epoch: 10 Global Step: 127280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:00,888-Speed 3082.07 samples/sec Loss 6.7167 LearningRate 0.0238 Epoch: 10 Global Step: 127290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:04,258-Speed 3039.42 samples/sec Loss 6.6355 LearningRate 0.0238 Epoch: 10 Global Step: 127300 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 13:46:07,589-Speed 3074.80 samples/sec Loss 6.6695 LearningRate 0.0238 Epoch: 10 Global Step: 127310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:10,961-Speed 3037.52 samples/sec Loss 6.6651 LearningRate 0.0238 Epoch: 10 Global Step: 127320 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:14,305-Speed 3063.60 samples/sec Loss 6.8105 LearningRate 0.0238 Epoch: 10 Global Step: 127330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:17,684-Speed 3031.17 samples/sec Loss 6.8253 LearningRate 0.0238 Epoch: 10 Global Step: 127340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:21,020-Speed 3070.10 samples/sec Loss 6.8753 LearningRate 0.0238 Epoch: 10 Global Step: 127350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:46:24,381-Speed 3047.85 samples/sec Loss 6.6991 LearningRate 0.0237 Epoch: 10 Global Step: 127360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:27,718-Speed 3069.70 samples/sec Loss 6.7485 LearningRate 0.0237 Epoch: 10 Global Step: 127370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:31,054-Speed 3070.05 samples/sec Loss 6.6324 LearningRate 0.0237 Epoch: 10 Global Step: 127380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:34,396-Speed 3065.99 samples/sec Loss 6.7974 LearningRate 0.0237 Epoch: 10 Global Step: 127390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:37,840-Speed 2973.42 samples/sec Loss 6.8166 LearningRate 0.0237 Epoch: 10 Global Step: 127400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:41,185-Speed 3061.83 samples/sec Loss 6.6368 LearningRate 0.0237 Epoch: 10 Global Step: 127410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:44,540-Speed 3053.42 samples/sec Loss 6.6286 LearningRate 0.0237 Epoch: 10 Global Step: 127420 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:47,932-Speed 3019.97 samples/sec Loss 6.7430 LearningRate 0.0237 Epoch: 10 Global Step: 127430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:51,275-Speed 3064.08 samples/sec Loss 6.7051 LearningRate 0.0237 Epoch: 10 Global Step: 127440 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 13:46:54,622-Speed 3060.48 samples/sec Loss 6.7735 LearningRate 0.0237 Epoch: 10 Global Step: 127450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:46:58,012-Speed 3021.88 samples/sec Loss 6.8179 LearningRate 0.0237 Epoch: 10 Global Step: 127460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:01,418-Speed 3007.66 samples/sec Loss 6.7189 LearningRate 0.0237 Epoch: 10 Global Step: 127470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:04,799-Speed 3029.90 samples/sec Loss 6.7108 LearningRate 0.0237 Epoch: 10 Global Step: 127480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:08,151-Speed 3055.70 samples/sec Loss 6.6849 LearningRate 0.0237 Epoch: 10 Global Step: 127490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:11,601-Speed 2968.26 samples/sec Loss 6.4461 LearningRate 0.0237 Epoch: 10 Global Step: 127500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:15,007-Speed 3007.80 samples/sec Loss 6.7124 LearningRate 0.0237 Epoch: 10 Global Step: 127510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:18,335-Speed 3078.51 samples/sec Loss 6.6616 LearningRate 0.0237 Epoch: 10 Global Step: 127520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:21,697-Speed 3046.36 samples/sec Loss 6.6771 LearningRate 0.0237 Epoch: 10 Global Step: 127530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:25,116-Speed 2995.99 samples/sec Loss 6.8909 LearningRate 0.0237 Epoch: 10 Global Step: 127540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:47:28,500-Speed 3026.77 samples/sec Loss 6.6828 LearningRate 0.0237 Epoch: 10 Global Step: 127550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:31,912-Speed 3002.23 samples/sec Loss 6.7735 LearningRate 0.0237 Epoch: 10 Global Step: 127560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
13:47:35,249-Speed 3069.16 samples/sec Loss 6.6929 LearningRate 0.0237 Epoch: 10 Global Step: 127570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:38,597-Speed 3059.44 samples/sec Loss 6.6958 LearningRate 0.0237 Epoch: 10 Global Step: 127580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:42,040-Speed 2975.08 samples/sec Loss 6.7292 LearningRate 0.0237 Epoch: 10 Global Step: 127590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:45,471-Speed 2985.36 samples/sec Loss 6.7470 LearningRate 0.0237 Epoch: 10 Global Step: 127600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:48,949-Speed 2945.64 samples/sec Loss 6.6902 LearningRate 0.0237 Epoch: 10 Global Step: 127610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:52,349-Speed 3012.48 samples/sec Loss 6.8693 LearningRate 0.0236 Epoch: 10 Global Step: 127620 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:55,761-Speed 3002.36 samples/sec Loss 6.7202 LearningRate 0.0236 Epoch: 10 Global Step: 127630 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:47:59,225-Speed 2956.88 samples/sec Loss 6.8137 LearningRate 0.0236 Epoch: 10 Global Step: 127640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:48:02,761-Speed 2896.88 samples/sec Loss 6.7930 LearningRate 0.0236 Epoch: 10 Global Step: 127650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:06,227-Speed 2955.35 samples/sec Loss 6.8168 LearningRate 0.0236 Epoch: 10 Global Step: 127660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:09,604-Speed 3032.56 samples/sec Loss 6.7778 LearningRate 0.0236 Epoch: 10 Global Step: 127670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:13,022-Speed 2997.14 samples/sec Loss 6.7971 LearningRate 0.0236 Epoch: 10 Global Step: 127680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:16,441-Speed 2996.47 
samples/sec Loss 6.7495 LearningRate 0.0236 Epoch: 10 Global Step: 127690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:19,845-Speed 3009.67 samples/sec Loss 6.7199 LearningRate 0.0236 Epoch: 10 Global Step: 127700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:23,270-Speed 2990.05 samples/sec Loss 6.7567 LearningRate 0.0236 Epoch: 10 Global Step: 127710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:26,715-Speed 2973.12 samples/sec Loss 6.7710 LearningRate 0.0236 Epoch: 10 Global Step: 127720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:30,180-Speed 2956.26 samples/sec Loss 6.7569 LearningRate 0.0236 Epoch: 10 Global Step: 127730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:33,649-Speed 2952.82 samples/sec Loss 6.7253 LearningRate 0.0236 Epoch: 10 Global Step: 127740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:48:37,012-Speed 3045.44 samples/sec Loss 6.7741 LearningRate 0.0236 Epoch: 10 Global Step: 127750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:48:40,443-Speed 2986.11 samples/sec Loss 6.7234 LearningRate 0.0236 Epoch: 10 Global Step: 127760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:48:43,802-Speed 3049.28 samples/sec Loss 6.7740 LearningRate 0.0236 Epoch: 10 Global Step: 127770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:48:47,174-Speed 3036.78 samples/sec Loss 6.7623 LearningRate 0.0236 Epoch: 10 Global Step: 127780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:48:50,573-Speed 3013.57 samples/sec Loss 6.7893 LearningRate 0.0236 Epoch: 10 Global Step: 127790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:48:54,060-Speed 2937.98 samples/sec Loss 6.8502 LearningRate 0.0236 Epoch: 10 Global Step: 127800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:48:57,422-Speed 3046.83 samples/sec Loss 6.8182 
LearningRate 0.0236 Epoch: 10 Global Step: 127810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:00,821-Speed 3014.12 samples/sec Loss 6.7535 LearningRate 0.0236 Epoch: 10 Global Step: 127820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:04,267-Speed 2971.80 samples/sec Loss 6.8717 LearningRate 0.0236 Epoch: 10 Global Step: 127830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:07,654-Speed 3023.92 samples/sec Loss 6.7252 LearningRate 0.0236 Epoch: 10 Global Step: 127840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:11,070-Speed 2999.09 samples/sec Loss 6.7770 LearningRate 0.0236 Epoch: 10 Global Step: 127850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:14,451-Speed 3029.66 samples/sec Loss 6.7028 LearningRate 0.0236 Epoch: 10 Global Step: 127860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:17,893-Speed 2976.01 samples/sec Loss 6.6952 LearningRate 0.0235 Epoch: 10 Global Step: 127870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:21,280-Speed 3024.45 samples/sec Loss 6.8163 LearningRate 0.0235 Epoch: 10 Global Step: 127880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:24,707-Speed 2988.74 samples/sec Loss 6.9531 LearningRate 0.0235 Epoch: 10 Global Step: 127890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:28,091-Speed 3026.84 samples/sec Loss 6.7527 LearningRate 0.0235 Epoch: 10 Global Step: 127900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:31,558-Speed 2954.02 samples/sec Loss 6.7075 LearningRate 0.0235 Epoch: 10 Global Step: 127910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:49:34,941-Speed 3027.90 samples/sec Loss 6.7910 LearningRate 0.0235 Epoch: 10 Global Step: 127920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:49:38,317-Speed 3034.41 samples/sec Loss 6.8160 LearningRate 0.0235 Epoch: 10 
Global Step: 127930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:49:41,671-Speed 3053.27 samples/sec Loss 6.7467 LearningRate 0.0235 Epoch: 10 Global Step: 127940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:45,154-Speed 2940.95 samples/sec Loss 6.9076 LearningRate 0.0235 Epoch: 10 Global Step: 127950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:48,653-Speed 2927.75 samples/sec Loss 6.7608 LearningRate 0.0235 Epoch: 10 Global Step: 127960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:52,027-Speed 3035.86 samples/sec Loss 6.7906 LearningRate 0.0235 Epoch: 10 Global Step: 127970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:55,504-Speed 2945.79 samples/sec Loss 6.7781 LearningRate 0.0235 Epoch: 10 Global Step: 127980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:49:58,895-Speed 3021.24 samples/sec Loss 6.8608 LearningRate 0.0235 Epoch: 10 Global Step: 127990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:02,347-Speed 2967.42 samples/sec Loss 6.8390 LearningRate 0.0235 Epoch: 10 Global Step: 128000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:05,717-Speed 3039.13 samples/sec Loss 6.8900 LearningRate 0.0235 Epoch: 10 Global Step: 128010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:09,109-Speed 3019.89 samples/sec Loss 6.9161 LearningRate 0.0235 Epoch: 10 Global Step: 128020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:12,514-Speed 3008.28 samples/sec Loss 6.8139 LearningRate 0.0235 Epoch: 10 Global Step: 128030 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:15,852-Speed 3067.80 samples/sec Loss 6.7379 LearningRate 0.0235 Epoch: 10 Global Step: 128040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:19,305-Speed 2967.08 samples/sec Loss 6.7810 LearningRate 0.0235 Epoch: 10 Global Step: 128050 Fp16 Grad 
Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:22,660-Speed 3052.89 samples/sec Loss 6.9030 LearningRate 0.0235 Epoch: 10 Global Step: 128060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:26,052-Speed 3019.94 samples/sec Loss 6.7678 LearningRate 0.0235 Epoch: 10 Global Step: 128070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:29,412-Speed 3049.07 samples/sec Loss 6.8278 LearningRate 0.0235 Epoch: 10 Global Step: 128080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:32,720-Speed 3096.69 samples/sec Loss 6.8569 LearningRate 0.0235 Epoch: 10 Global Step: 128090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:50:36,153-Speed 2982.84 samples/sec Loss 6.7706 LearningRate 0.0235 Epoch: 10 Global Step: 128100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:39,514-Speed 3047.89 samples/sec Loss 6.7828 LearningRate 0.0235 Epoch: 10 Global Step: 128110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:42,817-Speed 3101.91 samples/sec Loss 6.8905 LearningRate 0.0235 Epoch: 10 Global Step: 128120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:46,128-Speed 3093.25 samples/sec Loss 6.7958 LearningRate 0.0234 Epoch: 10 Global Step: 128130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:49,443-Speed 3090.00 samples/sec Loss 6.8114 LearningRate 0.0234 Epoch: 10 Global Step: 128140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:52,789-Speed 3060.75 samples/sec Loss 6.7669 LearningRate 0.0234 Epoch: 10 Global Step: 128150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:56,149-Speed 3049.01 samples/sec Loss 6.8469 LearningRate 0.0234 Epoch: 10 Global Step: 128160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:50:59,526-Speed 3033.15 samples/sec Loss 6.7666 LearningRate 0.0234 Epoch: 10 Global Step: 128170 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 13:51:02,892-Speed 3042.76 samples/sec Loss 6.8537 LearningRate 0.0234 Epoch: 10 Global Step: 128180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:51:06,311-Speed 2996.32 samples/sec Loss 6.7936 LearningRate 0.0234 Epoch: 10 Global Step: 128190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:51:09,743-Speed 2984.68 samples/sec Loss 6.7874 LearningRate 0.0234 Epoch: 10 Global Step: 128200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:13,100-Speed 3050.44 samples/sec Loss 6.8008 LearningRate 0.0234 Epoch: 10 Global Step: 128210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:16,452-Speed 3056.22 samples/sec Loss 6.8944 LearningRate 0.0234 Epoch: 10 Global Step: 128220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:19,798-Speed 3061.39 samples/sec Loss 6.9059 LearningRate 0.0234 Epoch: 10 Global Step: 128230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:23,179-Speed 3029.01 samples/sec Loss 6.8353 LearningRate 0.0234 Epoch: 10 Global Step: 128240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:26,486-Speed 3097.55 samples/sec Loss 6.8659 LearningRate 0.0234 Epoch: 10 Global Step: 128250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:29,947-Speed 2959.65 samples/sec Loss 6.9914 LearningRate 0.0234 Epoch: 10 Global Step: 128260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:33,420-Speed 2949.58 samples/sec Loss 6.8156 LearningRate 0.0234 Epoch: 10 Global Step: 128270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:36,829-Speed 3004.65 samples/sec Loss 6.7319 LearningRate 0.0234 Epoch: 10 Global Step: 128280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:51:40,224-Speed 3017.39 samples/sec Loss 6.9097 LearningRate 0.0234 Epoch: 10 Global Step: 128290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
13:51:43,553-Speed 3076.67 samples/sec Loss 6.8184 LearningRate 0.0234 Epoch: 10 Global Step: 128300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:51:46,923-Speed 3039.00 samples/sec Loss 6.8909 LearningRate 0.0234 Epoch: 10 Global Step: 128310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:51:50,261-Speed 3068.96 samples/sec Loss 6.8450 LearningRate 0.0234 Epoch: 10 Global Step: 128320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:51:53,731-Speed 2952.23 samples/sec Loss 6.9126 LearningRate 0.0234 Epoch: 10 Global Step: 128330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:51:57,131-Speed 3012.56 samples/sec Loss 6.9310 LearningRate 0.0234 Epoch: 10 Global Step: 128340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:00,522-Speed 3020.55 samples/sec Loss 6.9385 LearningRate 0.0234 Epoch: 10 Global Step: 128350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:03,867-Speed 3062.63 samples/sec Loss 6.7788 LearningRate 0.0234 Epoch: 10 Global Step: 128360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:07,315-Speed 2970.79 samples/sec Loss 6.9249 LearningRate 0.0234 Epoch: 10 Global Step: 128370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:10,662-Speed 3060.38 samples/sec Loss 6.9016 LearningRate 0.0233 Epoch: 10 Global Step: 128380 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:14,105-Speed 2975.18 samples/sec Loss 6.8308 LearningRate 0.0233 Epoch: 10 Global Step: 128390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:17,499-Speed 3018.73 samples/sec Loss 6.8667 LearningRate 0.0233 Epoch: 10 Global Step: 128400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:20,887-Speed 3023.11 samples/sec Loss 6.7497 LearningRate 0.0233 Epoch: 10 Global Step: 128410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:24,262-Speed 3035.27 
samples/sec Loss 6.9269 LearningRate 0.0233 Epoch: 10 Global Step: 128420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:27,640-Speed 3032.81 samples/sec Loss 6.9395 LearningRate 0.0233 Epoch: 10 Global Step: 128430 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:31,020-Speed 3030.27 samples/sec Loss 6.9142 LearningRate 0.0233 Epoch: 10 Global Step: 128440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:34,399-Speed 3031.36 samples/sec Loss 6.9053 LearningRate 0.0233 Epoch: 10 Global Step: 128450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:37,771-Speed 3037.60 samples/sec Loss 6.8686 LearningRate 0.0233 Epoch: 10 Global Step: 128460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:41,125-Speed 3053.45 samples/sec Loss 6.9450 LearningRate 0.0233 Epoch: 10 Global Step: 128470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:52:44,505-Speed 3030.95 samples/sec Loss 6.8497 LearningRate 0.0233 Epoch: 10 Global Step: 128480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:47,872-Speed 3041.75 samples/sec Loss 6.8068 LearningRate 0.0233 Epoch: 10 Global Step: 128490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:51,301-Speed 2987.32 samples/sec Loss 6.8913 LearningRate 0.0233 Epoch: 10 Global Step: 128500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:54,686-Speed 3025.70 samples/sec Loss 6.7886 LearningRate 0.0233 Epoch: 10 Global Step: 128510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:52:58,105-Speed 2996.13 samples/sec Loss 6.7965 LearningRate 0.0233 Epoch: 10 Global Step: 128520 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:01,473-Speed 3041.49 samples/sec Loss 6.8061 LearningRate 0.0233 Epoch: 10 Global Step: 128530 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:04,899-Speed 2990.19 samples/sec Loss 6.7809 
LearningRate 0.0233 Epoch: 10 Global Step: 128540 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:08,259-Speed 3048.27 samples/sec Loss 6.7852 LearningRate 0.0233 Epoch: 10 Global Step: 128550 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:11,611-Speed 3055.66 samples/sec Loss 6.8041 LearningRate 0.0233 Epoch: 10 Global Step: 128560 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:14,963-Speed 3055.92 samples/sec Loss 6.7177 LearningRate 0.0233 Epoch: 10 Global Step: 128570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:18,400-Speed 2979.58 samples/sec Loss 6.8679 LearningRate 0.0233 Epoch: 10 Global Step: 128580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:21,900-Speed 2927.26 samples/sec Loss 6.8977 LearningRate 0.0233 Epoch: 10 Global Step: 128590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:25,384-Speed 2939.26 samples/sec Loss 6.8560 LearningRate 0.0233 Epoch: 10 Global Step: 128600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:28,784-Speed 3012.98 samples/sec Loss 6.8544 LearningRate 0.0233 Epoch: 10 Global Step: 128610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:53:32,237-Speed 2966.20 samples/sec Loss 6.8048 LearningRate 0.0233 Epoch: 10 Global Step: 128620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:35,642-Speed 3007.93 samples/sec Loss 6.8293 LearningRate 0.0233 Epoch: 10 Global Step: 128630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:39,031-Speed 3022.98 samples/sec Loss 6.8338 LearningRate 0.0232 Epoch: 10 Global Step: 128640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:42,370-Speed 3067.36 samples/sec Loss 6.8560 LearningRate 0.0232 Epoch: 10 Global Step: 128650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:45,737-Speed 3042.46 samples/sec Loss 6.8878 LearningRate 0.0232 Epoch: 10 
Global Step: 128660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:49,131-Speed 3017.56 samples/sec Loss 6.7456 LearningRate 0.0232 Epoch: 10 Global Step: 128670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:52,623-Speed 2933.39 samples/sec Loss 6.8645 LearningRate 0.0232 Epoch: 10 Global Step: 128680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:56,018-Speed 3016.55 samples/sec Loss 6.7455 LearningRate 0.0232 Epoch: 10 Global Step: 128690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:53:59,393-Speed 3035.15 samples/sec Loss 6.8661 LearningRate 0.0232 Epoch: 10 Global Step: 128700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:02,764-Speed 3038.41 samples/sec Loss 6.8158 LearningRate 0.0232 Epoch: 10 Global Step: 128710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:06,126-Speed 3046.62 samples/sec Loss 6.7839 LearningRate 0.0232 Epoch: 10 Global Step: 128720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:54:09,527-Speed 3012.10 samples/sec Loss 6.8543 LearningRate 0.0232 Epoch: 10 Global Step: 128730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:54:12,973-Speed 2972.35 samples/sec Loss 6.7645 LearningRate 0.0232 Epoch: 10 Global Step: 128740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:54:16,468-Speed 2930.13 samples/sec Loss 6.8080 LearningRate 0.0232 Epoch: 10 Global Step: 128750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:54:19,871-Speed 3010.48 samples/sec Loss 6.9215 LearningRate 0.0232 Epoch: 10 Global Step: 128760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:23,369-Speed 2927.64 samples/sec Loss 6.9507 LearningRate 0.0232 Epoch: 10 Global Step: 128770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:26,758-Speed 3022.56 samples/sec Loss 6.8028 LearningRate 0.0232 Epoch: 10 Global Step: 128780 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:30,110-Speed 3056.96 samples/sec Loss 6.8660 LearningRate 0.0232 Epoch: 10 Global Step: 128790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:33,496-Speed 3024.37 samples/sec Loss 6.8027 LearningRate 0.0232 Epoch: 10 Global Step: 128800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:36,888-Speed 3019.65 samples/sec Loss 6.8347 LearningRate 0.0232 Epoch: 10 Global Step: 128810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:40,291-Speed 3010.91 samples/sec Loss 6.8674 LearningRate 0.0232 Epoch: 10 Global Step: 128820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:43,656-Speed 3043.62 samples/sec Loss 6.7509 LearningRate 0.0232 Epoch: 10 Global Step: 128830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:47,138-Speed 2941.48 samples/sec Loss 6.8291 LearningRate 0.0232 Epoch: 10 Global Step: 128840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:50,669-Speed 2901.26 samples/sec Loss 6.7773 LearningRate 0.0232 Epoch: 10 Global Step: 128850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:54:54,109-Speed 2977.63 samples/sec Loss 6.8717 LearningRate 0.0232 Epoch: 10 Global Step: 128860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:54:57,441-Speed 3074.57 samples/sec Loss 6.8302 LearningRate 0.0232 Epoch: 10 Global Step: 128870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:55:00,819-Speed 3032.01 samples/sec Loss 6.9008 LearningRate 0.0232 Epoch: 10 Global Step: 128880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:55:04,211-Speed 3019.48 samples/sec Loss 6.8491 LearningRate 0.0232 Epoch: 10 Global Step: 128890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:55:07,589-Speed 3032.18 samples/sec Loss 6.8847 LearningRate 0.0231 Epoch: 10 Global Step: 128900 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-27 13:55:10,960-Speed 3038.67 samples/sec Loss 6.7599 LearningRate 0.0231 Epoch: 10 Global Step: 128910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:14,328-Speed 3041.03 samples/sec Loss 6.8096 LearningRate 0.0231 Epoch: 10 Global Step: 128920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:17,692-Speed 3044.71 samples/sec Loss 6.9350 LearningRate 0.0231 Epoch: 10 Global Step: 128930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:21,067-Speed 3034.48 samples/sec Loss 6.9237 LearningRate 0.0231 Epoch: 10 Global Step: 128940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:24,437-Speed 3039.90 samples/sec Loss 6.9202 LearningRate 0.0231 Epoch: 10 Global Step: 128950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:27,759-Speed 3083.46 samples/sec Loss 6.9118 LearningRate 0.0231 Epoch: 10 Global Step: 128960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:31,095-Speed 3069.90 samples/sec Loss 6.7493 LearningRate 0.0231 Epoch: 10 Global Step: 128970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:34,612-Speed 2912.61 samples/sec Loss 6.7719 LearningRate 0.0231 Epoch: 10 Global Step: 128980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:37,990-Speed 3032.83 samples/sec Loss 6.7658 LearningRate 0.0231 Epoch: 10 Global Step: 128990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:55:41,351-Speed 3047.64 samples/sec Loss 6.8482 LearningRate 0.0231 Epoch: 10 Global Step: 129000 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:55:44,724-Speed 3035.86 samples/sec Loss 6.9224 LearningRate 0.0231 Epoch: 10 Global Step: 129010 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:55:48,072-Speed 3059.38 samples/sec Loss 6.9187 LearningRate 0.0231 Epoch: 10 Global Step: 129020 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
13:55:51,445-Speed 3036.29 samples/sec Loss 6.8325 LearningRate 0.0231 Epoch: 10 Global Step: 129030 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:55:54,919-Speed 2949.06 samples/sec Loss 6.7452 LearningRate 0.0231 Epoch: 10 Global Step: 129040 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:55:58,415-Speed 2929.79 samples/sec Loss 6.8604 LearningRate 0.0231 Epoch: 10 Global Step: 129050 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:56:01,796-Speed 3029.23 samples/sec Loss 6.7620 LearningRate 0.0231 Epoch: 10 Global Step: 129060 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:56:05,206-Speed 3004.12 samples/sec Loss 6.8082 LearningRate 0.0231 Epoch: 10 Global Step: 129070 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:56:08,639-Speed 2983.63 samples/sec Loss 6.7740 LearningRate 0.0231 Epoch: 10 Global Step: 129080 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:56:11,954-Speed 3089.99 samples/sec Loss 6.7972 LearningRate 0.0231 Epoch: 10 Global Step: 129090 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 13:56:15,324-Speed 3039.30 samples/sec Loss 6.8141 LearningRate 0.0231 Epoch: 10 Global Step: 129100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:18,761-Speed 2980.41 samples/sec Loss 6.8878 LearningRate 0.0231 Epoch: 10 Global Step: 129110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:22,148-Speed 3024.02 samples/sec Loss 6.8199 LearningRate 0.0231 Epoch: 10 Global Step: 129120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:25,568-Speed 2995.04 samples/sec Loss 6.9000 LearningRate 0.0231 Epoch: 10 Global Step: 129130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:28,971-Speed 3009.99 samples/sec Loss 6.8500 LearningRate 0.0231 Epoch: 10 Global Step: 129140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:32,301-Speed 3075.85 
samples/sec Loss 6.8003 LearningRate 0.0231 Epoch: 10 Global Step: 129150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:35,670-Speed 3040.39 samples/sec Loss 6.8174 LearningRate 0.0230 Epoch: 10 Global Step: 129160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:39,055-Speed 3025.60 samples/sec Loss 6.7203 LearningRate 0.0230 Epoch: 10 Global Step: 129170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:42,506-Speed 2967.84 samples/sec Loss 6.6564 LearningRate 0.0230 Epoch: 10 Global Step: 129180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:45,895-Speed 3022.85 samples/sec Loss 6.9680 LearningRate 0.0230 Epoch: 10 Global Step: 129190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:56:49,245-Speed 3058.06 samples/sec Loss 6.9370 LearningRate 0.0230 Epoch: 10 Global Step: 129200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:56:52,628-Speed 3028.07 samples/sec Loss 7.0116 LearningRate 0.0230 Epoch: 10 Global Step: 129210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:56:56,007-Speed 3030.40 samples/sec Loss 6.8315 LearningRate 0.0230 Epoch: 10 Global Step: 129220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:56:59,357-Speed 3057.73 samples/sec Loss 6.7695 LearningRate 0.0230 Epoch: 10 Global Step: 129230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:02,693-Speed 3071.05 samples/sec Loss 6.8628 LearningRate 0.0230 Epoch: 10 Global Step: 129240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:06,091-Speed 3013.51 samples/sec Loss 6.9142 LearningRate 0.0230 Epoch: 10 Global Step: 129250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:09,473-Speed 3028.73 samples/sec Loss 6.7615 LearningRate 0.0230 Epoch: 10 Global Step: 129260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:12,909-Speed 2981.22 samples/sec Loss 6.8291 
LearningRate 0.0230 Epoch: 10 Global Step: 129270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:16,260-Speed 3055.92 samples/sec Loss 6.8825 LearningRate 0.0230 Epoch: 10 Global Step: 129280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:19,610-Speed 3058.12 samples/sec Loss 6.9551 LearningRate 0.0230 Epoch: 10 Global Step: 129290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:22,980-Speed 3039.17 samples/sec Loss 6.9610 LearningRate 0.0230 Epoch: 10 Global Step: 129300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:26,324-Speed 3063.07 samples/sec Loss 6.9207 LearningRate 0.0230 Epoch: 10 Global Step: 129310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:57:29,725-Speed 3012.29 samples/sec Loss 6.8758 LearningRate 0.0230 Epoch: 10 Global Step: 129320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:33,106-Speed 3029.91 samples/sec Loss 6.8974 LearningRate 0.0230 Epoch: 10 Global Step: 129330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:36,438-Speed 3074.29 samples/sec Loss 6.9068 LearningRate 0.0230 Epoch: 10 Global Step: 129340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:39,831-Speed 3019.07 samples/sec Loss 6.9367 LearningRate 0.0230 Epoch: 10 Global Step: 129350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:43,207-Speed 3034.04 samples/sec Loss 6.9989 LearningRate 0.0230 Epoch: 10 Global Step: 129360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:46,588-Speed 3028.80 samples/sec Loss 6.7808 LearningRate 0.0230 Epoch: 10 Global Step: 129370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:50,005-Speed 2998.05 samples/sec Loss 6.9134 LearningRate 0.0230 Epoch: 10 Global Step: 129380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:53,348-Speed 3064.36 samples/sec Loss 6.8799 LearningRate 0.0230 Epoch: 10 
Global Step: 129390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:57:56,736-Speed 3023.51 samples/sec Loss 6.9610 LearningRate 0.0230 Epoch: 10 Global Step: 129400 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:00,195-Speed 2961.78 samples/sec Loss 6.8840 LearningRate 0.0230 Epoch: 10 Global Step: 129410 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:03,595-Speed 3011.94 samples/sec Loss 6.7980 LearningRate 0.0229 Epoch: 10 Global Step: 129420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:06,992-Speed 3015.34 samples/sec Loss 6.9542 LearningRate 0.0229 Epoch: 10 Global Step: 129430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:10,462-Speed 2951.78 samples/sec Loss 6.9315 LearningRate 0.0229 Epoch: 10 Global Step: 129440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:13,785-Speed 3082.29 samples/sec Loss 6.9023 LearningRate 0.0229 Epoch: 10 Global Step: 129450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:17,159-Speed 3036.02 samples/sec Loss 6.8373 LearningRate 0.0229 Epoch: 10 Global Step: 129460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:20,495-Speed 3070.83 samples/sec Loss 6.7742 LearningRate 0.0229 Epoch: 10 Global Step: 129470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:23,831-Speed 3070.42 samples/sec Loss 6.9322 LearningRate 0.0229 Epoch: 10 Global Step: 129480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:27,156-Speed 3080.74 samples/sec Loss 6.8969 LearningRate 0.0229 Epoch: 10 Global Step: 129490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:58:30,508-Speed 3055.34 samples/sec Loss 6.9609 LearningRate 0.0229 Epoch: 10 Global Step: 129500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:33,843-Speed 3071.67 samples/sec Loss 6.9233 LearningRate 0.0229 Epoch: 10 Global Step: 129510 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:37,214-Speed 3037.89 samples/sec Loss 6.7821 LearningRate 0.0229 Epoch: 10 Global Step: 129520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:40,596-Speed 3029.67 samples/sec Loss 6.7414 LearningRate 0.0229 Epoch: 10 Global Step: 129530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:43,916-Speed 3084.26 samples/sec Loss 6.7137 LearningRate 0.0229 Epoch: 10 Global Step: 129540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:47,270-Speed 3054.07 samples/sec Loss 6.8529 LearningRate 0.0229 Epoch: 10 Global Step: 129550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:50,689-Speed 2995.97 samples/sec Loss 6.8772 LearningRate 0.0229 Epoch: 10 Global Step: 129560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:54,079-Speed 3021.58 samples/sec Loss 6.8917 LearningRate 0.0229 Epoch: 10 Global Step: 129570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:58:57,542-Speed 2957.44 samples/sec Loss 7.0150 LearningRate 0.0229 Epoch: 10 Global Step: 129580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:00,994-Speed 2967.25 samples/sec Loss 6.9088 LearningRate 0.0229 Epoch: 10 Global Step: 129590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:04,394-Speed 3012.47 samples/sec Loss 7.0159 LearningRate 0.0229 Epoch: 10 Global Step: 129600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:59:07,807-Speed 3001.21 samples/sec Loss 6.8029 LearningRate 0.0229 Epoch: 10 Global Step: 129610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:59:11,194-Speed 3025.25 samples/sec Loss 6.8119 LearningRate 0.0229 Epoch: 10 Global Step: 129620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:59:14,512-Speed 3087.19 samples/sec Loss 6.8679 LearningRate 0.0229 Epoch: 10 Global Step: 129630 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-27 13:59:17,849-Speed 3068.91 samples/sec Loss 6.9938 LearningRate 0.0229 Epoch: 10 Global Step: 129640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:59:21,263-Speed 3001.23 samples/sec Loss 6.8717 LearningRate 0.0229 Epoch: 10 Global Step: 129650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 13:59:24,622-Speed 3048.67 samples/sec Loss 6.8798 LearningRate 0.0229 Epoch: 10 Global Step: 129660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:28,004-Speed 3028.68 samples/sec Loss 6.8677 LearningRate 0.0229 Epoch: 10 Global Step: 129670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:31,390-Speed 3025.35 samples/sec Loss 6.9216 LearningRate 0.0228 Epoch: 10 Global Step: 129680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:34,798-Speed 3005.67 samples/sec Loss 6.8198 LearningRate 0.0228 Epoch: 10 Global Step: 129690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:38,227-Speed 2986.45 samples/sec Loss 6.9358 LearningRate 0.0228 Epoch: 10 Global Step: 129700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:41,644-Speed 2997.86 samples/sec Loss 6.8529 LearningRate 0.0228 Epoch: 10 Global Step: 129710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:45,115-Speed 2950.82 samples/sec Loss 7.0032 LearningRate 0.0228 Epoch: 10 Global Step: 129720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:48,518-Speed 3009.93 samples/sec Loss 6.8796 LearningRate 0.0228 Epoch: 10 Global Step: 129730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:51,877-Speed 3049.48 samples/sec Loss 6.8979 LearningRate 0.0228 Epoch: 10 Global Step: 129740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 13:59:55,187-Speed 3094.49 samples/sec Loss 6.8206 LearningRate 0.0228 Epoch: 10 Global Step: 129750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
13:59:58,599-Speed 3001.43 samples/sec Loss 6.9439 LearningRate 0.0228 Epoch: 10 Global Step: 129760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:02,022-Speed 2992.58 samples/sec Loss 6.9177 LearningRate 0.0228 Epoch: 10 Global Step: 129770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:05,400-Speed 3032.80 samples/sec Loss 6.9222 LearningRate 0.0228 Epoch: 10 Global Step: 129780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:08,777-Speed 3032.53 samples/sec Loss 6.7808 LearningRate 0.0228 Epoch: 10 Global Step: 129790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:12,143-Speed 3042.89 samples/sec Loss 6.8755 LearningRate 0.0228 Epoch: 10 Global Step: 129800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:15,578-Speed 2982.19 samples/sec Loss 6.9765 LearningRate 0.0228 Epoch: 10 Global Step: 129810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:18,969-Speed 3020.48 samples/sec Loss 6.8494 LearningRate 0.0228 Epoch: 10 Global Step: 129820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:22,391-Speed 2993.34 samples/sec Loss 6.9287 LearningRate 0.0228 Epoch: 10 Global Step: 129830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:25,825-Speed 2982.90 samples/sec Loss 6.7738 LearningRate 0.0228 Epoch: 10 Global Step: 129840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:29,326-Speed 2925.99 samples/sec Loss 6.8940 LearningRate 0.0228 Epoch: 10 Global Step: 129850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:32,790-Speed 2956.23 samples/sec Loss 6.9420 LearningRate 0.0228 Epoch: 10 Global Step: 129860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:36,283-Speed 2932.39 samples/sec Loss 6.8703 LearningRate 0.0228 Epoch: 10 Global Step: 129870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:39,676-Speed 3019.08 
samples/sec Loss 6.8921 LearningRate 0.0228 Epoch: 10 Global Step: 129880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:43,041-Speed 3044.01 samples/sec Loss 7.0288 LearningRate 0.0228 Epoch: 10 Global Step: 129890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:46,452-Speed 3002.64 samples/sec Loss 6.8266 LearningRate 0.0228 Epoch: 10 Global Step: 129900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:49,793-Speed 3066.09 samples/sec Loss 6.9560 LearningRate 0.0228 Epoch: 10 Global Step: 129910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:00:53,214-Speed 2994.32 samples/sec Loss 6.9081 LearningRate 0.0228 Epoch: 10 Global Step: 129920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:00:56,629-Speed 2999.80 samples/sec Loss 6.9866 LearningRate 0.0228 Epoch: 10 Global Step: 129930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:00,039-Speed 3003.24 samples/sec Loss 6.8780 LearningRate 0.0227 Epoch: 10 Global Step: 129940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:03,399-Speed 3048.51 samples/sec Loss 6.8947 LearningRate 0.0227 Epoch: 10 Global Step: 129950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:06,773-Speed 3035.22 samples/sec Loss 6.8261 LearningRate 0.0227 Epoch: 10 Global Step: 129960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:10,200-Speed 2989.44 samples/sec Loss 6.9218 LearningRate 0.0227 Epoch: 10 Global Step: 129970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:13,598-Speed 3013.86 samples/sec Loss 6.7885 LearningRate 0.0227 Epoch: 10 Global Step: 129980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:17,002-Speed 3009.31 samples/sec Loss 6.8251 LearningRate 0.0227 Epoch: 10 Global Step: 129990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:20,477-Speed 2947.82 samples/sec Loss 6.9875 
LearningRate 0.0227 Epoch: 10 Global Step: 130000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:23,987-Speed 2917.69 samples/sec Loss 6.7882 LearningRate 0.0227 Epoch: 10 Global Step: 130010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:27,391-Speed 3009.21 samples/sec Loss 6.9032 LearningRate 0.0227 Epoch: 10 Global Step: 130020 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:01:30,732-Speed 3066.45 samples/sec Loss 6.8091 LearningRate 0.0227 Epoch: 10 Global Step: 130030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:01:34,108-Speed 3033.36 samples/sec Loss 6.8576 LearningRate 0.0227 Epoch: 10 Global Step: 130040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:01:37,509-Speed 3011.78 samples/sec Loss 6.7796 LearningRate 0.0227 Epoch: 10 Global Step: 130050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:01:40,879-Speed 3039.84 samples/sec Loss 6.8459 LearningRate 0.0227 Epoch: 10 Global Step: 130060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:01:44,309-Speed 2985.88 samples/sec Loss 6.7975 LearningRate 0.0227 Epoch: 10 Global Step: 130070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:01:47,662-Speed 3054.26 samples/sec Loss 6.9739 LearningRate 0.0227 Epoch: 10 Global Step: 130080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:01:51,158-Speed 2930.80 samples/sec Loss 6.8360 LearningRate 0.0227 Epoch: 10 Global Step: 130090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:01:54,545-Speed 3023.45 samples/sec Loss 6.8727 LearningRate 0.0227 Epoch: 10 Global Step: 130100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:01:57,906-Speed 3047.74 samples/sec Loss 6.9845 LearningRate 0.0227 Epoch: 10 Global Step: 130110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:01,300-Speed 3017.99 samples/sec Loss 6.8126 LearningRate 0.0227 Epoch: 10 
Global Step: 130120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:04,733-Speed 2983.79 samples/sec Loss 6.8700 LearningRate 0.0227 Epoch: 10 Global Step: 130130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:08,079-Speed 3061.47 samples/sec Loss 6.8780 LearningRate 0.0227 Epoch: 10 Global Step: 130140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:11,418-Speed 3067.63 samples/sec Loss 6.8842 LearningRate 0.0227 Epoch: 10 Global Step: 130150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:02:14,826-Speed 3004.94 samples/sec Loss 6.8603 LearningRate 0.0227 Epoch: 10 Global Step: 130160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:02:18,198-Speed 3038.50 samples/sec Loss 6.9488 LearningRate 0.0227 Epoch: 10 Global Step: 130170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:02:21,661-Speed 2957.48 samples/sec Loss 6.8179 LearningRate 0.0227 Epoch: 10 Global Step: 130180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:02:25,074-Speed 3001.37 samples/sec Loss 6.6747 LearningRate 0.0227 Epoch: 10 Global Step: 130190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:02:28,480-Speed 3006.83 samples/sec Loss 6.9741 LearningRate 0.0226 Epoch: 10 Global Step: 130200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:02:31,854-Speed 3036.05 samples/sec Loss 6.8849 LearningRate 0.0226 Epoch: 10 Global Step: 130210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:35,273-Speed 2995.83 samples/sec Loss 6.8250 LearningRate 0.0226 Epoch: 10 Global Step: 130220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:38,684-Speed 3002.60 samples/sec Loss 6.9238 LearningRate 0.0226 Epoch: 10 Global Step: 130230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:42,027-Speed 3063.74 samples/sec Loss 6.8504 LearningRate 0.0226 Epoch: 10 Global Step: 130240 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:45,442-Speed 2999.50 samples/sec Loss 7.0483 LearningRate 0.0226 Epoch: 10 Global Step: 130250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:48,855-Speed 3001.41 samples/sec Loss 6.9445 LearningRate 0.0226 Epoch: 10 Global Step: 130260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:52,324-Speed 2952.77 samples/sec Loss 6.9543 LearningRate 0.0226 Epoch: 10 Global Step: 130270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:55,734-Speed 3003.92 samples/sec Loss 6.9181 LearningRate 0.0226 Epoch: 10 Global Step: 130280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:02:59,156-Speed 2992.74 samples/sec Loss 6.8827 LearningRate 0.0226 Epoch: 10 Global Step: 130290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:03:02,523-Speed 3042.82 samples/sec Loss 7.0048 LearningRate 0.0226 Epoch: 10 Global Step: 130300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:03:05,975-Speed 2966.75 samples/sec Loss 6.8576 LearningRate 0.0226 Epoch: 10 Global Step: 130310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:09,343-Speed 3041.86 samples/sec Loss 6.8109 LearningRate 0.0226 Epoch: 10 Global Step: 130320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:12,720-Speed 3033.33 samples/sec Loss 6.8967 LearningRate 0.0226 Epoch: 10 Global Step: 130330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:16,116-Speed 3016.29 samples/sec Loss 6.8952 LearningRate 0.0226 Epoch: 10 Global Step: 130340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:20,188-Speed 2515.03 samples/sec Loss 6.8729 LearningRate 0.0226 Epoch: 10 Global Step: 130350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:23,548-Speed 3048.18 samples/sec Loss 6.9257 LearningRate 0.0226 Epoch: 10 Global Step: 130360 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-27 14:03:26,876-Speed 3077.56 samples/sec Loss 6.9033 LearningRate 0.0226 Epoch: 10 Global Step: 130370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:30,262-Speed 3026.12 samples/sec Loss 6.9254 LearningRate 0.0226 Epoch: 10 Global Step: 130380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:33,614-Speed 3055.41 samples/sec Loss 6.7553 LearningRate 0.0226 Epoch: 10 Global Step: 130390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:37,028-Speed 3000.85 samples/sec Loss 6.7680 LearningRate 0.0226 Epoch: 10 Global Step: 130400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:40,418-Speed 3021.77 samples/sec Loss 7.0066 LearningRate 0.0226 Epoch: 10 Global Step: 130410 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:03:43,721-Speed 3100.61 samples/sec Loss 6.7791 LearningRate 0.0226 Epoch: 10 Global Step: 130420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:47,154-Speed 2983.94 samples/sec Loss 6.8295 LearningRate 0.0226 Epoch: 10 Global Step: 130430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:50,520-Speed 3043.51 samples/sec Loss 6.9588 LearningRate 0.0226 Epoch: 10 Global Step: 130440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:53,901-Speed 3028.94 samples/sec Loss 6.8625 LearningRate 0.0226 Epoch: 10 Global Step: 130450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:03:57,267-Speed 3043.09 samples/sec Loss 6.9256 LearningRate 0.0225 Epoch: 10 Global Step: 130460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:00,701-Speed 2983.03 samples/sec Loss 6.8585 LearningRate 0.0225 Epoch: 10 Global Step: 130470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:04,067-Speed 3043.14 samples/sec Loss 6.7715 LearningRate 0.0225 Epoch: 10 Global Step: 130480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
14:04:07,463-Speed 3015.83 samples/sec Loss 6.9879 LearningRate 0.0225 Epoch: 10 Global Step: 130490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:10,872-Speed 3004.64 samples/sec Loss 6.9098 LearningRate 0.0225 Epoch: 10 Global Step: 130500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:14,300-Speed 2988.77 samples/sec Loss 6.7659 LearningRate 0.0225 Epoch: 10 Global Step: 130510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:17,694-Speed 3017.18 samples/sec Loss 6.7847 LearningRate 0.0225 Epoch: 10 Global Step: 130520 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:04:21,053-Speed 3049.94 samples/sec Loss 6.8471 LearningRate 0.0225 Epoch: 10 Global Step: 130530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:24,408-Speed 3052.76 samples/sec Loss 6.9513 LearningRate 0.0225 Epoch: 10 Global Step: 130540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:27,784-Speed 3033.84 samples/sec Loss 6.7617 LearningRate 0.0225 Epoch: 10 Global Step: 130550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:31,226-Speed 2976.57 samples/sec Loss 6.8879 LearningRate 0.0225 Epoch: 10 Global Step: 130560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:34,585-Speed 3049.33 samples/sec Loss 6.8876 LearningRate 0.0225 Epoch: 10 Global Step: 130570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:37,954-Speed 3040.49 samples/sec Loss 6.7683 LearningRate 0.0225 Epoch: 10 Global Step: 130580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:41,438-Speed 2939.79 samples/sec Loss 6.9117 LearningRate 0.0225 Epoch: 10 Global Step: 130590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:44,924-Speed 2938.75 samples/sec Loss 6.8997 LearningRate 0.0225 Epoch: 10 Global Step: 130600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:48,336-Speed 3001.62 
samples/sec Loss 6.9291 LearningRate 0.0225 Epoch: 10 Global Step: 130610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:51,731-Speed 3017.07 samples/sec Loss 6.9665 LearningRate 0.0225 Epoch: 10 Global Step: 130620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:04:55,137-Speed 3007.67 samples/sec Loss 6.8215 LearningRate 0.0225 Epoch: 10 Global Step: 130630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:04:58,555-Speed 2996.62 samples/sec Loss 6.8328 LearningRate 0.0225 Epoch: 10 Global Step: 130640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:01,914-Speed 3050.10 samples/sec Loss 6.8802 LearningRate 0.0225 Epoch: 10 Global Step: 130650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:05,340-Speed 2988.72 samples/sec Loss 6.8073 LearningRate 0.0225 Epoch: 10 Global Step: 130660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:08,766-Speed 2990.40 samples/sec Loss 6.8216 LearningRate 0.0225 Epoch: 10 Global Step: 130670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:12,248-Speed 2941.82 samples/sec Loss 6.8056 LearningRate 0.0225 Epoch: 10 Global Step: 130680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:15,596-Speed 3059.57 samples/sec Loss 6.7040 LearningRate 0.0225 Epoch: 10 Global Step: 130690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:19,008-Speed 3002.23 samples/sec Loss 6.7683 LearningRate 0.0225 Epoch: 10 Global Step: 130700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:22,465-Speed 2962.44 samples/sec Loss 6.8949 LearningRate 0.0225 Epoch: 10 Global Step: 130710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:25,821-Speed 3052.03 samples/sec Loss 6.8141 LearningRate 0.0224 Epoch: 10 Global Step: 130720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:29,311-Speed 2934.70 samples/sec Loss 6.9719 
LearningRate 0.0224 Epoch: 10 Global Step: 130730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:32,725-Speed 3000.95 samples/sec Loss 6.9586 LearningRate 0.0224 Epoch: 10 Global Step: 130740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:36,064-Speed 3067.51 samples/sec Loss 6.8089 LearningRate 0.0224 Epoch: 10 Global Step: 130750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:39,440-Speed 3034.55 samples/sec Loss 6.8535 LearningRate 0.0224 Epoch: 10 Global Step: 130760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:42,878-Speed 2979.17 samples/sec Loss 6.7857 LearningRate 0.0224 Epoch: 10 Global Step: 130770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:46,280-Speed 3010.79 samples/sec Loss 6.9298 LearningRate 0.0224 Epoch: 10 Global Step: 130780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:49,721-Speed 2976.18 samples/sec Loss 6.9237 LearningRate 0.0224 Epoch: 10 Global Step: 130790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:53,094-Speed 3037.25 samples/sec Loss 6.7774 LearningRate 0.0224 Epoch: 10 Global Step: 130800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:56,498-Speed 3008.57 samples/sec Loss 6.7501 LearningRate 0.0224 Epoch: 10 Global Step: 130810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:05:59,928-Speed 2986.33 samples/sec Loss 6.9391 LearningRate 0.0224 Epoch: 10 Global Step: 130820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:06:03,359-Speed 2986.80 samples/sec Loss 6.8448 LearningRate 0.0224 Epoch: 10 Global Step: 130830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:06,687-Speed 3077.16 samples/sec Loss 6.8919 LearningRate 0.0224 Epoch: 10 Global Step: 130840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:10,118-Speed 2985.47 samples/sec Loss 6.8162 LearningRate 0.0224 Epoch: 10 
Global Step: 130850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:13,550-Speed 2985.15 samples/sec Loss 6.9105 LearningRate 0.0224 Epoch: 10 Global Step: 130860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:16,925-Speed 3034.56 samples/sec Loss 6.9192 LearningRate 0.0224 Epoch: 10 Global Step: 130870 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:20,357-Speed 2984.65 samples/sec Loss 6.7839 LearningRate 0.0224 Epoch: 10 Global Step: 130880 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:23,834-Speed 2945.58 samples/sec Loss 6.9118 LearningRate 0.0224 Epoch: 10 Global Step: 130890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:27,208-Speed 3035.70 samples/sec Loss 7.0228 LearningRate 0.0224 Epoch: 10 Global Step: 130900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:30,572-Speed 3045.23 samples/sec Loss 6.9226 LearningRate 0.0224 Epoch: 10 Global Step: 130910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:33,960-Speed 3023.76 samples/sec Loss 6.9295 LearningRate 0.0224 Epoch: 10 Global Step: 130920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:37,361-Speed 3012.16 samples/sec Loss 6.8987 LearningRate 0.0224 Epoch: 10 Global Step: 130930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:40,798-Speed 2979.84 samples/sec Loss 6.8970 LearningRate 0.0224 Epoch: 10 Global Step: 130940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:44,127-Speed 3076.62 samples/sec Loss 6.8499 LearningRate 0.0224 Epoch: 10 Global Step: 130950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:47,535-Speed 3005.63 samples/sec Loss 6.8687 LearningRate 0.0224 Epoch: 10 Global Step: 130960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:50,920-Speed 3026.70 samples/sec Loss 6.9121 LearningRate 0.0224 Epoch: 10 Global Step: 130970 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:54,359-Speed 2978.20 samples/sec Loss 6.8622 LearningRate 0.0223 Epoch: 10 Global Step: 130980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:06:57,749-Speed 3021.47 samples/sec Loss 6.8345 LearningRate 0.0223 Epoch: 10 Global Step: 130990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:01,185-Speed 2981.32 samples/sec Loss 6.9291 LearningRate 0.0223 Epoch: 10 Global Step: 131000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:04,513-Speed 3077.95 samples/sec Loss 6.8845 LearningRate 0.0223 Epoch: 10 Global Step: 131010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:07,905-Speed 3019.96 samples/sec Loss 6.7819 LearningRate 0.0223 Epoch: 10 Global Step: 131020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:11,246-Speed 3065.98 samples/sec Loss 7.0017 LearningRate 0.0223 Epoch: 10 Global Step: 131030 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:07:14,623-Speed 3033.07 samples/sec Loss 6.9112 LearningRate 0.0223 Epoch: 10 Global Step: 131040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:18,094-Speed 2950.67 samples/sec Loss 6.8023 LearningRate 0.0223 Epoch: 10 Global Step: 131050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:21,566-Speed 2950.08 samples/sec Loss 6.8574 LearningRate 0.0223 Epoch: 10 Global Step: 131060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:24,970-Speed 3008.65 samples/sec Loss 6.8634 LearningRate 0.0223 Epoch: 10 Global Step: 131070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:07:28,335-Speed 3044.51 samples/sec Loss 6.8353 LearningRate 0.0223 Epoch: 10 Global Step: 131080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:07:31,767-Speed 2984.69 samples/sec Loss 6.9560 LearningRate 0.0223 Epoch: 10 Global Step: 131090 Fp16 Grad Scale: 32768 Required: 11 
hours Training: 2022-04-27 14:07:35,172-Speed 3008.58 samples/sec Loss 6.7534 LearningRate 0.0223 Epoch: 10 Global Step: 131100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:07:38,527-Speed 3053.22 samples/sec Loss 6.8752 LearningRate 0.0223 Epoch: 10 Global Step: 131110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:07:41,964-Speed 2979.60 samples/sec Loss 6.7722 LearningRate 0.0223 Epoch: 10 Global Step: 131120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:07:45,312-Speed 3059.16 samples/sec Loss 6.7481 LearningRate 0.0223 Epoch: 10 Global Step: 131130 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:07:48,665-Speed 3055.71 samples/sec Loss 6.9792 LearningRate 0.0223 Epoch: 10 Global Step: 131140 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:07:52,014-Speed 3058.35 samples/sec Loss 6.8530 LearningRate 0.0223 Epoch: 10 Global Step: 131150 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:07:55,439-Speed 2990.15 samples/sec Loss 6.8731 LearningRate 0.0223 Epoch: 10 Global Step: 131160 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:07:58,796-Speed 3051.99 samples/sec Loss 6.8346 LearningRate 0.0223 Epoch: 10 Global Step: 131170 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:08:02,108-Speed 3092.31 samples/sec Loss 6.9015 LearningRate 0.0223 Epoch: 10 Global Step: 131180 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:08:05,495-Speed 3023.87 samples/sec Loss 6.8558 LearningRate 0.0223 Epoch: 10 Global Step: 131190 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:08:08,900-Speed 3008.45 samples/sec Loss 6.8759 LearningRate 0.0223 Epoch: 10 Global Step: 131200 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:08:12,311-Speed 3002.87 samples/sec Loss 6.9091 LearningRate 0.0223 Epoch: 10 Global Step: 131210 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
14:08:15,738-Speed 2988.97 samples/sec Loss 6.9026 LearningRate 0.0223 Epoch: 10 Global Step: 131220 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:08:19,101-Speed 3045.43 samples/sec Loss 6.8226 LearningRate 0.0223 Epoch: 10 Global Step: 131230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:22,520-Speed 2996.06 samples/sec Loss 6.9322 LearningRate 0.0223 Epoch: 10 Global Step: 131240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:25,921-Speed 3011.40 samples/sec Loss 6.8383 LearningRate 0.0222 Epoch: 10 Global Step: 131250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:29,342-Speed 2994.88 samples/sec Loss 6.8218 LearningRate 0.0222 Epoch: 10 Global Step: 131260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:32,807-Speed 2956.07 samples/sec Loss 6.8107 LearningRate 0.0222 Epoch: 10 Global Step: 131270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:36,120-Speed 3091.32 samples/sec Loss 6.8079 LearningRate 0.0222 Epoch: 10 Global Step: 131280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:39,533-Speed 3000.81 samples/sec Loss 6.8882 LearningRate 0.0222 Epoch: 10 Global Step: 131290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:42,960-Speed 2989.34 samples/sec Loss 6.8428 LearningRate 0.0222 Epoch: 10 Global Step: 131300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:46,348-Speed 3023.27 samples/sec Loss 6.8929 LearningRate 0.0222 Epoch: 10 Global Step: 131310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:49,729-Speed 3029.12 samples/sec Loss 6.8684 LearningRate 0.0222 Epoch: 10 Global Step: 131320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:08:53,082-Speed 3055.41 samples/sec Loss 6.7379 LearningRate 0.0222 Epoch: 10 Global Step: 131330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:08:56,419-Speed 3069.17 
samples/sec Loss 6.8565 LearningRate 0.0222 Epoch: 10 Global Step: 131340 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:08:59,840-Speed 2994.24 samples/sec Loss 6.8331 LearningRate 0.0222 Epoch: 10 Global Step: 131350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:03,175-Speed 3071.21 samples/sec Loss 6.8282 LearningRate 0.0222 Epoch: 10 Global Step: 131360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:06,580-Speed 3008.05 samples/sec Loss 6.7656 LearningRate 0.0222 Epoch: 10 Global Step: 131370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:10,004-Speed 2991.92 samples/sec Loss 6.7849 LearningRate 0.0222 Epoch: 10 Global Step: 131380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:13,401-Speed 3014.87 samples/sec Loss 6.9297 LearningRate 0.0222 Epoch: 10 Global Step: 131390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:16,819-Speed 2997.07 samples/sec Loss 6.8753 LearningRate 0.0222 Epoch: 10 Global Step: 131400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:20,168-Speed 3058.22 samples/sec Loss 6.8885 LearningRate 0.0222 Epoch: 10 Global Step: 131410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:23,583-Speed 2999.13 samples/sec Loss 6.8584 LearningRate 0.0222 Epoch: 10 Global Step: 131420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:26,958-Speed 3036.16 samples/sec Loss 6.8351 LearningRate 0.0222 Epoch: 10 Global Step: 131430 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:09:30,383-Speed 2991.46 samples/sec Loss 6.9201 LearningRate 0.0222 Epoch: 10 Global Step: 131440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:33,828-Speed 2973.32 samples/sec Loss 6.8819 LearningRate 0.0222 Epoch: 10 Global Step: 131450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:37,170-Speed 3064.58 samples/sec Loss 6.8384 
LearningRate 0.0222 Epoch: 10 Global Step: 131460 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:40,654-Speed 2939.84 samples/sec Loss 6.6811 LearningRate 0.0222 Epoch: 10 Global Step: 131470 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:44,055-Speed 3011.83 samples/sec Loss 6.9157 LearningRate 0.0222 Epoch: 10 Global Step: 131480 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:47,430-Speed 3034.84 samples/sec Loss 6.7902 LearningRate 0.0222 Epoch: 10 Global Step: 131490 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:50,887-Speed 2962.92 samples/sec Loss 6.9133 LearningRate 0.0222 Epoch: 10 Global Step: 131500 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:54,265-Speed 3032.75 samples/sec Loss 6.7060 LearningRate 0.0221 Epoch: 10 Global Step: 131510 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:09:57,716-Speed 2967.40 samples/sec Loss 6.8114 LearningRate 0.0221 Epoch: 10 Global Step: 131520 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:01,179-Speed 2958.52 samples/sec Loss 6.8004 LearningRate 0.0221 Epoch: 10 Global Step: 131530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:04,584-Speed 3008.03 samples/sec Loss 6.8132 LearningRate 0.0221 Epoch: 10 Global Step: 131540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:07,947-Speed 3046.13 samples/sec Loss 6.8649 LearningRate 0.0221 Epoch: 10 Global Step: 131550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:11,360-Speed 3001.30 samples/sec Loss 6.9309 LearningRate 0.0221 Epoch: 10 Global Step: 131560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:14,752-Speed 3019.57 samples/sec Loss 6.8026 LearningRate 0.0221 Epoch: 10 Global Step: 131570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:18,104-Speed 3055.65 samples/sec Loss 6.7002 LearningRate 0.0221 Epoch: 10 
Global Step: 131580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:21,481-Speed 3033.59 samples/sec Loss 6.8122 LearningRate 0.0221 Epoch: 10 Global Step: 131590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:24,851-Speed 3039.36 samples/sec Loss 6.9294 LearningRate 0.0221 Epoch: 10 Global Step: 131600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:28,273-Speed 2993.14 samples/sec Loss 6.7802 LearningRate 0.0221 Epoch: 10 Global Step: 131610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:31,614-Speed 3065.62 samples/sec Loss 6.8427 LearningRate 0.0221 Epoch: 10 Global Step: 131620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:35,100-Speed 2938.26 samples/sec Loss 6.8235 LearningRate 0.0221 Epoch: 10 Global Step: 131630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:38,562-Speed 2959.07 samples/sec Loss 6.8087 LearningRate 0.0221 Epoch: 10 Global Step: 131640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:42,101-Speed 2894.02 samples/sec Loss 6.9583 LearningRate 0.0221 Epoch: 10 Global Step: 131650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:45,470-Speed 3040.65 samples/sec Loss 6.9293 LearningRate 0.0221 Epoch: 10 Global Step: 131660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:48,910-Speed 2977.54 samples/sec Loss 6.8874 LearningRate 0.0221 Epoch: 10 Global Step: 131670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:10:52,250-Speed 3066.82 samples/sec Loss 6.8373 LearningRate 0.0221 Epoch: 10 Global Step: 131680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:55,678-Speed 2987.69 samples/sec Loss 6.8451 LearningRate 0.0221 Epoch: 10 Global Step: 131690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:10:59,110-Speed 2984.63 samples/sec Loss 6.7951 LearningRate 0.0221 Epoch: 10 Global Step: 131700 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 14:11:02,501-Speed 3020.66 samples/sec Loss 6.8493 LearningRate 0.0221 Epoch: 10 Global Step: 131710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:11:05,986-Speed 2938.69 samples/sec Loss 6.7566 LearningRate 0.0221 Epoch: 10 Global Step: 131720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:11:09,400-Speed 3000.58 samples/sec Loss 6.7851 LearningRate 0.0221 Epoch: 10 Global Step: 131730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:11:12,770-Speed 3039.75 samples/sec Loss 6.8628 LearningRate 0.0221 Epoch: 10 Global Step: 131740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:11:16,121-Speed 3056.57 samples/sec Loss 6.9625 LearningRate 0.0221 Epoch: 10 Global Step: 131750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:11:19,479-Speed 3050.23 samples/sec Loss 6.9079 LearningRate 0.0221 Epoch: 10 Global Step: 131760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:22,861-Speed 3028.81 samples/sec Loss 6.7048 LearningRate 0.0220 Epoch: 10 Global Step: 131770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:26,227-Speed 3043.22 samples/sec Loss 6.6788 LearningRate 0.0220 Epoch: 10 Global Step: 131780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:29,562-Speed 3070.53 samples/sec Loss 6.8213 LearningRate 0.0220 Epoch: 10 Global Step: 131790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:32,923-Speed 3047.89 samples/sec Loss 6.8004 LearningRate 0.0220 Epoch: 10 Global Step: 131800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:36,363-Speed 2977.65 samples/sec Loss 6.7615 LearningRate 0.0220 Epoch: 10 Global Step: 131810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:39,810-Speed 2971.91 samples/sec Loss 6.8130 LearningRate 0.0220 Epoch: 10 Global Step: 131820 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 14:11:43,285-Speed 2948.00 samples/sec Loss 6.8460 LearningRate 0.0220 Epoch: 10 Global Step: 131830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:46,664-Speed 3031.02 samples/sec Loss 6.8960 LearningRate 0.0220 Epoch: 10 Global Step: 131840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:11:50,111-Speed 2971.09 samples/sec Loss 6.8434 LearningRate 0.0220 Epoch: 10 Global Step: 131850 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:11:53,466-Speed 3052.95 samples/sec Loss 6.9250 LearningRate 0.0220 Epoch: 10 Global Step: 131860 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:11:56,823-Speed 3051.31 samples/sec Loss 6.8170 LearningRate 0.0220 Epoch: 10 Global Step: 131870 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:00,162-Speed 3067.59 samples/sec Loss 6.9806 LearningRate 0.0220 Epoch: 10 Global Step: 131880 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:03,594-Speed 2984.52 samples/sec Loss 6.8387 LearningRate 0.0220 Epoch: 10 Global Step: 131890 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:07,011-Speed 2997.55 samples/sec Loss 6.9450 LearningRate 0.0220 Epoch: 10 Global Step: 131900 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:10,405-Speed 3017.68 samples/sec Loss 6.8887 LearningRate 0.0220 Epoch: 10 Global Step: 131910 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:13,819-Speed 3001.08 samples/sec Loss 6.8442 LearningRate 0.0220 Epoch: 10 Global Step: 131920 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:17,255-Speed 2980.93 samples/sec Loss 6.9988 LearningRate 0.0220 Epoch: 10 Global Step: 131930 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:12:20,595-Speed 3066.50 samples/sec Loss 6.7338 LearningRate 0.0220 Epoch: 10 Global Step: 131940 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 
14:12:24,038-Speed 2974.70 samples/sec Loss 6.8050 LearningRate 0.0220 Epoch: 10 Global Step: 131950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:27,408-Speed 3039.43 samples/sec Loss 6.7854 LearningRate 0.0220 Epoch: 10 Global Step: 131960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:30,810-Speed 3010.68 samples/sec Loss 6.7569 LearningRate 0.0220 Epoch: 10 Global Step: 131970 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:34,157-Speed 3060.84 samples/sec Loss 6.7900 LearningRate 0.0220 Epoch: 10 Global Step: 131980 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:37,463-Speed 3098.54 samples/sec Loss 6.8283 LearningRate 0.0220 Epoch: 10 Global Step: 131990 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:40,845-Speed 3028.66 samples/sec Loss 6.8878 LearningRate 0.0220 Epoch: 10 Global Step: 132000 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:44,232-Speed 3024.14 samples/sec Loss 6.9255 LearningRate 0.0220 Epoch: 10 Global Step: 132010 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:47,592-Speed 3047.78 samples/sec Loss 7.0177 LearningRate 0.0220 Epoch: 10 Global Step: 132020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:50,942-Speed 3057.90 samples/sec Loss 6.8394 LearningRate 0.0220 Epoch: 10 Global Step: 132030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:54,284-Speed 3065.16 samples/sec Loss 6.7688 LearningRate 0.0219 Epoch: 10 Global Step: 132040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:12:57,649-Speed 3044.56 samples/sec Loss 6.8090 LearningRate 0.0219 Epoch: 10 Global Step: 132050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:01,031-Speed 3028.55 samples/sec Loss 6.8480 LearningRate 0.0219 Epoch: 10 Global Step: 132060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:04,456-Speed 2990.29 
samples/sec Loss 7.0112 LearningRate 0.0219 Epoch: 10 Global Step: 132070 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:07,778-Speed 3083.28 samples/sec Loss 6.7838 LearningRate 0.0219 Epoch: 10 Global Step: 132080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:11,198-Speed 2995.03 samples/sec Loss 6.7116 LearningRate 0.0219 Epoch: 10 Global Step: 132090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:14,628-Speed 2986.53 samples/sec Loss 6.8301 LearningRate 0.0219 Epoch: 10 Global Step: 132100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:18,013-Speed 3025.42 samples/sec Loss 6.9978 LearningRate 0.0219 Epoch: 10 Global Step: 132110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:21,383-Speed 3039.81 samples/sec Loss 6.8223 LearningRate 0.0219 Epoch: 10 Global Step: 132120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:24,740-Speed 3051.53 samples/sec Loss 6.9158 LearningRate 0.0219 Epoch: 10 Global Step: 132130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:28,107-Speed 3042.32 samples/sec Loss 6.9006 LearningRate 0.0219 Epoch: 10 Global Step: 132140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:13:31,431-Speed 3081.29 samples/sec Loss 6.8815 LearningRate 0.0219 Epoch: 10 Global Step: 132150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:34,742-Speed 3093.59 samples/sec Loss 6.8281 LearningRate 0.0219 Epoch: 10 Global Step: 132160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:38,112-Speed 3040.22 samples/sec Loss 6.8908 LearningRate 0.0219 Epoch: 10 Global Step: 132170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:41,476-Speed 3044.88 samples/sec Loss 6.8044 LearningRate 0.0219 Epoch: 10 Global Step: 132180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:44,806-Speed 3075.76 samples/sec Loss 6.9394 
LearningRate 0.0219 Epoch: 10 Global Step: 132190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:48,150-Speed 3062.88 samples/sec Loss 6.8868 LearningRate 0.0219 Epoch: 10 Global Step: 132200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:51,542-Speed 3019.73 samples/sec Loss 6.8857 LearningRate 0.0219 Epoch: 10 Global Step: 132210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:54,900-Speed 3050.40 samples/sec Loss 6.9623 LearningRate 0.0219 Epoch: 10 Global Step: 132220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:13:58,267-Speed 3042.68 samples/sec Loss 6.8581 LearningRate 0.0219 Epoch: 10 Global Step: 132230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:01,579-Speed 3092.04 samples/sec Loss 6.7752 LearningRate 0.0219 Epoch: 10 Global Step: 132240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:04,932-Speed 3055.40 samples/sec Loss 6.9038 LearningRate 0.0219 Epoch: 10 Global Step: 132250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:14:08,268-Speed 3074.25 samples/sec Loss 6.7468 LearningRate 0.0219 Epoch: 10 Global Step: 132260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:14:11,577-Speed 3095.80 samples/sec Loss 6.7122 LearningRate 0.0219 Epoch: 10 Global Step: 132270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:14:14,982-Speed 3008.37 samples/sec Loss 6.7064 LearningRate 0.0219 Epoch: 10 Global Step: 132280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:14:18,407-Speed 2990.38 samples/sec Loss 6.8850 LearningRate 0.0219 Epoch: 10 Global Step: 132290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:14:21,819-Speed 3002.23 samples/sec Loss 6.9194 LearningRate 0.0218 Epoch: 10 Global Step: 132300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:14:25,150-Speed 3074.92 samples/sec Loss 6.7713 LearningRate 0.0218 Epoch: 10 
Global Step: 132310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:28,580-Speed 2987.15 samples/sec Loss 6.7688 LearningRate 0.0218 Epoch: 10 Global Step: 132320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:31,948-Speed 3040.44 samples/sec Loss 6.8214 LearningRate 0.0218 Epoch: 10 Global Step: 132330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:35,404-Speed 2964.04 samples/sec Loss 6.8761 LearningRate 0.0218 Epoch: 10 Global Step: 132340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:38,790-Speed 3025.02 samples/sec Loss 6.7020 LearningRate 0.0218 Epoch: 10 Global Step: 132350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:42,143-Speed 3054.99 samples/sec Loss 6.9856 LearningRate 0.0218 Epoch: 10 Global Step: 132360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:45,625-Speed 2941.43 samples/sec Loss 6.7723 LearningRate 0.0218 Epoch: 10 Global Step: 132370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:49,107-Speed 2942.26 samples/sec Loss 6.8242 LearningRate 0.0218 Epoch: 10 Global Step: 132380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:14:52,451-Speed 3062.62 samples/sec Loss 6.8455 LearningRate 0.0218 Epoch: 10 Global Step: 132390 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:14:55,912-Speed 2959.53 samples/sec Loss 6.7429 LearningRate 0.0218 Epoch: 10 Global Step: 132400 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:14:59,311-Speed 3013.56 samples/sec Loss 6.7222 LearningRate 0.0218 Epoch: 10 Global Step: 132410 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:02,739-Speed 2988.39 samples/sec Loss 6.8337 LearningRate 0.0218 Epoch: 10 Global Step: 132420 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:06,099-Speed 3048.29 samples/sec Loss 6.9687 LearningRate 0.0218 Epoch: 10 Global Step: 132430 Fp16 Grad 
Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:09,530-Speed 2985.44 samples/sec Loss 6.8136 LearningRate 0.0218 Epoch: 10 Global Step: 132440 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:12,902-Speed 3037.47 samples/sec Loss 6.8822 LearningRate 0.0218 Epoch: 10 Global Step: 132450 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:16,330-Speed 2988.80 samples/sec Loss 6.7557 LearningRate 0.0218 Epoch: 10 Global Step: 132460 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:19,700-Speed 3039.31 samples/sec Loss 6.8619 LearningRate 0.0218 Epoch: 10 Global Step: 132470 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:23,067-Speed 3042.31 samples/sec Loss 6.8470 LearningRate 0.0218 Epoch: 10 Global Step: 132480 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:15:26,442-Speed 3033.98 samples/sec Loss 6.8058 LearningRate 0.0218 Epoch: 10 Global Step: 132490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:29,854-Speed 3002.80 samples/sec Loss 6.8234 LearningRate 0.0218 Epoch: 10 Global Step: 132500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:33,206-Speed 3054.98 samples/sec Loss 6.7498 LearningRate 0.0218 Epoch: 10 Global Step: 132510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:36,679-Speed 2949.50 samples/sec Loss 6.9078 LearningRate 0.0218 Epoch: 10 Global Step: 132520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:40,151-Speed 2950.01 samples/sec Loss 6.6132 LearningRate 0.0218 Epoch: 10 Global Step: 132530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:43,551-Speed 3013.50 samples/sec Loss 6.7126 LearningRate 0.0218 Epoch: 10 Global Step: 132540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:46,954-Speed 3009.93 samples/sec Loss 6.7243 LearningRate 0.0218 Epoch: 10 Global Step: 132550 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 14:15:50,375-Speed 2994.65 samples/sec Loss 6.9208 LearningRate 0.0218 Epoch: 10 Global Step: 132560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:53,863-Speed 2936.54 samples/sec Loss 6.7805 LearningRate 0.0217 Epoch: 10 Global Step: 132570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:15:57,229-Speed 3042.92 samples/sec Loss 6.8814 LearningRate 0.0217 Epoch: 10 Global Step: 132580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:00,556-Speed 3078.26 samples/sec Loss 6.7638 LearningRate 0.0217 Epoch: 10 Global Step: 132590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:04,025-Speed 2952.79 samples/sec Loss 6.8466 LearningRate 0.0217 Epoch: 10 Global Step: 132600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:07,472-Speed 2971.45 samples/sec Loss 6.8567 LearningRate 0.0217 Epoch: 10 Global Step: 132610 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:10,834-Speed 3046.80 samples/sec Loss 6.8970 LearningRate 0.0217 Epoch: 10 Global Step: 132620 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:14,253-Speed 2995.49 samples/sec Loss 6.8670 LearningRate 0.0217 Epoch: 10 Global Step: 132630 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:17,627-Speed 3036.52 samples/sec Loss 6.7376 LearningRate 0.0217 Epoch: 10 Global Step: 132640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:21,085-Speed 2961.77 samples/sec Loss 6.8257 LearningRate 0.0217 Epoch: 10 Global Step: 132650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:24,573-Speed 2937.16 samples/sec Loss 6.7761 LearningRate 0.0217 Epoch: 10 Global Step: 132660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:16:27,948-Speed 3034.42 samples/sec Loss 6.9045 LearningRate 0.0217 Epoch: 10 Global Step: 132670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
14:16:31,316-Speed 3041.98 samples/sec Loss 6.7699 LearningRate 0.0217 Epoch: 10 Global Step: 132680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:34,685-Speed 3039.98 samples/sec Loss 6.7053 LearningRate 0.0217 Epoch: 10 Global Step: 132690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:38,076-Speed 3020.75 samples/sec Loss 6.6820 LearningRate 0.0217 Epoch: 10 Global Step: 132700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:41,495-Speed 2995.67 samples/sec Loss 6.7936 LearningRate 0.0217 Epoch: 10 Global Step: 132710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:44,968-Speed 2949.02 samples/sec Loss 6.7435 LearningRate 0.0217 Epoch: 10 Global Step: 132720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:48,401-Speed 2983.57 samples/sec Loss 6.9441 LearningRate 0.0217 Epoch: 10 Global Step: 132730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:51,754-Speed 3055.06 samples/sec Loss 6.7845 LearningRate 0.0217 Epoch: 10 Global Step: 132740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:55,098-Speed 3062.41 samples/sec Loss 6.6395 LearningRate 0.0217 Epoch: 10 Global Step: 132750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:16:58,436-Speed 3069.24 samples/sec Loss 6.8296 LearningRate 0.0217 Epoch: 10 Global Step: 132760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:01,861-Speed 2990.29 samples/sec Loss 6.8023 LearningRate 0.0217 Epoch: 10 Global Step: 132770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:05,221-Speed 3048.95 samples/sec Loss 6.7675 LearningRate 0.0217 Epoch: 10 Global Step: 132780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:08,633-Speed 3002.18 samples/sec Loss 6.8558 LearningRate 0.0217 Epoch: 10 Global Step: 132790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:12,011-Speed 3031.72 
samples/sec Loss 6.5866 LearningRate 0.0217 Epoch: 10 Global Step: 132800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:15,448-Speed 2980.97 samples/sec Loss 6.8686 LearningRate 0.0217 Epoch: 10 Global Step: 132810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:18,840-Speed 3019.70 samples/sec Loss 6.8016 LearningRate 0.0217 Epoch: 10 Global Step: 132820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:22,211-Speed 3038.34 samples/sec Loss 6.8787 LearningRate 0.0217 Epoch: 10 Global Step: 132830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:25,555-Speed 3062.32 samples/sec Loss 6.7457 LearningRate 0.0216 Epoch: 10 Global Step: 132840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:28,898-Speed 3064.24 samples/sec Loss 6.6790 LearningRate 0.0216 Epoch: 10 Global Step: 132850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:32,224-Speed 3079.86 samples/sec Loss 6.7135 LearningRate 0.0216 Epoch: 10 Global Step: 132860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:35,545-Speed 3084.80 samples/sec Loss 6.7954 LearningRate 0.0216 Epoch: 10 Global Step: 132870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:38,918-Speed 3036.69 samples/sec Loss 6.9508 LearningRate 0.0216 Epoch: 10 Global Step: 132880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:42,279-Speed 3047.67 samples/sec Loss 6.8323 LearningRate 0.0216 Epoch: 10 Global Step: 132890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:45,662-Speed 3028.20 samples/sec Loss 6.8050 LearningRate 0.0216 Epoch: 10 Global Step: 132900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:17:49,064-Speed 3010.99 samples/sec Loss 6.8051 LearningRate 0.0216 Epoch: 10 Global Step: 132910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:52,393-Speed 3077.50 samples/sec Loss 6.7771 
LearningRate 0.0216 Epoch: 10 Global Step: 132920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:55,731-Speed 3068.13 samples/sec Loss 6.7956 LearningRate 0.0216 Epoch: 10 Global Step: 132930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:17:59,173-Speed 2975.99 samples/sec Loss 6.7846 LearningRate 0.0216 Epoch: 10 Global Step: 132940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:02,634-Speed 2958.93 samples/sec Loss 6.7579 LearningRate 0.0216 Epoch: 10 Global Step: 132950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:06,056-Speed 2993.46 samples/sec Loss 6.9126 LearningRate 0.0216 Epoch: 10 Global Step: 132960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:09,415-Speed 3049.84 samples/sec Loss 6.7026 LearningRate 0.0216 Epoch: 10 Global Step: 132970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:12,763-Speed 3059.29 samples/sec Loss 6.7751 LearningRate 0.0216 Epoch: 10 Global Step: 132980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:16,150-Speed 3024.38 samples/sec Loss 6.8485 LearningRate 0.0216 Epoch: 10 Global Step: 132990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:19,587-Speed 2980.66 samples/sec Loss 6.7998 LearningRate 0.0216 Epoch: 10 Global Step: 133000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:22,886-Speed 3104.93 samples/sec Loss 6.7967 LearningRate 0.0216 Epoch: 10 Global Step: 133010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:26,296-Speed 3003.15 samples/sec Loss 6.7371 LearningRate 0.0216 Epoch: 10 Global Step: 133020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:29,673-Speed 3033.67 samples/sec Loss 6.8872 LearningRate 0.0216 Epoch: 10 Global Step: 133030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:33,089-Speed 2998.37 samples/sec Loss 6.8142 LearningRate 0.0216 Epoch: 10 
Global Step: 133040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:36,566-Speed 2946.05 samples/sec Loss 6.9085 LearningRate 0.0216 Epoch: 10 Global Step: 133050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:18:39,968-Speed 3010.62 samples/sec Loss 6.7552 LearningRate 0.0216 Epoch: 10 Global Step: 133060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:18:43,347-Speed 3031.82 samples/sec Loss 6.8331 LearningRate 0.0216 Epoch: 10 Global Step: 133070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:18:46,712-Speed 3043.91 samples/sec Loss 6.7902 LearningRate 0.0216 Epoch: 10 Global Step: 133080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:18:50,153-Speed 2976.78 samples/sec Loss 6.7343 LearningRate 0.0216 Epoch: 10 Global Step: 133090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:18:53,503-Speed 3057.21 samples/sec Loss 6.8648 LearningRate 0.0215 Epoch: 10 Global Step: 133100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:18:56,885-Speed 3029.06 samples/sec Loss 6.7655 LearningRate 0.0215 Epoch: 10 Global Step: 133110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:00,294-Speed 3004.68 samples/sec Loss 6.7791 LearningRate 0.0215 Epoch: 10 Global Step: 133120 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:03,760-Speed 2954.75 samples/sec Loss 6.8485 LearningRate 0.0215 Epoch: 10 Global Step: 133130 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:07,093-Speed 3073.16 samples/sec Loss 6.9290 LearningRate 0.0215 Epoch: 10 Global Step: 133140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:10,466-Speed 3036.65 samples/sec Loss 6.8010 LearningRate 0.0215 Epoch: 10 Global Step: 133150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:13,794-Speed 3078.24 samples/sec Loss 6.7750 LearningRate 0.0215 Epoch: 10 Global Step: 133160 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:17,163-Speed 3039.45 samples/sec Loss 6.7536 LearningRate 0.0215 Epoch: 10 Global Step: 133170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:20,625-Speed 2960.18 samples/sec Loss 6.6756 LearningRate 0.0215 Epoch: 10 Global Step: 133180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:24,083-Speed 2961.72 samples/sec Loss 6.8254 LearningRate 0.0215 Epoch: 10 Global Step: 133190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:27,554-Speed 2950.64 samples/sec Loss 6.8378 LearningRate 0.0215 Epoch: 10 Global Step: 133200 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:30,987-Speed 2983.72 samples/sec Loss 6.9121 LearningRate 0.0215 Epoch: 10 Global Step: 133210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:34,486-Speed 2927.47 samples/sec Loss 6.8371 LearningRate 0.0215 Epoch: 10 Global Step: 133220 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:37,941-Speed 2964.87 samples/sec Loss 6.7192 LearningRate 0.0215 Epoch: 10 Global Step: 133230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:41,456-Speed 2913.87 samples/sec Loss 6.8181 LearningRate 0.0215 Epoch: 10 Global Step: 133240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:19:44,848-Speed 3019.30 samples/sec Loss 6.7540 LearningRate 0.0215 Epoch: 10 Global Step: 133250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:48,282-Speed 2983.49 samples/sec Loss 6.7862 LearningRate 0.0215 Epoch: 10 Global Step: 133260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:51,633-Speed 3056.06 samples/sec Loss 6.7479 LearningRate 0.0215 Epoch: 10 Global Step: 133270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:19:55,055-Speed 2993.83 samples/sec Loss 6.7427 LearningRate 0.0215 Epoch: 10 Global Step: 133280 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 14:19:58,531-Speed 2946.76 samples/sec Loss 6.6397 LearningRate 0.0215 Epoch: 10 Global Step: 133290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:01,864-Speed 3073.52 samples/sec Loss 6.8089 LearningRate 0.0215 Epoch: 10 Global Step: 133300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:05,317-Speed 2965.35 samples/sec Loss 6.7641 LearningRate 0.0215 Epoch: 10 Global Step: 133310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:08,751-Speed 2983.47 samples/sec Loss 6.6654 LearningRate 0.0215 Epoch: 10 Global Step: 133320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:12,255-Speed 2923.08 samples/sec Loss 6.8342 LearningRate 0.0215 Epoch: 10 Global Step: 133330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:15,653-Speed 3014.05 samples/sec Loss 6.7995 LearningRate 0.0215 Epoch: 10 Global Step: 133340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:19,100-Speed 2971.88 samples/sec Loss 6.7568 LearningRate 0.0215 Epoch: 10 Global Step: 133350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:22,519-Speed 2995.60 samples/sec Loss 6.7418 LearningRate 0.0215 Epoch: 10 Global Step: 133360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:25,902-Speed 3027.65 samples/sec Loss 6.7533 LearningRate 0.0214 Epoch: 10 Global Step: 133370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:29,235-Speed 3073.44 samples/sec Loss 6.8518 LearningRate 0.0214 Epoch: 10 Global Step: 133380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:32,566-Speed 3075.58 samples/sec Loss 6.8656 LearningRate 0.0214 Epoch: 10 Global Step: 133390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:35,962-Speed 3015.87 samples/sec Loss 6.7379 LearningRate 0.0214 Epoch: 10 Global Step: 133400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
14:20:39,347-Speed 3025.87 samples/sec Loss 6.7517 LearningRate 0.0214 Epoch: 10 Global Step: 133410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:42,759-Speed 3002.41 samples/sec Loss 6.8646 LearningRate 0.0214 Epoch: 10 Global Step: 133420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:46,168-Speed 3004.80 samples/sec Loss 6.7334 LearningRate 0.0214 Epoch: 10 Global Step: 133430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:49,543-Speed 3035.53 samples/sec Loss 6.7270 LearningRate 0.0214 Epoch: 10 Global Step: 133440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:52,941-Speed 3014.58 samples/sec Loss 6.7182 LearningRate 0.0214 Epoch: 10 Global Step: 133450 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:20:56,345-Speed 3008.50 samples/sec Loss 6.6999 LearningRate 0.0214 Epoch: 10 Global Step: 133460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:20:59,778-Speed 2983.99 samples/sec Loss 6.7477 LearningRate 0.0214 Epoch: 10 Global Step: 133470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:03,199-Speed 2993.93 samples/sec Loss 6.6417 LearningRate 0.0214 Epoch: 10 Global Step: 133480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:06,567-Speed 3041.74 samples/sec Loss 6.8403 LearningRate 0.0214 Epoch: 10 Global Step: 133490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:09,991-Speed 2991.39 samples/sec Loss 6.7201 LearningRate 0.0214 Epoch: 10 Global Step: 133500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:13,411-Speed 2994.75 samples/sec Loss 6.7271 LearningRate 0.0214 Epoch: 10 Global Step: 133510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:16,827-Speed 2998.64 samples/sec Loss 6.6957 LearningRate 0.0214 Epoch: 10 Global Step: 133520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:20,180-Speed 3055.12 
samples/sec Loss 6.8281 LearningRate 0.0214 Epoch: 10 Global Step: 133530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:23,595-Speed 2998.67 samples/sec Loss 6.7249 LearningRate 0.0214 Epoch: 10 Global Step: 133540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:26,990-Speed 3018.66 samples/sec Loss 6.7928 LearningRate 0.0214 Epoch: 10 Global Step: 133550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:30,356-Speed 3042.97 samples/sec Loss 6.7263 LearningRate 0.0214 Epoch: 10 Global Step: 133560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:21:33,743-Speed 3024.47 samples/sec Loss 6.9165 LearningRate 0.0214 Epoch: 10 Global Step: 133570 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:37,049-Speed 3098.02 samples/sec Loss 6.7995 LearningRate 0.0214 Epoch: 10 Global Step: 133580 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:40,377-Speed 3077.85 samples/sec Loss 6.8005 LearningRate 0.0214 Epoch: 10 Global Step: 133590 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:43,726-Speed 3058.37 samples/sec Loss 6.6201 LearningRate 0.0214 Epoch: 10 Global Step: 133600 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:47,116-Speed 3021.01 samples/sec Loss 6.7592 LearningRate 0.0214 Epoch: 10 Global Step: 133610 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:50,444-Speed 3078.39 samples/sec Loss 6.7908 LearningRate 0.0214 Epoch: 10 Global Step: 133620 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:53,814-Speed 3039.37 samples/sec Loss 6.7717 LearningRate 0.0214 Epoch: 10 Global Step: 133630 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:21:57,122-Speed 3095.97 samples/sec Loss 6.6650 LearningRate 0.0213 Epoch: 10 Global Step: 133640 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:22:00,470-Speed 3059.42 samples/sec Loss 6.7934 
LearningRate 0.0213 Epoch: 10 Global Step: 133650 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:22:03,876-Speed 3007.77 samples/sec Loss 6.7330 LearningRate 0.0213 Epoch: 10 Global Step: 133660 Fp16 Grad Scale: 16384 Required: 11 hours Training: 2022-04-27 14:22:07,207-Speed 3075.02 samples/sec Loss 6.8248 LearningRate 0.0213 Epoch: 10 Global Step: 133670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:10,541-Speed 3071.68 samples/sec Loss 6.7906 LearningRate 0.0213 Epoch: 10 Global Step: 133680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:13,954-Speed 3001.53 samples/sec Loss 6.7933 LearningRate 0.0213 Epoch: 10 Global Step: 133690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:17,288-Speed 3071.74 samples/sec Loss 6.8587 LearningRate 0.0213 Epoch: 10 Global Step: 133700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:20,642-Speed 3054.60 samples/sec Loss 6.7616 LearningRate 0.0213 Epoch: 10 Global Step: 133710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:23,997-Speed 3052.56 samples/sec Loss 6.7928 LearningRate 0.0213 Epoch: 10 Global Step: 133720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:27,404-Speed 3006.33 samples/sec Loss 6.7850 LearningRate 0.0213 Epoch: 10 Global Step: 133730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:30,815-Speed 3002.86 samples/sec Loss 6.7900 LearningRate 0.0213 Epoch: 10 Global Step: 133740 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:34,146-Speed 3075.55 samples/sec Loss 6.6933 LearningRate 0.0213 Epoch: 10 Global Step: 133750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:37,484-Speed 3069.00 samples/sec Loss 6.7466 LearningRate 0.0213 Epoch: 10 Global Step: 133760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:22:40,792-Speed 3095.82 samples/sec Loss 6.7277 LearningRate 0.0213 Epoch: 10 
Global Step: 133770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:22:44,137-Speed 3062.63 samples/sec Loss 6.8653 LearningRate 0.0213 Epoch: 10 Global Step: 133780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:22:47,458-Speed 3084.48 samples/sec Loss 6.6711 LearningRate 0.0213 Epoch: 10 Global Step: 133790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:22:50,822-Speed 3044.84 samples/sec Loss 6.7480 LearningRate 0.0213 Epoch: 10 Global Step: 133800 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:22:54,227-Speed 3007.51 samples/sec Loss 6.7072 LearningRate 0.0213 Epoch: 10 Global Step: 133810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:22:57,633-Speed 3007.19 samples/sec Loss 6.7265 LearningRate 0.0213 Epoch: 10 Global Step: 133820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:01,004-Speed 3038.60 samples/sec Loss 6.7318 LearningRate 0.0213 Epoch: 10 Global Step: 133830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:04,471-Speed 2955.01 samples/sec Loss 6.8288 LearningRate 0.0213 Epoch: 10 Global Step: 133840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:07,799-Speed 3077.69 samples/sec Loss 6.6965 LearningRate 0.0213 Epoch: 10 Global Step: 133850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:11,202-Speed 3009.91 samples/sec Loss 6.6940 LearningRate 0.0213 Epoch: 10 Global Step: 133860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:14,602-Speed 3012.79 samples/sec Loss 6.6568 LearningRate 0.0213 Epoch: 10 Global Step: 133870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:17,969-Speed 3041.52 samples/sec Loss 6.6355 LearningRate 0.0213 Epoch: 10 Global Step: 133880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:21,326-Speed 3051.22 samples/sec Loss 6.7488 LearningRate 0.0213 Epoch: 10 Global Step: 133890 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:24,660-Speed 3072.49 samples/sec Loss 6.7121 LearningRate 0.0213 Epoch: 10 Global Step: 133900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:28,126-Speed 2955.10 samples/sec Loss 6.7785 LearningRate 0.0212 Epoch: 10 Global Step: 133910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:23:31,594-Speed 2953.32 samples/sec Loss 6.6387 LearningRate 0.0212 Epoch: 10 Global Step: 133920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:34,928-Speed 3073.11 samples/sec Loss 6.7711 LearningRate 0.0212 Epoch: 10 Global Step: 133930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:38,322-Speed 3017.38 samples/sec Loss 6.7839 LearningRate 0.0212 Epoch: 10 Global Step: 133940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:41,724-Speed 3011.00 samples/sec Loss 6.7505 LearningRate 0.0212 Epoch: 10 Global Step: 133950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:45,140-Speed 2998.83 samples/sec Loss 6.7474 LearningRate 0.0212 Epoch: 10 Global Step: 133960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:48,518-Speed 3032.18 samples/sec Loss 6.7012 LearningRate 0.0212 Epoch: 10 Global Step: 133970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:51,899-Speed 3029.53 samples/sec Loss 6.8335 LearningRate 0.0212 Epoch: 10 Global Step: 133980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:55,244-Speed 3062.70 samples/sec Loss 6.7666 LearningRate 0.0212 Epoch: 10 Global Step: 133990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:23:58,636-Speed 3019.18 samples/sec Loss 6.7985 LearningRate 0.0212 Epoch: 10 Global Step: 134000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:24:01,974-Speed 3069.52 samples/sec Loss 6.5268 LearningRate 0.0212 Epoch: 10 Global Step: 134010 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 14:24:05,394-Speed 2994.84 samples/sec Loss 6.7400 LearningRate 0.0212 Epoch: 10 Global Step: 134020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:08,774-Speed 3030.57 samples/sec Loss 6.8113 LearningRate 0.0212 Epoch: 10 Global Step: 134030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:12,233-Speed 2961.43 samples/sec Loss 6.6668 LearningRate 0.0212 Epoch: 10 Global Step: 134040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:15,599-Speed 3042.81 samples/sec Loss 6.7498 LearningRate 0.0212 Epoch: 10 Global Step: 134050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:18,940-Speed 3065.54 samples/sec Loss 6.8575 LearningRate 0.0212 Epoch: 10 Global Step: 134060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:22,370-Speed 2986.60 samples/sec Loss 6.8121 LearningRate 0.0212 Epoch: 10 Global Step: 134070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:25,693-Speed 3082.57 samples/sec Loss 6.6543 LearningRate 0.0212 Epoch: 10 Global Step: 134080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:29,142-Speed 2969.21 samples/sec Loss 6.7160 LearningRate 0.0212 Epoch: 10 Global Step: 134090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:32,496-Speed 3054.53 samples/sec Loss 6.7653 LearningRate 0.0212 Epoch: 10 Global Step: 134100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:35,822-Speed 3079.57 samples/sec Loss 6.6455 LearningRate 0.0212 Epoch: 10 Global Step: 134110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:24:39,181-Speed 3049.08 samples/sec Loss 6.8411 LearningRate 0.0212 Epoch: 10 Global Step: 134120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:24:42,596-Speed 2999.21 samples/sec Loss 6.6833 LearningRate 0.0212 Epoch: 10 Global Step: 134130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
14:24:45,910-Speed 3090.74 samples/sec Loss 6.7158 LearningRate 0.0212 Epoch: 10 Global Step: 134140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:24:49,264-Speed 3054.14 samples/sec Loss 6.8121 LearningRate 0.0212 Epoch: 10 Global Step: 134150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:52,676-Speed 3001.63 samples/sec Loss 6.7191 LearningRate 0.0212 Epoch: 10 Global Step: 134160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:56,057-Speed 3029.87 samples/sec Loss 6.6490 LearningRate 0.0212 Epoch: 10 Global Step: 134170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:24:59,487-Speed 2985.71 samples/sec Loss 6.7120 LearningRate 0.0211 Epoch: 10 Global Step: 134180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:02,889-Speed 3011.29 samples/sec Loss 6.6937 LearningRate 0.0211 Epoch: 10 Global Step: 134190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:06,234-Speed 3061.91 samples/sec Loss 6.7751 LearningRate 0.0211 Epoch: 10 Global Step: 134200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:09,661-Speed 2988.87 samples/sec Loss 6.7763 LearningRate 0.0211 Epoch: 10 Global Step: 134210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:13,137-Speed 2946.65 samples/sec Loss 6.7720 LearningRate 0.0211 Epoch: 10 Global Step: 134220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:16,584-Speed 2971.28 samples/sec Loss 6.7289 LearningRate 0.0211 Epoch: 10 Global Step: 134230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:20,020-Speed 2981.69 samples/sec Loss 6.8570 LearningRate 0.0211 Epoch: 10 Global Step: 134240 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:23,519-Speed 2926.92 samples/sec Loss 6.6679 LearningRate 0.0211 Epoch: 10 Global Step: 134250 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:26,869-Speed 3057.63 
samples/sec Loss 6.7733 LearningRate 0.0211 Epoch: 10 Global Step: 134260 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:30,357-Speed 2936.75 samples/sec Loss 6.6509 LearningRate 0.0211 Epoch: 10 Global Step: 134270 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:33,705-Speed 3060.01 samples/sec Loss 6.7582 LearningRate 0.0211 Epoch: 10 Global Step: 134280 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:37,110-Speed 3007.33 samples/sec Loss 6.7082 LearningRate 0.0211 Epoch: 10 Global Step: 134290 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:40,458-Speed 3059.65 samples/sec Loss 6.6552 LearningRate 0.0211 Epoch: 10 Global Step: 134300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:43,789-Speed 3075.02 samples/sec Loss 6.7575 LearningRate 0.0211 Epoch: 10 Global Step: 134310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:47,150-Speed 3047.93 samples/sec Loss 6.8296 LearningRate 0.0211 Epoch: 10 Global Step: 134320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:50,490-Speed 3066.43 samples/sec Loss 6.6745 LearningRate 0.0211 Epoch: 10 Global Step: 134330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:53,931-Speed 2979.98 samples/sec Loss 6.6452 LearningRate 0.0211 Epoch: 10 Global Step: 134340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:25:57,400-Speed 2951.91 samples/sec Loss 6.7850 LearningRate 0.0211 Epoch: 10 Global Step: 134350 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:00,867-Speed 2954.34 samples/sec Loss 6.7389 LearningRate 0.0211 Epoch: 10 Global Step: 134360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:04,225-Speed 3050.63 samples/sec Loss 6.7091 LearningRate 0.0211 Epoch: 10 Global Step: 134370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:07,587-Speed 3046.34 samples/sec Loss 6.7090 
LearningRate 0.0211 Epoch: 10 Global Step: 134380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:10,993-Speed 3007.28 samples/sec Loss 6.7836 LearningRate 0.0211 Epoch: 10 Global Step: 134390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:14,366-Speed 3037.09 samples/sec Loss 6.8215 LearningRate 0.0211 Epoch: 10 Global Step: 134400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:17,768-Speed 3010.77 samples/sec Loss 6.6070 LearningRate 0.0211 Epoch: 10 Global Step: 134410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:21,210-Speed 2975.89 samples/sec Loss 6.6827 LearningRate 0.0211 Epoch: 10 Global Step: 134420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:26:24,524-Speed 3091.28 samples/sec Loss 6.7899 LearningRate 0.0211 Epoch: 10 Global Step: 134430 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:27,867-Speed 3063.80 samples/sec Loss 6.6656 LearningRate 0.0211 Epoch: 10 Global Step: 134440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:31,220-Speed 3055.60 samples/sec Loss 6.6813 LearningRate 0.0210 Epoch: 10 Global Step: 134450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:34,540-Speed 3084.42 samples/sec Loss 6.6692 LearningRate 0.0210 Epoch: 10 Global Step: 134460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:37,925-Speed 3025.92 samples/sec Loss 6.8893 LearningRate 0.0210 Epoch: 10 Global Step: 134470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:41,265-Speed 3067.26 samples/sec Loss 6.8178 LearningRate 0.0210 Epoch: 10 Global Step: 134480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:44,641-Speed 3033.76 samples/sec Loss 6.7368 LearningRate 0.0210 Epoch: 10 Global Step: 134490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:48,089-Speed 2970.95 samples/sec Loss 6.6239 LearningRate 0.0210 Epoch: 10 
Global Step: 134500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:51,448-Speed 3049.66 samples/sec Loss 6.8047 LearningRate 0.0210 Epoch: 10 Global Step: 134510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:54,848-Speed 3012.64 samples/sec Loss 6.6042 LearningRate 0.0210 Epoch: 10 Global Step: 134520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:26:58,180-Speed 3073.57 samples/sec Loss 6.5434 LearningRate 0.0210 Epoch: 10 Global Step: 134530 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:01,506-Speed 3080.25 samples/sec Loss 6.7921 LearningRate 0.0210 Epoch: 10 Global Step: 134540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:04,913-Speed 3006.40 samples/sec Loss 6.6585 LearningRate 0.0210 Epoch: 10 Global Step: 134550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:08,254-Speed 3065.50 samples/sec Loss 6.8075 LearningRate 0.0210 Epoch: 10 Global Step: 134560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:11,680-Speed 2990.18 samples/sec Loss 6.5615 LearningRate 0.0210 Epoch: 10 Global Step: 134570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:15,073-Speed 3018.62 samples/sec Loss 6.6299 LearningRate 0.0210 Epoch: 10 Global Step: 134580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:18,433-Speed 3048.71 samples/sec Loss 6.6199 LearningRate 0.0210 Epoch: 10 Global Step: 134590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:21,895-Speed 2958.15 samples/sec Loss 6.7299 LearningRate 0.0210 Epoch: 10 Global Step: 134600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:25,289-Speed 3018.30 samples/sec Loss 6.7011 LearningRate 0.0210 Epoch: 10 Global Step: 134610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:28,636-Speed 3060.72 samples/sec Loss 6.5530 LearningRate 0.0210 Epoch: 10 Global Step: 134620 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:32,039-Speed 3010.04 samples/sec Loss 6.7532 LearningRate 0.0210 Epoch: 10 Global Step: 134630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:27:35,519-Speed 2943.28 samples/sec Loss 6.7940 LearningRate 0.0210 Epoch: 10 Global Step: 134640 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:38,989-Speed 2951.49 samples/sec Loss 6.6959 LearningRate 0.0210 Epoch: 10 Global Step: 134650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:42,405-Speed 2998.60 samples/sec Loss 6.6255 LearningRate 0.0210 Epoch: 10 Global Step: 134660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:45,802-Speed 3014.64 samples/sec Loss 6.7328 LearningRate 0.0210 Epoch: 10 Global Step: 134670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:49,123-Speed 3084.97 samples/sec Loss 6.7336 LearningRate 0.0210 Epoch: 10 Global Step: 134680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:52,498-Speed 3034.94 samples/sec Loss 6.5865 LearningRate 0.0210 Epoch: 10 Global Step: 134690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:55,902-Speed 3008.56 samples/sec Loss 6.6865 LearningRate 0.0210 Epoch: 10 Global Step: 134700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:27:59,285-Speed 3027.87 samples/sec Loss 6.7482 LearningRate 0.0210 Epoch: 10 Global Step: 134710 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:02,683-Speed 3014.51 samples/sec Loss 6.5904 LearningRate 0.0209 Epoch: 10 Global Step: 134720 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:06,171-Speed 2936.34 samples/sec Loss 6.6960 LearningRate 0.0209 Epoch: 10 Global Step: 134730 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:09,558-Speed 3024.21 samples/sec Loss 6.7815 LearningRate 0.0209 Epoch: 10 Global Step: 134740 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 14:28:12,924-Speed 3042.93 samples/sec Loss 6.7065 LearningRate 0.0209 Epoch: 10 Global Step: 134750 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:16,238-Speed 3090.62 samples/sec Loss 6.7258 LearningRate 0.0209 Epoch: 10 Global Step: 134760 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:19,644-Speed 3007.71 samples/sec Loss 6.5918 LearningRate 0.0209 Epoch: 10 Global Step: 134770 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:23,014-Speed 3039.95 samples/sec Loss 6.6348 LearningRate 0.0209 Epoch: 10 Global Step: 134780 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:26,364-Speed 3057.46 samples/sec Loss 6.7151 LearningRate 0.0209 Epoch: 10 Global Step: 134790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:29,802-Speed 2979.15 samples/sec Loss 6.5646 LearningRate 0.0209 Epoch: 10 Global Step: 134800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:33,192-Speed 3021.78 samples/sec Loss 6.7168 LearningRate 0.0209 Epoch: 10 Global Step: 134810 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:28:36,590-Speed 3014.69 samples/sec Loss 6.8067 LearningRate 0.0209 Epoch: 10 Global Step: 134820 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:28:39,981-Speed 3020.53 samples/sec Loss 6.6760 LearningRate 0.0209 Epoch: 10 Global Step: 134830 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:28:43,377-Speed 3016.37 samples/sec Loss 6.7691 LearningRate 0.0209 Epoch: 10 Global Step: 134840 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:28:46,766-Speed 3022.70 samples/sec Loss 6.7517 LearningRate 0.0209 Epoch: 10 Global Step: 134850 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:28:50,110-Speed 3063.92 samples/sec Loss 6.6255 LearningRate 0.0209 Epoch: 10 Global Step: 134860 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
14:28:53,466-Speed 3052.32 samples/sec Loss 6.7299 LearningRate 0.0209 Epoch: 10 Global Step: 134870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:28:56,846-Speed 3030.08 samples/sec Loss 6.8155 LearningRate 0.0209 Epoch: 10 Global Step: 134880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:00,275-Speed 2986.87 samples/sec Loss 6.7421 LearningRate 0.0209 Epoch: 10 Global Step: 134890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:03,696-Speed 2994.60 samples/sec Loss 6.7622 LearningRate 0.0209 Epoch: 10 Global Step: 134900 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:07,190-Speed 2931.39 samples/sec Loss 6.6213 LearningRate 0.0209 Epoch: 10 Global Step: 134910 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:10,588-Speed 3014.42 samples/sec Loss 6.6378 LearningRate 0.0209 Epoch: 10 Global Step: 134920 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:14,022-Speed 2982.58 samples/sec Loss 6.5786 LearningRate 0.0209 Epoch: 10 Global Step: 134930 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:17,491-Speed 2953.00 samples/sec Loss 6.6889 LearningRate 0.0209 Epoch: 10 Global Step: 134940 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:20,921-Speed 2986.30 samples/sec Loss 6.8094 LearningRate 0.0209 Epoch: 10 Global Step: 134950 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:24,352-Speed 2985.85 samples/sec Loss 6.5753 LearningRate 0.0209 Epoch: 10 Global Step: 134960 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:29:27,746-Speed 3017.16 samples/sec Loss 6.6239 LearningRate 0.0209 Epoch: 10 Global Step: 134970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:31,159-Speed 3001.50 samples/sec Loss 6.6525 LearningRate 0.0209 Epoch: 10 Global Step: 134980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:34,545-Speed 3024.97 
samples/sec Loss 6.7705 LearningRate 0.0208 Epoch: 10 Global Step: 134990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:37,907-Speed 3047.01 samples/sec Loss 6.7116 LearningRate 0.0208 Epoch: 10 Global Step: 135000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:41,357-Speed 2969.44 samples/sec Loss 6.6151 LearningRate 0.0208 Epoch: 10 Global Step: 135010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:44,707-Speed 3057.04 samples/sec Loss 6.6264 LearningRate 0.0208 Epoch: 10 Global Step: 135020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:48,206-Speed 2927.48 samples/sec Loss 6.8352 LearningRate 0.0208 Epoch: 10 Global Step: 135030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:51,587-Speed 3030.19 samples/sec Loss 6.6125 LearningRate 0.0208 Epoch: 10 Global Step: 135040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:54,998-Speed 3003.00 samples/sec Loss 6.7222 LearningRate 0.0208 Epoch: 10 Global Step: 135050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:29:58,353-Speed 3052.16 samples/sec Loss 6.6792 LearningRate 0.0208 Epoch: 10 Global Step: 135060 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:01,722-Speed 3040.27 samples/sec Loss 6.6919 LearningRate 0.0208 Epoch: 10 Global Step: 135070 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:30:05,102-Speed 3030.80 samples/sec Loss 6.5576 LearningRate 0.0208 Epoch: 10 Global Step: 135080 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:08,489-Speed 3024.05 samples/sec Loss 6.7027 LearningRate 0.0208 Epoch: 10 Global Step: 135090 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:11,865-Speed 3033.91 samples/sec Loss 6.6898 LearningRate 0.0208 Epoch: 10 Global Step: 135100 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:15,276-Speed 3002.91 samples/sec Loss 6.7408 
LearningRate 0.0208 Epoch: 10 Global Step: 135110 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:18,637-Speed 3048.19 samples/sec Loss 6.6598 LearningRate 0.0208 Epoch: 10 Global Step: 135120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:22,092-Speed 2964.65 samples/sec Loss 6.7730 LearningRate 0.0208 Epoch: 10 Global Step: 135130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:30:25,497-Speed 3007.52 samples/sec Loss 6.6537 LearningRate 0.0208 Epoch: 10 Global Step: 135140 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:28,928-Speed 2985.76 samples/sec Loss 6.6039 LearningRate 0.0208 Epoch: 10 Global Step: 135150 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:32,255-Speed 3078.43 samples/sec Loss 6.7908 LearningRate 0.0208 Epoch: 10 Global Step: 135160 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:35,674-Speed 2996.12 samples/sec Loss 6.6602 LearningRate 0.0208 Epoch: 10 Global Step: 135170 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:39,039-Speed 3044.19 samples/sec Loss 6.7594 LearningRate 0.0208 Epoch: 10 Global Step: 135180 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:42,378-Speed 3066.99 samples/sec Loss 6.7041 LearningRate 0.0208 Epoch: 10 Global Step: 135190 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:45,787-Speed 3004.71 samples/sec Loss 6.7140 LearningRate 0.0208 Epoch: 10 Global Step: 135200 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:49,165-Speed 3032.59 samples/sec Loss 6.6797 LearningRate 0.0208 Epoch: 10 Global Step: 135210 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:52,635-Speed 2951.62 samples/sec Loss 6.7401 LearningRate 0.0208 Epoch: 10 Global Step: 135220 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:56,002-Speed 3042.07 samples/sec Loss 6.6568 LearningRate 0.0208 Epoch: 10 
Global Step: 135230 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:30:59,398-Speed 3016.31 samples/sec Loss 6.7653 LearningRate 0.0208 Epoch: 10 Global Step: 135240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:02,767-Speed 3040.42 samples/sec Loss 6.6693 LearningRate 0.0208 Epoch: 10 Global Step: 135250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:06,112-Speed 3061.80 samples/sec Loss 6.7425 LearningRate 0.0207 Epoch: 10 Global Step: 135260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:09,530-Speed 2997.40 samples/sec Loss 6.6140 LearningRate 0.0207 Epoch: 10 Global Step: 135270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:12,991-Speed 2958.92 samples/sec Loss 6.5934 LearningRate 0.0207 Epoch: 10 Global Step: 135280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:16,468-Speed 2946.19 samples/sec Loss 6.7520 LearningRate 0.0207 Epoch: 10 Global Step: 135290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:19,839-Speed 3038.29 samples/sec Loss 6.6525 LearningRate 0.0207 Epoch: 10 Global Step: 135300 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:23,200-Speed 3047.71 samples/sec Loss 6.7652 LearningRate 0.0207 Epoch: 10 Global Step: 135310 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:26,591-Speed 3020.56 samples/sec Loss 6.6618 LearningRate 0.0207 Epoch: 10 Global Step: 135320 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:29,951-Speed 3048.71 samples/sec Loss 6.6387 LearningRate 0.0207 Epoch: 10 Global Step: 135330 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:33,346-Speed 3017.06 samples/sec Loss 6.5680 LearningRate 0.0207 Epoch: 10 Global Step: 135340 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:31:36,757-Speed 3003.15 samples/sec Loss 6.7150 LearningRate 0.0207 Epoch: 10 Global Step: 135350 Fp16 Grad 
Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:40,258-Speed 2925.20 samples/sec Loss 6.5999 LearningRate 0.0207 Epoch: 10 Global Step: 135360 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:43,652-Speed 3017.63 samples/sec Loss 6.6921 LearningRate 0.0207 Epoch: 10 Global Step: 135370 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:46,962-Speed 3095.14 samples/sec Loss 6.7189 LearningRate 0.0207 Epoch: 10 Global Step: 135380 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:50,285-Speed 3083.13 samples/sec Loss 6.6668 LearningRate 0.0207 Epoch: 10 Global Step: 135390 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:53,714-Speed 2987.04 samples/sec Loss 6.6441 LearningRate 0.0207 Epoch: 10 Global Step: 135400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:31:57,064-Speed 3057.52 samples/sec Loss 6.6602 LearningRate 0.0207 Epoch: 10 Global Step: 135410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:00,454-Speed 3022.12 samples/sec Loss 6.5631 LearningRate 0.0207 Epoch: 10 Global Step: 135420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:03,813-Speed 3049.28 samples/sec Loss 6.6409 LearningRate 0.0207 Epoch: 10 Global Step: 135430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:07,165-Speed 3055.65 samples/sec Loss 6.5378 LearningRate 0.0207 Epoch: 10 Global Step: 135440 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:10,550-Speed 3025.56 samples/sec Loss 6.7514 LearningRate 0.0207 Epoch: 10 Global Step: 135450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:13,907-Speed 3051.45 samples/sec Loss 6.6708 LearningRate 0.0207 Epoch: 10 Global Step: 135460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:17,269-Speed 3045.93 samples/sec Loss 6.6109 LearningRate 0.0207 Epoch: 10 Global Step: 135470 Fp16 Grad Scale: 32768 Required: 11 hours 
Training: 2022-04-27 14:32:20,611-Speed 3066.12 samples/sec Loss 6.6637 LearningRate 0.0207 Epoch: 10 Global Step: 135480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:23,944-Speed 3073.04 samples/sec Loss 6.6131 LearningRate 0.0207 Epoch: 10 Global Step: 135490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:27,313-Speed 3040.17 samples/sec Loss 6.6753 LearningRate 0.0207 Epoch: 10 Global Step: 135500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:30,731-Speed 2997.02 samples/sec Loss 6.7069 LearningRate 0.0207 Epoch: 10 Global Step: 135510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:34,093-Speed 3047.09 samples/sec Loss 6.5888 LearningRate 0.0207 Epoch: 10 Global Step: 135520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:37,455-Speed 3046.06 samples/sec Loss 6.6052 LearningRate 0.0207 Epoch: 10 Global Step: 135530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:40,812-Speed 3051.92 samples/sec Loss 6.7051 LearningRate 0.0206 Epoch: 10 Global Step: 135540 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:32:44,217-Speed 3007.99 samples/sec Loss 6.6214 LearningRate 0.0206 Epoch: 10 Global Step: 135550 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:47,558-Speed 3065.66 samples/sec Loss 6.5986 LearningRate 0.0206 Epoch: 10 Global Step: 135560 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:50,896-Speed 3068.96 samples/sec Loss 6.6786 LearningRate 0.0206 Epoch: 10 Global Step: 135570 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:54,221-Speed 3080.41 samples/sec Loss 6.6517 LearningRate 0.0206 Epoch: 10 Global Step: 135580 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:32:57,536-Speed 3089.66 samples/sec Loss 6.5678 LearningRate 0.0206 Epoch: 10 Global Step: 135590 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
14:33:00,894-Speed 3050.51 samples/sec Loss 6.6402 LearningRate 0.0206 Epoch: 10 Global Step: 135600 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:04,186-Speed 3111.73 samples/sec Loss 6.6649 LearningRate 0.0206 Epoch: 10 Global Step: 135610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:07,553-Speed 3042.07 samples/sec Loss 6.6594 LearningRate 0.0206 Epoch: 10 Global Step: 135620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:10,893-Speed 3066.85 samples/sec Loss 6.5700 LearningRate 0.0206 Epoch: 10 Global Step: 135630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:14,237-Speed 3062.73 samples/sec Loss 6.4951 LearningRate 0.0206 Epoch: 10 Global Step: 135640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:17,588-Speed 3056.90 samples/sec Loss 6.7209 LearningRate 0.0206 Epoch: 10 Global Step: 135650 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:20,979-Speed 3020.36 samples/sec Loss 6.6136 LearningRate 0.0206 Epoch: 10 Global Step: 135660 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:24,369-Speed 3023.27 samples/sec Loss 6.6705 LearningRate 0.0206 Epoch: 10 Global Step: 135670 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:27,737-Speed 3041.12 samples/sec Loss 6.6037 LearningRate 0.0206 Epoch: 10 Global Step: 135680 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:31,059-Speed 3083.17 samples/sec Loss 6.6639 LearningRate 0.0206 Epoch: 10 Global Step: 135690 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:34,408-Speed 3058.71 samples/sec Loss 6.6862 LearningRate 0.0206 Epoch: 10 Global Step: 135700 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:33:37,792-Speed 3026.41 samples/sec Loss 6.6306 LearningRate 0.0206 Epoch: 10 Global Step: 135710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:41,118-Speed 3079.87 
samples/sec Loss 6.6851 LearningRate 0.0206 Epoch: 10 Global Step: 135720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:44,537-Speed 2995.68 samples/sec Loss 6.6969 LearningRate 0.0206 Epoch: 10 Global Step: 135730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:47,889-Speed 3055.84 samples/sec Loss 6.5961 LearningRate 0.0206 Epoch: 10 Global Step: 135740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:51,292-Speed 3009.47 samples/sec Loss 6.6308 LearningRate 0.0206 Epoch: 10 Global Step: 135750 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:54,628-Speed 3071.69 samples/sec Loss 6.6251 LearningRate 0.0206 Epoch: 10 Global Step: 135760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:33:58,070-Speed 2975.58 samples/sec Loss 6.5104 LearningRate 0.0206 Epoch: 10 Global Step: 135770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:01,408-Speed 3068.56 samples/sec Loss 6.6992 LearningRate 0.0206 Epoch: 10 Global Step: 135780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:04,711-Speed 3101.00 samples/sec Loss 6.5488 LearningRate 0.0206 Epoch: 10 Global Step: 135790 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:08,020-Speed 3095.31 samples/sec Loss 6.5176 LearningRate 0.0206 Epoch: 10 Global Step: 135800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:11,393-Speed 3036.51 samples/sec Loss 6.6322 LearningRate 0.0205 Epoch: 10 Global Step: 135810 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:14,933-Speed 2893.58 samples/sec Loss 6.6978 LearningRate 0.0205 Epoch: 10 Global Step: 135820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:18,278-Speed 3062.24 samples/sec Loss 6.5880 LearningRate 0.0205 Epoch: 10 Global Step: 135830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:21,614-Speed 3070.29 samples/sec Loss 6.6148 
LearningRate 0.0205 Epoch: 10 Global Step: 135840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:24,989-Speed 3035.01 samples/sec Loss 6.6080 LearningRate 0.0205 Epoch: 10 Global Step: 135850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:28,418-Speed 2987.90 samples/sec Loss 6.6588 LearningRate 0.0205 Epoch: 10 Global Step: 135860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:31,835-Speed 2997.44 samples/sec Loss 6.6144 LearningRate 0.0205 Epoch: 10 Global Step: 135870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:35,232-Speed 3015.81 samples/sec Loss 6.7454 LearningRate 0.0205 Epoch: 10 Global Step: 135880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:34:38,572-Speed 3067.11 samples/sec Loss 6.6151 LearningRate 0.0205 Epoch: 10 Global Step: 135890 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:41,945-Speed 3036.43 samples/sec Loss 6.6313 LearningRate 0.0205 Epoch: 10 Global Step: 135900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:45,298-Speed 3054.94 samples/sec Loss 6.6977 LearningRate 0.0205 Epoch: 10 Global Step: 135910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:48,651-Speed 3054.43 samples/sec Loss 6.5718 LearningRate 0.0205 Epoch: 10 Global Step: 135920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:52,077-Speed 2990.20 samples/sec Loss 6.7057 LearningRate 0.0205 Epoch: 10 Global Step: 135930 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:55,488-Speed 3003.48 samples/sec Loss 6.7228 LearningRate 0.0205 Epoch: 10 Global Step: 135940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:34:58,821-Speed 3072.99 samples/sec Loss 6.6301 LearningRate 0.0205 Epoch: 10 Global Step: 135950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:35:02,173-Speed 3055.43 samples/sec Loss 6.6238 LearningRate 0.0205 Epoch: 10 
Global Step: 135960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:35:05,510-Speed 3070.14 samples/sec Loss 6.6507 LearningRate 0.0205 Epoch: 10 Global Step: 135970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:35:08,954-Speed 2973.15 samples/sec Loss 6.6445 LearningRate 0.0205 Epoch: 10 Global Step: 135980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:35:12,361-Speed 3006.99 samples/sec Loss 6.7366 LearningRate 0.0205 Epoch: 10 Global Step: 135990 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:35:15,800-Speed 2978.44 samples/sec Loss 6.6240 LearningRate 0.0205 Epoch: 10 Global Step: 136000 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:35:19,237-Speed 2979.88 samples/sec Loss 6.5670 LearningRate 0.0205 Epoch: 10 Global Step: 136010 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:35:22,588-Speed 3056.87 samples/sec Loss 6.6214 LearningRate 0.0205 Epoch: 10 Global Step: 136020 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:25,915-Speed 3078.45 samples/sec Loss 6.6541 LearningRate 0.0205 Epoch: 10 Global Step: 136030 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:29,255-Speed 3067.25 samples/sec Loss 6.6002 LearningRate 0.0205 Epoch: 10 Global Step: 136040 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:32,665-Speed 3003.68 samples/sec Loss 6.7018 LearningRate 0.0205 Epoch: 10 Global Step: 136050 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:36,034-Speed 3039.96 samples/sec Loss 6.7114 LearningRate 0.0205 Epoch: 10 Global Step: 136060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:39,370-Speed 3071.20 samples/sec Loss 6.5462 LearningRate 0.0205 Epoch: 10 Global Step: 136070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:42,673-Speed 3100.97 samples/sec Loss 6.5666 LearningRate 0.0205 Epoch: 10 Global Step: 136080 Fp16 
Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:45,986-Speed 3091.43 samples/sec Loss 6.6022 LearningRate 0.0204 Epoch: 10 Global Step: 136090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:49,373-Speed 3024.43 samples/sec Loss 6.5212 LearningRate 0.0204 Epoch: 10 Global Step: 136100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:52,727-Speed 3053.59 samples/sec Loss 6.6345 LearningRate 0.0204 Epoch: 10 Global Step: 136110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:35:56,122-Speed 3017.08 samples/sec Loss 6.5538 LearningRate 0.0204 Epoch: 10 Global Step: 136120 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:35:59,489-Speed 3041.71 samples/sec Loss 6.5686 LearningRate 0.0204 Epoch: 10 Global Step: 136130 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:02,809-Speed 3085.18 samples/sec Loss 6.6447 LearningRate 0.0204 Epoch: 10 Global Step: 136140 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:06,265-Speed 2964.23 samples/sec Loss 6.6816 LearningRate 0.0204 Epoch: 10 Global Step: 136150 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:09,658-Speed 3018.33 samples/sec Loss 6.5961 LearningRate 0.0204 Epoch: 10 Global Step: 136160 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:13,036-Speed 3032.38 samples/sec Loss 6.5700 LearningRate 0.0204 Epoch: 10 Global Step: 136170 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:16,395-Speed 3049.52 samples/sec Loss 6.5425 LearningRate 0.0204 Epoch: 10 Global Step: 136180 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:19,729-Speed 3072.75 samples/sec Loss 6.5381 LearningRate 0.0204 Epoch: 10 Global Step: 136190 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:23,041-Speed 3092.43 samples/sec Loss 6.5612 LearningRate 0.0204 Epoch: 10 Global Step: 136200 Fp16 Grad Scale: 65536 Required: 11 
hours Training: 2022-04-27 14:36:26,369-Speed 3077.10 samples/sec Loss 6.7014 LearningRate 0.0204 Epoch: 10 Global Step: 136210 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:29,762-Speed 3018.98 samples/sec Loss 6.5095 LearningRate 0.0204 Epoch: 10 Global Step: 136220 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:36:33,131-Speed 3041.13 samples/sec Loss 6.5237 LearningRate 0.0204 Epoch: 10 Global Step: 136230 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:36,457-Speed 3079.81 samples/sec Loss 6.6785 LearningRate 0.0204 Epoch: 10 Global Step: 136240 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:39,820-Speed 3045.68 samples/sec Loss 6.5912 LearningRate 0.0204 Epoch: 10 Global Step: 136250 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:43,236-Speed 2998.22 samples/sec Loss 6.5565 LearningRate 0.0204 Epoch: 10 Global Step: 136260 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:46,601-Speed 3044.08 samples/sec Loss 6.5500 LearningRate 0.0204 Epoch: 10 Global Step: 136270 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:50,025-Speed 2991.32 samples/sec Loss 6.5322 LearningRate 0.0204 Epoch: 10 Global Step: 136280 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:53,400-Speed 3035.44 samples/sec Loss 6.5444 LearningRate 0.0204 Epoch: 10 Global Step: 136290 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:36:56,715-Speed 3089.78 samples/sec Loss 6.5729 LearningRate 0.0204 Epoch: 10 Global Step: 136300 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:00,054-Speed 3067.86 samples/sec Loss 6.5298 LearningRate 0.0204 Epoch: 10 Global Step: 136310 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:03,506-Speed 2967.59 samples/sec Loss 6.8303 LearningRate 0.0204 Epoch: 10 Global Step: 136320 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 
14:37:06,854-Speed 3059.65 samples/sec Loss 6.5975 LearningRate 0.0204 Epoch: 10 Global Step: 136330 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:10,244-Speed 3020.76 samples/sec Loss 6.6299 LearningRate 0.0204 Epoch: 10 Global Step: 136340 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:13,625-Speed 3030.01 samples/sec Loss 6.5539 LearningRate 0.0204 Epoch: 10 Global Step: 136350 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:16,956-Speed 3074.79 samples/sec Loss 6.5770 LearningRate 0.0203 Epoch: 10 Global Step: 136360 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:20,440-Speed 2940.47 samples/sec Loss 6.6013 LearningRate 0.0203 Epoch: 10 Global Step: 136370 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:23,849-Speed 3004.15 samples/sec Loss 6.6586 LearningRate 0.0203 Epoch: 10 Global Step: 136380 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:27,321-Speed 2949.67 samples/sec Loss 6.5766 LearningRate 0.0203 Epoch: 10 Global Step: 136390 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:30,766-Speed 2973.50 samples/sec Loss 6.5236 LearningRate 0.0203 Epoch: 10 Global Step: 136400 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:37:34,168-Speed 3010.71 samples/sec Loss 6.7258 LearningRate 0.0203 Epoch: 10 Global Step: 136410 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:37:37,542-Speed 3036.19 samples/sec Loss 6.5807 LearningRate 0.0203 Epoch: 10 Global Step: 136420 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:37:40,862-Speed 3085.08 samples/sec Loss 6.6613 LearningRate 0.0203 Epoch: 10 Global Step: 136430 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:37:44,270-Speed 3006.05 samples/sec Loss 6.5296 LearningRate 0.0203 Epoch: 10 Global Step: 136440 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:47,584-Speed 3091.04 
samples/sec Loss 6.5859 LearningRate 0.0203 Epoch: 10 Global Step: 136450 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:50,930-Speed 3062.07 samples/sec Loss 6.6202 LearningRate 0.0203 Epoch: 10 Global Step: 136460 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:54,380-Speed 2969.21 samples/sec Loss 6.5933 LearningRate 0.0203 Epoch: 10 Global Step: 136470 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:37:57,749-Speed 3040.36 samples/sec Loss 6.6265 LearningRate 0.0203 Epoch: 10 Global Step: 136480 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:01,073-Speed 3081.25 samples/sec Loss 6.6391 LearningRate 0.0203 Epoch: 10 Global Step: 136490 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:04,500-Speed 2988.68 samples/sec Loss 6.6145 LearningRate 0.0203 Epoch: 10 Global Step: 136500 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:07,901-Speed 3012.50 samples/sec Loss 6.5754 LearningRate 0.0203 Epoch: 10 Global Step: 136510 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:11,253-Speed 3056.07 samples/sec Loss 6.5791 LearningRate 0.0203 Epoch: 10 Global Step: 136520 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:14,571-Speed 3087.04 samples/sec Loss 6.6689 LearningRate 0.0203 Epoch: 10 Global Step: 136530 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:17,875-Speed 3100.30 samples/sec Loss 6.6408 LearningRate 0.0203 Epoch: 10 Global Step: 136540 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:38:21,179-Speed 3099.96 samples/sec Loss 6.5580 LearningRate 0.0203 Epoch: 10 Global Step: 136550 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:24,651-Speed 2950.14 samples/sec Loss 6.6209 LearningRate 0.0203 Epoch: 10 Global Step: 136560 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:28,117-Speed 2955.21 samples/sec Loss 6.5082 
LearningRate 0.0203 Epoch: 10 Global Step: 136570 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:31,512-Speed 3017.01 samples/sec Loss 6.5578 LearningRate 0.0203 Epoch: 10 Global Step: 136580 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:34,930-Speed 2996.78 samples/sec Loss 6.5432 LearningRate 0.0203 Epoch: 10 Global Step: 136590 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:38,237-Speed 3098.47 samples/sec Loss 6.6389 LearningRate 0.0203 Epoch: 10 Global Step: 136600 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:41,673-Speed 2980.25 samples/sec Loss 6.6207 LearningRate 0.0203 Epoch: 10 Global Step: 136610 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:45,407-Speed 2743.47 samples/sec Loss 6.5384 LearningRate 0.0203 Epoch: 10 Global Step: 136620 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:38:48,829-Speed 2992.99 samples/sec Loss 6.6018 LearningRate 0.0203 Epoch: 10 Global Step: 136630 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:39:22,431-Speed 304.75 samples/sec Loss 5.2470 LearningRate 0.0202 Epoch: 11 Global Step: 136640 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:39:26,255-Speed 2678.44 samples/sec Loss 5.0876 LearningRate 0.0202 Epoch: 11 Global Step: 136650 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:29,656-Speed 3011.58 samples/sec Loss 5.0402 LearningRate 0.0202 Epoch: 11 Global Step: 136660 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:33,056-Speed 3013.36 samples/sec Loss 5.1029 LearningRate 0.0202 Epoch: 11 Global Step: 136670 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:36,494-Speed 2979.08 samples/sec Loss 5.0153 LearningRate 0.0202 Epoch: 11 Global Step: 136680 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:39,837-Speed 3063.55 samples/sec Loss 5.1525 LearningRate 0.0202 Epoch: 11 
Global Step: 136690 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:43,214-Speed 3033.51 samples/sec Loss 5.1737 LearningRate 0.0202 Epoch: 11 Global Step: 136700 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:46,734-Speed 2911.25 samples/sec Loss 5.1295 LearningRate 0.0202 Epoch: 11 Global Step: 136710 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:50,086-Speed 3055.72 samples/sec Loss 5.2685 LearningRate 0.0202 Epoch: 11 Global Step: 136720 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:53,482-Speed 3016.26 samples/sec Loss 5.2505 LearningRate 0.0202 Epoch: 11 Global Step: 136730 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:39:56,810-Speed 3077.65 samples/sec Loss 5.1335 LearningRate 0.0202 Epoch: 11 Global Step: 136740 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:00,174-Speed 3046.24 samples/sec Loss 5.0794 LearningRate 0.0202 Epoch: 11 Global Step: 136750 Fp16 Grad Scale: 131072 Required: 11 hours Training: 2022-04-27 14:40:03,545-Speed 3039.25 samples/sec Loss 5.1768 LearningRate 0.0202 Epoch: 11 Global Step: 136760 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:06,974-Speed 2987.11 samples/sec Loss 5.0395 LearningRate 0.0202 Epoch: 11 Global Step: 136770 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:10,360-Speed 3024.65 samples/sec Loss 5.1716 LearningRate 0.0202 Epoch: 11 Global Step: 136780 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:13,748-Speed 3023.63 samples/sec Loss 5.0142 LearningRate 0.0202 Epoch: 11 Global Step: 136790 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:17,187-Speed 2978.76 samples/sec Loss 5.1567 LearningRate 0.0202 Epoch: 11 Global Step: 136800 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:20,615-Speed 2987.93 samples/sec Loss 5.2119 LearningRate 0.0202 Epoch: 11 Global Step: 136810 Fp16 Grad 
Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:24,032-Speed 2997.24 samples/sec Loss 5.1398 LearningRate 0.0202 Epoch: 11 Global Step: 136820 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:27,419-Speed 3024.11 samples/sec Loss 5.2147 LearningRate 0.0202 Epoch: 11 Global Step: 136830 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:30,817-Speed 3014.22 samples/sec Loss 5.1381 LearningRate 0.0202 Epoch: 11 Global Step: 136840 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:34,274-Speed 2963.61 samples/sec Loss 5.1540 LearningRate 0.0202 Epoch: 11 Global Step: 136850 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:37,835-Speed 2875.95 samples/sec Loss 5.1478 LearningRate 0.0202 Epoch: 11 Global Step: 136860 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:41,309-Speed 2948.41 samples/sec Loss 5.2415 LearningRate 0.0202 Epoch: 11 Global Step: 136870 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:44,688-Speed 3031.69 samples/sec Loss 5.2125 LearningRate 0.0202 Epoch: 11 Global Step: 136880 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:48,130-Speed 2976.48 samples/sec Loss 5.1967 LearningRate 0.0202 Epoch: 11 Global Step: 136890 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:40:51,829-Speed 2769.05 samples/sec Loss 5.2231 LearningRate 0.0202 Epoch: 11 Global Step: 136900 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:55,211-Speed 3028.68 samples/sec Loss 5.0780 LearningRate 0.0201 Epoch: 11 Global Step: 136910 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:40:58,725-Speed 2915.00 samples/sec Loss 5.2225 LearningRate 0.0201 Epoch: 11 Global Step: 136920 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:02,093-Speed 3041.64 samples/sec Loss 5.1922 LearningRate 0.0201 Epoch: 11 Global Step: 136930 Fp16 Grad Scale: 65536 Required: 11 hours 
Training: 2022-04-27 14:41:05,480-Speed 3024.09 samples/sec Loss 5.3248 LearningRate 0.0201 Epoch: 11 Global Step: 136940 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:08,851-Speed 3038.54 samples/sec Loss 5.3511 LearningRate 0.0201 Epoch: 11 Global Step: 136950 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:12,184-Speed 3073.03 samples/sec Loss 5.2794 LearningRate 0.0201 Epoch: 11 Global Step: 136960 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:15,588-Speed 3008.50 samples/sec Loss 5.2520 LearningRate 0.0201 Epoch: 11 Global Step: 136970 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:18,970-Speed 3028.83 samples/sec Loss 5.2955 LearningRate 0.0201 Epoch: 11 Global Step: 136980 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:22,369-Speed 3014.00 samples/sec Loss 5.3146 LearningRate 0.0201 Epoch: 11 Global Step: 136990 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:25,709-Speed 3066.88 samples/sec Loss 5.2255 LearningRate 0.0201 Epoch: 11 Global Step: 137000 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:29,085-Speed 3033.73 samples/sec Loss 5.1972 LearningRate 0.0201 Epoch: 11 Global Step: 137010 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:32,488-Speed 3010.06 samples/sec Loss 5.1914 LearningRate 0.0201 Epoch: 11 Global Step: 137020 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:35,880-Speed 3019.88 samples/sec Loss 5.3503 LearningRate 0.0201 Epoch: 11 Global Step: 137030 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:39,334-Speed 2965.75 samples/sec Loss 5.4021 LearningRate 0.0201 Epoch: 11 Global Step: 137040 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 14:41:42,661-Speed 3078.15 samples/sec Loss 5.2259 LearningRate 0.0201 Epoch: 11 Global Step: 137050 Fp16 Grad Scale: 65536 Required: 11 hours Training: 2022-04-27 
14:41:46,035-Speed 3035.80 samples/sec Loss 5.2076 LearningRate 0.0201 Epoch: 11 Global Step: 137060 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:41:49,449-Speed 3001.11 samples/sec Loss 5.3281 LearningRate 0.0201 Epoch: 11 Global Step: 137070 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:41:52,842-Speed 3018.56 samples/sec Loss 5.2339 LearningRate 0.0201 Epoch: 11 Global Step: 137080 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:41:56,214-Speed 3037.54 samples/sec Loss 5.1699 LearningRate 0.0201 Epoch: 11 Global Step: 137090 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:41:59,577-Speed 3045.85 samples/sec Loss 5.1902 LearningRate 0.0201 Epoch: 11 Global Step: 137100 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:42:02,918-Speed 3066.06 samples/sec Loss 5.3507 LearningRate 0.0201 Epoch: 11 Global Step: 137110 Fp16 Grad Scale: 32768 Required: 11 hours Training: 2022-04-27 14:42:06,261-Speed 3063.38 samples/sec Loss 5.2343 LearningRate 0.0201 Epoch: 11 Global Step: 137120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:42:09,613-Speed 3056.69 samples/sec Loss 5.3541 LearningRate 0.0201 Epoch: 11 Global Step: 137130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:42:12,976-Speed 3045.16 samples/sec Loss 5.2710 LearningRate 0.0201 Epoch: 11 Global Step: 137140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:42:16,364-Speed 3023.59 samples/sec Loss 5.3121 LearningRate 0.0201 Epoch: 11 Global Step: 137150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:42:19,759-Speed 3017.96 samples/sec Loss 5.2457 LearningRate 0.0201 Epoch: 11 Global Step: 137160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:23,156-Speed 3015.19 samples/sec Loss 5.2978 LearningRate 0.0201 Epoch: 11 Global Step: 137170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:26,478-Speed 3083.32 
samples/sec Loss 5.2961 LearningRate 0.0201 Epoch: 11 Global Step: 137180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:29,876-Speed 3014.21 samples/sec Loss 5.3139 LearningRate 0.0200 Epoch: 11 Global Step: 137190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:33,309-Speed 2983.59 samples/sec Loss 5.3153 LearningRate 0.0200 Epoch: 11 Global Step: 137200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:36,661-Speed 3056.37 samples/sec Loss 5.2569 LearningRate 0.0200 Epoch: 11 Global Step: 137210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:40,102-Speed 2976.82 samples/sec Loss 5.3942 LearningRate 0.0200 Epoch: 11 Global Step: 137220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:43,478-Speed 3034.39 samples/sec Loss 5.2945 LearningRate 0.0200 Epoch: 11 Global Step: 137230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:46,903-Speed 2990.60 samples/sec Loss 5.3533 LearningRate 0.0200 Epoch: 11 Global Step: 137240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:50,304-Speed 3011.74 samples/sec Loss 5.4195 LearningRate 0.0200 Epoch: 11 Global Step: 137250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:42:53,728-Speed 2991.75 samples/sec Loss 5.2325 LearningRate 0.0200 Epoch: 11 Global Step: 137260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:42:57,124-Speed 3015.95 samples/sec Loss 5.2998 LearningRate 0.0200 Epoch: 11 Global Step: 137270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:00,473-Speed 3060.73 samples/sec Loss 5.2635 LearningRate 0.0200 Epoch: 11 Global Step: 137280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:03,807-Speed 3071.44 samples/sec Loss 5.2622 LearningRate 0.0200 Epoch: 11 Global Step: 137290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:07,229-Speed 2993.69 samples/sec Loss 5.3513 
LearningRate 0.0200 Epoch: 11 Global Step: 137300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:10,667-Speed 2979.83 samples/sec Loss 5.3153 LearningRate 0.0200 Epoch: 11 Global Step: 137310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:14,163-Speed 2929.59 samples/sec Loss 5.3985 LearningRate 0.0200 Epoch: 11 Global Step: 137320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:17,659-Speed 2930.18 samples/sec Loss 5.3265 LearningRate 0.0200 Epoch: 11 Global Step: 137330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:21,094-Speed 2981.56 samples/sec Loss 5.4469 LearningRate 0.0200 Epoch: 11 Global Step: 137340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:24,502-Speed 3005.48 samples/sec Loss 5.4590 LearningRate 0.0200 Epoch: 11 Global Step: 137350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:27,938-Speed 2981.61 samples/sec Loss 5.3380 LearningRate 0.0200 Epoch: 11 Global Step: 137360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:31,335-Speed 3015.46 samples/sec Loss 5.2901 LearningRate 0.0200 Epoch: 11 Global Step: 137370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:34,812-Speed 2945.34 samples/sec Loss 5.3761 LearningRate 0.0200 Epoch: 11 Global Step: 137380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:38,170-Speed 3050.66 samples/sec Loss 5.2764 LearningRate 0.0200 Epoch: 11 Global Step: 137390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:41,634-Speed 2957.26 samples/sec Loss 5.3732 LearningRate 0.0200 Epoch: 11 Global Step: 137400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:45,010-Speed 3033.34 samples/sec Loss 5.3739 LearningRate 0.0200 Epoch: 11 Global Step: 137410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:48,498-Speed 2937.02 samples/sec Loss 5.3884 LearningRate 0.0200 Epoch: 11 
Global Step: 137420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:51,926-Speed 2987.90 samples/sec Loss 5.4802 LearningRate 0.0200 Epoch: 11 Global Step: 137430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:43:55,256-Speed 3076.14 samples/sec Loss 5.3557 LearningRate 0.0200 Epoch: 11 Global Step: 137440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:43:58,672-Speed 2998.43 samples/sec Loss 5.3646 LearningRate 0.0200 Epoch: 11 Global Step: 137450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:02,103-Speed 2985.83 samples/sec Loss 5.3751 LearningRate 0.0200 Epoch: 11 Global Step: 137460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:05,453-Speed 3057.21 samples/sec Loss 5.3661 LearningRate 0.0199 Epoch: 11 Global Step: 137470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:08,809-Speed 3052.27 samples/sec Loss 5.4467 LearningRate 0.0199 Epoch: 11 Global Step: 137480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:12,152-Speed 3063.79 samples/sec Loss 5.4550 LearningRate 0.0199 Epoch: 11 Global Step: 137490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:15,521-Speed 3040.88 samples/sec Loss 5.3690 LearningRate 0.0199 Epoch: 11 Global Step: 137500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:18,877-Speed 3052.45 samples/sec Loss 5.2983 LearningRate 0.0199 Epoch: 11 Global Step: 137510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:22,304-Speed 2988.82 samples/sec Loss 5.4514 LearningRate 0.0199 Epoch: 11 Global Step: 137520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:25,774-Speed 2951.83 samples/sec Loss 5.4599 LearningRate 0.0199 Epoch: 11 Global Step: 137530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:29,150-Speed 3033.84 samples/sec Loss 5.5142 LearningRate 0.0199 Epoch: 11 Global Step: 137540 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-27 14:44:32,492-Speed 3064.56 samples/sec Loss 5.4876 LearningRate 0.0199 Epoch: 11 Global Step: 137550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:44:35,861-Speed 3040.05 samples/sec Loss 5.3939 LearningRate 0.0199 Epoch: 11 Global Step: 137560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:44:39,236-Speed 3035.41 samples/sec Loss 5.4902 LearningRate 0.0199 Epoch: 11 Global Step: 137570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:43,179-Speed 2598.02 samples/sec Loss 5.3441 LearningRate 0.0199 Epoch: 11 Global Step: 137580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:46,514-Speed 3070.85 samples/sec Loss 5.4569 LearningRate 0.0199 Epoch: 11 Global Step: 137590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:49,959-Speed 2973.44 samples/sec Loss 5.5188 LearningRate 0.0199 Epoch: 11 Global Step: 137600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:54,159-Speed 2438.69 samples/sec Loss 5.4639 LearningRate 0.0199 Epoch: 11 Global Step: 137610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:44:58,800-Speed 2206.80 samples/sec Loss 5.3986 LearningRate 0.0199 Epoch: 11 Global Step: 137620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:45:02,228-Speed 2987.96 samples/sec Loss 5.5036 LearningRate 0.0199 Epoch: 11 Global Step: 137630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:45:05,583-Speed 3053.18 samples/sec Loss 5.4451 LearningRate 0.0199 Epoch: 11 Global Step: 137640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:08,979-Speed 3016.10 samples/sec Loss 5.4637 LearningRate 0.0199 Epoch: 11 Global Step: 137650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:12,424-Speed 2973.96 samples/sec Loss 5.4899 LearningRate 0.0199 Epoch: 11 Global Step: 137660 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 14:45:15,787-Speed 3045.49 samples/sec Loss 5.4531 LearningRate 0.0199 Epoch: 11 Global Step: 137670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:19,160-Speed 3036.90 samples/sec Loss 5.4235 LearningRate 0.0199 Epoch: 11 Global Step: 137680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:22,581-Speed 2993.71 samples/sec Loss 5.3859 LearningRate 0.0199 Epoch: 11 Global Step: 137690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:25,970-Speed 3022.17 samples/sec Loss 5.5060 LearningRate 0.0199 Epoch: 11 Global Step: 137700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:29,301-Speed 3075.75 samples/sec Loss 5.5584 LearningRate 0.0199 Epoch: 11 Global Step: 137710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:32,655-Speed 3053.53 samples/sec Loss 5.4827 LearningRate 0.0199 Epoch: 11 Global Step: 137720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:36,008-Speed 3055.53 samples/sec Loss 5.5161 LearningRate 0.0199 Epoch: 11 Global Step: 137730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:45:39,395-Speed 3023.60 samples/sec Loss 5.4428 LearningRate 0.0199 Epoch: 11 Global Step: 137740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:45:42,809-Speed 3000.49 samples/sec Loss 5.4874 LearningRate 0.0198 Epoch: 11 Global Step: 137750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:45:46,204-Speed 3017.44 samples/sec Loss 5.4840 LearningRate 0.0198 Epoch: 11 Global Step: 137760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:45:49,685-Speed 2942.48 samples/sec Loss 5.6238 LearningRate 0.0198 Epoch: 11 Global Step: 137770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:45:53,161-Speed 2946.30 samples/sec Loss 5.5097 LearningRate 0.0198 Epoch: 11 Global Step: 137780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
14:45:56,600-Speed 2979.26 samples/sec Loss 5.5873 LearningRate 0.0198 Epoch: 11 Global Step: 137790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:00,033-Speed 2983.71 samples/sec Loss 5.5386 LearningRate 0.0198 Epoch: 11 Global Step: 137800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:03,453-Speed 2995.55 samples/sec Loss 5.4895 LearningRate 0.0198 Epoch: 11 Global Step: 137810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:06,887-Speed 2983.05 samples/sec Loss 5.5774 LearningRate 0.0198 Epoch: 11 Global Step: 137820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:10,327-Speed 2977.96 samples/sec Loss 5.5517 LearningRate 0.0198 Epoch: 11 Global Step: 137830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:13,752-Speed 2990.52 samples/sec Loss 5.6453 LearningRate 0.0198 Epoch: 11 Global Step: 137840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:46:17,223-Speed 2950.89 samples/sec Loss 5.5233 LearningRate 0.0198 Epoch: 11 Global Step: 137850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:46:20,544-Speed 3084.03 samples/sec Loss 5.5205 LearningRate 0.0198 Epoch: 11 Global Step: 137860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:23,928-Speed 3027.14 samples/sec Loss 5.5657 LearningRate 0.0198 Epoch: 11 Global Step: 137870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:27,435-Speed 2920.38 samples/sec Loss 5.5550 LearningRate 0.0198 Epoch: 11 Global Step: 137880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:30,845-Speed 3004.17 samples/sec Loss 5.4983 LearningRate 0.0198 Epoch: 11 Global Step: 137890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:34,277-Speed 2984.37 samples/sec Loss 5.4022 LearningRate 0.0198 Epoch: 11 Global Step: 137900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:37,704-Speed 2989.11 
samples/sec Loss 5.5772 LearningRate 0.0198 Epoch: 11 Global Step: 137910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:41,170-Speed 2955.34 samples/sec Loss 5.5382 LearningRate 0.0198 Epoch: 11 Global Step: 137920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:44,527-Speed 3050.68 samples/sec Loss 5.5660 LearningRate 0.0198 Epoch: 11 Global Step: 137930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:47,845-Speed 3087.38 samples/sec Loss 5.5428 LearningRate 0.0198 Epoch: 11 Global Step: 137940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:51,276-Speed 2986.02 samples/sec Loss 5.4947 LearningRate 0.0198 Epoch: 11 Global Step: 137950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:46:54,603-Speed 3078.49 samples/sec Loss 5.4821 LearningRate 0.0198 Epoch: 11 Global Step: 137960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:46:58,624-Speed 2546.81 samples/sec Loss 5.6513 LearningRate 0.0198 Epoch: 11 Global Step: 137970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:02,025-Speed 3012.06 samples/sec Loss 5.5341 LearningRate 0.0198 Epoch: 11 Global Step: 137980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:06,024-Speed 2561.89 samples/sec Loss 5.7075 LearningRate 0.0198 Epoch: 11 Global Step: 137990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:09,466-Speed 2975.58 samples/sec Loss 5.6016 LearningRate 0.0198 Epoch: 11 Global Step: 138000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:12,840-Speed 3036.28 samples/sec Loss 5.5152 LearningRate 0.0198 Epoch: 11 Global Step: 138010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:16,176-Speed 3070.66 samples/sec Loss 5.4685 LearningRate 0.0197 Epoch: 11 Global Step: 138020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:19,565-Speed 3021.83 samples/sec Loss 5.5833 
LearningRate 0.0197 Epoch: 11 Global Step: 138030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:23,007-Speed 2976.46 samples/sec Loss 5.5273 LearningRate 0.0197 Epoch: 11 Global Step: 138040 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:26,386-Speed 3030.58 samples/sec Loss 5.6117 LearningRate 0.0197 Epoch: 11 Global Step: 138050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:47:29,784-Speed 3014.96 samples/sec Loss 5.6186 LearningRate 0.0197 Epoch: 11 Global Step: 138060 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 14:47:33,080-Speed 3107.78 samples/sec Loss 5.6363 LearningRate 0.0197 Epoch: 11 Global Step: 138070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:36,440-Speed 3049.15 samples/sec Loss 5.5469 LearningRate 0.0197 Epoch: 11 Global Step: 138080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:39,772-Speed 3074.34 samples/sec Loss 5.5419 LearningRate 0.0197 Epoch: 11 Global Step: 138090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:43,178-Speed 3007.22 samples/sec Loss 5.6212 LearningRate 0.0197 Epoch: 11 Global Step: 138100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:46,517-Speed 3066.89 samples/sec Loss 5.5637 LearningRate 0.0197 Epoch: 11 Global Step: 138110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:49,875-Speed 3050.49 samples/sec Loss 5.6273 LearningRate 0.0197 Epoch: 11 Global Step: 138120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:53,384-Speed 2919.31 samples/sec Loss 5.5590 LearningRate 0.0197 Epoch: 11 Global Step: 138130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:47:56,715-Speed 3074.71 samples/sec Loss 5.5391 LearningRate 0.0197 Epoch: 11 Global Step: 138140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:00,020-Speed 3100.00 samples/sec Loss 5.6830 LearningRate 0.0197 Epoch: 11 
Global Step: 138150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:03,462-Speed 2975.88 samples/sec Loss 5.5568 LearningRate 0.0197 Epoch: 11 Global Step: 138160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:06,974-Speed 2918.08 samples/sec Loss 5.5790 LearningRate 0.0197 Epoch: 11 Global Step: 138170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:10,348-Speed 3035.72 samples/sec Loss 5.6428 LearningRate 0.0197 Epoch: 11 Global Step: 138180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:13,705-Speed 3050.91 samples/sec Loss 5.6638 LearningRate 0.0197 Epoch: 11 Global Step: 138190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:17,132-Speed 2989.67 samples/sec Loss 5.6102 LearningRate 0.0197 Epoch: 11 Global Step: 138200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:20,522-Speed 3021.54 samples/sec Loss 5.6018 LearningRate 0.0197 Epoch: 11 Global Step: 138210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:23,896-Speed 3035.56 samples/sec Loss 5.7315 LearningRate 0.0197 Epoch: 11 Global Step: 138220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:27,273-Speed 3033.60 samples/sec Loss 5.7146 LearningRate 0.0197 Epoch: 11 Global Step: 138230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:30,658-Speed 3026.15 samples/sec Loss 5.5411 LearningRate 0.0197 Epoch: 11 Global Step: 138240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:34,063-Speed 3007.59 samples/sec Loss 5.6156 LearningRate 0.0197 Epoch: 11 Global Step: 138250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:48:37,446-Speed 3028.12 samples/sec Loss 5.5322 LearningRate 0.0197 Epoch: 11 Global Step: 138260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:40,774-Speed 3077.93 samples/sec Loss 5.6076 LearningRate 0.0197 Epoch: 11 Global Step: 138270 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:44,152-Speed 3032.07 samples/sec Loss 5.6223 LearningRate 0.0197 Epoch: 11 Global Step: 138280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:47,518-Speed 3043.03 samples/sec Loss 5.6211 LearningRate 0.0197 Epoch: 11 Global Step: 138290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:50,917-Speed 3013.78 samples/sec Loss 5.6517 LearningRate 0.0196 Epoch: 11 Global Step: 138300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:54,236-Speed 3085.55 samples/sec Loss 5.6574 LearningRate 0.0196 Epoch: 11 Global Step: 138310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:48:57,639-Speed 3010.78 samples/sec Loss 5.5269 LearningRate 0.0196 Epoch: 11 Global Step: 138320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:01,048-Speed 3004.44 samples/sec Loss 5.6633 LearningRate 0.0196 Epoch: 11 Global Step: 138330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:04,473-Speed 2990.55 samples/sec Loss 5.5995 LearningRate 0.0196 Epoch: 11 Global Step: 138340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:07,899-Speed 2990.50 samples/sec Loss 5.6907 LearningRate 0.0196 Epoch: 11 Global Step: 138350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:11,279-Speed 3030.12 samples/sec Loss 5.6272 LearningRate 0.0196 Epoch: 11 Global Step: 138360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:14,756-Speed 2945.89 samples/sec Loss 5.7025 LearningRate 0.0196 Epoch: 11 Global Step: 138370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:18,159-Speed 3010.30 samples/sec Loss 5.7014 LearningRate 0.0196 Epoch: 11 Global Step: 138380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:21,532-Speed 3037.01 samples/sec Loss 5.7378 LearningRate 0.0196 Epoch: 11 Global Step: 138390 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 14:49:24,932-Speed 3012.84 samples/sec Loss 5.7091 LearningRate 0.0196 Epoch: 11 Global Step: 138400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:28,295-Speed 3045.02 samples/sec Loss 5.6279 LearningRate 0.0196 Epoch: 11 Global Step: 138410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:31,674-Speed 3032.14 samples/sec Loss 5.6825 LearningRate 0.0196 Epoch: 11 Global Step: 138420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:35,089-Speed 2998.95 samples/sec Loss 5.6655 LearningRate 0.0196 Epoch: 11 Global Step: 138430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:49:38,457-Speed 3042.16 samples/sec Loss 5.6469 LearningRate 0.0196 Epoch: 11 Global Step: 138440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:41,797-Speed 3066.53 samples/sec Loss 5.7239 LearningRate 0.0196 Epoch: 11 Global Step: 138450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:45,189-Speed 3019.16 samples/sec Loss 5.6730 LearningRate 0.0196 Epoch: 11 Global Step: 138460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:48,594-Speed 3008.41 samples/sec Loss 5.7010 LearningRate 0.0196 Epoch: 11 Global Step: 138470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:51,967-Speed 3037.23 samples/sec Loss 5.5896 LearningRate 0.0196 Epoch: 11 Global Step: 138480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:55,380-Speed 3001.34 samples/sec Loss 5.7438 LearningRate 0.0196 Epoch: 11 Global Step: 138490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:49:58,761-Speed 3029.11 samples/sec Loss 5.5900 LearningRate 0.0196 Epoch: 11 Global Step: 138500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:02,144-Speed 3028.15 samples/sec Loss 5.8236 LearningRate 0.0196 Epoch: 11 Global Step: 138510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
14:50:05,523-Speed 3031.03 samples/sec Loss 5.5781 LearningRate 0.0196 Epoch: 11 Global Step: 138520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:09,032-Speed 2919.68 samples/sec Loss 5.7521 LearningRate 0.0196 Epoch: 11 Global Step: 138530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:12,468-Speed 2981.07 samples/sec Loss 5.7029 LearningRate 0.0196 Epoch: 11 Global Step: 138540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:15,806-Speed 3068.41 samples/sec Loss 5.7101 LearningRate 0.0196 Epoch: 11 Global Step: 138550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:19,220-Speed 3000.02 samples/sec Loss 5.6632 LearningRate 0.0196 Epoch: 11 Global Step: 138560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:22,651-Speed 2985.72 samples/sec Loss 5.6637 LearningRate 0.0196 Epoch: 11 Global Step: 138570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:26,053-Speed 3010.71 samples/sec Loss 5.5923 LearningRate 0.0196 Epoch: 11 Global Step: 138580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:29,423-Speed 3039.90 samples/sec Loss 5.7790 LearningRate 0.0195 Epoch: 11 Global Step: 138590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:50:32,735-Speed 3092.57 samples/sec Loss 5.7628 LearningRate 0.0195 Epoch: 11 Global Step: 138600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:36,091-Speed 3051.66 samples/sec Loss 5.7305 LearningRate 0.0195 Epoch: 11 Global Step: 138610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:39,478-Speed 3024.88 samples/sec Loss 5.7328 LearningRate 0.0195 Epoch: 11 Global Step: 138620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:42,830-Speed 3055.56 samples/sec Loss 5.7315 LearningRate 0.0195 Epoch: 11 Global Step: 138630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:46,280-Speed 2968.49 
samples/sec Loss 5.8123 LearningRate 0.0195 Epoch: 11 Global Step: 138640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:49,677-Speed 3017.25 samples/sec Loss 5.6773 LearningRate 0.0195 Epoch: 11 Global Step: 138650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:53,044-Speed 3042.12 samples/sec Loss 5.7607 LearningRate 0.0195 Epoch: 11 Global Step: 138660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:56,418-Speed 3035.35 samples/sec Loss 5.8770 LearningRate 0.0195 Epoch: 11 Global Step: 138670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:50:59,889-Speed 2951.45 samples/sec Loss 5.7453 LearningRate 0.0195 Epoch: 11 Global Step: 138680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:03,371-Speed 2941.67 samples/sec Loss 5.6513 LearningRate 0.0195 Epoch: 11 Global Step: 138690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:06,808-Speed 2980.25 samples/sec Loss 5.7829 LearningRate 0.0195 Epoch: 11 Global Step: 138700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:51:10,140-Speed 3074.39 samples/sec Loss 5.7560 LearningRate 0.0195 Epoch: 11 Global Step: 138710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:13,616-Speed 2946.81 samples/sec Loss 5.7700 LearningRate 0.0195 Epoch: 11 Global Step: 138720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:16,964-Speed 3059.43 samples/sec Loss 5.7331 LearningRate 0.0195 Epoch: 11 Global Step: 138730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:20,357-Speed 3019.54 samples/sec Loss 5.7523 LearningRate 0.0195 Epoch: 11 Global Step: 138740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:23,737-Speed 3030.44 samples/sec Loss 5.7052 LearningRate 0.0195 Epoch: 11 Global Step: 138750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:27,049-Speed 3092.39 samples/sec Loss 5.7168 
LearningRate 0.0195 Epoch: 11 Global Step: 138760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:30,367-Speed 3087.03 samples/sec Loss 5.6590 LearningRate 0.0195 Epoch: 11 Global Step: 138770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:33,717-Speed 3058.43 samples/sec Loss 5.8683 LearningRate 0.0195 Epoch: 11 Global Step: 138780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:37,118-Speed 3011.14 samples/sec Loss 5.8081 LearningRate 0.0195 Epoch: 11 Global Step: 138790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:40,604-Speed 2938.49 samples/sec Loss 5.7212 LearningRate 0.0195 Epoch: 11 Global Step: 138800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:43,990-Speed 3025.10 samples/sec Loss 5.6742 LearningRate 0.0195 Epoch: 11 Global Step: 138810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:51:47,406-Speed 2998.79 samples/sec Loss 5.6505 LearningRate 0.0195 Epoch: 11 Global Step: 138820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:51:50,759-Speed 3055.04 samples/sec Loss 5.7423 LearningRate 0.0195 Epoch: 11 Global Step: 138830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:51:54,084-Speed 3080.64 samples/sec Loss 5.7539 LearningRate 0.0195 Epoch: 11 Global Step: 138840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:51:57,508-Speed 2991.36 samples/sec Loss 5.7930 LearningRate 0.0195 Epoch: 11 Global Step: 138850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:00,821-Speed 3091.92 samples/sec Loss 5.6626 LearningRate 0.0195 Epoch: 11 Global Step: 138860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:04,210-Speed 3022.33 samples/sec Loss 5.7423 LearningRate 0.0194 Epoch: 11 Global Step: 138870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:07,557-Speed 3059.63 samples/sec Loss 5.7292 LearningRate 0.0194 Epoch: 11 
Global Step: 138880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:11,024-Speed 2955.23 samples/sec Loss 5.6760 LearningRate 0.0194 Epoch: 11 Global Step: 138890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:14,403-Speed 3031.28 samples/sec Loss 5.8594 LearningRate 0.0194 Epoch: 11 Global Step: 138900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:17,741-Speed 3068.18 samples/sec Loss 5.8939 LearningRate 0.0194 Epoch: 11 Global Step: 138910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:21,151-Speed 3004.83 samples/sec Loss 5.7381 LearningRate 0.0194 Epoch: 11 Global Step: 138920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:24,554-Speed 3010.04 samples/sec Loss 5.6916 LearningRate 0.0194 Epoch: 11 Global Step: 138930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:52:27,973-Speed 2995.79 samples/sec Loss 5.7907 LearningRate 0.0194 Epoch: 11 Global Step: 138940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:31,328-Speed 3053.27 samples/sec Loss 5.7027 LearningRate 0.0194 Epoch: 11 Global Step: 138950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:34,737-Speed 3004.36 samples/sec Loss 5.8906 LearningRate 0.0194 Epoch: 11 Global Step: 138960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:38,114-Speed 3032.85 samples/sec Loss 5.7975 LearningRate 0.0194 Epoch: 11 Global Step: 138970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:41,582-Speed 2954.70 samples/sec Loss 5.8054 LearningRate 0.0194 Epoch: 11 Global Step: 138980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:45,051-Speed 2952.30 samples/sec Loss 5.8812 LearningRate 0.0194 Epoch: 11 Global Step: 138990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:48,452-Speed 3011.72 samples/sec Loss 5.8561 LearningRate 0.0194 Epoch: 11 Global Step: 139000 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:51,841-Speed 3022.81 samples/sec Loss 5.8409 LearningRate 0.0194 Epoch: 11 Global Step: 139010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:55,237-Speed 3016.82 samples/sec Loss 5.8256 LearningRate 0.0194 Epoch: 11 Global Step: 139020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:52:58,690-Speed 2967.56 samples/sec Loss 5.7133 LearningRate 0.0194 Epoch: 11 Global Step: 139030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 14:53:02,091-Speed 3012.37 samples/sec Loss 5.9397 LearningRate 0.0194 Epoch: 11 Global Step: 139040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:05,458-Speed 3041.81 samples/sec Loss 5.7744 LearningRate 0.0194 Epoch: 11 Global Step: 139050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:08,819-Speed 3047.51 samples/sec Loss 5.8335 LearningRate 0.0194 Epoch: 11 Global Step: 139060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:12,189-Speed 3039.49 samples/sec Loss 5.7979 LearningRate 0.0194 Epoch: 11 Global Step: 139070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:15,533-Speed 3062.74 samples/sec Loss 5.7731 LearningRate 0.0194 Epoch: 11 Global Step: 139080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:18,948-Speed 3000.03 samples/sec Loss 5.8265 LearningRate 0.0194 Epoch: 11 Global Step: 139090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:22,333-Speed 3025.82 samples/sec Loss 5.8536 LearningRate 0.0194 Epoch: 11 Global Step: 139100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:25,703-Speed 3039.21 samples/sec Loss 5.7487 LearningRate 0.0194 Epoch: 11 Global Step: 139110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:29,109-Speed 3007.44 samples/sec Loss 5.8134 LearningRate 0.0194 Epoch: 11 Global Step: 139120 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 14:53:32,555-Speed 2972.11 samples/sec Loss 5.8991 LearningRate 0.0194 Epoch: 11 Global Step: 139130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:53:35,929-Speed 3036.10 samples/sec Loss 5.7481 LearningRate 0.0194 Epoch: 11 Global Step: 139140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:39,339-Speed 3004.27 samples/sec Loss 5.8002 LearningRate 0.0193 Epoch: 11 Global Step: 139150 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:42,788-Speed 2969.30 samples/sec Loss 5.7329 LearningRate 0.0193 Epoch: 11 Global Step: 139160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:46,175-Speed 3023.96 samples/sec Loss 5.8569 LearningRate 0.0193 Epoch: 11 Global Step: 139170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:49,578-Speed 3009.96 samples/sec Loss 5.8176 LearningRate 0.0193 Epoch: 11 Global Step: 139180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:52,976-Speed 3015.20 samples/sec Loss 5.8354 LearningRate 0.0193 Epoch: 11 Global Step: 139190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:56,463-Speed 2937.50 samples/sec Loss 5.8621 LearningRate 0.0193 Epoch: 11 Global Step: 139200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:53:59,863-Speed 3012.85 samples/sec Loss 5.9518 LearningRate 0.0193 Epoch: 11 Global Step: 139210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:03,327-Speed 2957.23 samples/sec Loss 5.8582 LearningRate 0.0193 Epoch: 11 Global Step: 139220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:06,649-Speed 3084.29 samples/sec Loss 5.8362 LearningRate 0.0193 Epoch: 11 Global Step: 139230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:09,980-Speed 3074.35 samples/sec Loss 5.8990 LearningRate 0.0193 Epoch: 11 Global Step: 139240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
14:54:13,339-Speed 3051.24 samples/sec Loss 5.8599 LearningRate 0.0193 Epoch: 11 Global Step: 139250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:16,746-Speed 3005.95 samples/sec Loss 5.8871 LearningRate 0.0193 Epoch: 11 Global Step: 139260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:20,173-Speed 2988.69 samples/sec Loss 5.8866 LearningRate 0.0193 Epoch: 11 Global Step: 139270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:23,591-Speed 2997.59 samples/sec Loss 5.8663 LearningRate 0.0193 Epoch: 11 Global Step: 139280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:26,999-Speed 3005.23 samples/sec Loss 5.8925 LearningRate 0.0193 Epoch: 11 Global Step: 139290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:30,392-Speed 3018.77 samples/sec Loss 5.8887 LearningRate 0.0193 Epoch: 11 Global Step: 139300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:33,792-Speed 3012.89 samples/sec Loss 5.9625 LearningRate 0.0193 Epoch: 11 Global Step: 139310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:37,151-Speed 3049.38 samples/sec Loss 5.8520 LearningRate 0.0193 Epoch: 11 Global Step: 139320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:54:40,510-Speed 3049.77 samples/sec Loss 5.7953 LearningRate 0.0193 Epoch: 11 Global Step: 139330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:43,914-Speed 3009.23 samples/sec Loss 5.8138 LearningRate 0.0193 Epoch: 11 Global Step: 139340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:47,351-Speed 2980.29 samples/sec Loss 5.9564 LearningRate 0.0193 Epoch: 11 Global Step: 139350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:50,735-Speed 3025.87 samples/sec Loss 5.9914 LearningRate 0.0193 Epoch: 11 Global Step: 139360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:54,129-Speed 3018.59 
samples/sec Loss 5.9067 LearningRate 0.0193 Epoch: 11 Global Step: 139370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:54:57,524-Speed 3017.37 samples/sec Loss 5.8703 LearningRate 0.0193 Epoch: 11 Global Step: 139380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:00,868-Speed 3062.74 samples/sec Loss 5.8370 LearningRate 0.0193 Epoch: 11 Global Step: 139390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:04,220-Speed 3055.37 samples/sec Loss 5.8760 LearningRate 0.0193 Epoch: 11 Global Step: 139400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:07,685-Speed 2956.29 samples/sec Loss 5.8519 LearningRate 0.0193 Epoch: 11 Global Step: 139410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:11,092-Speed 3006.60 samples/sec Loss 5.8751 LearningRate 0.0193 Epoch: 11 Global Step: 139420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:14,416-Speed 3082.06 samples/sec Loss 5.8622 LearningRate 0.0192 Epoch: 11 Global Step: 139430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:17,887-Speed 2950.92 samples/sec Loss 5.9286 LearningRate 0.0192 Epoch: 11 Global Step: 139440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:21,289-Speed 3010.61 samples/sec Loss 5.7764 LearningRate 0.0192 Epoch: 11 Global Step: 139450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:24,654-Speed 3043.88 samples/sec Loss 5.9829 LearningRate 0.0192 Epoch: 11 Global Step: 139460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:28,037-Speed 3027.31 samples/sec Loss 5.8843 LearningRate 0.0192 Epoch: 11 Global Step: 139470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:31,424-Speed 3024.10 samples/sec Loss 6.0273 LearningRate 0.0192 Epoch: 11 Global Step: 139480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:34,816-Speed 3019.94 samples/sec Loss 5.9031 
LearningRate 0.0192 Epoch: 11 Global Step: 139490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:38,166-Speed 3057.99 samples/sec Loss 5.8658 LearningRate 0.0192 Epoch: 11 Global Step: 139500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:41,532-Speed 3043.16 samples/sec Loss 5.8872 LearningRate 0.0192 Epoch: 11 Global Step: 139510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:44,879-Speed 3060.62 samples/sec Loss 5.9555 LearningRate 0.0192 Epoch: 11 Global Step: 139520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:48,294-Speed 2999.15 samples/sec Loss 5.9832 LearningRate 0.0192 Epoch: 11 Global Step: 139530 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 14:55:51,670-Speed 3033.94 samples/sec Loss 5.8798 LearningRate 0.0192 Epoch: 11 Global Step: 139540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:55:55,117-Speed 2972.37 samples/sec Loss 5.9329 LearningRate 0.0192 Epoch: 11 Global Step: 139550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:55:58,588-Speed 2950.77 samples/sec Loss 5.9789 LearningRate 0.0192 Epoch: 11 Global Step: 139560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:01,986-Speed 3014.32 samples/sec Loss 5.8612 LearningRate 0.0192 Epoch: 11 Global Step: 139570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:05,341-Speed 3053.50 samples/sec Loss 5.8830 LearningRate 0.0192 Epoch: 11 Global Step: 139580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:08,798-Speed 2963.02 samples/sec Loss 5.9640 LearningRate 0.0192 Epoch: 11 Global Step: 139590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:12,170-Speed 3040.33 samples/sec Loss 5.8746 LearningRate 0.0192 Epoch: 11 Global Step: 139600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:15,530-Speed 3048.11 samples/sec Loss 5.8909 LearningRate 0.0192 Epoch: 11 
Global Step: 139610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:18,920-Speed 3021.81 samples/sec Loss 6.0621 LearningRate 0.0192 Epoch: 11 Global Step: 139620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:22,277-Speed 3051.55 samples/sec Loss 5.8834 LearningRate 0.0192 Epoch: 11 Global Step: 139630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:25,655-Speed 3031.59 samples/sec Loss 5.9950 LearningRate 0.0192 Epoch: 11 Global Step: 139640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:29,145-Speed 2935.16 samples/sec Loss 6.0066 LearningRate 0.0192 Epoch: 11 Global Step: 139650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:32,507-Speed 3046.70 samples/sec Loss 5.8420 LearningRate 0.0192 Epoch: 11 Global Step: 139660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:35,855-Speed 3059.28 samples/sec Loss 5.9243 LearningRate 0.0192 Epoch: 11 Global Step: 139670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:39,204-Speed 3058.80 samples/sec Loss 6.0498 LearningRate 0.0192 Epoch: 11 Global Step: 139680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:42,667-Speed 2958.00 samples/sec Loss 5.8945 LearningRate 0.0192 Epoch: 11 Global Step: 139690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:46,028-Speed 3047.61 samples/sec Loss 5.9249 LearningRate 0.0192 Epoch: 11 Global Step: 139700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:49,386-Speed 3050.26 samples/sec Loss 5.9389 LearningRate 0.0191 Epoch: 11 Global Step: 139710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:56:52,700-Speed 3090.86 samples/sec Loss 5.9791 LearningRate 0.0191 Epoch: 11 Global Step: 139720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:56,099-Speed 3013.06 samples/sec Loss 5.9489 LearningRate 0.0191 Epoch: 11 Global Step: 139730 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 14:56:59,522-Speed 2992.20 samples/sec Loss 6.0438 LearningRate 0.0191 Epoch: 11 Global Step: 139740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:02,851-Speed 3077.34 samples/sec Loss 5.9562 LearningRate 0.0191 Epoch: 11 Global Step: 139750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:06,298-Speed 2971.32 samples/sec Loss 5.9851 LearningRate 0.0191 Epoch: 11 Global Step: 139760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:09,642-Speed 3062.83 samples/sec Loss 6.0628 LearningRate 0.0191 Epoch: 11 Global Step: 139770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:13,024-Speed 3029.97 samples/sec Loss 5.8748 LearningRate 0.0191 Epoch: 11 Global Step: 139780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:16,350-Speed 3079.21 samples/sec Loss 5.9036 LearningRate 0.0191 Epoch: 11 Global Step: 139790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:19,742-Speed 3019.19 samples/sec Loss 5.9950 LearningRate 0.0191 Epoch: 11 Global Step: 139800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:23,090-Speed 3059.61 samples/sec Loss 5.9238 LearningRate 0.0191 Epoch: 11 Global Step: 139810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:26,398-Speed 3096.90 samples/sec Loss 6.0192 LearningRate 0.0191 Epoch: 11 Global Step: 139820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:57:29,777-Speed 3031.54 samples/sec Loss 5.9465 LearningRate 0.0191 Epoch: 11 Global Step: 139830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:57:33,146-Speed 3040.13 samples/sec Loss 5.8789 LearningRate 0.0191 Epoch: 11 Global Step: 139840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:36,498-Speed 3056.01 samples/sec Loss 5.8806 LearningRate 0.0191 Epoch: 11 Global Step: 139850 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 14:57:39,939-Speed 2976.33 samples/sec Loss 5.9684 LearningRate 0.0191 Epoch: 11 Global Step: 139860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:43,381-Speed 2976.67 samples/sec Loss 5.8972 LearningRate 0.0191 Epoch: 11 Global Step: 139870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:46,699-Speed 3087.33 samples/sec Loss 5.9157 LearningRate 0.0191 Epoch: 11 Global Step: 139880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:50,091-Speed 3019.53 samples/sec Loss 5.9936 LearningRate 0.0191 Epoch: 11 Global Step: 139890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:53,469-Speed 3031.75 samples/sec Loss 5.9588 LearningRate 0.0191 Epoch: 11 Global Step: 139900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:57:56,958-Speed 2936.27 samples/sec Loss 5.9053 LearningRate 0.0191 Epoch: 11 Global Step: 139910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:58:00,377-Speed 2995.61 samples/sec Loss 5.9290 LearningRate 0.0191 Epoch: 11 Global Step: 139920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:58:03,711-Speed 3072.98 samples/sec Loss 6.1031 LearningRate 0.0191 Epoch: 11 Global Step: 139930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:58:07,174-Speed 2957.62 samples/sec Loss 6.0369 LearningRate 0.0191 Epoch: 11 Global Step: 139940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:10,489-Speed 3090.28 samples/sec Loss 5.9934 LearningRate 0.0191 Epoch: 11 Global Step: 139950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:13,852-Speed 3045.73 samples/sec Loss 5.8644 LearningRate 0.0191 Epoch: 11 Global Step: 139960 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:17,204-Speed 3054.97 samples/sec Loss 5.9352 LearningRate 0.0191 Epoch: 11 Global Step: 139970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
14:58:20,532-Speed 3078.50 samples/sec Loss 5.8942 LearningRate 0.0191 Epoch: 11 Global Step: 139980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:23,877-Speed 3061.86 samples/sec Loss 5.9514 LearningRate 0.0191 Epoch: 11 Global Step: 139990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:27,281-Speed 3009.07 samples/sec Loss 6.0130 LearningRate 0.0190 Epoch: 11 Global Step: 140000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:30,772-Speed 2934.27 samples/sec Loss 5.9340 LearningRate 0.0190 Epoch: 11 Global Step: 140010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:34,139-Speed 3041.64 samples/sec Loss 6.0000 LearningRate 0.0190 Epoch: 11 Global Step: 140020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:37,542-Speed 3009.90 samples/sec Loss 5.9757 LearningRate 0.0190 Epoch: 11 Global Step: 140030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:41,003-Speed 2959.40 samples/sec Loss 6.0116 LearningRate 0.0190 Epoch: 11 Global Step: 140040 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 14:58:44,367-Speed 3045.61 samples/sec Loss 6.0382 LearningRate 0.0190 Epoch: 11 Global Step: 140050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:47,796-Speed 2986.50 samples/sec Loss 6.0646 LearningRate 0.0190 Epoch: 11 Global Step: 140060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:51,182-Speed 3025.27 samples/sec Loss 5.9654 LearningRate 0.0190 Epoch: 11 Global Step: 140070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:54,676-Speed 2931.75 samples/sec Loss 5.9663 LearningRate 0.0190 Epoch: 11 Global Step: 140080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:58:58,026-Speed 3057.29 samples/sec Loss 5.9802 LearningRate 0.0190 Epoch: 11 Global Step: 140090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:59:01,483-Speed 2963.17 
samples/sec Loss 6.0209 LearningRate 0.0190 Epoch: 11 Global Step: 140100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:59:04,838-Speed 3052.53 samples/sec Loss 5.9351 LearningRate 0.0190 Epoch: 11 Global Step: 140110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:59:08,181-Speed 3064.24 samples/sec Loss 5.9864 LearningRate 0.0190 Epoch: 11 Global Step: 140120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:59:11,550-Speed 3040.92 samples/sec Loss 6.0388 LearningRate 0.0190 Epoch: 11 Global Step: 140130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:59:14,952-Speed 3011.07 samples/sec Loss 5.9593 LearningRate 0.0190 Epoch: 11 Global Step: 140140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:18,391-Speed 2978.30 samples/sec Loss 5.9627 LearningRate 0.0190 Epoch: 11 Global Step: 140150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:21,742-Speed 3057.12 samples/sec Loss 6.1339 LearningRate 0.0190 Epoch: 11 Global Step: 140160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:25,136-Speed 3018.22 samples/sec Loss 6.0658 LearningRate 0.0190 Epoch: 11 Global Step: 140170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:28,461-Speed 3079.97 samples/sec Loss 5.9289 LearningRate 0.0190 Epoch: 11 Global Step: 140180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:31,797-Speed 3070.99 samples/sec Loss 6.0725 LearningRate 0.0190 Epoch: 11 Global Step: 140190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:35,114-Speed 3088.12 samples/sec Loss 5.9151 LearningRate 0.0190 Epoch: 11 Global Step: 140200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:38,515-Speed 3012.61 samples/sec Loss 5.9890 LearningRate 0.0190 Epoch: 11 Global Step: 140210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:41,843-Speed 3077.70 samples/sec Loss 5.9571 
LearningRate 0.0190 Epoch: 11 Global Step: 140220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:45,201-Speed 3050.90 samples/sec Loss 6.0901 LearningRate 0.0190 Epoch: 11 Global Step: 140230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:48,551-Speed 3057.03 samples/sec Loss 5.9945 LearningRate 0.0190 Epoch: 11 Global Step: 140240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 14:59:51,971-Speed 2994.88 samples/sec Loss 6.0932 LearningRate 0.0190 Epoch: 11 Global Step: 140250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:55,359-Speed 3023.67 samples/sec Loss 5.9824 LearningRate 0.0190 Epoch: 11 Global Step: 140260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 14:59:58,774-Speed 2999.63 samples/sec Loss 5.8586 LearningRate 0.0190 Epoch: 11 Global Step: 140270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:02,238-Speed 2956.39 samples/sec Loss 5.8955 LearningRate 0.0189 Epoch: 11 Global Step: 140280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:05,560-Speed 3083.75 samples/sec Loss 5.9999 LearningRate 0.0189 Epoch: 11 Global Step: 140290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:08,928-Speed 3041.57 samples/sec Loss 6.0515 LearningRate 0.0189 Epoch: 11 Global Step: 140300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:12,341-Speed 3001.37 samples/sec Loss 6.0652 LearningRate 0.0189 Epoch: 11 Global Step: 140310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:15,704-Speed 3045.54 samples/sec Loss 5.9725 LearningRate 0.0189 Epoch: 11 Global Step: 140320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:19,054-Speed 3057.48 samples/sec Loss 5.9924 LearningRate 0.0189 Epoch: 11 Global Step: 140330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:22,394-Speed 3067.51 samples/sec Loss 5.9059 LearningRate 0.0189 Epoch: 11 
Global Step: 140340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:25,702-Speed 3096.03 samples/sec Loss 5.9657 LearningRate 0.0189 Epoch: 11 Global Step: 140350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:00:29,034-Speed 3074.69 samples/sec Loss 6.0147 LearningRate 0.0189 Epoch: 11 Global Step: 140360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:32,442-Speed 3005.49 samples/sec Loss 5.9973 LearningRate 0.0189 Epoch: 11 Global Step: 140370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:35,807-Speed 3044.07 samples/sec Loss 5.9821 LearningRate 0.0189 Epoch: 11 Global Step: 140380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:39,250-Speed 2975.50 samples/sec Loss 5.9842 LearningRate 0.0189 Epoch: 11 Global Step: 140390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:42,646-Speed 3015.68 samples/sec Loss 6.0331 LearningRate 0.0189 Epoch: 11 Global Step: 140400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:46,032-Speed 3025.26 samples/sec Loss 5.9948 LearningRate 0.0189 Epoch: 11 Global Step: 140410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:49,412-Speed 3030.36 samples/sec Loss 6.0152 LearningRate 0.0189 Epoch: 11 Global Step: 140420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:52,890-Speed 2944.95 samples/sec Loss 5.9712 LearningRate 0.0189 Epoch: 11 Global Step: 140430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:56,303-Speed 3001.82 samples/sec Loss 6.0895 LearningRate 0.0189 Epoch: 11 Global Step: 140440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:00:59,706-Speed 3009.91 samples/sec Loss 6.0284 LearningRate 0.0189 Epoch: 11 Global Step: 140450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:01:03,134-Speed 2987.69 samples/sec Loss 6.0877 LearningRate 0.0189 Epoch: 11 Global Step: 140460 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:06,536-Speed 3010.99 samples/sec Loss 5.9498 LearningRate 0.0189 Epoch: 11 Global Step: 140470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:09,901-Speed 3044.22 samples/sec Loss 6.1691 LearningRate 0.0189 Epoch: 11 Global Step: 140480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:13,320-Speed 2996.31 samples/sec Loss 5.9454 LearningRate 0.0189 Epoch: 11 Global Step: 140490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:16,743-Speed 2991.46 samples/sec Loss 6.0149 LearningRate 0.0189 Epoch: 11 Global Step: 140500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:20,139-Speed 3016.49 samples/sec Loss 6.0060 LearningRate 0.0189 Epoch: 11 Global Step: 140510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:23,574-Speed 2981.95 samples/sec Loss 6.0662 LearningRate 0.0189 Epoch: 11 Global Step: 140520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:27,090-Speed 2913.51 samples/sec Loss 5.9056 LearningRate 0.0189 Epoch: 11 Global Step: 140530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:30,477-Speed 3024.13 samples/sec Loss 6.0433 LearningRate 0.0189 Epoch: 11 Global Step: 140540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:33,893-Speed 2998.16 samples/sec Loss 6.0337 LearningRate 0.0189 Epoch: 11 Global Step: 140550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:37,238-Speed 3062.73 samples/sec Loss 5.9783 LearningRate 0.0189 Epoch: 11 Global Step: 140560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:40,710-Speed 2950.61 samples/sec Loss 5.9667 LearningRate 0.0188 Epoch: 11 Global Step: 140570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:44,158-Speed 2970.07 samples/sec Loss 6.0795 LearningRate 0.0188 Epoch: 11 Global Step: 140580 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 15:01:47,492-Speed 3072.28 samples/sec Loss 6.0601 LearningRate 0.0188 Epoch: 11 Global Step: 140590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:50,848-Speed 3051.79 samples/sec Loss 6.0215 LearningRate 0.0188 Epoch: 11 Global Step: 140600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:54,184-Speed 3070.99 samples/sec Loss 6.0858 LearningRate 0.0188 Epoch: 11 Global Step: 140610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:01:57,577-Speed 3018.96 samples/sec Loss 5.9791 LearningRate 0.0188 Epoch: 11 Global Step: 140620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:00,983-Speed 3007.11 samples/sec Loss 6.0107 LearningRate 0.0188 Epoch: 11 Global Step: 140630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:04,395-Speed 3002.44 samples/sec Loss 6.0459 LearningRate 0.0188 Epoch: 11 Global Step: 140640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:07,826-Speed 2984.83 samples/sec Loss 6.0485 LearningRate 0.0188 Epoch: 11 Global Step: 140650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:11,186-Speed 3048.60 samples/sec Loss 6.0900 LearningRate 0.0188 Epoch: 11 Global Step: 140660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:14,676-Speed 2935.35 samples/sec Loss 6.0285 LearningRate 0.0188 Epoch: 11 Global Step: 140670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:18,079-Speed 3009.96 samples/sec Loss 6.1089 LearningRate 0.0188 Epoch: 11 Global Step: 140680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:21,442-Speed 3045.42 samples/sec Loss 5.9742 LearningRate 0.0188 Epoch: 11 Global Step: 140690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:24,874-Speed 2985.15 samples/sec Loss 6.0248 LearningRate 0.0188 Epoch: 11 Global Step: 140700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
15:02:28,316-Speed 2975.52 samples/sec Loss 6.0981 LearningRate 0.0188 Epoch: 11 Global Step: 140710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:31,702-Speed 3025.30 samples/sec Loss 6.0532 LearningRate 0.0188 Epoch: 11 Global Step: 140720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:02:35,042-Speed 3066.59 samples/sec Loss 6.2298 LearningRate 0.0188 Epoch: 11 Global Step: 140730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:02:38,403-Speed 3047.74 samples/sec Loss 6.0948 LearningRate 0.0188 Epoch: 11 Global Step: 140740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:02:41,801-Speed 3013.74 samples/sec Loss 5.9655 LearningRate 0.0188 Epoch: 11 Global Step: 140750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:02:45,161-Speed 3049.15 samples/sec Loss 6.0704 LearningRate 0.0188 Epoch: 11 Global Step: 140760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:48,520-Speed 3049.19 samples/sec Loss 6.0493 LearningRate 0.0188 Epoch: 11 Global Step: 140770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:51,896-Speed 3034.46 samples/sec Loss 6.0584 LearningRate 0.0188 Epoch: 11 Global Step: 140780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:55,289-Speed 3018.92 samples/sec Loss 6.0689 LearningRate 0.0188 Epoch: 11 Global Step: 140790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:02:58,666-Speed 3032.60 samples/sec Loss 6.1000 LearningRate 0.0188 Epoch: 11 Global Step: 140800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:03:02,056-Speed 3021.79 samples/sec Loss 6.0292 LearningRate 0.0188 Epoch: 11 Global Step: 140810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:03:05,401-Speed 3062.14 samples/sec Loss 6.0161 LearningRate 0.0188 Epoch: 11 Global Step: 140820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:08,750-Speed 3059.10 
samples/sec Loss 6.0101 LearningRate 0.0188 Epoch: 11 Global Step: 140830 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:12,059-Speed 3094.94 samples/sec Loss 5.9320 LearningRate 0.0188 Epoch: 11 Global Step: 140840 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:15,413-Speed 3054.05 samples/sec Loss 6.1624 LearningRate 0.0188 Epoch: 11 Global Step: 140850 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:18,819-Speed 3007.34 samples/sec Loss 6.0544 LearningRate 0.0187 Epoch: 11 Global Step: 140860 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:22,230-Speed 3002.76 samples/sec Loss 6.0456 LearningRate 0.0187 Epoch: 11 Global Step: 140870 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:25,586-Speed 3052.30 samples/sec Loss 6.0853 LearningRate 0.0187 Epoch: 11 Global Step: 140880 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:28,929-Speed 3063.80 samples/sec Loss 6.0772 LearningRate 0.0187 Epoch: 11 Global Step: 140890 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:32,267-Speed 3068.90 samples/sec Loss 6.1586 LearningRate 0.0187 Epoch: 11 Global Step: 140900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:35,636-Speed 3040.36 samples/sec Loss 6.1444 LearningRate 0.0187 Epoch: 11 Global Step: 140910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:39,003-Speed 3042.59 samples/sec Loss 6.0616 LearningRate 0.0187 Epoch: 11 Global Step: 140920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:42,352-Speed 3057.90 samples/sec Loss 6.0052 LearningRate 0.0187 Epoch: 11 Global Step: 140930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:45,779-Speed 2989.33 samples/sec Loss 6.1328 LearningRate 0.0187 Epoch: 11 Global Step: 140940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:49,203-Speed 2991.69 samples/sec Loss 6.0153 
LearningRate 0.0187 Epoch: 11 Global Step: 140950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:52,598-Speed 3016.39 samples/sec Loss 5.9689 LearningRate 0.0187 Epoch: 11 Global Step: 140960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:56,015-Speed 2997.52 samples/sec Loss 6.0513 LearningRate 0.0187 Epoch: 11 Global Step: 140970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:03:59,397-Speed 3028.96 samples/sec Loss 6.0328 LearningRate 0.0187 Epoch: 11 Global Step: 140980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:02,827-Speed 2986.92 samples/sec Loss 6.1596 LearningRate 0.0187 Epoch: 11 Global Step: 140990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:06,291-Speed 2956.91 samples/sec Loss 6.0768 LearningRate 0.0187 Epoch: 11 Global Step: 141000 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:09,661-Speed 3039.06 samples/sec Loss 6.0564 LearningRate 0.0187 Epoch: 11 Global Step: 141010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:13,117-Speed 2963.68 samples/sec Loss 6.1069 LearningRate 0.0187 Epoch: 11 Global Step: 141020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:16,511-Speed 3018.71 samples/sec Loss 6.1503 LearningRate 0.0187 Epoch: 11 Global Step: 141030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:19,880-Speed 3040.44 samples/sec Loss 6.0768 LearningRate 0.0187 Epoch: 11 Global Step: 141040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:23,235-Speed 3052.73 samples/sec Loss 6.0050 LearningRate 0.0187 Epoch: 11 Global Step: 141050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:26,622-Speed 3024.41 samples/sec Loss 6.1126 LearningRate 0.0187 Epoch: 11 Global Step: 141060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:30,047-Speed 2990.86 samples/sec Loss 6.0153 LearningRate 0.0187 Epoch: 11 
Global Step: 141070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:33,421-Speed 3036.05 samples/sec Loss 6.0740 LearningRate 0.0187 Epoch: 11 Global Step: 141080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:36,820-Speed 3013.79 samples/sec Loss 6.0493 LearningRate 0.0187 Epoch: 11 Global Step: 141090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:40,150-Speed 3075.46 samples/sec Loss 6.0226 LearningRate 0.0187 Epoch: 11 Global Step: 141100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:04:43,575-Speed 2990.80 samples/sec Loss 6.0200 LearningRate 0.0187 Epoch: 11 Global Step: 141110 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:46,982-Speed 3006.31 samples/sec Loss 5.9732 LearningRate 0.0187 Epoch: 11 Global Step: 141120 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:50,381-Speed 3013.82 samples/sec Loss 6.1581 LearningRate 0.0187 Epoch: 11 Global Step: 141130 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:53,736-Speed 3052.79 samples/sec Loss 6.0002 LearningRate 0.0186 Epoch: 11 Global Step: 141140 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:04:57,062-Speed 3079.59 samples/sec Loss 5.9862 LearningRate 0.0186 Epoch: 11 Global Step: 141150 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:05:00,478-Speed 2998.78 samples/sec Loss 6.1454 LearningRate 0.0186 Epoch: 11 Global Step: 141160 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:05:03,896-Speed 2995.96 samples/sec Loss 6.1187 LearningRate 0.0186 Epoch: 11 Global Step: 141170 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:05:07,246-Speed 3057.67 samples/sec Loss 5.9969 LearningRate 0.0186 Epoch: 11 Global Step: 141180 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:05:10,587-Speed 3066.45 samples/sec Loss 6.0654 LearningRate 0.0186 Epoch: 11 Global Step: 141190 Fp16 Grad 
Scale: 16384 Required: 10 hours Training: 2022-04-27 15:05:13,977-Speed 3020.86 samples/sec Loss 6.0693 LearningRate 0.0186 Epoch: 11 Global Step: 141200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:05:17,371-Speed 3018.53 samples/sec Loss 6.0651 LearningRate 0.0186 Epoch: 11 Global Step: 141210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:20,832-Speed 2959.46 samples/sec Loss 6.0717 LearningRate 0.0186 Epoch: 11 Global Step: 141220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:24,192-Speed 3048.96 samples/sec Loss 6.0469 LearningRate 0.0186 Epoch: 11 Global Step: 141230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:27,568-Speed 3033.51 samples/sec Loss 6.1408 LearningRate 0.0186 Epoch: 11 Global Step: 141240 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:30,955-Speed 3024.44 samples/sec Loss 5.9482 LearningRate 0.0186 Epoch: 11 Global Step: 141250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:34,281-Speed 3079.05 samples/sec Loss 6.0549 LearningRate 0.0186 Epoch: 11 Global Step: 141260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:37,728-Speed 2972.15 samples/sec Loss 6.1253 LearningRate 0.0186 Epoch: 11 Global Step: 141270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:41,087-Speed 3049.47 samples/sec Loss 6.0372 LearningRate 0.0186 Epoch: 11 Global Step: 141280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:44,409-Speed 3082.52 samples/sec Loss 6.0208 LearningRate 0.0186 Epoch: 11 Global Step: 141290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:47,814-Speed 3008.50 samples/sec Loss 6.0932 LearningRate 0.0186 Epoch: 11 Global Step: 141300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:05:51,229-Speed 2999.60 samples/sec Loss 6.0918 LearningRate 0.0186 Epoch: 11 Global Step: 141310 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 15:05:54,661-Speed 2984.54 samples/sec Loss 6.1111 LearningRate 0.0186 Epoch: 11 Global Step: 141320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:05:57,989-Speed 3077.32 samples/sec Loss 5.9630 LearningRate 0.0186 Epoch: 11 Global Step: 141330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:01,410-Speed 2994.14 samples/sec Loss 6.1713 LearningRate 0.0186 Epoch: 11 Global Step: 141340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:04,798-Speed 3023.42 samples/sec Loss 5.9923 LearningRate 0.0186 Epoch: 11 Global Step: 141350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:08,165-Speed 3042.75 samples/sec Loss 6.1996 LearningRate 0.0186 Epoch: 11 Global Step: 141360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:11,637-Speed 2950.06 samples/sec Loss 6.0527 LearningRate 0.0186 Epoch: 11 Global Step: 141370 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:15,070-Speed 2983.50 samples/sec Loss 6.0804 LearningRate 0.0186 Epoch: 11 Global Step: 141380 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:18,472-Speed 3010.55 samples/sec Loss 6.0792 LearningRate 0.0186 Epoch: 11 Global Step: 141390 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:21,883-Speed 3002.93 samples/sec Loss 6.1615 LearningRate 0.0186 Epoch: 11 Global Step: 141400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:25,232-Speed 3058.82 samples/sec Loss 6.0133 LearningRate 0.0186 Epoch: 11 Global Step: 141410 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:28,667-Speed 2982.64 samples/sec Loss 6.1059 LearningRate 0.0186 Epoch: 11 Global Step: 141420 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:32,072-Speed 3008.34 samples/sec Loss 6.0868 LearningRate 0.0185 Epoch: 11 Global Step: 141430 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 
15:06:35,436-Speed 3044.83 samples/sec Loss 6.0616 LearningRate 0.0185 Epoch: 11 Global Step: 141440 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:38,773-Speed 3070.41 samples/sec Loss 6.0101 LearningRate 0.0185 Epoch: 11 Global Step: 141450 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:42,185-Speed 3002.35 samples/sec Loss 6.1666 LearningRate 0.0185 Epoch: 11 Global Step: 141460 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:45,589-Speed 3008.41 samples/sec Loss 6.1293 LearningRate 0.0185 Epoch: 11 Global Step: 141470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:48,929-Speed 3067.34 samples/sec Loss 6.1280 LearningRate 0.0185 Epoch: 11 Global Step: 141480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:52,296-Speed 3041.90 samples/sec Loss 6.0803 LearningRate 0.0185 Epoch: 11 Global Step: 141490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:55,715-Speed 2996.13 samples/sec Loss 6.0958 LearningRate 0.0185 Epoch: 11 Global Step: 141500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:06:59,058-Speed 3063.85 samples/sec Loss 6.0465 LearningRate 0.0185 Epoch: 11 Global Step: 141510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:02,444-Speed 3025.01 samples/sec Loss 6.0898 LearningRate 0.0185 Epoch: 11 Global Step: 141520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:05,787-Speed 3064.14 samples/sec Loss 5.9964 LearningRate 0.0185 Epoch: 11 Global Step: 141530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:09,157-Speed 3039.52 samples/sec Loss 6.0365 LearningRate 0.0185 Epoch: 11 Global Step: 141540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:12,570-Speed 3001.02 samples/sec Loss 6.1297 LearningRate 0.0185 Epoch: 11 Global Step: 141550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:15,960-Speed 3021.33 
samples/sec Loss 6.1287 LearningRate 0.0185 Epoch: 11 Global Step: 141560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:19,363-Speed 3010.65 samples/sec Loss 6.0587 LearningRate 0.0185 Epoch: 11 Global Step: 141570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:22,730-Speed 3041.99 samples/sec Loss 6.1326 LearningRate 0.0185 Epoch: 11 Global Step: 141580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:26,044-Speed 3091.00 samples/sec Loss 6.1838 LearningRate 0.0185 Epoch: 11 Global Step: 141590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:29,482-Speed 2979.12 samples/sec Loss 6.1521 LearningRate 0.0185 Epoch: 11 Global Step: 141600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:32,810-Speed 3078.37 samples/sec Loss 6.0908 LearningRate 0.0185 Epoch: 11 Global Step: 141610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:36,161-Speed 3056.31 samples/sec Loss 6.1520 LearningRate 0.0185 Epoch: 11 Global Step: 141620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:39,565-Speed 3009.13 samples/sec Loss 6.1228 LearningRate 0.0185 Epoch: 11 Global Step: 141630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:42,930-Speed 3044.49 samples/sec Loss 6.1343 LearningRate 0.0185 Epoch: 11 Global Step: 141640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:07:46,249-Speed 3085.41 samples/sec Loss 6.0623 LearningRate 0.0185 Epoch: 11 Global Step: 141650 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:49,627-Speed 3032.88 samples/sec Loss 6.1802 LearningRate 0.0185 Epoch: 11 Global Step: 141660 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:53,033-Speed 3007.14 samples/sec Loss 6.1495 LearningRate 0.0185 Epoch: 11 Global Step: 141670 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:56,470-Speed 2980.32 samples/sec Loss 6.0112 
LearningRate 0.0185 Epoch: 11 Global Step: 141680 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:07:59,810-Speed 3066.48 samples/sec Loss 6.1871 LearningRate 0.0185 Epoch: 11 Global Step: 141690 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:03,202-Speed 3019.62 samples/sec Loss 6.0990 LearningRate 0.0185 Epoch: 11 Global Step: 141700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:06,617-Speed 2999.90 samples/sec Loss 6.0872 LearningRate 0.0185 Epoch: 11 Global Step: 141710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:10,073-Speed 2963.62 samples/sec Loss 6.0399 LearningRate 0.0184 Epoch: 11 Global Step: 141720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:13,499-Speed 2990.01 samples/sec Loss 6.1295 LearningRate 0.0184 Epoch: 11 Global Step: 141730 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:16,942-Speed 2974.34 samples/sec Loss 6.2241 LearningRate 0.0184 Epoch: 11 Global Step: 141740 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:20,344-Speed 3011.52 samples/sec Loss 6.1116 LearningRate 0.0184 Epoch: 11 Global Step: 141750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:23,695-Speed 3056.58 samples/sec Loss 6.0613 LearningRate 0.0184 Epoch: 11 Global Step: 141760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:27,077-Speed 3028.69 samples/sec Loss 6.1695 LearningRate 0.0184 Epoch: 11 Global Step: 141770 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:30,444-Speed 3042.69 samples/sec Loss 6.0984 LearningRate 0.0184 Epoch: 11 Global Step: 141780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:33,780-Speed 3069.81 samples/sec Loss 6.2294 LearningRate 0.0184 Epoch: 11 Global Step: 141790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:08:37,189-Speed 3004.37 samples/sec Loss 6.1062 LearningRate 0.0184 Epoch: 11 
Global Step: 141800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:08:40,504-Speed 3090.68 samples/sec Loss 6.0262 LearningRate 0.0184 Epoch: 11 Global Step: 141810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:08:43,869-Speed 3043.77 samples/sec Loss 6.1394 LearningRate 0.0184 Epoch: 11 Global Step: 141820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:08:47,231-Speed 3046.92 samples/sec Loss 6.1997 LearningRate 0.0184 Epoch: 11 Global Step: 141830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:08:50,650-Speed 2995.44 samples/sec Loss 5.9460 LearningRate 0.0184 Epoch: 11 Global Step: 141840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:08:53,963-Speed 3092.36 samples/sec Loss 6.1688 LearningRate 0.0184 Epoch: 11 Global Step: 141850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:08:57,329-Speed 3042.55 samples/sec Loss 6.0305 LearningRate 0.0184 Epoch: 11 Global Step: 141860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:00,695-Speed 3043.96 samples/sec Loss 6.1646 LearningRate 0.0184 Epoch: 11 Global Step: 141870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:04,115-Speed 2994.80 samples/sec Loss 6.1699 LearningRate 0.0184 Epoch: 11 Global Step: 141880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:07,624-Speed 2918.89 samples/sec Loss 6.0187 LearningRate 0.0184 Epoch: 11 Global Step: 141890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:11,043-Speed 2995.02 samples/sec Loss 6.1555 LearningRate 0.0184 Epoch: 11 Global Step: 141900 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:09:14,435-Speed 3020.51 samples/sec Loss 6.2413 LearningRate 0.0184 Epoch: 11 Global Step: 141910 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:09:17,806-Speed 3038.83 samples/sec Loss 6.1054 LearningRate 0.0184 Epoch: 11 Global Step: 141920 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-27 15:09:21,142-Speed 3070.37 samples/sec Loss 6.0916 LearningRate 0.0184 Epoch: 11 Global Step: 141930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:09:24,596-Speed 2965.62 samples/sec Loss 6.1536 LearningRate 0.0184 Epoch: 11 Global Step: 141940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:09:27,960-Speed 3045.28 samples/sec Loss 6.1383 LearningRate 0.0184 Epoch: 11 Global Step: 141950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:31,405-Speed 2973.27 samples/sec Loss 6.0103 LearningRate 0.0184 Epoch: 11 Global Step: 141960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:34,794-Speed 3022.08 samples/sec Loss 5.9830 LearningRate 0.0184 Epoch: 11 Global Step: 141970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:38,210-Speed 2999.17 samples/sec Loss 6.1330 LearningRate 0.0184 Epoch: 11 Global Step: 141980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:41,553-Speed 3063.60 samples/sec Loss 6.0916 LearningRate 0.0184 Epoch: 11 Global Step: 141990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:44,951-Speed 3014.44 samples/sec Loss 6.1640 LearningRate 0.0184 Epoch: 11 Global Step: 142000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:48,272-Speed 3084.81 samples/sec Loss 6.1242 LearningRate 0.0183 Epoch: 11 Global Step: 142010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:51,626-Speed 3054.08 samples/sec Loss 6.2336 LearningRate 0.0183 Epoch: 11 Global Step: 142020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:55,021-Speed 3016.44 samples/sec Loss 6.1720 LearningRate 0.0183 Epoch: 11 Global Step: 142030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:09:58,350-Speed 3077.59 samples/sec Loss 6.1150 LearningRate 0.0183 Epoch: 11 Global Step: 142040 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 15:10:01,726-Speed 3034.08 samples/sec Loss 6.1678 LearningRate 0.0183 Epoch: 11 Global Step: 142050 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:05,187-Speed 2959.66 samples/sec Loss 6.0554 LearningRate 0.0183 Epoch: 11 Global Step: 142060 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:08,590-Speed 3009.60 samples/sec Loss 6.1229 LearningRate 0.0183 Epoch: 11 Global Step: 142070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:11,967-Speed 3033.90 samples/sec Loss 6.1347 LearningRate 0.0183 Epoch: 11 Global Step: 142080 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:15,297-Speed 3075.85 samples/sec Loss 6.1054 LearningRate 0.0183 Epoch: 11 Global Step: 142090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:18,779-Speed 2941.34 samples/sec Loss 6.1310 LearningRate 0.0183 Epoch: 11 Global Step: 142100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:22,112-Speed 3074.08 samples/sec Loss 6.0116 LearningRate 0.0183 Epoch: 11 Global Step: 142110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:10:25,433-Speed 3083.50 samples/sec Loss 6.1626 LearningRate 0.0183 Epoch: 11 Global Step: 142120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:28,828-Speed 3017.76 samples/sec Loss 6.1385 LearningRate 0.0183 Epoch: 11 Global Step: 142130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:32,219-Speed 3020.60 samples/sec Loss 6.0508 LearningRate 0.0183 Epoch: 11 Global Step: 142140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:35,608-Speed 3022.56 samples/sec Loss 6.0412 LearningRate 0.0183 Epoch: 11 Global Step: 142150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:39,084-Speed 2946.22 samples/sec Loss 6.0526 LearningRate 0.0183 Epoch: 11 Global Step: 142160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
15:10:42,474-Speed 3021.78 samples/sec Loss 6.1290 LearningRate 0.0183 Epoch: 11 Global Step: 142170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:45,929-Speed 2964.44 samples/sec Loss 6.1032 LearningRate 0.0183 Epoch: 11 Global Step: 142180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:49,325-Speed 3016.23 samples/sec Loss 6.0732 LearningRate 0.0183 Epoch: 11 Global Step: 142190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:52,652-Speed 3079.09 samples/sec Loss 6.0714 LearningRate 0.0183 Epoch: 11 Global Step: 142200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:56,026-Speed 3035.53 samples/sec Loss 6.1562 LearningRate 0.0183 Epoch: 11 Global Step: 142210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:10:59,397-Speed 3039.10 samples/sec Loss 6.1150 LearningRate 0.0183 Epoch: 11 Global Step: 142220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:11:02,758-Speed 3047.18 samples/sec Loss 6.1289 LearningRate 0.0183 Epoch: 11 Global Step: 142230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:11:06,107-Speed 3058.73 samples/sec Loss 6.1458 LearningRate 0.0183 Epoch: 11 Global Step: 142240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:11:09,526-Speed 2996.27 samples/sec Loss 6.1265 LearningRate 0.0183 Epoch: 11 Global Step: 142250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:11:12,927-Speed 3011.22 samples/sec Loss 6.1862 LearningRate 0.0183 Epoch: 11 Global Step: 142260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:16,331-Speed 3009.00 samples/sec Loss 6.1303 LearningRate 0.0183 Epoch: 11 Global Step: 142270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:19,701-Speed 3039.53 samples/sec Loss 6.1427 LearningRate 0.0183 Epoch: 11 Global Step: 142280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:23,056-Speed 3053.68 
samples/sec Loss 6.1247 LearningRate 0.0183 Epoch: 11 Global Step: 142290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:26,494-Speed 2980.11 samples/sec Loss 6.1614 LearningRate 0.0182 Epoch: 11 Global Step: 142300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:29,935-Speed 2977.02 samples/sec Loss 6.0247 LearningRate 0.0182 Epoch: 11 Global Step: 142310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:33,403-Speed 2953.70 samples/sec Loss 6.1596 LearningRate 0.0182 Epoch: 11 Global Step: 142320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:36,812-Speed 3004.08 samples/sec Loss 6.1555 LearningRate 0.0182 Epoch: 11 Global Step: 142330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:40,198-Speed 3025.78 samples/sec Loss 6.1807 LearningRate 0.0182 Epoch: 11 Global Step: 142340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:43,543-Speed 3061.58 samples/sec Loss 6.1541 LearningRate 0.0182 Epoch: 11 Global Step: 142350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:46,895-Speed 3056.08 samples/sec Loss 6.1832 LearningRate 0.0182 Epoch: 11 Global Step: 142360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:11:50,273-Speed 3032.21 samples/sec Loss 6.2350 LearningRate 0.0182 Epoch: 11 Global Step: 142370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:53,656-Speed 3027.74 samples/sec Loss 6.1247 LearningRate 0.0182 Epoch: 11 Global Step: 142380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:11:57,050-Speed 3018.06 samples/sec Loss 5.9820 LearningRate 0.0182 Epoch: 11 Global Step: 142390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:00,418-Speed 3040.85 samples/sec Loss 6.1091 LearningRate 0.0182 Epoch: 11 Global Step: 142400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:03,817-Speed 3014.05 samples/sec Loss 6.0950 
LearningRate 0.0182 Epoch: 11 Global Step: 142410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:07,234-Speed 2997.02 samples/sec Loss 6.1340 LearningRate 0.0182 Epoch: 11 Global Step: 142420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:10,653-Speed 2996.08 samples/sec Loss 6.1951 LearningRate 0.0182 Epoch: 11 Global Step: 142430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:14,006-Speed 3054.84 samples/sec Loss 6.0403 LearningRate 0.0182 Epoch: 11 Global Step: 142440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:17,365-Speed 3049.95 samples/sec Loss 6.2114 LearningRate 0.0182 Epoch: 11 Global Step: 142450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:20,763-Speed 3013.73 samples/sec Loss 6.1107 LearningRate 0.0182 Epoch: 11 Global Step: 142460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:12:24,112-Speed 3058.83 samples/sec Loss 6.1785 LearningRate 0.0182 Epoch: 11 Global Step: 142470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:27,498-Speed 3024.99 samples/sec Loss 6.0504 LearningRate 0.0182 Epoch: 11 Global Step: 142480 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:30,890-Speed 3019.99 samples/sec Loss 6.0724 LearningRate 0.0182 Epoch: 11 Global Step: 142490 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:34,296-Speed 3008.07 samples/sec Loss 6.1225 LearningRate 0.0182 Epoch: 11 Global Step: 142500 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:37,803-Speed 2920.50 samples/sec Loss 6.2142 LearningRate 0.0182 Epoch: 11 Global Step: 142510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:41,181-Speed 3032.50 samples/sec Loss 6.1859 LearningRate 0.0182 Epoch: 11 Global Step: 142520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:44,515-Speed 3072.12 samples/sec Loss 6.1230 LearningRate 0.0182 Epoch: 11 
Global Step: 142530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:47,883-Speed 3041.03 samples/sec Loss 6.1615 LearningRate 0.0182 Epoch: 11 Global Step: 142540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:51,277-Speed 3018.39 samples/sec Loss 6.1300 LearningRate 0.0182 Epoch: 11 Global Step: 142550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:54,690-Speed 3000.90 samples/sec Loss 6.0600 LearningRate 0.0182 Epoch: 11 Global Step: 142560 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:12:58,055-Speed 3044.26 samples/sec Loss 6.0538 LearningRate 0.0182 Epoch: 11 Global Step: 142570 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:01,462-Speed 3006.27 samples/sec Loss 6.0266 LearningRate 0.0182 Epoch: 11 Global Step: 142580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:04,821-Speed 3049.93 samples/sec Loss 6.1835 LearningRate 0.0181 Epoch: 11 Global Step: 142590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:08,201-Speed 3030.46 samples/sec Loss 6.1501 LearningRate 0.0181 Epoch: 11 Global Step: 142600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:11,563-Speed 3046.36 samples/sec Loss 6.1584 LearningRate 0.0181 Epoch: 11 Global Step: 142610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:14,935-Speed 3037.46 samples/sec Loss 6.0627 LearningRate 0.0181 Epoch: 11 Global Step: 142620 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:18,395-Speed 2960.56 samples/sec Loss 6.1212 LearningRate 0.0181 Epoch: 11 Global Step: 142630 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:21,766-Speed 3038.72 samples/sec Loss 6.1552 LearningRate 0.0181 Epoch: 11 Global Step: 142640 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:25,272-Speed 2921.80 samples/sec Loss 6.1244 LearningRate 0.0181 Epoch: 11 Global Step: 142650 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-27 15:13:28,644-Speed 3038.59 samples/sec Loss 6.2256 LearningRate 0.0181 Epoch: 11 Global Step: 142660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:32,134-Speed 2935.10 samples/sec Loss 6.2136 LearningRate 0.0181 Epoch: 11 Global Step: 142670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:35,499-Speed 3044.27 samples/sec Loss 6.0287 LearningRate 0.0181 Epoch: 11 Global Step: 142680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:38,865-Speed 3042.59 samples/sec Loss 6.0978 LearningRate 0.0181 Epoch: 11 Global Step: 142690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:42,248-Speed 3028.16 samples/sec Loss 6.0791 LearningRate 0.0181 Epoch: 11 Global Step: 142700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:45,600-Speed 3055.62 samples/sec Loss 6.1654 LearningRate 0.0181 Epoch: 11 Global Step: 142710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:48,928-Speed 3077.85 samples/sec Loss 6.1347 LearningRate 0.0181 Epoch: 11 Global Step: 142720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:52,256-Speed 3078.55 samples/sec Loss 6.1979 LearningRate 0.0181 Epoch: 11 Global Step: 142730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:55,587-Speed 3074.93 samples/sec Loss 6.0577 LearningRate 0.0181 Epoch: 11 Global Step: 142740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:13:58,917-Speed 3076.48 samples/sec Loss 6.1782 LearningRate 0.0181 Epoch: 11 Global Step: 142750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:14:02,324-Speed 3005.84 samples/sec Loss 6.1346 LearningRate 0.0181 Epoch: 11 Global Step: 142760 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:05,742-Speed 2997.10 samples/sec Loss 6.1154 LearningRate 0.0181 Epoch: 11 Global Step: 142770 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 15:14:09,141-Speed 3013.76 samples/sec Loss 6.2545 LearningRate 0.0181 Epoch: 11 Global Step: 142780 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:12,547-Speed 3007.24 samples/sec Loss 6.2276 LearningRate 0.0181 Epoch: 11 Global Step: 142790 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:15,961-Speed 3000.17 samples/sec Loss 6.1738 LearningRate 0.0181 Epoch: 11 Global Step: 142800 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:19,398-Speed 2980.69 samples/sec Loss 6.0923 LearningRate 0.0181 Epoch: 11 Global Step: 142810 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:22,785-Speed 3024.33 samples/sec Loss 6.1310 LearningRate 0.0181 Epoch: 11 Global Step: 142820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:26,171-Speed 3025.07 samples/sec Loss 6.1642 LearningRate 0.0181 Epoch: 11 Global Step: 142830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:29,511-Speed 3066.86 samples/sec Loss 6.1926 LearningRate 0.0181 Epoch: 11 Global Step: 142840 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:32,992-Speed 2942.21 samples/sec Loss 6.0883 LearningRate 0.0181 Epoch: 11 Global Step: 142850 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:36,384-Speed 3020.42 samples/sec Loss 6.0626 LearningRate 0.0181 Epoch: 11 Global Step: 142860 Fp16 Grad Scale: 131072 Required: 10 hours Training: 2022-04-27 15:14:39,790-Speed 3007.16 samples/sec Loss 6.1119 LearningRate 0.0181 Epoch: 11 Global Step: 142870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:14:43,105-Speed 3090.04 samples/sec Loss 6.1572 LearningRate 0.0180 Epoch: 11 Global Step: 142880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:14:46,521-Speed 2998.49 samples/sec Loss 6.0650 LearningRate 0.0180 Epoch: 11 Global Step: 142890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
15:14:49,936-Speed 2998.85 samples/sec Loss 6.1178 LearningRate 0.0180 Epoch: 11 Global Step: 142900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:14:53,379-Speed 2974.75 samples/sec Loss 6.1197 LearningRate 0.0180 Epoch: 11 Global Step: 142910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:14:56,764-Speed 3026.40 samples/sec Loss 6.1820 LearningRate 0.0180 Epoch: 11 Global Step: 142920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:00,219-Speed 2963.92 samples/sec Loss 6.0644 LearningRate 0.0180 Epoch: 11 Global Step: 142930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:03,588-Speed 3042.46 samples/sec Loss 6.0937 LearningRate 0.0180 Epoch: 11 Global Step: 142940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:06,986-Speed 3014.87 samples/sec Loss 6.1076 LearningRate 0.0180 Epoch: 11 Global Step: 142950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:10,432-Speed 2973.01 samples/sec Loss 6.1233 LearningRate 0.0180 Epoch: 11 Global Step: 142960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:13,951-Speed 2910.36 samples/sec Loss 6.1093 LearningRate 0.0180 Epoch: 11 Global Step: 142970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:17,348-Speed 3015.74 samples/sec Loss 6.0601 LearningRate 0.0180 Epoch: 11 Global Step: 142980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:15:20,737-Speed 3021.92 samples/sec Loss 6.1915 LearningRate 0.0180 Epoch: 11 Global Step: 142990 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:15:24,117-Speed 3030.36 samples/sec Loss 6.0990 LearningRate 0.0180 Epoch: 11 Global Step: 143000 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:15:27,488-Speed 3038.53 samples/sec Loss 6.0170 LearningRate 0.0180 Epoch: 11 Global Step: 143010 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:15:30,837-Speed 3058.60 
samples/sec Loss 6.1649 LearningRate 0.0180 Epoch: 11 Global Step: 143020 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:15:34,357-Speed 2910.40 samples/sec Loss 6.1573 LearningRate 0.0180 Epoch: 11 Global Step: 143030 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:15:37,769-Speed 3001.51 samples/sec Loss 6.0951 LearningRate 0.0180 Epoch: 11 Global Step: 143040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:41,177-Speed 3006.14 samples/sec Loss 6.0862 LearningRate 0.0180 Epoch: 11 Global Step: 143050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:44,541-Speed 3044.38 samples/sec Loss 6.0352 LearningRate 0.0180 Epoch: 11 Global Step: 143060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:47,964-Speed 2992.87 samples/sec Loss 6.1128 LearningRate 0.0180 Epoch: 11 Global Step: 143070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:51,427-Speed 2957.68 samples/sec Loss 6.1566 LearningRate 0.0180 Epoch: 11 Global Step: 143080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:54,800-Speed 3036.53 samples/sec Loss 6.0710 LearningRate 0.0180 Epoch: 11 Global Step: 143090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:15:58,190-Speed 3021.67 samples/sec Loss 6.1544 LearningRate 0.0180 Epoch: 11 Global Step: 143100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:01,557-Speed 3042.65 samples/sec Loss 6.0794 LearningRate 0.0180 Epoch: 11 Global Step: 143110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:04,884-Speed 3078.09 samples/sec Loss 6.1074 LearningRate 0.0180 Epoch: 11 Global Step: 143120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:08,238-Speed 3053.97 samples/sec Loss 6.1731 LearningRate 0.0180 Epoch: 11 Global Step: 143130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:11,663-Speed 2990.91 samples/sec Loss 6.0770 
LearningRate 0.0180 Epoch: 11 Global Step: 143140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:15,013-Speed 3057.96 samples/sec Loss 6.0408 LearningRate 0.0180 Epoch: 11 Global Step: 143150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:18,434-Speed 2994.17 samples/sec Loss 6.1571 LearningRate 0.0180 Epoch: 11 Global Step: 143160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:21,792-Speed 3050.36 samples/sec Loss 6.2189 LearningRate 0.0180 Epoch: 11 Global Step: 143170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:25,206-Speed 2999.87 samples/sec Loss 6.1163 LearningRate 0.0179 Epoch: 11 Global Step: 143180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:28,642-Speed 2981.25 samples/sec Loss 6.0935 LearningRate 0.0179 Epoch: 11 Global Step: 143190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:32,158-Speed 2913.62 samples/sec Loss 6.0558 LearningRate 0.0179 Epoch: 11 Global Step: 143200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:35,562-Speed 3008.79 samples/sec Loss 6.1647 LearningRate 0.0179 Epoch: 11 Global Step: 143210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:38,989-Speed 2989.57 samples/sec Loss 6.1905 LearningRate 0.0179 Epoch: 11 Global Step: 143220 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:42,385-Speed 3016.04 samples/sec Loss 6.0592 LearningRate 0.0179 Epoch: 11 Global Step: 143230 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:16:45,833-Speed 2970.89 samples/sec Loss 6.0080 LearningRate 0.0179 Epoch: 11 Global Step: 143240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:16:49,236-Speed 3009.92 samples/sec Loss 5.9754 LearningRate 0.0179 Epoch: 11 Global Step: 143250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:16:52,652-Speed 2998.89 samples/sec Loss 6.2130 LearningRate 0.0179 Epoch: 11 
Global Step: 143260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:16:56,058-Speed 3007.13 samples/sec Loss 6.1420 LearningRate 0.0179 Epoch: 11 Global Step: 143270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:16:59,448-Speed 3020.86 samples/sec Loss 6.2127 LearningRate 0.0179 Epoch: 11 Global Step: 143280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:17:02,828-Speed 3031.34 samples/sec Loss 5.9544 LearningRate 0.0179 Epoch: 11 Global Step: 143290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:17:06,135-Speed 3096.55 samples/sec Loss 5.9856 LearningRate 0.0179 Epoch: 11 Global Step: 143300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:09,535-Speed 3013.14 samples/sec Loss 6.1821 LearningRate 0.0179 Epoch: 11 Global Step: 143310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:12,928-Speed 3019.06 samples/sec Loss 6.2780 LearningRate 0.0179 Epoch: 11 Global Step: 143320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:16,357-Speed 2986.81 samples/sec Loss 6.1158 LearningRate 0.0179 Epoch: 11 Global Step: 143330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:19,723-Speed 3042.99 samples/sec Loss 6.0664 LearningRate 0.0179 Epoch: 11 Global Step: 143340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:23,192-Speed 2953.07 samples/sec Loss 6.1529 LearningRate 0.0179 Epoch: 11 Global Step: 143350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:26,550-Speed 3049.70 samples/sec Loss 6.0534 LearningRate 0.0179 Epoch: 11 Global Step: 143360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:30,032-Speed 2942.16 samples/sec Loss 6.0686 LearningRate 0.0179 Epoch: 11 Global Step: 143370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:33,496-Speed 2956.34 samples/sec Loss 6.1614 LearningRate 0.0179 Epoch: 11 Global Step: 143380 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:36,833-Speed 3069.55 samples/sec Loss 5.9606 LearningRate 0.0179 Epoch: 11 Global Step: 143390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:40,156-Speed 3083.22 samples/sec Loss 6.1085 LearningRate 0.0179 Epoch: 11 Global Step: 143400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:43,593-Speed 2979.28 samples/sec Loss 6.0327 LearningRate 0.0179 Epoch: 11 Global Step: 143410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:47,021-Speed 2988.45 samples/sec Loss 6.1119 LearningRate 0.0179 Epoch: 11 Global Step: 143420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:50,458-Speed 2980.10 samples/sec Loss 6.1108 LearningRate 0.0179 Epoch: 11 Global Step: 143430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:17:53,882-Speed 2991.67 samples/sec Loss 6.1281 LearningRate 0.0179 Epoch: 11 Global Step: 143440 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:17:57,316-Speed 2982.84 samples/sec Loss 6.1248 LearningRate 0.0179 Epoch: 11 Global Step: 143450 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:00,728-Speed 3002.68 samples/sec Loss 6.1451 LearningRate 0.0179 Epoch: 11 Global Step: 143460 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:04,162-Speed 2982.10 samples/sec Loss 6.1687 LearningRate 0.0178 Epoch: 11 Global Step: 143470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:07,613-Speed 2969.36 samples/sec Loss 6.2183 LearningRate 0.0178 Epoch: 11 Global Step: 143480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:11,017-Speed 3010.24 samples/sec Loss 6.2209 LearningRate 0.0178 Epoch: 11 Global Step: 143490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:14,381-Speed 3044.85 samples/sec Loss 6.1551 LearningRate 0.0178 Epoch: 11 Global Step: 143500 Fp16 Grad Scale: 16384 Required: 10 hours 
Training: 2022-04-27 15:18:17,817-Speed 2981.25 samples/sec Loss 6.0487 LearningRate 0.0178 Epoch: 11 Global Step: 143510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:21,215-Speed 3014.43 samples/sec Loss 6.0909 LearningRate 0.0178 Epoch: 11 Global Step: 143520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:24,584-Speed 3039.95 samples/sec Loss 6.1581 LearningRate 0.0178 Epoch: 11 Global Step: 143530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:27,955-Speed 3038.76 samples/sec Loss 6.1221 LearningRate 0.0178 Epoch: 11 Global Step: 143540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:18:31,322-Speed 3042.25 samples/sec Loss 6.1118 LearningRate 0.0178 Epoch: 11 Global Step: 143550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:34,704-Speed 3028.65 samples/sec Loss 6.1340 LearningRate 0.0178 Epoch: 11 Global Step: 143560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:38,070-Speed 3043.38 samples/sec Loss 6.1221 LearningRate 0.0178 Epoch: 11 Global Step: 143570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:41,551-Speed 2943.40 samples/sec Loss 6.1576 LearningRate 0.0178 Epoch: 11 Global Step: 143580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:44,955-Speed 3008.57 samples/sec Loss 6.1339 LearningRate 0.0178 Epoch: 11 Global Step: 143590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:48,345-Speed 3022.12 samples/sec Loss 6.1173 LearningRate 0.0178 Epoch: 11 Global Step: 143600 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:51,718-Speed 3036.92 samples/sec Loss 6.1569 LearningRate 0.0178 Epoch: 11 Global Step: 143610 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:18:55,094-Speed 3033.49 samples/sec Loss 6.0399 LearningRate 0.0178 Epoch: 11 Global Step: 143620 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
15:18:58,497-Speed 3010.74 samples/sec Loss 6.0656 LearningRate 0.0178 Epoch: 11 Global Step: 143630 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:19:01,871-Speed 3035.10 samples/sec Loss 6.1737 LearningRate 0.0178 Epoch: 11 Global Step: 143640 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:19:05,218-Speed 3060.60 samples/sec Loss 6.0755 LearningRate 0.0178 Epoch: 11 Global Step: 143650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:08,618-Speed 3012.31 samples/sec Loss 6.2056 LearningRate 0.0178 Epoch: 11 Global Step: 143660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:11,997-Speed 3031.35 samples/sec Loss 6.0731 LearningRate 0.0178 Epoch: 11 Global Step: 143670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:15,444-Speed 2971.40 samples/sec Loss 6.0676 LearningRate 0.0178 Epoch: 11 Global Step: 143680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:18,807-Speed 3045.73 samples/sec Loss 6.1984 LearningRate 0.0178 Epoch: 11 Global Step: 143690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:22,175-Speed 3041.19 samples/sec Loss 6.1151 LearningRate 0.0178 Epoch: 11 Global Step: 143700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:25,587-Speed 3002.59 samples/sec Loss 6.1242 LearningRate 0.0178 Epoch: 11 Global Step: 143710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:29,017-Speed 2986.04 samples/sec Loss 6.0129 LearningRate 0.0178 Epoch: 11 Global Step: 143720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:32,429-Speed 3002.44 samples/sec Loss 6.1052 LearningRate 0.0178 Epoch: 11 Global Step: 143730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:35,810-Speed 3029.56 samples/sec Loss 6.0774 LearningRate 0.0178 Epoch: 11 Global Step: 143740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:39,159-Speed 3058.72 
samples/sec Loss 6.1345 LearningRate 0.0178 Epoch: 11 Global Step: 143750 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:19:42,515-Speed 3051.72 samples/sec Loss 6.1412 LearningRate 0.0177 Epoch: 11 Global Step: 143760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:45,839-Speed 3081.25 samples/sec Loss 6.1144 LearningRate 0.0177 Epoch: 11 Global Step: 143770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:49,188-Speed 3059.11 samples/sec Loss 6.0489 LearningRate 0.0177 Epoch: 11 Global Step: 143780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:52,595-Speed 3006.76 samples/sec Loss 6.0812 LearningRate 0.0177 Epoch: 11 Global Step: 143790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:55,973-Speed 3032.48 samples/sec Loss 5.9831 LearningRate 0.0177 Epoch: 11 Global Step: 143800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:19:59,374-Speed 3011.08 samples/sec Loss 6.1247 LearningRate 0.0177 Epoch: 11 Global Step: 143810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:20:02,727-Speed 3055.22 samples/sec Loss 6.1295 LearningRate 0.0177 Epoch: 11 Global Step: 143820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:20:06,074-Speed 3060.65 samples/sec Loss 6.1079 LearningRate 0.0177 Epoch: 11 Global Step: 143830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:20:09,487-Speed 3001.38 samples/sec Loss 5.9878 LearningRate 0.0177 Epoch: 11 Global Step: 143840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:20:12,905-Speed 2995.82 samples/sec Loss 6.0575 LearningRate 0.0177 Epoch: 11 Global Step: 143850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:20:16,260-Speed 3052.88 samples/sec Loss 6.2380 LearningRate 0.0177 Epoch: 11 Global Step: 143860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:20:19,695-Speed 2982.77 samples/sec Loss 6.0806 
LearningRate 0.0177 Epoch: 11 Global Step: 143870 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:20:23,104-Speed 3004.65 samples/sec Loss 6.0115 LearningRate 0.0177 Epoch: 11 Global Step: 143880 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:20:26,437-Speed 3072.52 samples/sec Loss 6.1164 LearningRate 0.0177 Epoch: 11 Global Step: 143890 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:20:29,772-Speed 3071.40 samples/sec Loss 6.1106 LearningRate 0.0177 Epoch: 11 Global Step: 143900 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:33,144-Speed 3037.63 samples/sec Loss 6.1035 LearningRate 0.0177 Epoch: 11 Global Step: 143910 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:36,569-Speed 2990.87 samples/sec Loss 6.1050 LearningRate 0.0177 Epoch: 11 Global Step: 143920 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:40,038-Speed 2952.76 samples/sec Loss 6.1154 LearningRate 0.0177 Epoch: 11 Global Step: 143930 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:43,463-Speed 2990.25 samples/sec Loss 6.0155 LearningRate 0.0177 Epoch: 11 Global Step: 143940 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:46,881-Speed 2996.58 samples/sec Loss 6.1956 LearningRate 0.0177 Epoch: 11 Global Step: 143950 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:50,307-Speed 2989.68 samples/sec Loss 5.9488 LearningRate 0.0177 Epoch: 11 Global Step: 143960 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:53,829-Speed 2908.71 samples/sec Loss 5.9668 LearningRate 0.0177 Epoch: 11 Global Step: 143970 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:20:57,222-Speed 3018.16 samples/sec Loss 6.0766 LearningRate 0.0177 Epoch: 11 Global Step: 143980 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:21:00,644-Speed 2993.56 samples/sec Loss 6.1871 LearningRate 0.0177 Epoch: 11 
Global Step: 143990 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:21:04,036-Speed 3019.99 samples/sec Loss 6.1307 LearningRate 0.0177 Epoch: 11 Global Step: 144000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:07,398-Speed 3046.50 samples/sec Loss 6.1146 LearningRate 0.0177 Epoch: 11 Global Step: 144010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:10,774-Speed 3035.27 samples/sec Loss 6.1116 LearningRate 0.0177 Epoch: 11 Global Step: 144020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:14,142-Speed 3041.54 samples/sec Loss 6.1395 LearningRate 0.0177 Epoch: 11 Global Step: 144030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:17,578-Speed 2981.29 samples/sec Loss 6.0950 LearningRate 0.0177 Epoch: 11 Global Step: 144040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:20,980-Speed 3010.33 samples/sec Loss 6.2880 LearningRate 0.0177 Epoch: 11 Global Step: 144050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:24,351-Speed 3039.06 samples/sec Loss 6.1512 LearningRate 0.0176 Epoch: 11 Global Step: 144060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:27,760-Speed 3004.81 samples/sec Loss 6.1656 LearningRate 0.0176 Epoch: 11 Global Step: 144070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:31,152-Speed 3019.30 samples/sec Loss 6.0700 LearningRate 0.0176 Epoch: 11 Global Step: 144080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:34,492-Speed 3067.07 samples/sec Loss 6.1390 LearningRate 0.0176 Epoch: 11 Global Step: 144090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:37,874-Speed 3028.48 samples/sec Loss 6.1743 LearningRate 0.0176 Epoch: 11 Global Step: 144100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:21:41,359-Speed 2939.17 samples/sec Loss 6.1442 LearningRate 0.0176 Epoch: 11 Global Step: 144110 Fp16 Grad 
Scale: 65536 Required: 10 hours Training: 2022-04-27 15:21:44,702-Speed 3064.47 samples/sec Loss 6.1876 LearningRate 0.0176 Epoch: 11 Global Step: 144120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:48,068-Speed 3043.08 samples/sec Loss 6.0151 LearningRate 0.0176 Epoch: 11 Global Step: 144130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:51,393-Speed 3080.72 samples/sec Loss 6.0769 LearningRate 0.0176 Epoch: 11 Global Step: 144140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:54,791-Speed 3014.04 samples/sec Loss 6.1003 LearningRate 0.0176 Epoch: 11 Global Step: 144150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:21:58,142-Speed 3056.95 samples/sec Loss 6.0483 LearningRate 0.0176 Epoch: 11 Global Step: 144160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:22:01,448-Speed 3098.11 samples/sec Loss 6.1207 LearningRate 0.0176 Epoch: 11 Global Step: 144170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:22:04,801-Speed 3054.68 samples/sec Loss 6.2128 LearningRate 0.0176 Epoch: 11 Global Step: 144180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:22:08,134-Speed 3073.00 samples/sec Loss 6.0535 LearningRate 0.0176 Epoch: 11 Global Step: 144190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:22:11,530-Speed 3016.40 samples/sec Loss 6.1555 LearningRate 0.0176 Epoch: 11 Global Step: 144200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:22:14,833-Speed 3100.93 samples/sec Loss 6.1185 LearningRate 0.0176 Epoch: 11 Global Step: 144210 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:22:18,180-Speed 3060.30 samples/sec Loss 5.9618 LearningRate 0.0176 Epoch: 11 Global Step: 144220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:21,586-Speed 3007.80 samples/sec Loss 6.0833 LearningRate 0.0176 Epoch: 11 Global Step: 144230 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 15:22:24,956-Speed 3039.44 samples/sec Loss 6.1455 LearningRate 0.0176 Epoch: 11 Global Step: 144240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:28,328-Speed 3037.63 samples/sec Loss 5.9486 LearningRate 0.0176 Epoch: 11 Global Step: 144250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:31,669-Speed 3065.22 samples/sec Loss 6.0515 LearningRate 0.0176 Epoch: 11 Global Step: 144260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:35,026-Speed 3051.03 samples/sec Loss 6.0305 LearningRate 0.0176 Epoch: 11 Global Step: 144270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:38,393-Speed 3042.96 samples/sec Loss 6.1256 LearningRate 0.0176 Epoch: 11 Global Step: 144280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:41,781-Speed 3023.00 samples/sec Loss 6.0967 LearningRate 0.0176 Epoch: 11 Global Step: 144290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:45,186-Speed 3008.62 samples/sec Loss 6.1014 LearningRate 0.0176 Epoch: 11 Global Step: 144300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:48,649-Speed 2957.63 samples/sec Loss 6.1764 LearningRate 0.0176 Epoch: 11 Global Step: 144310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:51,999-Speed 3058.41 samples/sec Loss 6.1014 LearningRate 0.0176 Epoch: 11 Global Step: 144320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:55,355-Speed 3051.49 samples/sec Loss 6.1039 LearningRate 0.0176 Epoch: 11 Global Step: 144330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:22:58,735-Speed 3030.91 samples/sec Loss 6.0365 LearningRate 0.0176 Epoch: 11 Global Step: 144340 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:02,099-Speed 3044.85 samples/sec Loss 6.2005 LearningRate 0.0176 Epoch: 11 Global Step: 144350 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
15:23:05,465-Speed 3042.19 samples/sec Loss 6.1148 LearningRate 0.0175 Epoch: 11 Global Step: 144360 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:08,869-Speed 3009.49 samples/sec Loss 6.0549 LearningRate 0.0175 Epoch: 11 Global Step: 144370 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:12,290-Speed 2994.43 samples/sec Loss 6.0498 LearningRate 0.0175 Epoch: 11 Global Step: 144380 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:15,724-Speed 2982.47 samples/sec Loss 6.1395 LearningRate 0.0175 Epoch: 11 Global Step: 144390 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:19,162-Speed 2979.44 samples/sec Loss 6.0972 LearningRate 0.0175 Epoch: 11 Global Step: 144400 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:22,609-Speed 2971.25 samples/sec Loss 6.0532 LearningRate 0.0175 Epoch: 11 Global Step: 144410 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:26,131-Speed 2908.58 samples/sec Loss 6.2647 LearningRate 0.0175 Epoch: 11 Global Step: 144420 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:29,592-Speed 2959.65 samples/sec Loss 6.1990 LearningRate 0.0175 Epoch: 11 Global Step: 144430 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:33,048-Speed 2963.64 samples/sec Loss 6.0957 LearningRate 0.0175 Epoch: 11 Global Step: 144440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:23:36,534-Speed 2937.82 samples/sec Loss 6.2183 LearningRate 0.0175 Epoch: 11 Global Step: 144450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:23:39,964-Speed 2986.59 samples/sec Loss 6.2580 LearningRate 0.0175 Epoch: 11 Global Step: 144460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:23:43,312-Speed 3059.19 samples/sec Loss 6.1225 LearningRate 0.0175 Epoch: 11 Global Step: 144470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:23:46,661-Speed 3058.82 
samples/sec Loss 6.0741 LearningRate 0.0175 Epoch: 11 Global Step: 144480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:23:50,080-Speed 2996.40 samples/sec Loss 5.9617 LearningRate 0.0175 Epoch: 11 Global Step: 144490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:23:53,502-Speed 2992.78 samples/sec Loss 6.0607 LearningRate 0.0175 Epoch: 11 Global Step: 144500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:23:56,887-Speed 3026.18 samples/sec Loss 6.2096 LearningRate 0.0175 Epoch: 11 Global Step: 144510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:00,233-Speed 3061.10 samples/sec Loss 6.1432 LearningRate 0.0175 Epoch: 11 Global Step: 144520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:03,554-Speed 3084.70 samples/sec Loss 6.1145 LearningRate 0.0175 Epoch: 11 Global Step: 144530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:06,970-Speed 2998.56 samples/sec Loss 6.1294 LearningRate 0.0175 Epoch: 11 Global Step: 144540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:10,351-Speed 3028.77 samples/sec Loss 6.1957 LearningRate 0.0175 Epoch: 11 Global Step: 144550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:13,759-Speed 3006.24 samples/sec Loss 6.2183 LearningRate 0.0175 Epoch: 11 Global Step: 144560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:17,180-Speed 2994.40 samples/sec Loss 6.1735 LearningRate 0.0175 Epoch: 11 Global Step: 144570 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:20,617-Speed 2983.27 samples/sec Loss 6.0379 LearningRate 0.0175 Epoch: 11 Global Step: 144580 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:24,031-Speed 3000.04 samples/sec Loss 6.1898 LearningRate 0.0175 Epoch: 11 Global Step: 144590 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:24:27,444-Speed 3001.26 samples/sec Loss 6.1304 
LearningRate 0.0175 Epoch: 11 Global Step: 144600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:30,789-Speed 3062.04 samples/sec Loss 6.0607 LearningRate 0.0175 Epoch: 11 Global Step: 144610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:34,154-Speed 3043.85 samples/sec Loss 6.2271 LearningRate 0.0175 Epoch: 11 Global Step: 144620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:37,567-Speed 3001.08 samples/sec Loss 5.9870 LearningRate 0.0175 Epoch: 11 Global Step: 144630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:40,920-Speed 3055.55 samples/sec Loss 6.0429 LearningRate 0.0175 Epoch: 11 Global Step: 144640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:44,326-Speed 3007.28 samples/sec Loss 6.0630 LearningRate 0.0174 Epoch: 11 Global Step: 144650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:47,701-Speed 3034.18 samples/sec Loss 6.0494 LearningRate 0.0174 Epoch: 11 Global Step: 144660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:51,085-Speed 3027.94 samples/sec Loss 6.0506 LearningRate 0.0174 Epoch: 11 Global Step: 144670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:54,460-Speed 3034.00 samples/sec Loss 6.1433 LearningRate 0.0174 Epoch: 11 Global Step: 144680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:24:57,821-Speed 3047.80 samples/sec Loss 6.0978 LearningRate 0.0174 Epoch: 11 Global Step: 144690 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:01,174-Speed 3055.05 samples/sec Loss 6.1221 LearningRate 0.0174 Epoch: 11 Global Step: 144700 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:25:04,494-Speed 3085.13 samples/sec Loss 6.1418 LearningRate 0.0174 Epoch: 11 Global Step: 144710 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:25:07,805-Speed 3093.04 samples/sec Loss 6.1355 LearningRate 0.0174 Epoch: 11 
Global Step: 144720 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:11,219-Speed 3000.36 samples/sec Loss 6.0817 LearningRate 0.0174 Epoch: 11 Global Step: 144730 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:14,541-Speed 3083.96 samples/sec Loss 6.1988 LearningRate 0.0174 Epoch: 11 Global Step: 144740 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:17,906-Speed 3043.43 samples/sec Loss 6.0723 LearningRate 0.0174 Epoch: 11 Global Step: 144750 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:21,267-Speed 3048.09 samples/sec Loss 6.0851 LearningRate 0.0174 Epoch: 11 Global Step: 144760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:24,589-Speed 3083.11 samples/sec Loss 5.9865 LearningRate 0.0174 Epoch: 11 Global Step: 144770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:27,948-Speed 3049.77 samples/sec Loss 6.0357 LearningRate 0.0174 Epoch: 11 Global Step: 144780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:31,276-Speed 3077.87 samples/sec Loss 6.0520 LearningRate 0.0174 Epoch: 11 Global Step: 144790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:34,616-Speed 3066.73 samples/sec Loss 6.1203 LearningRate 0.0174 Epoch: 11 Global Step: 144800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:38,042-Speed 2989.60 samples/sec Loss 6.1433 LearningRate 0.0174 Epoch: 11 Global Step: 144810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:41,479-Speed 2980.20 samples/sec Loss 6.0590 LearningRate 0.0174 Epoch: 11 Global Step: 144820 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:25:44,907-Speed 2988.48 samples/sec Loss 6.0823 LearningRate 0.0174 Epoch: 11 Global Step: 144830 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:25:48,345-Speed 2979.49 samples/sec Loss 6.1311 LearningRate 0.0174 Epoch: 11 Global Step: 144840 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:51,697-Speed 3057.00 samples/sec Loss 6.0705 LearningRate 0.0174 Epoch: 11 Global Step: 144850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:55,104-Speed 3006.15 samples/sec Loss 6.0132 LearningRate 0.0174 Epoch: 11 Global Step: 144860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:25:58,478-Speed 3035.86 samples/sec Loss 6.1428 LearningRate 0.0174 Epoch: 11 Global Step: 144870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:01,883-Speed 3008.35 samples/sec Loss 6.1258 LearningRate 0.0174 Epoch: 11 Global Step: 144880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:05,277-Speed 3017.82 samples/sec Loss 6.0725 LearningRate 0.0174 Epoch: 11 Global Step: 144890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:08,647-Speed 3039.12 samples/sec Loss 6.0919 LearningRate 0.0174 Epoch: 11 Global Step: 144900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:12,058-Speed 3002.60 samples/sec Loss 6.0710 LearningRate 0.0174 Epoch: 11 Global Step: 144910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:15,434-Speed 3034.46 samples/sec Loss 6.0872 LearningRate 0.0174 Epoch: 11 Global Step: 144920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:18,834-Speed 3012.91 samples/sec Loss 6.0573 LearningRate 0.0174 Epoch: 11 Global Step: 144930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:22,155-Speed 3083.81 samples/sec Loss 6.0486 LearningRate 0.0174 Epoch: 11 Global Step: 144940 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:26:25,512-Speed 3050.87 samples/sec Loss 6.1282 LearningRate 0.0173 Epoch: 11 Global Step: 144950 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:26:28,932-Speed 2994.97 samples/sec Loss 5.9848 LearningRate 0.0173 Epoch: 11 Global Step: 144960 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 15:26:32,269-Speed 3069.90 samples/sec Loss 6.1550 LearningRate 0.0173 Epoch: 11 Global Step: 144970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:35,701-Speed 2984.75 samples/sec Loss 6.0636 LearningRate 0.0173 Epoch: 11 Global Step: 144980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:39,065-Speed 3045.07 samples/sec Loss 6.1059 LearningRate 0.0173 Epoch: 11 Global Step: 144990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:42,403-Speed 3068.63 samples/sec Loss 6.1357 LearningRate 0.0173 Epoch: 11 Global Step: 145000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:45,747-Speed 3062.51 samples/sec Loss 6.0247 LearningRate 0.0173 Epoch: 11 Global Step: 145010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:49,074-Speed 3078.55 samples/sec Loss 6.0352 LearningRate 0.0173 Epoch: 11 Global Step: 145020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:52,428-Speed 3054.62 samples/sec Loss 6.0734 LearningRate 0.0173 Epoch: 11 Global Step: 145030 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:55,868-Speed 2977.54 samples/sec Loss 6.0555 LearningRate 0.0173 Epoch: 11 Global Step: 145040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:26:59,237-Speed 3040.47 samples/sec Loss 6.0610 LearningRate 0.0173 Epoch: 11 Global Step: 145050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:02,628-Speed 3020.41 samples/sec Loss 6.0776 LearningRate 0.0173 Epoch: 11 Global Step: 145060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:06,022-Speed 3018.13 samples/sec Loss 6.0187 LearningRate 0.0173 Epoch: 11 Global Step: 145070 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:27:09,469-Speed 2972.05 samples/sec Loss 6.2080 LearningRate 0.0173 Epoch: 11 Global Step: 145080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 
15:27:12,796-Speed 3078.93 samples/sec Loss 6.0336 LearningRate 0.0173 Epoch: 11 Global Step: 145090 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:16,171-Speed 3034.60 samples/sec Loss 6.0871 LearningRate 0.0173 Epoch: 11 Global Step: 145100 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:19,578-Speed 3006.28 samples/sec Loss 6.1175 LearningRate 0.0173 Epoch: 11 Global Step: 145110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:23,093-Speed 2914.13 samples/sec Loss 6.1783 LearningRate 0.0173 Epoch: 11 Global Step: 145120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:26,428-Speed 3071.01 samples/sec Loss 6.1547 LearningRate 0.0173 Epoch: 11 Global Step: 145130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:29,775-Speed 3060.37 samples/sec Loss 6.1568 LearningRate 0.0173 Epoch: 11 Global Step: 145140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:33,189-Speed 3000.35 samples/sec Loss 6.1934 LearningRate 0.0173 Epoch: 11 Global Step: 145150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:36,552-Speed 3046.25 samples/sec Loss 6.1117 LearningRate 0.0173 Epoch: 11 Global Step: 145160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:39,978-Speed 2988.98 samples/sec Loss 6.1765 LearningRate 0.0173 Epoch: 11 Global Step: 145170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:27:43,383-Speed 3009.33 samples/sec Loss 6.0125 LearningRate 0.0173 Epoch: 11 Global Step: 145180 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:27:46,751-Speed 3040.88 samples/sec Loss 6.1400 LearningRate 0.0173 Epoch: 11 Global Step: 145190 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:27:50,169-Speed 2997.75 samples/sec Loss 6.0534 LearningRate 0.0173 Epoch: 11 Global Step: 145200 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:27:53,543-Speed 3035.24 
samples/sec Loss 6.0550 LearningRate 0.0173 Epoch: 11 Global Step: 145210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:27:56,911-Speed 3041.73 samples/sec Loss 6.0107 LearningRate 0.0173 Epoch: 11 Global Step: 145220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:00,241-Speed 3076.18 samples/sec Loss 6.1006 LearningRate 0.0173 Epoch: 11 Global Step: 145230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:03,570-Speed 3076.80 samples/sec Loss 6.0565 LearningRate 0.0173 Epoch: 11 Global Step: 145240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:06,988-Speed 2996.59 samples/sec Loss 6.1443 LearningRate 0.0172 Epoch: 11 Global Step: 145250 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:10,389-Speed 3012.07 samples/sec Loss 5.9896 LearningRate 0.0172 Epoch: 11 Global Step: 145260 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:13,749-Speed 3048.51 samples/sec Loss 6.1093 LearningRate 0.0172 Epoch: 11 Global Step: 145270 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:17,163-Speed 3000.07 samples/sec Loss 6.0081 LearningRate 0.0172 Epoch: 11 Global Step: 145280 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:20,538-Speed 3034.85 samples/sec Loss 6.1635 LearningRate 0.0172 Epoch: 11 Global Step: 145290 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:23,909-Speed 3038.42 samples/sec Loss 6.0447 LearningRate 0.0172 Epoch: 11 Global Step: 145300 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:27,243-Speed 3072.37 samples/sec Loss 6.0046 LearningRate 0.0172 Epoch: 11 Global Step: 145310 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:30,625-Speed 3028.42 samples/sec Loss 6.0065 LearningRate 0.0172 Epoch: 11 Global Step: 145320 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:34,024-Speed 3013.39 samples/sec Loss 6.0525 
LearningRate 0.0172 Epoch: 11 Global Step: 145330 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:37,379-Speed 3053.34 samples/sec Loss 6.0019 LearningRate 0.0172 Epoch: 11 Global Step: 145340 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:40,775-Speed 3016.16 samples/sec Loss 6.0413 LearningRate 0.0172 Epoch: 11 Global Step: 145350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:44,211-Speed 2981.15 samples/sec Loss 6.1410 LearningRate 0.0172 Epoch: 11 Global Step: 145360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:28:47,648-Speed 2980.47 samples/sec Loss 6.1107 LearningRate 0.0172 Epoch: 11 Global Step: 145370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:28:51,003-Speed 3053.08 samples/sec Loss 6.0406 LearningRate 0.0172 Epoch: 11 Global Step: 145380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:28:54,372-Speed 3042.59 samples/sec Loss 5.9460 LearningRate 0.0172 Epoch: 11 Global Step: 145390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:28:57,829-Speed 2962.75 samples/sec Loss 6.1352 LearningRate 0.0172 Epoch: 11 Global Step: 145400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:01,219-Speed 3021.44 samples/sec Loss 6.0164 LearningRate 0.0172 Epoch: 11 Global Step: 145410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:04,599-Speed 3030.97 samples/sec Loss 6.0606 LearningRate 0.0172 Epoch: 11 Global Step: 145420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:07,920-Speed 3084.15 samples/sec Loss 6.1181 LearningRate 0.0172 Epoch: 11 Global Step: 145430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:11,378-Speed 2962.08 samples/sec Loss 5.9838 LearningRate 0.0172 Epoch: 11 Global Step: 145440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:14,711-Speed 3073.96 samples/sec Loss 6.0630 LearningRate 0.0172 Epoch: 11 
Global Step: 145450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:18,135-Speed 2991.15 samples/sec Loss 6.0653 LearningRate 0.0172 Epoch: 11 Global Step: 145460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:21,582-Speed 2971.56 samples/sec Loss 6.0486 LearningRate 0.0172 Epoch: 11 Global Step: 145470 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:29:24,989-Speed 3006.18 samples/sec Loss 6.1774 LearningRate 0.0172 Epoch: 11 Global Step: 145480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:28,400-Speed 3003.67 samples/sec Loss 6.1281 LearningRate 0.0172 Epoch: 11 Global Step: 145490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:31,806-Speed 3007.34 samples/sec Loss 6.1732 LearningRate 0.0172 Epoch: 11 Global Step: 145500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:35,264-Speed 2961.91 samples/sec Loss 6.1249 LearningRate 0.0172 Epoch: 11 Global Step: 145510 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:38,637-Speed 3037.11 samples/sec Loss 6.0925 LearningRate 0.0172 Epoch: 11 Global Step: 145520 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:42,024-Speed 3024.49 samples/sec Loss 6.1094 LearningRate 0.0172 Epoch: 11 Global Step: 145530 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:45,422-Speed 3014.40 samples/sec Loss 6.0815 LearningRate 0.0172 Epoch: 11 Global Step: 145540 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:48,852-Speed 2986.54 samples/sec Loss 5.9962 LearningRate 0.0171 Epoch: 11 Global Step: 145550 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:52,245-Speed 3018.15 samples/sec Loss 6.0578 LearningRate 0.0171 Epoch: 11 Global Step: 145560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:55,654-Speed 3005.14 samples/sec Loss 6.1184 LearningRate 0.0171 Epoch: 11 Global Step: 145570 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 15:29:59,057-Speed 3010.17 samples/sec Loss 6.1660 LearningRate 0.0171 Epoch: 11 Global Step: 145580 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:30:02,413-Speed 3051.84 samples/sec Loss 6.0484 LearningRate 0.0171 Epoch: 11 Global Step: 145590 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:30:05,840-Speed 2989.12 samples/sec Loss 6.0725 LearningRate 0.0171 Epoch: 11 Global Step: 145600 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:30:09,200-Speed 3048.02 samples/sec Loss 6.1447 LearningRate 0.0171 Epoch: 11 Global Step: 145610 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:30:12,583-Speed 3028.53 samples/sec Loss 6.0404 LearningRate 0.0171 Epoch: 11 Global Step: 145620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:15,959-Speed 3033.63 samples/sec Loss 6.1058 LearningRate 0.0171 Epoch: 11 Global Step: 145630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:19,308-Speed 3058.77 samples/sec Loss 6.0061 LearningRate 0.0171 Epoch: 11 Global Step: 145640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:22,702-Speed 3017.89 samples/sec Loss 6.0327 LearningRate 0.0171 Epoch: 11 Global Step: 145650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:26,052-Speed 3057.92 samples/sec Loss 6.0311 LearningRate 0.0171 Epoch: 11 Global Step: 145660 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:29,440-Speed 3023.69 samples/sec Loss 5.8724 LearningRate 0.0171 Epoch: 11 Global Step: 145670 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:32,775-Speed 3071.04 samples/sec Loss 6.1473 LearningRate 0.0171 Epoch: 11 Global Step: 145680 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:36,136-Speed 3047.03 samples/sec Loss 5.9815 LearningRate 0.0171 Epoch: 11 Global Step: 145690 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 15:30:39,508-Speed 3038.37 samples/sec Loss 6.0987 LearningRate 0.0171 Epoch: 11 Global Step: 145700 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:42,927-Speed 2995.23 samples/sec Loss 6.0395 LearningRate 0.0171 Epoch: 11 Global Step: 145710 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:30:46,245-Speed 3087.23 samples/sec Loss 6.0249 LearningRate 0.0171 Epoch: 11 Global Step: 145720 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:30:49,587-Speed 3065.19 samples/sec Loss 6.0603 LearningRate 0.0171 Epoch: 11 Global Step: 145730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:30:53,018-Speed 2985.91 samples/sec Loss 6.0588 LearningRate 0.0171 Epoch: 11 Global Step: 145740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:30:56,331-Speed 3091.43 samples/sec Loss 6.1412 LearningRate 0.0171 Epoch: 11 Global Step: 145750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:30:59,672-Speed 3065.70 samples/sec Loss 6.1552 LearningRate 0.0171 Epoch: 11 Global Step: 145760 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:31:03,048-Speed 3034.64 samples/sec Loss 6.1127 LearningRate 0.0171 Epoch: 11 Global Step: 145770 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:31:06,462-Speed 2999.86 samples/sec Loss 6.0209 LearningRate 0.0171 Epoch: 11 Global Step: 145780 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:31:09,782-Speed 3084.84 samples/sec Loss 5.9978 LearningRate 0.0171 Epoch: 11 Global Step: 145790 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:31:13,193-Speed 3003.01 samples/sec Loss 6.0958 LearningRate 0.0171 Epoch: 11 Global Step: 145800 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:31:16,626-Speed 2984.27 samples/sec Loss 6.0665 LearningRate 0.0171 Epoch: 11 Global Step: 145810 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
15:31:20,084-Speed 2961.93 samples/sec Loss 6.0521 LearningRate 0.0171 Epoch: 11 Global Step: 145820 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:31:23,483-Speed 3014.07 samples/sec Loss 6.1786 LearningRate 0.0171 Epoch: 11 Global Step: 145830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:26,910-Speed 2988.40 samples/sec Loss 6.0285 LearningRate 0.0171 Epoch: 11 Global Step: 145840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:30,264-Speed 3054.00 samples/sec Loss 6.0736 LearningRate 0.0170 Epoch: 11 Global Step: 145850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:33,779-Speed 2914.95 samples/sec Loss 6.0448 LearningRate 0.0170 Epoch: 11 Global Step: 145860 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:37,081-Speed 3102.63 samples/sec Loss 6.0540 LearningRate 0.0170 Epoch: 11 Global Step: 145870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:40,511-Speed 2985.88 samples/sec Loss 6.0173 LearningRate 0.0170 Epoch: 11 Global Step: 145880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:43,857-Speed 3061.23 samples/sec Loss 6.0645 LearningRate 0.0170 Epoch: 11 Global Step: 145890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:47,320-Speed 2957.79 samples/sec Loss 6.0017 LearningRate 0.0170 Epoch: 11 Global Step: 145900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:50,769-Speed 2970.01 samples/sec Loss 5.9940 LearningRate 0.0170 Epoch: 11 Global Step: 145910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:54,116-Speed 3060.67 samples/sec Loss 6.0718 LearningRate 0.0170 Epoch: 11 Global Step: 145920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:31:57,565-Speed 2969.92 samples/sec Loss 5.9767 LearningRate 0.0170 Epoch: 11 Global Step: 145930 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:32:00,917-Speed 3055.94 
samples/sec Loss 6.0454 LearningRate 0.0170 Epoch: 11 Global Step: 145940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:04,314-Speed 3015.38 samples/sec Loss 6.1114 LearningRate 0.0170 Epoch: 11 Global Step: 145950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:07,756-Speed 2976.04 samples/sec Loss 6.0259 LearningRate 0.0170 Epoch: 11 Global Step: 145960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:11,224-Speed 2953.42 samples/sec Loss 6.0571 LearningRate 0.0170 Epoch: 11 Global Step: 145970 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:14,691-Speed 2954.05 samples/sec Loss 6.0178 LearningRate 0.0170 Epoch: 11 Global Step: 145980 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:18,100-Speed 3004.75 samples/sec Loss 5.9854 LearningRate 0.0170 Epoch: 11 Global Step: 145990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:21,525-Speed 2991.19 samples/sec Loss 6.0537 LearningRate 0.0170 Epoch: 11 Global Step: 146000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:32:24,882-Speed 3051.00 samples/sec Loss 5.9832 LearningRate 0.0170 Epoch: 11 Global Step: 146010 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:28,280-Speed 3014.67 samples/sec Loss 5.9861 LearningRate 0.0170 Epoch: 11 Global Step: 146020 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:31,694-Speed 2999.85 samples/sec Loss 6.1673 LearningRate 0.0170 Epoch: 11 Global Step: 146030 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:35,061-Speed 3042.46 samples/sec Loss 5.9956 LearningRate 0.0170 Epoch: 11 Global Step: 146040 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:38,390-Speed 3077.21 samples/sec Loss 6.0543 LearningRate 0.0170 Epoch: 11 Global Step: 146050 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:41,754-Speed 3044.24 samples/sec Loss 6.0434 
LearningRate 0.0170 Epoch: 11 Global Step: 146060 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:45,060-Speed 3098.53 samples/sec Loss 6.0731 LearningRate 0.0170 Epoch: 11 Global Step: 146070 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:48,452-Speed 3019.64 samples/sec Loss 5.9573 LearningRate 0.0170 Epoch: 11 Global Step: 146080 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:51,779-Speed 3078.51 samples/sec Loss 6.0534 LearningRate 0.0170 Epoch: 11 Global Step: 146090 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:55,227-Speed 2971.48 samples/sec Loss 6.0970 LearningRate 0.0170 Epoch: 11 Global Step: 146100 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:32:58,570-Speed 3063.48 samples/sec Loss 6.0198 LearningRate 0.0170 Epoch: 11 Global Step: 146110 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:01,928-Speed 3050.05 samples/sec Loss 5.9693 LearningRate 0.0170 Epoch: 11 Global Step: 146120 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:05,369-Speed 2977.24 samples/sec Loss 6.0822 LearningRate 0.0170 Epoch: 11 Global Step: 146130 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:08,770-Speed 3011.37 samples/sec Loss 6.0764 LearningRate 0.0170 Epoch: 11 Global Step: 146140 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:12,109-Speed 3068.35 samples/sec Loss 6.0282 LearningRate 0.0169 Epoch: 11 Global Step: 146150 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:15,575-Speed 2955.51 samples/sec Loss 6.1045 LearningRate 0.0169 Epoch: 11 Global Step: 146160 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:19,013-Speed 2979.01 samples/sec Loss 5.9877 LearningRate 0.0169 Epoch: 11 Global Step: 146170 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:22,369-Speed 3052.22 samples/sec Loss 6.0232 LearningRate 0.0169 Epoch: 11 
Global Step: 146180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:25,738-Speed 3039.98 samples/sec Loss 6.0780 LearningRate 0.0169 Epoch: 11 Global Step: 146190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:29,124-Speed 3025.32 samples/sec Loss 6.0846 LearningRate 0.0169 Epoch: 11 Global Step: 146200 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:32,523-Speed 3013.34 samples/sec Loss 5.9751 LearningRate 0.0169 Epoch: 11 Global Step: 146210 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:33:35,904-Speed 3029.81 samples/sec Loss 6.1199 LearningRate 0.0169 Epoch: 11 Global Step: 146220 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:33:39,274-Speed 3039.39 samples/sec Loss 6.0684 LearningRate 0.0169 Epoch: 11 Global Step: 146230 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:33:42,655-Speed 3029.34 samples/sec Loss 5.9903 LearningRate 0.0169 Epoch: 11 Global Step: 146240 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:33:46,040-Speed 3026.43 samples/sec Loss 5.9853 LearningRate 0.0169 Epoch: 11 Global Step: 146250 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:49,375-Speed 3071.49 samples/sec Loss 5.9659 LearningRate 0.0169 Epoch: 11 Global Step: 146260 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:52,856-Speed 2942.63 samples/sec Loss 6.0874 LearningRate 0.0169 Epoch: 11 Global Step: 146270 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:56,344-Speed 2935.95 samples/sec Loss 6.0138 LearningRate 0.0169 Epoch: 11 Global Step: 146280 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:33:59,807-Speed 2958.40 samples/sec Loss 6.0381 LearningRate 0.0169 Epoch: 11 Global Step: 146290 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:03,174-Speed 3042.21 samples/sec Loss 5.9620 LearningRate 0.0169 Epoch: 11 Global Step: 146300 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:06,562-Speed 3023.08 samples/sec Loss 6.0319 LearningRate 0.0169 Epoch: 11 Global Step: 146310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:09,970-Speed 3005.31 samples/sec Loss 6.0246 LearningRate 0.0169 Epoch: 11 Global Step: 146320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:13,343-Speed 3037.11 samples/sec Loss 6.0312 LearningRate 0.0169 Epoch: 11 Global Step: 146330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:16,670-Speed 3078.07 samples/sec Loss 6.1115 LearningRate 0.0169 Epoch: 11 Global Step: 146340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:20,018-Speed 3059.48 samples/sec Loss 5.9962 LearningRate 0.0169 Epoch: 11 Global Step: 146350 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:34:23,375-Speed 3051.40 samples/sec Loss 6.0812 LearningRate 0.0169 Epoch: 11 Global Step: 146360 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:34:26,739-Speed 3044.51 samples/sec Loss 6.0450 LearningRate 0.0169 Epoch: 11 Global Step: 146370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:30,162-Speed 2992.23 samples/sec Loss 6.1035 LearningRate 0.0169 Epoch: 11 Global Step: 146380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:33,498-Speed 3071.07 samples/sec Loss 5.9117 LearningRate 0.0169 Epoch: 11 Global Step: 146390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:36,827-Speed 3075.98 samples/sec Loss 5.9728 LearningRate 0.0169 Epoch: 11 Global Step: 146400 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:40,192-Speed 3043.85 samples/sec Loss 6.0898 LearningRate 0.0169 Epoch: 11 Global Step: 146410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:43,554-Speed 3047.31 samples/sec Loss 5.9258 LearningRate 0.0169 Epoch: 11 Global Step: 146420 Fp16 Grad Scale: 32768 Required: 10 hours 
Training: 2022-04-27 15:34:46,922-Speed 3041.32 samples/sec Loss 6.0113 LearningRate 0.0169 Epoch: 11 Global Step: 146430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:50,373-Speed 2967.87 samples/sec Loss 6.0620 LearningRate 0.0169 Epoch: 11 Global Step: 146440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:53,780-Speed 3006.75 samples/sec Loss 6.1128 LearningRate 0.0168 Epoch: 11 Global Step: 146450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:34:57,172-Speed 3019.24 samples/sec Loss 6.0012 LearningRate 0.0168 Epoch: 11 Global Step: 146460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:00,536-Speed 3045.13 samples/sec Loss 6.0649 LearningRate 0.0168 Epoch: 11 Global Step: 146470 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:04,685-Speed 2468.03 samples/sec Loss 6.1393 LearningRate 0.0168 Epoch: 11 Global Step: 146480 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:08,100-Speed 2999.96 samples/sec Loss 6.0820 LearningRate 0.0168 Epoch: 11 Global Step: 146490 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:11,419-Speed 3086.38 samples/sec Loss 6.0619 LearningRate 0.0168 Epoch: 11 Global Step: 146500 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:14,770-Speed 3056.25 samples/sec Loss 5.9499 LearningRate 0.0168 Epoch: 11 Global Step: 146510 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:18,092-Speed 3083.23 samples/sec Loss 6.0821 LearningRate 0.0168 Epoch: 11 Global Step: 146520 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:21,514-Speed 2993.17 samples/sec Loss 6.0985 LearningRate 0.0168 Epoch: 11 Global Step: 146530 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:24,952-Speed 2979.60 samples/sec Loss 6.0580 LearningRate 0.0168 Epoch: 11 Global Step: 146540 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
15:35:28,340-Speed 3023.17 samples/sec Loss 6.1284 LearningRate 0.0168 Epoch: 11 Global Step: 146550 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:31,698-Speed 3050.61 samples/sec Loss 6.0912 LearningRate 0.0168 Epoch: 11 Global Step: 146560 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:35:35,021-Speed 3082.11 samples/sec Loss 6.0597 LearningRate 0.0168 Epoch: 11 Global Step: 146570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:38,359-Speed 3069.08 samples/sec Loss 5.9574 LearningRate 0.0168 Epoch: 11 Global Step: 146580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:41,845-Speed 2938.16 samples/sec Loss 5.9204 LearningRate 0.0168 Epoch: 11 Global Step: 146590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:45,234-Speed 3022.26 samples/sec Loss 6.2070 LearningRate 0.0168 Epoch: 11 Global Step: 146600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:48,590-Speed 3052.17 samples/sec Loss 6.1292 LearningRate 0.0168 Epoch: 11 Global Step: 146610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:51,948-Speed 3050.18 samples/sec Loss 6.0506 LearningRate 0.0168 Epoch: 11 Global Step: 146620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:55,371-Speed 2992.46 samples/sec Loss 6.0274 LearningRate 0.0168 Epoch: 11 Global Step: 146630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:35:58,706-Speed 3071.03 samples/sec Loss 6.0327 LearningRate 0.0168 Epoch: 11 Global Step: 146640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:02,106-Speed 3013.37 samples/sec Loss 6.0028 LearningRate 0.0168 Epoch: 11 Global Step: 146650 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:05,416-Speed 3094.99 samples/sec Loss 5.9069 LearningRate 0.0168 Epoch: 11 Global Step: 146660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:08,847-Speed 2985.46 
samples/sec Loss 6.0117 LearningRate 0.0168 Epoch: 11 Global Step: 146670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:12,188-Speed 3065.22 samples/sec Loss 6.0329 LearningRate 0.0168 Epoch: 11 Global Step: 146680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:15,516-Speed 3078.10 samples/sec Loss 6.0573 LearningRate 0.0168 Epoch: 11 Global Step: 146690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:18,891-Speed 3035.01 samples/sec Loss 5.9835 LearningRate 0.0168 Epoch: 11 Global Step: 146700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:22,231-Speed 3066.40 samples/sec Loss 6.1966 LearningRate 0.0168 Epoch: 11 Global Step: 146710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:25,652-Speed 2994.04 samples/sec Loss 6.0566 LearningRate 0.0168 Epoch: 11 Global Step: 146720 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:29,021-Speed 3040.56 samples/sec Loss 5.9146 LearningRate 0.0168 Epoch: 11 Global Step: 146730 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:32,363-Speed 3064.86 samples/sec Loss 6.0130 LearningRate 0.0168 Epoch: 11 Global Step: 146740 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:35,734-Speed 3038.59 samples/sec Loss 6.1022 LearningRate 0.0167 Epoch: 11 Global Step: 146750 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:36:39,060-Speed 3079.19 samples/sec Loss 6.0278 LearningRate 0.0167 Epoch: 11 Global Step: 146760 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:42,449-Speed 3022.86 samples/sec Loss 5.8971 LearningRate 0.0167 Epoch: 11 Global Step: 146770 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:45,926-Speed 2945.79 samples/sec Loss 6.0432 LearningRate 0.0167 Epoch: 11 Global Step: 146780 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:49,244-Speed 3086.90 samples/sec Loss 5.9677 
LearningRate 0.0167 Epoch: 11 Global Step: 146790 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:52,604-Speed 3048.23 samples/sec Loss 5.9460 LearningRate 0.0167 Epoch: 11 Global Step: 146800 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:55,943-Speed 3068.14 samples/sec Loss 6.0159 LearningRate 0.0167 Epoch: 11 Global Step: 146810 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:36:59,284-Speed 3065.70 samples/sec Loss 5.9400 LearningRate 0.0167 Epoch: 11 Global Step: 146820 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:02,707-Speed 2992.98 samples/sec Loss 5.8866 LearningRate 0.0167 Epoch: 11 Global Step: 146830 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:06,151-Speed 2973.96 samples/sec Loss 6.0858 LearningRate 0.0167 Epoch: 11 Global Step: 146840 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:09,517-Speed 3042.63 samples/sec Loss 6.1207 LearningRate 0.0167 Epoch: 11 Global Step: 146850 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:12,896-Speed 3031.54 samples/sec Loss 5.9175 LearningRate 0.0167 Epoch: 11 Global Step: 146860 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:37:16,282-Speed 3024.63 samples/sec Loss 6.0277 LearningRate 0.0167 Epoch: 11 Global Step: 146870 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:19,671-Speed 3022.72 samples/sec Loss 5.9858 LearningRate 0.0167 Epoch: 11 Global Step: 146880 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:23,078-Speed 3006.36 samples/sec Loss 5.9226 LearningRate 0.0167 Epoch: 11 Global Step: 146890 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:26,412-Speed 3072.76 samples/sec Loss 5.9630 LearningRate 0.0167 Epoch: 11 Global Step: 146900 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:29,785-Speed 3036.81 samples/sec Loss 5.9368 LearningRate 0.0167 Epoch: 11 
Global Step: 146910 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:33,170-Speed 3025.57 samples/sec Loss 5.9728 LearningRate 0.0167 Epoch: 11 Global Step: 146920 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:36,527-Speed 3051.87 samples/sec Loss 6.0841 LearningRate 0.0167 Epoch: 11 Global Step: 146930 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:39,854-Speed 3078.29 samples/sec Loss 6.0512 LearningRate 0.0167 Epoch: 11 Global Step: 146940 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:43,245-Speed 3021.23 samples/sec Loss 6.0490 LearningRate 0.0167 Epoch: 11 Global Step: 146950 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:46,611-Speed 3042.86 samples/sec Loss 5.9340 LearningRate 0.0167 Epoch: 11 Global Step: 146960 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:37:50,107-Speed 2929.70 samples/sec Loss 6.0676 LearningRate 0.0167 Epoch: 11 Global Step: 146970 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:37:53,481-Speed 3035.97 samples/sec Loss 5.9888 LearningRate 0.0167 Epoch: 11 Global Step: 146980 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:37:56,828-Speed 3060.81 samples/sec Loss 6.0403 LearningRate 0.0167 Epoch: 11 Global Step: 146990 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:00,157-Speed 3076.36 samples/sec Loss 6.1022 LearningRate 0.0167 Epoch: 11 Global Step: 147000 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:03,511-Speed 3054.01 samples/sec Loss 5.9274 LearningRate 0.0167 Epoch: 11 Global Step: 147010 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:06,991-Speed 2944.21 samples/sec Loss 6.0037 LearningRate 0.0167 Epoch: 11 Global Step: 147020 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:10,440-Speed 2969.16 samples/sec Loss 5.9282 LearningRate 0.0167 Epoch: 11 Global Step: 147030 Fp16 Grad 
Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:13,774-Speed 3072.16 samples/sec Loss 6.0315 LearningRate 0.0167 Epoch: 11 Global Step: 147040 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:17,193-Speed 2996.30 samples/sec Loss 6.0055 LearningRate 0.0167 Epoch: 11 Global Step: 147050 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:20,566-Speed 3036.49 samples/sec Loss 6.0581 LearningRate 0.0166 Epoch: 11 Global Step: 147060 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:23,942-Speed 3034.25 samples/sec Loss 5.9784 LearningRate 0.0166 Epoch: 11 Global Step: 147070 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:27,313-Speed 3038.50 samples/sec Loss 6.0364 LearningRate 0.0166 Epoch: 11 Global Step: 147080 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:38:30,729-Speed 2998.60 samples/sec Loss 5.8301 LearningRate 0.0166 Epoch: 11 Global Step: 147090 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:34,212-Speed 2942.19 samples/sec Loss 5.9717 LearningRate 0.0166 Epoch: 11 Global Step: 147100 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:37,609-Speed 3015.07 samples/sec Loss 6.1123 LearningRate 0.0166 Epoch: 11 Global Step: 147110 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:40,932-Speed 3082.41 samples/sec Loss 5.8687 LearningRate 0.0166 Epoch: 11 Global Step: 147120 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:44,335-Speed 3010.17 samples/sec Loss 5.8952 LearningRate 0.0166 Epoch: 11 Global Step: 147130 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:47,750-Speed 2999.20 samples/sec Loss 6.0063 LearningRate 0.0166 Epoch: 11 Global Step: 147140 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:51,102-Speed 3056.34 samples/sec Loss 6.0700 LearningRate 0.0166 Epoch: 11 Global Step: 147150 Fp16 Grad Scale: 65536 Required: 10 hours 
Training: 2022-04-27 15:38:54,453-Speed 3055.95 samples/sec Loss 5.9762 LearningRate 0.0166 Epoch: 11 Global Step: 147160 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:38:57,830-Speed 3033.62 samples/sec Loss 6.0371 LearningRate 0.0166 Epoch: 11 Global Step: 147170 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:39:01,252-Speed 2994.26 samples/sec Loss 5.8969 LearningRate 0.0166 Epoch: 11 Global Step: 147180 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:04,699-Speed 2971.30 samples/sec Loss 6.0244 LearningRate 0.0166 Epoch: 11 Global Step: 147190 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:08,103-Speed 3009.04 samples/sec Loss 5.9351 LearningRate 0.0166 Epoch: 11 Global Step: 147200 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:11,477-Speed 3035.64 samples/sec Loss 6.0950 LearningRate 0.0166 Epoch: 11 Global Step: 147210 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:14,914-Speed 2980.39 samples/sec Loss 6.0239 LearningRate 0.0166 Epoch: 11 Global Step: 147220 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:18,287-Speed 3037.16 samples/sec Loss 6.1138 LearningRate 0.0166 Epoch: 11 Global Step: 147230 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:21,661-Speed 3035.89 samples/sec Loss 5.9639 LearningRate 0.0166 Epoch: 11 Global Step: 147240 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:25,058-Speed 3015.45 samples/sec Loss 6.0172 LearningRate 0.0166 Epoch: 11 Global Step: 147250 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:28,450-Speed 3019.25 samples/sec Loss 5.9515 LearningRate 0.0166 Epoch: 11 Global Step: 147260 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:31,853-Speed 3010.26 samples/sec Loss 5.9737 LearningRate 0.0166 Epoch: 11 Global Step: 147270 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 
15:39:35,222-Speed 3040.51 samples/sec Loss 5.8854 LearningRate 0.0166 Epoch: 11 Global Step: 147280 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:38,590-Speed 3041.38 samples/sec Loss 6.1404 LearningRate 0.0166 Epoch: 11 Global Step: 147290 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:39:42,041-Speed 2967.88 samples/sec Loss 6.0275 LearningRate 0.0166 Epoch: 11 Global Step: 147300 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:45,450-Speed 3004.56 samples/sec Loss 5.8271 LearningRate 0.0166 Epoch: 11 Global Step: 147310 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:48,905-Speed 2964.86 samples/sec Loss 6.0194 LearningRate 0.0166 Epoch: 11 Global Step: 147320 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:52,302-Speed 3014.60 samples/sec Loss 5.9405 LearningRate 0.0166 Epoch: 11 Global Step: 147330 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:55,655-Speed 3055.30 samples/sec Loss 5.9187 LearningRate 0.0166 Epoch: 11 Global Step: 147340 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:39:59,037-Speed 3028.19 samples/sec Loss 6.0554 LearningRate 0.0166 Epoch: 11 Global Step: 147350 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:02,457-Speed 2995.54 samples/sec Loss 6.0617 LearningRate 0.0165 Epoch: 11 Global Step: 147360 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:05,861-Speed 3008.48 samples/sec Loss 6.0303 LearningRate 0.0165 Epoch: 11 Global Step: 147370 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:09,302-Speed 2977.03 samples/sec Loss 5.9736 LearningRate 0.0165 Epoch: 11 Global Step: 147380 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:12,716-Speed 3000.06 samples/sec Loss 5.9928 LearningRate 0.0165 Epoch: 11 Global Step: 147390 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:16,087-Speed 3038.80 
samples/sec Loss 5.9107 LearningRate 0.0165 Epoch: 11 Global Step: 147400 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:40:19,485-Speed 3014.31 samples/sec Loss 5.9500 LearningRate 0.0165 Epoch: 11 Global Step: 147410 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:22,859-Speed 3035.54 samples/sec Loss 5.9110 LearningRate 0.0165 Epoch: 11 Global Step: 147420 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:26,188-Speed 3077.17 samples/sec Loss 6.1229 LearningRate 0.0165 Epoch: 11 Global Step: 147430 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:29,593-Speed 3007.81 samples/sec Loss 5.9490 LearningRate 0.0165 Epoch: 11 Global Step: 147440 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:32,947-Speed 3054.61 samples/sec Loss 5.9876 LearningRate 0.0165 Epoch: 11 Global Step: 147450 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:36,356-Speed 3004.63 samples/sec Loss 5.9496 LearningRate 0.0165 Epoch: 11 Global Step: 147460 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:39,721-Speed 3043.96 samples/sec Loss 5.9797 LearningRate 0.0165 Epoch: 11 Global Step: 147470 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:43,084-Speed 3045.89 samples/sec Loss 6.0057 LearningRate 0.0165 Epoch: 11 Global Step: 147480 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:46,490-Speed 3006.88 samples/sec Loss 5.9426 LearningRate 0.0165 Epoch: 11 Global Step: 147490 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:49,979-Speed 2935.75 samples/sec Loss 6.0489 LearningRate 0.0165 Epoch: 11 Global Step: 147500 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:40:53,445-Speed 2955.61 samples/sec Loss 6.0298 LearningRate 0.0165 Epoch: 11 Global Step: 147510 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:40:56,868-Speed 2991.79 samples/sec Loss 5.9177 
LearningRate 0.0165 Epoch: 11 Global Step: 147520 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:41:00,204-Speed 3071.05 samples/sec Loss 6.0722 LearningRate 0.0165 Epoch: 11 Global Step: 147530 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:41:03,572-Speed 3040.70 samples/sec Loss 5.9957 LearningRate 0.0165 Epoch: 11 Global Step: 147540 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:41:06,883-Speed 3093.66 samples/sec Loss 5.9529 LearningRate 0.0165 Epoch: 11 Global Step: 147550 Fp16 Grad Scale: 65536 Required: 10 hours Training: 2022-04-27 15:41:10,265-Speed 3029.25 samples/sec Loss 6.0191 LearningRate 0.0165 Epoch: 11 Global Step: 147560 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:13,606-Speed 3065.84 samples/sec Loss 5.8348 LearningRate 0.0165 Epoch: 11 Global Step: 147570 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:17,016-Speed 3004.14 samples/sec Loss 5.9947 LearningRate 0.0165 Epoch: 11 Global Step: 147580 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:20,418-Speed 3010.86 samples/sec Loss 5.9775 LearningRate 0.0165 Epoch: 11 Global Step: 147590 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:23,761-Speed 3064.39 samples/sec Loss 5.9893 LearningRate 0.0165 Epoch: 11 Global Step: 147600 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:27,140-Speed 3031.37 samples/sec Loss 5.8371 LearningRate 0.0165 Epoch: 11 Global Step: 147610 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:30,518-Speed 3032.01 samples/sec Loss 6.0167 LearningRate 0.0165 Epoch: 11 Global Step: 147620 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:33,849-Speed 3074.69 samples/sec Loss 6.0487 LearningRate 0.0165 Epoch: 11 Global Step: 147630 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:37,211-Speed 3046.98 samples/sec Loss 6.0646 LearningRate 0.0165 Epoch: 11 
Global Step: 147640 Fp16 Grad Scale: 32768 Required: 10 hours Training: 2022-04-27 15:41:40,531-Speed 3085.58 samples/sec Loss 5.8702 LearningRate 0.0165 Epoch: 11 Global Step: 147650 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:41:43,846-Speed 3089.65 samples/sec Loss 6.0108 LearningRate 0.0165 Epoch: 11 Global Step: 147660 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:41:47,222-Speed 3034.31 samples/sec Loss 6.0017 LearningRate 0.0164 Epoch: 11 Global Step: 147670 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:41:50,580-Speed 3050.48 samples/sec Loss 6.0129 LearningRate 0.0164 Epoch: 11 Global Step: 147680 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:41:53,983-Speed 3009.83 samples/sec Loss 6.0265 LearningRate 0.0164 Epoch: 11 Global Step: 147690 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:41:57,349-Speed 3043.36 samples/sec Loss 5.9183 LearningRate 0.0164 Epoch: 11 Global Step: 147700 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:42:00,720-Speed 3038.29 samples/sec Loss 5.9512 LearningRate 0.0164 Epoch: 11 Global Step: 147710 Fp16 Grad Scale: 16384 Required: 10 hours Training: 2022-04-27 15:42:04,083-Speed 3045.37 samples/sec Loss 5.9739 LearningRate 0.0164 Epoch: 11 Global Step: 147720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:42:07,450-Speed 3042.45 samples/sec Loss 5.9786 LearningRate 0.0164 Epoch: 11 Global Step: 147730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:42:10,816-Speed 3043.77 samples/sec Loss 5.9061 LearningRate 0.0164 Epoch: 11 Global Step: 147740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:42:14,184-Speed 3040.26 samples/sec Loss 5.9040 LearningRate 0.0164 Epoch: 11 Global Step: 147750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:17,582-Speed 3014.61 samples/sec Loss 5.9582 LearningRate 0.0164 Epoch: 11 Global Step: 147760 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:20,906-Speed 3082.01 samples/sec Loss 6.0079 LearningRate 0.0164 Epoch: 11 Global Step: 147770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:24,245-Speed 3068.20 samples/sec Loss 5.8886 LearningRate 0.0164 Epoch: 11 Global Step: 147780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:27,589-Speed 3062.41 samples/sec Loss 5.9567 LearningRate 0.0164 Epoch: 11 Global Step: 147790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:31,036-Speed 2971.60 samples/sec Loss 5.9379 LearningRate 0.0164 Epoch: 11 Global Step: 147800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:34,455-Speed 2995.88 samples/sec Loss 5.9758 LearningRate 0.0164 Epoch: 11 Global Step: 147810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:37,781-Speed 3079.71 samples/sec Loss 5.9527 LearningRate 0.0164 Epoch: 11 Global Step: 147820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:41,088-Speed 3097.71 samples/sec Loss 5.9518 LearningRate 0.0164 Epoch: 11 Global Step: 147830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:44,456-Speed 3041.02 samples/sec Loss 6.0153 LearningRate 0.0164 Epoch: 11 Global Step: 147840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:47,808-Speed 3055.29 samples/sec Loss 5.8276 LearningRate 0.0164 Epoch: 11 Global Step: 147850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:42:51,238-Speed 2986.41 samples/sec Loss 5.8825 LearningRate 0.0164 Epoch: 11 Global Step: 147860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:42:54,601-Speed 3046.57 samples/sec Loss 5.8984 LearningRate 0.0164 Epoch: 11 Global Step: 147870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:42:57,978-Speed 3032.60 samples/sec Loss 5.9929 LearningRate 0.0164 Epoch: 11 Global Step: 147880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 15:43:01,353-Speed 3034.80 samples/sec Loss 5.9009 LearningRate 0.0164 Epoch: 11 Global Step: 147890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:04,733-Speed 3030.68 samples/sec Loss 5.9133 LearningRate 0.0164 Epoch: 11 Global Step: 147900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:08,096-Speed 3045.16 samples/sec Loss 5.9833 LearningRate 0.0164 Epoch: 11 Global Step: 147910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:11,422-Speed 3080.49 samples/sec Loss 5.9513 LearningRate 0.0164 Epoch: 11 Global Step: 147920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:14,907-Speed 2938.94 samples/sec Loss 6.0207 LearningRate 0.0164 Epoch: 11 Global Step: 147930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:18,288-Speed 3029.27 samples/sec Loss 5.8782 LearningRate 0.0164 Epoch: 11 Global Step: 147940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:21,672-Speed 3027.58 samples/sec Loss 5.8545 LearningRate 0.0164 Epoch: 11 Global Step: 147950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:25,118-Speed 2972.02 samples/sec Loss 5.8800 LearningRate 0.0164 Epoch: 11 Global Step: 147960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:43:28,468-Speed 3058.29 samples/sec Loss 5.9964 LearningRate 0.0164 Epoch: 11 Global Step: 147970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:43:31,897-Speed 2987.26 samples/sec Loss 5.9414 LearningRate 0.0163 Epoch: 11 Global Step: 147980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:43:35,292-Speed 3017.13 samples/sec Loss 5.9205 LearningRate 0.0163 Epoch: 11 Global Step: 147990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:43:38,679-Speed 3023.50 samples/sec Loss 5.9853 LearningRate 0.0163 Epoch: 11 Global Step: 148000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:43:42,107-Speed 2988.34 
samples/sec Loss 5.8930 LearningRate 0.0163 Epoch: 11 Global Step: 148010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:43:45,403-Speed 3107.76 samples/sec Loss 5.9210 LearningRate 0.0163 Epoch: 11 Global Step: 148020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:43:48,715-Speed 3093.06 samples/sec Loss 6.0108 LearningRate 0.0163 Epoch: 11 Global Step: 148030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:43:52,046-Speed 3075.56 samples/sec Loss 6.0062 LearningRate 0.0163 Epoch: 11 Global Step: 148040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:43:55,481-Speed 2980.94 samples/sec Loss 5.9555 LearningRate 0.0163 Epoch: 11 Global Step: 148050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:43:58,888-Speed 3006.92 samples/sec Loss 5.9994 LearningRate 0.0163 Epoch: 11 Global Step: 148060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:02,241-Speed 3054.64 samples/sec Loss 5.9854 LearningRate 0.0163 Epoch: 11 Global Step: 148070 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:05,594-Speed 3054.51 samples/sec Loss 5.8505 LearningRate 0.0163 Epoch: 11 Global Step: 148080 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:09,010-Speed 2999.26 samples/sec Loss 5.8285 LearningRate 0.0163 Epoch: 11 Global Step: 148090 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:12,436-Speed 2989.19 samples/sec Loss 5.9696 LearningRate 0.0163 Epoch: 11 Global Step: 148100 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:15,766-Speed 3076.42 samples/sec Loss 5.9337 LearningRate 0.0163 Epoch: 11 Global Step: 148110 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:19,156-Speed 3021.37 samples/sec Loss 5.9241 LearningRate 0.0163 Epoch: 11 Global Step: 148120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:44:22,590-Speed 2982.87 samples/sec Loss 6.0097 LearningRate 0.0163 
Epoch: 11 Global Step: 148130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:25,910-Speed 3084.66 samples/sec Loss 5.9375 LearningRate 0.0163 Epoch: 11 Global Step: 148140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:29,277-Speed 3042.86 samples/sec Loss 5.8841 LearningRate 0.0163 Epoch: 11 Global Step: 148150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:32,675-Speed 3013.77 samples/sec Loss 6.0010 LearningRate 0.0163 Epoch: 11 Global Step: 148160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:36,024-Speed 3058.61 samples/sec Loss 6.0846 LearningRate 0.0163 Epoch: 11 Global Step: 148170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:39,413-Speed 3022.34 samples/sec Loss 6.0454 LearningRate 0.0163 Epoch: 11 Global Step: 148180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:42,788-Speed 3034.54 samples/sec Loss 5.8591 LearningRate 0.0163 Epoch: 11 Global Step: 148190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:46,158-Speed 3039.47 samples/sec Loss 5.9777 LearningRate 0.0163 Epoch: 11 Global Step: 148200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:49,573-Speed 2999.96 samples/sec Loss 5.9744 LearningRate 0.0163 Epoch: 11 Global Step: 148210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:52,940-Speed 3041.74 samples/sec Loss 5.9034 LearningRate 0.0163 Epoch: 11 Global Step: 148220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:44:56,343-Speed 3010.09 samples/sec Loss 5.9723 LearningRate 0.0163 Epoch: 11 Global Step: 148230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:44:59,680-Speed 3068.70 samples/sec Loss 5.9124 LearningRate 0.0163 Epoch: 11 Global Step: 148240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:03,102-Speed 2993.67 samples/sec Loss 5.9432 LearningRate 0.0163 Epoch: 11 Global Step: 148250 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:06,547-Speed 2973.42 samples/sec Loss 5.9370 LearningRate 0.0163 Epoch: 11 Global Step: 148260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:09,923-Speed 3034.10 samples/sec Loss 5.8015 LearningRate 0.0163 Epoch: 11 Global Step: 148270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:13,351-Speed 2987.57 samples/sec Loss 5.8036 LearningRate 0.0162 Epoch: 11 Global Step: 148280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:16,745-Speed 3018.15 samples/sec Loss 5.9921 LearningRate 0.0162 Epoch: 11 Global Step: 148290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:20,148-Speed 3009.65 samples/sec Loss 5.9536 LearningRate 0.0162 Epoch: 11 Global Step: 148300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:23,595-Speed 2971.86 samples/sec Loss 5.8668 LearningRate 0.0162 Epoch: 11 Global Step: 148310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:26,924-Speed 3077.17 samples/sec Loss 5.9138 LearningRate 0.0162 Epoch: 11 Global Step: 148320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:30,269-Speed 3061.82 samples/sec Loss 5.9273 LearningRate 0.0162 Epoch: 11 Global Step: 148330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:33,640-Speed 3038.60 samples/sec Loss 5.9064 LearningRate 0.0162 Epoch: 11 Global Step: 148340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:37,008-Speed 3041.47 samples/sec Loss 5.9250 LearningRate 0.0162 Epoch: 11 Global Step: 148350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:40,398-Speed 3021.51 samples/sec Loss 5.9949 LearningRate 0.0162 Epoch: 11 Global Step: 148360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:43,776-Speed 3032.78 samples/sec Loss 5.9529 LearningRate 0.0162 Epoch: 11 Global Step: 148370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 15:45:47,121-Speed 3061.68 samples/sec Loss 5.8677 LearningRate 0.0162 Epoch: 11 Global Step: 148380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:50,503-Speed 3029.14 samples/sec Loss 5.8853 LearningRate 0.0162 Epoch: 11 Global Step: 148390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:53,873-Speed 3039.05 samples/sec Loss 5.8766 LearningRate 0.0162 Epoch: 11 Global Step: 148400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:45:57,236-Speed 3046.11 samples/sec Loss 5.9172 LearningRate 0.0162 Epoch: 11 Global Step: 148410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:00,579-Speed 3063.68 samples/sec Loss 5.8808 LearningRate 0.0162 Epoch: 11 Global Step: 148420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:03,980-Speed 3011.78 samples/sec Loss 5.8846 LearningRate 0.0162 Epoch: 11 Global Step: 148430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:07,397-Speed 2997.67 samples/sec Loss 5.9052 LearningRate 0.0162 Epoch: 11 Global Step: 148440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:10,789-Speed 3019.73 samples/sec Loss 5.9474 LearningRate 0.0162 Epoch: 11 Global Step: 148450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:14,235-Speed 2972.35 samples/sec Loss 5.9047 LearningRate 0.0162 Epoch: 11 Global Step: 148460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:17,570-Speed 3071.95 samples/sec Loss 5.9735 LearningRate 0.0162 Epoch: 11 Global Step: 148470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:20,990-Speed 2994.81 samples/sec Loss 5.8544 LearningRate 0.0162 Epoch: 11 Global Step: 148480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:24,342-Speed 3055.32 samples/sec Loss 5.9786 LearningRate 0.0162 Epoch: 11 Global Step: 148490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:27,840-Speed 2928.60 
samples/sec Loss 5.8669 LearningRate 0.0162 Epoch: 11 Global Step: 148500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:31,208-Speed 3041.41 samples/sec Loss 5.9149 LearningRate 0.0162 Epoch: 11 Global Step: 148510 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:34,545-Speed 3069.50 samples/sec Loss 5.9739 LearningRate 0.0162 Epoch: 11 Global Step: 148520 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:46:37,932-Speed 3023.68 samples/sec Loss 5.9385 LearningRate 0.0162 Epoch: 11 Global Step: 148530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:41,306-Speed 3036.62 samples/sec Loss 5.8556 LearningRate 0.0162 Epoch: 11 Global Step: 148540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:44,677-Speed 3038.09 samples/sec Loss 5.9852 LearningRate 0.0162 Epoch: 11 Global Step: 148550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:48,079-Speed 3010.56 samples/sec Loss 5.9394 LearningRate 0.0162 Epoch: 11 Global Step: 148560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:51,505-Speed 2992.48 samples/sec Loss 5.9167 LearningRate 0.0162 Epoch: 11 Global Step: 148570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:54,917-Speed 3002.41 samples/sec Loss 6.0273 LearningRate 0.0162 Epoch: 11 Global Step: 148580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:46:58,255-Speed 3068.33 samples/sec Loss 5.9417 LearningRate 0.0161 Epoch: 11 Global Step: 148590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:01,634-Speed 3031.03 samples/sec Loss 5.8615 LearningRate 0.0161 Epoch: 11 Global Step: 148600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:05,019-Speed 3028.20 samples/sec Loss 5.9728 LearningRate 0.0161 Epoch: 11 Global Step: 148610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:08,403-Speed 3026.45 samples/sec Loss 5.9185 LearningRate 0.0161 
Epoch: 11 Global Step: 148620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:11,799-Speed 3016.11 samples/sec Loss 5.8830 LearningRate 0.0161 Epoch: 11 Global Step: 148630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:15,139-Speed 3066.78 samples/sec Loss 5.9979 LearningRate 0.0161 Epoch: 11 Global Step: 148640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:18,489-Speed 3058.14 samples/sec Loss 5.8261 LearningRate 0.0161 Epoch: 11 Global Step: 148650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:21,847-Speed 3050.34 samples/sec Loss 5.8678 LearningRate 0.0161 Epoch: 11 Global Step: 148660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:47:25,259-Speed 3002.13 samples/sec Loss 5.9145 LearningRate 0.0161 Epoch: 11 Global Step: 148670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:28,619-Speed 3048.42 samples/sec Loss 5.9573 LearningRate 0.0161 Epoch: 11 Global Step: 148680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:32,021-Speed 3010.47 samples/sec Loss 5.9043 LearningRate 0.0161 Epoch: 11 Global Step: 148690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:35,428-Speed 3006.65 samples/sec Loss 5.7796 LearningRate 0.0161 Epoch: 11 Global Step: 148700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:38,749-Speed 3084.08 samples/sec Loss 5.9569 LearningRate 0.0161 Epoch: 11 Global Step: 148710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:42,051-Speed 3102.76 samples/sec Loss 5.7885 LearningRate 0.0161 Epoch: 11 Global Step: 148720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:45,386-Speed 3071.51 samples/sec Loss 6.0149 LearningRate 0.0161 Epoch: 11 Global Step: 148730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:48,761-Speed 3034.55 samples/sec Loss 5.9636 LearningRate 0.0161 Epoch: 11 Global Step: 148740 Fp16 Grad 
Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:52,135-Speed 3035.95 samples/sec Loss 5.7972 LearningRate 0.0161 Epoch: 11 Global Step: 148750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:55,495-Speed 3049.29 samples/sec Loss 5.8198 LearningRate 0.0161 Epoch: 11 Global Step: 148760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:47:58,871-Speed 3033.65 samples/sec Loss 5.8555 LearningRate 0.0161 Epoch: 11 Global Step: 148770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:02,268-Speed 3015.28 samples/sec Loss 5.9170 LearningRate 0.0161 Epoch: 11 Global Step: 148780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:05,604-Speed 3070.32 samples/sec Loss 5.9595 LearningRate 0.0161 Epoch: 11 Global Step: 148790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:09,019-Speed 2999.30 samples/sec Loss 5.8776 LearningRate 0.0161 Epoch: 11 Global Step: 148800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:12,435-Speed 2998.28 samples/sec Loss 5.9412 LearningRate 0.0161 Epoch: 11 Global Step: 148810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:15,779-Speed 3063.42 samples/sec Loss 5.9709 LearningRate 0.0161 Epoch: 11 Global Step: 148820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:19,224-Speed 2973.01 samples/sec Loss 5.9152 LearningRate 0.0161 Epoch: 11 Global Step: 148830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:22,541-Speed 3088.39 samples/sec Loss 5.8332 LearningRate 0.0161 Epoch: 11 Global Step: 148840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:25,865-Speed 3081.70 samples/sec Loss 5.9688 LearningRate 0.0161 Epoch: 11 Global Step: 148850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:29,268-Speed 3010.32 samples/sec Loss 5.9060 LearningRate 0.0161 Epoch: 11 Global Step: 148860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 15:48:32,692-Speed 2991.04 samples/sec Loss 5.8510 LearningRate 0.0161 Epoch: 11 Global Step: 148870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:48:36,121-Speed 2987.79 samples/sec Loss 5.9106 LearningRate 0.0161 Epoch: 11 Global Step: 148880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:48:39,481-Speed 3048.56 samples/sec Loss 5.9094 LearningRate 0.0161 Epoch: 11 Global Step: 148890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:48:42,905-Speed 2991.56 samples/sec Loss 5.9345 LearningRate 0.0160 Epoch: 11 Global Step: 148900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:48:46,261-Speed 3052.01 samples/sec Loss 5.8285 LearningRate 0.0160 Epoch: 11 Global Step: 148910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:48:49,648-Speed 3024.36 samples/sec Loss 5.8886 LearningRate 0.0160 Epoch: 11 Global Step: 148920 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:48:53,028-Speed 3030.03 samples/sec Loss 5.7314 LearningRate 0.0160 Epoch: 11 Global Step: 148930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:56,429-Speed 3013.11 samples/sec Loss 5.9387 LearningRate 0.0160 Epoch: 11 Global Step: 148940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:48:59,830-Speed 3012.30 samples/sec Loss 6.0153 LearningRate 0.0160 Epoch: 11 Global Step: 148950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:03,224-Speed 3018.48 samples/sec Loss 5.9060 LearningRate 0.0160 Epoch: 11 Global Step: 148960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:06,617-Speed 3018.65 samples/sec Loss 5.7721 LearningRate 0.0160 Epoch: 11 Global Step: 148970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:09,959-Speed 3064.68 samples/sec Loss 5.9487 LearningRate 0.0160 Epoch: 11 Global Step: 148980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:13,346-Speed 3024.37 
samples/sec Loss 5.9504 LearningRate 0.0160 Epoch: 11 Global Step: 148990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:16,743-Speed 3015.12 samples/sec Loss 6.0002 LearningRate 0.0160 Epoch: 11 Global Step: 149000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:20,173-Speed 2986.29 samples/sec Loss 5.9317 LearningRate 0.0160 Epoch: 11 Global Step: 149010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:23,540-Speed 3042.17 samples/sec Loss 5.9357 LearningRate 0.0160 Epoch: 11 Global Step: 149020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:26,885-Speed 3061.92 samples/sec Loss 5.9923 LearningRate 0.0160 Epoch: 11 Global Step: 149030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:49:30,206-Speed 3084.93 samples/sec Loss 5.8323 LearningRate 0.0160 Epoch: 11 Global Step: 149040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:49:34,004-Speed 2696.34 samples/sec Loss 5.8347 LearningRate 0.0160 Epoch: 11 Global Step: 149050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:06,256-Speed 317.52 samples/sec Loss 4.7019 LearningRate 0.0160 Epoch: 12 Global Step: 149060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:09,808-Speed 2883.90 samples/sec Loss 4.4558 LearningRate 0.0160 Epoch: 12 Global Step: 149070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:13,169-Speed 3047.49 samples/sec Loss 4.4148 LearningRate 0.0160 Epoch: 12 Global Step: 149080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:16,837-Speed 2792.21 samples/sec Loss 4.3830 LearningRate 0.0160 Epoch: 12 Global Step: 149090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:20,184-Speed 3060.33 samples/sec Loss 4.3784 LearningRate 0.0160 Epoch: 12 Global Step: 149100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:23,539-Speed 3053.24 samples/sec Loss 4.4723 LearningRate 0.0160 
Epoch: 12 Global Step: 149110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:26,927-Speed 3023.35 samples/sec Loss 4.5124 LearningRate 0.0160 Epoch: 12 Global Step: 149120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:30,360-Speed 2984.34 samples/sec Loss 4.4501 LearningRate 0.0160 Epoch: 12 Global Step: 149130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:33,720-Speed 3048.88 samples/sec Loss 4.5939 LearningRate 0.0160 Epoch: 12 Global Step: 149140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:50:37,116-Speed 3015.92 samples/sec Loss 4.4897 LearningRate 0.0160 Epoch: 12 Global Step: 149150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:40,586-Speed 2952.13 samples/sec Loss 4.5148 LearningRate 0.0160 Epoch: 12 Global Step: 149160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:43,968-Speed 3028.87 samples/sec Loss 4.4455 LearningRate 0.0160 Epoch: 12 Global Step: 149170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:47,321-Speed 3054.88 samples/sec Loss 4.4451 LearningRate 0.0160 Epoch: 12 Global Step: 149180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:50,626-Speed 3098.54 samples/sec Loss 4.4437 LearningRate 0.0160 Epoch: 12 Global Step: 149190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:54,025-Speed 3013.98 samples/sec Loss 4.4353 LearningRate 0.0160 Epoch: 12 Global Step: 149200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:50:57,470-Speed 2973.49 samples/sec Loss 4.5410 LearningRate 0.0159 Epoch: 12 Global Step: 149210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:00,794-Speed 3081.23 samples/sec Loss 4.5070 LearningRate 0.0159 Epoch: 12 Global Step: 149220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:04,256-Speed 2958.91 samples/sec Loss 4.4963 LearningRate 0.0159 Epoch: 12 Global Step: 149230 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:07,668-Speed 3001.87 samples/sec Loss 4.4119 LearningRate 0.0159 Epoch: 12 Global Step: 149240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:11,009-Speed 3066.37 samples/sec Loss 4.5738 LearningRate 0.0159 Epoch: 12 Global Step: 149250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:14,432-Speed 2991.83 samples/sec Loss 4.5093 LearningRate 0.0159 Epoch: 12 Global Step: 149260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:17,838-Speed 3007.72 samples/sec Loss 4.4970 LearningRate 0.0159 Epoch: 12 Global Step: 149270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:21,196-Speed 3050.71 samples/sec Loss 4.5243 LearningRate 0.0159 Epoch: 12 Global Step: 149280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:24,546-Speed 3057.27 samples/sec Loss 4.5702 LearningRate 0.0159 Epoch: 12 Global Step: 149290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:27,895-Speed 3058.54 samples/sec Loss 4.4764 LearningRate 0.0159 Epoch: 12 Global Step: 149300 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:31,345-Speed 2968.90 samples/sec Loss 4.4836 LearningRate 0.0159 Epoch: 12 Global Step: 149310 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:34,697-Speed 3056.09 samples/sec Loss 4.5434 LearningRate 0.0159 Epoch: 12 Global Step: 149320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:51:38,108-Speed 3004.07 samples/sec Loss 4.4532 LearningRate 0.0159 Epoch: 12 Global Step: 149330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:41,564-Speed 2963.34 samples/sec Loss 4.4951 LearningRate 0.0159 Epoch: 12 Global Step: 149340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:44,956-Speed 3020.38 samples/sec Loss 4.4742 LearningRate 0.0159 Epoch: 12 Global Step: 149350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 15:51:48,549-Speed 2850.69 samples/sec Loss 4.6328 LearningRate 0.0159 Epoch: 12 Global Step: 149360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:52,028-Speed 2943.82 samples/sec Loss 4.5832 LearningRate 0.0159 Epoch: 12 Global Step: 149370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:55,524-Speed 2930.74 samples/sec Loss 4.5138 LearningRate 0.0159 Epoch: 12 Global Step: 149380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:51:58,922-Speed 3014.21 samples/sec Loss 4.5411 LearningRate 0.0159 Epoch: 12 Global Step: 149390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:02,252-Speed 3076.66 samples/sec Loss 4.6050 LearningRate 0.0159 Epoch: 12 Global Step: 149400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:05,661-Speed 3004.32 samples/sec Loss 4.5707 LearningRate 0.0159 Epoch: 12 Global Step: 149410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:09,042-Speed 3030.23 samples/sec Loss 4.6063 LearningRate 0.0159 Epoch: 12 Global Step: 149420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:12,380-Speed 3068.94 samples/sec Loss 4.7001 LearningRate 0.0159 Epoch: 12 Global Step: 149430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:52:15,772-Speed 3019.36 samples/sec Loss 4.5437 LearningRate 0.0159 Epoch: 12 Global Step: 149440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:52:19,145-Speed 3036.35 samples/sec Loss 4.5551 LearningRate 0.0159 Epoch: 12 Global Step: 149450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:52:22,487-Speed 3065.15 samples/sec Loss 4.6097 LearningRate 0.0159 Epoch: 12 Global Step: 149460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:25,852-Speed 3044.09 samples/sec Loss 4.5867 LearningRate 0.0159 Epoch: 12 Global Step: 149470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:29,185-Speed 3072.93 
samples/sec Loss 4.4024 LearningRate 0.0159 Epoch: 12 Global Step: 149480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:32,581-Speed 3016.71 samples/sec Loss 4.6082 LearningRate 0.0159 Epoch: 12 Global Step: 149490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:35,946-Speed 3043.92 samples/sec Loss 4.4849 LearningRate 0.0159 Epoch: 12 Global Step: 149500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:39,382-Speed 2980.96 samples/sec Loss 4.5454 LearningRate 0.0159 Epoch: 12 Global Step: 149510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:42,765-Speed 3028.04 samples/sec Loss 4.5812 LearningRate 0.0158 Epoch: 12 Global Step: 149520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:46,157-Speed 3019.91 samples/sec Loss 4.5702 LearningRate 0.0158 Epoch: 12 Global Step: 149530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:49,577-Speed 2994.53 samples/sec Loss 4.6163 LearningRate 0.0158 Epoch: 12 Global Step: 149540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:52,944-Speed 3042.48 samples/sec Loss 4.6672 LearningRate 0.0158 Epoch: 12 Global Step: 149550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:52:56,367-Speed 2992.00 samples/sec Loss 4.6022 LearningRate 0.0158 Epoch: 12 Global Step: 149560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:52:59,771-Speed 3010.27 samples/sec Loss 4.6159 LearningRate 0.0158 Epoch: 12 Global Step: 149570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:53:03,126-Speed 3053.39 samples/sec Loss 4.6317 LearningRate 0.0158 Epoch: 12 Global Step: 149580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:53:06,490-Speed 3044.89 samples/sec Loss 4.4729 LearningRate 0.0158 Epoch: 12 Global Step: 149590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:53:09,988-Speed 2927.61 samples/sec Loss 4.6701 LearningRate 0.0158 
Epoch: 12 Global Step: 149600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:53:13,403-Speed 3000.03 samples/sec Loss 4.6367 LearningRate 0.0158 Epoch: 12 Global Step: 149610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:53:16,750-Speed 3060.13 samples/sec Loss 4.6993 LearningRate 0.0158 Epoch: 12 Global Step: 149620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:53:20,097-Speed 3060.65 samples/sec Loss 4.5614 LearningRate 0.0158 Epoch: 12 Global Step: 149630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:53:23,503-Speed 3006.67 samples/sec Loss 4.5773 LearningRate 0.0158 Epoch: 12 Global Step: 149640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:53:26,832-Speed 3076.83 samples/sec Loss 4.6714 LearningRate 0.0158 Epoch: 12 Global Step: 149650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:53:30,177-Speed 3062.38 samples/sec Loss 4.6142 LearningRate 0.0158 Epoch: 12 Global Step: 149660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:53:33,630-Speed 2966.91 samples/sec Loss 4.5361 LearningRate 0.0158 Epoch: 12 Global Step: 149670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:53:36,970-Speed 3066.77 samples/sec Loss 4.5845 LearningRate 0.0158 Epoch: 12 Global Step: 149680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:53:40,424-Speed 2965.16 samples/sec Loss 4.6559 LearningRate 0.0158 Epoch: 12 Global Step: 149690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:53:43,878-Speed 2965.81 samples/sec Loss 4.6239 LearningRate 0.0158 Epoch: 12 Global Step: 149700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:53:47,321-Speed 2974.42 samples/sec Loss 4.5937 LearningRate 0.0158 Epoch: 12 Global Step: 149710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:53:50,779-Speed 2961.99 samples/sec Loss 4.7000 LearningRate 0.0158 Epoch: 12 Global Step: 149720 Fp16 Grad 
Scale: 16384 Required: 9 hours Training: 2022-04-27 15:53:54,228-Speed 2969.59 samples/sec Loss 4.6464 LearningRate 0.0158 Epoch: 12 Global Step: 149730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:53:57,542-Speed 3090.87 samples/sec Loss 4.7655 LearningRate 0.0158 Epoch: 12 Global Step: 149740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:01,000-Speed 2962.59 samples/sec Loss 4.4654 LearningRate 0.0158 Epoch: 12 Global Step: 149750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:04,323-Speed 3081.80 samples/sec Loss 4.6290 LearningRate 0.0158 Epoch: 12 Global Step: 149760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:07,690-Speed 3043.35 samples/sec Loss 4.7803 LearningRate 0.0158 Epoch: 12 Global Step: 149770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:11,067-Speed 3032.47 samples/sec Loss 4.7963 LearningRate 0.0158 Epoch: 12 Global Step: 149780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:54:14,499-Speed 2985.63 samples/sec Loss 4.7309 LearningRate 0.0158 Epoch: 12 Global Step: 149790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:54:17,835-Speed 3070.37 samples/sec Loss 4.5757 LearningRate 0.0158 Epoch: 12 Global Step: 149800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:54:21,251-Speed 2998.94 samples/sec Loss 4.6835 LearningRate 0.0158 Epoch: 12 Global Step: 149810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:54:24,622-Speed 3038.00 samples/sec Loss 4.7095 LearningRate 0.0158 Epoch: 12 Global Step: 149820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:54:28,097-Speed 2947.68 samples/sec Loss 4.6472 LearningRate 0.0158 Epoch: 12 Global Step: 149830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:54:31,451-Speed 3054.14 samples/sec Loss 4.6187 LearningRate 0.0157 Epoch: 12 Global Step: 149840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 15:54:34,803-Speed 3055.20 samples/sec Loss 4.7087 LearningRate 0.0157 Epoch: 12 Global Step: 149850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:38,242-Speed 2978.94 samples/sec Loss 4.6528 LearningRate 0.0157 Epoch: 12 Global Step: 149860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:41,598-Speed 3052.08 samples/sec Loss 4.7455 LearningRate 0.0157 Epoch: 12 Global Step: 149870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:44,987-Speed 3022.33 samples/sec Loss 4.6431 LearningRate 0.0157 Epoch: 12 Global Step: 149880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:48,430-Speed 2975.05 samples/sec Loss 4.5934 LearningRate 0.0157 Epoch: 12 Global Step: 149890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:51,840-Speed 3003.18 samples/sec Loss 4.6976 LearningRate 0.0157 Epoch: 12 Global Step: 149900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:55,244-Speed 3009.41 samples/sec Loss 4.7669 LearningRate 0.0157 Epoch: 12 Global Step: 149910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:54:58,601-Speed 3051.73 samples/sec Loss 4.7286 LearningRate 0.0157 Epoch: 12 Global Step: 149920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:55:01,944-Speed 3063.64 samples/sec Loss 4.6321 LearningRate 0.0157 Epoch: 12 Global Step: 149930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:55:05,351-Speed 3006.46 samples/sec Loss 4.7727 LearningRate 0.0157 Epoch: 12 Global Step: 149940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:55:08,665-Speed 3091.42 samples/sec Loss 4.7538 LearningRate 0.0157 Epoch: 12 Global Step: 149950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:12,134-Speed 2952.07 samples/sec Loss 4.6780 LearningRate 0.0157 Epoch: 12 Global Step: 149960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:15,603-Speed 2953.68 
samples/sec Loss 4.8165 LearningRate 0.0157 Epoch: 12 Global Step: 149970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:18,994-Speed 3020.85 samples/sec Loss 4.7868 LearningRate 0.0157 Epoch: 12 Global Step: 149980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:22,376-Speed 3028.61 samples/sec Loss 4.7165 LearningRate 0.0157 Epoch: 12 Global Step: 149990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:25,867-Speed 2933.93 samples/sec Loss 4.6538 LearningRate 0.0157 Epoch: 12 Global Step: 150000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:29,252-Speed 3025.92 samples/sec Loss 4.6442 LearningRate 0.0157 Epoch: 12 Global Step: 150010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:32,656-Speed 3008.98 samples/sec Loss 4.8020 LearningRate 0.0157 Epoch: 12 Global Step: 150020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:36,091-Speed 2981.91 samples/sec Loss 4.7782 LearningRate 0.0157 Epoch: 12 Global Step: 150030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:39,498-Speed 3006.04 samples/sec Loss 4.7227 LearningRate 0.0157 Epoch: 12 Global Step: 150040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:42,863-Speed 3044.26 samples/sec Loss 4.7491 LearningRate 0.0157 Epoch: 12 Global Step: 150050 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:55:46,159-Speed 3107.25 samples/sec Loss 4.7149 LearningRate 0.0157 Epoch: 12 Global Step: 150060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:49,468-Speed 3096.05 samples/sec Loss 4.8018 LearningRate 0.0157 Epoch: 12 Global Step: 150070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:52,901-Speed 2983.73 samples/sec Loss 4.7722 LearningRate 0.0157 Epoch: 12 Global Step: 150080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:56,275-Speed 3035.73 samples/sec Loss 4.6844 LearningRate 0.0157 
Epoch: 12 Global Step: 150090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:55:59,703-Speed 2987.71 samples/sec Loss 4.6953 LearningRate 0.0157 Epoch: 12 Global Step: 150100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:03,079-Speed 3034.06 samples/sec Loss 4.7106 LearningRate 0.0157 Epoch: 12 Global Step: 150110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:06,423-Speed 3062.80 samples/sec Loss 4.7780 LearningRate 0.0157 Epoch: 12 Global Step: 150120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:09,797-Speed 3036.70 samples/sec Loss 4.7328 LearningRate 0.0157 Epoch: 12 Global Step: 150130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:13,167-Speed 3039.52 samples/sec Loss 4.7776 LearningRate 0.0157 Epoch: 12 Global Step: 150140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:16,536-Speed 3040.21 samples/sec Loss 4.7457 LearningRate 0.0156 Epoch: 12 Global Step: 150150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:19,930-Speed 3018.12 samples/sec Loss 4.7465 LearningRate 0.0156 Epoch: 12 Global Step: 150160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:56:23,268-Speed 3067.92 samples/sec Loss 4.7647 LearningRate 0.0156 Epoch: 12 Global Step: 150170 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:56:26,715-Speed 2971.77 samples/sec Loss 4.8039 LearningRate 0.0156 Epoch: 12 Global Step: 150180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:30,075-Speed 3048.89 samples/sec Loss 4.7088 LearningRate 0.0156 Epoch: 12 Global Step: 150190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:33,436-Speed 3047.81 samples/sec Loss 4.8652 LearningRate 0.0156 Epoch: 12 Global Step: 150200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:36,814-Speed 3031.81 samples/sec Loss 4.8823 LearningRate 0.0156 Epoch: 12 Global Step: 150210 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:40,150-Speed 3070.86 samples/sec Loss 4.7221 LearningRate 0.0156 Epoch: 12 Global Step: 150220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:43,481-Speed 3074.67 samples/sec Loss 4.7907 LearningRate 0.0156 Epoch: 12 Global Step: 150230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:46,851-Speed 3039.68 samples/sec Loss 4.7384 LearningRate 0.0156 Epoch: 12 Global Step: 150240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:50,169-Speed 3087.16 samples/sec Loss 4.7681 LearningRate 0.0156 Epoch: 12 Global Step: 150250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:56:53,522-Speed 3054.47 samples/sec Loss 4.7503 LearningRate 0.0156 Epoch: 12 Global Step: 150260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:56:56,963-Speed 2977.24 samples/sec Loss 4.8424 LearningRate 0.0156 Epoch: 12 Global Step: 150270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:00,319-Speed 3051.84 samples/sec Loss 4.7021 LearningRate 0.0156 Epoch: 12 Global Step: 150280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:03,691-Speed 3037.23 samples/sec Loss 4.7804 LearningRate 0.0156 Epoch: 12 Global Step: 150290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:07,121-Speed 2986.36 samples/sec Loss 4.8073 LearningRate 0.0156 Epoch: 12 Global Step: 150300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:10,458-Speed 3069.28 samples/sec Loss 4.7635 LearningRate 0.0156 Epoch: 12 Global Step: 150310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:13,788-Speed 3076.17 samples/sec Loss 4.8124 LearningRate 0.0156 Epoch: 12 Global Step: 150320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:17,221-Speed 2983.93 samples/sec Loss 4.8581 LearningRate 0.0156 Epoch: 12 Global Step: 150330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-04-27 15:57:20,635-Speed 3000.08 samples/sec Loss 4.8280 LearningRate 0.0156 Epoch: 12 Global Step: 150340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:24,063-Speed 2988.38 samples/sec Loss 4.7557 LearningRate 0.0156 Epoch: 12 Global Step: 150350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:27,400-Speed 3068.92 samples/sec Loss 4.8795 LearningRate 0.0156 Epoch: 12 Global Step: 150360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:57:30,720-Speed 3085.71 samples/sec Loss 4.6944 LearningRate 0.0156 Epoch: 12 Global Step: 150370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:57:34,111-Speed 3019.92 samples/sec Loss 4.8473 LearningRate 0.0156 Epoch: 12 Global Step: 150380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:57:37,528-Speed 2998.31 samples/sec Loss 4.8769 LearningRate 0.0156 Epoch: 12 Global Step: 150390 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:40,871-Speed 3063.81 samples/sec Loss 4.7886 LearningRate 0.0156 Epoch: 12 Global Step: 150400 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:44,269-Speed 3015.31 samples/sec Loss 4.7652 LearningRate 0.0156 Epoch: 12 Global Step: 150410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:47,695-Speed 2989.35 samples/sec Loss 4.7927 LearningRate 0.0156 Epoch: 12 Global Step: 150420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:51,053-Speed 3050.75 samples/sec Loss 4.7463 LearningRate 0.0156 Epoch: 12 Global Step: 150430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:54,448-Speed 3016.80 samples/sec Loss 4.8488 LearningRate 0.0156 Epoch: 12 Global Step: 150440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:57:57,799-Speed 3056.70 samples/sec Loss 4.8110 LearningRate 0.0156 Epoch: 12 Global Step: 150450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:58:01,272-Speed 2949.31 
samples/sec Loss 4.8572 LearningRate 0.0155 Epoch: 12 Global Step: 150460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:58:04,595-Speed 3082.54 samples/sec Loss 4.8963 LearningRate 0.0155 Epoch: 12 Global Step: 150470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:58:08,026-Speed 2985.59 samples/sec Loss 4.8960 LearningRate 0.0155 Epoch: 12 Global Step: 150480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 15:58:11,420-Speed 3017.42 samples/sec Loss 4.9173 LearningRate 0.0155 Epoch: 12 Global Step: 150490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:14,766-Speed 3061.72 samples/sec Loss 4.7969 LearningRate 0.0155 Epoch: 12 Global Step: 150500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:18,177-Speed 3002.67 samples/sec Loss 4.8782 LearningRate 0.0155 Epoch: 12 Global Step: 150510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:21,615-Speed 2979.92 samples/sec Loss 4.8465 LearningRate 0.0155 Epoch: 12 Global Step: 150520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:25,068-Speed 2965.96 samples/sec Loss 4.8579 LearningRate 0.0155 Epoch: 12 Global Step: 150530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:28,562-Speed 2932.23 samples/sec Loss 4.7910 LearningRate 0.0155 Epoch: 12 Global Step: 150540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:31,938-Speed 3033.85 samples/sec Loss 4.8378 LearningRate 0.0155 Epoch: 12 Global Step: 150550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:35,268-Speed 3075.24 samples/sec Loss 4.9767 LearningRate 0.0155 Epoch: 12 Global Step: 150560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:38,720-Speed 2967.77 samples/sec Loss 4.9708 LearningRate 0.0155 Epoch: 12 Global Step: 150570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:42,127-Speed 3006.35 samples/sec Loss 4.9609 LearningRate 0.0155 
Epoch: 12 Global Step: 150580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:45,536-Speed 3004.14 samples/sec Loss 4.9618 LearningRate 0.0155 Epoch: 12 Global Step: 150590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:58:48,907-Speed 3039.24 samples/sec Loss 4.8980 LearningRate 0.0155 Epoch: 12 Global Step: 150600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:58:52,270-Speed 3045.76 samples/sec Loss 4.8709 LearningRate 0.0155 Epoch: 12 Global Step: 150610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:55,708-Speed 2979.05 samples/sec Loss 4.8242 LearningRate 0.0155 Epoch: 12 Global Step: 150620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:58:59,057-Speed 3058.85 samples/sec Loss 4.8885 LearningRate 0.0155 Epoch: 12 Global Step: 150630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:02,389-Speed 3073.35 samples/sec Loss 4.9063 LearningRate 0.0155 Epoch: 12 Global Step: 150640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:05,713-Speed 3081.81 samples/sec Loss 4.9184 LearningRate 0.0155 Epoch: 12 Global Step: 150650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:09,140-Speed 2988.58 samples/sec Loss 4.7782 LearningRate 0.0155 Epoch: 12 Global Step: 150660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:12,502-Speed 3046.79 samples/sec Loss 4.9602 LearningRate 0.0155 Epoch: 12 Global Step: 150670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:15,871-Speed 3040.44 samples/sec Loss 4.9071 LearningRate 0.0155 Epoch: 12 Global Step: 150680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:19,224-Speed 3055.03 samples/sec Loss 4.8558 LearningRate 0.0155 Epoch: 12 Global Step: 150690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:22,556-Speed 3073.97 samples/sec Loss 5.0201 LearningRate 0.0155 Epoch: 12 Global Step: 150700 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:25,869-Speed 3091.51 samples/sec Loss 4.8942 LearningRate 0.0155 Epoch: 12 Global Step: 150710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:59:29,277-Speed 3006.20 samples/sec Loss 4.9205 LearningRate 0.0155 Epoch: 12 Global Step: 150720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:59:32,652-Speed 3035.00 samples/sec Loss 4.9435 LearningRate 0.0155 Epoch: 12 Global Step: 150730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:59:35,995-Speed 3063.29 samples/sec Loss 4.9426 LearningRate 0.0155 Epoch: 12 Global Step: 150740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:59:39,379-Speed 3027.07 samples/sec Loss 4.8783 LearningRate 0.0155 Epoch: 12 Global Step: 150750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 15:59:42,706-Speed 3079.37 samples/sec Loss 4.8054 LearningRate 0.0155 Epoch: 12 Global Step: 150760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:46,045-Speed 3067.02 samples/sec Loss 4.9613 LearningRate 0.0155 Epoch: 12 Global Step: 150770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:49,447-Speed 3010.31 samples/sec Loss 4.9745 LearningRate 0.0154 Epoch: 12 Global Step: 150780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:52,828-Speed 3029.70 samples/sec Loss 4.9459 LearningRate 0.0154 Epoch: 12 Global Step: 150790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:56,161-Speed 3074.03 samples/sec Loss 5.0065 LearningRate 0.0154 Epoch: 12 Global Step: 150800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 15:59:59,511-Speed 3057.15 samples/sec Loss 4.9027 LearningRate 0.0154 Epoch: 12 Global Step: 150810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:02,883-Speed 3037.63 samples/sec Loss 5.0000 LearningRate 0.0154 Epoch: 12 Global Step: 150820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:00:06,267-Speed 3026.71 samples/sec Loss 4.8740 LearningRate 0.0154 Epoch: 12 Global Step: 150830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:09,624-Speed 3051.96 samples/sec Loss 4.9721 LearningRate 0.0154 Epoch: 12 Global Step: 150840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:13,054-Speed 2986.15 samples/sec Loss 4.9701 LearningRate 0.0154 Epoch: 12 Global Step: 150850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:16,422-Speed 3040.59 samples/sec Loss 4.8777 LearningRate 0.0154 Epoch: 12 Global Step: 150860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:00:19,777-Speed 3054.58 samples/sec Loss 4.9775 LearningRate 0.0154 Epoch: 12 Global Step: 150870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:23,100-Speed 3082.53 samples/sec Loss 4.8554 LearningRate 0.0154 Epoch: 12 Global Step: 150880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:26,414-Speed 3090.39 samples/sec Loss 4.9251 LearningRate 0.0154 Epoch: 12 Global Step: 150890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:29,847-Speed 2983.43 samples/sec Loss 4.9091 LearningRate 0.0154 Epoch: 12 Global Step: 150900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:33,191-Speed 3063.26 samples/sec Loss 5.0354 LearningRate 0.0154 Epoch: 12 Global Step: 150910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:36,566-Speed 3034.74 samples/sec Loss 5.0130 LearningRate 0.0154 Epoch: 12 Global Step: 150920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:00:39,991-Speed 2991.09 samples/sec Loss 5.0224 LearningRate 0.0154 Epoch: 12 Global Step: 150930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:00:43,374-Speed 3027.55 samples/sec Loss 4.8984 LearningRate 0.0154 Epoch: 12 Global Step: 150940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:00:46,740-Speed 3043.07 
samples/sec Loss 4.9431 LearningRate 0.0154 Epoch: 12 Global Step: 150950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:00:50,131-Speed 3020.89 samples/sec Loss 4.8955 LearningRate 0.0154 Epoch: 12 Global Step: 150960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:00:53,477-Speed 3060.51 samples/sec Loss 4.9414 LearningRate 0.0154 Epoch: 12 Global Step: 150970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:00:56,822-Speed 3061.91 samples/sec Loss 4.9243 LearningRate 0.0154 Epoch: 12 Global Step: 150980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:01:00,262-Speed 2978.10 samples/sec Loss 4.9233 LearningRate 0.0154 Epoch: 12 Global Step: 150990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:01:03,666-Speed 3008.61 samples/sec Loss 4.8830 LearningRate 0.0154 Epoch: 12 Global Step: 151000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:01:07,204-Speed 2895.26 samples/sec Loss 4.9302 LearningRate 0.0154 Epoch: 12 Global Step: 151010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:01:10,642-Speed 2979.53 samples/sec Loss 5.0004 LearningRate 0.0154 Epoch: 12 Global Step: 151020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:01:14,054-Speed 3001.73 samples/sec Loss 4.9736 LearningRate 0.0154 Epoch: 12 Global Step: 151030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:17,405-Speed 3056.97 samples/sec Loss 4.8737 LearningRate 0.0154 Epoch: 12 Global Step: 151040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:20,843-Speed 2979.64 samples/sec Loss 4.9121 LearningRate 0.0154 Epoch: 12 Global Step: 151050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:24,182-Speed 3067.04 samples/sec Loss 4.9650 LearningRate 0.0154 Epoch: 12 Global Step: 151060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:27,548-Speed 3043.41 samples/sec Loss 4.9875 LearningRate 0.0154 
Epoch: 12 Global Step: 151070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:30,950-Speed 3010.75 samples/sec Loss 5.0377 LearningRate 0.0154 Epoch: 12 Global Step: 151080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:34,357-Speed 3006.51 samples/sec Loss 4.9522 LearningRate 0.0154 Epoch: 12 Global Step: 151090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:37,735-Speed 3032.03 samples/sec Loss 5.0311 LearningRate 0.0153 Epoch: 12 Global Step: 151100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:41,097-Speed 3047.41 samples/sec Loss 4.9596 LearningRate 0.0153 Epoch: 12 Global Step: 151110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:44,486-Speed 3022.37 samples/sec Loss 5.0041 LearningRate 0.0153 Epoch: 12 Global Step: 151120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:47,879-Speed 3019.09 samples/sec Loss 5.0109 LearningRate 0.0153 Epoch: 12 Global Step: 151130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:01:51,227-Speed 3059.30 samples/sec Loss 4.9691 LearningRate 0.0153 Epoch: 12 Global Step: 151140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:01:54,623-Speed 3015.71 samples/sec Loss 5.0395 LearningRate 0.0153 Epoch: 12 Global Step: 151150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:01:58,012-Speed 3022.91 samples/sec Loss 5.0352 LearningRate 0.0153 Epoch: 12 Global Step: 151160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:01,351-Speed 3067.01 samples/sec Loss 5.0111 LearningRate 0.0153 Epoch: 12 Global Step: 151170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:04,729-Speed 3032.62 samples/sec Loss 4.8391 LearningRate 0.0153 Epoch: 12 Global Step: 151180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:08,104-Speed 3034.94 samples/sec Loss 4.9195 LearningRate 0.0153 Epoch: 12 Global Step: 151190 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:11,503-Speed 3013.34 samples/sec Loss 5.0974 LearningRate 0.0153 Epoch: 12 Global Step: 151200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:14,932-Speed 2987.60 samples/sec Loss 5.0485 LearningRate 0.0153 Epoch: 12 Global Step: 151210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:18,308-Speed 3034.38 samples/sec Loss 4.9571 LearningRate 0.0153 Epoch: 12 Global Step: 151220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:21,711-Speed 3009.71 samples/sec Loss 5.0171 LearningRate 0.0153 Epoch: 12 Global Step: 151230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:25,117-Speed 3007.10 samples/sec Loss 4.9251 LearningRate 0.0153 Epoch: 12 Global Step: 151240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:28,543-Speed 2989.59 samples/sec Loss 5.0288 LearningRate 0.0153 Epoch: 12 Global Step: 151250 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:02:31,966-Speed 2992.44 samples/sec Loss 5.0384 LearningRate 0.0153 Epoch: 12 Global Step: 151260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:02:35,338-Speed 3038.18 samples/sec Loss 5.0391 LearningRate 0.0153 Epoch: 12 Global Step: 151270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:02:38,717-Speed 3031.30 samples/sec Loss 4.9884 LearningRate 0.0153 Epoch: 12 Global Step: 151280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:02:42,049-Speed 3074.30 samples/sec Loss 5.1616 LearningRate 0.0153 Epoch: 12 Global Step: 151290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:45,422-Speed 3036.81 samples/sec Loss 4.9656 LearningRate 0.0153 Epoch: 12 Global Step: 151300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:48,745-Speed 3082.82 samples/sec Loss 4.9075 LearningRate 0.0153 Epoch: 12 Global Step: 151310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:02:52,120-Speed 3034.59 samples/sec Loss 5.0633 LearningRate 0.0153 Epoch: 12 Global Step: 151320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:55,549-Speed 2987.01 samples/sec Loss 5.0096 LearningRate 0.0153 Epoch: 12 Global Step: 151330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:02:58,905-Speed 3051.75 samples/sec Loss 5.0181 LearningRate 0.0153 Epoch: 12 Global Step: 151340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:02,250-Speed 3062.59 samples/sec Loss 4.8715 LearningRate 0.0153 Epoch: 12 Global Step: 151350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:05,626-Speed 3033.57 samples/sec Loss 5.0671 LearningRate 0.0153 Epoch: 12 Global Step: 151360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:09,083-Speed 2963.75 samples/sec Loss 4.9997 LearningRate 0.0153 Epoch: 12 Global Step: 151370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:12,532-Speed 2969.21 samples/sec Loss 4.9874 LearningRate 0.0153 Epoch: 12 Global Step: 151380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:15,935-Speed 3009.62 samples/sec Loss 5.0667 LearningRate 0.0153 Epoch: 12 Global Step: 151390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:03:19,326-Speed 3021.06 samples/sec Loss 5.0502 LearningRate 0.0153 Epoch: 12 Global Step: 151400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:03:22,669-Speed 3064.36 samples/sec Loss 5.0415 LearningRate 0.0152 Epoch: 12 Global Step: 151410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:03:26,070-Speed 3012.23 samples/sec Loss 4.9245 LearningRate 0.0152 Epoch: 12 Global Step: 151420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:03:29,440-Speed 3038.80 samples/sec Loss 5.0822 LearningRate 0.0152 Epoch: 12 Global Step: 151430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:03:32,806-Speed 3042.94 
samples/sec Loss 5.1016 LearningRate 0.0152 Epoch: 12 Global Step: 151440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:36,199-Speed 3018.55 samples/sec Loss 5.1217 LearningRate 0.0152 Epoch: 12 Global Step: 151450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:39,629-Speed 2986.63 samples/sec Loss 4.9793 LearningRate 0.0152 Epoch: 12 Global Step: 151460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:42,980-Speed 3057.32 samples/sec Loss 4.9662 LearningRate 0.0152 Epoch: 12 Global Step: 151470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:46,317-Speed 3069.60 samples/sec Loss 5.0255 LearningRate 0.0152 Epoch: 12 Global Step: 151480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:49,767-Speed 2968.51 samples/sec Loss 5.1253 LearningRate 0.0152 Epoch: 12 Global Step: 151490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:53,167-Speed 3013.18 samples/sec Loss 5.0001 LearningRate 0.0152 Epoch: 12 Global Step: 151500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:56,599-Speed 2984.77 samples/sec Loss 5.0568 LearningRate 0.0152 Epoch: 12 Global Step: 151510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:03:59,970-Speed 3037.84 samples/sec Loss 4.9623 LearningRate 0.0152 Epoch: 12 Global Step: 151520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:03,365-Speed 3017.75 samples/sec Loss 5.0699 LearningRate 0.0152 Epoch: 12 Global Step: 151530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:06,726-Speed 3047.21 samples/sec Loss 5.1189 LearningRate 0.0152 Epoch: 12 Global Step: 151540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:04:10,086-Speed 3048.09 samples/sec Loss 5.0388 LearningRate 0.0152 Epoch: 12 Global Step: 151550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:04:13,451-Speed 3044.49 samples/sec Loss 5.0754 LearningRate 0.0152 
Epoch: 12 Global Step: 151560 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:04:16,798-Speed 3060.27 samples/sec Loss 5.1318 LearningRate 0.0152 Epoch: 12 Global Step: 151570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:04:20,132-Speed 3072.45 samples/sec Loss 5.0405 LearningRate 0.0152 Epoch: 12 Global Step: 151580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:23,533-Speed 3011.51 samples/sec Loss 5.0650 LearningRate 0.0152 Epoch: 12 Global Step: 151590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:26,901-Speed 3041.38 samples/sec Loss 5.1415 LearningRate 0.0152 Epoch: 12 Global Step: 151600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:30,337-Speed 2980.81 samples/sec Loss 5.0298 LearningRate 0.0152 Epoch: 12 Global Step: 151610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:33,725-Speed 3023.29 samples/sec Loss 5.0683 LearningRate 0.0152 Epoch: 12 Global Step: 151620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:37,104-Speed 3031.22 samples/sec Loss 4.9996 LearningRate 0.0152 Epoch: 12 Global Step: 151630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:40,519-Speed 3000.21 samples/sec Loss 5.1674 LearningRate 0.0152 Epoch: 12 Global Step: 151640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:43,893-Speed 3035.36 samples/sec Loss 5.1425 LearningRate 0.0152 Epoch: 12 Global Step: 151650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:47,333-Speed 2977.77 samples/sec Loss 5.0090 LearningRate 0.0152 Epoch: 12 Global Step: 151660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:50,663-Speed 3075.79 samples/sec Loss 5.1261 LearningRate 0.0152 Epoch: 12 Global Step: 151670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:04:53,981-Speed 3087.05 samples/sec Loss 5.1155 LearningRate 0.0152 Epoch: 12 Global Step: 151680 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 16:04:57,316-Speed 3071.44 samples/sec Loss 5.0451 LearningRate 0.0152 Epoch: 12 Global Step: 151690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:00,731-Speed 2999.48 samples/sec Loss 5.0666 LearningRate 0.0152 Epoch: 12 Global Step: 151700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:04,151-Speed 2994.87 samples/sec Loss 5.0735 LearningRate 0.0152 Epoch: 12 Global Step: 151710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:07,549-Speed 3014.14 samples/sec Loss 5.0345 LearningRate 0.0152 Epoch: 12 Global Step: 151720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:10,994-Speed 2973.84 samples/sec Loss 5.1010 LearningRate 0.0151 Epoch: 12 Global Step: 151730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:14,440-Speed 2972.14 samples/sec Loss 5.0865 LearningRate 0.0151 Epoch: 12 Global Step: 151740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:17,769-Speed 3077.04 samples/sec Loss 5.0921 LearningRate 0.0151 Epoch: 12 Global Step: 151750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:21,096-Speed 3078.46 samples/sec Loss 5.0002 LearningRate 0.0151 Epoch: 12 Global Step: 151760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:24,406-Speed 3094.62 samples/sec Loss 5.0672 LearningRate 0.0151 Epoch: 12 Global Step: 151770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:27,824-Speed 2997.34 samples/sec Loss 5.0684 LearningRate 0.0151 Epoch: 12 Global Step: 151780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:31,178-Speed 3054.10 samples/sec Loss 5.0434 LearningRate 0.0151 Epoch: 12 Global Step: 151790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:34,573-Speed 3016.67 samples/sec Loss 5.1084 LearningRate 0.0151 Epoch: 12 Global Step: 151800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-27 16:05:38,038-Speed 2956.65 samples/sec Loss 5.1459 LearningRate 0.0151 Epoch: 12 Global Step: 151810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:41,404-Speed 3043.69 samples/sec Loss 5.1703 LearningRate 0.0151 Epoch: 12 Global Step: 151820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:44,925-Speed 2908.99 samples/sec Loss 5.0737 LearningRate 0.0151 Epoch: 12 Global Step: 151830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:05:48,284-Speed 3049.79 samples/sec Loss 5.1221 LearningRate 0.0151 Epoch: 12 Global Step: 151840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:05:51,638-Speed 3054.05 samples/sec Loss 5.1147 LearningRate 0.0151 Epoch: 12 Global Step: 151850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:05:55,068-Speed 2985.86 samples/sec Loss 4.9785 LearningRate 0.0151 Epoch: 12 Global Step: 151860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:05:58,503-Speed 2982.18 samples/sec Loss 5.0831 LearningRate 0.0151 Epoch: 12 Global Step: 151870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:01,933-Speed 2986.76 samples/sec Loss 5.1992 LearningRate 0.0151 Epoch: 12 Global Step: 151880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:05,335-Speed 3010.02 samples/sec Loss 5.1849 LearningRate 0.0151 Epoch: 12 Global Step: 151890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:08,726-Speed 3021.16 samples/sec Loss 5.1178 LearningRate 0.0151 Epoch: 12 Global Step: 151900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:12,092-Speed 3042.92 samples/sec Loss 5.0604 LearningRate 0.0151 Epoch: 12 Global Step: 151910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:15,449-Speed 3051.87 samples/sec Loss 5.1513 LearningRate 0.0151 Epoch: 12 Global Step: 151920 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:18,835-Speed 3025.19 
samples/sec Loss 5.1971 LearningRate 0.0151 Epoch: 12 Global Step: 151930 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:22,166-Speed 3074.62 samples/sec Loss 5.1637 LearningRate 0.0151 Epoch: 12 Global Step: 151940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:25,535-Speed 3040.16 samples/sec Loss 5.1030 LearningRate 0.0151 Epoch: 12 Global Step: 151950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:28,903-Speed 3041.92 samples/sec Loss 5.1954 LearningRate 0.0151 Epoch: 12 Global Step: 151960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:32,310-Speed 3006.09 samples/sec Loss 5.1528 LearningRate 0.0151 Epoch: 12 Global Step: 151970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:35,663-Speed 3054.65 samples/sec Loss 5.1156 LearningRate 0.0151 Epoch: 12 Global Step: 151980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:39,120-Speed 2962.81 samples/sec Loss 5.0520 LearningRate 0.0151 Epoch: 12 Global Step: 151990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:42,431-Speed 3094.11 samples/sec Loss 5.1104 LearningRate 0.0151 Epoch: 12 Global Step: 152000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:45,789-Speed 3049.78 samples/sec Loss 5.1182 LearningRate 0.0151 Epoch: 12 Global Step: 152010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:06:49,152-Speed 3046.21 samples/sec Loss 5.0768 LearningRate 0.0151 Epoch: 12 Global Step: 152020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:52,569-Speed 2997.45 samples/sec Loss 5.1648 LearningRate 0.0151 Epoch: 12 Global Step: 152030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:55,984-Speed 2999.19 samples/sec Loss 5.0954 LearningRate 0.0151 Epoch: 12 Global Step: 152040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:06:59,370-Speed 3024.89 samples/sec Loss 5.1880 LearningRate 0.0150 
Epoch: 12 Global Step: 152050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:02,714-Speed 3062.88 samples/sec Loss 5.0559 LearningRate 0.0150 Epoch: 12 Global Step: 152060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:06,019-Speed 3099.44 samples/sec Loss 5.1366 LearningRate 0.0150 Epoch: 12 Global Step: 152070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:09,355-Speed 3070.24 samples/sec Loss 5.1298 LearningRate 0.0150 Epoch: 12 Global Step: 152080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:12,752-Speed 3015.47 samples/sec Loss 5.1377 LearningRate 0.0150 Epoch: 12 Global Step: 152090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:16,069-Speed 3087.59 samples/sec Loss 5.1694 LearningRate 0.0150 Epoch: 12 Global Step: 152100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:19,468-Speed 3013.73 samples/sec Loss 5.2600 LearningRate 0.0150 Epoch: 12 Global Step: 152110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:22,811-Speed 3064.39 samples/sec Loss 5.1525 LearningRate 0.0150 Epoch: 12 Global Step: 152120 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:26,134-Speed 3082.93 samples/sec Loss 5.1059 LearningRate 0.0150 Epoch: 12 Global Step: 152130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:29,475-Speed 3066.15 samples/sec Loss 5.1483 LearningRate 0.0150 Epoch: 12 Global Step: 152140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:32,799-Speed 3081.24 samples/sec Loss 5.1613 LearningRate 0.0150 Epoch: 12 Global Step: 152150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:36,172-Speed 3036.16 samples/sec Loss 5.1675 LearningRate 0.0150 Epoch: 12 Global Step: 152160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:39,507-Speed 3071.73 samples/sec Loss 5.1500 LearningRate 0.0150 Epoch: 12 Global Step: 152170 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:42,864-Speed 3051.58 samples/sec Loss 5.0720 LearningRate 0.0150 Epoch: 12 Global Step: 152180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:46,208-Speed 3062.65 samples/sec Loss 5.1058 LearningRate 0.0150 Epoch: 12 Global Step: 152190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:49,527-Speed 3086.61 samples/sec Loss 5.1202 LearningRate 0.0150 Epoch: 12 Global Step: 152200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:07:52,915-Speed 3023.80 samples/sec Loss 5.1170 LearningRate 0.0150 Epoch: 12 Global Step: 152210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:56,267-Speed 3055.76 samples/sec Loss 5.1478 LearningRate 0.0150 Epoch: 12 Global Step: 152220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:07:59,656-Speed 3022.20 samples/sec Loss 5.2333 LearningRate 0.0150 Epoch: 12 Global Step: 152230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:03,083-Speed 2989.15 samples/sec Loss 5.1223 LearningRate 0.0150 Epoch: 12 Global Step: 152240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:06,431-Speed 3059.43 samples/sec Loss 5.2112 LearningRate 0.0150 Epoch: 12 Global Step: 152250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:09,877-Speed 2973.03 samples/sec Loss 5.1919 LearningRate 0.0150 Epoch: 12 Global Step: 152260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:13,273-Speed 3016.29 samples/sec Loss 5.0476 LearningRate 0.0150 Epoch: 12 Global Step: 152270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:16,667-Speed 3017.88 samples/sec Loss 5.1850 LearningRate 0.0150 Epoch: 12 Global Step: 152280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:20,048-Speed 3029.34 samples/sec Loss 5.2112 LearningRate 0.0150 Epoch: 12 Global Step: 152290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-04-27 16:08:23,443-Speed 3016.92 samples/sec Loss 5.1935 LearningRate 0.0150 Epoch: 12 Global Step: 152300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:26,780-Speed 3069.56 samples/sec Loss 5.0629 LearningRate 0.0150 Epoch: 12 Global Step: 152310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:30,132-Speed 3056.20 samples/sec Loss 5.1560 LearningRate 0.0150 Epoch: 12 Global Step: 152320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:33,547-Speed 2999.32 samples/sec Loss 5.1293 LearningRate 0.0150 Epoch: 12 Global Step: 152330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:36,918-Speed 3038.03 samples/sec Loss 5.1900 LearningRate 0.0150 Epoch: 12 Global Step: 152340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:40,296-Speed 3032.51 samples/sec Loss 5.2122 LearningRate 0.0150 Epoch: 12 Global Step: 152350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:43,614-Speed 3086.81 samples/sec Loss 5.2495 LearningRate 0.0150 Epoch: 12 Global Step: 152360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:47,031-Speed 2997.45 samples/sec Loss 5.1843 LearningRate 0.0149 Epoch: 12 Global Step: 152370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:50,438-Speed 3006.32 samples/sec Loss 5.1936 LearningRate 0.0149 Epoch: 12 Global Step: 152380 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:08:53,793-Speed 3053.24 samples/sec Loss 5.2251 LearningRate 0.0149 Epoch: 12 Global Step: 152390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:08:57,284-Speed 2933.66 samples/sec Loss 5.2227 LearningRate 0.0149 Epoch: 12 Global Step: 152400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:00,699-Speed 3000.51 samples/sec Loss 5.2477 LearningRate 0.0149 Epoch: 12 Global Step: 152410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:04,084-Speed 3025.07 
samples/sec Loss 5.1361 LearningRate 0.0149 Epoch: 12 Global Step: 152420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:07,491-Speed 3006.75 samples/sec Loss 5.1330 LearningRate 0.0149 Epoch: 12 Global Step: 152430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:10,967-Speed 2946.68 samples/sec Loss 5.0906 LearningRate 0.0149 Epoch: 12 Global Step: 152440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:14,334-Speed 3041.97 samples/sec Loss 5.1225 LearningRate 0.0149 Epoch: 12 Global Step: 152450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:17,699-Speed 3043.83 samples/sec Loss 5.1148 LearningRate 0.0149 Epoch: 12 Global Step: 152460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:21,068-Speed 3041.02 samples/sec Loss 5.1124 LearningRate 0.0149 Epoch: 12 Global Step: 152470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:24,441-Speed 3036.65 samples/sec Loss 5.2021 LearningRate 0.0149 Epoch: 12 Global Step: 152480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:27,835-Speed 3018.20 samples/sec Loss 5.2318 LearningRate 0.0149 Epoch: 12 Global Step: 152490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:31,177-Speed 3065.04 samples/sec Loss 5.2234 LearningRate 0.0149 Epoch: 12 Global Step: 152500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:34,524-Speed 3060.40 samples/sec Loss 5.2133 LearningRate 0.0149 Epoch: 12 Global Step: 152510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:38,001-Speed 2945.14 samples/sec Loss 5.1605 LearningRate 0.0149 Epoch: 12 Global Step: 152520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:41,353-Speed 3056.14 samples/sec Loss 5.1429 LearningRate 0.0149 Epoch: 12 Global Step: 152530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:44,822-Speed 2952.45 samples/sec Loss 5.2875 LearningRate 0.0149 
Epoch: 12 Global Step: 152540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:48,268-Speed 2972.46 samples/sec Loss 5.2317 LearningRate 0.0149 Epoch: 12 Global Step: 152550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:51,663-Speed 3017.35 samples/sec Loss 5.2172 LearningRate 0.0149 Epoch: 12 Global Step: 152560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:55,081-Speed 2996.45 samples/sec Loss 5.2819 LearningRate 0.0149 Epoch: 12 Global Step: 152570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:09:58,508-Speed 2988.84 samples/sec Loss 5.2377 LearningRate 0.0149 Epoch: 12 Global Step: 152580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:01,901-Speed 3019.11 samples/sec Loss 5.1127 LearningRate 0.0149 Epoch: 12 Global Step: 152590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:10:05,276-Speed 3035.00 samples/sec Loss 5.0997 LearningRate 0.0149 Epoch: 12 Global Step: 152600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:10:08,580-Speed 3100.29 samples/sec Loss 5.2472 LearningRate 0.0149 Epoch: 12 Global Step: 152610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:11,940-Speed 3048.56 samples/sec Loss 5.2688 LearningRate 0.0149 Epoch: 12 Global Step: 152620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:15,305-Speed 3044.35 samples/sec Loss 5.1226 LearningRate 0.0149 Epoch: 12 Global Step: 152630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:18,695-Speed 3021.58 samples/sec Loss 5.2250 LearningRate 0.0149 Epoch: 12 Global Step: 152640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:22,139-Speed 2974.30 samples/sec Loss 5.1946 LearningRate 0.0149 Epoch: 12 Global Step: 152650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:25,598-Speed 2961.64 samples/sec Loss 5.2274 LearningRate 0.0149 Epoch: 12 Global Step: 152660 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:29,025-Speed 2989.15 samples/sec Loss 5.1132 LearningRate 0.0149 Epoch: 12 Global Step: 152670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:32,463-Speed 2979.08 samples/sec Loss 5.1450 LearningRate 0.0149 Epoch: 12 Global Step: 152680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:35,779-Speed 3088.61 samples/sec Loss 5.1641 LearningRate 0.0148 Epoch: 12 Global Step: 152690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:10:39,183-Speed 3009.24 samples/sec Loss 5.2693 LearningRate 0.0148 Epoch: 12 Global Step: 152700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:10:42,582-Speed 3013.60 samples/sec Loss 5.1750 LearningRate 0.0148 Epoch: 12 Global Step: 152710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:10:45,997-Speed 2998.90 samples/sec Loss 5.1957 LearningRate 0.0148 Epoch: 12 Global Step: 152720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:10:49,407-Speed 3004.46 samples/sec Loss 5.2999 LearningRate 0.0148 Epoch: 12 Global Step: 152730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:10:52,843-Speed 2981.42 samples/sec Loss 5.1834 LearningRate 0.0148 Epoch: 12 Global Step: 152740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:10:56,303-Speed 2960.23 samples/sec Loss 5.1568 LearningRate 0.0148 Epoch: 12 Global Step: 152750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:10:59,783-Speed 2943.65 samples/sec Loss 5.2790 LearningRate 0.0148 Epoch: 12 Global Step: 152760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:11:03,185-Speed 3010.74 samples/sec Loss 5.2034 LearningRate 0.0148 Epoch: 12 Global Step: 152770 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:11:06,572-Speed 3024.27 samples/sec Loss 5.1526 LearningRate 0.0148 Epoch: 12 Global Step: 152780 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-04-27 16:11:09,953-Speed 3029.63 samples/sec Loss 5.2297 LearningRate 0.0148 Epoch: 12 Global Step: 152790 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:11:13,418-Speed 2955.79 samples/sec Loss 5.2612 LearningRate 0.0148 Epoch: 12 Global Step: 152800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:16,853-Speed 2982.10 samples/sec Loss 5.2141 LearningRate 0.0148 Epoch: 12 Global Step: 152810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:20,187-Speed 3072.51 samples/sec Loss 5.1670 LearningRate 0.0148 Epoch: 12 Global Step: 152820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:23,557-Speed 3038.79 samples/sec Loss 5.1345 LearningRate 0.0148 Epoch: 12 Global Step: 152830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:26,975-Speed 2997.05 samples/sec Loss 5.2195 LearningRate 0.0148 Epoch: 12 Global Step: 152840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:30,354-Speed 3030.82 samples/sec Loss 5.1443 LearningRate 0.0148 Epoch: 12 Global Step: 152850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:33,780-Speed 2990.06 samples/sec Loss 5.2217 LearningRate 0.0148 Epoch: 12 Global Step: 152860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:37,239-Speed 2961.26 samples/sec Loss 5.2506 LearningRate 0.0148 Epoch: 12 Global Step: 152870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:40,697-Speed 2962.22 samples/sec Loss 5.2700 LearningRate 0.0148 Epoch: 12 Global Step: 152880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:44,121-Speed 2991.42 samples/sec Loss 5.3004 LearningRate 0.0148 Epoch: 12 Global Step: 152890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:47,557-Speed 2981.33 samples/sec Loss 5.2459 LearningRate 0.0148 Epoch: 12 Global Step: 152900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:11:50,947-Speed 3021.15 
samples/sec Loss 5.2789 LearningRate 0.0148 Epoch: 12 Global Step: 152910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:54,258-Speed 3093.99 samples/sec Loss 5.1893 LearningRate 0.0148 Epoch: 12 Global Step: 152920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:11:57,637-Speed 3030.71 samples/sec Loss 5.2101 LearningRate 0.0148 Epoch: 12 Global Step: 152930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:01,058-Speed 2994.42 samples/sec Loss 5.2643 LearningRate 0.0148 Epoch: 12 Global Step: 152940 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:04,448-Speed 3021.64 samples/sec Loss 5.2895 LearningRate 0.0148 Epoch: 12 Global Step: 152950 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:07,806-Speed 3050.32 samples/sec Loss 5.1651 LearningRate 0.0148 Epoch: 12 Global Step: 152960 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:11,142-Speed 3070.56 samples/sec Loss 5.2924 LearningRate 0.0148 Epoch: 12 Global Step: 152970 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:14,548-Speed 3007.99 samples/sec Loss 5.2900 LearningRate 0.0148 Epoch: 12 Global Step: 152980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:17,893-Speed 3061.90 samples/sec Loss 5.2401 LearningRate 0.0148 Epoch: 12 Global Step: 152990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:21,303-Speed 3003.22 samples/sec Loss 5.2493 LearningRate 0.0148 Epoch: 12 Global Step: 153000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:24,699-Speed 3016.81 samples/sec Loss 5.2401 LearningRate 0.0148 Epoch: 12 Global Step: 153010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:28,040-Speed 3065.56 samples/sec Loss 5.2475 LearningRate 0.0147 Epoch: 12 Global Step: 153020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:31,423-Speed 3028.28 samples/sec Loss 5.3219 LearningRate 0.0147 
Epoch: 12 Global Step: 153030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:12:34,827-Speed 3008.75 samples/sec Loss 5.1939 LearningRate 0.0147 Epoch: 12 Global Step: 153040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:38,209-Speed 3028.52 samples/sec Loss 5.2384 LearningRate 0.0147 Epoch: 12 Global Step: 153050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:41,529-Speed 3086.05 samples/sec Loss 5.2652 LearningRate 0.0147 Epoch: 12 Global Step: 153060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:44,893-Speed 3045.16 samples/sec Loss 5.1994 LearningRate 0.0147 Epoch: 12 Global Step: 153070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:48,234-Speed 3065.21 samples/sec Loss 5.2190 LearningRate 0.0147 Epoch: 12 Global Step: 153080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:51,557-Speed 3082.85 samples/sec Loss 5.1661 LearningRate 0.0147 Epoch: 12 Global Step: 153090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:54,959-Speed 3010.70 samples/sec Loss 5.1818 LearningRate 0.0147 Epoch: 12 Global Step: 153100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:12:58,362-Speed 3010.05 samples/sec Loss 5.3099 LearningRate 0.0147 Epoch: 12 Global Step: 153110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:01,693-Speed 3074.58 samples/sec Loss 5.2622 LearningRate 0.0147 Epoch: 12 Global Step: 153120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:05,146-Speed 2966.49 samples/sec Loss 5.2969 LearningRate 0.0147 Epoch: 12 Global Step: 153130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:08,522-Speed 3034.33 samples/sec Loss 5.1938 LearningRate 0.0147 Epoch: 12 Global Step: 153140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:13:11,863-Speed 3066.32 samples/sec Loss 5.3181 LearningRate 0.0147 Epoch: 12 Global Step: 153150 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 16:13:15,232-Speed 3039.37 samples/sec Loss 5.3494 LearningRate 0.0147 Epoch: 12 Global Step: 153160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:13:18,649-Speed 2998.12 samples/sec Loss 5.3477 LearningRate 0.0147 Epoch: 12 Global Step: 153170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:22,032-Speed 3027.42 samples/sec Loss 5.3240 LearningRate 0.0147 Epoch: 12 Global Step: 153180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:25,485-Speed 2966.66 samples/sec Loss 5.2893 LearningRate 0.0147 Epoch: 12 Global Step: 153190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:28,885-Speed 3012.35 samples/sec Loss 5.2061 LearningRate 0.0147 Epoch: 12 Global Step: 153200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:32,276-Speed 3021.10 samples/sec Loss 5.3222 LearningRate 0.0147 Epoch: 12 Global Step: 153210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:35,665-Speed 3022.66 samples/sec Loss 5.1959 LearningRate 0.0147 Epoch: 12 Global Step: 153220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:39,059-Speed 3017.73 samples/sec Loss 5.2450 LearningRate 0.0147 Epoch: 12 Global Step: 153230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:42,539-Speed 2943.76 samples/sec Loss 5.2239 LearningRate 0.0147 Epoch: 12 Global Step: 153240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:45,982-Speed 2974.85 samples/sec Loss 5.3586 LearningRate 0.0147 Epoch: 12 Global Step: 153250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:49,388-Speed 3006.66 samples/sec Loss 5.2730 LearningRate 0.0147 Epoch: 12 Global Step: 153260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:13:52,795-Speed 3006.83 samples/sec Loss 5.2777 LearningRate 0.0147 Epoch: 12 Global Step: 153270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-27 16:13:56,181-Speed 3025.39 samples/sec Loss 5.2561 LearningRate 0.0147 Epoch: 12 Global Step: 153280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:13:59,544-Speed 3045.56 samples/sec Loss 5.2219 LearningRate 0.0147 Epoch: 12 Global Step: 153290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:03,043-Speed 2927.98 samples/sec Loss 5.2718 LearningRate 0.0147 Epoch: 12 Global Step: 153300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:06,439-Speed 3015.86 samples/sec Loss 5.2420 LearningRate 0.0147 Epoch: 12 Global Step: 153310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:09,830-Speed 3021.15 samples/sec Loss 5.3430 LearningRate 0.0147 Epoch: 12 Global Step: 153320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:13,213-Speed 3027.87 samples/sec Loss 5.1785 LearningRate 0.0147 Epoch: 12 Global Step: 153330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:16,606-Speed 3018.36 samples/sec Loss 5.2391 LearningRate 0.0146 Epoch: 12 Global Step: 153340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:19,983-Speed 3033.37 samples/sec Loss 5.3195 LearningRate 0.0146 Epoch: 12 Global Step: 153350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:23,440-Speed 2963.13 samples/sec Loss 5.2948 LearningRate 0.0146 Epoch: 12 Global Step: 153360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:26,956-Speed 2913.69 samples/sec Loss 5.3362 LearningRate 0.0146 Epoch: 12 Global Step: 153370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:30,335-Speed 3030.79 samples/sec Loss 5.2261 LearningRate 0.0146 Epoch: 12 Global Step: 153380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:33,728-Speed 3019.03 samples/sec Loss 5.2837 LearningRate 0.0146 Epoch: 12 Global Step: 153390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:37,123-Speed 3017.35 
samples/sec Loss 5.4372 LearningRate 0.0146 Epoch: 12 Global Step: 153400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:40,498-Speed 3035.41 samples/sec Loss 5.3119 LearningRate 0.0146 Epoch: 12 Global Step: 153410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:43,856-Speed 3049.93 samples/sec Loss 5.1542 LearningRate 0.0146 Epoch: 12 Global Step: 153420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:47,245-Speed 3022.51 samples/sec Loss 5.2529 LearningRate 0.0146 Epoch: 12 Global Step: 153430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:50,602-Speed 3050.95 samples/sec Loss 5.3586 LearningRate 0.0146 Epoch: 12 Global Step: 153440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:54,034-Speed 2984.59 samples/sec Loss 5.2743 LearningRate 0.0146 Epoch: 12 Global Step: 153450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:14:57,418-Speed 3026.56 samples/sec Loss 5.1915 LearningRate 0.0146 Epoch: 12 Global Step: 153460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:00,871-Speed 2967.06 samples/sec Loss 5.1983 LearningRate 0.0146 Epoch: 12 Global Step: 153470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:04,336-Speed 2955.67 samples/sec Loss 5.3701 LearningRate 0.0146 Epoch: 12 Global Step: 153480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:07,770-Speed 2983.51 samples/sec Loss 5.2208 LearningRate 0.0146 Epoch: 12 Global Step: 153490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:15:11,154-Speed 3026.08 samples/sec Loss 5.3429 LearningRate 0.0146 Epoch: 12 Global Step: 153500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:15:14,483-Speed 3077.46 samples/sec Loss 5.2303 LearningRate 0.0146 Epoch: 12 Global Step: 153510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:17,867-Speed 3026.29 samples/sec Loss 5.2841 LearningRate 0.0146 
Epoch: 12 Global Step: 153520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:21,265-Speed 3014.53 samples/sec Loss 5.3336 LearningRate 0.0146 Epoch: 12 Global Step: 153530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:24,591-Speed 3079.66 samples/sec Loss 5.2474 LearningRate 0.0146 Epoch: 12 Global Step: 153540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:27,976-Speed 3026.05 samples/sec Loss 5.2227 LearningRate 0.0146 Epoch: 12 Global Step: 153550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:31,355-Speed 3031.37 samples/sec Loss 5.1919 LearningRate 0.0146 Epoch: 12 Global Step: 153560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:34,795-Speed 2978.23 samples/sec Loss 5.3091 LearningRate 0.0146 Epoch: 12 Global Step: 153570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:38,185-Speed 3021.22 samples/sec Loss 5.1683 LearningRate 0.0146 Epoch: 12 Global Step: 153580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:41,629-Speed 2974.11 samples/sec Loss 5.3216 LearningRate 0.0146 Epoch: 12 Global Step: 153590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:45,011-Speed 3028.71 samples/sec Loss 5.2533 LearningRate 0.0146 Epoch: 12 Global Step: 153600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:15:48,431-Speed 2994.66 samples/sec Loss 5.3116 LearningRate 0.0146 Epoch: 12 Global Step: 153610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:15:51,757-Speed 3079.68 samples/sec Loss 5.2103 LearningRate 0.0146 Epoch: 12 Global Step: 153620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:15:55,108-Speed 3057.36 samples/sec Loss 5.2891 LearningRate 0.0146 Epoch: 12 Global Step: 153630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:15:58,532-Speed 2991.07 samples/sec Loss 5.2788 LearningRate 0.0146 Epoch: 12 Global Step: 153640 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:01,957-Speed 2991.10 samples/sec Loss 5.2660 LearningRate 0.0146 Epoch: 12 Global Step: 153650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:05,350-Speed 3018.25 samples/sec Loss 5.2187 LearningRate 0.0146 Epoch: 12 Global Step: 153660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:08,696-Speed 3061.60 samples/sec Loss 5.3678 LearningRate 0.0145 Epoch: 12 Global Step: 153670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:12,088-Speed 3019.17 samples/sec Loss 5.2426 LearningRate 0.0145 Epoch: 12 Global Step: 153680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:15,441-Speed 3055.54 samples/sec Loss 5.3542 LearningRate 0.0145 Epoch: 12 Global Step: 153690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:18,853-Speed 3002.09 samples/sec Loss 5.2923 LearningRate 0.0145 Epoch: 12 Global Step: 153700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:22,235-Speed 3027.65 samples/sec Loss 5.2526 LearningRate 0.0145 Epoch: 12 Global Step: 153710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:25,594-Speed 3049.89 samples/sec Loss 5.3258 LearningRate 0.0145 Epoch: 12 Global Step: 153720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:28,978-Speed 3026.71 samples/sec Loss 5.4100 LearningRate 0.0145 Epoch: 12 Global Step: 153730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:32,323-Speed 3062.69 samples/sec Loss 5.3876 LearningRate 0.0145 Epoch: 12 Global Step: 153740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:35,760-Speed 2980.39 samples/sec Loss 5.3215 LearningRate 0.0145 Epoch: 12 Global Step: 153750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:16:39,138-Speed 3031.88 samples/sec Loss 5.2357 LearningRate 0.0145 Epoch: 12 Global Step: 153760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:16:42,583-Speed 2973.61 samples/sec Loss 5.3083 LearningRate 0.0145 Epoch: 12 Global Step: 153770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:45,940-Speed 3051.01 samples/sec Loss 5.2214 LearningRate 0.0145 Epoch: 12 Global Step: 153780 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:49,284-Speed 3063.00 samples/sec Loss 5.2850 LearningRate 0.0145 Epoch: 12 Global Step: 153790 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:52,651-Speed 3043.39 samples/sec Loss 5.3661 LearningRate 0.0145 Epoch: 12 Global Step: 153800 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:55,995-Speed 3062.77 samples/sec Loss 5.3382 LearningRate 0.0145 Epoch: 12 Global Step: 153810 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:16:59,349-Speed 3053.43 samples/sec Loss 5.3307 LearningRate 0.0145 Epoch: 12 Global Step: 153820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:17:02,714-Speed 3043.84 samples/sec Loss 5.3103 LearningRate 0.0145 Epoch: 12 Global Step: 153830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:17:06,105-Speed 3021.21 samples/sec Loss 5.2907 LearningRate 0.0145 Epoch: 12 Global Step: 153840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:09,474-Speed 3040.23 samples/sec Loss 5.3660 LearningRate 0.0145 Epoch: 12 Global Step: 153850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:12,798-Speed 3081.41 samples/sec Loss 5.3779 LearningRate 0.0145 Epoch: 12 Global Step: 153860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:16,162-Speed 3044.58 samples/sec Loss 5.3071 LearningRate 0.0145 Epoch: 12 Global Step: 153870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:19,536-Speed 3036.53 samples/sec Loss 5.3706 LearningRate 0.0145 Epoch: 12 Global Step: 153880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:22,907-Speed 3038.57 
samples/sec Loss 5.1883 LearningRate 0.0145 Epoch: 12 Global Step: 153890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:26,342-Speed 2981.68 samples/sec Loss 5.3451 LearningRate 0.0145 Epoch: 12 Global Step: 153900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:29,729-Speed 3023.64 samples/sec Loss 5.3852 LearningRate 0.0145 Epoch: 12 Global Step: 153910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:33,085-Speed 3052.04 samples/sec Loss 5.3615 LearningRate 0.0145 Epoch: 12 Global Step: 153920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:37,104-Speed 2548.45 samples/sec Loss 5.2620 LearningRate 0.0145 Epoch: 12 Global Step: 153930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:40,429-Speed 3080.71 samples/sec Loss 5.2414 LearningRate 0.0145 Epoch: 12 Global Step: 153940 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:17:43,755-Speed 3079.97 samples/sec Loss 5.3087 LearningRate 0.0145 Epoch: 12 Global Step: 153950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:47,679-Speed 2609.77 samples/sec Loss 5.3124 LearningRate 0.0145 Epoch: 12 Global Step: 153960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:51,053-Speed 3035.92 samples/sec Loss 5.3817 LearningRate 0.0145 Epoch: 12 Global Step: 153970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:54,371-Speed 3087.66 samples/sec Loss 5.3800 LearningRate 0.0145 Epoch: 12 Global Step: 153980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:17:59,009-Speed 2207.86 samples/sec Loss 5.3013 LearningRate 0.0144 Epoch: 12 Global Step: 153990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:02,413-Speed 3009.45 samples/sec Loss 5.3590 LearningRate 0.0144 Epoch: 12 Global Step: 154000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:05,828-Speed 2999.34 samples/sec Loss 5.3691 LearningRate 0.0144 
Epoch: 12 Global Step: 154010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:09,194-Speed 3042.63 samples/sec Loss 5.4098 LearningRate 0.0144 Epoch: 12 Global Step: 154020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:12,602-Speed 3006.50 samples/sec Loss 5.2483 LearningRate 0.0144 Epoch: 12 Global Step: 154030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:16,047-Speed 2972.87 samples/sec Loss 5.2796 LearningRate 0.0144 Epoch: 12 Global Step: 154040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:19,449-Speed 3011.47 samples/sec Loss 5.3007 LearningRate 0.0144 Epoch: 12 Global Step: 154050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:22,818-Speed 3040.25 samples/sec Loss 5.4087 LearningRate 0.0144 Epoch: 12 Global Step: 154060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:26,171-Speed 3054.34 samples/sec Loss 5.3442 LearningRate 0.0144 Epoch: 12 Global Step: 154070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:29,540-Speed 3040.82 samples/sec Loss 5.4136 LearningRate 0.0144 Epoch: 12 Global Step: 154080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:32,928-Speed 3023.19 samples/sec Loss 5.3086 LearningRate 0.0144 Epoch: 12 Global Step: 154090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:36,350-Speed 2992.53 samples/sec Loss 5.3892 LearningRate 0.0144 Epoch: 12 Global Step: 154100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:39,690-Speed 3067.04 samples/sec Loss 5.3981 LearningRate 0.0144 Epoch: 12 Global Step: 154110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:43,093-Speed 3010.90 samples/sec Loss 5.3012 LearningRate 0.0144 Epoch: 12 Global Step: 154120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:46,479-Speed 3025.36 samples/sec Loss 5.4932 LearningRate 0.0144 Epoch: 12 Global Step: 154130 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:49,825-Speed 3061.33 samples/sec Loss 5.1709 LearningRate 0.0144 Epoch: 12 Global Step: 154140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:18:53,194-Speed 3040.28 samples/sec Loss 5.3098 LearningRate 0.0144 Epoch: 12 Global Step: 154150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:18:56,508-Speed 3090.85 samples/sec Loss 5.2518 LearningRate 0.0144 Epoch: 12 Global Step: 154160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:00,023-Speed 2914.07 samples/sec Loss 5.3225 LearningRate 0.0144 Epoch: 12 Global Step: 154170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:03,378-Speed 3052.73 samples/sec Loss 5.3898 LearningRate 0.0144 Epoch: 12 Global Step: 154180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:06,755-Speed 3033.16 samples/sec Loss 5.2727 LearningRate 0.0144 Epoch: 12 Global Step: 154190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:10,115-Speed 3048.96 samples/sec Loss 5.2504 LearningRate 0.0144 Epoch: 12 Global Step: 154200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:13,469-Speed 3054.72 samples/sec Loss 5.3158 LearningRate 0.0144 Epoch: 12 Global Step: 154210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:16,912-Speed 2974.06 samples/sec Loss 5.3326 LearningRate 0.0144 Epoch: 12 Global Step: 154220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:20,343-Speed 2985.76 samples/sec Loss 5.3020 LearningRate 0.0144 Epoch: 12 Global Step: 154230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:23,686-Speed 3064.36 samples/sec Loss 5.3701 LearningRate 0.0144 Epoch: 12 Global Step: 154240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:27,096-Speed 3003.17 samples/sec Loss 5.2765 LearningRate 0.0144 Epoch: 12 Global Step: 154250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:19:30,548-Speed 2967.81 samples/sec Loss 5.2996 LearningRate 0.0144 Epoch: 12 Global Step: 154260 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:19:33,949-Speed 3011.18 samples/sec Loss 5.3473 LearningRate 0.0144 Epoch: 12 Global Step: 154270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:19:37,356-Speed 3007.12 samples/sec Loss 5.3114 LearningRate 0.0144 Epoch: 12 Global Step: 154280 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:19:40,775-Speed 2996.07 samples/sec Loss 5.2990 LearningRate 0.0144 Epoch: 12 Global Step: 154290 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:19:44,135-Speed 3048.11 samples/sec Loss 5.4063 LearningRate 0.0144 Epoch: 12 Global Step: 154300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:47,502-Speed 3041.51 samples/sec Loss 5.3314 LearningRate 0.0144 Epoch: 12 Global Step: 154310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:50,896-Speed 3029.36 samples/sec Loss 5.2962 LearningRate 0.0143 Epoch: 12 Global Step: 154320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:54,821-Speed 2609.22 samples/sec Loss 5.3332 LearningRate 0.0143 Epoch: 12 Global Step: 154330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:19:58,801-Speed 2573.27 samples/sec Loss 5.3503 LearningRate 0.0143 Epoch: 12 Global Step: 154340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:02,194-Speed 3019.30 samples/sec Loss 5.4159 LearningRate 0.0143 Epoch: 12 Global Step: 154350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:05,620-Speed 2989.06 samples/sec Loss 5.3833 LearningRate 0.0143 Epoch: 12 Global Step: 154360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:09,009-Speed 3022.79 samples/sec Loss 5.2345 LearningRate 0.0143 Epoch: 12 Global Step: 154370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:12,440-Speed 2985.52 
samples/sec Loss 5.4274 LearningRate 0.0143 Epoch: 12 Global Step: 154380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:15,822-Speed 3028.76 samples/sec Loss 5.2538 LearningRate 0.0143 Epoch: 12 Global Step: 154390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:19,268-Speed 2972.05 samples/sec Loss 5.3406 LearningRate 0.0143 Epoch: 12 Global Step: 154400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:22,743-Speed 2947.81 samples/sec Loss 5.2610 LearningRate 0.0143 Epoch: 12 Global Step: 154410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:26,135-Speed 3020.25 samples/sec Loss 5.3898 LearningRate 0.0143 Epoch: 12 Global Step: 154420 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:29,493-Speed 3049.71 samples/sec Loss 5.3262 LearningRate 0.0143 Epoch: 12 Global Step: 154430 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:32,857-Speed 3045.16 samples/sec Loss 5.2970 LearningRate 0.0143 Epoch: 12 Global Step: 154440 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:36,185-Speed 3077.27 samples/sec Loss 5.3027 LearningRate 0.0143 Epoch: 12 Global Step: 154450 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:39,620-Speed 2982.53 samples/sec Loss 5.3216 LearningRate 0.0143 Epoch: 12 Global Step: 154460 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:42,989-Speed 3039.79 samples/sec Loss 5.3991 LearningRate 0.0143 Epoch: 12 Global Step: 154470 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:46,450-Speed 2959.88 samples/sec Loss 5.3022 LearningRate 0.0143 Epoch: 12 Global Step: 154480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:20:49,821-Speed 3038.17 samples/sec Loss 5.2838 LearningRate 0.0143 Epoch: 12 Global Step: 154490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:53,293-Speed 2950.17 samples/sec Loss 5.2779 LearningRate 0.0143 
Epoch: 12 Global Step: 154500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:20:56,697-Speed 3009.50 samples/sec Loss 5.3351 LearningRate 0.0143 Epoch: 12 Global Step: 154510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:00,132-Speed 2981.87 samples/sec Loss 5.3883 LearningRate 0.0143 Epoch: 12 Global Step: 154520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:03,482-Speed 3057.97 samples/sec Loss 5.2672 LearningRate 0.0143 Epoch: 12 Global Step: 154530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:06,953-Speed 2950.34 samples/sec Loss 5.2989 LearningRate 0.0143 Epoch: 12 Global Step: 154540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:10,401-Speed 2971.17 samples/sec Loss 5.3124 LearningRate 0.0143 Epoch: 12 Global Step: 154550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:13,747-Speed 3061.50 samples/sec Loss 5.3899 LearningRate 0.0143 Epoch: 12 Global Step: 154560 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:17,242-Speed 2929.92 samples/sec Loss 5.3194 LearningRate 0.0143 Epoch: 12 Global Step: 154570 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:20,625-Speed 3028.94 samples/sec Loss 5.4106 LearningRate 0.0143 Epoch: 12 Global Step: 154580 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:24,036-Speed 3003.62 samples/sec Loss 5.3496 LearningRate 0.0143 Epoch: 12 Global Step: 154590 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:27,409-Speed 3036.23 samples/sec Loss 5.3982 LearningRate 0.0143 Epoch: 12 Global Step: 154600 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:30,918-Speed 2919.66 samples/sec Loss 5.2360 LearningRate 0.0143 Epoch: 12 Global Step: 154610 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:34,267-Speed 3057.73 samples/sec Loss 5.4225 LearningRate 0.0143 Epoch: 12 Global Step: 154620 Fp16 Grad 
Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:37,672-Speed 3008.84 samples/sec Loss 5.3309 LearningRate 0.0143 Epoch: 12 Global Step: 154630 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:40,986-Speed 3090.68 samples/sec Loss 5.3845 LearningRate 0.0143 Epoch: 12 Global Step: 154640 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:44,337-Speed 3056.16 samples/sec Loss 5.2442 LearningRate 0.0142 Epoch: 12 Global Step: 154650 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:21:47,723-Speed 3025.41 samples/sec Loss 5.3426 LearningRate 0.0142 Epoch: 12 Global Step: 154660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:51,066-Speed 3064.30 samples/sec Loss 5.3214 LearningRate 0.0142 Epoch: 12 Global Step: 154670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:54,446-Speed 3030.33 samples/sec Loss 5.2811 LearningRate 0.0142 Epoch: 12 Global Step: 154680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:21:57,905-Speed 2961.47 samples/sec Loss 5.3645 LearningRate 0.0142 Epoch: 12 Global Step: 154690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:01,208-Speed 3100.49 samples/sec Loss 5.3945 LearningRate 0.0142 Epoch: 12 Global Step: 154700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:04,616-Speed 3005.45 samples/sec Loss 5.3617 LearningRate 0.0142 Epoch: 12 Global Step: 154710 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:07,973-Speed 3052.02 samples/sec Loss 5.3047 LearningRate 0.0142 Epoch: 12 Global Step: 154720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:11,444-Speed 2950.86 samples/sec Loss 5.3825 LearningRate 0.0142 Epoch: 12 Global Step: 154730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:14,844-Speed 3012.35 samples/sec Loss 5.2877 LearningRate 0.0142 Epoch: 12 Global Step: 154740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:22:18,272-Speed 2988.26 samples/sec Loss 5.3206 LearningRate 0.0142 Epoch: 12 Global Step: 154750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:21,691-Speed 2996.51 samples/sec Loss 5.5169 LearningRate 0.0142 Epoch: 12 Global Step: 154760 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:22:25,005-Speed 3090.57 samples/sec Loss 5.2699 LearningRate 0.0142 Epoch: 12 Global Step: 154770 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:22:28,337-Speed 3073.72 samples/sec Loss 5.4102 LearningRate 0.0142 Epoch: 12 Global Step: 154780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:31,755-Speed 2996.94 samples/sec Loss 5.3194 LearningRate 0.0142 Epoch: 12 Global Step: 154790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:35,185-Speed 2985.74 samples/sec Loss 5.2863 LearningRate 0.0142 Epoch: 12 Global Step: 154800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:38,572-Speed 3025.78 samples/sec Loss 5.3072 LearningRate 0.0142 Epoch: 12 Global Step: 154810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:22:41,914-Speed 3064.78 samples/sec Loss 5.3810 LearningRate 0.0142 Epoch: 12 Global Step: 154820 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:22:45,333-Speed 2995.71 samples/sec Loss 5.3211 LearningRate 0.0142 Epoch: 12 Global Step: 154830 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:22:48,812-Speed 2944.14 samples/sec Loss 5.2711 LearningRate 0.0142 Epoch: 12 Global Step: 154840 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:22:52,199-Speed 3023.94 samples/sec Loss 5.3646 LearningRate 0.0142 Epoch: 12 Global Step: 154850 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:22:55,621-Speed 2993.35 samples/sec Loss 5.3596 LearningRate 0.0142 Epoch: 12 Global Step: 154860 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:22:58,990-Speed 3040.87 
samples/sec Loss 5.3673 LearningRate 0.0142 Epoch: 12 Global Step: 154870 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:23:02,478-Speed 2936.19 samples/sec Loss 5.3682 LearningRate 0.0142 Epoch: 12 Global Step: 154880 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:23:05,814-Speed 3070.31 samples/sec Loss 5.3636 LearningRate 0.0142 Epoch: 12 Global Step: 154890 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:23:09,197-Speed 3028.07 samples/sec Loss 5.3917 LearningRate 0.0142 Epoch: 12 Global Step: 154900 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:23:12,550-Speed 3054.94 samples/sec Loss 5.3308 LearningRate 0.0142 Epoch: 12 Global Step: 154910 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:23:15,937-Speed 3024.26 samples/sec Loss 5.4013 LearningRate 0.0142 Epoch: 12 Global Step: 154920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:19,362-Speed 2990.26 samples/sec Loss 5.2649 LearningRate 0.0142 Epoch: 12 Global Step: 154930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:22,824-Speed 2959.18 samples/sec Loss 5.3278 LearningRate 0.0142 Epoch: 12 Global Step: 154940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:26,182-Speed 3050.10 samples/sec Loss 5.3402 LearningRate 0.0142 Epoch: 12 Global Step: 154950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:29,542-Speed 3048.18 samples/sec Loss 5.5277 LearningRate 0.0142 Epoch: 12 Global Step: 154960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:32,884-Speed 3065.07 samples/sec Loss 5.4134 LearningRate 0.0142 Epoch: 12 Global Step: 154970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:36,269-Speed 3026.03 samples/sec Loss 5.3346 LearningRate 0.0141 Epoch: 12 Global Step: 154980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:39,699-Speed 2986.88 samples/sec Loss 5.5005 LearningRate 0.0141 
Epoch: 12 Global Step: 154990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:43,131-Speed 2983.64 samples/sec Loss 5.3961 LearningRate 0.0141 Epoch: 12 Global Step: 155000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:46,619-Speed 2936.65 samples/sec Loss 5.3751 LearningRate 0.0141 Epoch: 12 Global Step: 155010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:50,017-Speed 3014.77 samples/sec Loss 5.4412 LearningRate 0.0141 Epoch: 12 Global Step: 155020 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:53,406-Speed 3022.78 samples/sec Loss 5.3175 LearningRate 0.0141 Epoch: 12 Global Step: 155030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:23:56,869-Speed 2957.85 samples/sec Loss 5.3353 LearningRate 0.0141 Epoch: 12 Global Step: 155040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:00,267-Speed 3014.06 samples/sec Loss 5.3034 LearningRate 0.0141 Epoch: 12 Global Step: 155050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:03,614-Speed 3060.06 samples/sec Loss 5.3101 LearningRate 0.0141 Epoch: 12 Global Step: 155060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:07,050-Speed 2981.49 samples/sec Loss 5.4769 LearningRate 0.0141 Epoch: 12 Global Step: 155070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:10,518-Speed 2953.60 samples/sec Loss 5.3108 LearningRate 0.0141 Epoch: 12 Global Step: 155080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:13,920-Speed 3011.19 samples/sec Loss 5.3534 LearningRate 0.0141 Epoch: 12 Global Step: 155090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:17,365-Speed 2972.69 samples/sec Loss 5.3085 LearningRate 0.0141 Epoch: 12 Global Step: 155100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:20,752-Speed 3024.18 samples/sec Loss 5.3531 LearningRate 0.0141 Epoch: 12 Global Step: 155110 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:24:24,141-Speed 3023.25 samples/sec Loss 5.3538 LearningRate 0.0141 Epoch: 12 Global Step: 155120 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:27,539-Speed 3014.38 samples/sec Loss 5.3906 LearningRate 0.0141 Epoch: 12 Global Step: 155130 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:30,934-Speed 3016.23 samples/sec Loss 5.4551 LearningRate 0.0141 Epoch: 12 Global Step: 155140 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:34,316-Speed 3029.49 samples/sec Loss 5.3472 LearningRate 0.0141 Epoch: 12 Global Step: 155150 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:37,677-Speed 3047.00 samples/sec Loss 5.2283 LearningRate 0.0141 Epoch: 12 Global Step: 155160 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:41,088-Speed 3003.08 samples/sec Loss 5.4187 LearningRate 0.0141 Epoch: 12 Global Step: 155170 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:44,546-Speed 2962.16 samples/sec Loss 5.4543 LearningRate 0.0141 Epoch: 12 Global Step: 155180 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:47,883-Speed 3069.27 samples/sec Loss 5.2164 LearningRate 0.0141 Epoch: 12 Global Step: 155190 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:51,233-Speed 3057.53 samples/sec Loss 5.3880 LearningRate 0.0141 Epoch: 12 Global Step: 155200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:54,603-Speed 3039.03 samples/sec Loss 5.3150 LearningRate 0.0141 Epoch: 12 Global Step: 155210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:24:57,917-Speed 3091.27 samples/sec Loss 5.3811 LearningRate 0.0141 Epoch: 12 Global Step: 155220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:01,295-Speed 3032.58 samples/sec Loss 5.3134 LearningRate 0.0141 Epoch: 12 Global Step: 155230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:25:04,672-Speed 3032.74 samples/sec Loss 5.3654 LearningRate 0.0141 Epoch: 12 Global Step: 155240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:08,121-Speed 2969.49 samples/sec Loss 5.4089 LearningRate 0.0141 Epoch: 12 Global Step: 155250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:11,447-Speed 3079.87 samples/sec Loss 5.2540 LearningRate 0.0141 Epoch: 12 Global Step: 155260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:14,793-Speed 3061.59 samples/sec Loss 5.4298 LearningRate 0.0141 Epoch: 12 Global Step: 155270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:18,143-Speed 3057.49 samples/sec Loss 5.3630 LearningRate 0.0141 Epoch: 12 Global Step: 155280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:21,580-Speed 2980.09 samples/sec Loss 5.4243 LearningRate 0.0141 Epoch: 12 Global Step: 155290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:25,029-Speed 2969.66 samples/sec Loss 5.3035 LearningRate 0.0141 Epoch: 12 Global Step: 155300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:28,483-Speed 2966.02 samples/sec Loss 5.4893 LearningRate 0.0140 Epoch: 12 Global Step: 155310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:31,957-Speed 2947.86 samples/sec Loss 5.3669 LearningRate 0.0140 Epoch: 12 Global Step: 155320 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:25:35,422-Speed 2956.13 samples/sec Loss 5.3869 LearningRate 0.0140 Epoch: 12 Global Step: 155330 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:25:38,870-Speed 2970.74 samples/sec Loss 5.3790 LearningRate 0.0140 Epoch: 12 Global Step: 155340 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:25:42,228-Speed 3051.12 samples/sec Loss 5.3949 LearningRate 0.0140 Epoch: 12 Global Step: 155350 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:25:45,545-Speed 3088.11 
samples/sec Loss 5.3372 LearningRate 0.0140 Epoch: 12 Global Step: 155360 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:25:48,979-Speed 2982.27 samples/sec Loss 5.3543 LearningRate 0.0140 Epoch: 12 Global Step: 155370 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:25:52,363-Speed 3027.30 samples/sec Loss 5.3433 LearningRate 0.0140 Epoch: 12 Global Step: 155380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:55,738-Speed 3034.82 samples/sec Loss 5.3694 LearningRate 0.0140 Epoch: 12 Global Step: 155390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:25:59,055-Speed 3087.74 samples/sec Loss 5.3536 LearningRate 0.0140 Epoch: 12 Global Step: 155400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:02,402-Speed 3060.30 samples/sec Loss 5.3802 LearningRate 0.0140 Epoch: 12 Global Step: 155410 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:05,796-Speed 3017.63 samples/sec Loss 5.3941 LearningRate 0.0140 Epoch: 12 Global Step: 155420 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:09,114-Speed 3087.41 samples/sec Loss 5.2434 LearningRate 0.0140 Epoch: 12 Global Step: 155430 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:12,479-Speed 3044.30 samples/sec Loss 5.2343 LearningRate 0.0140 Epoch: 12 Global Step: 155440 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:15,870-Speed 3020.26 samples/sec Loss 5.4252 LearningRate 0.0140 Epoch: 12 Global Step: 155450 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:19,252-Speed 3029.19 samples/sec Loss 5.3137 LearningRate 0.0140 Epoch: 12 Global Step: 155460 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:22,644-Speed 3019.59 samples/sec Loss 5.3633 LearningRate 0.0140 Epoch: 12 Global Step: 155470 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:26,055-Speed 3003.13 samples/sec Loss 5.3146 LearningRate 0.0140 
Epoch: 12 Global Step: 155480 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:29,418-Speed 3045.20 samples/sec Loss 5.4246 LearningRate 0.0140 Epoch: 12 Global Step: 155490 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:32,855-Speed 2980.26 samples/sec Loss 5.3870 LearningRate 0.0140 Epoch: 12 Global Step: 155500 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:26:36,238-Speed 3027.60 samples/sec Loss 5.3604 LearningRate 0.0140 Epoch: 12 Global Step: 155510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:39,617-Speed 3031.35 samples/sec Loss 5.3967 LearningRate 0.0140 Epoch: 12 Global Step: 155520 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:42,989-Speed 3037.96 samples/sec Loss 5.3120 LearningRate 0.0140 Epoch: 12 Global Step: 155530 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:46,367-Speed 3031.67 samples/sec Loss 5.4145 LearningRate 0.0140 Epoch: 12 Global Step: 155540 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:49,751-Speed 3027.47 samples/sec Loss 5.4860 LearningRate 0.0140 Epoch: 12 Global Step: 155550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:53,175-Speed 2991.38 samples/sec Loss 5.2945 LearningRate 0.0140 Epoch: 12 Global Step: 155560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:26:56,586-Speed 3003.00 samples/sec Loss 5.3710 LearningRate 0.0140 Epoch: 12 Global Step: 155570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:00,100-Speed 2915.38 samples/sec Loss 5.3399 LearningRate 0.0140 Epoch: 12 Global Step: 155580 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:03,479-Speed 3031.33 samples/sec Loss 5.3261 LearningRate 0.0140 Epoch: 12 Global Step: 155590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:06,829-Speed 3056.69 samples/sec Loss 5.3803 LearningRate 0.0140 Epoch: 12 Global Step: 155600 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:10,178-Speed 3059.15 samples/sec Loss 5.5025 LearningRate 0.0140 Epoch: 12 Global Step: 155610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:13,496-Speed 3087.02 samples/sec Loss 5.2970 LearningRate 0.0140 Epoch: 12 Global Step: 155620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:16,895-Speed 3013.48 samples/sec Loss 5.3028 LearningRate 0.0140 Epoch: 12 Global Step: 155630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:20,319-Speed 2991.54 samples/sec Loss 5.3289 LearningRate 0.0139 Epoch: 12 Global Step: 155640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:23,668-Speed 3059.26 samples/sec Loss 5.4626 LearningRate 0.0139 Epoch: 12 Global Step: 155650 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:27,069-Speed 3011.43 samples/sec Loss 5.3784 LearningRate 0.0139 Epoch: 12 Global Step: 155660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:30,552-Speed 2941.00 samples/sec Loss 5.5559 LearningRate 0.0139 Epoch: 12 Global Step: 155670 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:33,882-Speed 3076.09 samples/sec Loss 5.3448 LearningRate 0.0139 Epoch: 12 Global Step: 155680 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:37,337-Speed 2964.15 samples/sec Loss 5.4247 LearningRate 0.0139 Epoch: 12 Global Step: 155690 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:40,751-Speed 3000.71 samples/sec Loss 5.3566 LearningRate 0.0139 Epoch: 12 Global Step: 155700 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:27:44,146-Speed 3016.84 samples/sec Loss 5.4254 LearningRate 0.0139 Epoch: 12 Global Step: 155710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:27:47,535-Speed 3022.08 samples/sec Loss 5.2626 LearningRate 0.0139 Epoch: 12 Global Step: 155720 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-27 16:27:50,986-Speed 2968.74 samples/sec Loss 5.4037 LearningRate 0.0139 Epoch: 12 Global Step: 155730 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:27:54,334-Speed 3059.38 samples/sec Loss 5.3275 LearningRate 0.0139 Epoch: 12 Global Step: 155740 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:27:57,748-Speed 3000.19 samples/sec Loss 5.2874 LearningRate 0.0139 Epoch: 12 Global Step: 155750 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:28:01,078-Speed 3075.75 samples/sec Loss 5.2765 LearningRate 0.0139 Epoch: 12 Global Step: 155760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:04,465-Speed 3024.60 samples/sec Loss 5.3479 LearningRate 0.0139 Epoch: 12 Global Step: 155770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:07,957-Speed 2933.11 samples/sec Loss 5.2998 LearningRate 0.0139 Epoch: 12 Global Step: 155780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:11,420-Speed 2957.73 samples/sec Loss 5.5248 LearningRate 0.0139 Epoch: 12 Global Step: 155790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:14,758-Speed 3069.37 samples/sec Loss 5.3010 LearningRate 0.0139 Epoch: 12 Global Step: 155800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:18,215-Speed 2962.54 samples/sec Loss 5.3835 LearningRate 0.0139 Epoch: 12 Global Step: 155810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:21,573-Speed 3050.21 samples/sec Loss 5.3217 LearningRate 0.0139 Epoch: 12 Global Step: 155820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:24,992-Speed 2996.27 samples/sec Loss 5.3351 LearningRate 0.0139 Epoch: 12 Global Step: 155830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:28,365-Speed 3036.40 samples/sec Loss 5.4355 LearningRate 0.0139 Epoch: 12 Global Step: 155840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:31,695-Speed 3075.84 
samples/sec Loss 5.4061 LearningRate 0.0139 Epoch: 12 Global Step: 155850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:35,065-Speed 3039.77 samples/sec Loss 5.2787 LearningRate 0.0139 Epoch: 12 Global Step: 155860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:38,507-Speed 2975.60 samples/sec Loss 5.3236 LearningRate 0.0139 Epoch: 12 Global Step: 155870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:41,884-Speed 3033.03 samples/sec Loss 5.3807 LearningRate 0.0139 Epoch: 12 Global Step: 155880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:45,297-Speed 3001.01 samples/sec Loss 5.4914 LearningRate 0.0139 Epoch: 12 Global Step: 155890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:48,641-Speed 3063.37 samples/sec Loss 5.3294 LearningRate 0.0139 Epoch: 12 Global Step: 155900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:52,035-Speed 3017.67 samples/sec Loss 5.3100 LearningRate 0.0139 Epoch: 12 Global Step: 155910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:55,470-Speed 2982.44 samples/sec Loss 5.3460 LearningRate 0.0139 Epoch: 12 Global Step: 155920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:28:58,910-Speed 2977.21 samples/sec Loss 5.3350 LearningRate 0.0139 Epoch: 12 Global Step: 155930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:02,329-Speed 2995.76 samples/sec Loss 5.3558 LearningRate 0.0139 Epoch: 12 Global Step: 155940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:05,714-Speed 3026.74 samples/sec Loss 5.3550 LearningRate 0.0139 Epoch: 12 Global Step: 155950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:09,125-Speed 3002.01 samples/sec Loss 5.4051 LearningRate 0.0139 Epoch: 12 Global Step: 155960 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:12,510-Speed 3026.23 samples/sec Loss 5.4572 LearningRate 0.0138 
Epoch: 12 Global Step: 155970 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:15,887-Speed 3033.17 samples/sec Loss 5.3713 LearningRate 0.0138 Epoch: 12 Global Step: 155980 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:19,256-Speed 3040.63 samples/sec Loss 5.3082 LearningRate 0.0138 Epoch: 12 Global Step: 155990 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:22,586-Speed 3076.28 samples/sec Loss 5.3701 LearningRate 0.0138 Epoch: 12 Global Step: 156000 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:25,930-Speed 3062.81 samples/sec Loss 5.3229 LearningRate 0.0138 Epoch: 12 Global Step: 156010 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:29,266-Speed 3070.48 samples/sec Loss 5.3489 LearningRate 0.0138 Epoch: 12 Global Step: 156020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:29:32,655-Speed 3022.27 samples/sec Loss 5.3938 LearningRate 0.0138 Epoch: 12 Global Step: 156030 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:35,994-Speed 3067.56 samples/sec Loss 5.4249 LearningRate 0.0138 Epoch: 12 Global Step: 156040 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:39,373-Speed 3031.28 samples/sec Loss 5.4596 LearningRate 0.0138 Epoch: 12 Global Step: 156050 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:42,783-Speed 3003.65 samples/sec Loss 5.2780 LearningRate 0.0138 Epoch: 12 Global Step: 156060 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:46,113-Speed 3076.57 samples/sec Loss 5.3237 LearningRate 0.0138 Epoch: 12 Global Step: 156070 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:49,532-Speed 2996.53 samples/sec Loss 5.3860 LearningRate 0.0138 Epoch: 12 Global Step: 156080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:52,859-Speed 3078.27 samples/sec Loss 5.3534 LearningRate 0.0138 Epoch: 12 Global Step: 156090 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:56,222-Speed 3046.11 samples/sec Loss 5.3576 LearningRate 0.0138 Epoch: 12 Global Step: 156100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:29:59,687-Speed 2956.73 samples/sec Loss 5.2770 LearningRate 0.0138 Epoch: 12 Global Step: 156110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:03,087-Speed 3012.38 samples/sec Loss 5.3407 LearningRate 0.0138 Epoch: 12 Global Step: 156120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:06,476-Speed 3021.70 samples/sec Loss 5.1777 LearningRate 0.0138 Epoch: 12 Global Step: 156130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:30:09,863-Speed 3024.38 samples/sec Loss 5.3624 LearningRate 0.0138 Epoch: 12 Global Step: 156140 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:30:13,195-Speed 3074.49 samples/sec Loss 5.5774 LearningRate 0.0138 Epoch: 12 Global Step: 156150 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:30:16,539-Speed 3063.11 samples/sec Loss 5.3616 LearningRate 0.0138 Epoch: 12 Global Step: 156160 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:30:19,892-Speed 3054.95 samples/sec Loss 5.2898 LearningRate 0.0138 Epoch: 12 Global Step: 156170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:23,224-Speed 3074.47 samples/sec Loss 5.3156 LearningRate 0.0138 Epoch: 12 Global Step: 156180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:26,626-Speed 3010.24 samples/sec Loss 5.3482 LearningRate 0.0138 Epoch: 12 Global Step: 156190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:29,972-Speed 3061.79 samples/sec Loss 5.3839 LearningRate 0.0138 Epoch: 12 Global Step: 156200 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:33,368-Speed 3015.95 samples/sec Loss 5.4359 LearningRate 0.0138 Epoch: 12 Global Step: 156210 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:30:36,760-Speed 3019.48 samples/sec Loss 5.4261 LearningRate 0.0138 Epoch: 12 Global Step: 156220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:40,161-Speed 3012.31 samples/sec Loss 5.4922 LearningRate 0.0138 Epoch: 12 Global Step: 156230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:43,601-Speed 2977.08 samples/sec Loss 5.4507 LearningRate 0.0138 Epoch: 12 Global Step: 156240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:47,054-Speed 2966.20 samples/sec Loss 5.3159 LearningRate 0.0138 Epoch: 12 Global Step: 156250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:50,463-Speed 3005.31 samples/sec Loss 5.3072 LearningRate 0.0138 Epoch: 12 Global Step: 156260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:30:53,911-Speed 2970.45 samples/sec Loss 5.3552 LearningRate 0.0138 Epoch: 12 Global Step: 156270 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:30:57,344-Speed 2983.97 samples/sec Loss 5.4705 LearningRate 0.0138 Epoch: 12 Global Step: 156280 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:00,687-Speed 3064.19 samples/sec Loss 5.3771 LearningRate 0.0138 Epoch: 12 Global Step: 156290 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:04,135-Speed 2970.89 samples/sec Loss 5.3738 LearningRate 0.0138 Epoch: 12 Global Step: 156300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:07,570-Speed 2981.50 samples/sec Loss 5.2787 LearningRate 0.0137 Epoch: 12 Global Step: 156310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:10,893-Speed 3083.34 samples/sec Loss 5.3744 LearningRate 0.0137 Epoch: 12 Global Step: 156320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:14,258-Speed 3043.07 samples/sec Loss 5.4017 LearningRate 0.0137 Epoch: 12 Global Step: 156330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:17,707-Speed 2970.26 
samples/sec Loss 5.3723 LearningRate 0.0137 Epoch: 12 Global Step: 156340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:21,184-Speed 2946.05 samples/sec Loss 5.3402 LearningRate 0.0137 Epoch: 12 Global Step: 156350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:24,659-Speed 2947.72 samples/sec Loss 5.4308 LearningRate 0.0137 Epoch: 12 Global Step: 156360 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:28,064-Speed 3008.53 samples/sec Loss 5.2872 LearningRate 0.0137 Epoch: 12 Global Step: 156370 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:31,423-Speed 3049.25 samples/sec Loss 5.4857 LearningRate 0.0137 Epoch: 12 Global Step: 156380 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:31:34,806-Speed 3027.99 samples/sec Loss 5.4281 LearningRate 0.0137 Epoch: 12 Global Step: 156390 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:31:38,271-Speed 2956.03 samples/sec Loss 5.4308 LearningRate 0.0137 Epoch: 12 Global Step: 156400 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:31:41,680-Speed 3005.16 samples/sec Loss 5.2234 LearningRate 0.0137 Epoch: 12 Global Step: 156410 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:31:45,076-Speed 3015.70 samples/sec Loss 5.4594 LearningRate 0.0137 Epoch: 12 Global Step: 156420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:48,481-Speed 3008.70 samples/sec Loss 5.3289 LearningRate 0.0137 Epoch: 12 Global Step: 156430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:51,876-Speed 3017.43 samples/sec Loss 5.3650 LearningRate 0.0137 Epoch: 12 Global Step: 156440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:55,318-Speed 2975.41 samples/sec Loss 5.4065 LearningRate 0.0137 Epoch: 12 Global Step: 156450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:31:58,762-Speed 2974.76 samples/sec Loss 5.3120 LearningRate 0.0137 
Epoch: 12 Global Step: 156460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:02,110-Speed 3059.31 samples/sec Loss 5.2990 LearningRate 0.0137 Epoch: 12 Global Step: 156470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:05,620-Speed 2918.18 samples/sec Loss 5.2811 LearningRate 0.0137 Epoch: 12 Global Step: 156480 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:08,972-Speed 3055.47 samples/sec Loss 5.3347 LearningRate 0.0137 Epoch: 12 Global Step: 156490 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:12,404-Speed 2984.46 samples/sec Loss 5.4088 LearningRate 0.0137 Epoch: 12 Global Step: 156500 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:15,855-Speed 2968.69 samples/sec Loss 5.4311 LearningRate 0.0137 Epoch: 12 Global Step: 156510 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:19,238-Speed 3027.15 samples/sec Loss 5.3337 LearningRate 0.0137 Epoch: 12 Global Step: 156520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:32:22,620-Speed 3028.79 samples/sec Loss 5.3756 LearningRate 0.0137 Epoch: 12 Global Step: 156530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:32:26,026-Speed 3007.95 samples/sec Loss 5.3146 LearningRate 0.0137 Epoch: 12 Global Step: 156540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:32:29,353-Speed 3079.20 samples/sec Loss 5.3952 LearningRate 0.0137 Epoch: 12 Global Step: 156550 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:32,709-Speed 3051.65 samples/sec Loss 5.3761 LearningRate 0.0137 Epoch: 12 Global Step: 156560 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:36,057-Speed 3059.66 samples/sec Loss 5.3574 LearningRate 0.0137 Epoch: 12 Global Step: 156570 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:39,423-Speed 3043.37 samples/sec Loss 5.3629 LearningRate 0.0137 Epoch: 12 Global Step: 156580 Fp16 Grad 
Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:42,877-Speed 2964.86 samples/sec Loss 5.3011 LearningRate 0.0137 Epoch: 12 Global Step: 156590 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:46,346-Speed 2953.09 samples/sec Loss 5.4444 LearningRate 0.0137 Epoch: 12 Global Step: 156600 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:49,755-Speed 3004.04 samples/sec Loss 5.3801 LearningRate 0.0137 Epoch: 12 Global Step: 156610 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:53,163-Speed 3006.09 samples/sec Loss 5.3837 LearningRate 0.0137 Epoch: 12 Global Step: 156620 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:56,525-Speed 3046.53 samples/sec Loss 5.4512 LearningRate 0.0137 Epoch: 12 Global Step: 156630 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:32:59,917-Speed 3019.92 samples/sec Loss 5.3192 LearningRate 0.0136 Epoch: 12 Global Step: 156640 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:03,325-Speed 3005.08 samples/sec Loss 5.3559 LearningRate 0.0136 Epoch: 12 Global Step: 156650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:33:06,755-Speed 2985.97 samples/sec Loss 5.4731 LearningRate 0.0136 Epoch: 12 Global Step: 156660 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:33:10,153-Speed 3015.03 samples/sec Loss 5.4313 LearningRate 0.0136 Epoch: 12 Global Step: 156670 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:33:13,512-Speed 3048.58 samples/sec Loss 5.3286 LearningRate 0.0136 Epoch: 12 Global Step: 156680 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:33:16,883-Speed 3038.71 samples/sec Loss 5.4696 LearningRate 0.0136 Epoch: 12 Global Step: 156690 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:33:20,272-Speed 3022.62 samples/sec Loss 5.3974 LearningRate 0.0136 Epoch: 12 Global Step: 156700 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-27 16:33:23,657-Speed 3026.01 samples/sec Loss 5.2890 LearningRate 0.0136 Epoch: 12 Global Step: 156710 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:33:26,968-Speed 3093.07 samples/sec Loss 5.3504 LearningRate 0.0136 Epoch: 12 Global Step: 156720 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:30,368-Speed 3012.79 samples/sec Loss 5.2240 LearningRate 0.0136 Epoch: 12 Global Step: 156730 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:33,718-Speed 3057.13 samples/sec Loss 5.3833 LearningRate 0.0136 Epoch: 12 Global Step: 156740 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:37,116-Speed 3015.04 samples/sec Loss 5.4316 LearningRate 0.0136 Epoch: 12 Global Step: 156750 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:40,512-Speed 3016.41 samples/sec Loss 5.3944 LearningRate 0.0136 Epoch: 12 Global Step: 156760 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:43,952-Speed 2976.65 samples/sec Loss 5.4893 LearningRate 0.0136 Epoch: 12 Global Step: 156770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:47,457-Speed 2922.64 samples/sec Loss 5.4573 LearningRate 0.0136 Epoch: 12 Global Step: 156780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:50,846-Speed 3022.65 samples/sec Loss 5.4207 LearningRate 0.0136 Epoch: 12 Global Step: 156790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:54,253-Speed 3006.81 samples/sec Loss 5.2862 LearningRate 0.0136 Epoch: 12 Global Step: 156800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:33:57,711-Speed 2961.54 samples/sec Loss 5.2999 LearningRate 0.0136 Epoch: 12 Global Step: 156810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:01,142-Speed 2985.61 samples/sec Loss 5.3855 LearningRate 0.0136 Epoch: 12 Global Step: 156820 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:34:04,520-Speed 3031.98 
samples/sec Loss 5.3476 LearningRate 0.0136 Epoch: 12 Global Step: 156830 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:34:07,888-Speed 3041.39 samples/sec Loss 5.3263 LearningRate 0.0136 Epoch: 12 Global Step: 156840 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:34:11,260-Speed 3037.96 samples/sec Loss 5.4127 LearningRate 0.0136 Epoch: 12 Global Step: 156850 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:34:14,650-Speed 3021.12 samples/sec Loss 5.3339 LearningRate 0.0136 Epoch: 12 Global Step: 156860 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:34:18,001-Speed 3057.14 samples/sec Loss 5.3685 LearningRate 0.0136 Epoch: 12 Global Step: 156870 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:21,409-Speed 3005.43 samples/sec Loss 5.3732 LearningRate 0.0136 Epoch: 12 Global Step: 156880 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:24,846-Speed 2979.82 samples/sec Loss 5.4811 LearningRate 0.0136 Epoch: 12 Global Step: 156890 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:28,245-Speed 3014.19 samples/sec Loss 5.3409 LearningRate 0.0136 Epoch: 12 Global Step: 156900 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:31,609-Speed 3044.90 samples/sec Loss 5.3607 LearningRate 0.0136 Epoch: 12 Global Step: 156910 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:34,982-Speed 3036.42 samples/sec Loss 5.2930 LearningRate 0.0136 Epoch: 12 Global Step: 156920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:38,472-Speed 2934.94 samples/sec Loss 5.3408 LearningRate 0.0136 Epoch: 12 Global Step: 156930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:41,876-Speed 3009.12 samples/sec Loss 5.3861 LearningRate 0.0136 Epoch: 12 Global Step: 156940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:45,286-Speed 3003.47 samples/sec Loss 5.3981 LearningRate 0.0136 
Epoch: 12 Global Step: 156950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:48,685-Speed 3014.10 samples/sec Loss 5.3139 LearningRate 0.0136 Epoch: 12 Global Step: 156960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:52,036-Speed 3056.08 samples/sec Loss 5.4033 LearningRate 0.0136 Epoch: 12 Global Step: 156970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:34:55,540-Speed 2923.30 samples/sec Loss 5.2675 LearningRate 0.0135 Epoch: 12 Global Step: 156980 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:34:58,924-Speed 3026.79 samples/sec Loss 5.3342 LearningRate 0.0135 Epoch: 12 Global Step: 156990 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:02,266-Speed 3065.49 samples/sec Loss 5.3845 LearningRate 0.0135 Epoch: 12 Global Step: 157000 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:05,634-Speed 3041.30 samples/sec Loss 5.3262 LearningRate 0.0135 Epoch: 12 Global Step: 157010 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:09,033-Speed 3013.69 samples/sec Loss 5.3986 LearningRate 0.0135 Epoch: 12 Global Step: 157020 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:12,429-Speed 3016.79 samples/sec Loss 5.3609 LearningRate 0.0135 Epoch: 12 Global Step: 157030 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:15,840-Speed 3003.01 samples/sec Loss 5.4691 LearningRate 0.0135 Epoch: 12 Global Step: 157040 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:19,253-Speed 3000.68 samples/sec Loss 5.3379 LearningRate 0.0135 Epoch: 12 Global Step: 157050 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:22,604-Speed 3056.94 samples/sec Loss 5.3667 LearningRate 0.0135 Epoch: 12 Global Step: 157060 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:25,935-Speed 3074.72 samples/sec Loss 5.4817 LearningRate 0.0135 Epoch: 12 Global Step: 157070 Fp16 Grad 
Scale: 16384 Required: 9 hours Training: 2022-04-27 16:35:29,252-Speed 3088.26 samples/sec Loss 5.4119 LearningRate 0.0135 Epoch: 12 Global Step: 157080 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:32,602-Speed 3057.63 samples/sec Loss 5.4418 LearningRate 0.0135 Epoch: 12 Global Step: 157090 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:36,032-Speed 2985.94 samples/sec Loss 5.3613 LearningRate 0.0135 Epoch: 12 Global Step: 157100 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:39,482-Speed 2969.72 samples/sec Loss 5.4635 LearningRate 0.0135 Epoch: 12 Global Step: 157110 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:42,912-Speed 2986.11 samples/sec Loss 5.4226 LearningRate 0.0135 Epoch: 12 Global Step: 157120 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:46,243-Speed 3077.18 samples/sec Loss 5.3695 LearningRate 0.0135 Epoch: 12 Global Step: 157130 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:49,625-Speed 3027.85 samples/sec Loss 5.3821 LearningRate 0.0135 Epoch: 12 Global Step: 157140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:52,931-Speed 3098.44 samples/sec Loss 5.3526 LearningRate 0.0135 Epoch: 12 Global Step: 157150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:56,329-Speed 3013.96 samples/sec Loss 5.3553 LearningRate 0.0135 Epoch: 12 Global Step: 157160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:35:59,774-Speed 2973.20 samples/sec Loss 5.3724 LearningRate 0.0135 Epoch: 12 Global Step: 157170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:03,281-Speed 2921.29 samples/sec Loss 5.3229 LearningRate 0.0135 Epoch: 12 Global Step: 157180 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:36:06,708-Speed 2988.29 samples/sec Loss 5.3978 LearningRate 0.0135 Epoch: 12 Global Step: 157190 Fp16 Grad Scale: 65536 Required: 9 hours Training: 
2022-04-27 16:36:10,095-Speed 3024.50 samples/sec Loss 5.2447 LearningRate 0.0135 Epoch: 12 Global Step: 157200 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:36:13,549-Speed 2966.15 samples/sec Loss 5.4167 LearningRate 0.0135 Epoch: 12 Global Step: 157210 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:36:16,927-Speed 3031.94 samples/sec Loss 5.3706 LearningRate 0.0135 Epoch: 12 Global Step: 157220 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:20,409-Speed 2941.96 samples/sec Loss 5.3512 LearningRate 0.0135 Epoch: 12 Global Step: 157230 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:23,842-Speed 2983.48 samples/sec Loss 5.4150 LearningRate 0.0135 Epoch: 12 Global Step: 157240 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:27,278-Speed 2980.41 samples/sec Loss 5.4312 LearningRate 0.0135 Epoch: 12 Global Step: 157250 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:30,645-Speed 3042.72 samples/sec Loss 5.3077 LearningRate 0.0135 Epoch: 12 Global Step: 157260 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:34,011-Speed 3042.54 samples/sec Loss 5.3708 LearningRate 0.0135 Epoch: 12 Global Step: 157270 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:36:37,334-Speed 3082.92 samples/sec Loss 5.4190 LearningRate 0.0135 Epoch: 12 Global Step: 157280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:36:40,754-Speed 2994.25 samples/sec Loss 5.3224 LearningRate 0.0135 Epoch: 12 Global Step: 157290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:36:44,116-Speed 3047.34 samples/sec Loss 5.4160 LearningRate 0.0135 Epoch: 12 Global Step: 157300 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:36:47,535-Speed 2995.26 samples/sec Loss 5.4198 LearningRate 0.0135 Epoch: 12 Global Step: 157310 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:36:50,877-Speed 3065.39 
samples/sec Loss 5.3539 LearningRate 0.0134 Epoch: 12 Global Step: 157320 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:36:54,207-Speed 3076.24 samples/sec Loss 5.3663 LearningRate 0.0134 Epoch: 12 Global Step: 157330 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:36:57,556-Speed 3057.81 samples/sec Loss 5.2778 LearningRate 0.0134 Epoch: 12 Global Step: 157340 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:37:00,867-Speed 3094.20 samples/sec Loss 5.3641 LearningRate 0.0134 Epoch: 12 Global Step: 157350 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:37:04,327-Speed 2959.81 samples/sec Loss 5.4059 LearningRate 0.0134 Epoch: 12 Global Step: 157360 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:37:07,712-Speed 3026.76 samples/sec Loss 5.4307 LearningRate 0.0134 Epoch: 12 Global Step: 157370 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:37:11,160-Speed 2970.85 samples/sec Loss 5.3385 LearningRate 0.0134 Epoch: 12 Global Step: 157380 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:14,607-Speed 2971.39 samples/sec Loss 5.4445 LearningRate 0.0134 Epoch: 12 Global Step: 157390 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:18,084-Speed 2945.95 samples/sec Loss 5.4132 LearningRate 0.0134 Epoch: 12 Global Step: 157400 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:21,471-Speed 3024.67 samples/sec Loss 5.3768 LearningRate 0.0134 Epoch: 12 Global Step: 157410 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:24,861-Speed 3021.55 samples/sec Loss 5.2778 LearningRate 0.0134 Epoch: 12 Global Step: 157420 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:28,307-Speed 2972.55 samples/sec Loss 5.3743 LearningRate 0.0134 Epoch: 12 Global Step: 157430 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:31,825-Speed 2911.43 samples/sec Loss 5.3155 LearningRate 0.0134 
Epoch: 12 Global Step: 157440 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:35,255-Speed 2986.22 samples/sec Loss 5.3533 LearningRate 0.0134 Epoch: 12 Global Step: 157450 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:38,741-Speed 2937.67 samples/sec Loss 5.5072 LearningRate 0.0134 Epoch: 12 Global Step: 157460 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:42,130-Speed 3022.45 samples/sec Loss 5.3156 LearningRate 0.0134 Epoch: 12 Global Step: 157470 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:37:45,520-Speed 3021.56 samples/sec Loss 5.4229 LearningRate 0.0134 Epoch: 12 Global Step: 157480 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:37:48,916-Speed 3015.43 samples/sec Loss 5.4004 LearningRate 0.0134 Epoch: 12 Global Step: 157490 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:37:52,331-Speed 2999.45 samples/sec Loss 5.3579 LearningRate 0.0134 Epoch: 12 Global Step: 157500 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:37:55,731-Speed 3012.70 samples/sec Loss 5.3530 LearningRate 0.0134 Epoch: 12 Global Step: 157510 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:37:59,099-Speed 3040.92 samples/sec Loss 5.3184 LearningRate 0.0134 Epoch: 12 Global Step: 157520 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:02,452-Speed 3055.66 samples/sec Loss 5.3337 LearningRate 0.0134 Epoch: 12 Global Step: 157530 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:05,800-Speed 3058.85 samples/sec Loss 5.2714 LearningRate 0.0134 Epoch: 12 Global Step: 157540 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:09,164-Speed 3044.67 samples/sec Loss 5.3866 LearningRate 0.0134 Epoch: 12 Global Step: 157550 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:12,630-Speed 2955.63 samples/sec Loss 5.2982 LearningRate 0.0134 Epoch: 12 Global Step: 157560 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:15,997-Speed 3043.11 samples/sec Loss 5.3790 LearningRate 0.0134 Epoch: 12 Global Step: 157570 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:19,363-Speed 3042.43 samples/sec Loss 5.4263 LearningRate 0.0134 Epoch: 12 Global Step: 157580 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:22,769-Speed 3007.78 samples/sec Loss 5.4723 LearningRate 0.0134 Epoch: 12 Global Step: 157590 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:26,132-Speed 3045.84 samples/sec Loss 5.3524 LearningRate 0.0134 Epoch: 12 Global Step: 157600 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:29,593-Speed 2959.36 samples/sec Loss 5.3218 LearningRate 0.0134 Epoch: 12 Global Step: 157610 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:32,973-Speed 3030.55 samples/sec Loss 5.3827 LearningRate 0.0134 Epoch: 12 Global Step: 157620 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:36,344-Speed 3038.55 samples/sec Loss 5.2849 LearningRate 0.0134 Epoch: 12 Global Step: 157630 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:39,699-Speed 3053.33 samples/sec Loss 5.3593 LearningRate 0.0134 Epoch: 12 Global Step: 157640 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:43,035-Speed 3070.35 samples/sec Loss 5.3134 LearningRate 0.0134 Epoch: 12 Global Step: 157650 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:38:46,406-Speed 3038.96 samples/sec Loss 5.3468 LearningRate 0.0133 Epoch: 12 Global Step: 157660 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:38:49,828-Speed 2993.69 samples/sec Loss 5.3285 LearningRate 0.0133 Epoch: 12 Global Step: 157670 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:38:53,162-Speed 3071.76 samples/sec Loss 5.3569 LearningRate 0.0133 Epoch: 12 Global Step: 157680 Fp16 Grad Scale: 16384 Required: 9 hours Training: 
2022-04-27 16:38:56,501-Speed 3067.48 samples/sec Loss 5.3156 LearningRate 0.0133 Epoch: 12 Global Step: 157690 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:38:59,849-Speed 3059.36 samples/sec Loss 5.4169 LearningRate 0.0133 Epoch: 12 Global Step: 157700 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:03,166-Speed 3088.89 samples/sec Loss 5.3844 LearningRate 0.0133 Epoch: 12 Global Step: 157710 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:06,581-Speed 2998.91 samples/sec Loss 5.3035 LearningRate 0.0133 Epoch: 12 Global Step: 157720 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:09,928-Speed 3059.85 samples/sec Loss 5.4197 LearningRate 0.0133 Epoch: 12 Global Step: 157730 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:13,258-Speed 3076.52 samples/sec Loss 5.3115 LearningRate 0.0133 Epoch: 12 Global Step: 157740 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:16,571-Speed 3091.09 samples/sec Loss 5.3699 LearningRate 0.0133 Epoch: 12 Global Step: 157750 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:19,938-Speed 3042.25 samples/sec Loss 5.4316 LearningRate 0.0133 Epoch: 12 Global Step: 157760 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:39:23,270-Speed 3074.11 samples/sec Loss 5.3407 LearningRate 0.0133 Epoch: 12 Global Step: 157770 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:26,695-Speed 2990.69 samples/sec Loss 5.3722 LearningRate 0.0133 Epoch: 12 Global Step: 157780 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:30,128-Speed 2983.46 samples/sec Loss 5.4447 LearningRate 0.0133 Epoch: 12 Global Step: 157790 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:33,494-Speed 3043.55 samples/sec Loss 5.3455 LearningRate 0.0133 Epoch: 12 Global Step: 157800 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:36,877-Speed 3027.72 
samples/sec Loss 5.2174 LearningRate 0.0133 Epoch: 12 Global Step: 157810 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:40,327-Speed 2968.76 samples/sec Loss 5.3861 LearningRate 0.0133 Epoch: 12 Global Step: 157820 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:43,744-Speed 2997.28 samples/sec Loss 5.3422 LearningRate 0.0133 Epoch: 12 Global Step: 157830 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:47,228-Speed 2940.46 samples/sec Loss 5.2859 LearningRate 0.0133 Epoch: 12 Global Step: 157840 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:50,585-Speed 3051.32 samples/sec Loss 5.2802 LearningRate 0.0133 Epoch: 12 Global Step: 157850 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:53,929-Speed 3062.69 samples/sec Loss 5.2938 LearningRate 0.0133 Epoch: 12 Global Step: 157860 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:39:57,318-Speed 3022.05 samples/sec Loss 5.3155 LearningRate 0.0133 Epoch: 12 Global Step: 157870 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:00,671-Speed 3054.83 samples/sec Loss 5.4006 LearningRate 0.0133 Epoch: 12 Global Step: 157880 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:04,069-Speed 3014.74 samples/sec Loss 5.3266 LearningRate 0.0133 Epoch: 12 Global Step: 157890 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:07,376-Speed 3096.84 samples/sec Loss 5.2917 LearningRate 0.0133 Epoch: 12 Global Step: 157900 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:10,692-Speed 3089.43 samples/sec Loss 5.3989 LearningRate 0.0133 Epoch: 12 Global Step: 157910 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:13,990-Speed 3105.88 samples/sec Loss 5.3053 LearningRate 0.0133 Epoch: 12 Global Step: 157920 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:17,389-Speed 3013.79 samples/sec Loss 5.3001 LearningRate 0.0133 
Epoch: 12 Global Step: 157930 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:20,732-Speed 3064.24 samples/sec Loss 5.3249 LearningRate 0.0133 Epoch: 12 Global Step: 157940 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:24,199-Speed 2954.06 samples/sec Loss 5.3183 LearningRate 0.0133 Epoch: 12 Global Step: 157950 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:27,564-Speed 3044.51 samples/sec Loss 5.3826 LearningRate 0.0133 Epoch: 12 Global Step: 157960 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:30,946-Speed 3029.04 samples/sec Loss 5.3085 LearningRate 0.0133 Epoch: 12 Global Step: 157970 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:34,328-Speed 3028.21 samples/sec Loss 5.3205 LearningRate 0.0133 Epoch: 12 Global Step: 157980 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:37,722-Speed 3018.21 samples/sec Loss 5.3364 LearningRate 0.0133 Epoch: 12 Global Step: 157990 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:41,028-Speed 3098.07 samples/sec Loss 5.3887 LearningRate 0.0132 Epoch: 12 Global Step: 158000 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:44,358-Speed 3075.95 samples/sec Loss 5.4714 LearningRate 0.0132 Epoch: 12 Global Step: 158010 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:40:47,707-Speed 3058.13 samples/sec Loss 5.4390 LearningRate 0.0132 Epoch: 12 Global Step: 158020 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:51,080-Speed 3036.74 samples/sec Loss 5.3578 LearningRate 0.0132 Epoch: 12 Global Step: 158030 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:54,476-Speed 3015.80 samples/sec Loss 5.2621 LearningRate 0.0132 Epoch: 12 Global Step: 158040 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:40:57,808-Speed 3074.52 samples/sec Loss 5.3965 LearningRate 0.0132 Epoch: 12 Global Step: 158050 Fp16 Grad 
Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:01,169-Speed 3047.72 samples/sec Loss 5.4031 LearningRate 0.0132 Epoch: 12 Global Step: 158060 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:04,556-Speed 3023.79 samples/sec Loss 5.3332 LearningRate 0.0132 Epoch: 12 Global Step: 158070 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:07,946-Speed 3022.07 samples/sec Loss 5.2881 LearningRate 0.0132 Epoch: 12 Global Step: 158080 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:11,312-Speed 3042.57 samples/sec Loss 5.3132 LearningRate 0.0132 Epoch: 12 Global Step: 158090 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:14,622-Speed 3094.90 samples/sec Loss 5.3371 LearningRate 0.0132 Epoch: 12 Global Step: 158100 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:17,985-Speed 3045.32 samples/sec Loss 5.4115 LearningRate 0.0132 Epoch: 12 Global Step: 158110 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:21,361-Speed 3033.67 samples/sec Loss 5.2717 LearningRate 0.0132 Epoch: 12 Global Step: 158120 Fp16 Grad Scale: 131072 Required: 9 hours Training: 2022-04-27 16:41:24,725-Speed 3045.19 samples/sec Loss 5.3798 LearningRate 0.0132 Epoch: 12 Global Step: 158130 Fp16 Grad Scale: 65536 Required: 9 hours Training: 2022-04-27 16:41:28,068-Speed 3064.42 samples/sec Loss 5.3166 LearningRate 0.0132 Epoch: 12 Global Step: 158140 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:41:31,407-Speed 3067.41 samples/sec Loss 5.3668 LearningRate 0.0132 Epoch: 12 Global Step: 158150 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:41:34,774-Speed 3041.82 samples/sec Loss 5.4530 LearningRate 0.0132 Epoch: 12 Global Step: 158160 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:41:38,201-Speed 2989.39 samples/sec Loss 5.3611 LearningRate 0.0132 Epoch: 12 Global Step: 158170 Fp16 Grad Scale: 32768 Required: 9 hours Training: 
2022-04-27 16:41:41,514-Speed 3091.28 samples/sec Loss 5.4518 LearningRate 0.0132 Epoch: 12 Global Step: 158180 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:41:44,880-Speed 3043.15 samples/sec Loss 5.5063 LearningRate 0.0132 Epoch: 12 Global Step: 158190 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:41:48,283-Speed 3010.29 samples/sec Loss 5.3838 LearningRate 0.0132 Epoch: 12 Global Step: 158200 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:41:51,622-Speed 3067.18 samples/sec Loss 5.2853 LearningRate 0.0132 Epoch: 12 Global Step: 158210 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:41:55,040-Speed 2996.63 samples/sec Loss 5.3302 LearningRate 0.0132 Epoch: 12 Global Step: 158220 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:41:58,452-Speed 3002.59 samples/sec Loss 5.3609 LearningRate 0.0132 Epoch: 12 Global Step: 158230 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:01,825-Speed 3036.81 samples/sec Loss 5.4719 LearningRate 0.0132 Epoch: 12 Global Step: 158240 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:05,245-Speed 2995.22 samples/sec Loss 5.3522 LearningRate 0.0132 Epoch: 12 Global Step: 158250 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:08,629-Speed 3027.18 samples/sec Loss 5.2450 LearningRate 0.0132 Epoch: 12 Global Step: 158260 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:12,022-Speed 3018.41 samples/sec Loss 5.1341 LearningRate 0.0132 Epoch: 12 Global Step: 158270 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:15,437-Speed 2999.13 samples/sec Loss 5.3154 LearningRate 0.0132 Epoch: 12 Global Step: 158280 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:18,833-Speed 3016.13 samples/sec Loss 5.2673 LearningRate 0.0132 Epoch: 12 Global Step: 158290 Fp16 Grad Scale: 16384 Required: 9 hours Training: 2022-04-27 16:42:22,280-Speed 2971.59 
samples/sec Loss 5.3909 LearningRate 0.0132 Epoch: 12 Global Step: 158300 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:42:25,731-Speed 2967.77 samples/sec Loss 5.4237 LearningRate 0.0132 Epoch: 12 Global Step: 158310 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:42:29,116-Speed 3026.14 samples/sec Loss 5.3144 LearningRate 0.0132 Epoch: 12 Global Step: 158320 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:42:32,466-Speed 3057.96 samples/sec Loss 5.3130 LearningRate 0.0132 Epoch: 12 Global Step: 158330 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:42:35,878-Speed 3002.10 samples/sec Loss 5.2503 LearningRate 0.0131 Epoch: 12 Global Step: 158340 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:42:39,303-Speed 2989.78 samples/sec Loss 5.4294 LearningRate 0.0131 Epoch: 12 Global Step: 158350 Fp16 Grad Scale: 32768 Required: 9 hours Training: 2022-04-27 16:42:42,676-Speed 3037.30 samples/sec Loss 5.4235 LearningRate 0.0131 Epoch: 12 Global Step: 158360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:42:46,072-Speed 3016.08 samples/sec Loss 5.3355 LearningRate 0.0131 Epoch: 12 Global Step: 158370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:42:49,482-Speed 3003.99 samples/sec Loss 5.2999 LearningRate 0.0131 Epoch: 12 Global Step: 158380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:42:52,868-Speed 3024.97 samples/sec Loss 5.2665 LearningRate 0.0131 Epoch: 12 Global Step: 158390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:42:56,184-Speed 3089.03 samples/sec Loss 5.4195 LearningRate 0.0131 Epoch: 12 Global Step: 158400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:42:59,547-Speed 3046.01 samples/sec Loss 5.4017 LearningRate 0.0131 Epoch: 12 Global Step: 158410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:02,918-Speed 3038.51 samples/sec Loss 5.2725 LearningRate 0.0131 
Epoch: 12 Global Step: 158420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:06,324-Speed 3007.18 samples/sec Loss 5.2924 LearningRate 0.0131 Epoch: 12 Global Step: 158430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:09,653-Speed 3076.48 samples/sec Loss 5.2845 LearningRate 0.0131 Epoch: 12 Global Step: 158440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:13,079-Speed 2990.19 samples/sec Loss 5.3810 LearningRate 0.0131 Epoch: 12 Global Step: 158450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:16,401-Speed 3083.73 samples/sec Loss 5.3936 LearningRate 0.0131 Epoch: 12 Global Step: 158460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:19,750-Speed 3058.31 samples/sec Loss 5.4251 LearningRate 0.0131 Epoch: 12 Global Step: 158470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:43:23,225-Speed 2948.14 samples/sec Loss 5.3470 LearningRate 0.0131 Epoch: 12 Global Step: 158480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:26,652-Speed 2988.30 samples/sec Loss 5.3146 LearningRate 0.0131 Epoch: 12 Global Step: 158490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:30,065-Speed 3001.40 samples/sec Loss 5.3445 LearningRate 0.0131 Epoch: 12 Global Step: 158500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:33,413-Speed 3060.15 samples/sec Loss 5.3493 LearningRate 0.0131 Epoch: 12 Global Step: 158510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:36,740-Speed 3078.21 samples/sec Loss 5.4217 LearningRate 0.0131 Epoch: 12 Global Step: 158520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:40,096-Speed 3052.75 samples/sec Loss 5.3179 LearningRate 0.0131 Epoch: 12 Global Step: 158530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:43,490-Speed 3017.88 samples/sec Loss 5.3309 LearningRate 0.0131 Epoch: 12 Global Step: 158540 Fp16 Grad 
Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:46,855-Speed 3043.64 samples/sec Loss 5.3363 LearningRate 0.0131 Epoch: 12 Global Step: 158550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:50,305-Speed 2968.82 samples/sec Loss 5.3963 LearningRate 0.0131 Epoch: 12 Global Step: 158560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:53,767-Speed 2958.39 samples/sec Loss 5.3635 LearningRate 0.0131 Epoch: 12 Global Step: 158570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:43:57,176-Speed 3005.03 samples/sec Loss 5.3173 LearningRate 0.0131 Epoch: 12 Global Step: 158580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:00,588-Speed 3002.18 samples/sec Loss 5.4170 LearningRate 0.0131 Epoch: 12 Global Step: 158590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:03,991-Speed 3009.92 samples/sec Loss 5.3062 LearningRate 0.0131 Epoch: 12 Global Step: 158600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:07,352-Speed 3047.39 samples/sec Loss 5.3479 LearningRate 0.0131 Epoch: 12 Global Step: 158610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:10,750-Speed 3014.86 samples/sec Loss 5.3501 LearningRate 0.0131 Epoch: 12 Global Step: 158620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:14,117-Speed 3042.27 samples/sec Loss 5.3927 LearningRate 0.0131 Epoch: 12 Global Step: 158630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:17,447-Speed 3075.67 samples/sec Loss 5.3264 LearningRate 0.0131 Epoch: 12 Global Step: 158640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:20,781-Speed 3072.23 samples/sec Loss 5.2570 LearningRate 0.0131 Epoch: 12 Global Step: 158650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:24,129-Speed 3059.17 samples/sec Loss 5.3245 LearningRate 0.0131 Epoch: 12 Global Step: 158660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 
2022-04-27 16:44:27,453-Speed 3081.95 samples/sec Loss 5.4010 LearningRate 0.0131 Epoch: 12 Global Step: 158670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:30,783-Speed 3075.96 samples/sec Loss 5.3607 LearningRate 0.0130 Epoch: 12 Global Step: 158680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:34,139-Speed 3052.12 samples/sec Loss 5.2628 LearningRate 0.0130 Epoch: 12 Global Step: 158690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:44:37,520-Speed 3029.07 samples/sec Loss 5.2612 LearningRate 0.0130 Epoch: 12 Global Step: 158700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:40,873-Speed 3054.67 samples/sec Loss 5.2891 LearningRate 0.0130 Epoch: 12 Global Step: 158710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:44,219-Speed 3061.56 samples/sec Loss 5.3134 LearningRate 0.0130 Epoch: 12 Global Step: 158720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:47,551-Speed 3073.77 samples/sec Loss 5.2033 LearningRate 0.0130 Epoch: 12 Global Step: 158730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:50,935-Speed 3026.79 samples/sec Loss 5.2678 LearningRate 0.0130 Epoch: 12 Global Step: 158740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:54,308-Speed 3037.03 samples/sec Loss 5.3293 LearningRate 0.0130 Epoch: 12 Global Step: 158750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:44:57,679-Speed 3038.72 samples/sec Loss 5.3228 LearningRate 0.0130 Epoch: 12 Global Step: 158760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:01,022-Speed 3064.16 samples/sec Loss 5.3782 LearningRate 0.0130 Epoch: 12 Global Step: 158770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:04,369-Speed 3059.55 samples/sec Loss 5.3414 LearningRate 0.0130 Epoch: 12 Global Step: 158780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:07,704-Speed 3071.30 
samples/sec Loss 5.2893 LearningRate 0.0130 Epoch: 12 Global Step: 158790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:11,080-Speed 3034.41 samples/sec Loss 5.2746 LearningRate 0.0130 Epoch: 12 Global Step: 158800 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 16:45:14,450-Speed 3039.17 samples/sec Loss 5.3348 LearningRate 0.0130 Epoch: 12 Global Step: 158810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:17,861-Speed 3003.18 samples/sec Loss 5.2939 LearningRate 0.0130 Epoch: 12 Global Step: 158820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:21,217-Speed 3052.16 samples/sec Loss 5.3966 LearningRate 0.0130 Epoch: 12 Global Step: 158830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:24,594-Speed 3032.96 samples/sec Loss 5.2533 LearningRate 0.0130 Epoch: 12 Global Step: 158840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:28,020-Speed 2989.51 samples/sec Loss 5.3283 LearningRate 0.0130 Epoch: 12 Global Step: 158850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:31,409-Speed 3023.21 samples/sec Loss 5.2418 LearningRate 0.0130 Epoch: 12 Global Step: 158860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:34,765-Speed 3051.87 samples/sec Loss 5.2442 LearningRate 0.0130 Epoch: 12 Global Step: 158870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:38,117-Speed 3055.21 samples/sec Loss 5.3658 LearningRate 0.0130 Epoch: 12 Global Step: 158880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:41,449-Speed 3074.00 samples/sec Loss 5.3157 LearningRate 0.0130 Epoch: 12 Global Step: 158890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:44,805-Speed 3052.42 samples/sec Loss 5.4485 LearningRate 0.0130 Epoch: 12 Global Step: 158900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:48,233-Speed 2988.60 samples/sec Loss 5.1363 LearningRate 
0.0130 Epoch: 12 Global Step: 158910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:51,733-Speed 2925.76 samples/sec Loss 5.3728 LearningRate 0.0130 Epoch: 12 Global Step: 158920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:45:55,079-Speed 3061.64 samples/sec Loss 5.3284 LearningRate 0.0130 Epoch: 12 Global Step: 158930 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:45:58,448-Speed 3040.32 samples/sec Loss 5.3435 LearningRate 0.0130 Epoch: 12 Global Step: 158940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:46:01,833-Speed 3027.04 samples/sec Loss 5.2959 LearningRate 0.0130 Epoch: 12 Global Step: 158950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:05,244-Speed 3002.44 samples/sec Loss 5.4041 LearningRate 0.0130 Epoch: 12 Global Step: 158960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:08,595-Speed 3057.03 samples/sec Loss 5.2753 LearningRate 0.0130 Epoch: 12 Global Step: 158970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:11,959-Speed 3044.60 samples/sec Loss 5.3071 LearningRate 0.0130 Epoch: 12 Global Step: 158980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:15,356-Speed 3016.10 samples/sec Loss 5.2995 LearningRate 0.0130 Epoch: 12 Global Step: 158990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:18,697-Speed 3065.54 samples/sec Loss 5.3621 LearningRate 0.0130 Epoch: 12 Global Step: 159000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:22,047-Speed 3057.62 samples/sec Loss 5.3599 LearningRate 0.0130 Epoch: 12 Global Step: 159010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:25,421-Speed 3035.93 samples/sec Loss 5.2790 LearningRate 0.0130 Epoch: 12 Global Step: 159020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:28,844-Speed 2992.35 samples/sec Loss 5.2553 LearningRate 0.0129 Epoch: 12 Global Step: 159030 Fp16 
Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:32,173-Speed 3076.07 samples/sec Loss 5.3896 LearningRate 0.0129 Epoch: 12 Global Step: 159040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:35,501-Speed 3077.91 samples/sec Loss 5.3030 LearningRate 0.0129 Epoch: 12 Global Step: 159050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:46:38,878-Speed 3033.54 samples/sec Loss 5.3882 LearningRate 0.0129 Epoch: 12 Global Step: 159060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:46:42,236-Speed 3050.39 samples/sec Loss 5.2840 LearningRate 0.0129 Epoch: 12 Global Step: 159070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:46:45,609-Speed 3036.76 samples/sec Loss 5.3894 LearningRate 0.0129 Epoch: 12 Global Step: 159080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:46:49,003-Speed 3018.20 samples/sec Loss 5.2752 LearningRate 0.0129 Epoch: 12 Global Step: 159090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:52,426-Speed 2991.79 samples/sec Loss 5.3726 LearningRate 0.0129 Epoch: 12 Global Step: 159100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:55,859-Speed 2983.58 samples/sec Loss 5.3393 LearningRate 0.0129 Epoch: 12 Global Step: 159110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:46:59,275-Speed 2999.09 samples/sec Loss 5.3995 LearningRate 0.0129 Epoch: 12 Global Step: 159120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:02,725-Speed 2969.04 samples/sec Loss 5.4260 LearningRate 0.0129 Epoch: 12 Global Step: 159130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:06,131-Speed 3006.56 samples/sec Loss 5.3328 LearningRate 0.0129 Epoch: 12 Global Step: 159140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:09,550-Speed 2996.52 samples/sec Loss 5.2560 LearningRate 0.0129 Epoch: 12 Global Step: 159150 Fp16 Grad Scale: 32768 Required: 8 hours 
Training: 2022-04-27 16:47:12,900-Speed 3057.51 samples/sec Loss 5.3790 LearningRate 0.0129 Epoch: 12 Global Step: 159160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:16,282-Speed 3028.28 samples/sec Loss 5.2858 LearningRate 0.0129 Epoch: 12 Global Step: 159170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:19,717-Speed 2981.79 samples/sec Loss 5.2265 LearningRate 0.0129 Epoch: 12 Global Step: 159180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:23,161-Speed 2974.14 samples/sec Loss 5.3012 LearningRate 0.0129 Epoch: 12 Global Step: 159190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:47:26,574-Speed 3001.16 samples/sec Loss 5.2143 LearningRate 0.0129 Epoch: 12 Global Step: 159200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:29,945-Speed 3038.55 samples/sec Loss 5.2556 LearningRate 0.0129 Epoch: 12 Global Step: 159210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:33,279-Speed 3072.34 samples/sec Loss 5.3436 LearningRate 0.0129 Epoch: 12 Global Step: 159220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:36,720-Speed 2976.38 samples/sec Loss 5.2586 LearningRate 0.0129 Epoch: 12 Global Step: 159230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:40,143-Speed 2992.25 samples/sec Loss 5.2739 LearningRate 0.0129 Epoch: 12 Global Step: 159240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:43,537-Speed 3018.23 samples/sec Loss 5.3831 LearningRate 0.0129 Epoch: 12 Global Step: 159250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:47:46,915-Speed 3032.34 samples/sec Loss 5.2754 LearningRate 0.0129 Epoch: 12 Global Step: 159260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:47:50,413-Speed 2927.96 samples/sec Loss 5.2681 LearningRate 0.0129 Epoch: 12 Global Step: 159270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:47:53,829-Speed 
2998.78 samples/sec Loss 5.2479 LearningRate 0.0129 Epoch: 12 Global Step: 159280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:47:57,157-Speed 3078.02 samples/sec Loss 5.3280 LearningRate 0.0129 Epoch: 12 Global Step: 159290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:00,551-Speed 3018.19 samples/sec Loss 5.3287 LearningRate 0.0129 Epoch: 12 Global Step: 159300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:04,021-Speed 2951.86 samples/sec Loss 5.2642 LearningRate 0.0129 Epoch: 12 Global Step: 159310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:07,377-Speed 3052.31 samples/sec Loss 5.1949 LearningRate 0.0129 Epoch: 12 Global Step: 159320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:10,836-Speed 2960.40 samples/sec Loss 5.3685 LearningRate 0.0129 Epoch: 12 Global Step: 159330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:14,243-Speed 3006.98 samples/sec Loss 5.2950 LearningRate 0.0129 Epoch: 12 Global Step: 159340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:17,722-Speed 2943.56 samples/sec Loss 5.3772 LearningRate 0.0129 Epoch: 12 Global Step: 159350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:48:21,142-Speed 2994.99 samples/sec Loss 5.2943 LearningRate 0.0129 Epoch: 12 Global Step: 159360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:24,552-Speed 3003.71 samples/sec Loss 5.4783 LearningRate 0.0128 Epoch: 12 Global Step: 159370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:28,015-Speed 2958.45 samples/sec Loss 5.3242 LearningRate 0.0128 Epoch: 12 Global Step: 159380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:31,426-Speed 3002.57 samples/sec Loss 5.3021 LearningRate 0.0128 Epoch: 12 Global Step: 159390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:34,861-Speed 2982.09 samples/sec Loss 5.3285 
LearningRate 0.0128 Epoch: 12 Global Step: 159400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:38,271-Speed 3003.89 samples/sec Loss 5.3028 LearningRate 0.0128 Epoch: 12 Global Step: 159410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:41,690-Speed 2995.98 samples/sec Loss 5.3000 LearningRate 0.0128 Epoch: 12 Global Step: 159420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:45,013-Speed 3082.31 samples/sec Loss 5.2999 LearningRate 0.0128 Epoch: 12 Global Step: 159430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:48,341-Speed 3077.41 samples/sec Loss 5.2947 LearningRate 0.0128 Epoch: 12 Global Step: 159440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:51,757-Speed 2998.54 samples/sec Loss 5.2686 LearningRate 0.0128 Epoch: 12 Global Step: 159450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:48:55,134-Speed 3032.98 samples/sec Loss 5.2341 LearningRate 0.0128 Epoch: 12 Global Step: 159460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:48:58,558-Speed 2992.10 samples/sec Loss 5.3058 LearningRate 0.0128 Epoch: 12 Global Step: 159470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:01,987-Speed 2987.59 samples/sec Loss 5.2178 LearningRate 0.0128 Epoch: 12 Global Step: 159480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:05,372-Speed 3026.15 samples/sec Loss 5.3643 LearningRate 0.0128 Epoch: 12 Global Step: 159490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:08,792-Speed 2994.71 samples/sec Loss 5.2300 LearningRate 0.0128 Epoch: 12 Global Step: 159500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:12,189-Speed 3015.22 samples/sec Loss 5.2969 LearningRate 0.0128 Epoch: 12 Global Step: 159510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:15,564-Speed 3034.58 samples/sec Loss 5.2979 LearningRate 0.0128 Epoch: 12 Global Step: 
159520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:18,914-Speed 3058.11 samples/sec Loss 5.2574 LearningRate 0.0128 Epoch: 12 Global Step: 159530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:22,281-Speed 3041.90 samples/sec Loss 5.3201 LearningRate 0.0128 Epoch: 12 Global Step: 159540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:25,619-Speed 3069.37 samples/sec Loss 5.2040 LearningRate 0.0128 Epoch: 12 Global Step: 159550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:29,015-Speed 3016.49 samples/sec Loss 5.3873 LearningRate 0.0128 Epoch: 12 Global Step: 159560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:32,478-Speed 2957.71 samples/sec Loss 5.3009 LearningRate 0.0128 Epoch: 12 Global Step: 159570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:35,837-Speed 3049.38 samples/sec Loss 5.2961 LearningRate 0.0128 Epoch: 12 Global Step: 159580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:39,242-Speed 3008.30 samples/sec Loss 5.3132 LearningRate 0.0128 Epoch: 12 Global Step: 159590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:42,638-Speed 3016.24 samples/sec Loss 5.4018 LearningRate 0.0128 Epoch: 12 Global Step: 159600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:45,983-Speed 3062.13 samples/sec Loss 5.2505 LearningRate 0.0128 Epoch: 12 Global Step: 159610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:49,406-Speed 2992.08 samples/sec Loss 5.3003 LearningRate 0.0128 Epoch: 12 Global Step: 159620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:49:52,742-Speed 3071.07 samples/sec Loss 5.3375 LearningRate 0.0128 Epoch: 12 Global Step: 159630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:49:56,146-Speed 3008.49 samples/sec Loss 5.2946 LearningRate 0.0128 Epoch: 12 Global Step: 159640 Fp16 Grad Scale: 65536 Required: 8 
hours Training: 2022-04-27 16:49:59,524-Speed 3032.74 samples/sec Loss 5.1931 LearningRate 0.0128 Epoch: 12 Global Step: 159650 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:50:02,846-Speed 3083.19 samples/sec Loss 5.3752 LearningRate 0.0128 Epoch: 12 Global Step: 159660 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:50:06,253-Speed 3005.98 samples/sec Loss 5.3216 LearningRate 0.0128 Epoch: 12 Global Step: 159670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:09,651-Speed 3014.50 samples/sec Loss 5.3708 LearningRate 0.0128 Epoch: 12 Global Step: 159680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:13,006-Speed 3052.73 samples/sec Loss 5.2627 LearningRate 0.0128 Epoch: 12 Global Step: 159690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:16,405-Speed 3014.44 samples/sec Loss 5.3166 LearningRate 0.0128 Epoch: 12 Global Step: 159700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:19,880-Speed 2947.45 samples/sec Loss 5.4503 LearningRate 0.0128 Epoch: 12 Global Step: 159710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:23,337-Speed 2962.91 samples/sec Loss 5.3062 LearningRate 0.0127 Epoch: 12 Global Step: 159720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:26,713-Speed 3034.56 samples/sec Loss 5.3534 LearningRate 0.0127 Epoch: 12 Global Step: 159730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:30,127-Speed 2999.84 samples/sec Loss 5.2911 LearningRate 0.0127 Epoch: 12 Global Step: 159740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:33,531-Speed 3009.48 samples/sec Loss 5.3521 LearningRate 0.0127 Epoch: 12 Global Step: 159750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:36,982-Speed 2968.99 samples/sec Loss 5.2322 LearningRate 0.0127 Epoch: 12 Global Step: 159760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
16:50:40,404-Speed 2992.77 samples/sec Loss 5.3075 LearningRate 0.0127 Epoch: 12 Global Step: 159770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:50:43,801-Speed 3015.85 samples/sec Loss 5.1583 LearningRate 0.0127 Epoch: 12 Global Step: 159780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:47,142-Speed 3066.61 samples/sec Loss 5.4086 LearningRate 0.0127 Epoch: 12 Global Step: 159790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:50,510-Speed 3040.85 samples/sec Loss 5.2788 LearningRate 0.0127 Epoch: 12 Global Step: 159800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:53,942-Speed 2984.74 samples/sec Loss 5.2218 LearningRate 0.0127 Epoch: 12 Global Step: 159810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:50:57,305-Speed 3045.69 samples/sec Loss 5.4152 LearningRate 0.0127 Epoch: 12 Global Step: 159820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:00,722-Speed 2997.42 samples/sec Loss 5.1722 LearningRate 0.0127 Epoch: 12 Global Step: 159830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:04,177-Speed 2964.84 samples/sec Loss 5.3646 LearningRate 0.0127 Epoch: 12 Global Step: 159840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:07,550-Speed 3037.00 samples/sec Loss 5.1962 LearningRate 0.0127 Epoch: 12 Global Step: 159850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:10,922-Speed 3037.28 samples/sec Loss 5.2182 LearningRate 0.0127 Epoch: 12 Global Step: 159860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:14,293-Speed 3038.03 samples/sec Loss 5.2638 LearningRate 0.0127 Epoch: 12 Global Step: 159870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:17,639-Speed 3062.08 samples/sec Loss 5.2784 LearningRate 0.0127 Epoch: 12 Global Step: 159880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:51:21,008-Speed 3039.63 samples/sec Loss 
5.1765 LearningRate 0.0127 Epoch: 12 Global Step: 159890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:24,473-Speed 2956.23 samples/sec Loss 5.3096 LearningRate 0.0127 Epoch: 12 Global Step: 159900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:27,863-Speed 3021.80 samples/sec Loss 5.1860 LearningRate 0.0127 Epoch: 12 Global Step: 159910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:31,390-Speed 2903.74 samples/sec Loss 5.2645 LearningRate 0.0127 Epoch: 12 Global Step: 159920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:34,812-Speed 2993.67 samples/sec Loss 5.2301 LearningRate 0.0127 Epoch: 12 Global Step: 159930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:38,175-Speed 3045.42 samples/sec Loss 5.2935 LearningRate 0.0127 Epoch: 12 Global Step: 159940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:41,554-Speed 3031.53 samples/sec Loss 5.2531 LearningRate 0.0127 Epoch: 12 Global Step: 159950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:44,983-Speed 2987.41 samples/sec Loss 5.3530 LearningRate 0.0127 Epoch: 12 Global Step: 159960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:48,453-Speed 2951.31 samples/sec Loss 5.4850 LearningRate 0.0127 Epoch: 12 Global Step: 159970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:51,894-Speed 2976.97 samples/sec Loss 5.2629 LearningRate 0.0127 Epoch: 12 Global Step: 159980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:51:55,352-Speed 2962.39 samples/sec Loss 5.3039 LearningRate 0.0127 Epoch: 12 Global Step: 159990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:51:58,826-Speed 2948.40 samples/sec Loss 5.2154 LearningRate 0.0127 Epoch: 12 Global Step: 160000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:02,192-Speed 3043.57 samples/sec Loss 5.2168 LearningRate 0.0127 Epoch: 12 Global 
Step: 160010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:05,671-Speed 2944.15 samples/sec Loss 5.3403 LearningRate 0.0127 Epoch: 12 Global Step: 160020 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:09,045-Speed 3035.87 samples/sec Loss 5.2565 LearningRate 0.0127 Epoch: 12 Global Step: 160030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:12,498-Speed 2966.50 samples/sec Loss 5.1999 LearningRate 0.0127 Epoch: 12 Global Step: 160040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:15,869-Speed 3038.07 samples/sec Loss 5.2838 LearningRate 0.0127 Epoch: 12 Global Step: 160050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:19,269-Speed 3012.91 samples/sec Loss 5.2244 LearningRate 0.0127 Epoch: 12 Global Step: 160060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:22,612-Speed 3063.92 samples/sec Loss 5.2179 LearningRate 0.0126 Epoch: 12 Global Step: 160070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:26,003-Speed 3020.38 samples/sec Loss 5.2597 LearningRate 0.0126 Epoch: 12 Global Step: 160080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:29,356-Speed 3055.23 samples/sec Loss 5.3101 LearningRate 0.0126 Epoch: 12 Global Step: 160090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:32,728-Speed 3038.06 samples/sec Loss 5.3306 LearningRate 0.0126 Epoch: 12 Global Step: 160100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:36,111-Speed 3026.91 samples/sec Loss 5.3281 LearningRate 0.0126 Epoch: 12 Global Step: 160110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:39,467-Speed 3052.23 samples/sec Loss 5.2827 LearningRate 0.0126 Epoch: 12 Global Step: 160120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:52:42,802-Speed 3072.01 samples/sec Loss 5.3202 LearningRate 0.0126 Epoch: 12 Global Step: 160130 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 16:52:46,243-Speed 2976.37 samples/sec Loss 5.3372 LearningRate 0.0126 Epoch: 12 Global Step: 160140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:49,621-Speed 3031.81 samples/sec Loss 5.3343 LearningRate 0.0126 Epoch: 12 Global Step: 160150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:53,065-Speed 2974.46 samples/sec Loss 5.2906 LearningRate 0.0126 Epoch: 12 Global Step: 160160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:56,459-Speed 3017.65 samples/sec Loss 5.3190 LearningRate 0.0126 Epoch: 12 Global Step: 160170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:52:59,829-Speed 3039.79 samples/sec Loss 5.2541 LearningRate 0.0126 Epoch: 12 Global Step: 160180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:53:03,209-Speed 3029.77 samples/sec Loss 5.2946 LearningRate 0.0126 Epoch: 12 Global Step: 160190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:53:06,602-Speed 3019.48 samples/sec Loss 5.2344 LearningRate 0.0126 Epoch: 12 Global Step: 160200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:53:09,940-Speed 3068.91 samples/sec Loss 5.3049 LearningRate 0.0126 Epoch: 12 Global Step: 160210 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:13,283-Speed 3063.83 samples/sec Loss 5.2656 LearningRate 0.0126 Epoch: 12 Global Step: 160220 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:16,673-Speed 3020.99 samples/sec Loss 5.2602 LearningRate 0.0126 Epoch: 12 Global Step: 160230 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:20,031-Speed 3050.64 samples/sec Loss 5.2811 LearningRate 0.0126 Epoch: 12 Global Step: 160240 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:23,396-Speed 3043.98 samples/sec Loss 5.3675 LearningRate 0.0126 Epoch: 12 Global Step: 160250 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 
16:53:26,747-Speed 3056.93 samples/sec Loss 5.2590 LearningRate 0.0126 Epoch: 12 Global Step: 160260 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:30,160-Speed 3001.47 samples/sec Loss 5.2346 LearningRate 0.0126 Epoch: 12 Global Step: 160270 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:33,539-Speed 3030.57 samples/sec Loss 5.3146 LearningRate 0.0126 Epoch: 12 Global Step: 160280 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:36,866-Speed 3079.01 samples/sec Loss 5.2681 LearningRate 0.0126 Epoch: 12 Global Step: 160290 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:40,233-Speed 3042.51 samples/sec Loss 5.2127 LearningRate 0.0126 Epoch: 12 Global Step: 160300 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 16:53:43,597-Speed 3044.41 samples/sec Loss 5.2532 LearningRate 0.0126 Epoch: 12 Global Step: 160310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:53:46,957-Speed 3048.42 samples/sec Loss 5.1864 LearningRate 0.0126 Epoch: 12 Global Step: 160320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:53:50,333-Speed 3034.51 samples/sec Loss 5.2205 LearningRate 0.0126 Epoch: 12 Global Step: 160330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:53:53,700-Speed 3042.11 samples/sec Loss 5.2789 LearningRate 0.0126 Epoch: 12 Global Step: 160340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:53:57,017-Speed 3087.94 samples/sec Loss 5.3729 LearningRate 0.0126 Epoch: 12 Global Step: 160350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:54:00,364-Speed 3060.82 samples/sec Loss 5.3431 LearningRate 0.0126 Epoch: 12 Global Step: 160360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:54:03,810-Speed 2972.73 samples/sec Loss 5.2134 LearningRate 0.0126 Epoch: 12 Global Step: 160370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:54:07,125-Speed 3089.74 samples/sec Loss 
5.1951 LearningRate 0.0126 Epoch: 12 Global Step: 160380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:54:10,527-Speed 3010.73 samples/sec Loss 5.3174 LearningRate 0.0126 Epoch: 12 Global Step: 160390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:54:14,007-Speed 2943.65 samples/sec Loss 5.1813 LearningRate 0.0126 Epoch: 12 Global Step: 160400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:54:17,420-Speed 3001.04 samples/sec Loss 5.2216 LearningRate 0.0126 Epoch: 12 Global Step: 160410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:20,803-Speed 3027.95 samples/sec Loss 5.2362 LearningRate 0.0125 Epoch: 12 Global Step: 160420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:24,223-Speed 2994.25 samples/sec Loss 5.3243 LearningRate 0.0125 Epoch: 12 Global Step: 160430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:27,634-Speed 3003.44 samples/sec Loss 5.1550 LearningRate 0.0125 Epoch: 12 Global Step: 160440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:31,117-Speed 2940.21 samples/sec Loss 5.2981 LearningRate 0.0125 Epoch: 12 Global Step: 160450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:34,501-Speed 3027.65 samples/sec Loss 5.2316 LearningRate 0.0125 Epoch: 12 Global Step: 160460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:37,886-Speed 3025.27 samples/sec Loss 5.1607 LearningRate 0.0125 Epoch: 12 Global Step: 160470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:41,301-Speed 3000.01 samples/sec Loss 5.2557 LearningRate 0.0125 Epoch: 12 Global Step: 160480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:44,716-Speed 2999.52 samples/sec Loss 5.3073 LearningRate 0.0125 Epoch: 12 Global Step: 160490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:48,115-Speed 3013.16 samples/sec Loss 5.2454 LearningRate 0.0125 Epoch: 12 Global 
Step: 160500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:54:51,471-Speed 3052.19 samples/sec Loss 5.2458 LearningRate 0.0125 Epoch: 12 Global Step: 160510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:54:54,906-Speed 2981.84 samples/sec Loss 5.2243 LearningRate 0.0125 Epoch: 12 Global Step: 160520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:54:58,347-Speed 2976.58 samples/sec Loss 5.1702 LearningRate 0.0125 Epoch: 12 Global Step: 160530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:01,663-Speed 3089.07 samples/sec Loss 5.2101 LearningRate 0.0125 Epoch: 12 Global Step: 160540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:05,155-Speed 2933.62 samples/sec Loss 5.2153 LearningRate 0.0125 Epoch: 12 Global Step: 160550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:08,514-Speed 3050.40 samples/sec Loss 5.2798 LearningRate 0.0125 Epoch: 12 Global Step: 160560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:11,983-Speed 2952.74 samples/sec Loss 5.1585 LearningRate 0.0125 Epoch: 12 Global Step: 160570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:15,387-Speed 3008.83 samples/sec Loss 5.3420 LearningRate 0.0125 Epoch: 12 Global Step: 160580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:18,805-Speed 2996.79 samples/sec Loss 5.2301 LearningRate 0.0125 Epoch: 12 Global Step: 160590 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:22,228-Speed 2992.51 samples/sec Loss 5.2574 LearningRate 0.0125 Epoch: 12 Global Step: 160600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:25,564-Speed 3069.94 samples/sec Loss 5.3146 LearningRate 0.0125 Epoch: 12 Global Step: 160610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:55:28,969-Speed 3008.77 samples/sec Loss 5.2891 LearningRate 0.0125 Epoch: 12 Global Step: 160620 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 16:55:32,369-Speed 3017.80 samples/sec Loss 5.3106 LearningRate 0.0125 Epoch: 12 Global Step: 160630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:35,726-Speed 3051.63 samples/sec Loss 5.1655 LearningRate 0.0125 Epoch: 12 Global Step: 160640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:39,104-Speed 3031.70 samples/sec Loss 5.2072 LearningRate 0.0125 Epoch: 12 Global Step: 160650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:42,527-Speed 2993.03 samples/sec Loss 5.1956 LearningRate 0.0125 Epoch: 12 Global Step: 160660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:45,919-Speed 3019.53 samples/sec Loss 5.3228 LearningRate 0.0125 Epoch: 12 Global Step: 160670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:49,281-Speed 3047.60 samples/sec Loss 5.1661 LearningRate 0.0125 Epoch: 12 Global Step: 160680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:52,698-Speed 2997.27 samples/sec Loss 5.2380 LearningRate 0.0125 Epoch: 12 Global Step: 160690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:56,204-Speed 2921.08 samples/sec Loss 5.2116 LearningRate 0.0125 Epoch: 12 Global Step: 160700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:55:59,625-Speed 2993.90 samples/sec Loss 5.2580 LearningRate 0.0125 Epoch: 12 Global Step: 160710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:56:03,014-Speed 3023.34 samples/sec Loss 5.1397 LearningRate 0.0125 Epoch: 12 Global Step: 160720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:06,470-Speed 2964.13 samples/sec Loss 5.2316 LearningRate 0.0125 Epoch: 12 Global Step: 160730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:09,934-Speed 2956.77 samples/sec Loss 5.2204 LearningRate 0.0125 Epoch: 12 Global Step: 160740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
16:56:13,290-Speed 3052.22 samples/sec Loss 5.3204 LearningRate 0.0125 Epoch: 12 Global Step: 160750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:16,734-Speed 2973.59 samples/sec Loss 5.2624 LearningRate 0.0125 Epoch: 12 Global Step: 160760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:20,146-Speed 3003.09 samples/sec Loss 5.2809 LearningRate 0.0124 Epoch: 12 Global Step: 160770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:23,465-Speed 3086.04 samples/sec Loss 5.2508 LearningRate 0.0124 Epoch: 12 Global Step: 160780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:26,812-Speed 3060.21 samples/sec Loss 5.2129 LearningRate 0.0124 Epoch: 12 Global Step: 160790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:30,212-Speed 3012.37 samples/sec Loss 5.1940 LearningRate 0.0124 Epoch: 12 Global Step: 160800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:33,632-Speed 2995.10 samples/sec Loss 5.3156 LearningRate 0.0124 Epoch: 12 Global Step: 160810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:37,022-Speed 3020.97 samples/sec Loss 5.2160 LearningRate 0.0124 Epoch: 12 Global Step: 160820 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 16:56:40,426-Speed 3009.27 samples/sec Loss 5.3065 LearningRate 0.0124 Epoch: 12 Global Step: 160830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 16:56:43,841-Speed 2999.96 samples/sec Loss 5.2996 LearningRate 0.0124 Epoch: 12 Global Step: 160840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:56:47,222-Speed 3028.91 samples/sec Loss 5.3638 LearningRate 0.0124 Epoch: 12 Global Step: 160850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:56:50,596-Speed 3035.66 samples/sec Loss 5.3055 LearningRate 0.0124 Epoch: 12 Global Step: 160860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:56:54,008-Speed 3001.72 samples/sec 
Loss 5.1424 LearningRate 0.0124 Epoch: 12 Global Step: 160870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:56:57,449-Speed 2976.96 samples/sec Loss 5.1832 LearningRate 0.0124 Epoch: 12 Global Step: 160880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:00,766-Speed 3088.43 samples/sec Loss 5.2016 LearningRate 0.0124 Epoch: 12 Global Step: 160890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:04,126-Speed 3048.57 samples/sec Loss 5.3150 LearningRate 0.0124 Epoch: 12 Global Step: 160900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:07,597-Speed 2951.26 samples/sec Loss 5.2482 LearningRate 0.0124 Epoch: 12 Global Step: 160910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:11,018-Speed 2993.67 samples/sec Loss 5.2566 LearningRate 0.0124 Epoch: 12 Global Step: 160920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:14,393-Speed 3035.39 samples/sec Loss 5.2860 LearningRate 0.0124 Epoch: 12 Global Step: 160930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:17,757-Speed 3044.38 samples/sec Loss 5.2408 LearningRate 0.0124 Epoch: 12 Global Step: 160940 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:21,200-Speed 2975.36 samples/sec Loss 5.1642 LearningRate 0.0124 Epoch: 12 Global Step: 160950 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:24,653-Speed 2966.41 samples/sec Loss 5.3286 LearningRate 0.0124 Epoch: 12 Global Step: 160960 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:28,087-Speed 2982.40 samples/sec Loss 5.2099 LearningRate 0.0124 Epoch: 12 Global Step: 160970 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:31,572-Speed 2939.75 samples/sec Loss 5.2847 LearningRate 0.0124 Epoch: 12 Global Step: 160980 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:34,987-Speed 2998.81 samples/sec Loss 5.2896 LearningRate 0.0124 Epoch: 12 
Global Step: 160990 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:38,322-Speed 3071.80 samples/sec Loss 5.1476 LearningRate 0.0124 Epoch: 12 Global Step: 161000 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:41,753-Speed 2985.36 samples/sec Loss 5.1827 LearningRate 0.0124 Epoch: 12 Global Step: 161010 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:45,128-Speed 3034.41 samples/sec Loss 5.2370 LearningRate 0.0124 Epoch: 12 Global Step: 161020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:48,553-Speed 2990.74 samples/sec Loss 5.2035 LearningRate 0.0124 Epoch: 12 Global Step: 161030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:57:51,940-Speed 3024.60 samples/sec Loss 5.1686 LearningRate 0.0124 Epoch: 12 Global Step: 161040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:55,303-Speed 3045.56 samples/sec Loss 5.2215 LearningRate 0.0124 Epoch: 12 Global Step: 161050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:57:58,688-Speed 3025.49 samples/sec Loss 5.3129 LearningRate 0.0124 Epoch: 12 Global Step: 161060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:02,065-Speed 3033.71 samples/sec Loss 5.3241 LearningRate 0.0124 Epoch: 12 Global Step: 161070 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:05,506-Speed 2976.52 samples/sec Loss 5.1590 LearningRate 0.0124 Epoch: 12 Global Step: 161080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:08,876-Speed 3039.27 samples/sec Loss 5.3016 LearningRate 0.0124 Epoch: 12 Global Step: 161090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:12,312-Speed 2982.64 samples/sec Loss 5.2199 LearningRate 0.0124 Epoch: 12 Global Step: 161100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:15,685-Speed 3036.90 samples/sec Loss 5.1933 LearningRate 0.0124 Epoch: 12 Global Step: 161110 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 16:58:19,081-Speed 3016.33 samples/sec Loss 5.1900 LearningRate 0.0123 Epoch: 12 Global Step: 161120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:22,470-Speed 3022.63 samples/sec Loss 5.1711 LearningRate 0.0123 Epoch: 12 Global Step: 161130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:25,788-Speed 3086.54 samples/sec Loss 5.1931 LearningRate 0.0123 Epoch: 12 Global Step: 161140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:29,154-Speed 3043.09 samples/sec Loss 5.1304 LearningRate 0.0123 Epoch: 12 Global Step: 161150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:32,516-Speed 3046.93 samples/sec Loss 5.1112 LearningRate 0.0123 Epoch: 12 Global Step: 161160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:58:35,852-Speed 3070.30 samples/sec Loss 5.2386 LearningRate 0.0123 Epoch: 12 Global Step: 161170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:58:39,247-Speed 3017.43 samples/sec Loss 5.2389 LearningRate 0.0123 Epoch: 12 Global Step: 161180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:58:42,629-Speed 3028.28 samples/sec Loss 5.1987 LearningRate 0.0123 Epoch: 12 Global Step: 161190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:46,004-Speed 3035.47 samples/sec Loss 5.2687 LearningRate 0.0123 Epoch: 12 Global Step: 161200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:49,326-Speed 3084.21 samples/sec Loss 5.2344 LearningRate 0.0123 Epoch: 12 Global Step: 161210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:53,345-Speed 2548.20 samples/sec Loss 5.2985 LearningRate 0.0123 Epoch: 12 Global Step: 161220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:58:56,785-Speed 2976.94 samples/sec Loss 5.1946 LearningRate 0.0123 Epoch: 12 Global Step: 161230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
16:59:00,231-Speed 2972.95 samples/sec Loss 5.2191 LearningRate 0.0123 Epoch: 12 Global Step: 161240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:03,680-Speed 2969.47 samples/sec Loss 5.2360 LearningRate 0.0123 Epoch: 12 Global Step: 161250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:07,103-Speed 2992.55 samples/sec Loss 5.2188 LearningRate 0.0123 Epoch: 12 Global Step: 161260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:10,500-Speed 3015.67 samples/sec Loss 5.1800 LearningRate 0.0123 Epoch: 12 Global Step: 161270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:13,927-Speed 2988.85 samples/sec Loss 5.1755 LearningRate 0.0123 Epoch: 12 Global Step: 161280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:17,330-Speed 3009.85 samples/sec Loss 5.1072 LearningRate 0.0123 Epoch: 12 Global Step: 161290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:20,781-Speed 2968.23 samples/sec Loss 5.2519 LearningRate 0.0123 Epoch: 12 Global Step: 161300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:24,163-Speed 3028.35 samples/sec Loss 5.1480 LearningRate 0.0123 Epoch: 12 Global Step: 161310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:27,635-Speed 2950.52 samples/sec Loss 5.2030 LearningRate 0.0123 Epoch: 12 Global Step: 161320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:31,034-Speed 3014.21 samples/sec Loss 5.2528 LearningRate 0.0123 Epoch: 12 Global Step: 161330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:34,470-Speed 2981.13 samples/sec Loss 5.2390 LearningRate 0.0123 Epoch: 12 Global Step: 161340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:37,876-Speed 3007.27 samples/sec Loss 5.2501 LearningRate 0.0123 Epoch: 12 Global Step: 161350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:41,248-Speed 3037.86 samples/sec Loss 
5.2071 LearningRate 0.0123 Epoch: 12 Global Step: 161360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:44,624-Speed 3034.10 samples/sec Loss 5.3184 LearningRate 0.0123 Epoch: 12 Global Step: 161370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:48,042-Speed 2996.81 samples/sec Loss 5.2652 LearningRate 0.0123 Epoch: 12 Global Step: 161380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 16:59:51,377-Speed 3071.61 samples/sec Loss 5.1520 LearningRate 0.0123 Epoch: 12 Global Step: 161390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:54,797-Speed 2994.74 samples/sec Loss 5.1613 LearningRate 0.0123 Epoch: 12 Global Step: 161400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 16:59:58,134-Speed 3071.02 samples/sec Loss 5.2196 LearningRate 0.0123 Epoch: 12 Global Step: 161410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:01,488-Speed 3054.14 samples/sec Loss 5.2892 LearningRate 0.0123 Epoch: 12 Global Step: 161420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:04,900-Speed 3001.96 samples/sec Loss 5.3001 LearningRate 0.0123 Epoch: 12 Global Step: 161430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:08,239-Speed 3067.91 samples/sec Loss 5.2496 LearningRate 0.0123 Epoch: 12 Global Step: 161440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:11,605-Speed 3042.37 samples/sec Loss 5.1671 LearningRate 0.0123 Epoch: 12 Global Step: 161450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:15,031-Speed 2990.27 samples/sec Loss 5.1857 LearningRate 0.0123 Epoch: 12 Global Step: 161460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:18,577-Speed 2888.45 samples/sec Loss 5.1882 LearningRate 0.0123 Epoch: 12 Global Step: 161470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:49,768-Speed 328.32 samples/sec Loss 4.1629 LearningRate 0.0122 Epoch: 13 Global 
Step: 161480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:00:53,528-Speed 2724.18 samples/sec Loss 3.8057 LearningRate 0.0122 Epoch: 13 Global Step: 161490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:00:56,839-Speed 3093.72 samples/sec Loss 3.7816 LearningRate 0.0122 Epoch: 13 Global Step: 161500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:00,196-Speed 3051.36 samples/sec Loss 3.7689 LearningRate 0.0122 Epoch: 13 Global Step: 161510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:03,636-Speed 2977.67 samples/sec Loss 3.8991 LearningRate 0.0122 Epoch: 13 Global Step: 161520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:06,997-Speed 3047.70 samples/sec Loss 3.8470 LearningRate 0.0122 Epoch: 13 Global Step: 161530 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:10,366-Speed 3040.37 samples/sec Loss 3.8059 LearningRate 0.0122 Epoch: 13 Global Step: 161540 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:13,759-Speed 3018.66 samples/sec Loss 3.9172 LearningRate 0.0122 Epoch: 13 Global Step: 161550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:17,199-Speed 2978.18 samples/sec Loss 3.8364 LearningRate 0.0122 Epoch: 13 Global Step: 161560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:20,644-Speed 2973.46 samples/sec Loss 3.8220 LearningRate 0.0122 Epoch: 13 Global Step: 161570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:24,017-Speed 3037.08 samples/sec Loss 3.7172 LearningRate 0.0122 Epoch: 13 Global Step: 161580 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:27,488-Speed 2950.75 samples/sec Loss 3.8377 LearningRate 0.0122 Epoch: 13 Global Step: 161590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:30,991-Speed 2924.03 samples/sec Loss 3.8595 LearningRate 0.0122 Epoch: 13 Global Step: 161600 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 17:01:34,596-Speed 2841.30 samples/sec Loss 3.8116 LearningRate 0.0122 Epoch: 13 Global Step: 161610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:38,301-Speed 2764.25 samples/sec Loss 3.8510 LearningRate 0.0122 Epoch: 13 Global Step: 161620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:01:41,741-Speed 2978.33 samples/sec Loss 3.7937 LearningRate 0.0122 Epoch: 13 Global Step: 161630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:45,092-Speed 3056.62 samples/sec Loss 3.8784 LearningRate 0.0122 Epoch: 13 Global Step: 161640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:48,479-Speed 3023.95 samples/sec Loss 3.8700 LearningRate 0.0122 Epoch: 13 Global Step: 161650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:51,842-Speed 3046.50 samples/sec Loss 3.9399 LearningRate 0.0122 Epoch: 13 Global Step: 161660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:55,208-Speed 3042.30 samples/sec Loss 3.7579 LearningRate 0.0122 Epoch: 13 Global Step: 161670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:01:58,615-Speed 3006.70 samples/sec Loss 3.8176 LearningRate 0.0122 Epoch: 13 Global Step: 161680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:02,021-Speed 3007.26 samples/sec Loss 3.9546 LearningRate 0.0122 Epoch: 13 Global Step: 161690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:05,381-Speed 3048.78 samples/sec Loss 3.7863 LearningRate 0.0122 Epoch: 13 Global Step: 161700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:08,745-Speed 3044.83 samples/sec Loss 3.8083 LearningRate 0.0122 Epoch: 13 Global Step: 161710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:12,129-Speed 3026.27 samples/sec Loss 3.8229 LearningRate 0.0122 Epoch: 13 Global Step: 161720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:02:15,516-Speed 3024.56 samples/sec Loss 3.8835 LearningRate 0.0122 Epoch: 13 Global Step: 161730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:02:18,996-Speed 2943.41 samples/sec Loss 3.8304 LearningRate 0.0122 Epoch: 13 Global Step: 161740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:02:22,416-Speed 2995.37 samples/sec Loss 3.9259 LearningRate 0.0122 Epoch: 13 Global Step: 161750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:02:25,755-Speed 3068.01 samples/sec Loss 3.8309 LearningRate 0.0122 Epoch: 13 Global Step: 161760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:02:29,179-Speed 2991.40 samples/sec Loss 3.7665 LearningRate 0.0122 Epoch: 13 Global Step: 161770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:32,605-Speed 2989.93 samples/sec Loss 3.7946 LearningRate 0.0122 Epoch: 13 Global Step: 161780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:35,972-Speed 3041.94 samples/sec Loss 3.8410 LearningRate 0.0122 Epoch: 13 Global Step: 161790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:39,350-Speed 3032.25 samples/sec Loss 3.7649 LearningRate 0.0122 Epoch: 13 Global Step: 161800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:42,761-Speed 3003.06 samples/sec Loss 3.8393 LearningRate 0.0122 Epoch: 13 Global Step: 161810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:46,100-Speed 3067.92 samples/sec Loss 3.8518 LearningRate 0.0122 Epoch: 13 Global Step: 161820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:02:49,573-Speed 2949.67 samples/sec Loss 3.9590 LearningRate 0.0121 Epoch: 13 Global Step: 161830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:02:53,020-Speed 2971.42 samples/sec Loss 3.9605 LearningRate 0.0121 Epoch: 13 Global Step: 161840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:02:56,406-Speed 3025.01 samples/sec Loss 
3.9579 LearningRate 0.0121 Epoch: 13 Global Step: 161850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:02:59,841-Speed 2982.33 samples/sec Loss 3.8656 LearningRate 0.0121 Epoch: 13 Global Step: 161860 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:03,250-Speed 3004.63 samples/sec Loss 3.8373 LearningRate 0.0121 Epoch: 13 Global Step: 161870 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:06,635-Speed 3025.54 samples/sec Loss 3.8427 LearningRate 0.0121 Epoch: 13 Global Step: 161880 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:09,995-Speed 3047.96 samples/sec Loss 3.8777 LearningRate 0.0121 Epoch: 13 Global Step: 161890 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:13,379-Speed 3026.98 samples/sec Loss 3.8506 LearningRate 0.0121 Epoch: 13 Global Step: 161900 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:16,720-Speed 3065.96 samples/sec Loss 3.9199 LearningRate 0.0121 Epoch: 13 Global Step: 161910 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:20,039-Speed 3086.70 samples/sec Loss 3.8829 LearningRate 0.0121 Epoch: 13 Global Step: 161920 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:03:23,408-Speed 3040.38 samples/sec Loss 3.8637 LearningRate 0.0121 Epoch: 13 Global Step: 161930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:26,760-Speed 3055.63 samples/sec Loss 3.9317 LearningRate 0.0121 Epoch: 13 Global Step: 161940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:30,123-Speed 3045.38 samples/sec Loss 3.9290 LearningRate 0.0121 Epoch: 13 Global Step: 161950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:33,524-Speed 3012.06 samples/sec Loss 3.8490 LearningRate 0.0121 Epoch: 13 Global Step: 161960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:36,934-Speed 3003.22 samples/sec Loss 3.9766 LearningRate 0.0121 Epoch: 13 Global 
Step: 161970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:40,340-Speed 3007.74 samples/sec Loss 3.9480 LearningRate 0.0121 Epoch: 13 Global Step: 161980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:43,686-Speed 3060.87 samples/sec Loss 3.9586 LearningRate 0.0121 Epoch: 13 Global Step: 161990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:47,108-Speed 2993.77 samples/sec Loss 3.9353 LearningRate 0.0121 Epoch: 13 Global Step: 162000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:50,452-Speed 3063.33 samples/sec Loss 4.0452 LearningRate 0.0121 Epoch: 13 Global Step: 162010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:53,800-Speed 3060.04 samples/sec Loss 3.8920 LearningRate 0.0121 Epoch: 13 Global Step: 162020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:03:57,130-Speed 3076.22 samples/sec Loss 3.9300 LearningRate 0.0121 Epoch: 13 Global Step: 162030 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:00,566-Speed 2981.38 samples/sec Loss 3.9901 LearningRate 0.0121 Epoch: 13 Global Step: 162040 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:03,922-Speed 3051.83 samples/sec Loss 3.9139 LearningRate 0.0121 Epoch: 13 Global Step: 162050 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:07,245-Speed 3082.52 samples/sec Loss 3.9395 LearningRate 0.0121 Epoch: 13 Global Step: 162060 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:10,596-Speed 3056.77 samples/sec Loss 3.8120 LearningRate 0.0121 Epoch: 13 Global Step: 162070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:13,923-Speed 3078.74 samples/sec Loss 3.9278 LearningRate 0.0121 Epoch: 13 Global Step: 162080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:17,305-Speed 3030.06 samples/sec Loss 3.9418 LearningRate 0.0121 Epoch: 13 Global Step: 162090 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 17:04:20,737-Speed 2984.16 samples/sec Loss 3.8143 LearningRate 0.0121 Epoch: 13 Global Step: 162100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:24,157-Speed 2994.99 samples/sec Loss 3.9509 LearningRate 0.0121 Epoch: 13 Global Step: 162110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:27,618-Speed 2959.89 samples/sec Loss 3.9871 LearningRate 0.0121 Epoch: 13 Global Step: 162120 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:30,937-Speed 3086.21 samples/sec Loss 3.9985 LearningRate 0.0121 Epoch: 13 Global Step: 162130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:34,276-Speed 3067.30 samples/sec Loss 3.8762 LearningRate 0.0121 Epoch: 13 Global Step: 162140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:37,591-Speed 3090.31 samples/sec Loss 3.8476 LearningRate 0.0121 Epoch: 13 Global Step: 162150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:40,946-Speed 3052.69 samples/sec Loss 4.0608 LearningRate 0.0121 Epoch: 13 Global Step: 162160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:44,317-Speed 3038.97 samples/sec Loss 3.9650 LearningRate 0.0121 Epoch: 13 Global Step: 162170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:47,677-Speed 3048.78 samples/sec Loss 4.0906 LearningRate 0.0121 Epoch: 13 Global Step: 162180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:51,050-Speed 3036.93 samples/sec Loss 3.9613 LearningRate 0.0120 Epoch: 13 Global Step: 162190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:04:54,433-Speed 3027.36 samples/sec Loss 3.8214 LearningRate 0.0120 Epoch: 13 Global Step: 162200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:04:57,798-Speed 3044.49 samples/sec Loss 4.0196 LearningRate 0.0120 Epoch: 13 Global Step: 162210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:05:01,120-Speed 3082.79 samples/sec Loss 3.9247 LearningRate 0.0120 Epoch: 13 Global Step: 162220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:04,453-Speed 3072.93 samples/sec Loss 3.9431 LearningRate 0.0120 Epoch: 13 Global Step: 162230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:07,839-Speed 3025.47 samples/sec Loss 3.9733 LearningRate 0.0120 Epoch: 13 Global Step: 162240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:11,168-Speed 3076.74 samples/sec Loss 4.0665 LearningRate 0.0120 Epoch: 13 Global Step: 162250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:14,496-Speed 3077.19 samples/sec Loss 3.9859 LearningRate 0.0120 Epoch: 13 Global Step: 162260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:17,875-Speed 3032.20 samples/sec Loss 4.0102 LearningRate 0.0120 Epoch: 13 Global Step: 162270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:21,210-Speed 3070.78 samples/sec Loss 3.9258 LearningRate 0.0120 Epoch: 13 Global Step: 162280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:24,574-Speed 3044.85 samples/sec Loss 3.9245 LearningRate 0.0120 Epoch: 13 Global Step: 162290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:27,892-Speed 3087.59 samples/sec Loss 3.9922 LearningRate 0.0120 Epoch: 13 Global Step: 162300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:05:31,262-Speed 3039.34 samples/sec Loss 3.9656 LearningRate 0.0120 Epoch: 13 Global Step: 162310 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:05:34,670-Speed 3005.46 samples/sec Loss 4.0008 LearningRate 0.0120 Epoch: 13 Global Step: 162320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:38,172-Speed 2925.33 samples/sec Loss 3.9164 LearningRate 0.0120 Epoch: 13 Global Step: 162330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:41,607-Speed 2981.91 samples/sec Loss 
4.0772 LearningRate 0.0120 Epoch: 13 Global Step: 162340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:44,940-Speed 3073.26 samples/sec Loss 3.9411 LearningRate 0.0120 Epoch: 13 Global Step: 162350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:48,295-Speed 3052.78 samples/sec Loss 4.1733 LearningRate 0.0120 Epoch: 13 Global Step: 162360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:51,699-Speed 3008.60 samples/sec Loss 4.0131 LearningRate 0.0120 Epoch: 13 Global Step: 162370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:55,155-Speed 2964.74 samples/sec Loss 4.0899 LearningRate 0.0120 Epoch: 13 Global Step: 162380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:05:58,552-Speed 3015.10 samples/sec Loss 3.9861 LearningRate 0.0120 Epoch: 13 Global Step: 162390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:06:01,961-Speed 3004.08 samples/sec Loss 4.0054 LearningRate 0.0120 Epoch: 13 Global Step: 162400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:06:05,307-Speed 3061.25 samples/sec Loss 4.0759 LearningRate 0.0120 Epoch: 13 Global Step: 162410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:06:08,750-Speed 2975.27 samples/sec Loss 3.9550 LearningRate 0.0120 Epoch: 13 Global Step: 162420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:12,092-Speed 3065.73 samples/sec Loss 4.0321 LearningRate 0.0120 Epoch: 13 Global Step: 162430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:15,472-Speed 3029.66 samples/sec Loss 4.0037 LearningRate 0.0120 Epoch: 13 Global Step: 162440 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:18,796-Speed 3081.96 samples/sec Loss 4.0727 LearningRate 0.0120 Epoch: 13 Global Step: 162450 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:22,158-Speed 3046.48 samples/sec Loss 4.0263 LearningRate 0.0120 Epoch: 13 Global 
Step: 162460 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:25,633-Speed 2947.66 samples/sec Loss 4.0023 LearningRate 0.0120 Epoch: 13 Global Step: 162470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:29,029-Speed 3016.12 samples/sec Loss 4.0500 LearningRate 0.0120 Epoch: 13 Global Step: 162480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:32,499-Speed 2952.62 samples/sec Loss 4.0161 LearningRate 0.0120 Epoch: 13 Global Step: 162490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:35,966-Speed 2954.60 samples/sec Loss 4.0294 LearningRate 0.0120 Epoch: 13 Global Step: 162500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:39,339-Speed 3036.97 samples/sec Loss 4.0341 LearningRate 0.0120 Epoch: 13 Global Step: 162510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:06:42,667-Speed 3077.66 samples/sec Loss 4.0919 LearningRate 0.0120 Epoch: 13 Global Step: 162520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:06:46,038-Speed 3038.42 samples/sec Loss 4.0806 LearningRate 0.0120 Epoch: 13 Global Step: 162530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:06:49,419-Speed 3030.05 samples/sec Loss 4.0191 LearningRate 0.0120 Epoch: 13 Global Step: 162540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:06:52,839-Speed 2994.16 samples/sec Loss 4.0572 LearningRate 0.0119 Epoch: 13 Global Step: 162550 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:06:56,279-Speed 2978.07 samples/sec Loss 4.0834 LearningRate 0.0119 Epoch: 13 Global Step: 162560 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:06:59,699-Speed 2994.62 samples/sec Loss 4.0891 LearningRate 0.0119 Epoch: 13 Global Step: 162570 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:03,088-Speed 3022.21 samples/sec Loss 4.1022 LearningRate 0.0119 Epoch: 13 Global Step: 162580 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 17:07:06,418-Speed 3076.45 samples/sec Loss 4.0666 LearningRate 0.0119 Epoch: 13 Global Step: 162590 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:09,807-Speed 3022.47 samples/sec Loss 4.0966 LearningRate 0.0119 Epoch: 13 Global Step: 162600 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:13,117-Speed 3094.99 samples/sec Loss 4.0918 LearningRate 0.0119 Epoch: 13 Global Step: 162610 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:16,444-Speed 3077.90 samples/sec Loss 4.1149 LearningRate 0.0119 Epoch: 13 Global Step: 162620 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:19,881-Speed 2980.85 samples/sec Loss 4.1185 LearningRate 0.0119 Epoch: 13 Global Step: 162630 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:23,283-Speed 3011.41 samples/sec Loss 4.0499 LearningRate 0.0119 Epoch: 13 Global Step: 162640 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:07:26,677-Speed 3017.95 samples/sec Loss 4.0765 LearningRate 0.0119 Epoch: 13 Global Step: 162650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:30,026-Speed 3057.74 samples/sec Loss 4.0757 LearningRate 0.0119 Epoch: 13 Global Step: 162660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:33,414-Speed 3023.40 samples/sec Loss 3.9936 LearningRate 0.0119 Epoch: 13 Global Step: 162670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:36,885-Speed 2951.20 samples/sec Loss 4.0905 LearningRate 0.0119 Epoch: 13 Global Step: 162680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:40,301-Speed 2998.99 samples/sec Loss 4.0960 LearningRate 0.0119 Epoch: 13 Global Step: 162690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:43,646-Speed 3061.36 samples/sec Loss 4.0205 LearningRate 0.0119 Epoch: 13 Global Step: 162700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:07:47,049-Speed 3010.36 samples/sec Loss 4.1395 LearningRate 0.0119 Epoch: 13 Global Step: 162710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:50,488-Speed 2978.53 samples/sec Loss 4.0789 LearningRate 0.0119 Epoch: 13 Global Step: 162720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:53,865-Speed 3032.91 samples/sec Loss 4.1465 LearningRate 0.0119 Epoch: 13 Global Step: 162730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:07:57,242-Speed 3033.32 samples/sec Loss 4.0205 LearningRate 0.0119 Epoch: 13 Global Step: 162740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:00,597-Speed 3053.08 samples/sec Loss 4.1008 LearningRate 0.0119 Epoch: 13 Global Step: 162750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:03,913-Speed 3088.20 samples/sec Loss 4.0959 LearningRate 0.0119 Epoch: 13 Global Step: 162760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:07,232-Speed 3086.62 samples/sec Loss 4.2306 LearningRate 0.0119 Epoch: 13 Global Step: 162770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:10,652-Speed 2995.18 samples/sec Loss 4.0956 LearningRate 0.0119 Epoch: 13 Global Step: 162780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:14,104-Speed 2967.02 samples/sec Loss 4.1297 LearningRate 0.0119 Epoch: 13 Global Step: 162790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:17,421-Speed 3087.98 samples/sec Loss 4.0700 LearningRate 0.0119 Epoch: 13 Global Step: 162800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:20,756-Speed 3071.37 samples/sec Loss 4.1866 LearningRate 0.0119 Epoch: 13 Global Step: 162810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:24,124-Speed 3040.80 samples/sec Loss 4.0368 LearningRate 0.0119 Epoch: 13 Global Step: 162820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:27,448-Speed 3083.84 samples/sec Loss 
4.0841 LearningRate 0.0119 Epoch: 13 Global Step: 162830 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:30,874-Speed 2989.78 samples/sec Loss 4.0836 LearningRate 0.0119 Epoch: 13 Global Step: 162840 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:34,287-Speed 3000.58 samples/sec Loss 4.1906 LearningRate 0.0119 Epoch: 13 Global Step: 162850 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:08:37,724-Speed 2980.27 samples/sec Loss 4.1209 LearningRate 0.0119 Epoch: 13 Global Step: 162860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:41,076-Speed 3055.64 samples/sec Loss 4.1398 LearningRate 0.0119 Epoch: 13 Global Step: 162870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:44,455-Speed 3031.21 samples/sec Loss 4.1798 LearningRate 0.0119 Epoch: 13 Global Step: 162880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:47,845-Speed 3022.32 samples/sec Loss 4.1270 LearningRate 0.0119 Epoch: 13 Global Step: 162890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:51,242-Speed 3015.23 samples/sec Loss 4.0835 LearningRate 0.0119 Epoch: 13 Global Step: 162900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:54,631-Speed 3021.57 samples/sec Loss 4.1918 LearningRate 0.0118 Epoch: 13 Global Step: 162910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:08:58,032-Speed 3011.72 samples/sec Loss 4.2050 LearningRate 0.0118 Epoch: 13 Global Step: 162920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:09:01,431-Speed 3014.14 samples/sec Loss 4.2149 LearningRate 0.0118 Epoch: 13 Global Step: 162930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:09:04,775-Speed 3062.31 samples/sec Loss 4.1610 LearningRate 0.0118 Epoch: 13 Global Step: 162940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:09:08,117-Speed 3065.39 samples/sec Loss 4.1328 LearningRate 0.0118 Epoch: 13 Global 
Step: 162950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:09:11,491-Speed 3035.49 samples/sec Loss 4.1529 LearningRate 0.0118 Epoch: 13 Global Step: 162960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:09:14,818-Speed 3078.46 samples/sec Loss 4.1972 LearningRate 0.0118 Epoch: 13 Global Step: 162970 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:09:18,199-Speed 3029.76 samples/sec Loss 4.2673 LearningRate 0.0118 Epoch: 13 Global Step: 162980 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:09:21,611-Speed 3002.50 samples/sec Loss 4.1626 LearningRate 0.0118 Epoch: 13 Global Step: 162990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:09:24,997-Speed 3024.71 samples/sec Loss 4.1632 LearningRate 0.0118 Epoch: 13 Global Step: 163000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:09:28,337-Speed 3066.90 samples/sec Loss 4.2304 LearningRate 0.0118 Epoch: 13 Global Step: 163010 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:09:31,769-Speed 2984.35 samples/sec Loss 4.1807 LearningRate 0.0118 Epoch: 13 Global Step: 163020 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:35,123-Speed 3053.75 samples/sec Loss 4.1825 LearningRate 0.0118 Epoch: 13 Global Step: 163030 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:38,486-Speed 3045.64 samples/sec Loss 4.1462 LearningRate 0.0118 Epoch: 13 Global Step: 163040 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:41,841-Speed 3053.46 samples/sec Loss 4.1650 LearningRate 0.0118 Epoch: 13 Global Step: 163050 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:45,174-Speed 3073.01 samples/sec Loss 4.2414 LearningRate 0.0118 Epoch: 13 Global Step: 163060 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:48,504-Speed 3076.18 samples/sec Loss 4.2414 LearningRate 0.0118 Epoch: 13 Global Step: 163070 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 17:09:51,897-Speed 3018.66 samples/sec Loss 4.2509 LearningRate 0.0118 Epoch: 13 Global Step: 163080 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:55,235-Speed 3068.12 samples/sec Loss 4.2716 LearningRate 0.0118 Epoch: 13 Global Step: 163090 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:09:58,699-Speed 2957.21 samples/sec Loss 4.1236 LearningRate 0.0118 Epoch: 13 Global Step: 163100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:10:02,158-Speed 2960.97 samples/sec Loss 4.1207 LearningRate 0.0118 Epoch: 13 Global Step: 163110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:10:05,537-Speed 3031.91 samples/sec Loss 4.2433 LearningRate 0.0118 Epoch: 13 Global Step: 163120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:08,962-Speed 2990.92 samples/sec Loss 4.2492 LearningRate 0.0118 Epoch: 13 Global Step: 163130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:12,351-Speed 3022.26 samples/sec Loss 4.2776 LearningRate 0.0118 Epoch: 13 Global Step: 163140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:15,666-Speed 3089.64 samples/sec Loss 4.2430 LearningRate 0.0118 Epoch: 13 Global Step: 163150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:19,090-Speed 2992.06 samples/sec Loss 4.1241 LearningRate 0.0118 Epoch: 13 Global Step: 163160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:22,516-Speed 2989.71 samples/sec Loss 4.1665 LearningRate 0.0118 Epoch: 13 Global Step: 163170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:25,903-Speed 3023.80 samples/sec Loss 4.2282 LearningRate 0.0118 Epoch: 13 Global Step: 163180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:29,257-Speed 3053.86 samples/sec Loss 4.2342 LearningRate 0.0118 Epoch: 13 Global Step: 163190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:10:32,657-Speed 3012.74 samples/sec Loss 4.1700 LearningRate 0.0118 Epoch: 13 Global Step: 163200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:36,068-Speed 3002.86 samples/sec Loss 4.1656 LearningRate 0.0118 Epoch: 13 Global Step: 163210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:10:39,426-Speed 3049.80 samples/sec Loss 4.1698 LearningRate 0.0118 Epoch: 13 Global Step: 163220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:10:42,766-Speed 3066.61 samples/sec Loss 4.2152 LearningRate 0.0118 Epoch: 13 Global Step: 163230 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:10:46,100-Speed 3072.56 samples/sec Loss 4.1760 LearningRate 0.0118 Epoch: 13 Global Step: 163240 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:10:49,622-Speed 2908.46 samples/sec Loss 4.1735 LearningRate 0.0118 Epoch: 13 Global Step: 163250 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:10:53,083-Speed 2960.26 samples/sec Loss 4.1781 LearningRate 0.0118 Epoch: 13 Global Step: 163260 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:10:56,463-Speed 3030.64 samples/sec Loss 4.2397 LearningRate 0.0117 Epoch: 13 Global Step: 163270 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:10:59,858-Speed 3016.40 samples/sec Loss 4.2745 LearningRate 0.0117 Epoch: 13 Global Step: 163280 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:11:03,295-Speed 2980.14 samples/sec Loss 4.1263 LearningRate 0.0117 Epoch: 13 Global Step: 163290 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:11:06,690-Speed 3017.39 samples/sec Loss 4.1905 LearningRate 0.0117 Epoch: 13 Global Step: 163300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:11:10,120-Speed 2986.33 samples/sec Loss 4.2020 LearningRate 0.0117 Epoch: 13 Global Step: 163310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:11:13,503-Speed 3027.70 samples/sec Loss 
4.1637 LearningRate 0.0117 Epoch: 13 Global Step: 163320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:11:16,925-Speed 2992.65 samples/sec Loss 4.1308 LearningRate 0.0117 Epoch: 13 Global Step: 163330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:11:20,331-Speed 3007.76 samples/sec Loss 4.1824 LearningRate 0.0117 Epoch: 13 Global Step: 163340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:11:23,722-Speed 3020.63 samples/sec Loss 4.2840 LearningRate 0.0117 Epoch: 13 Global Step: 163350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:11:27,084-Speed 3046.67 samples/sec Loss 4.2807 LearningRate 0.0117 Epoch: 13 Global Step: 163360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:30,531-Speed 2971.68 samples/sec Loss 4.2229 LearningRate 0.0117 Epoch: 13 Global Step: 163370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:34,041-Speed 2918.68 samples/sec Loss 4.3116 LearningRate 0.0117 Epoch: 13 Global Step: 163380 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:37,515-Speed 2948.95 samples/sec Loss 4.3037 LearningRate 0.0117 Epoch: 13 Global Step: 163390 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:40,984-Speed 2952.71 samples/sec Loss 4.1818 LearningRate 0.0117 Epoch: 13 Global Step: 163400 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:44,437-Speed 2966.35 samples/sec Loss 4.2607 LearningRate 0.0117 Epoch: 13 Global Step: 163410 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:47,912-Speed 2946.88 samples/sec Loss 4.2461 LearningRate 0.0117 Epoch: 13 Global Step: 163420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:51,353-Speed 2977.26 samples/sec Loss 4.1110 LearningRate 0.0117 Epoch: 13 Global Step: 163430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:54,885-Speed 2899.94 samples/sec Loss 4.2372 LearningRate 0.0117 Epoch: 13 Global 
Step: 163440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:11:58,348-Speed 2957.94 samples/sec Loss 4.3073 LearningRate 0.0117 Epoch: 13 Global Step: 163450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:12:01,751-Speed 3009.75 samples/sec Loss 4.2366 LearningRate 0.0117 Epoch: 13 Global Step: 163460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:05,104-Speed 3055.13 samples/sec Loss 4.3395 LearningRate 0.0117 Epoch: 13 Global Step: 163470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:08,521-Speed 2997.26 samples/sec Loss 4.2869 LearningRate 0.0117 Epoch: 13 Global Step: 163480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:11,985-Speed 2956.94 samples/sec Loss 4.2825 LearningRate 0.0117 Epoch: 13 Global Step: 163490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:15,415-Speed 2986.73 samples/sec Loss 4.2614 LearningRate 0.0117 Epoch: 13 Global Step: 163500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:18,824-Speed 3004.84 samples/sec Loss 4.2165 LearningRate 0.0117 Epoch: 13 Global Step: 163510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:22,259-Speed 2981.85 samples/sec Loss 4.1936 LearningRate 0.0117 Epoch: 13 Global Step: 163520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:25,726-Speed 2954.06 samples/sec Loss 4.2199 LearningRate 0.0117 Epoch: 13 Global Step: 163530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:29,180-Speed 2965.50 samples/sec Loss 4.1824 LearningRate 0.0117 Epoch: 13 Global Step: 163540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:32,565-Speed 3026.20 samples/sec Loss 4.2497 LearningRate 0.0117 Epoch: 13 Global Step: 163550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:35,923-Speed 3049.98 samples/sec Loss 4.2953 LearningRate 0.0117 Epoch: 13 Global Step: 163560 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 17:12:39,309-Speed 3025.43 samples/sec Loss 4.1970 LearningRate 0.0117 Epoch: 13 Global Step: 163570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:42,696-Speed 3024.70 samples/sec Loss 4.2752 LearningRate 0.0117 Epoch: 13 Global Step: 163580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:46,073-Speed 3032.23 samples/sec Loss 4.2199 LearningRate 0.0117 Epoch: 13 Global Step: 163590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:49,442-Speed 3040.15 samples/sec Loss 4.3102 LearningRate 0.0117 Epoch: 13 Global Step: 163600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:52,821-Speed 3032.07 samples/sec Loss 4.2745 LearningRate 0.0117 Epoch: 13 Global Step: 163610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:56,128-Speed 3096.85 samples/sec Loss 4.2956 LearningRate 0.0117 Epoch: 13 Global Step: 163620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:12:59,461-Speed 3073.97 samples/sec Loss 4.3057 LearningRate 0.0116 Epoch: 13 Global Step: 163630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:02,786-Speed 3080.12 samples/sec Loss 4.2670 LearningRate 0.0116 Epoch: 13 Global Step: 163640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:06,186-Speed 3012.69 samples/sec Loss 4.2500 LearningRate 0.0116 Epoch: 13 Global Step: 163650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:09,571-Speed 3026.22 samples/sec Loss 4.2155 LearningRate 0.0116 Epoch: 13 Global Step: 163660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:13,001-Speed 2985.69 samples/sec Loss 4.2942 LearningRate 0.0116 Epoch: 13 Global Step: 163670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:13:16,420-Speed 2996.19 samples/sec Loss 4.2161 LearningRate 0.0116 Epoch: 13 Global Step: 163680 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
17:13:19,857-Speed 2980.74 samples/sec Loss 4.1931 LearningRate 0.0116 Epoch: 13 Global Step: 163690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:13:23,222-Speed 3044.31 samples/sec Loss 4.2723 LearningRate 0.0116 Epoch: 13 Global Step: 163700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:26,542-Speed 3085.17 samples/sec Loss 4.3704 LearningRate 0.0116 Epoch: 13 Global Step: 163710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:29,882-Speed 3066.92 samples/sec Loss 4.2590 LearningRate 0.0116 Epoch: 13 Global Step: 163720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:13:33,295-Speed 3001.22 samples/sec Loss 4.2543 LearningRate 0.0116 Epoch: 13 Global Step: 163730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:36,714-Speed 2995.74 samples/sec Loss 4.3364 LearningRate 0.0116 Epoch: 13 Global Step: 163740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:40,106-Speed 3020.05 samples/sec Loss 4.2609 LearningRate 0.0116 Epoch: 13 Global Step: 163750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:43,485-Speed 3031.72 samples/sec Loss 4.2898 LearningRate 0.0116 Epoch: 13 Global Step: 163760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:46,855-Speed 3039.04 samples/sec Loss 4.2434 LearningRate 0.0116 Epoch: 13 Global Step: 163770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:50,188-Speed 3073.72 samples/sec Loss 4.2293 LearningRate 0.0116 Epoch: 13 Global Step: 163780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:53,506-Speed 3087.40 samples/sec Loss 4.1982 LearningRate 0.0116 Epoch: 13 Global Step: 163790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:13:56,822-Speed 3088.55 samples/sec Loss 4.3564 LearningRate 0.0116 Epoch: 13 Global Step: 163800 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:14:00,165-Speed 3063.88 samples/sec Loss 
4.3031 LearningRate 0.0116 Epoch: 13 Global Step: 163810 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:14:03,537-Speed 3037.48 samples/sec Loss 4.2464 LearningRate 0.0116 Epoch: 13 Global Step: 163820 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:14:06,875-Speed 3068.63 samples/sec Loss 4.3397 LearningRate 0.0116 Epoch: 13 Global Step: 163830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:10,265-Speed 3021.59 samples/sec Loss 4.2955 LearningRate 0.0116 Epoch: 13 Global Step: 163840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:13,647-Speed 3028.49 samples/sec Loss 4.2220 LearningRate 0.0116 Epoch: 13 Global Step: 163850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:17,054-Speed 3006.29 samples/sec Loss 4.4015 LearningRate 0.0116 Epoch: 13 Global Step: 163860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:20,453-Speed 3013.65 samples/sec Loss 4.2687 LearningRate 0.0116 Epoch: 13 Global Step: 163870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:23,777-Speed 3081.20 samples/sec Loss 4.2637 LearningRate 0.0116 Epoch: 13 Global Step: 163880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:27,124-Speed 3060.91 samples/sec Loss 4.2315 LearningRate 0.0116 Epoch: 13 Global Step: 163890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:30,510-Speed 3025.32 samples/sec Loss 4.3162 LearningRate 0.0116 Epoch: 13 Global Step: 163900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:33,934-Speed 2991.32 samples/sec Loss 4.1595 LearningRate 0.0116 Epoch: 13 Global Step: 163910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:37,388-Speed 2965.60 samples/sec Loss 4.2680 LearningRate 0.0116 Epoch: 13 Global Step: 163920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:40,752-Speed 3045.24 samples/sec Loss 4.2094 LearningRate 0.0116 Epoch: 13 Global 
Step: 163930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:44,164-Speed 3001.86 samples/sec Loss 4.2980 LearningRate 0.0116 Epoch: 13 Global Step: 163940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:47,566-Speed 3011.09 samples/sec Loss 4.3257 LearningRate 0.0116 Epoch: 13 Global Step: 163950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:50,962-Speed 3016.29 samples/sec Loss 4.3764 LearningRate 0.0116 Epoch: 13 Global Step: 163960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:54,363-Speed 3011.04 samples/sec Loss 4.3225 LearningRate 0.0116 Epoch: 13 Global Step: 163970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:14:57,755-Speed 3019.57 samples/sec Loss 4.2844 LearningRate 0.0116 Epoch: 13 Global Step: 163980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:01,068-Speed 3091.92 samples/sec Loss 4.3072 LearningRate 0.0116 Epoch: 13 Global Step: 163990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:04,469-Speed 3011.75 samples/sec Loss 4.3493 LearningRate 0.0115 Epoch: 13 Global Step: 164000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:07,872-Speed 3010.14 samples/sec Loss 4.3445 LearningRate 0.0115 Epoch: 13 Global Step: 164010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:11,274-Speed 3010.67 samples/sec Loss 4.3105 LearningRate 0.0115 Epoch: 13 Global Step: 164020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:14,642-Speed 3041.66 samples/sec Loss 4.1768 LearningRate 0.0115 Epoch: 13 Global Step: 164030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:18,095-Speed 2966.41 samples/sec Loss 4.2028 LearningRate 0.0115 Epoch: 13 Global Step: 164040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:21,482-Speed 3023.70 samples/sec Loss 4.3207 LearningRate 0.0115 Epoch: 13 Global Step: 164050 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:15:24,921-Speed 2978.77 samples/sec Loss 4.3715 LearningRate 0.0115 Epoch: 13 Global Step: 164060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:28,270-Speed 3059.13 samples/sec Loss 4.3491 LearningRate 0.0115 Epoch: 13 Global Step: 164070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:31,690-Speed 2994.32 samples/sec Loss 4.2816 LearningRate 0.0115 Epoch: 13 Global Step: 164080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:35,090-Speed 3013.02 samples/sec Loss 4.3811 LearningRate 0.0115 Epoch: 13 Global Step: 164090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:38,508-Speed 2996.77 samples/sec Loss 4.3362 LearningRate 0.0115 Epoch: 13 Global Step: 164100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:41,836-Speed 3078.26 samples/sec Loss 4.3110 LearningRate 0.0115 Epoch: 13 Global Step: 164110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:45,182-Speed 3061.16 samples/sec Loss 4.3369 LearningRate 0.0115 Epoch: 13 Global Step: 164120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:15:48,638-Speed 2963.62 samples/sec Loss 4.4086 LearningRate 0.0115 Epoch: 13 Global Step: 164130 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:15:52,046-Speed 3005.66 samples/sec Loss 4.3347 LearningRate 0.0115 Epoch: 13 Global Step: 164140 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:15:55,448-Speed 3010.96 samples/sec Loss 4.3856 LearningRate 0.0115 Epoch: 13 Global Step: 164150 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:15:58,776-Speed 3077.91 samples/sec Loss 4.3212 LearningRate 0.0115 Epoch: 13 Global Step: 164160 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:16:02,145-Speed 3040.61 samples/sec Loss 4.3376 LearningRate 0.0115 Epoch: 13 Global Step: 164170 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
17:16:05,633-Speed 2936.70 samples/sec Loss 4.3637 LearningRate 0.0115 Epoch: 13 Global Step: 164180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:16:09,093-Speed 2960.51 samples/sec Loss 4.3752 LearningRate 0.0115 Epoch: 13 Global Step: 164190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:16:12,468-Speed 3035.32 samples/sec Loss 4.4229 LearningRate 0.0115 Epoch: 13 Global Step: 164200 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:16:15,818-Speed 3056.90 samples/sec Loss 4.2849 LearningRate 0.0115 Epoch: 13 Global Step: 164210 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:16:19,237-Speed 2996.47 samples/sec Loss 4.3500 LearningRate 0.0115 Epoch: 13 Global Step: 164220 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:16:22,606-Speed 3039.67 samples/sec Loss 4.4081 LearningRate 0.0115 Epoch: 13 Global Step: 164230 Fp16 Grad Scale: 131072 Required: 8 hours Training: 2022-04-27 17:16:25,972-Speed 3042.77 samples/sec Loss 4.4600 LearningRate 0.0115 Epoch: 13 Global Step: 164240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:16:29,368-Speed 3016.27 samples/sec Loss 4.3536 LearningRate 0.0115 Epoch: 13 Global Step: 164250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:16:32,818-Speed 2969.62 samples/sec Loss 4.2708 LearningRate 0.0115 Epoch: 13 Global Step: 164260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:16:36,236-Speed 2996.13 samples/sec Loss 4.3641 LearningRate 0.0115 Epoch: 13 Global Step: 164270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:16:39,586-Speed 3057.54 samples/sec Loss 4.2887 LearningRate 0.0115 Epoch: 13 Global Step: 164280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:16:42,922-Speed 3070.09 samples/sec Loss 4.4306 LearningRate 0.0115 Epoch: 13 Global Step: 164290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:16:46,302-Speed 3032.10 samples/sec 
Loss 4.3775 LearningRate 0.0115 Epoch: 13 Global Step: 164300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:16:49,706-Speed 3008.67 samples/sec Loss 4.3794 LearningRate 0.0115 Epoch: 13 Global Step: 164310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:16:53,147-Speed 2976.66 samples/sec Loss 4.3573 LearningRate 0.0115 Epoch: 13 Global Step: 164320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:16:56,586-Speed 2978.50 samples/sec Loss 4.4134 LearningRate 0.0115 Epoch: 13 Global Step: 164330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:17:00,079-Speed 2932.85 samples/sec Loss 4.3999 LearningRate 0.0115 Epoch: 13 Global Step: 164340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:17:03,493-Speed 3000.02 samples/sec Loss 4.3671 LearningRate 0.0115 Epoch: 13 Global Step: 164350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:17:07,002-Speed 2918.96 samples/sec Loss 4.3317 LearningRate 0.0115 Epoch: 13 Global Step: 164360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:17:10,403-Speed 3012.27 samples/sec Loss 4.3332 LearningRate 0.0114 Epoch: 13 Global Step: 164370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:17:13,811-Speed 3005.46 samples/sec Loss 4.3725 LearningRate 0.0114 Epoch: 13 Global Step: 164380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:17,185-Speed 3035.28 samples/sec Loss 4.2982 LearningRate 0.0114 Epoch: 13 Global Step: 164390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:20,609-Speed 2991.86 samples/sec Loss 4.3257 LearningRate 0.0114 Epoch: 13 Global Step: 164400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:23,971-Speed 3046.28 samples/sec Loss 4.3238 LearningRate 0.0114 Epoch: 13 Global Step: 164410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:27,335-Speed 3044.98 samples/sec Loss 4.3797 LearningRate 0.0114 Epoch: 13 
Global Step: 164420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:30,721-Speed 3025.53 samples/sec Loss 4.3365 LearningRate 0.0114 Epoch: 13 Global Step: 164430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:34,145-Speed 2991.52 samples/sec Loss 4.3040 LearningRate 0.0114 Epoch: 13 Global Step: 164440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:37,554-Speed 3004.00 samples/sec Loss 4.3608 LearningRate 0.0114 Epoch: 13 Global Step: 164450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:40,959-Speed 3008.51 samples/sec Loss 4.2930 LearningRate 0.0114 Epoch: 13 Global Step: 164460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:44,405-Speed 2971.94 samples/sec Loss 4.4680 LearningRate 0.0114 Epoch: 13 Global Step: 164470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:47,820-Speed 2999.82 samples/sec Loss 4.3807 LearningRate 0.0114 Epoch: 13 Global Step: 164480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:17:51,247-Speed 2988.98 samples/sec Loss 4.4476 LearningRate 0.0114 Epoch: 13 Global Step: 164490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:17:54,565-Speed 3086.97 samples/sec Loss 4.3018 LearningRate 0.0114 Epoch: 13 Global Step: 164500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:17:57,944-Speed 3031.12 samples/sec Loss 4.3494 LearningRate 0.0114 Epoch: 13 Global Step: 164510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:01,326-Speed 3029.47 samples/sec Loss 4.3451 LearningRate 0.0114 Epoch: 13 Global Step: 164520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:04,722-Speed 3015.24 samples/sec Loss 4.3896 LearningRate 0.0114 Epoch: 13 Global Step: 164530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:08,130-Speed 3005.68 samples/sec Loss 4.3816 LearningRate 0.0114 Epoch: 13 Global Step: 164540 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:18:11,468-Speed 3068.74 samples/sec Loss 4.2773 LearningRate 0.0114 Epoch: 13 Global Step: 164550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:14,818-Speed 3057.61 samples/sec Loss 4.3574 LearningRate 0.0114 Epoch: 13 Global Step: 164560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:18,219-Speed 3012.21 samples/sec Loss 4.4007 LearningRate 0.0114 Epoch: 13 Global Step: 164570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:21,574-Speed 3052.36 samples/sec Loss 4.4411 LearningRate 0.0114 Epoch: 13 Global Step: 164580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:24,960-Speed 3025.34 samples/sec Loss 4.3813 LearningRate 0.0114 Epoch: 13 Global Step: 164590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:28,320-Speed 3048.54 samples/sec Loss 4.2915 LearningRate 0.0114 Epoch: 13 Global Step: 164600 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:18:31,709-Speed 3022.10 samples/sec Loss 4.4771 LearningRate 0.0114 Epoch: 13 Global Step: 164610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:18:35,103-Speed 3018.18 samples/sec Loss 4.3611 LearningRate 0.0114 Epoch: 13 Global Step: 164620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:18:38,521-Speed 2996.62 samples/sec Loss 4.4147 LearningRate 0.0114 Epoch: 13 Global Step: 164630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:18:41,957-Speed 2981.04 samples/sec Loss 4.3568 LearningRate 0.0114 Epoch: 13 Global Step: 164640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:18:45,291-Speed 3072.87 samples/sec Loss 4.4081 LearningRate 0.0114 Epoch: 13 Global Step: 164650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:48,671-Speed 3030.86 samples/sec Loss 4.4164 LearningRate 0.0114 Epoch: 13 Global Step: 164660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:18:51,996-Speed 3080.48 samples/sec Loss 4.4411 LearningRate 0.0114 Epoch: 13 Global Step: 164670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:55,341-Speed 3062.48 samples/sec Loss 4.3385 LearningRate 0.0114 Epoch: 13 Global Step: 164680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:18:58,747-Speed 3007.57 samples/sec Loss 4.4723 LearningRate 0.0114 Epoch: 13 Global Step: 164690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:19:02,208-Speed 2959.10 samples/sec Loss 4.3517 LearningRate 0.0114 Epoch: 13 Global Step: 164700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:19:05,616-Speed 3005.27 samples/sec Loss 4.3725 LearningRate 0.0114 Epoch: 13 Global Step: 164710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:19:09,006-Speed 3021.47 samples/sec Loss 4.3967 LearningRate 0.0114 Epoch: 13 Global Step: 164720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:19:12,415-Speed 3004.53 samples/sec Loss 4.3934 LearningRate 0.0113 Epoch: 13 Global Step: 164730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:19:15,771-Speed 3052.37 samples/sec Loss 4.4589 LearningRate 0.0113 Epoch: 13 Global Step: 164740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:19:19,240-Speed 2952.29 samples/sec Loss 4.3480 LearningRate 0.0113 Epoch: 13 Global Step: 164750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:22,690-Speed 2968.90 samples/sec Loss 4.4787 LearningRate 0.0113 Epoch: 13 Global Step: 164760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:26,182-Speed 2933.63 samples/sec Loss 4.3771 LearningRate 0.0113 Epoch: 13 Global Step: 164770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:29,633-Speed 2967.83 samples/sec Loss 4.4415 LearningRate 0.0113 Epoch: 13 Global Step: 164780 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:32,970-Speed 3069.84 samples/sec Loss 
4.3787 LearningRate 0.0113 Epoch: 13 Global Step: 164790 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:36,363-Speed 3018.11 samples/sec Loss 4.3883 LearningRate 0.0113 Epoch: 13 Global Step: 164800 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:39,826-Speed 2958.04 samples/sec Loss 4.3813 LearningRate 0.0113 Epoch: 13 Global Step: 164810 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:43,207-Speed 3029.73 samples/sec Loss 4.3349 LearningRate 0.0113 Epoch: 13 Global Step: 164820 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:46,541-Speed 3071.53 samples/sec Loss 4.4412 LearningRate 0.0113 Epoch: 13 Global Step: 164830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:49,924-Speed 3028.71 samples/sec Loss 4.4064 LearningRate 0.0113 Epoch: 13 Global Step: 164840 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:53,301-Speed 3033.41 samples/sec Loss 4.4089 LearningRate 0.0113 Epoch: 13 Global Step: 164850 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:19:56,722-Speed 2994.17 samples/sec Loss 4.3675 LearningRate 0.0113 Epoch: 13 Global Step: 164860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:00,187-Speed 2955.83 samples/sec Loss 4.5026 LearningRate 0.0113 Epoch: 13 Global Step: 164870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:03,607-Speed 2994.73 samples/sec Loss 4.5000 LearningRate 0.0113 Epoch: 13 Global Step: 164880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:06,923-Speed 3089.17 samples/sec Loss 4.4477 LearningRate 0.0113 Epoch: 13 Global Step: 164890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:10,255-Speed 3074.64 samples/sec Loss 4.4714 LearningRate 0.0113 Epoch: 13 Global Step: 164900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:13,645-Speed 3020.96 samples/sec Loss 4.3541 LearningRate 0.0113 Epoch: 13 Global 
Step: 164910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:17,006-Speed 3047.54 samples/sec Loss 4.3124 LearningRate 0.0113 Epoch: 13 Global Step: 164920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:20,421-Speed 2999.38 samples/sec Loss 4.4268 LearningRate 0.0113 Epoch: 13 Global Step: 164930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:23,832-Speed 3003.05 samples/sec Loss 4.3905 LearningRate 0.0113 Epoch: 13 Global Step: 164940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:27,266-Speed 2983.04 samples/sec Loss 4.4960 LearningRate 0.0113 Epoch: 13 Global Step: 164950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:30,732-Speed 2956.10 samples/sec Loss 4.3773 LearningRate 0.0113 Epoch: 13 Global Step: 164960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:20:34,109-Speed 3032.46 samples/sec Loss 4.3944 LearningRate 0.0113 Epoch: 13 Global Step: 164970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:37,505-Speed 3016.02 samples/sec Loss 4.3915 LearningRate 0.0113 Epoch: 13 Global Step: 164980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:40,935-Speed 2986.71 samples/sec Loss 4.4531 LearningRate 0.0113 Epoch: 13 Global Step: 164990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:44,390-Speed 2964.70 samples/sec Loss 4.4381 LearningRate 0.0113 Epoch: 13 Global Step: 165000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:47,823-Speed 2983.62 samples/sec Loss 4.4451 LearningRate 0.0113 Epoch: 13 Global Step: 165010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:51,217-Speed 3019.00 samples/sec Loss 4.4852 LearningRate 0.0113 Epoch: 13 Global Step: 165020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:20:54,569-Speed 3055.73 samples/sec Loss 4.4039 LearningRate 0.0113 Epoch: 13 Global Step: 165030 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:20:57,996-Speed 2988.59 samples/sec Loss 4.3498 LearningRate 0.0113 Epoch: 13 Global Step: 165040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:01,382-Speed 3024.96 samples/sec Loss 4.4421 LearningRate 0.0113 Epoch: 13 Global Step: 165050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:04,821-Speed 2978.58 samples/sec Loss 4.3648 LearningRate 0.0113 Epoch: 13 Global Step: 165060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:08,211-Speed 3021.24 samples/sec Loss 4.4519 LearningRate 0.0113 Epoch: 13 Global Step: 165070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:21:11,569-Speed 3049.87 samples/sec Loss 4.3968 LearningRate 0.0113 Epoch: 13 Global Step: 165080 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:21:14,948-Speed 3031.58 samples/sec Loss 4.4103 LearningRate 0.0113 Epoch: 13 Global Step: 165090 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:21:18,391-Speed 2974.86 samples/sec Loss 4.5568 LearningRate 0.0112 Epoch: 13 Global Step: 165100 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:21:21,774-Speed 3027.86 samples/sec Loss 4.4500 LearningRate 0.0112 Epoch: 13 Global Step: 165110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:21:25,199-Speed 2991.04 samples/sec Loss 4.4264 LearningRate 0.0112 Epoch: 13 Global Step: 165120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:28,635-Speed 2980.12 samples/sec Loss 4.5038 LearningRate 0.0112 Epoch: 13 Global Step: 165130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:32,091-Speed 2964.38 samples/sec Loss 4.4140 LearningRate 0.0112 Epoch: 13 Global Step: 165140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:35,513-Speed 2993.05 samples/sec Loss 4.4362 LearningRate 0.0112 Epoch: 13 Global Step: 165150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:21:38,931-Speed 2997.23 samples/sec Loss 4.4282 LearningRate 0.0112 Epoch: 13 Global Step: 165160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:42,344-Speed 3001.67 samples/sec Loss 4.4370 LearningRate 0.0112 Epoch: 13 Global Step: 165170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:45,707-Speed 3045.76 samples/sec Loss 4.4298 LearningRate 0.0112 Epoch: 13 Global Step: 165180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:49,111-Speed 3009.00 samples/sec Loss 4.4594 LearningRate 0.0112 Epoch: 13 Global Step: 165190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:21:52,514-Speed 3009.48 samples/sec Loss 4.5008 LearningRate 0.0112 Epoch: 13 Global Step: 165200 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:21:55,943-Speed 2987.48 samples/sec Loss 4.4078 LearningRate 0.0112 Epoch: 13 Global Step: 165210 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:21:59,343-Speed 3012.52 samples/sec Loss 4.4245 LearningRate 0.0112 Epoch: 13 Global Step: 165220 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:02,723-Speed 3029.95 samples/sec Loss 4.5008 LearningRate 0.0112 Epoch: 13 Global Step: 165230 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:06,105-Speed 3028.91 samples/sec Loss 4.4079 LearningRate 0.0112 Epoch: 13 Global Step: 165240 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:09,474-Speed 3040.38 samples/sec Loss 4.4697 LearningRate 0.0112 Epoch: 13 Global Step: 165250 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:12,871-Speed 3015.60 samples/sec Loss 4.4784 LearningRate 0.0112 Epoch: 13 Global Step: 165260 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:16,415-Speed 2889.92 samples/sec Loss 4.5681 LearningRate 0.0112 Epoch: 13 Global Step: 165270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:19,832-Speed 2997.97 samples/sec Loss 
4.4690 LearningRate 0.0112 Epoch: 13 Global Step: 165280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:23,253-Speed 2993.58 samples/sec Loss 4.4665 LearningRate 0.0112 Epoch: 13 Global Step: 165290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:22:26,649-Speed 3016.71 samples/sec Loss 4.3683 LearningRate 0.0112 Epoch: 13 Global Step: 165300 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:30,091-Speed 2975.47 samples/sec Loss 4.4937 LearningRate 0.0112 Epoch: 13 Global Step: 165310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:33,533-Speed 2975.78 samples/sec Loss 4.4876 LearningRate 0.0112 Epoch: 13 Global Step: 165320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:36,963-Speed 2986.67 samples/sec Loss 4.4351 LearningRate 0.0112 Epoch: 13 Global Step: 165330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:40,410-Speed 2971.95 samples/sec Loss 4.5057 LearningRate 0.0112 Epoch: 13 Global Step: 165340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:43,781-Speed 3038.05 samples/sec Loss 4.4205 LearningRate 0.0112 Epoch: 13 Global Step: 165350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:47,110-Speed 3076.75 samples/sec Loss 4.5855 LearningRate 0.0112 Epoch: 13 Global Step: 165360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:50,501-Speed 3020.94 samples/sec Loss 4.5284 LearningRate 0.0112 Epoch: 13 Global Step: 165370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:53,916-Speed 2999.07 samples/sec Loss 4.3956 LearningRate 0.0112 Epoch: 13 Global Step: 165380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:22:57,273-Speed 3051.76 samples/sec Loss 4.5448 LearningRate 0.0112 Epoch: 13 Global Step: 165390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:23:00,607-Speed 3072.85 samples/sec Loss 4.4804 LearningRate 0.0112 Epoch: 13 Global 
Step: 165400 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:23:04,051-Speed 2973.94 samples/sec Loss 4.3551 LearningRate 0.0112 Epoch: 13 Global Step: 165410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:23:07,392-Speed 3066.13 samples/sec Loss 4.4592 LearningRate 0.0112 Epoch: 13 Global Step: 165420 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:10,746-Speed 3054.17 samples/sec Loss 4.4900 LearningRate 0.0112 Epoch: 13 Global Step: 165430 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:14,083-Speed 3069.79 samples/sec Loss 4.4976 LearningRate 0.0112 Epoch: 13 Global Step: 165440 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:17,476-Speed 3019.59 samples/sec Loss 4.4201 LearningRate 0.0112 Epoch: 13 Global Step: 165450 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:20,869-Speed 3018.88 samples/sec Loss 4.4798 LearningRate 0.0112 Epoch: 13 Global Step: 165460 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:24,263-Speed 3017.55 samples/sec Loss 4.4398 LearningRate 0.0111 Epoch: 13 Global Step: 165470 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:27,733-Speed 2951.12 samples/sec Loss 4.5107 LearningRate 0.0111 Epoch: 13 Global Step: 165480 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:31,166-Speed 2983.83 samples/sec Loss 4.4392 LearningRate 0.0111 Epoch: 13 Global Step: 165490 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:34,544-Speed 3032.78 samples/sec Loss 4.5931 LearningRate 0.0111 Epoch: 13 Global Step: 165500 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:37,916-Speed 3036.81 samples/sec Loss 4.5270 LearningRate 0.0111 Epoch: 13 Global Step: 165510 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:23:41,268-Speed 3056.05 samples/sec Loss 4.5672 LearningRate 0.0111 Epoch: 13 Global Step: 165520 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:23:44,721-Speed 2966.93 samples/sec Loss 4.5239 LearningRate 0.0111 Epoch: 13 Global Step: 165530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:23:48,127-Speed 3006.51 samples/sec Loss 4.4127 LearningRate 0.0111 Epoch: 13 Global Step: 165540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:23:51,550-Speed 2992.64 samples/sec Loss 4.4684 LearningRate 0.0111 Epoch: 13 Global Step: 165550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:23:54,957-Speed 3006.08 samples/sec Loss 4.5290 LearningRate 0.0111 Epoch: 13 Global Step: 165560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:23:58,357-Speed 3013.19 samples/sec Loss 4.5029 LearningRate 0.0111 Epoch: 13 Global Step: 165570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:01,696-Speed 3067.46 samples/sec Loss 4.4068 LearningRate 0.0111 Epoch: 13 Global Step: 165580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:05,089-Speed 3019.00 samples/sec Loss 4.4554 LearningRate 0.0111 Epoch: 13 Global Step: 165590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:08,461-Speed 3037.37 samples/sec Loss 4.4883 LearningRate 0.0111 Epoch: 13 Global Step: 165600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:11,859-Speed 3014.39 samples/sec Loss 4.4863 LearningRate 0.0111 Epoch: 13 Global Step: 165610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:15,238-Speed 3031.43 samples/sec Loss 4.5023 LearningRate 0.0111 Epoch: 13 Global Step: 165620 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:24:18,725-Speed 2937.54 samples/sec Loss 4.5474 LearningRate 0.0111 Epoch: 13 Global Step: 165630 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:24:22,066-Speed 3065.54 samples/sec Loss 4.4988 LearningRate 0.0111 Epoch: 13 Global Step: 165640 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
17:24:25,476-Speed 3003.50 samples/sec Loss 4.6042 LearningRate 0.0111 Epoch: 13 Global Step: 165650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:28,800-Speed 3081.46 samples/sec Loss 4.4653 LearningRate 0.0111 Epoch: 13 Global Step: 165660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:32,207-Speed 3007.12 samples/sec Loss 4.4592 LearningRate 0.0111 Epoch: 13 Global Step: 165670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:35,578-Speed 3038.08 samples/sec Loss 4.4958 LearningRate 0.0111 Epoch: 13 Global Step: 165680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:38,897-Speed 3085.76 samples/sec Loss 4.4405 LearningRate 0.0111 Epoch: 13 Global Step: 165690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:42,209-Speed 3092.73 samples/sec Loss 4.5233 LearningRate 0.0111 Epoch: 13 Global Step: 165700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:45,604-Speed 3017.23 samples/sec Loss 4.4604 LearningRate 0.0111 Epoch: 13 Global Step: 165710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:49,068-Speed 2957.20 samples/sec Loss 4.4502 LearningRate 0.0111 Epoch: 13 Global Step: 165720 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:52,483-Speed 2999.44 samples/sec Loss 4.5466 LearningRate 0.0111 Epoch: 13 Global Step: 165730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:55,901-Speed 2996.57 samples/sec Loss 4.5944 LearningRate 0.0111 Epoch: 13 Global Step: 165740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:24:59,267-Speed 3042.96 samples/sec Loss 4.4831 LearningRate 0.0111 Epoch: 13 Global Step: 165750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:25:02,654-Speed 3023.75 samples/sec Loss 4.6075 LearningRate 0.0111 Epoch: 13 Global Step: 165760 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:25:06,065-Speed 3003.24 samples/sec Loss 
4.4626 LearningRate 0.0111 Epoch: 13 Global Step: 165770 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:25:09,466-Speed 3012.02 samples/sec Loss 4.5134 LearningRate 0.0111 Epoch: 13 Global Step: 165780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:12,912-Speed 2972.04 samples/sec Loss 4.4895 LearningRate 0.0111 Epoch: 13 Global Step: 165790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:16,342-Speed 2987.08 samples/sec Loss 4.4423 LearningRate 0.0111 Epoch: 13 Global Step: 165800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:19,739-Speed 3015.47 samples/sec Loss 4.5215 LearningRate 0.0111 Epoch: 13 Global Step: 165810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:23,140-Speed 3012.21 samples/sec Loss 4.4326 LearningRate 0.0111 Epoch: 13 Global Step: 165820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:26,492-Speed 3055.45 samples/sec Loss 4.6215 LearningRate 0.0111 Epoch: 13 Global Step: 165830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:29,879-Speed 3024.57 samples/sec Loss 4.5065 LearningRate 0.0111 Epoch: 13 Global Step: 165840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:33,273-Speed 3018.22 samples/sec Loss 4.5229 LearningRate 0.0110 Epoch: 13 Global Step: 165850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:36,762-Speed 2935.88 samples/sec Loss 4.4156 LearningRate 0.0110 Epoch: 13 Global Step: 165860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:40,154-Speed 3019.19 samples/sec Loss 4.6155 LearningRate 0.0110 Epoch: 13 Global Step: 165870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:43,511-Speed 3051.47 samples/sec Loss 4.4358 LearningRate 0.0110 Epoch: 13 Global Step: 165880 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:25:46,887-Speed 3033.64 samples/sec Loss 4.4887 LearningRate 0.0110 Epoch: 13 Global 
Step: 165890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:50,244-Speed 3051.60 samples/sec Loss 4.5458 LearningRate 0.0110 Epoch: 13 Global Step: 165900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:53,755-Speed 2917.16 samples/sec Loss 4.5159 LearningRate 0.0110 Epoch: 13 Global Step: 165910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:25:57,107-Speed 3055.83 samples/sec Loss 4.4888 LearningRate 0.0110 Epoch: 13 Global Step: 165920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:00,458-Speed 3057.03 samples/sec Loss 4.5196 LearningRate 0.0110 Epoch: 13 Global Step: 165930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:03,783-Speed 3080.48 samples/sec Loss 4.5546 LearningRate 0.0110 Epoch: 13 Global Step: 165940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:07,126-Speed 3064.20 samples/sec Loss 4.5065 LearningRate 0.0110 Epoch: 13 Global Step: 165950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:10,488-Speed 3046.33 samples/sec Loss 4.5487 LearningRate 0.0110 Epoch: 13 Global Step: 165960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:13,884-Speed 3016.35 samples/sec Loss 4.4583 LearningRate 0.0110 Epoch: 13 Global Step: 165970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:17,334-Speed 2968.77 samples/sec Loss 4.5932 LearningRate 0.0110 Epoch: 13 Global Step: 165980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:20,751-Speed 2997.77 samples/sec Loss 4.5661 LearningRate 0.0110 Epoch: 13 Global Step: 165990 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:26:24,130-Speed 3031.61 samples/sec Loss 4.5338 LearningRate 0.0110 Epoch: 13 Global Step: 166000 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:26:27,500-Speed 3039.68 samples/sec Loss 4.5264 LearningRate 0.0110 Epoch: 13 Global Step: 166010 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:26:30,879-Speed 3031.40 samples/sec Loss 4.5138 LearningRate 0.0110 Epoch: 13 Global Step: 166020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:34,326-Speed 2971.42 samples/sec Loss 4.5849 LearningRate 0.0110 Epoch: 13 Global Step: 166030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:37,717-Speed 3020.08 samples/sec Loss 4.5624 LearningRate 0.0110 Epoch: 13 Global Step: 166040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:41,060-Speed 3064.63 samples/sec Loss 4.4967 LearningRate 0.0110 Epoch: 13 Global Step: 166050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:44,433-Speed 3036.10 samples/sec Loss 4.3855 LearningRate 0.0110 Epoch: 13 Global Step: 166060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:47,798-Speed 3044.57 samples/sec Loss 4.6836 LearningRate 0.0110 Epoch: 13 Global Step: 166070 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:51,287-Speed 2935.64 samples/sec Loss 4.5072 LearningRate 0.0110 Epoch: 13 Global Step: 166080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:54,757-Speed 2951.95 samples/sec Loss 4.6019 LearningRate 0.0110 Epoch: 13 Global Step: 166090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:26:58,203-Speed 2971.86 samples/sec Loss 4.4947 LearningRate 0.0110 Epoch: 13 Global Step: 166100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:01,592-Speed 3022.89 samples/sec Loss 4.5380 LearningRate 0.0110 Epoch: 13 Global Step: 166110 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:27:04,937-Speed 3062.84 samples/sec Loss 4.4379 LearningRate 0.0110 Epoch: 13 Global Step: 166120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:08,310-Speed 3036.29 samples/sec Loss 4.5667 LearningRate 0.0110 Epoch: 13 Global Step: 166130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:27:11,782-Speed 2949.45 samples/sec Loss 4.5812 LearningRate 0.0110 Epoch: 13 Global Step: 166140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:15,134-Speed 3057.20 samples/sec Loss 4.4899 LearningRate 0.0110 Epoch: 13 Global Step: 166150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:18,487-Speed 3054.29 samples/sec Loss 4.5247 LearningRate 0.0110 Epoch: 13 Global Step: 166160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:21,887-Speed 3012.73 samples/sec Loss 4.5967 LearningRate 0.0110 Epoch: 13 Global Step: 166170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:25,228-Speed 3066.24 samples/sec Loss 4.4822 LearningRate 0.0110 Epoch: 13 Global Step: 166180 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:28,561-Speed 3072.51 samples/sec Loss 4.5482 LearningRate 0.0110 Epoch: 13 Global Step: 166190 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:31,957-Speed 3016.61 samples/sec Loss 4.4725 LearningRate 0.0110 Epoch: 13 Global Step: 166200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:35,357-Speed 3012.47 samples/sec Loss 4.4883 LearningRate 0.0110 Epoch: 13 Global Step: 166210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:38,794-Speed 2979.94 samples/sec Loss 4.4964 LearningRate 0.0109 Epoch: 13 Global Step: 166220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:42,183-Speed 3022.82 samples/sec Loss 4.5076 LearningRate 0.0109 Epoch: 13 Global Step: 166230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:45,618-Speed 2982.09 samples/sec Loss 4.4489 LearningRate 0.0109 Epoch: 13 Global Step: 166240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:49,049-Speed 2985.69 samples/sec Loss 4.4462 LearningRate 0.0109 Epoch: 13 Global Step: 166250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:52,462-Speed 3001.24 samples/sec Loss 
4.5575 LearningRate 0.0109 Epoch: 13 Global Step: 166260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:55,897-Speed 2981.56 samples/sec Loss 4.5595 LearningRate 0.0109 Epoch: 13 Global Step: 166270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:27:59,309-Speed 3002.58 samples/sec Loss 4.4487 LearningRate 0.0109 Epoch: 13 Global Step: 166280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:02,771-Speed 2958.30 samples/sec Loss 4.5220 LearningRate 0.0109 Epoch: 13 Global Step: 166290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:06,258-Speed 2937.20 samples/sec Loss 4.5308 LearningRate 0.0109 Epoch: 13 Global Step: 166300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:09,675-Speed 2997.71 samples/sec Loss 4.5160 LearningRate 0.0109 Epoch: 13 Global Step: 166310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:13,040-Speed 3044.47 samples/sec Loss 4.4554 LearningRate 0.0109 Epoch: 13 Global Step: 166320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:16,424-Speed 3026.97 samples/sec Loss 4.4003 LearningRate 0.0109 Epoch: 13 Global Step: 166330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:19,819-Speed 3016.89 samples/sec Loss 4.6344 LearningRate 0.0109 Epoch: 13 Global Step: 166340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:23,216-Speed 3015.62 samples/sec Loss 4.4538 LearningRate 0.0109 Epoch: 13 Global Step: 166350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:26,572-Speed 3051.25 samples/sec Loss 4.5815 LearningRate 0.0109 Epoch: 13 Global Step: 166360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:29,929-Speed 3051.06 samples/sec Loss 4.6138 LearningRate 0.0109 Epoch: 13 Global Step: 166370 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:28:33,307-Speed 3032.23 samples/sec Loss 4.4975 LearningRate 0.0109 Epoch: 13 Global 
Step: 166380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:36,700-Speed 3019.87 samples/sec Loss 4.6036 LearningRate 0.0109 Epoch: 13 Global Step: 166390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:40,038-Speed 3068.62 samples/sec Loss 4.6129 LearningRate 0.0109 Epoch: 13 Global Step: 166400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:43,450-Speed 3001.39 samples/sec Loss 4.6060 LearningRate 0.0109 Epoch: 13 Global Step: 166410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:46,828-Speed 3032.62 samples/sec Loss 4.6040 LearningRate 0.0109 Epoch: 13 Global Step: 166420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:50,170-Speed 3064.61 samples/sec Loss 4.5862 LearningRate 0.0109 Epoch: 13 Global Step: 166430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:53,643-Speed 2949.28 samples/sec Loss 4.5981 LearningRate 0.0109 Epoch: 13 Global Step: 166440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:28:57,035-Speed 3019.64 samples/sec Loss 4.5560 LearningRate 0.0109 Epoch: 13 Global Step: 166450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:00,386-Speed 3056.41 samples/sec Loss 4.5280 LearningRate 0.0109 Epoch: 13 Global Step: 166460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:03,722-Speed 3070.61 samples/sec Loss 4.6351 LearningRate 0.0109 Epoch: 13 Global Step: 166470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:07,115-Speed 3019.13 samples/sec Loss 4.5987 LearningRate 0.0109 Epoch: 13 Global Step: 166480 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:10,580-Speed 2956.03 samples/sec Loss 4.6074 LearningRate 0.0109 Epoch: 13 Global Step: 166490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:14,046-Speed 2955.47 samples/sec Loss 4.5829 LearningRate 0.0109 Epoch: 13 Global Step: 166500 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 17:29:17,377-Speed 3074.97 samples/sec Loss 4.6318 LearningRate 0.0109 Epoch: 13 Global Step: 166510 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:20,814-Speed 2980.36 samples/sec Loss 4.6186 LearningRate 0.0109 Epoch: 13 Global Step: 166520 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:24,184-Speed 3039.42 samples/sec Loss 4.5383 LearningRate 0.0109 Epoch: 13 Global Step: 166530 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:27,593-Speed 3003.87 samples/sec Loss 4.5696 LearningRate 0.0109 Epoch: 13 Global Step: 166540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:31,009-Speed 2998.83 samples/sec Loss 4.5069 LearningRate 0.0109 Epoch: 13 Global Step: 166550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:34,453-Speed 2973.81 samples/sec Loss 4.5556 LearningRate 0.0109 Epoch: 13 Global Step: 166560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:29:37,785-Speed 3074.98 samples/sec Loss 4.5884 LearningRate 0.0109 Epoch: 13 Global Step: 166570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:41,230-Speed 2973.26 samples/sec Loss 4.5500 LearningRate 0.0109 Epoch: 13 Global Step: 166580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:44,697-Speed 2955.05 samples/sec Loss 4.5526 LearningRate 0.0109 Epoch: 13 Global Step: 166590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:48,154-Speed 2962.75 samples/sec Loss 4.5876 LearningRate 0.0108 Epoch: 13 Global Step: 166600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:51,586-Speed 2984.61 samples/sec Loss 4.5297 LearningRate 0.0108 Epoch: 13 Global Step: 166610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:29:55,004-Speed 2997.17 samples/sec Loss 4.6443 LearningRate 0.0108 Epoch: 13 Global Step: 166620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:29:58,402-Speed 3014.15 samples/sec Loss 4.5607 LearningRate 0.0108 Epoch: 13 Global Step: 166630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:01,873-Speed 2951.02 samples/sec Loss 4.4849 LearningRate 0.0108 Epoch: 13 Global Step: 166640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:05,243-Speed 3039.13 samples/sec Loss 4.6407 LearningRate 0.0108 Epoch: 13 Global Step: 166650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:08,596-Speed 3054.78 samples/sec Loss 4.5034 LearningRate 0.0108 Epoch: 13 Global Step: 166660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:11,951-Speed 3052.87 samples/sec Loss 4.5381 LearningRate 0.0108 Epoch: 13 Global Step: 166670 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:30:15,288-Speed 3069.56 samples/sec Loss 4.5272 LearningRate 0.0108 Epoch: 13 Global Step: 166680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:18,642-Speed 3054.22 samples/sec Loss 4.5912 LearningRate 0.0108 Epoch: 13 Global Step: 166690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:21,943-Speed 3103.00 samples/sec Loss 4.5526 LearningRate 0.0108 Epoch: 13 Global Step: 166700 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:25,318-Speed 3035.04 samples/sec Loss 4.3943 LearningRate 0.0108 Epoch: 13 Global Step: 166710 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:28,751-Speed 2983.41 samples/sec Loss 4.5808 LearningRate 0.0108 Epoch: 13 Global Step: 166720 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:32,103-Speed 3055.68 samples/sec Loss 4.5192 LearningRate 0.0108 Epoch: 13 Global Step: 166730 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:35,433-Speed 3075.66 samples/sec Loss 4.5537 LearningRate 0.0108 Epoch: 13 Global Step: 166740 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:38,758-Speed 3080.68 samples/sec Loss 
4.5772 LearningRate 0.0108 Epoch: 13 Global Step: 166750 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:42,214-Speed 2964.05 samples/sec Loss 4.5409 LearningRate 0.0108 Epoch: 13 Global Step: 166760 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:45,606-Speed 3019.75 samples/sec Loss 4.6350 LearningRate 0.0108 Epoch: 13 Global Step: 166770 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:48,970-Speed 3044.78 samples/sec Loss 4.6320 LearningRate 0.0108 Epoch: 13 Global Step: 166780 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:52,358-Speed 3023.47 samples/sec Loss 4.6388 LearningRate 0.0108 Epoch: 13 Global Step: 166790 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:30:55,712-Speed 3053.17 samples/sec Loss 4.5111 LearningRate 0.0108 Epoch: 13 Global Step: 166800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:30:59,148-Speed 2981.58 samples/sec Loss 4.5163 LearningRate 0.0108 Epoch: 13 Global Step: 166810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:02,509-Speed 3046.91 samples/sec Loss 4.4694 LearningRate 0.0108 Epoch: 13 Global Step: 166820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:05,942-Speed 2984.15 samples/sec Loss 4.6529 LearningRate 0.0108 Epoch: 13 Global Step: 166830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:09,347-Speed 3008.08 samples/sec Loss 4.4540 LearningRate 0.0108 Epoch: 13 Global Step: 166840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:12,774-Speed 2988.70 samples/sec Loss 4.5638 LearningRate 0.0108 Epoch: 13 Global Step: 166850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:16,190-Speed 2997.66 samples/sec Loss 4.5266 LearningRate 0.0108 Epoch: 13 Global Step: 166860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:19,516-Speed 3079.92 samples/sec Loss 4.5473 LearningRate 0.0108 Epoch: 13 Global 
Step: 166870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:22,887-Speed 3039.12 samples/sec Loss 4.5484 LearningRate 0.0108 Epoch: 13 Global Step: 166880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:26,245-Speed 3050.33 samples/sec Loss 4.6309 LearningRate 0.0108 Epoch: 13 Global Step: 166890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:29,601-Speed 3051.52 samples/sec Loss 4.5337 LearningRate 0.0108 Epoch: 13 Global Step: 166900 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:31:32,951-Speed 3057.99 samples/sec Loss 4.5713 LearningRate 0.0108 Epoch: 13 Global Step: 166910 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:31:36,346-Speed 3016.72 samples/sec Loss 4.5555 LearningRate 0.0108 Epoch: 13 Global Step: 166920 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:31:39,710-Speed 3045.64 samples/sec Loss 4.5190 LearningRate 0.0108 Epoch: 13 Global Step: 166930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:43,131-Speed 2994.25 samples/sec Loss 4.6400 LearningRate 0.0108 Epoch: 13 Global Step: 166940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:46,512-Speed 3029.35 samples/sec Loss 4.5815 LearningRate 0.0108 Epoch: 13 Global Step: 166950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:49,893-Speed 3030.08 samples/sec Loss 4.6430 LearningRate 0.0108 Epoch: 13 Global Step: 166960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:53,263-Speed 3039.84 samples/sec Loss 4.6371 LearningRate 0.0108 Epoch: 13 Global Step: 166970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:56,602-Speed 3067.34 samples/sec Loss 4.5611 LearningRate 0.0107 Epoch: 13 Global Step: 166980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:31:59,928-Speed 3079.71 samples/sec Loss 4.6075 LearningRate 0.0107 Epoch: 13 Global Step: 166990 Fp16 Grad Scale: 16384 
Required: 8 hours Training: 2022-04-27 17:32:03,312-Speed 3026.60 samples/sec Loss 4.6673 LearningRate 0.0107 Epoch: 13 Global Step: 167000 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:06,701-Speed 3022.37 samples/sec Loss 4.5138 LearningRate 0.0107 Epoch: 13 Global Step: 167010 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:10,053-Speed 3056.02 samples/sec Loss 4.6230 LearningRate 0.0107 Epoch: 13 Global Step: 167020 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:13,388-Speed 3071.54 samples/sec Loss 4.6504 LearningRate 0.0107 Epoch: 13 Global Step: 167030 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:16,762-Speed 3035.81 samples/sec Loss 4.7198 LearningRate 0.0107 Epoch: 13 Global Step: 167040 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:20,196-Speed 2982.80 samples/sec Loss 4.6085 LearningRate 0.0107 Epoch: 13 Global Step: 167050 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:23,576-Speed 3030.03 samples/sec Loss 4.4994 LearningRate 0.0107 Epoch: 13 Global Step: 167060 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:26,953-Speed 3033.40 samples/sec Loss 4.5751 LearningRate 0.0107 Epoch: 13 Global Step: 167070 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:30,378-Speed 2990.96 samples/sec Loss 4.5382 LearningRate 0.0107 Epoch: 13 Global Step: 167080 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:33,722-Speed 3062.81 samples/sec Loss 4.6177 LearningRate 0.0107 Epoch: 13 Global Step: 167090 Fp16 Grad Scale: 8192 Required: 8 hours Training: 2022-04-27 17:32:37,055-Speed 3072.75 samples/sec Loss 4.4968 LearningRate 0.0107 Epoch: 13 Global Step: 167100 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:32:40,484-Speed 2987.96 samples/sec Loss 4.4863 LearningRate 0.0107 Epoch: 13 Global Step: 167110 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 
17:32:43,872-Speed 3023.64 samples/sec Loss 4.5287 LearningRate 0.0107 Epoch: 13 Global Step: 167120 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:32:47,272-Speed 3012.36 samples/sec Loss 4.5891 LearningRate 0.0107 Epoch: 13 Global Step: 167130 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:32:50,720-Speed 2970.72 samples/sec Loss 4.5433 LearningRate 0.0107 Epoch: 13 Global Step: 167140 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:32:54,161-Speed 2976.50 samples/sec Loss 4.5630 LearningRate 0.0107 Epoch: 13 Global Step: 167150 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:32:57,481-Speed 3085.62 samples/sec Loss 4.5723 LearningRate 0.0107 Epoch: 13 Global Step: 167160 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:00,893-Speed 3001.77 samples/sec Loss 4.6022 LearningRate 0.0107 Epoch: 13 Global Step: 167170 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:04,262-Speed 3039.96 samples/sec Loss 4.5533 LearningRate 0.0107 Epoch: 13 Global Step: 167180 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:07,586-Speed 3082.01 samples/sec Loss 4.5559 LearningRate 0.0107 Epoch: 13 Global Step: 167190 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:10,925-Speed 3067.22 samples/sec Loss 4.6046 LearningRate 0.0107 Epoch: 13 Global Step: 167200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:14,310-Speed 3026.19 samples/sec Loss 4.6382 LearningRate 0.0107 Epoch: 13 Global Step: 167210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:17,668-Speed 3049.70 samples/sec Loss 4.5267 LearningRate 0.0107 Epoch: 13 Global Step: 167220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:20,995-Speed 3079.08 samples/sec Loss 4.5468 LearningRate 0.0107 Epoch: 13 Global Step: 167230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:24,402-Speed 3006.20 samples/sec Loss 
4.6134 LearningRate 0.0107 Epoch: 13 Global Step: 167240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:27,832-Speed 2986.73 samples/sec Loss 4.5706 LearningRate 0.0107 Epoch: 13 Global Step: 167250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:31,158-Speed 3079.01 samples/sec Loss 4.5368 LearningRate 0.0107 Epoch: 13 Global Step: 167260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:33:34,481-Speed 3083.23 samples/sec Loss 4.5217 LearningRate 0.0107 Epoch: 13 Global Step: 167270 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:37,859-Speed 3032.07 samples/sec Loss 4.6447 LearningRate 0.0107 Epoch: 13 Global Step: 167280 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:41,231-Speed 3038.19 samples/sec Loss 4.6542 LearningRate 0.0107 Epoch: 13 Global Step: 167290 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:44,545-Speed 3090.13 samples/sec Loss 4.4947 LearningRate 0.0107 Epoch: 13 Global Step: 167300 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:47,944-Speed 3014.15 samples/sec Loss 4.5461 LearningRate 0.0107 Epoch: 13 Global Step: 167310 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:51,314-Speed 3039.36 samples/sec Loss 4.6451 LearningRate 0.0107 Epoch: 13 Global Step: 167320 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:54,727-Speed 3001.13 samples/sec Loss 4.6361 LearningRate 0.0107 Epoch: 13 Global Step: 167330 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:33:58,062-Speed 3071.68 samples/sec Loss 4.5049 LearningRate 0.0107 Epoch: 13 Global Step: 167340 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:34:01,415-Speed 3054.73 samples/sec Loss 4.6247 LearningRate 0.0106 Epoch: 13 Global Step: 167350 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:34:04,826-Speed 3003.04 samples/sec Loss 4.4603 LearningRate 0.0106 Epoch: 13 Global 
Step: 167360 Fp16 Grad Scale: 16384 Required: 8 hours Training: 2022-04-27 17:34:08,256-Speed 2986.04 samples/sec Loss 4.5709 LearningRate 0.0106 Epoch: 13 Global Step: 167370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:11,635-Speed 3031.53 samples/sec Loss 4.4940 LearningRate 0.0106 Epoch: 13 Global Step: 167380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:14,954-Speed 3085.49 samples/sec Loss 4.6354 LearningRate 0.0106 Epoch: 13 Global Step: 167390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:18,418-Speed 2957.78 samples/sec Loss 4.5691 LearningRate 0.0106 Epoch: 13 Global Step: 167400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:21,868-Speed 2968.60 samples/sec Loss 4.6041 LearningRate 0.0106 Epoch: 13 Global Step: 167410 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:25,245-Speed 3032.97 samples/sec Loss 4.5932 LearningRate 0.0106 Epoch: 13 Global Step: 167420 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:28,628-Speed 3028.21 samples/sec Loss 4.5764 LearningRate 0.0106 Epoch: 13 Global Step: 167430 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:32,068-Speed 2978.48 samples/sec Loss 4.5921 LearningRate 0.0106 Epoch: 13 Global Step: 167440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:35,528-Speed 2959.49 samples/sec Loss 4.5670 LearningRate 0.0106 Epoch: 13 Global Step: 167450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:38,956-Speed 2988.61 samples/sec Loss 4.5794 LearningRate 0.0106 Epoch: 13 Global Step: 167460 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:42,350-Speed 3017.75 samples/sec Loss 4.5462 LearningRate 0.0106 Epoch: 13 Global Step: 167470 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:34:45,738-Speed 3023.16 samples/sec Loss 4.5621 LearningRate 0.0106 Epoch: 13 Global Step: 167480 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 17:34:49,152-Speed 2999.93 samples/sec Loss 4.6757 LearningRate 0.0106 Epoch: 13 Global Step: 167490 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:34:52,551-Speed 3014.04 samples/sec Loss 4.5754 LearningRate 0.0106 Epoch: 13 Global Step: 167500 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:34:55,954-Speed 3009.82 samples/sec Loss 4.5833 LearningRate 0.0106 Epoch: 13 Global Step: 167510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:34:59,333-Speed 3031.77 samples/sec Loss 4.5975 LearningRate 0.0106 Epoch: 13 Global Step: 167520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:02,712-Speed 3031.03 samples/sec Loss 4.5612 LearningRate 0.0106 Epoch: 13 Global Step: 167530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:06,023-Speed 3094.17 samples/sec Loss 4.6211 LearningRate 0.0106 Epoch: 13 Global Step: 167540 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:09,423-Speed 3012.57 samples/sec Loss 4.5463 LearningRate 0.0106 Epoch: 13 Global Step: 167550 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:12,731-Speed 3095.91 samples/sec Loss 4.6337 LearningRate 0.0106 Epoch: 13 Global Step: 167560 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:16,074-Speed 3064.10 samples/sec Loss 4.6161 LearningRate 0.0106 Epoch: 13 Global Step: 167570 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:19,506-Speed 2984.13 samples/sec Loss 4.6440 LearningRate 0.0106 Epoch: 13 Global Step: 167580 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:22,922-Speed 2998.69 samples/sec Loss 4.6811 LearningRate 0.0106 Epoch: 13 Global Step: 167590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:26,241-Speed 3087.34 samples/sec Loss 4.5080 LearningRate 0.0106 Epoch: 13 Global Step: 167600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:35:29,592-Speed 3055.83 samples/sec Loss 4.5380 LearningRate 0.0106 Epoch: 13 Global Step: 167610 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:35:32,938-Speed 3061.27 samples/sec Loss 4.4111 LearningRate 0.0106 Epoch: 13 Global Step: 167620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:36,334-Speed 3016.16 samples/sec Loss 4.5159 LearningRate 0.0106 Epoch: 13 Global Step: 167630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:39,676-Speed 3064.88 samples/sec Loss 4.6393 LearningRate 0.0106 Epoch: 13 Global Step: 167640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:43,037-Speed 3047.66 samples/sec Loss 4.6093 LearningRate 0.0106 Epoch: 13 Global Step: 167650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:46,457-Speed 2995.90 samples/sec Loss 4.6343 LearningRate 0.0106 Epoch: 13 Global Step: 167660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:49,811-Speed 3052.98 samples/sec Loss 4.6711 LearningRate 0.0106 Epoch: 13 Global Step: 167670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:53,221-Speed 3004.09 samples/sec Loss 4.5924 LearningRate 0.0106 Epoch: 13 Global Step: 167680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:56,548-Speed 3078.66 samples/sec Loss 4.4959 LearningRate 0.0106 Epoch: 13 Global Step: 167690 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:35:59,894-Speed 3060.83 samples/sec Loss 4.5857 LearningRate 0.0106 Epoch: 13 Global Step: 167700 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:03,207-Speed 3092.40 samples/sec Loss 4.6130 LearningRate 0.0106 Epoch: 13 Global Step: 167710 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:06,537-Speed 3076.16 samples/sec Loss 4.5739 LearningRate 0.0106 Epoch: 13 Global Step: 167720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:36:09,883-Speed 3061.46 samples/sec Loss 
4.6858 LearningRate 0.0106 Epoch: 13 Global Step: 167730 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:36:13,288-Speed 3007.56 samples/sec Loss 4.5989 LearningRate 0.0105 Epoch: 13 Global Step: 167740 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:36:16,680-Speed 3019.69 samples/sec Loss 4.6267 LearningRate 0.0105 Epoch: 13 Global Step: 167750 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:36:20,025-Speed 3062.10 samples/sec Loss 4.6138 LearningRate 0.0105 Epoch: 13 Global Step: 167760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:23,559-Speed 2898.44 samples/sec Loss 4.4763 LearningRate 0.0105 Epoch: 13 Global Step: 167770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:26,983-Speed 2991.10 samples/sec Loss 4.5749 LearningRate 0.0105 Epoch: 13 Global Step: 167780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:30,409-Speed 2990.51 samples/sec Loss 4.6793 LearningRate 0.0105 Epoch: 13 Global Step: 167790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:33,779-Speed 3039.11 samples/sec Loss 4.6647 LearningRate 0.0105 Epoch: 13 Global Step: 167800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:37,185-Speed 3007.24 samples/sec Loss 4.6160 LearningRate 0.0105 Epoch: 13 Global Step: 167810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:40,647-Speed 2958.59 samples/sec Loss 4.6965 LearningRate 0.0105 Epoch: 13 Global Step: 167820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:44,054-Speed 3006.75 samples/sec Loss 4.7258 LearningRate 0.0105 Epoch: 13 Global Step: 167830 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:47,463-Speed 3005.07 samples/sec Loss 4.6712 LearningRate 0.0105 Epoch: 13 Global Step: 167840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:50,768-Speed 3099.01 samples/sec Loss 4.6629 LearningRate 0.0105 Epoch: 13 Global 
Step: 167850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:54,158-Speed 3020.98 samples/sec Loss 4.6913 LearningRate 0.0105 Epoch: 13 Global Step: 167860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:36:57,551-Speed 3019.08 samples/sec Loss 4.6805 LearningRate 0.0105 Epoch: 13 Global Step: 167870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:00,916-Speed 3043.94 samples/sec Loss 4.6329 LearningRate 0.0105 Epoch: 13 Global Step: 167880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:04,280-Speed 3044.41 samples/sec Loss 4.5875 LearningRate 0.0105 Epoch: 13 Global Step: 167890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:07,664-Speed 3028.13 samples/sec Loss 4.7088 LearningRate 0.0105 Epoch: 13 Global Step: 167900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:11,039-Speed 3034.89 samples/sec Loss 4.5169 LearningRate 0.0105 Epoch: 13 Global Step: 167910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:14,382-Speed 3063.22 samples/sec Loss 4.5864 LearningRate 0.0105 Epoch: 13 Global Step: 167920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:17,711-Speed 3077.78 samples/sec Loss 4.6502 LearningRate 0.0105 Epoch: 13 Global Step: 167930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:21,062-Speed 3056.54 samples/sec Loss 4.5059 LearningRate 0.0105 Epoch: 13 Global Step: 167940 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:24,414-Speed 3055.13 samples/sec Loss 4.6067 LearningRate 0.0105 Epoch: 13 Global Step: 167950 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:27,740-Speed 3079.48 samples/sec Loss 4.6473 LearningRate 0.0105 Epoch: 13 Global Step: 167960 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:37:31,097-Speed 3051.79 samples/sec Loss 4.5194 LearningRate 0.0105 Epoch: 13 Global Step: 167970 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:37:34,419-Speed 3083.02 samples/sec Loss 4.5494 LearningRate 0.0105 Epoch: 13 Global Step: 167980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:37,812-Speed 3018.35 samples/sec Loss 4.4902 LearningRate 0.0105 Epoch: 13 Global Step: 167990 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:41,233-Speed 2994.24 samples/sec Loss 4.5472 LearningRate 0.0105 Epoch: 13 Global Step: 168000 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:44,600-Speed 3042.72 samples/sec Loss 4.6879 LearningRate 0.0105 Epoch: 13 Global Step: 168010 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:48,046-Speed 2972.46 samples/sec Loss 4.6343 LearningRate 0.0105 Epoch: 13 Global Step: 168020 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:51,458-Speed 3002.29 samples/sec Loss 4.6590 LearningRate 0.0105 Epoch: 13 Global Step: 168030 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:54,807-Speed 3058.23 samples/sec Loss 4.5428 LearningRate 0.0105 Epoch: 13 Global Step: 168040 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:37:58,148-Speed 3065.55 samples/sec Loss 4.6149 LearningRate 0.0105 Epoch: 13 Global Step: 168050 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:01,460-Speed 3093.22 samples/sec Loss 4.6564 LearningRate 0.0105 Epoch: 13 Global Step: 168060 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:04,832-Speed 3037.18 samples/sec Loss 4.5927 LearningRate 0.0105 Epoch: 13 Global Step: 168070 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:38:08,127-Speed 3108.93 samples/sec Loss 4.5164 LearningRate 0.0105 Epoch: 13 Global Step: 168080 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:11,457-Speed 3075.65 samples/sec Loss 4.5812 LearningRate 0.0105 Epoch: 13 Global Step: 168090 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 
17:38:14,787-Speed 3076.20 samples/sec Loss 4.6232 LearningRate 0.0105 Epoch: 13 Global Step: 168100 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:18,150-Speed 3045.64 samples/sec Loss 4.6374 LearningRate 0.0105 Epoch: 13 Global Step: 168110 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:21,508-Speed 3050.20 samples/sec Loss 4.7475 LearningRate 0.0104 Epoch: 13 Global Step: 168120 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:24,961-Speed 2966.61 samples/sec Loss 4.5352 LearningRate 0.0104 Epoch: 13 Global Step: 168130 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:28,374-Speed 3000.26 samples/sec Loss 4.6334 LearningRate 0.0104 Epoch: 13 Global Step: 168140 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:31,715-Speed 3066.67 samples/sec Loss 4.6286 LearningRate 0.0104 Epoch: 13 Global Step: 168150 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:35,135-Speed 2994.67 samples/sec Loss 4.5208 LearningRate 0.0104 Epoch: 13 Global Step: 168160 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:38,463-Speed 3077.74 samples/sec Loss 4.6572 LearningRate 0.0104 Epoch: 13 Global Step: 168170 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:41,848-Speed 3026.45 samples/sec Loss 4.5946 LearningRate 0.0104 Epoch: 13 Global Step: 168180 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:38:45,211-Speed 3045.92 samples/sec Loss 4.7224 LearningRate 0.0104 Epoch: 13 Global Step: 168190 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:38:48,560-Speed 3058.58 samples/sec Loss 4.5820 LearningRate 0.0104 Epoch: 13 Global Step: 168200 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:51,914-Speed 3053.90 samples/sec Loss 4.5831 LearningRate 0.0104 Epoch: 13 Global Step: 168210 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:55,222-Speed 3096.08 samples/sec Loss 
4.6602 LearningRate 0.0104 Epoch: 13 Global Step: 168220 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:38:58,574-Speed 3055.54 samples/sec Loss 4.5634 LearningRate 0.0104 Epoch: 13 Global Step: 168230 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:01,961-Speed 3025.38 samples/sec Loss 4.6675 LearningRate 0.0104 Epoch: 13 Global Step: 168240 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:05,399-Speed 2979.04 samples/sec Loss 4.6265 LearningRate 0.0104 Epoch: 13 Global Step: 168250 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:08,763-Speed 3045.00 samples/sec Loss 4.7103 LearningRate 0.0104 Epoch: 13 Global Step: 168260 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:12,166-Speed 3009.60 samples/sec Loss 4.6328 LearningRate 0.0104 Epoch: 13 Global Step: 168270 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:15,533-Speed 3042.35 samples/sec Loss 4.6958 LearningRate 0.0104 Epoch: 13 Global Step: 168280 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:18,897-Speed 3045.32 samples/sec Loss 4.6078 LearningRate 0.0104 Epoch: 13 Global Step: 168290 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:22,217-Speed 3084.46 samples/sec Loss 4.5929 LearningRate 0.0104 Epoch: 13 Global Step: 168300 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:39:25,507-Speed 3113.41 samples/sec Loss 4.5674 LearningRate 0.0104 Epoch: 13 Global Step: 168310 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:28,852-Speed 3062.66 samples/sec Loss 4.5979 LearningRate 0.0104 Epoch: 13 Global Step: 168320 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:32,375-Speed 2907.15 samples/sec Loss 4.6819 LearningRate 0.0104 Epoch: 13 Global Step: 168330 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:35,704-Speed 3076.84 samples/sec Loss 4.6249 LearningRate 0.0104 Epoch: 13 Global 
Step: 168340 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:39,134-Speed 2987.12 samples/sec Loss 4.5648 LearningRate 0.0104 Epoch: 13 Global Step: 168350 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:42,614-Speed 2943.57 samples/sec Loss 4.5976 LearningRate 0.0104 Epoch: 13 Global Step: 168360 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:46,089-Speed 2947.20 samples/sec Loss 4.5872 LearningRate 0.0104 Epoch: 13 Global Step: 168370 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:49,510-Speed 2994.55 samples/sec Loss 4.7786 LearningRate 0.0104 Epoch: 13 Global Step: 168380 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:52,931-Speed 2994.39 samples/sec Loss 4.6498 LearningRate 0.0104 Epoch: 13 Global Step: 168390 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:56,332-Speed 3011.06 samples/sec Loss 4.6554 LearningRate 0.0104 Epoch: 13 Global Step: 168400 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:39:59,777-Speed 2973.71 samples/sec Loss 4.5838 LearningRate 0.0104 Epoch: 13 Global Step: 168410 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:03,103-Speed 3079.93 samples/sec Loss 4.6508 LearningRate 0.0104 Epoch: 13 Global Step: 168420 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:06,514-Speed 3002.54 samples/sec Loss 4.6571 LearningRate 0.0104 Epoch: 13 Global Step: 168430 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:09,866-Speed 3056.19 samples/sec Loss 4.7186 LearningRate 0.0104 Epoch: 13 Global Step: 168440 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:13,196-Speed 3075.79 samples/sec Loss 4.6360 LearningRate 0.0104 Epoch: 13 Global Step: 168450 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:16,505-Speed 3095.53 samples/sec Loss 4.6470 LearningRate 0.0104 Epoch: 13 Global Step: 168460 Fp16 Grad Scale: 32768 
Required: 8 hours Training: 2022-04-27 17:40:19,937-Speed 2984.78 samples/sec Loss 4.5786 LearningRate 0.0104 Epoch: 13 Global Step: 168470 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:23,336-Speed 3013.25 samples/sec Loss 4.6980 LearningRate 0.0104 Epoch: 13 Global Step: 168480 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:26,758-Speed 2993.22 samples/sec Loss 4.6815 LearningRate 0.0104 Epoch: 13 Global Step: 168490 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:30,262-Speed 2923.50 samples/sec Loss 4.6478 LearningRate 0.0103 Epoch: 13 Global Step: 168500 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:33,746-Speed 2939.98 samples/sec Loss 4.6437 LearningRate 0.0103 Epoch: 13 Global Step: 168510 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:37,049-Speed 3100.65 samples/sec Loss 4.5455 LearningRate 0.0103 Epoch: 13 Global Step: 168520 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:40,387-Speed 3069.08 samples/sec Loss 4.7047 LearningRate 0.0103 Epoch: 13 Global Step: 168530 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:40:43,850-Speed 2957.42 samples/sec Loss 4.5456 LearningRate 0.0103 Epoch: 13 Global Step: 168540 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:47,221-Speed 3038.39 samples/sec Loss 4.5829 LearningRate 0.0103 Epoch: 13 Global Step: 168550 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:50,693-Speed 2950.19 samples/sec Loss 4.6094 LearningRate 0.0103 Epoch: 13 Global Step: 168560 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:54,043-Speed 3058.17 samples/sec Loss 4.6043 LearningRate 0.0103 Epoch: 13 Global Step: 168570 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:40:57,458-Speed 2999.24 samples/sec Loss 4.5760 LearningRate 0.0103 Epoch: 13 Global Step: 168580 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 
17:41:00,871-Speed 3001.39 samples/sec Loss 4.6150 LearningRate 0.0103 Epoch: 13 Global Step: 168590 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:04,277-Speed 3006.85 samples/sec Loss 4.6336 LearningRate 0.0103 Epoch: 13 Global Step: 168600 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:07,656-Speed 3032.41 samples/sec Loss 4.6845 LearningRate 0.0103 Epoch: 13 Global Step: 168610 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:11,167-Speed 2916.63 samples/sec Loss 4.6552 LearningRate 0.0103 Epoch: 13 Global Step: 168620 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:14,656-Speed 2936.26 samples/sec Loss 4.5677 LearningRate 0.0103 Epoch: 13 Global Step: 168630 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:18,065-Speed 3004.29 samples/sec Loss 4.6249 LearningRate 0.0103 Epoch: 13 Global Step: 168640 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:21,425-Speed 3049.06 samples/sec Loss 4.7552 LearningRate 0.0103 Epoch: 13 Global Step: 168650 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:24,817-Speed 3019.76 samples/sec Loss 4.6523 LearningRate 0.0103 Epoch: 13 Global Step: 168660 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:28,183-Speed 3043.10 samples/sec Loss 4.6133 LearningRate 0.0103 Epoch: 13 Global Step: 168670 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:31,626-Speed 2976.55 samples/sec Loss 4.5851 LearningRate 0.0103 Epoch: 13 Global Step: 168680 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:35,020-Speed 3018.28 samples/sec Loss 4.6347 LearningRate 0.0103 Epoch: 13 Global Step: 168690 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:41:38,377-Speed 3051.27 samples/sec Loss 4.6774 LearningRate 0.0103 Epoch: 13 Global Step: 168700 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:41:41,716-Speed 3067.05 samples/sec Loss 
4.6478 LearningRate 0.0103 Epoch: 13 Global Step: 168710 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:41:45,125-Speed 3005.28 samples/sec Loss 4.6018 LearningRate 0.0103 Epoch: 13 Global Step: 168720 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:41:48,518-Speed 3019.03 samples/sec Loss 4.6116 LearningRate 0.0103 Epoch: 13 Global Step: 168730 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:51,885-Speed 3042.10 samples/sec Loss 4.6685 LearningRate 0.0103 Epoch: 13 Global Step: 168740 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:55,229-Speed 3063.47 samples/sec Loss 4.5903 LearningRate 0.0103 Epoch: 13 Global Step: 168750 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:41:58,612-Speed 3027.34 samples/sec Loss 4.6542 LearningRate 0.0103 Epoch: 13 Global Step: 168760 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:01,966-Speed 3054.70 samples/sec Loss 4.7216 LearningRate 0.0103 Epoch: 13 Global Step: 168770 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:05,335-Speed 3039.71 samples/sec Loss 4.6160 LearningRate 0.0103 Epoch: 13 Global Step: 168780 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:08,754-Speed 2996.77 samples/sec Loss 4.4893 LearningRate 0.0103 Epoch: 13 Global Step: 168790 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:12,133-Speed 3030.32 samples/sec Loss 4.6198 LearningRate 0.0103 Epoch: 13 Global Step: 168800 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:15,521-Speed 3023.99 samples/sec Loss 4.5799 LearningRate 0.0103 Epoch: 13 Global Step: 168810 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:18,886-Speed 3043.58 samples/sec Loss 4.6046 LearningRate 0.0103 Epoch: 13 Global Step: 168820 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:22,348-Speed 2958.95 samples/sec Loss 4.5694 LearningRate 0.0103 Epoch: 13 Global 
Step: 168830 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:42:25,767-Speed 2995.49 samples/sec Loss 4.6330 LearningRate 0.0103 Epoch: 13 Global Step: 168840 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:29,194-Speed 2989.14 samples/sec Loss 4.6551 LearningRate 0.0103 Epoch: 13 Global Step: 168850 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:32,527-Speed 3072.72 samples/sec Loss 4.6205 LearningRate 0.0103 Epoch: 13 Global Step: 168860 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:35,865-Speed 3068.74 samples/sec Loss 4.6515 LearningRate 0.0103 Epoch: 13 Global Step: 168870 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:39,213-Speed 3059.03 samples/sec Loss 4.6771 LearningRate 0.0103 Epoch: 13 Global Step: 168880 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:42,526-Speed 3092.69 samples/sec Loss 4.6337 LearningRate 0.0102 Epoch: 13 Global Step: 168890 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:45,860-Speed 3071.61 samples/sec Loss 4.5786 LearningRate 0.0102 Epoch: 13 Global Step: 168900 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:49,207-Speed 3060.47 samples/sec Loss 4.6010 LearningRate 0.0102 Epoch: 13 Global Step: 168910 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:52,549-Speed 3064.85 samples/sec Loss 4.7031 LearningRate 0.0102 Epoch: 13 Global Step: 168920 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:55,936-Speed 3024.56 samples/sec Loss 4.5672 LearningRate 0.0102 Epoch: 13 Global Step: 168930 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:42:59,386-Speed 2968.19 samples/sec Loss 4.6350 LearningRate 0.0102 Epoch: 13 Global Step: 168940 Fp16 Grad Scale: 65536 Required: 8 hours Training: 2022-04-27 17:43:02,843-Speed 2963.15 samples/sec Loss 4.6589 LearningRate 0.0102 Epoch: 13 Global Step: 168950 Fp16 Grad Scale: 65536 
Required: 8 hours Training: 2022-04-27 17:43:06,161-Speed 3087.30 samples/sec Loss 4.6117 LearningRate 0.0102 Epoch: 13 Global Step: 168960 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:43:09,546-Speed 3025.78 samples/sec Loss 4.5789 LearningRate 0.0102 Epoch: 13 Global Step: 168970 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:43:12,956-Speed 3004.03 samples/sec Loss 4.6517 LearningRate 0.0102 Epoch: 13 Global Step: 168980 Fp16 Grad Scale: 32768 Required: 8 hours Training: 2022-04-27 17:43:16,366-Speed 3003.70 samples/sec Loss 4.5911 LearningRate 0.0102 Epoch: 13 Global Step: 168990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:19,749-Speed 3027.46 samples/sec Loss 4.6539 LearningRate 0.0102 Epoch: 13 Global Step: 169000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:23,163-Speed 3000.43 samples/sec Loss 4.7365 LearningRate 0.0102 Epoch: 13 Global Step: 169010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:26,562-Speed 3013.30 samples/sec Loss 4.6660 LearningRate 0.0102 Epoch: 13 Global Step: 169020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:29,941-Speed 3031.20 samples/sec Loss 4.5464 LearningRate 0.0102 Epoch: 13 Global Step: 169030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:33,374-Speed 2983.47 samples/sec Loss 4.4815 LearningRate 0.0102 Epoch: 13 Global Step: 169040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:36,770-Speed 3017.08 samples/sec Loss 4.5814 LearningRate 0.0102 Epoch: 13 Global Step: 169050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:40,116-Speed 3060.37 samples/sec Loss 4.5547 LearningRate 0.0102 Epoch: 13 Global Step: 169060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:43:43,468-Speed 3055.66 samples/sec Loss 4.5924 LearningRate 0.0102 Epoch: 13 Global Step: 169070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
17:43:46,811-Speed 3064.67 samples/sec Loss 4.6666 LearningRate 0.0102 Epoch: 13 Global Step: 169080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:50,232-Speed 2993.48 samples/sec Loss 4.5471 LearningRate 0.0102 Epoch: 13 Global Step: 169090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:53,647-Speed 2999.87 samples/sec Loss 4.6169 LearningRate 0.0102 Epoch: 13 Global Step: 169100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:43:57,069-Speed 2993.08 samples/sec Loss 4.6934 LearningRate 0.0102 Epoch: 13 Global Step: 169110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:00,454-Speed 3025.76 samples/sec Loss 4.5214 LearningRate 0.0102 Epoch: 13 Global Step: 169120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:03,853-Speed 3014.04 samples/sec Loss 4.6395 LearningRate 0.0102 Epoch: 13 Global Step: 169130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:07,243-Speed 3021.04 samples/sec Loss 4.5803 LearningRate 0.0102 Epoch: 13 Global Step: 169140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:10,644-Speed 3012.18 samples/sec Loss 4.5924 LearningRate 0.0102 Epoch: 13 Global Step: 169150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:14,111-Speed 2954.56 samples/sec Loss 4.6483 LearningRate 0.0102 Epoch: 13 Global Step: 169160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:17,568-Speed 2962.57 samples/sec Loss 4.6952 LearningRate 0.0102 Epoch: 13 Global Step: 169170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:20,896-Speed 3077.94 samples/sec Loss 4.6977 LearningRate 0.0102 Epoch: 13 Global Step: 169180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:24,310-Speed 3000.80 samples/sec Loss 4.5806 LearningRate 0.0102 Epoch: 13 Global Step: 169190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:27,725-Speed 2999.32 samples/sec Loss 
4.6938 LearningRate 0.0102 Epoch: 13 Global Step: 169200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:31,169-Speed 2974.21 samples/sec Loss 4.6679 LearningRate 0.0102 Epoch: 13 Global Step: 169210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:34,635-Speed 2955.04 samples/sec Loss 4.6245 LearningRate 0.0102 Epoch: 13 Global Step: 169220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:38,106-Speed 2951.07 samples/sec Loss 4.6316 LearningRate 0.0102 Epoch: 13 Global Step: 169230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:41,527-Speed 2994.23 samples/sec Loss 4.6728 LearningRate 0.0102 Epoch: 13 Global Step: 169240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:44,999-Speed 2949.76 samples/sec Loss 4.5873 LearningRate 0.0102 Epoch: 13 Global Step: 169250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:48,438-Speed 2978.52 samples/sec Loss 4.6183 LearningRate 0.0102 Epoch: 13 Global Step: 169260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:44:51,841-Speed 3009.99 samples/sec Loss 4.7089 LearningRate 0.0102 Epoch: 13 Global Step: 169270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:44:55,241-Speed 3012.36 samples/sec Loss 4.6653 LearningRate 0.0101 Epoch: 13 Global Step: 169280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:44:58,610-Speed 3040.84 samples/sec Loss 4.7332 LearningRate 0.0101 Epoch: 13 Global Step: 169290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:02,031-Speed 2993.78 samples/sec Loss 4.5655 LearningRate 0.0101 Epoch: 13 Global Step: 169300 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:05,434-Speed 3010.42 samples/sec Loss 4.6386 LearningRate 0.0101 Epoch: 13 Global Step: 169310 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:08,852-Speed 2996.60 samples/sec Loss 4.7037 LearningRate 0.0101 Epoch: 13 Global 
Step: 169320 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:12,220-Speed 3041.30 samples/sec Loss 4.6514 LearningRate 0.0101 Epoch: 13 Global Step: 169330 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:15,676-Speed 2963.94 samples/sec Loss 4.6034 LearningRate 0.0101 Epoch: 13 Global Step: 169340 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:19,059-Speed 3028.43 samples/sec Loss 4.6502 LearningRate 0.0101 Epoch: 13 Global Step: 169350 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:22,490-Speed 2984.65 samples/sec Loss 4.5436 LearningRate 0.0101 Epoch: 13 Global Step: 169360 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:25,950-Speed 2960.57 samples/sec Loss 4.6776 LearningRate 0.0101 Epoch: 13 Global Step: 169370 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:29,283-Speed 3073.19 samples/sec Loss 4.6641 LearningRate 0.0101 Epoch: 13 Global Step: 169380 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:32,607-Speed 3081.16 samples/sec Loss 4.5034 LearningRate 0.0101 Epoch: 13 Global Step: 169390 Fp16 Grad Scale: 8192 Required: 7 hours Training: 2022-04-27 17:45:36,049-Speed 2976.23 samples/sec Loss 4.5805 LearningRate 0.0101 Epoch: 13 Global Step: 169400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:39,445-Speed 3015.65 samples/sec Loss 4.5877 LearningRate 0.0101 Epoch: 13 Global Step: 169410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:42,821-Speed 3034.89 samples/sec Loss 4.5997 LearningRate 0.0101 Epoch: 13 Global Step: 169420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:46,167-Speed 3061.45 samples/sec Loss 4.6745 LearningRate 0.0101 Epoch: 13 Global Step: 169430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:49,496-Speed 3076.09 samples/sec Loss 4.6028 LearningRate 0.0101 Epoch: 13 Global Step: 169440 Fp16 Grad Scale: 16384 Required: 7 
hours Training: 2022-04-27 17:45:52,839-Speed 3064.88 samples/sec Loss 4.6352 LearningRate 0.0101 Epoch: 13 Global Step: 169450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:56,160-Speed 3083.53 samples/sec Loss 4.6447 LearningRate 0.0101 Epoch: 13 Global Step: 169460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:45:59,592-Speed 2984.92 samples/sec Loss 4.7088 LearningRate 0.0101 Epoch: 13 Global Step: 169470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:46:02,959-Speed 3041.99 samples/sec Loss 4.5846 LearningRate 0.0101 Epoch: 13 Global Step: 169480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:46:06,350-Speed 3020.88 samples/sec Loss 4.6459 LearningRate 0.0101 Epoch: 13 Global Step: 169490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:46:09,699-Speed 3057.78 samples/sec Loss 4.6056 LearningRate 0.0101 Epoch: 13 Global Step: 169500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:13,032-Speed 3073.55 samples/sec Loss 4.6275 LearningRate 0.0101 Epoch: 13 Global Step: 169510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:16,385-Speed 3054.39 samples/sec Loss 4.5880 LearningRate 0.0101 Epoch: 13 Global Step: 169520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:19,745-Speed 3048.95 samples/sec Loss 4.6497 LearningRate 0.0101 Epoch: 13 Global Step: 169530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:23,079-Speed 3072.32 samples/sec Loss 4.4877 LearningRate 0.0101 Epoch: 13 Global Step: 169540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:26,391-Speed 3092.57 samples/sec Loss 4.6582 LearningRate 0.0101 Epoch: 13 Global Step: 169550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:29,721-Speed 3075.29 samples/sec Loss 4.5788 LearningRate 0.0101 Epoch: 13 Global Step: 169560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
17:46:33,053-Speed 3074.38 samples/sec Loss 4.6264 LearningRate 0.0101 Epoch: 13 Global Step: 169570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:36,391-Speed 3068.14 samples/sec Loss 4.6782 LearningRate 0.0101 Epoch: 13 Global Step: 169580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:39,754-Speed 3046.38 samples/sec Loss 4.5482 LearningRate 0.0101 Epoch: 13 Global Step: 169590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:46:43,101-Speed 3060.19 samples/sec Loss 4.5879 LearningRate 0.0101 Epoch: 13 Global Step: 169600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:46:46,485-Speed 3026.40 samples/sec Loss 4.6266 LearningRate 0.0101 Epoch: 13 Global Step: 169610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:46:49,866-Speed 3029.79 samples/sec Loss 4.5989 LearningRate 0.0101 Epoch: 13 Global Step: 169620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:46:53,215-Speed 3058.94 samples/sec Loss 4.6403 LearningRate 0.0101 Epoch: 13 Global Step: 169630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:46:56,576-Speed 3047.60 samples/sec Loss 4.5684 LearningRate 0.0101 Epoch: 13 Global Step: 169640 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:46:59,913-Speed 3068.56 samples/sec Loss 4.5716 LearningRate 0.0101 Epoch: 13 Global Step: 169650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:03,268-Speed 3052.99 samples/sec Loss 4.5795 LearningRate 0.0101 Epoch: 13 Global Step: 169660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:06,614-Speed 3061.76 samples/sec Loss 4.5859 LearningRate 0.0100 Epoch: 13 Global Step: 169670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:09,982-Speed 3040.89 samples/sec Loss 4.6087 LearningRate 0.0100 Epoch: 13 Global Step: 169680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:13,444-Speed 2958.90 samples/sec Loss 
4.6142 LearningRate 0.0100 Epoch: 13 Global Step: 169690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:16,830-Speed 3024.65 samples/sec Loss 4.5900 LearningRate 0.0100 Epoch: 13 Global Step: 169700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:20,285-Speed 2964.85 samples/sec Loss 4.6295 LearningRate 0.0100 Epoch: 13 Global Step: 169710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:23,665-Speed 3030.33 samples/sec Loss 4.4403 LearningRate 0.0100 Epoch: 13 Global Step: 169720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:27,019-Speed 3053.77 samples/sec Loss 4.6754 LearningRate 0.0100 Epoch: 13 Global Step: 169730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:47:30,401-Speed 3029.14 samples/sec Loss 4.6065 LearningRate 0.0100 Epoch: 13 Global Step: 169740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:33,792-Speed 3020.26 samples/sec Loss 4.5437 LearningRate 0.0100 Epoch: 13 Global Step: 169750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:37,172-Speed 3030.71 samples/sec Loss 4.6918 LearningRate 0.0100 Epoch: 13 Global Step: 169760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:40,542-Speed 3039.57 samples/sec Loss 4.5893 LearningRate 0.0100 Epoch: 13 Global Step: 169770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:43,881-Speed 3067.36 samples/sec Loss 4.6490 LearningRate 0.0100 Epoch: 13 Global Step: 169780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:47,268-Speed 3024.36 samples/sec Loss 4.6731 LearningRate 0.0100 Epoch: 13 Global Step: 169790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:50,718-Speed 2968.45 samples/sec Loss 4.6149 LearningRate 0.0100 Epoch: 13 Global Step: 169800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:54,105-Speed 3024.30 samples/sec Loss 4.6237 LearningRate 0.0100 Epoch: 13 Global 
Step: 169810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:47:57,536-Speed 2985.10 samples/sec Loss 4.6895 LearningRate 0.0100 Epoch: 13 Global Step: 169820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:00,878-Speed 3065.03 samples/sec Loss 4.6079 LearningRate 0.0100 Epoch: 13 Global Step: 169830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:04,200-Speed 3083.79 samples/sec Loss 4.6561 LearningRate 0.0100 Epoch: 13 Global Step: 169840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:07,513-Speed 3091.13 samples/sec Loss 4.7246 LearningRate 0.0100 Epoch: 13 Global Step: 169850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:10,812-Speed 3105.36 samples/sec Loss 4.5843 LearningRate 0.0100 Epoch: 13 Global Step: 169860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:14,142-Speed 3075.08 samples/sec Loss 4.5841 LearningRate 0.0100 Epoch: 13 Global Step: 169870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:17,548-Speed 3007.40 samples/sec Loss 4.5679 LearningRate 0.0100 Epoch: 13 Global Step: 169880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:20,898-Speed 3057.81 samples/sec Loss 4.6607 LearningRate 0.0100 Epoch: 13 Global Step: 169890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:24,252-Speed 3054.20 samples/sec Loss 4.5875 LearningRate 0.0100 Epoch: 13 Global Step: 169900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:27,637-Speed 3025.69 samples/sec Loss 4.6695 LearningRate 0.0100 Epoch: 13 Global Step: 169910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:31,065-Speed 2988.42 samples/sec Loss 4.6340 LearningRate 0.0100 Epoch: 13 Global Step: 169920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:48:34,397-Speed 3074.24 samples/sec Loss 4.6723 LearningRate 0.0100 Epoch: 13 Global Step: 169930 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 17:48:37,849-Speed 2967.33 samples/sec Loss 4.6450 LearningRate 0.0100 Epoch: 13 Global Step: 169940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:41,246-Speed 3014.81 samples/sec Loss 4.6067 LearningRate 0.0100 Epoch: 13 Global Step: 169950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:44,604-Speed 3050.29 samples/sec Loss 4.6011 LearningRate 0.0100 Epoch: 13 Global Step: 169960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:47,956-Speed 3055.99 samples/sec Loss 4.6231 LearningRate 0.0100 Epoch: 13 Global Step: 169970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:51,289-Speed 3073.09 samples/sec Loss 4.6722 LearningRate 0.0100 Epoch: 13 Global Step: 169980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:54,680-Speed 3020.74 samples/sec Loss 4.6300 LearningRate 0.0100 Epoch: 13 Global Step: 169990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:48:58,090-Speed 3003.25 samples/sec Loss 4.5859 LearningRate 0.0100 Epoch: 13 Global Step: 170000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:01,552-Speed 2959.60 samples/sec Loss 4.6593 LearningRate 0.0100 Epoch: 13 Global Step: 170010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:04,955-Speed 3010.28 samples/sec Loss 4.5319 LearningRate 0.0100 Epoch: 13 Global Step: 170020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:08,374-Speed 2996.09 samples/sec Loss 4.5513 LearningRate 0.0100 Epoch: 13 Global Step: 170030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:49:11,799-Speed 2990.13 samples/sec Loss 4.5522 LearningRate 0.0100 Epoch: 13 Global Step: 170040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:49:15,217-Speed 2996.86 samples/sec Loss 4.6785 LearningRate 0.0100 Epoch: 13 Global Step: 170050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
17:49:18,636-Speed 2995.96 samples/sec Loss 4.5711 LearningRate 0.0099 Epoch: 13 Global Step: 170060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:49:22,032-Speed 3015.85 samples/sec Loss 4.6146 LearningRate 0.0099 Epoch: 13 Global Step: 170070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:49:25,436-Speed 3008.84 samples/sec Loss 4.5461 LearningRate 0.0099 Epoch: 13 Global Step: 170080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:28,800-Speed 3045.06 samples/sec Loss 4.6550 LearningRate 0.0099 Epoch: 13 Global Step: 170090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:32,185-Speed 3026.33 samples/sec Loss 4.6479 LearningRate 0.0099 Epoch: 13 Global Step: 170100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:35,536-Speed 3056.28 samples/sec Loss 4.5617 LearningRate 0.0099 Epoch: 13 Global Step: 170110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:38,909-Speed 3037.23 samples/sec Loss 4.7012 LearningRate 0.0099 Epoch: 13 Global Step: 170120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:42,264-Speed 3052.84 samples/sec Loss 4.5686 LearningRate 0.0099 Epoch: 13 Global Step: 170130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:45,680-Speed 2998.35 samples/sec Loss 4.6235 LearningRate 0.0099 Epoch: 13 Global Step: 170140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:49,066-Speed 3024.60 samples/sec Loss 4.5815 LearningRate 0.0099 Epoch: 13 Global Step: 170150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:52,465-Speed 3013.55 samples/sec Loss 4.5687 LearningRate 0.0099 Epoch: 13 Global Step: 170160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:55,878-Speed 3001.51 samples/sec Loss 4.6090 LearningRate 0.0099 Epoch: 13 Global Step: 170170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:49:59,234-Speed 3052.27 samples/sec Loss 
4.5819 LearningRate 0.0099 Epoch: 13 Global Step: 170180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:02,635-Speed 3011.35 samples/sec Loss 4.5474 LearningRate 0.0099 Epoch: 13 Global Step: 170190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:06,016-Speed 3029.40 samples/sec Loss 4.6983 LearningRate 0.0099 Epoch: 13 Global Step: 170200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:09,481-Speed 2956.61 samples/sec Loss 4.6758 LearningRate 0.0099 Epoch: 13 Global Step: 170210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:12,921-Speed 2977.11 samples/sec Loss 4.6669 LearningRate 0.0099 Epoch: 13 Global Step: 170220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:16,395-Speed 2948.26 samples/sec Loss 4.6680 LearningRate 0.0099 Epoch: 13 Global Step: 170230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:19,827-Speed 2985.21 samples/sec Loss 4.5775 LearningRate 0.0099 Epoch: 13 Global Step: 170240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:50:23,213-Speed 3024.76 samples/sec Loss 4.6177 LearningRate 0.0099 Epoch: 13 Global Step: 170250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:26,650-Speed 2980.14 samples/sec Loss 4.5903 LearningRate 0.0099 Epoch: 13 Global Step: 170260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:30,060-Speed 3003.75 samples/sec Loss 4.5853 LearningRate 0.0099 Epoch: 13 Global Step: 170270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:33,431-Speed 3038.97 samples/sec Loss 4.6097 LearningRate 0.0099 Epoch: 13 Global Step: 170280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:37,515-Speed 2507.83 samples/sec Loss 4.5644 LearningRate 0.0099 Epoch: 13 Global Step: 170290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:40,876-Speed 3047.92 samples/sec Loss 4.6808 LearningRate 0.0099 Epoch: 13 Global 
Step: 170300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:44,268-Speed 3019.26 samples/sec Loss 4.5768 LearningRate 0.0099 Epoch: 13 Global Step: 170310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:48,366-Speed 2499.58 samples/sec Loss 4.4914 LearningRate 0.0099 Epoch: 13 Global Step: 170320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:52,255-Speed 2633.56 samples/sec Loss 4.5549 LearningRate 0.0099 Epoch: 13 Global Step: 170330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:50:56,145-Speed 2633.07 samples/sec Loss 4.6531 LearningRate 0.0099 Epoch: 13 Global Step: 170340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:00,130-Speed 2570.08 samples/sec Loss 4.5802 LearningRate 0.0099 Epoch: 13 Global Step: 170350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:03,534-Speed 3009.63 samples/sec Loss 4.6807 LearningRate 0.0099 Epoch: 13 Global Step: 170360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:06,876-Speed 3064.88 samples/sec Loss 4.5925 LearningRate 0.0099 Epoch: 13 Global Step: 170370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:10,235-Speed 3049.40 samples/sec Loss 4.5716 LearningRate 0.0099 Epoch: 13 Global Step: 170380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:13,545-Speed 3093.90 samples/sec Loss 4.6470 LearningRate 0.0099 Epoch: 13 Global Step: 170390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:16,910-Speed 3044.70 samples/sec Loss 4.5788 LearningRate 0.0099 Epoch: 13 Global Step: 170400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:20,272-Speed 3046.52 samples/sec Loss 4.5656 LearningRate 0.0099 Epoch: 13 Global Step: 170410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:23,660-Speed 3023.46 samples/sec Loss 4.5900 LearningRate 0.0099 Epoch: 13 Global Step: 170420 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 17:51:27,044-Speed 3026.22 samples/sec Loss 4.5974 LearningRate 0.0099 Epoch: 13 Global Step: 170430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:30,488-Speed 2973.75 samples/sec Loss 4.5051 LearningRate 0.0099 Epoch: 13 Global Step: 170440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:33,859-Speed 3039.17 samples/sec Loss 4.5527 LearningRate 0.0099 Epoch: 13 Global Step: 170450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:37,201-Speed 3064.05 samples/sec Loss 4.6159 LearningRate 0.0098 Epoch: 13 Global Step: 170460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:51:40,585-Speed 3027.55 samples/sec Loss 4.6260 LearningRate 0.0098 Epoch: 13 Global Step: 170470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:43,941-Speed 3051.44 samples/sec Loss 4.6425 LearningRate 0.0098 Epoch: 13 Global Step: 170480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:47,338-Speed 3015.75 samples/sec Loss 4.5790 LearningRate 0.0098 Epoch: 13 Global Step: 170490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:50,707-Speed 3040.21 samples/sec Loss 4.5319 LearningRate 0.0098 Epoch: 13 Global Step: 170500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:54,142-Speed 2981.93 samples/sec Loss 4.5593 LearningRate 0.0098 Epoch: 13 Global Step: 170510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:51:57,561-Speed 2995.74 samples/sec Loss 4.6261 LearningRate 0.0098 Epoch: 13 Global Step: 170520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:52:00,910-Speed 3058.53 samples/sec Loss 4.6438 LearningRate 0.0098 Epoch: 13 Global Step: 170530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:52:04,274-Speed 3044.69 samples/sec Loss 4.5741 LearningRate 0.0098 Epoch: 13 Global Step: 170540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
17:52:07,665-Speed 3020.43 samples/sec Loss 4.6203 LearningRate 0.0098 Epoch: 13 Global Step: 170550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:52:11,113-Speed 2971.14 samples/sec Loss 4.5380 LearningRate 0.0098 Epoch: 13 Global Step: 170560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:52:14,518-Speed 3008.27 samples/sec Loss 4.5823 LearningRate 0.0098 Epoch: 13 Global Step: 170570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:52:17,921-Speed 3010.35 samples/sec Loss 4.6494 LearningRate 0.0098 Epoch: 13 Global Step: 170580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:52:21,334-Speed 3000.82 samples/sec Loss 4.6241 LearningRate 0.0098 Epoch: 13 Global Step: 170590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:52:24,795-Speed 2959.74 samples/sec Loss 4.6043 LearningRate 0.0098 Epoch: 13 Global Step: 170600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:52:28,278-Speed 2940.79 samples/sec Loss 4.5805 LearningRate 0.0098 Epoch: 13 Global Step: 170610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:52:31,641-Speed 3045.58 samples/sec Loss 4.7117 LearningRate 0.0098 Epoch: 13 Global Step: 170620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:52:35,005-Speed 3044.59 samples/sec Loss 4.5873 LearningRate 0.0098 Epoch: 13 Global Step: 170630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:52:38,374-Speed 3041.28 samples/sec Loss 4.6574 LearningRate 0.0098 Epoch: 13 Global Step: 170640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:52:41,735-Speed 3047.10 samples/sec Loss 4.5453 LearningRate 0.0098 Epoch: 13 Global Step: 170650 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:52:45,203-Speed 2953.71 samples/sec Loss 4.5826 LearningRate 0.0098 Epoch: 13 Global Step: 170660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:52:48,656-Speed 2966.14 samples/sec Loss 
4.5775 LearningRate 0.0098 Epoch: 13 Global Step: 170670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:52:52,755-Speed 2498.97 samples/sec Loss 4.6405 LearningRate 0.0098 Epoch: 13 Global Step: 170680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:52:56,191-Speed 2980.32 samples/sec Loss 4.5770 LearningRate 0.0098 Epoch: 13 Global Step: 170690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:53:00,359-Speed 2457.59 samples/sec Loss 4.5071 LearningRate 0.0098 Epoch: 13 Global Step: 170700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:53:03,718-Speed 3049.31 samples/sec Loss 4.6303 LearningRate 0.0098 Epoch: 13 Global Step: 170710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:53:07,047-Speed 3077.54 samples/sec Loss 4.5636 LearningRate 0.0098 Epoch: 13 Global Step: 170720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:53:10,403-Speed 3051.67 samples/sec Loss 4.6215 LearningRate 0.0098 Epoch: 13 Global Step: 170730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:53:13,783-Speed 3031.28 samples/sec Loss 4.5691 LearningRate 0.0098 Epoch: 13 Global Step: 170740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:53:17,172-Speed 3022.22 samples/sec Loss 4.6144 LearningRate 0.0098 Epoch: 13 Global Step: 170750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:20,486-Speed 3090.77 samples/sec Loss 4.5597 LearningRate 0.0098 Epoch: 13 Global Step: 170760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:23,876-Speed 3021.19 samples/sec Loss 4.6733 LearningRate 0.0098 Epoch: 13 Global Step: 170770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:27,241-Speed 3043.54 samples/sec Loss 4.5460 LearningRate 0.0098 Epoch: 13 Global Step: 170780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:30,586-Speed 3062.57 samples/sec Loss 4.5470 LearningRate 0.0098 Epoch: 13 Global 
Step: 170790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:34,012-Speed 2990.37 samples/sec Loss 4.5775 LearningRate 0.0098 Epoch: 13 Global Step: 170800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:37,349-Speed 3068.84 samples/sec Loss 4.6011 LearningRate 0.0098 Epoch: 13 Global Step: 170810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:40,692-Speed 3064.28 samples/sec Loss 4.6242 LearningRate 0.0098 Epoch: 13 Global Step: 170820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:44,104-Speed 3001.64 samples/sec Loss 4.6162 LearningRate 0.0098 Epoch: 13 Global Step: 170830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:47,510-Speed 3007.55 samples/sec Loss 4.5590 LearningRate 0.0098 Epoch: 13 Global Step: 170840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:50,877-Speed 3042.54 samples/sec Loss 4.5869 LearningRate 0.0098 Epoch: 13 Global Step: 170850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:53:54,183-Speed 3097.89 samples/sec Loss 4.6515 LearningRate 0.0097 Epoch: 13 Global Step: 170860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:53:57,587-Speed 3009.70 samples/sec Loss 4.5349 LearningRate 0.0097 Epoch: 13 Global Step: 170870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:01,061-Speed 2948.06 samples/sec Loss 4.6559 LearningRate 0.0097 Epoch: 13 Global Step: 170880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:04,416-Speed 3052.72 samples/sec Loss 4.6245 LearningRate 0.0097 Epoch: 13 Global Step: 170890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:07,810-Speed 3017.80 samples/sec Loss 4.6019 LearningRate 0.0097 Epoch: 13 Global Step: 170900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:11,140-Speed 3075.76 samples/sec Loss 4.5651 LearningRate 0.0097 Epoch: 13 Global Step: 170910 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 17:54:14,454-Speed 3091.05 samples/sec Loss 4.6349 LearningRate 0.0097 Epoch: 13 Global Step: 170920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:17,793-Speed 3067.88 samples/sec Loss 4.6850 LearningRate 0.0097 Epoch: 13 Global Step: 170930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:21,129-Speed 3070.46 samples/sec Loss 4.5999 LearningRate 0.0097 Epoch: 13 Global Step: 170940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:24,541-Speed 3001.93 samples/sec Loss 4.5072 LearningRate 0.0097 Epoch: 13 Global Step: 170950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:27,995-Speed 2965.34 samples/sec Loss 4.5634 LearningRate 0.0097 Epoch: 13 Global Step: 170960 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:54:31,471-Speed 2946.71 samples/sec Loss 4.6524 LearningRate 0.0097 Epoch: 13 Global Step: 170970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:34,854-Speed 3027.37 samples/sec Loss 4.7182 LearningRate 0.0097 Epoch: 13 Global Step: 170980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:38,221-Speed 3042.01 samples/sec Loss 4.6189 LearningRate 0.0097 Epoch: 13 Global Step: 170990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:41,606-Speed 3026.27 samples/sec Loss 4.5055 LearningRate 0.0097 Epoch: 13 Global Step: 171000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:45,020-Speed 3000.43 samples/sec Loss 4.6070 LearningRate 0.0097 Epoch: 13 Global Step: 171010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:48,415-Speed 3016.67 samples/sec Loss 4.7112 LearningRate 0.0097 Epoch: 13 Global Step: 171020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:51,779-Speed 3045.00 samples/sec Loss 4.6378 LearningRate 0.0097 Epoch: 13 Global Step: 171030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
17:54:55,265-Speed 2938.25 samples/sec Loss 4.6077 LearningRate 0.0097 Epoch: 13 Global Step: 171040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:54:58,677-Speed 3001.39 samples/sec Loss 4.6355 LearningRate 0.0097 Epoch: 13 Global Step: 171050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:02,105-Speed 2987.96 samples/sec Loss 4.6266 LearningRate 0.0097 Epoch: 13 Global Step: 171060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:05,458-Speed 3054.89 samples/sec Loss 4.5131 LearningRate 0.0097 Epoch: 13 Global Step: 171070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:55:08,827-Speed 3040.74 samples/sec Loss 4.4804 LearningRate 0.0097 Epoch: 13 Global Step: 171080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:12,169-Speed 3064.81 samples/sec Loss 4.6066 LearningRate 0.0097 Epoch: 13 Global Step: 171090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:15,520-Speed 3056.76 samples/sec Loss 4.6362 LearningRate 0.0097 Epoch: 13 Global Step: 171100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:18,878-Speed 3050.27 samples/sec Loss 4.5665 LearningRate 0.0097 Epoch: 13 Global Step: 171110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:22,301-Speed 2992.11 samples/sec Loss 4.5169 LearningRate 0.0097 Epoch: 13 Global Step: 171120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:25,705-Speed 3009.15 samples/sec Loss 4.6118 LearningRate 0.0097 Epoch: 13 Global Step: 171130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:29,031-Speed 3079.66 samples/sec Loss 4.6513 LearningRate 0.0097 Epoch: 13 Global Step: 171140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:32,380-Speed 3058.57 samples/sec Loss 4.6597 LearningRate 0.0097 Epoch: 13 Global Step: 171150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:35,799-Speed 2995.74 samples/sec Loss 
4.6586 LearningRate 0.0097 Epoch: 13 Global Step: 171160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:39,252-Speed 2966.70 samples/sec Loss 4.6516 LearningRate 0.0097 Epoch: 13 Global Step: 171170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:42,613-Speed 3047.73 samples/sec Loss 4.5985 LearningRate 0.0097 Epoch: 13 Global Step: 171180 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:55:45,897-Speed 3118.32 samples/sec Loss 4.6640 LearningRate 0.0097 Epoch: 13 Global Step: 171190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:49,341-Speed 2975.00 samples/sec Loss 4.6439 LearningRate 0.0097 Epoch: 13 Global Step: 171200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:52,673-Speed 3074.22 samples/sec Loss 4.6094 LearningRate 0.0097 Epoch: 13 Global Step: 171210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:56,076-Speed 3009.48 samples/sec Loss 4.6768 LearningRate 0.0097 Epoch: 13 Global Step: 171220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:55:59,535-Speed 2961.48 samples/sec Loss 4.5859 LearningRate 0.0097 Epoch: 13 Global Step: 171230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:02,915-Speed 3030.07 samples/sec Loss 4.6760 LearningRate 0.0097 Epoch: 13 Global Step: 171240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:06,286-Speed 3038.68 samples/sec Loss 4.5592 LearningRate 0.0096 Epoch: 13 Global Step: 171250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:09,651-Speed 3044.23 samples/sec Loss 4.4969 LearningRate 0.0096 Epoch: 13 Global Step: 171260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:12,997-Speed 3061.14 samples/sec Loss 4.5286 LearningRate 0.0096 Epoch: 13 Global Step: 171270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:16,347-Speed 3056.95 samples/sec Loss 4.6260 LearningRate 0.0096 Epoch: 13 Global 
Step: 171280 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:19,686-Speed 3067.67 samples/sec Loss 4.7817 LearningRate 0.0096 Epoch: 13 Global Step: 171290 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:23,015-Speed 3076.85 samples/sec Loss 4.6172 LearningRate 0.0096 Epoch: 13 Global Step: 171300 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:26,408-Speed 3019.35 samples/sec Loss 4.6031 LearningRate 0.0096 Epoch: 13 Global Step: 171310 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:29,752-Speed 3063.12 samples/sec Loss 4.5683 LearningRate 0.0096 Epoch: 13 Global Step: 171320 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:33,060-Speed 3095.64 samples/sec Loss 4.5636 LearningRate 0.0096 Epoch: 13 Global Step: 171330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:36,439-Speed 3031.68 samples/sec Loss 4.5511 LearningRate 0.0096 Epoch: 13 Global Step: 171340 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 17:56:39,823-Speed 3026.81 samples/sec Loss 4.5838 LearningRate 0.0096 Epoch: 13 Global Step: 171350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:43,210-Speed 3024.41 samples/sec Loss 4.5837 LearningRate 0.0096 Epoch: 13 Global Step: 171360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:46,563-Speed 3054.45 samples/sec Loss 4.5673 LearningRate 0.0096 Epoch: 13 Global Step: 171370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:49,962-Speed 3013.37 samples/sec Loss 4.6245 LearningRate 0.0096 Epoch: 13 Global Step: 171380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:53,296-Speed 3072.28 samples/sec Loss 4.6617 LearningRate 0.0096 Epoch: 13 Global Step: 171390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:56:56,682-Speed 3025.47 samples/sec Loss 4.5804 LearningRate 0.0096 Epoch: 13 Global Step: 171400 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 17:57:00,054-Speed 3037.76 samples/sec Loss 4.6436 LearningRate 0.0096 Epoch: 13 Global Step: 171410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:03,440-Speed 3024.44 samples/sec Loss 4.5982 LearningRate 0.0096 Epoch: 13 Global Step: 171420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:06,760-Speed 3085.34 samples/sec Loss 4.6068 LearningRate 0.0096 Epoch: 13 Global Step: 171430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:10,096-Speed 3070.79 samples/sec Loss 4.5161 LearningRate 0.0096 Epoch: 13 Global Step: 171440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:13,377-Speed 3120.96 samples/sec Loss 4.5152 LearningRate 0.0096 Epoch: 13 Global Step: 171450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:16,724-Speed 3060.44 samples/sec Loss 4.6483 LearningRate 0.0096 Epoch: 13 Global Step: 171460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:20,056-Speed 3074.44 samples/sec Loss 4.6118 LearningRate 0.0096 Epoch: 13 Global Step: 171470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:23,435-Speed 3030.80 samples/sec Loss 4.5058 LearningRate 0.0096 Epoch: 13 Global Step: 171480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:26,826-Speed 3020.55 samples/sec Loss 4.6476 LearningRate 0.0096 Epoch: 13 Global Step: 171490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:30,156-Speed 3075.84 samples/sec Loss 4.6880 LearningRate 0.0096 Epoch: 13 Global Step: 171500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:33,513-Speed 3051.83 samples/sec Loss 4.4881 LearningRate 0.0096 Epoch: 13 Global Step: 171510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:36,846-Speed 3072.96 samples/sec Loss 4.6205 LearningRate 0.0096 Epoch: 13 Global Step: 171520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
17:57:40,212-Speed 3043.63 samples/sec Loss 4.5923 LearningRate 0.0096 Epoch: 13 Global Step: 171530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:43,575-Speed 3044.99 samples/sec Loss 4.4875 LearningRate 0.0096 Epoch: 13 Global Step: 171540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:57:46,888-Speed 3091.60 samples/sec Loss 4.5929 LearningRate 0.0096 Epoch: 13 Global Step: 171550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:57:50,328-Speed 2977.37 samples/sec Loss 4.6615 LearningRate 0.0096 Epoch: 13 Global Step: 171560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:57:53,725-Speed 3015.58 samples/sec Loss 4.5416 LearningRate 0.0096 Epoch: 13 Global Step: 171570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:57:57,154-Speed 2987.08 samples/sec Loss 4.5869 LearningRate 0.0096 Epoch: 13 Global Step: 171580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:58:00,569-Speed 2999.55 samples/sec Loss 4.6508 LearningRate 0.0096 Epoch: 13 Global Step: 171590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:58:03,895-Speed 3080.09 samples/sec Loss 4.6409 LearningRate 0.0096 Epoch: 13 Global Step: 171600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:07,268-Speed 3036.64 samples/sec Loss 4.6764 LearningRate 0.0096 Epoch: 13 Global Step: 171610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:10,605-Speed 3069.25 samples/sec Loss 4.5795 LearningRate 0.0096 Epoch: 13 Global Step: 171620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:14,021-Speed 2998.52 samples/sec Loss 4.5325 LearningRate 0.0096 Epoch: 13 Global Step: 171630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:17,417-Speed 3016.75 samples/sec Loss 4.6052 LearningRate 0.0096 Epoch: 13 Global Step: 171640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:20,779-Speed 3046.63 samples/sec Loss 
4.5624 LearningRate 0.0096 Epoch: 13 Global Step: 171650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:24,107-Speed 3077.58 samples/sec Loss 4.4824 LearningRate 0.0095 Epoch: 13 Global Step: 171660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:27,517-Speed 3003.90 samples/sec Loss 4.6415 LearningRate 0.0095 Epoch: 13 Global Step: 171670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:30,897-Speed 3031.02 samples/sec Loss 4.5537 LearningRate 0.0095 Epoch: 13 Global Step: 171680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:34,258-Speed 3046.85 samples/sec Loss 4.4879 LearningRate 0.0095 Epoch: 13 Global Step: 171690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:37,601-Speed 3064.08 samples/sec Loss 4.5011 LearningRate 0.0095 Epoch: 13 Global Step: 171700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:41,021-Speed 2995.63 samples/sec Loss 4.5347 LearningRate 0.0095 Epoch: 13 Global Step: 171710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:44,403-Speed 3028.37 samples/sec Loss 4.5052 LearningRate 0.0095 Epoch: 13 Global Step: 171720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:47,743-Speed 3067.78 samples/sec Loss 4.5594 LearningRate 0.0095 Epoch: 13 Global Step: 171730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:51,071-Speed 3078.14 samples/sec Loss 4.5631 LearningRate 0.0095 Epoch: 13 Global Step: 171740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:54,488-Speed 2997.74 samples/sec Loss 4.4670 LearningRate 0.0095 Epoch: 13 Global Step: 171750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:58:57,815-Speed 3078.41 samples/sec Loss 4.5985 LearningRate 0.0095 Epoch: 13 Global Step: 171760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:01,260-Speed 2973.63 samples/sec Loss 4.7160 LearningRate 0.0095 Epoch: 13 Global 
Step: 171770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:04,569-Speed 3094.72 samples/sec Loss 4.5237 LearningRate 0.0095 Epoch: 13 Global Step: 171780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:07,949-Speed 3030.96 samples/sec Loss 4.5822 LearningRate 0.0095 Epoch: 13 Global Step: 171790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:11,368-Speed 2996.05 samples/sec Loss 4.5578 LearningRate 0.0095 Epoch: 13 Global Step: 171800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:59:14,763-Speed 3016.49 samples/sec Loss 4.5884 LearningRate 0.0095 Epoch: 13 Global Step: 171810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:59:18,140-Speed 3033.21 samples/sec Loss 4.6460 LearningRate 0.0095 Epoch: 13 Global Step: 171820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:59:21,481-Speed 3065.94 samples/sec Loss 4.5701 LearningRate 0.0095 Epoch: 13 Global Step: 171830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 17:59:24,811-Speed 3075.71 samples/sec Loss 4.7124 LearningRate 0.0095 Epoch: 13 Global Step: 171840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:28,252-Speed 2977.16 samples/sec Loss 4.5470 LearningRate 0.0095 Epoch: 13 Global Step: 171850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:31,670-Speed 2997.01 samples/sec Loss 4.5900 LearningRate 0.0095 Epoch: 13 Global Step: 171860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:35,145-Speed 2947.72 samples/sec Loss 4.6059 LearningRate 0.0095 Epoch: 13 Global Step: 171870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:38,566-Speed 2993.50 samples/sec Loss 4.5638 LearningRate 0.0095 Epoch: 13 Global Step: 171880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:41,996-Speed 2986.34 samples/sec Loss 4.5648 LearningRate 0.0095 Epoch: 13 Global Step: 171890 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 17:59:45,356-Speed 3049.05 samples/sec Loss 4.5700 LearningRate 0.0095 Epoch: 13 Global Step: 171900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:48,802-Speed 2971.95 samples/sec Loss 4.6634 LearningRate 0.0095 Epoch: 13 Global Step: 171910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:52,198-Speed 3016.21 samples/sec Loss 4.6714 LearningRate 0.0095 Epoch: 13 Global Step: 171920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:55,605-Speed 3006.86 samples/sec Loss 4.6334 LearningRate 0.0095 Epoch: 13 Global Step: 171930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 17:59:58,953-Speed 3058.90 samples/sec Loss 4.5227 LearningRate 0.0095 Epoch: 13 Global Step: 171940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:02,406-Speed 2967.03 samples/sec Loss 4.5810 LearningRate 0.0095 Epoch: 13 Global Step: 171950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:05,839-Speed 2983.47 samples/sec Loss 4.5308 LearningRate 0.0095 Epoch: 13 Global Step: 171960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:09,237-Speed 3014.25 samples/sec Loss 4.5847 LearningRate 0.0095 Epoch: 13 Global Step: 171970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:12,663-Speed 2992.62 samples/sec Loss 4.5833 LearningRate 0.0095 Epoch: 13 Global Step: 171980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:16,119-Speed 2963.70 samples/sec Loss 4.6671 LearningRate 0.0095 Epoch: 13 Global Step: 171990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:19,510-Speed 3020.08 samples/sec Loss 4.5892 LearningRate 0.0095 Epoch: 13 Global Step: 172000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:22,890-Speed 3031.00 samples/sec Loss 4.5168 LearningRate 0.0095 Epoch: 13 Global Step: 172010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:00:26,246-Speed 3051.98 samples/sec Loss 4.5676 LearningRate 0.0095 Epoch: 13 Global Step: 172020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:29,629-Speed 3027.33 samples/sec Loss 4.5959 LearningRate 0.0095 Epoch: 13 Global Step: 172030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:32,960-Speed 3074.85 samples/sec Loss 4.5628 LearningRate 0.0095 Epoch: 13 Global Step: 172040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:00:36,414-Speed 2965.66 samples/sec Loss 4.7159 LearningRate 0.0095 Epoch: 13 Global Step: 172050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:00:39,841-Speed 2989.05 samples/sec Loss 4.5868 LearningRate 0.0094 Epoch: 13 Global Step: 172060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:00:43,213-Speed 3038.34 samples/sec Loss 4.4951 LearningRate 0.0094 Epoch: 13 Global Step: 172070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:00:46,625-Speed 3001.65 samples/sec Loss 4.5112 LearningRate 0.0094 Epoch: 13 Global Step: 172080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:00:49,969-Speed 3063.64 samples/sec Loss 4.5934 LearningRate 0.0094 Epoch: 13 Global Step: 172090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:53,294-Speed 3080.59 samples/sec Loss 4.6199 LearningRate 0.0094 Epoch: 13 Global Step: 172100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:00:56,715-Speed 2994.06 samples/sec Loss 4.6145 LearningRate 0.0094 Epoch: 13 Global Step: 172110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:00,144-Speed 2987.16 samples/sec Loss 4.5954 LearningRate 0.0094 Epoch: 13 Global Step: 172120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:03,604-Speed 2960.25 samples/sec Loss 4.5702 LearningRate 0.0094 Epoch: 13 Global Step: 172130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:06,949-Speed 3062.18 samples/sec Loss 
4.4849 LearningRate 0.0094 Epoch: 13 Global Step: 172140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:10,355-Speed 3008.06 samples/sec Loss 4.5591 LearningRate 0.0094 Epoch: 13 Global Step: 172150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:13,745-Speed 3020.58 samples/sec Loss 4.5740 LearningRate 0.0094 Epoch: 13 Global Step: 172160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:17,208-Speed 2958.37 samples/sec Loss 4.5334 LearningRate 0.0094 Epoch: 13 Global Step: 172170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:20,614-Speed 3007.38 samples/sec Loss 4.6222 LearningRate 0.0094 Epoch: 13 Global Step: 172180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:24,054-Speed 2977.92 samples/sec Loss 4.6609 LearningRate 0.0094 Epoch: 13 Global Step: 172190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:27,464-Speed 3003.79 samples/sec Loss 4.5619 LearningRate 0.0094 Epoch: 13 Global Step: 172200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:31,038-Speed 2866.16 samples/sec Loss 4.5079 LearningRate 0.0094 Epoch: 13 Global Step: 172210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:34,398-Speed 3048.79 samples/sec Loss 4.6269 LearningRate 0.0094 Epoch: 13 Global Step: 172220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:37,747-Speed 3058.33 samples/sec Loss 4.6023 LearningRate 0.0094 Epoch: 13 Global Step: 172230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:41,158-Speed 3003.42 samples/sec Loss 4.5041 LearningRate 0.0094 Epoch: 13 Global Step: 172240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:44,633-Speed 2947.03 samples/sec Loss 4.6197 LearningRate 0.0094 Epoch: 13 Global Step: 172250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:48,012-Speed 3031.65 samples/sec Loss 4.5960 LearningRate 0.0094 Epoch: 13 Global 
Step: 172260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:51,359-Speed 3060.54 samples/sec Loss 4.5828 LearningRate 0.0094 Epoch: 13 Global Step: 172270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:54,699-Speed 3066.89 samples/sec Loss 4.5634 LearningRate 0.0094 Epoch: 13 Global Step: 172280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:01:58,115-Speed 2998.46 samples/sec Loss 4.5434 LearningRate 0.0094 Epoch: 13 Global Step: 172290 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:01,491-Speed 3033.44 samples/sec Loss 4.5823 LearningRate 0.0094 Epoch: 13 Global Step: 172300 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:04,845-Speed 3054.16 samples/sec Loss 4.6175 LearningRate 0.0094 Epoch: 13 Global Step: 172310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:08,198-Speed 3054.89 samples/sec Loss 4.4836 LearningRate 0.0094 Epoch: 13 Global Step: 172320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:11,555-Speed 3051.40 samples/sec Loss 4.5355 LearningRate 0.0094 Epoch: 13 Global Step: 172330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:14,910-Speed 3053.29 samples/sec Loss 4.6161 LearningRate 0.0094 Epoch: 13 Global Step: 172340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:18,312-Speed 3010.37 samples/sec Loss 4.5450 LearningRate 0.0094 Epoch: 13 Global Step: 172350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:21,674-Speed 3046.74 samples/sec Loss 4.5673 LearningRate 0.0094 Epoch: 13 Global Step: 172360 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:25,030-Speed 3052.20 samples/sec Loss 4.6223 LearningRate 0.0094 Epoch: 13 Global Step: 172370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:02:28,455-Speed 2990.60 samples/sec Loss 4.5394 LearningRate 0.0094 Epoch: 13 Global Step: 172380 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-27 18:02:31,818-Speed 3046.49 samples/sec Loss 4.6193 LearningRate 0.0094 Epoch: 13 Global Step: 172390 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 18:02:35,232-Speed 3000.35 samples/sec Loss 4.5785 LearningRate 0.0094 Epoch: 13 Global Step: 172400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:38,707-Speed 2947.03 samples/sec Loss 4.5932 LearningRate 0.0094 Epoch: 13 Global Step: 172410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:42,200-Speed 2932.76 samples/sec Loss 4.6483 LearningRate 0.0094 Epoch: 13 Global Step: 172420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:45,644-Speed 2973.89 samples/sec Loss 4.6140 LearningRate 0.0094 Epoch: 13 Global Step: 172430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:49,081-Speed 2980.12 samples/sec Loss 4.4759 LearningRate 0.0094 Epoch: 13 Global Step: 172440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:52,535-Speed 2966.04 samples/sec Loss 4.6397 LearningRate 0.0094 Epoch: 13 Global Step: 172450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:55,944-Speed 3003.99 samples/sec Loss 4.5835 LearningRate 0.0093 Epoch: 13 Global Step: 172460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:02:59,313-Speed 3041.08 samples/sec Loss 4.5299 LearningRate 0.0093 Epoch: 13 Global Step: 172470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:02,730-Speed 2997.74 samples/sec Loss 4.6266 LearningRate 0.0093 Epoch: 13 Global Step: 172480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:06,097-Speed 3042.28 samples/sec Loss 4.5957 LearningRate 0.0093 Epoch: 13 Global Step: 172490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:09,516-Speed 2996.30 samples/sec Loss 4.5375 LearningRate 0.0093 Epoch: 13 Global Step: 172500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
18:03:12,985-Speed 2952.24 samples/sec Loss 4.5755 LearningRate 0.0093 Epoch: 13 Global Step: 172510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:03:16,339-Speed 3053.74 samples/sec Loss 4.5161 LearningRate 0.0093 Epoch: 13 Global Step: 172520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:03:19,676-Speed 3069.74 samples/sec Loss 4.6087 LearningRate 0.0093 Epoch: 13 Global Step: 172530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:03:23,057-Speed 3029.35 samples/sec Loss 4.5261 LearningRate 0.0093 Epoch: 13 Global Step: 172540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:03:26,415-Speed 3050.62 samples/sec Loss 4.4442 LearningRate 0.0093 Epoch: 13 Global Step: 172550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:29,801-Speed 3025.09 samples/sec Loss 4.5070 LearningRate 0.0093 Epoch: 13 Global Step: 172560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:33,199-Speed 3014.44 samples/sec Loss 4.5352 LearningRate 0.0093 Epoch: 13 Global Step: 172570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:36,535-Speed 3069.85 samples/sec Loss 4.5879 LearningRate 0.0093 Epoch: 13 Global Step: 172580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:39,947-Speed 3002.95 samples/sec Loss 4.4983 LearningRate 0.0093 Epoch: 13 Global Step: 172590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:43,358-Speed 3003.23 samples/sec Loss 4.4844 LearningRate 0.0093 Epoch: 13 Global Step: 172600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:46,686-Speed 3077.11 samples/sec Loss 4.5360 LearningRate 0.0093 Epoch: 13 Global Step: 172610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:50,056-Speed 3040.22 samples/sec Loss 4.4313 LearningRate 0.0093 Epoch: 13 Global Step: 172620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:53,428-Speed 3037.62 samples/sec Loss 
4.5941 LearningRate 0.0093 Epoch: 13 Global Step: 172630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:03:56,773-Speed 3061.27 samples/sec Loss 4.6506 LearningRate 0.0093 Epoch: 13 Global Step: 172640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:00,278-Speed 2922.66 samples/sec Loss 4.4813 LearningRate 0.0093 Epoch: 13 Global Step: 172650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:04:03,757-Speed 2944.51 samples/sec Loss 4.6032 LearningRate 0.0093 Epoch: 13 Global Step: 172660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:07,186-Speed 2986.41 samples/sec Loss 4.5598 LearningRate 0.0093 Epoch: 13 Global Step: 172670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:10,549-Speed 3046.17 samples/sec Loss 4.5996 LearningRate 0.0093 Epoch: 13 Global Step: 172680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:13,937-Speed 3023.71 samples/sec Loss 4.4634 LearningRate 0.0093 Epoch: 13 Global Step: 172690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:17,360-Speed 2991.85 samples/sec Loss 4.5287 LearningRate 0.0093 Epoch: 13 Global Step: 172700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:20,728-Speed 3041.36 samples/sec Loss 4.5571 LearningRate 0.0093 Epoch: 13 Global Step: 172710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:24,055-Speed 3078.23 samples/sec Loss 4.4660 LearningRate 0.0093 Epoch: 13 Global Step: 172720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:27,443-Speed 3023.31 samples/sec Loss 4.4898 LearningRate 0.0093 Epoch: 13 Global Step: 172730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:30,788-Speed 3062.37 samples/sec Loss 4.4745 LearningRate 0.0093 Epoch: 13 Global Step: 172740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:34,214-Speed 2991.02 samples/sec Loss 4.5428 LearningRate 0.0093 Epoch: 13 Global 
Step: 172750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:37,575-Speed 3047.93 samples/sec Loss 4.5478 LearningRate 0.0093 Epoch: 13 Global Step: 172760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:04:40,991-Speed 2998.17 samples/sec Loss 4.6438 LearningRate 0.0093 Epoch: 13 Global Step: 172770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:04:44,374-Speed 3027.76 samples/sec Loss 4.6117 LearningRate 0.0093 Epoch: 13 Global Step: 172780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:04:47,812-Speed 2979.30 samples/sec Loss 4.4416 LearningRate 0.0093 Epoch: 13 Global Step: 172790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:51,280-Speed 2953.85 samples/sec Loss 4.5951 LearningRate 0.0093 Epoch: 13 Global Step: 172800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:04:54,682-Speed 3011.12 samples/sec Loss 4.4259 LearningRate 0.0093 Epoch: 13 Global Step: 172810 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:04:58,057-Speed 3034.10 samples/sec Loss 4.4877 LearningRate 0.0093 Epoch: 13 Global Step: 172820 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:01,429-Speed 3038.11 samples/sec Loss 4.4870 LearningRate 0.0093 Epoch: 13 Global Step: 172830 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:04,746-Speed 3088.01 samples/sec Loss 4.5067 LearningRate 0.0093 Epoch: 13 Global Step: 172840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:08,155-Speed 3005.21 samples/sec Loss 4.5731 LearningRate 0.0093 Epoch: 13 Global Step: 172850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:11,571-Speed 2997.89 samples/sec Loss 4.4731 LearningRate 0.0093 Epoch: 13 Global Step: 172860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:14,925-Speed 3054.33 samples/sec Loss 4.6056 LearningRate 0.0092 Epoch: 13 Global Step: 172870 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 18:05:18,286-Speed 3047.73 samples/sec Loss 4.5765 LearningRate 0.0092 Epoch: 13 Global Step: 172880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:21,641-Speed 3052.73 samples/sec Loss 4.5387 LearningRate 0.0092 Epoch: 13 Global Step: 172890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:25,046-Speed 3008.84 samples/sec Loss 4.6425 LearningRate 0.0092 Epoch: 13 Global Step: 172900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:05:28,404-Speed 3049.63 samples/sec Loss 4.5067 LearningRate 0.0092 Epoch: 13 Global Step: 172910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:31,778-Speed 3036.16 samples/sec Loss 4.4243 LearningRate 0.0092 Epoch: 13 Global Step: 172920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:35,128-Speed 3057.33 samples/sec Loss 4.4245 LearningRate 0.0092 Epoch: 13 Global Step: 172930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:38,497-Speed 3040.53 samples/sec Loss 4.6102 LearningRate 0.0092 Epoch: 13 Global Step: 172940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:41,897-Speed 3012.76 samples/sec Loss 4.5483 LearningRate 0.0092 Epoch: 13 Global Step: 172950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:45,252-Speed 3052.95 samples/sec Loss 4.4271 LearningRate 0.0092 Epoch: 13 Global Step: 172960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:48,689-Speed 2980.21 samples/sec Loss 4.6000 LearningRate 0.0092 Epoch: 13 Global Step: 172970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:52,021-Speed 3074.38 samples/sec Loss 4.4857 LearningRate 0.0092 Epoch: 13 Global Step: 172980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:05:55,391-Speed 3039.37 samples/sec Loss 4.4639 LearningRate 0.0092 Epoch: 13 Global Step: 172990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:05:58,766-Speed 3034.61 samples/sec Loss 4.6734 LearningRate 0.0092 Epoch: 13 Global Step: 173000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:06:02,096-Speed 3076.44 samples/sec Loss 4.5923 LearningRate 0.0092 Epoch: 13 Global Step: 173010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:05,458-Speed 3046.46 samples/sec Loss 4.4956 LearningRate 0.0092 Epoch: 13 Global Step: 173020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:08,836-Speed 3032.43 samples/sec Loss 4.6191 LearningRate 0.0092 Epoch: 13 Global Step: 173030 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:12,241-Speed 3007.80 samples/sec Loss 4.5035 LearningRate 0.0092 Epoch: 13 Global Step: 173040 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:15,641-Speed 3013.36 samples/sec Loss 4.6390 LearningRate 0.0092 Epoch: 13 Global Step: 173050 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:19,065-Speed 2991.50 samples/sec Loss 4.5430 LearningRate 0.0092 Epoch: 13 Global Step: 173060 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:22,458-Speed 3018.61 samples/sec Loss 4.4948 LearningRate 0.0092 Epoch: 13 Global Step: 173070 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:25,911-Speed 2966.52 samples/sec Loss 4.5622 LearningRate 0.0092 Epoch: 13 Global Step: 173080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:29,255-Speed 3062.50 samples/sec Loss 4.4386 LearningRate 0.0092 Epoch: 13 Global Step: 173090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:32,619-Speed 3045.12 samples/sec Loss 4.5883 LearningRate 0.0092 Epoch: 13 Global Step: 173100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:36,003-Speed 3026.78 samples/sec Loss 4.5556 LearningRate 0.0092 Epoch: 13 Global Step: 173110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:39,337-Speed 3072.59 samples/sec Loss 
4.5144 LearningRate 0.0092 Epoch: 13 Global Step: 173120 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:42,675-Speed 3068.96 samples/sec Loss 4.5160 LearningRate 0.0092 Epoch: 13 Global Step: 173130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:46,088-Speed 3001.56 samples/sec Loss 4.5019 LearningRate 0.0092 Epoch: 13 Global Step: 173140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:49,435-Speed 3060.17 samples/sec Loss 4.5021 LearningRate 0.0092 Epoch: 13 Global Step: 173150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:06:52,794-Speed 3049.94 samples/sec Loss 4.4750 LearningRate 0.0092 Epoch: 13 Global Step: 173160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:06:56,132-Speed 3068.51 samples/sec Loss 4.5923 LearningRate 0.0092 Epoch: 13 Global Step: 173170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:06:59,502-Speed 3039.45 samples/sec Loss 4.4426 LearningRate 0.0092 Epoch: 13 Global Step: 173180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:02,908-Speed 3006.70 samples/sec Loss 4.5044 LearningRate 0.0092 Epoch: 13 Global Step: 173190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:06,272-Speed 3045.26 samples/sec Loss 4.5675 LearningRate 0.0092 Epoch: 13 Global Step: 173200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:09,729-Speed 2962.56 samples/sec Loss 4.5509 LearningRate 0.0092 Epoch: 13 Global Step: 173210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:13,071-Speed 3065.05 samples/sec Loss 4.5171 LearningRate 0.0092 Epoch: 13 Global Step: 173220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:16,432-Speed 3047.89 samples/sec Loss 4.4408 LearningRate 0.0092 Epoch: 13 Global Step: 173230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:19,841-Speed 3004.40 samples/sec Loss 4.4880 LearningRate 0.0092 Epoch: 13 Global 
Step: 173240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:23,222-Speed 3029.70 samples/sec Loss 4.5680 LearningRate 0.0092 Epoch: 13 Global Step: 173250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:26,690-Speed 2953.28 samples/sec Loss 4.5205 LearningRate 0.0092 Epoch: 13 Global Step: 173260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:07:30,176-Speed 2938.77 samples/sec Loss 4.5457 LearningRate 0.0092 Epoch: 13 Global Step: 173270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:33,653-Speed 2946.00 samples/sec Loss 4.5768 LearningRate 0.0091 Epoch: 13 Global Step: 173280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:37,087-Speed 2982.45 samples/sec Loss 4.5797 LearningRate 0.0091 Epoch: 13 Global Step: 173290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:40,457-Speed 3039.07 samples/sec Loss 4.5520 LearningRate 0.0091 Epoch: 13 Global Step: 173300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:43,832-Speed 3035.51 samples/sec Loss 4.5307 LearningRate 0.0091 Epoch: 13 Global Step: 173310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:47,233-Speed 3011.71 samples/sec Loss 4.5691 LearningRate 0.0091 Epoch: 13 Global Step: 173320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:50,617-Speed 3026.81 samples/sec Loss 4.4582 LearningRate 0.0091 Epoch: 13 Global Step: 173330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:54,011-Speed 3017.78 samples/sec Loss 4.5089 LearningRate 0.0091 Epoch: 13 Global Step: 173340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:07:57,444-Speed 2983.91 samples/sec Loss 4.5991 LearningRate 0.0091 Epoch: 13 Global Step: 173350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:00,766-Speed 3083.28 samples/sec Loss 4.4692 LearningRate 0.0091 Epoch: 13 Global Step: 173360 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:08:04,143-Speed 3033.36 samples/sec Loss 4.5550 LearningRate 0.0091 Epoch: 13 Global Step: 173370 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:08:07,473-Speed 3075.27 samples/sec Loss 4.5582 LearningRate 0.0091 Epoch: 13 Global Step: 173380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:10,842-Speed 3041.17 samples/sec Loss 4.6458 LearningRate 0.0091 Epoch: 13 Global Step: 173390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:14,185-Speed 3063.26 samples/sec Loss 4.5256 LearningRate 0.0091 Epoch: 13 Global Step: 173400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:17,498-Speed 3091.78 samples/sec Loss 4.5430 LearningRate 0.0091 Epoch: 13 Global Step: 173410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:20,845-Speed 3060.41 samples/sec Loss 4.5415 LearningRate 0.0091 Epoch: 13 Global Step: 173420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:24,196-Speed 3057.27 samples/sec Loss 4.4871 LearningRate 0.0091 Epoch: 13 Global Step: 173430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:27,543-Speed 3060.27 samples/sec Loss 4.5161 LearningRate 0.0091 Epoch: 13 Global Step: 173440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:30,958-Speed 2999.65 samples/sec Loss 4.5820 LearningRate 0.0091 Epoch: 13 Global Step: 173450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:34,350-Speed 3019.00 samples/sec Loss 4.4165 LearningRate 0.0091 Epoch: 13 Global Step: 173460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:37,751-Speed 3012.45 samples/sec Loss 4.4761 LearningRate 0.0091 Epoch: 13 Global Step: 173470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:08:41,206-Speed 2964.71 samples/sec Loss 4.5315 LearningRate 0.0091 Epoch: 13 Global Step: 173480 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
18:08:44,656-Speed 2968.41 samples/sec Loss 4.4547 LearningRate 0.0091 Epoch: 13 Global Step: 173490 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:08:48,091-Speed 2982.64 samples/sec Loss 4.5799 LearningRate 0.0091 Epoch: 13 Global Step: 173500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:08:51,417-Speed 3079.70 samples/sec Loss 4.6797 LearningRate 0.0091 Epoch: 13 Global Step: 173510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:08:54,816-Speed 3013.43 samples/sec Loss 4.5538 LearningRate 0.0091 Epoch: 13 Global Step: 173520 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:08:58,219-Speed 3010.21 samples/sec Loss 4.6321 LearningRate 0.0091 Epoch: 13 Global Step: 173530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:01,507-Speed 3114.95 samples/sec Loss 4.5104 LearningRate 0.0091 Epoch: 13 Global Step: 173540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:04,822-Speed 3089.79 samples/sec Loss 4.6263 LearningRate 0.0091 Epoch: 13 Global Step: 173550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:08,220-Speed 3013.80 samples/sec Loss 4.6034 LearningRate 0.0091 Epoch: 13 Global Step: 173560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:11,552-Speed 3074.82 samples/sec Loss 4.5887 LearningRate 0.0091 Epoch: 13 Global Step: 173570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:14,986-Speed 2982.08 samples/sec Loss 4.5084 LearningRate 0.0091 Epoch: 13 Global Step: 173580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:18,304-Speed 3088.10 samples/sec Loss 4.5629 LearningRate 0.0091 Epoch: 13 Global Step: 173590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:21,690-Speed 3024.17 samples/sec Loss 4.5451 LearningRate 0.0091 Epoch: 13 Global Step: 173600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:25,068-Speed 3032.95 samples/sec Loss 
4.4561 LearningRate 0.0091 Epoch: 13 Global Step: 173610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:28,478-Speed 3003.18 samples/sec Loss 4.5084 LearningRate 0.0091 Epoch: 13 Global Step: 173620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:31,913-Speed 2982.17 samples/sec Loss 4.4731 LearningRate 0.0091 Epoch: 13 Global Step: 173630 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:09:35,263-Speed 3058.21 samples/sec Loss 4.4850 LearningRate 0.0091 Epoch: 13 Global Step: 173640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:38,752-Speed 2935.91 samples/sec Loss 4.5012 LearningRate 0.0091 Epoch: 13 Global Step: 173650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:42,090-Speed 3067.78 samples/sec Loss 4.5170 LearningRate 0.0091 Epoch: 13 Global Step: 173660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:45,486-Speed 3017.29 samples/sec Loss 4.6095 LearningRate 0.0091 Epoch: 13 Global Step: 173670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:48,924-Speed 2979.12 samples/sec Loss 4.4989 LearningRate 0.0091 Epoch: 13 Global Step: 173680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:52,336-Speed 3001.37 samples/sec Loss 4.5753 LearningRate 0.0090 Epoch: 13 Global Step: 173690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:55,675-Speed 3067.29 samples/sec Loss 4.5065 LearningRate 0.0090 Epoch: 13 Global Step: 173700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:09:59,119-Speed 2974.20 samples/sec Loss 4.5010 LearningRate 0.0090 Epoch: 13 Global Step: 173710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:02,524-Speed 3008.73 samples/sec Loss 4.5557 LearningRate 0.0090 Epoch: 13 Global Step: 173720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:05,934-Speed 3003.80 samples/sec Loss 4.5137 LearningRate 0.0090 Epoch: 13 Global 
Step: 173730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:09,268-Speed 3072.38 samples/sec Loss 4.4380 LearningRate 0.0090 Epoch: 13 Global Step: 173740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:10:12,751-Speed 2940.69 samples/sec Loss 4.5861 LearningRate 0.0090 Epoch: 13 Global Step: 173750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:10:16,206-Speed 2964.93 samples/sec Loss 4.5157 LearningRate 0.0090 Epoch: 13 Global Step: 173760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:10:19,610-Speed 3009.25 samples/sec Loss 4.5658 LearningRate 0.0090 Epoch: 13 Global Step: 173770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:23,031-Speed 2994.17 samples/sec Loss 4.5265 LearningRate 0.0090 Epoch: 13 Global Step: 173780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:26,404-Speed 3036.66 samples/sec Loss 4.5696 LearningRate 0.0090 Epoch: 13 Global Step: 173790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:29,890-Speed 2938.42 samples/sec Loss 4.4814 LearningRate 0.0090 Epoch: 13 Global Step: 173800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:33,231-Speed 3065.61 samples/sec Loss 4.3976 LearningRate 0.0090 Epoch: 13 Global Step: 173810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:36,610-Speed 3030.86 samples/sec Loss 4.5782 LearningRate 0.0090 Epoch: 13 Global Step: 173820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:39,964-Speed 3054.22 samples/sec Loss 4.4982 LearningRate 0.0090 Epoch: 13 Global Step: 173830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:43,395-Speed 2985.96 samples/sec Loss 4.4638 LearningRate 0.0090 Epoch: 13 Global Step: 173840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:46,739-Speed 3063.10 samples/sec Loss 4.5618 LearningRate 0.0090 Epoch: 13 Global Step: 173850 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:10:50,143-Speed 3008.97 samples/sec Loss 4.3983 LearningRate 0.0090 Epoch: 13 Global Step: 173860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:10:53,521-Speed 3032.36 samples/sec Loss 4.5171 LearningRate 0.0090 Epoch: 13 Global Step: 173870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:10:56,856-Speed 3070.93 samples/sec Loss 4.5482 LearningRate 0.0090 Epoch: 13 Global Step: 173880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:00,586-Speed 2746.41 samples/sec Loss 4.4345 LearningRate 0.0090 Epoch: 13 Global Step: 173890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:33,222-Speed 313.78 samples/sec Loss 3.7024 LearningRate 0.0090 Epoch: 14 Global Step: 173900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:36,761-Speed 2894.14 samples/sec Loss 3.1546 LearningRate 0.0090 Epoch: 14 Global Step: 173910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:40,259-Speed 2928.07 samples/sec Loss 3.1538 LearningRate 0.0090 Epoch: 14 Global Step: 173920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:43,635-Speed 3034.22 samples/sec Loss 3.1666 LearningRate 0.0090 Epoch: 14 Global Step: 173930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:47,044-Speed 3004.98 samples/sec Loss 3.2521 LearningRate 0.0090 Epoch: 14 Global Step: 173940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:50,457-Speed 3000.28 samples/sec Loss 3.1619 LearningRate 0.0090 Epoch: 14 Global Step: 173950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:53,831-Speed 3035.86 samples/sec Loss 3.3285 LearningRate 0.0090 Epoch: 14 Global Step: 173960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:11:57,196-Speed 3051.97 samples/sec Loss 3.1538 LearningRate 0.0090 Epoch: 14 Global Step: 173970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:12:00,578-Speed 3028.35 samples/sec Loss 3.2859 LearningRate 0.0090 Epoch: 14 Global Step: 173980 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:03,938-Speed 3049.02 samples/sec Loss 3.1419 LearningRate 0.0090 Epoch: 14 Global Step: 173990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:07,252-Speed 3090.71 samples/sec Loss 3.1603 LearningRate 0.0090 Epoch: 14 Global Step: 174000 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:10,706-Speed 2965.46 samples/sec Loss 3.2215 LearningRate 0.0090 Epoch: 14 Global Step: 174010 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:14,113-Speed 3006.43 samples/sec Loss 3.2194 LearningRate 0.0090 Epoch: 14 Global Step: 174020 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:17,454-Speed 3065.97 samples/sec Loss 3.1945 LearningRate 0.0090 Epoch: 14 Global Step: 174030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:20,764-Speed 3094.55 samples/sec Loss 3.2307 LearningRate 0.0090 Epoch: 14 Global Step: 174040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:24,088-Speed 3082.01 samples/sec Loss 3.1493 LearningRate 0.0090 Epoch: 14 Global Step: 174050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:27,391-Speed 3101.33 samples/sec Loss 3.1606 LearningRate 0.0090 Epoch: 14 Global Step: 174060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:30,823-Speed 2984.78 samples/sec Loss 3.2591 LearningRate 0.0090 Epoch: 14 Global Step: 174070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:34,178-Speed 3052.49 samples/sec Loss 3.2395 LearningRate 0.0090 Epoch: 14 Global Step: 174080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:37,474-Speed 3107.82 samples/sec Loss 3.2368 LearningRate 0.0090 Epoch: 14 Global Step: 174090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:40,908-Speed 2983.04 samples/sec Loss 
3.2291 LearningRate 0.0090 Epoch: 14 Global Step: 174100 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:44,362-Speed 2965.21 samples/sec Loss 3.1594 LearningRate 0.0089 Epoch: 14 Global Step: 174110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:47,770-Speed 3005.64 samples/sec Loss 3.1427 LearningRate 0.0089 Epoch: 14 Global Step: 174120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:12:51,318-Speed 2887.58 samples/sec Loss 3.2266 LearningRate 0.0089 Epoch: 14 Global Step: 174130 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:54,730-Speed 3001.72 samples/sec Loss 3.1891 LearningRate 0.0089 Epoch: 14 Global Step: 174140 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:12:58,135-Speed 3007.76 samples/sec Loss 3.1702 LearningRate 0.0089 Epoch: 14 Global Step: 174150 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:13:01,751-Speed 2833.02 samples/sec Loss 3.1403 LearningRate 0.0089 Epoch: 14 Global Step: 174160 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:13:05,189-Speed 2979.50 samples/sec Loss 3.2608 LearningRate 0.0089 Epoch: 14 Global Step: 174170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:08,626-Speed 2980.00 samples/sec Loss 3.1745 LearningRate 0.0089 Epoch: 14 Global Step: 174180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:12,073-Speed 2971.75 samples/sec Loss 3.2314 LearningRate 0.0089 Epoch: 14 Global Step: 174190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:15,554-Speed 2942.80 samples/sec Loss 3.2309 LearningRate 0.0089 Epoch: 14 Global Step: 174200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:18,952-Speed 3014.35 samples/sec Loss 3.2731 LearningRate 0.0089 Epoch: 14 Global Step: 174210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:22,328-Speed 3034.44 samples/sec Loss 3.1329 LearningRate 0.0089 Epoch: 14 Global 
Step: 174220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:25,783-Speed 2964.19 samples/sec Loss 3.2533 LearningRate 0.0089 Epoch: 14 Global Step: 174230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:29,224-Speed 2976.75 samples/sec Loss 3.2234 LearningRate 0.0089 Epoch: 14 Global Step: 174240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:32,630-Speed 3008.70 samples/sec Loss 3.2932 LearningRate 0.0089 Epoch: 14 Global Step: 174250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:36,040-Speed 3003.90 samples/sec Loss 3.2671 LearningRate 0.0089 Epoch: 14 Global Step: 174260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:39,414-Speed 3036.62 samples/sec Loss 3.2361 LearningRate 0.0089 Epoch: 14 Global Step: 174270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:13:42,878-Speed 2956.27 samples/sec Loss 3.1984 LearningRate 0.0089 Epoch: 14 Global Step: 174280 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:13:46,211-Speed 3073.20 samples/sec Loss 3.2504 LearningRate 0.0089 Epoch: 14 Global Step: 174290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:49,614-Speed 3010.04 samples/sec Loss 3.2350 LearningRate 0.0089 Epoch: 14 Global Step: 174300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:53,023-Speed 3004.67 samples/sec Loss 3.2475 LearningRate 0.0089 Epoch: 14 Global Step: 174310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:56,479-Speed 2964.31 samples/sec Loss 3.2691 LearningRate 0.0089 Epoch: 14 Global Step: 174320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:13:59,839-Speed 3048.04 samples/sec Loss 3.1281 LearningRate 0.0089 Epoch: 14 Global Step: 174330 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:03,222-Speed 3027.67 samples/sec Loss 3.2252 LearningRate 0.0089 Epoch: 14 Global Step: 174340 Fp16 Grad Scale: 16384 
Required: 7 hours Training: 2022-04-27 18:14:06,589-Speed 3041.82 samples/sec Loss 3.2053 LearningRate 0.0089 Epoch: 14 Global Step: 174350 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:09,974-Speed 3026.61 samples/sec Loss 3.1899 LearningRate 0.0089 Epoch: 14 Global Step: 174360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:13,359-Speed 3025.56 samples/sec Loss 3.2042 LearningRate 0.0089 Epoch: 14 Global Step: 174370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:16,754-Speed 3017.31 samples/sec Loss 3.2520 LearningRate 0.0089 Epoch: 14 Global Step: 174380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:20,110-Speed 3051.83 samples/sec Loss 3.2278 LearningRate 0.0089 Epoch: 14 Global Step: 174390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:23,552-Speed 2975.92 samples/sec Loss 3.2312 LearningRate 0.0089 Epoch: 14 Global Step: 174400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:26,921-Speed 3040.20 samples/sec Loss 3.2052 LearningRate 0.0089 Epoch: 14 Global Step: 174410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:30,247-Speed 3080.18 samples/sec Loss 3.3202 LearningRate 0.0089 Epoch: 14 Global Step: 174420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:14:33,628-Speed 3029.79 samples/sec Loss 3.2510 LearningRate 0.0089 Epoch: 14 Global Step: 174430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:14:36,991-Speed 3045.30 samples/sec Loss 3.2714 LearningRate 0.0089 Epoch: 14 Global Step: 174440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:14:40,343-Speed 3056.23 samples/sec Loss 3.3764 LearningRate 0.0089 Epoch: 14 Global Step: 174450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:14:43,743-Speed 3012.44 samples/sec Loss 3.2234 LearningRate 0.0089 Epoch: 14 Global Step: 174460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:14:47,207-Speed 2957.00 samples/sec Loss 3.3707 LearningRate 0.0089 Epoch: 14 Global Step: 174470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:14:50,611-Speed 3009.41 samples/sec Loss 3.3143 LearningRate 0.0089 Epoch: 14 Global Step: 174480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:14:53,995-Speed 3026.61 samples/sec Loss 3.3000 LearningRate 0.0089 Epoch: 14 Global Step: 174490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:14:57,346-Speed 3056.57 samples/sec Loss 3.2785 LearningRate 0.0089 Epoch: 14 Global Step: 174500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:00,688-Speed 3065.00 samples/sec Loss 3.2938 LearningRate 0.0089 Epoch: 14 Global Step: 174510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:04,057-Speed 3040.69 samples/sec Loss 3.2210 LearningRate 0.0088 Epoch: 14 Global Step: 174520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:07,499-Speed 2975.20 samples/sec Loss 3.2883 LearningRate 0.0088 Epoch: 14 Global Step: 174530 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:15:10,892-Speed 3019.88 samples/sec Loss 3.3325 LearningRate 0.0088 Epoch: 14 Global Step: 174540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:15:14,272-Speed 3030.00 samples/sec Loss 3.3468 LearningRate 0.0088 Epoch: 14 Global Step: 174550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:17,764-Speed 2933.21 samples/sec Loss 3.3311 LearningRate 0.0088 Epoch: 14 Global Step: 174560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:21,260-Speed 2929.65 samples/sec Loss 3.2375 LearningRate 0.0088 Epoch: 14 Global Step: 174570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:24,669-Speed 3005.17 samples/sec Loss 3.3462 LearningRate 0.0088 Epoch: 14 Global Step: 174580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:28,171-Speed 2925.16 samples/sec Loss 
3.3751 LearningRate 0.0088 Epoch: 14 Global Step: 174590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:31,589-Speed 2996.79 samples/sec Loss 3.2165 LearningRate 0.0088 Epoch: 14 Global Step: 174600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:35,010-Speed 2993.60 samples/sec Loss 3.4115 LearningRate 0.0088 Epoch: 14 Global Step: 174610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:38,448-Speed 2979.24 samples/sec Loss 3.3624 LearningRate 0.0088 Epoch: 14 Global Step: 174620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:41,821-Speed 3037.20 samples/sec Loss 3.3192 LearningRate 0.0088 Epoch: 14 Global Step: 174630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:45,245-Speed 2991.56 samples/sec Loss 3.3591 LearningRate 0.0088 Epoch: 14 Global Step: 174640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:15:48,654-Speed 3003.94 samples/sec Loss 3.3462 LearningRate 0.0088 Epoch: 14 Global Step: 174650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:15:52,049-Speed 3017.63 samples/sec Loss 3.3692 LearningRate 0.0088 Epoch: 14 Global Step: 174660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:15:55,469-Speed 2994.23 samples/sec Loss 3.2942 LearningRate 0.0088 Epoch: 14 Global Step: 174670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:15:58,891-Speed 2994.06 samples/sec Loss 3.1691 LearningRate 0.0088 Epoch: 14 Global Step: 174680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:02,243-Speed 3055.17 samples/sec Loss 3.3777 LearningRate 0.0088 Epoch: 14 Global Step: 174690 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:05,647-Speed 3009.21 samples/sec Loss 3.3656 LearningRate 0.0088 Epoch: 14 Global Step: 174700 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:09,019-Speed 3038.04 samples/sec Loss 3.3661 LearningRate 0.0088 Epoch: 14 Global 
Step: 174710 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:12,403-Speed 3026.60 samples/sec Loss 3.3578 LearningRate 0.0088 Epoch: 14 Global Step: 174720 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:15,779-Speed 3033.83 samples/sec Loss 3.3584 LearningRate 0.0088 Epoch: 14 Global Step: 174730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:19,210-Speed 2986.16 samples/sec Loss 3.3715 LearningRate 0.0088 Epoch: 14 Global Step: 174740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:22,606-Speed 3015.89 samples/sec Loss 3.3234 LearningRate 0.0088 Epoch: 14 Global Step: 174750 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 18:16:25,994-Speed 3023.39 samples/sec Loss 3.3336 LearningRate 0.0088 Epoch: 14 Global Step: 174760 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:29,341-Speed 3059.96 samples/sec Loss 3.4362 LearningRate 0.0088 Epoch: 14 Global Step: 174770 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:32,726-Speed 3026.17 samples/sec Loss 3.3184 LearningRate 0.0088 Epoch: 14 Global Step: 174780 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:36,085-Speed 3049.56 samples/sec Loss 3.3025 LearningRate 0.0088 Epoch: 14 Global Step: 174790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:39,446-Speed 3047.24 samples/sec Loss 3.3760 LearningRate 0.0088 Epoch: 14 Global Step: 174800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:16:42,763-Speed 3088.06 samples/sec Loss 3.3511 LearningRate 0.0088 Epoch: 14 Global Step: 174810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:16:46,154-Speed 3020.46 samples/sec Loss 3.2973 LearningRate 0.0088 Epoch: 14 Global Step: 174820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:16:49,619-Speed 2956.41 samples/sec Loss 3.3943 LearningRate 0.0088 Epoch: 14 Global Step: 174830 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:16:52,991-Speed 3037.16 samples/sec Loss 3.3453 LearningRate 0.0088 Epoch: 14 Global Step: 174840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:16:56,512-Speed 2910.03 samples/sec Loss 3.4114 LearningRate 0.0088 Epoch: 14 Global Step: 174850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:16:59,868-Speed 3051.63 samples/sec Loss 3.4121 LearningRate 0.0088 Epoch: 14 Global Step: 174860 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:03,250-Speed 3028.76 samples/sec Loss 3.3954 LearningRate 0.0088 Epoch: 14 Global Step: 174870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:06,622-Speed 3037.40 samples/sec Loss 3.3864 LearningRate 0.0088 Epoch: 14 Global Step: 174880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:09,980-Speed 3050.46 samples/sec Loss 3.3597 LearningRate 0.0088 Epoch: 14 Global Step: 174890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:13,407-Speed 2989.29 samples/sec Loss 3.3520 LearningRate 0.0088 Epoch: 14 Global Step: 174900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:16,759-Speed 3055.34 samples/sec Loss 3.4080 LearningRate 0.0088 Epoch: 14 Global Step: 174910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:20,194-Speed 2982.18 samples/sec Loss 3.3298 LearningRate 0.0088 Epoch: 14 Global Step: 174920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:23,621-Speed 2989.07 samples/sec Loss 3.3758 LearningRate 0.0088 Epoch: 14 Global Step: 174930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:26,942-Speed 3084.50 samples/sec Loss 3.3694 LearningRate 0.0087 Epoch: 14 Global Step: 174940 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:30,317-Speed 3035.05 samples/sec Loss 3.3624 LearningRate 0.0087 Epoch: 14 Global Step: 174950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
18:17:33,806-Speed 2935.21 samples/sec Loss 3.3866 LearningRate 0.0087 Epoch: 14 Global Step: 174960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:37,231-Speed 2990.58 samples/sec Loss 3.3426 LearningRate 0.0087 Epoch: 14 Global Step: 174970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:17:40,564-Speed 3073.62 samples/sec Loss 3.3801 LearningRate 0.0087 Epoch: 14 Global Step: 174980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:43,863-Speed 3104.44 samples/sec Loss 3.3983 LearningRate 0.0087 Epoch: 14 Global Step: 174990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:47,330-Speed 2954.84 samples/sec Loss 3.3546 LearningRate 0.0087 Epoch: 14 Global Step: 175000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:50,745-Speed 2998.81 samples/sec Loss 3.3771 LearningRate 0.0087 Epoch: 14 Global Step: 175010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:54,142-Speed 3016.09 samples/sec Loss 3.3378 LearningRate 0.0087 Epoch: 14 Global Step: 175020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:17:57,477-Speed 3070.85 samples/sec Loss 3.4492 LearningRate 0.0087 Epoch: 14 Global Step: 175030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:00,829-Speed 3055.57 samples/sec Loss 3.3632 LearningRate 0.0087 Epoch: 14 Global Step: 175040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:04,236-Speed 3006.94 samples/sec Loss 3.3293 LearningRate 0.0087 Epoch: 14 Global Step: 175050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:07,590-Speed 3053.12 samples/sec Loss 3.4576 LearningRate 0.0087 Epoch: 14 Global Step: 175060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:10,899-Speed 3095.94 samples/sec Loss 3.3423 LearningRate 0.0087 Epoch: 14 Global Step: 175070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:14,250-Speed 3057.34 samples/sec Loss 
3.4438 LearningRate 0.0087 Epoch: 14 Global Step: 175080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:17,683-Speed 2983.12 samples/sec Loss 3.3301 LearningRate 0.0087 Epoch: 14 Global Step: 175090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:21,150-Speed 2954.37 samples/sec Loss 3.3359 LearningRate 0.0087 Epoch: 14 Global Step: 175100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:24,613-Speed 2957.50 samples/sec Loss 3.4128 LearningRate 0.0087 Epoch: 14 Global Step: 175110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:27,944-Speed 3075.24 samples/sec Loss 3.3969 LearningRate 0.0087 Epoch: 14 Global Step: 175120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:31,318-Speed 3036.49 samples/sec Loss 3.3794 LearningRate 0.0087 Epoch: 14 Global Step: 175130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:34,716-Speed 3013.76 samples/sec Loss 3.3609 LearningRate 0.0087 Epoch: 14 Global Step: 175140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:38,051-Speed 3071.51 samples/sec Loss 3.4602 LearningRate 0.0087 Epoch: 14 Global Step: 175150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:41,493-Speed 2976.09 samples/sec Loss 3.3698 LearningRate 0.0087 Epoch: 14 Global Step: 175160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:44,916-Speed 2992.04 samples/sec Loss 3.5015 LearningRate 0.0087 Epoch: 14 Global Step: 175170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:18:48,346-Speed 2985.94 samples/sec Loss 3.3669 LearningRate 0.0087 Epoch: 14 Global Step: 175180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:51,765-Speed 2996.49 samples/sec Loss 3.2695 LearningRate 0.0087 Epoch: 14 Global Step: 175190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:55,072-Speed 3096.67 samples/sec Loss 3.4152 LearningRate 0.0087 Epoch: 14 Global 
Step: 175200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:18:58,445-Speed 3037.14 samples/sec Loss 3.4191 LearningRate 0.0087 Epoch: 14 Global Step: 175210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:01,864-Speed 2995.89 samples/sec Loss 3.4305 LearningRate 0.0087 Epoch: 14 Global Step: 175220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:05,280-Speed 2998.53 samples/sec Loss 3.4408 LearningRate 0.0087 Epoch: 14 Global Step: 175230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:08,715-Speed 2981.55 samples/sec Loss 3.3377 LearningRate 0.0087 Epoch: 14 Global Step: 175240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:12,186-Speed 2951.08 samples/sec Loss 3.4920 LearningRate 0.0087 Epoch: 14 Global Step: 175250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:15,632-Speed 2972.28 samples/sec Loss 3.3534 LearningRate 0.0087 Epoch: 14 Global Step: 175260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:19,126-Speed 2931.99 samples/sec Loss 3.3780 LearningRate 0.0087 Epoch: 14 Global Step: 175270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:22,532-Speed 3006.85 samples/sec Loss 3.4503 LearningRate 0.0087 Epoch: 14 Global Step: 175280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:25,931-Speed 3014.44 samples/sec Loss 3.3319 LearningRate 0.0087 Epoch: 14 Global Step: 175290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:29,273-Speed 3064.78 samples/sec Loss 3.3728 LearningRate 0.0087 Epoch: 14 Global Step: 175300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:32,609-Speed 3070.74 samples/sec Loss 3.4190 LearningRate 0.0087 Epoch: 14 Global Step: 175310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:36,078-Speed 2951.86 samples/sec Loss 3.3555 LearningRate 0.0087 Epoch: 14 Global Step: 175320 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:19:39,444-Speed 3043.15 samples/sec Loss 3.4203 LearningRate 0.0087 Epoch: 14 Global Step: 175330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:42,800-Speed 3052.53 samples/sec Loss 3.4024 LearningRate 0.0087 Epoch: 14 Global Step: 175340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:46,173-Speed 3036.29 samples/sec Loss 3.3867 LearningRate 0.0087 Epoch: 14 Global Step: 175350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:19:49,549-Speed 3034.08 samples/sec Loss 3.3967 LearningRate 0.0086 Epoch: 14 Global Step: 175360 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:19:53,031-Speed 2941.24 samples/sec Loss 3.4540 LearningRate 0.0086 Epoch: 14 Global Step: 175370 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:19:56,375-Speed 3063.68 samples/sec Loss 3.4770 LearningRate 0.0086 Epoch: 14 Global Step: 175380 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:19:59,722-Speed 3060.07 samples/sec Loss 3.3797 LearningRate 0.0086 Epoch: 14 Global Step: 175390 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:20:03,063-Speed 3065.08 samples/sec Loss 3.3936 LearningRate 0.0086 Epoch: 14 Global Step: 175400 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:20:06,412-Speed 3058.43 samples/sec Loss 3.4292 LearningRate 0.0086 Epoch: 14 Global Step: 175410 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:20:09,862-Speed 2969.59 samples/sec Loss 3.4300 LearningRate 0.0086 Epoch: 14 Global Step: 175420 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:20:13,206-Speed 3062.94 samples/sec Loss 3.4099 LearningRate 0.0086 Epoch: 14 Global Step: 175430 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:20:16,585-Speed 3031.22 samples/sec Loss 3.4244 LearningRate 0.0086 Epoch: 14 Global Step: 175440 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
18:20:19,928-Speed 3064.61 samples/sec Loss 3.5758 LearningRate 0.0086 Epoch: 14 Global Step: 175450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:20:23,271-Speed 3063.75 samples/sec Loss 3.4546 LearningRate 0.0086 Epoch: 14 Global Step: 175460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:26,678-Speed 3006.76 samples/sec Loss 3.4124 LearningRate 0.0086 Epoch: 14 Global Step: 175470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:30,005-Speed 3079.05 samples/sec Loss 3.4866 LearningRate 0.0086 Epoch: 14 Global Step: 175480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:33,399-Speed 3017.03 samples/sec Loss 3.4745 LearningRate 0.0086 Epoch: 14 Global Step: 175490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:36,754-Speed 3053.25 samples/sec Loss 3.3884 LearningRate 0.0086 Epoch: 14 Global Step: 175500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:40,107-Speed 3055.09 samples/sec Loss 3.4917 LearningRate 0.0086 Epoch: 14 Global Step: 175510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:43,490-Speed 3027.84 samples/sec Loss 3.4526 LearningRate 0.0086 Epoch: 14 Global Step: 175520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:46,833-Speed 3063.85 samples/sec Loss 3.4587 LearningRate 0.0086 Epoch: 14 Global Step: 175530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:50,188-Speed 3053.13 samples/sec Loss 3.4802 LearningRate 0.0086 Epoch: 14 Global Step: 175540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:53,517-Speed 3076.37 samples/sec Loss 3.5169 LearningRate 0.0086 Epoch: 14 Global Step: 175550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:20:56,961-Speed 2973.98 samples/sec Loss 3.4546 LearningRate 0.0086 Epoch: 14 Global Step: 175560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:21:00,357-Speed 3016.30 samples/sec Loss 
3.4900 LearningRate 0.0086 Epoch: 14 Global Step: 175570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:03,745-Speed 3023.00 samples/sec Loss 3.4677 LearningRate 0.0086 Epoch: 14 Global Step: 175580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:07,181-Speed 2980.62 samples/sec Loss 3.5289 LearningRate 0.0086 Epoch: 14 Global Step: 175590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:10,493-Speed 3092.95 samples/sec Loss 3.4911 LearningRate 0.0086 Epoch: 14 Global Step: 175600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:13,807-Speed 3090.72 samples/sec Loss 3.5114 LearningRate 0.0086 Epoch: 14 Global Step: 175610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:17,141-Speed 3072.80 samples/sec Loss 3.4651 LearningRate 0.0086 Epoch: 14 Global Step: 175620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:20,509-Speed 3041.42 samples/sec Loss 3.5288 LearningRate 0.0086 Epoch: 14 Global Step: 175630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:23,838-Speed 3077.03 samples/sec Loss 3.5315 LearningRate 0.0086 Epoch: 14 Global Step: 175640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:27,214-Speed 3034.14 samples/sec Loss 3.5652 LearningRate 0.0086 Epoch: 14 Global Step: 175650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:21:30,623-Speed 3004.55 samples/sec Loss 3.5328 LearningRate 0.0086 Epoch: 14 Global Step: 175660 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:34,078-Speed 2964.44 samples/sec Loss 3.4641 LearningRate 0.0086 Epoch: 14 Global Step: 175670 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:37,467-Speed 3023.21 samples/sec Loss 3.5136 LearningRate 0.0086 Epoch: 14 Global Step: 175680 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:40,858-Speed 3021.02 samples/sec Loss 3.4452 LearningRate 0.0086 Epoch: 14 Global 
Step: 175690 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:44,174-Speed 3089.19 samples/sec Loss 3.4791 LearningRate 0.0086 Epoch: 14 Global Step: 175700 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:47,533-Speed 3048.52 samples/sec Loss 3.5113 LearningRate 0.0086 Epoch: 14 Global Step: 175710 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:50,895-Speed 3047.28 samples/sec Loss 3.4561 LearningRate 0.0086 Epoch: 14 Global Step: 175720 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:54,228-Speed 3073.23 samples/sec Loss 3.5174 LearningRate 0.0086 Epoch: 14 Global Step: 175730 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:21:57,578-Speed 3058.60 samples/sec Loss 3.4535 LearningRate 0.0086 Epoch: 14 Global Step: 175740 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:22:01,035-Speed 2963.02 samples/sec Loss 3.4602 LearningRate 0.0086 Epoch: 14 Global Step: 175750 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:22:04,464-Speed 2987.37 samples/sec Loss 3.4630 LearningRate 0.0086 Epoch: 14 Global Step: 175760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:07,811-Speed 3060.41 samples/sec Loss 3.5036 LearningRate 0.0086 Epoch: 14 Global Step: 175770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:11,189-Speed 3032.82 samples/sec Loss 3.4915 LearningRate 0.0086 Epoch: 14 Global Step: 175780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:14,590-Speed 3011.85 samples/sec Loss 3.4449 LearningRate 0.0085 Epoch: 14 Global Step: 175790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:17,999-Speed 3004.63 samples/sec Loss 3.4450 LearningRate 0.0085 Epoch: 14 Global Step: 175800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:21,390-Speed 3021.32 samples/sec Loss 3.6065 LearningRate 0.0085 Epoch: 14 Global Step: 175810 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:22:24,814-Speed 2990.94 samples/sec Loss 3.5145 LearningRate 0.0085 Epoch: 14 Global Step: 175820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:28,224-Speed 3003.91 samples/sec Loss 3.5988 LearningRate 0.0085 Epoch: 14 Global Step: 175830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:31,590-Speed 3043.19 samples/sec Loss 3.6094 LearningRate 0.0085 Epoch: 14 Global Step: 175840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:34,992-Speed 3011.19 samples/sec Loss 3.4747 LearningRate 0.0085 Epoch: 14 Global Step: 175850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:38,404-Speed 3001.32 samples/sec Loss 3.5525 LearningRate 0.0085 Epoch: 14 Global Step: 175860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:22:41,876-Speed 2950.88 samples/sec Loss 3.4802 LearningRate 0.0085 Epoch: 14 Global Step: 175870 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:22:45,206-Speed 3075.60 samples/sec Loss 3.4677 LearningRate 0.0085 Epoch: 14 Global Step: 175880 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:22:48,542-Speed 3070.77 samples/sec Loss 3.5273 LearningRate 0.0085 Epoch: 14 Global Step: 175890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:51,864-Speed 3083.43 samples/sec Loss 3.5573 LearningRate 0.0085 Epoch: 14 Global Step: 175900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:55,202-Speed 3068.93 samples/sec Loss 3.6100 LearningRate 0.0085 Epoch: 14 Global Step: 175910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:22:58,603-Speed 3011.98 samples/sec Loss 3.4739 LearningRate 0.0085 Epoch: 14 Global Step: 175920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:01,938-Speed 3072.92 samples/sec Loss 3.4653 LearningRate 0.0085 Epoch: 14 Global Step: 175930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:23:05,331-Speed 3018.93 samples/sec Loss 3.4825 LearningRate 0.0085 Epoch: 14 Global Step: 175940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:08,706-Speed 3035.39 samples/sec Loss 3.6235 LearningRate 0.0085 Epoch: 14 Global Step: 175950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:12,097-Speed 3020.13 samples/sec Loss 3.5202 LearningRate 0.0085 Epoch: 14 Global Step: 175960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:15,451-Speed 3054.60 samples/sec Loss 3.5749 LearningRate 0.0085 Epoch: 14 Global Step: 175970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:18,853-Speed 3010.33 samples/sec Loss 3.4934 LearningRate 0.0085 Epoch: 14 Global Step: 175980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:22,162-Speed 3095.41 samples/sec Loss 3.6107 LearningRate 0.0085 Epoch: 14 Global Step: 175990 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:23:25,530-Speed 3041.86 samples/sec Loss 3.5462 LearningRate 0.0085 Epoch: 14 Global Step: 176000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:28,922-Speed 3020.66 samples/sec Loss 3.5550 LearningRate 0.0085 Epoch: 14 Global Step: 176010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:32,355-Speed 2983.39 samples/sec Loss 3.5385 LearningRate 0.0085 Epoch: 14 Global Step: 176020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:35,727-Speed 3037.70 samples/sec Loss 3.5114 LearningRate 0.0085 Epoch: 14 Global Step: 176030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:39,118-Speed 3020.42 samples/sec Loss 3.5802 LearningRate 0.0085 Epoch: 14 Global Step: 176040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:42,544-Speed 2990.93 samples/sec Loss 3.5743 LearningRate 0.0085 Epoch: 14 Global Step: 176050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:45,991-Speed 2970.88 samples/sec Loss 
3.6394 LearningRate 0.0085 Epoch: 14 Global Step: 176060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:49,412-Speed 2994.32 samples/sec Loss 3.5883 LearningRate 0.0085 Epoch: 14 Global Step: 176070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:52,822-Speed 3004.14 samples/sec Loss 3.6187 LearningRate 0.0085 Epoch: 14 Global Step: 176080 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:56,199-Speed 3032.96 samples/sec Loss 3.5501 LearningRate 0.0085 Epoch: 14 Global Step: 176090 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:23:59,607-Speed 3005.00 samples/sec Loss 3.5673 LearningRate 0.0085 Epoch: 14 Global Step: 176100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:03,058-Speed 2968.84 samples/sec Loss 3.5107 LearningRate 0.0085 Epoch: 14 Global Step: 176110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:06,462-Speed 3008.51 samples/sec Loss 3.7465 LearningRate 0.0085 Epoch: 14 Global Step: 176120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:09,871-Speed 3004.88 samples/sec Loss 3.6149 LearningRate 0.0085 Epoch: 14 Global Step: 176130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:13,239-Speed 3041.83 samples/sec Loss 3.5107 LearningRate 0.0085 Epoch: 14 Global Step: 176140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:16,669-Speed 2985.91 samples/sec Loss 3.4903 LearningRate 0.0085 Epoch: 14 Global Step: 176150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:20,057-Speed 3023.54 samples/sec Loss 3.5364 LearningRate 0.0085 Epoch: 14 Global Step: 176160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:23,502-Speed 2973.44 samples/sec Loss 3.4903 LearningRate 0.0085 Epoch: 14 Global Step: 176170 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:26,821-Speed 3085.90 samples/sec Loss 3.4422 LearningRate 0.0085 Epoch: 14 Global 
Step: 176180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:30,187-Speed 3042.75 samples/sec Loss 3.5469 LearningRate 0.0085 Epoch: 14 Global Step: 176190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:24:33,502-Speed 3090.47 samples/sec Loss 3.6166 LearningRate 0.0085 Epoch: 14 Global Step: 176200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:36,910-Speed 3005.38 samples/sec Loss 3.5497 LearningRate 0.0084 Epoch: 14 Global Step: 176210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:40,271-Speed 3047.58 samples/sec Loss 3.5527 LearningRate 0.0084 Epoch: 14 Global Step: 176220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:43,627-Speed 3052.32 samples/sec Loss 3.5942 LearningRate 0.0084 Epoch: 14 Global Step: 176230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:47,083-Speed 2963.20 samples/sec Loss 3.6274 LearningRate 0.0084 Epoch: 14 Global Step: 176240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:50,466-Speed 3029.21 samples/sec Loss 3.6332 LearningRate 0.0084 Epoch: 14 Global Step: 176250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:53,871-Speed 3007.40 samples/sec Loss 3.5383 LearningRate 0.0084 Epoch: 14 Global Step: 176260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:24:57,339-Speed 2953.34 samples/sec Loss 3.6332 LearningRate 0.0084 Epoch: 14 Global Step: 176270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:00,817-Speed 2945.78 samples/sec Loss 3.5342 LearningRate 0.0084 Epoch: 14 Global Step: 176280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:04,216-Speed 3013.24 samples/sec Loss 3.5210 LearningRate 0.0084 Epoch: 14 Global Step: 176290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:07,577-Speed 3047.13 samples/sec Loss 3.5813 LearningRate 0.0084 Epoch: 14 Global Step: 176300 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-27 18:25:11,055-Speed 2945.45 samples/sec Loss 3.5482 LearningRate 0.0084 Epoch: 14 Global Step: 176310 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:25:14,356-Speed 3102.49 samples/sec Loss 3.5225 LearningRate 0.0084 Epoch: 14 Global Step: 176320 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:25:17,846-Speed 2934.41 samples/sec Loss 3.5698 LearningRate 0.0084 Epoch: 14 Global Step: 176330 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:25:21,244-Speed 3015.01 samples/sec Loss 3.7278 LearningRate 0.0084 Epoch: 14 Global Step: 176340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:25:24,719-Speed 2947.57 samples/sec Loss 3.5624 LearningRate 0.0084 Epoch: 14 Global Step: 176350 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:25:28,117-Speed 3013.92 samples/sec Loss 3.4949 LearningRate 0.0084 Epoch: 14 Global Step: 176360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:31,567-Speed 2969.51 samples/sec Loss 3.5876 LearningRate 0.0084 Epoch: 14 Global Step: 176370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:35,012-Speed 2973.05 samples/sec Loss 3.6556 LearningRate 0.0084 Epoch: 14 Global Step: 176380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:38,515-Speed 2924.83 samples/sec Loss 3.6876 LearningRate 0.0084 Epoch: 14 Global Step: 176390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:41,897-Speed 3027.87 samples/sec Loss 3.5154 LearningRate 0.0084 Epoch: 14 Global Step: 176400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:45,313-Speed 2999.62 samples/sec Loss 3.5609 LearningRate 0.0084 Epoch: 14 Global Step: 176410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:48,768-Speed 2964.44 samples/sec Loss 3.6122 LearningRate 0.0084 Epoch: 14 Global Step: 176420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:25:52,159-Speed 3020.41 samples/sec Loss 3.6018 LearningRate 0.0084 Epoch: 14 Global Step: 176430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:55,623-Speed 2956.99 samples/sec Loss 3.7115 LearningRate 0.0084 Epoch: 14 Global Step: 176440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:25:58,979-Speed 3052.59 samples/sec Loss 3.6156 LearningRate 0.0084 Epoch: 14 Global Step: 176450 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:02,398-Speed 2996.10 samples/sec Loss 3.5820 LearningRate 0.0084 Epoch: 14 Global Step: 176460 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:05,821-Speed 2991.68 samples/sec Loss 3.6069 LearningRate 0.0084 Epoch: 14 Global Step: 176470 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:09,336-Speed 2914.82 samples/sec Loss 3.5962 LearningRate 0.0084 Epoch: 14 Global Step: 176480 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:12,805-Speed 2952.77 samples/sec Loss 3.5549 LearningRate 0.0084 Epoch: 14 Global Step: 176490 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:16,161-Speed 3051.25 samples/sec Loss 3.5555 LearningRate 0.0084 Epoch: 14 Global Step: 176500 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:19,561-Speed 3013.15 samples/sec Loss 3.6452 LearningRate 0.0084 Epoch: 14 Global Step: 176510 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:22,966-Speed 3008.27 samples/sec Loss 3.5695 LearningRate 0.0084 Epoch: 14 Global Step: 176520 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:26,379-Speed 3000.95 samples/sec Loss 3.4619 LearningRate 0.0084 Epoch: 14 Global Step: 176530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:29,766-Speed 3023.96 samples/sec Loss 3.6038 LearningRate 0.0084 Epoch: 14 Global Step: 176540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:26:33,240-Speed 2948.84 samples/sec Loss 
3.7119 LearningRate 0.0084 Epoch: 14 Global Step: 176550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:36,672-Speed 2984.19 samples/sec Loss 3.6200 LearningRate 0.0084 Epoch: 14 Global Step: 176560 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:40,048-Speed 3035.20 samples/sec Loss 3.5344 LearningRate 0.0084 Epoch: 14 Global Step: 176570 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:43,432-Speed 3026.74 samples/sec Loss 3.5345 LearningRate 0.0084 Epoch: 14 Global Step: 176580 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:46,794-Speed 3045.89 samples/sec Loss 3.5501 LearningRate 0.0084 Epoch: 14 Global Step: 176590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:50,178-Speed 3027.54 samples/sec Loss 3.5396 LearningRate 0.0084 Epoch: 14 Global Step: 176600 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:53,572-Speed 3017.75 samples/sec Loss 3.6405 LearningRate 0.0084 Epoch: 14 Global Step: 176610 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:26:57,005-Speed 2982.96 samples/sec Loss 3.5942 LearningRate 0.0084 Epoch: 14 Global Step: 176620 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:00,356-Speed 3057.68 samples/sec Loss 3.6074 LearningRate 0.0084 Epoch: 14 Global Step: 176630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:03,722-Speed 3042.67 samples/sec Loss 3.6056 LearningRate 0.0083 Epoch: 14 Global Step: 176640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:07,066-Speed 3062.99 samples/sec Loss 3.6283 LearningRate 0.0083 Epoch: 14 Global Step: 176650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:10,522-Speed 2963.67 samples/sec Loss 3.5515 LearningRate 0.0083 Epoch: 14 Global Step: 176660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:13,933-Speed 3002.68 samples/sec Loss 3.6550 LearningRate 0.0083 Epoch: 14 Global 
Step: 176670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:17,345-Speed 3002.06 samples/sec Loss 3.6340 LearningRate 0.0083 Epoch: 14 Global Step: 176680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:20,824-Speed 2944.99 samples/sec Loss 3.5566 LearningRate 0.0083 Epoch: 14 Global Step: 176690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:24,241-Speed 2997.57 samples/sec Loss 3.6893 LearningRate 0.0083 Epoch: 14 Global Step: 176700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:27,579-Speed 3067.81 samples/sec Loss 3.5666 LearningRate 0.0083 Epoch: 14 Global Step: 176710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:30,953-Speed 3036.82 samples/sec Loss 3.6034 LearningRate 0.0083 Epoch: 14 Global Step: 176720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:34,420-Speed 2953.55 samples/sec Loss 3.6403 LearningRate 0.0083 Epoch: 14 Global Step: 176730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:37,803-Speed 3028.56 samples/sec Loss 3.5965 LearningRate 0.0083 Epoch: 14 Global Step: 176740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:41,207-Speed 3009.19 samples/sec Loss 3.5697 LearningRate 0.0083 Epoch: 14 Global Step: 176750 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:27:44,554-Speed 3059.84 samples/sec Loss 3.6154 LearningRate 0.0083 Epoch: 14 Global Step: 176760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:47,942-Speed 3023.55 samples/sec Loss 3.6046 LearningRate 0.0083 Epoch: 14 Global Step: 176770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:51,341-Speed 3013.75 samples/sec Loss 3.6039 LearningRate 0.0083 Epoch: 14 Global Step: 176780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:27:54,670-Speed 3076.74 samples/sec Loss 3.6173 LearningRate 0.0083 Epoch: 14 Global Step: 176790 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:27:58,032-Speed 3046.41 samples/sec Loss 3.6130 LearningRate 0.0083 Epoch: 14 Global Step: 176800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:01,422-Speed 3021.94 samples/sec Loss 3.5364 LearningRate 0.0083 Epoch: 14 Global Step: 176810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:05,678-Speed 2406.50 samples/sec Loss 3.6073 LearningRate 0.0083 Epoch: 14 Global Step: 176820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:09,083-Speed 3008.84 samples/sec Loss 3.5687 LearningRate 0.0083 Epoch: 14 Global Step: 176830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:12,491-Speed 3004.87 samples/sec Loss 3.6128 LearningRate 0.0083 Epoch: 14 Global Step: 176840 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:15,879-Speed 3023.61 samples/sec Loss 3.6832 LearningRate 0.0083 Epoch: 14 Global Step: 176850 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:19,275-Speed 3016.48 samples/sec Loss 3.7031 LearningRate 0.0083 Epoch: 14 Global Step: 176860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:28:22,629-Speed 3053.56 samples/sec Loss 3.6902 LearningRate 0.0083 Epoch: 14 Global Step: 176870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:26,051-Speed 2992.73 samples/sec Loss 3.7016 LearningRate 0.0083 Epoch: 14 Global Step: 176880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:29,471-Speed 2995.01 samples/sec Loss 3.6060 LearningRate 0.0083 Epoch: 14 Global Step: 176890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:32,917-Speed 2972.70 samples/sec Loss 3.6416 LearningRate 0.0083 Epoch: 14 Global Step: 176900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:36,374-Speed 2962.95 samples/sec Loss 3.6707 LearningRate 0.0083 Epoch: 14 Global Step: 176910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:28:39,801-Speed 2988.78 samples/sec Loss 3.7027 LearningRate 0.0083 Epoch: 14 Global Step: 176920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:43,241-Speed 2977.54 samples/sec Loss 3.7354 LearningRate 0.0083 Epoch: 14 Global Step: 176930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:46,646-Speed 3007.65 samples/sec Loss 3.6631 LearningRate 0.0083 Epoch: 14 Global Step: 176940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:50,074-Speed 2988.37 samples/sec Loss 3.6263 LearningRate 0.0083 Epoch: 14 Global Step: 176950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:53,437-Speed 3045.64 samples/sec Loss 3.6835 LearningRate 0.0083 Epoch: 14 Global Step: 176960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:28:56,873-Speed 2981.13 samples/sec Loss 3.6887 LearningRate 0.0083 Epoch: 14 Global Step: 176970 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:29:00,274-Speed 3011.63 samples/sec Loss 3.6794 LearningRate 0.0083 Epoch: 14 Global Step: 176980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:03,695-Speed 2994.32 samples/sec Loss 3.7325 LearningRate 0.0083 Epoch: 14 Global Step: 176990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:07,090-Speed 3017.02 samples/sec Loss 3.6954 LearningRate 0.0083 Epoch: 14 Global Step: 177000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:10,555-Speed 2956.89 samples/sec Loss 3.7417 LearningRate 0.0083 Epoch: 14 Global Step: 177010 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:13,980-Speed 2990.62 samples/sec Loss 3.6530 LearningRate 0.0083 Epoch: 14 Global Step: 177020 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:17,442-Speed 2958.46 samples/sec Loss 3.6263 LearningRate 0.0083 Epoch: 14 Global Step: 177030 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:20,842-Speed 3012.30 samples/sec Loss 
3.6992 LearningRate 0.0083 Epoch: 14 Global Step: 177040 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:24,243-Speed 3012.22 samples/sec Loss 3.6828 LearningRate 0.0083 Epoch: 14 Global Step: 177050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:27,609-Speed 3042.97 samples/sec Loss 3.6821 LearningRate 0.0083 Epoch: 14 Global Step: 177060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:30,990-Speed 3029.39 samples/sec Loss 3.6721 LearningRate 0.0082 Epoch: 14 Global Step: 177070 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:34,479-Speed 2936.37 samples/sec Loss 3.6841 LearningRate 0.0082 Epoch: 14 Global Step: 177080 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:29:37,956-Speed 2946.03 samples/sec Loss 3.6752 LearningRate 0.0082 Epoch: 14 Global Step: 177090 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:29:41,368-Speed 3002.47 samples/sec Loss 3.5961 LearningRate 0.0082 Epoch: 14 Global Step: 177100 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:29:44,800-Speed 2984.36 samples/sec Loss 3.7190 LearningRate 0.0082 Epoch: 14 Global Step: 177110 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:29:48,141-Speed 3065.88 samples/sec Loss 3.6705 LearningRate 0.0082 Epoch: 14 Global Step: 177120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:51,594-Speed 2966.45 samples/sec Loss 3.6494 LearningRate 0.0082 Epoch: 14 Global Step: 177130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:55,055-Speed 2958.95 samples/sec Loss 3.6538 LearningRate 0.0082 Epoch: 14 Global Step: 177140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:29:58,406-Speed 3056.55 samples/sec Loss 3.6413 LearningRate 0.0082 Epoch: 14 Global Step: 177150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:01,778-Speed 3038.53 samples/sec Loss 3.6896 LearningRate 0.0082 Epoch: 14 Global 
Step: 177160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:05,110-Speed 3073.82 samples/sec Loss 3.6630 LearningRate 0.0082 Epoch: 14 Global Step: 177170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:08,513-Speed 3010.12 samples/sec Loss 3.7917 LearningRate 0.0082 Epoch: 14 Global Step: 177180 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:11,979-Speed 2955.36 samples/sec Loss 3.6743 LearningRate 0.0082 Epoch: 14 Global Step: 177190 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:15,358-Speed 3031.68 samples/sec Loss 3.7047 LearningRate 0.0082 Epoch: 14 Global Step: 177200 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:18,726-Speed 3041.13 samples/sec Loss 3.6392 LearningRate 0.0082 Epoch: 14 Global Step: 177210 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:22,110-Speed 3026.63 samples/sec Loss 3.6874 LearningRate 0.0082 Epoch: 14 Global Step: 177220 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:25,410-Speed 3103.30 samples/sec Loss 3.6823 LearningRate 0.0082 Epoch: 14 Global Step: 177230 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:28,809-Speed 3014.08 samples/sec Loss 3.5995 LearningRate 0.0082 Epoch: 14 Global Step: 177240 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:32,165-Speed 3051.66 samples/sec Loss 3.6163 LearningRate 0.0082 Epoch: 14 Global Step: 177250 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:35,522-Speed 3051.41 samples/sec Loss 3.6916 LearningRate 0.0082 Epoch: 14 Global Step: 177260 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:38,841-Speed 3086.26 samples/sec Loss 3.5916 LearningRate 0.0082 Epoch: 14 Global Step: 177270 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:30:42,225-Speed 3027.01 samples/sec Loss 3.6325 LearningRate 0.0082 Epoch: 14 Global Step: 177280 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:30:45,557-Speed 3073.36 samples/sec Loss 3.6275 LearningRate 0.0082 Epoch: 14 Global Step: 177290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:48,927-Speed 3040.16 samples/sec Loss 3.5975 LearningRate 0.0082 Epoch: 14 Global Step: 177300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:52,357-Speed 2986.16 samples/sec Loss 3.6983 LearningRate 0.0082 Epoch: 14 Global Step: 177310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:55,728-Speed 3038.56 samples/sec Loss 3.7547 LearningRate 0.0082 Epoch: 14 Global Step: 177320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:30:59,097-Speed 3040.33 samples/sec Loss 3.6282 LearningRate 0.0082 Epoch: 14 Global Step: 177330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:02,454-Speed 3052.21 samples/sec Loss 3.6922 LearningRate 0.0082 Epoch: 14 Global Step: 177340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:05,796-Speed 3064.34 samples/sec Loss 3.7283 LearningRate 0.0082 Epoch: 14 Global Step: 177350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:09,133-Speed 3070.11 samples/sec Loss 3.6130 LearningRate 0.0082 Epoch: 14 Global Step: 177360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:12,480-Speed 3060.06 samples/sec Loss 3.7715 LearningRate 0.0082 Epoch: 14 Global Step: 177370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:15,908-Speed 2987.76 samples/sec Loss 3.7692 LearningRate 0.0082 Epoch: 14 Global Step: 177380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:31:19,265-Speed 3051.76 samples/sec Loss 3.6984 LearningRate 0.0082 Epoch: 14 Global Step: 177390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:31:22,735-Speed 2951.37 samples/sec Loss 3.6467 LearningRate 0.0082 Epoch: 14 Global Step: 177400 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
18:31:26,115-Speed 3030.95 samples/sec Loss 3.6601 LearningRate 0.0082 Epoch: 14 Global Step: 177410 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:31:29,567-Speed 2967.04 samples/sec Loss 3.6298 LearningRate 0.0082 Epoch: 14 Global Step: 177420 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:31:32,977-Speed 3003.24 samples/sec Loss 3.6680 LearningRate 0.0082 Epoch: 14 Global Step: 177430 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:31:36,292-Speed 3090.60 samples/sec Loss 3.6917 LearningRate 0.0082 Epoch: 14 Global Step: 177440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:39,701-Speed 3004.70 samples/sec Loss 3.7219 LearningRate 0.0082 Epoch: 14 Global Step: 177450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:43,067-Speed 3042.73 samples/sec Loss 3.6419 LearningRate 0.0082 Epoch: 14 Global Step: 177460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:46,415-Speed 3059.15 samples/sec Loss 3.7133 LearningRate 0.0082 Epoch: 14 Global Step: 177470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:49,734-Speed 3087.04 samples/sec Loss 3.7156 LearningRate 0.0082 Epoch: 14 Global Step: 177480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:53,115-Speed 3029.42 samples/sec Loss 3.7707 LearningRate 0.0082 Epoch: 14 Global Step: 177490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:56,490-Speed 3034.43 samples/sec Loss 3.7001 LearningRate 0.0082 Epoch: 14 Global Step: 177500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:31:59,942-Speed 2967.74 samples/sec Loss 3.6441 LearningRate 0.0081 Epoch: 14 Global Step: 177510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:32:03,368-Speed 2989.64 samples/sec Loss 3.6669 LearningRate 0.0081 Epoch: 14 Global Step: 177520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:32:06,803-Speed 2981.71 samples/sec Loss 
3.7166 LearningRate 0.0081 Epoch: 14 Global Step: 177530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:32:10,235-Speed 2984.99 samples/sec Loss 3.7398 LearningRate 0.0081 Epoch: 14 Global Step: 177540 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:13,583-Speed 3059.36 samples/sec Loss 3.7553 LearningRate 0.0081 Epoch: 14 Global Step: 177550 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:17,076-Speed 2932.21 samples/sec Loss 3.6915 LearningRate 0.0081 Epoch: 14 Global Step: 177560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:20,502-Speed 2989.64 samples/sec Loss 3.7256 LearningRate 0.0081 Epoch: 14 Global Step: 177570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:23,926-Speed 2991.84 samples/sec Loss 3.7016 LearningRate 0.0081 Epoch: 14 Global Step: 177580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:27,374-Speed 2970.27 samples/sec Loss 3.6634 LearningRate 0.0081 Epoch: 14 Global Step: 177590 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:30,690-Speed 3089.44 samples/sec Loss 3.6605 LearningRate 0.0081 Epoch: 14 Global Step: 177600 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:34,121-Speed 2985.25 samples/sec Loss 3.7065 LearningRate 0.0081 Epoch: 14 Global Step: 177610 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:37,520-Speed 3013.45 samples/sec Loss 3.6765 LearningRate 0.0081 Epoch: 14 Global Step: 177620 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:40,895-Speed 3034.96 samples/sec Loss 3.6291 LearningRate 0.0081 Epoch: 14 Global Step: 177630 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:44,219-Speed 3081.25 samples/sec Loss 3.6802 LearningRate 0.0081 Epoch: 14 Global Step: 177640 Fp16 Grad Scale: 131072 Required: 7 hours Training: 2022-04-27 18:32:47,656-Speed 2980.67 samples/sec Loss 3.7660 LearningRate 0.0081 Epoch: 14 
Global Step: 177650 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:51,040-Speed 3026.43 samples/sec Loss 3.6830 LearningRate 0.0081 Epoch: 14 Global Step: 177660 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:54,514-Speed 2948.30 samples/sec Loss 3.7157 LearningRate 0.0081 Epoch: 14 Global Step: 177670 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:32:57,930-Speed 2998.25 samples/sec Loss 3.6448 LearningRate 0.0081 Epoch: 14 Global Step: 177680 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:01,341-Speed 3003.62 samples/sec Loss 3.6763 LearningRate 0.0081 Epoch: 14 Global Step: 177690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:04,703-Speed 3045.79 samples/sec Loss 3.6251 LearningRate 0.0081 Epoch: 14 Global Step: 177700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:08,118-Speed 2999.11 samples/sec Loss 3.7194 LearningRate 0.0081 Epoch: 14 Global Step: 177710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:11,498-Speed 3030.96 samples/sec Loss 3.6425 LearningRate 0.0081 Epoch: 14 Global Step: 177720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:14,893-Speed 3016.60 samples/sec Loss 3.7405 LearningRate 0.0081 Epoch: 14 Global Step: 177730 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:18,263-Speed 3040.17 samples/sec Loss 3.6910 LearningRate 0.0081 Epoch: 14 Global Step: 177740 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:21,675-Speed 3002.03 samples/sec Loss 3.7298 LearningRate 0.0081 Epoch: 14 Global Step: 177750 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:25,118-Speed 2974.78 samples/sec Loss 3.6828 LearningRate 0.0081 Epoch: 14 Global Step: 177760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:28,453-Speed 3072.09 samples/sec Loss 3.7245 LearningRate 0.0081 Epoch: 14 Global Step: 177770 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:33:31,833-Speed 3029.71 samples/sec Loss 3.6248 LearningRate 0.0081 Epoch: 14 Global Step: 177780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:33:35,247-Speed 3000.56 samples/sec Loss 3.6898 LearningRate 0.0081 Epoch: 14 Global Step: 177790 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:38,698-Speed 2968.62 samples/sec Loss 3.7166 LearningRate 0.0081 Epoch: 14 Global Step: 177800 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:42,099-Speed 3011.23 samples/sec Loss 3.6853 LearningRate 0.0081 Epoch: 14 Global Step: 177810 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:45,418-Speed 3085.72 samples/sec Loss 3.7173 LearningRate 0.0081 Epoch: 14 Global Step: 177820 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:48,857-Speed 2979.16 samples/sec Loss 3.7442 LearningRate 0.0081 Epoch: 14 Global Step: 177830 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:52,187-Speed 3075.30 samples/sec Loss 3.6897 LearningRate 0.0081 Epoch: 14 Global Step: 177840 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:55,581-Speed 3017.45 samples/sec Loss 3.7018 LearningRate 0.0081 Epoch: 14 Global Step: 177850 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:33:59,021-Speed 2977.84 samples/sec Loss 3.7670 LearningRate 0.0081 Epoch: 14 Global Step: 177860 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:34:02,378-Speed 3051.53 samples/sec Loss 3.7458 LearningRate 0.0081 Epoch: 14 Global Step: 177870 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:05,786-Speed 3005.54 samples/sec Loss 3.7706 LearningRate 0.0081 Epoch: 14 Global Step: 177880 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:09,125-Speed 3067.02 samples/sec Loss 3.8023 LearningRate 0.0081 Epoch: 14 Global Step: 177890 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:34:12,486-Speed 3047.46 samples/sec Loss 3.7609 LearningRate 0.0081 Epoch: 14 Global Step: 177900 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:15,825-Speed 3067.82 samples/sec Loss 3.7343 LearningRate 0.0081 Epoch: 14 Global Step: 177910 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:19,180-Speed 3053.45 samples/sec Loss 3.6785 LearningRate 0.0081 Epoch: 14 Global Step: 177920 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:22,522-Speed 3064.68 samples/sec Loss 3.7775 LearningRate 0.0081 Epoch: 14 Global Step: 177930 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:25,863-Speed 3065.85 samples/sec Loss 3.7761 LearningRate 0.0080 Epoch: 14 Global Step: 177940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:34:29,300-Speed 2980.10 samples/sec Loss 3.8061 LearningRate 0.0080 Epoch: 14 Global Step: 177950 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:32,614-Speed 3091.56 samples/sec Loss 3.7081 LearningRate 0.0080 Epoch: 14 Global Step: 177960 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:35,977-Speed 3045.73 samples/sec Loss 3.7311 LearningRate 0.0080 Epoch: 14 Global Step: 177970 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:39,375-Speed 3013.78 samples/sec Loss 3.7214 LearningRate 0.0080 Epoch: 14 Global Step: 177980 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:42,769-Speed 3017.63 samples/sec Loss 3.6840 LearningRate 0.0080 Epoch: 14 Global Step: 177990 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:46,216-Speed 2971.97 samples/sec Loss 3.6014 LearningRate 0.0080 Epoch: 14 Global Step: 178000 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:49,728-Speed 2916.60 samples/sec Loss 3.7494 LearningRate 0.0080 Epoch: 14 Global Step: 178010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:53,078-Speed 3056.97 samples/sec Loss 
3.7437 LearningRate 0.0080 Epoch: 14 Global Step: 178020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:56,484-Speed 3006.78 samples/sec Loss 3.8258 LearningRate 0.0080 Epoch: 14 Global Step: 178030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:34:59,900-Speed 2999.21 samples/sec Loss 3.7208 LearningRate 0.0080 Epoch: 14 Global Step: 178040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:03,334-Speed 2982.62 samples/sec Loss 3.7114 LearningRate 0.0080 Epoch: 14 Global Step: 178050 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:35:06,755-Speed 2994.00 samples/sec Loss 3.7670 LearningRate 0.0080 Epoch: 14 Global Step: 178060 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:35:10,174-Speed 2995.67 samples/sec Loss 3.7690 LearningRate 0.0080 Epoch: 14 Global Step: 178070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:13,567-Speed 3019.16 samples/sec Loss 3.6843 LearningRate 0.0080 Epoch: 14 Global Step: 178080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:17,057-Speed 2934.70 samples/sec Loss 3.7213 LearningRate 0.0080 Epoch: 14 Global Step: 178090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:20,470-Speed 3001.25 samples/sec Loss 3.6534 LearningRate 0.0080 Epoch: 14 Global Step: 178100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:23,836-Speed 3043.44 samples/sec Loss 3.7578 LearningRate 0.0080 Epoch: 14 Global Step: 178110 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:27,207-Speed 3038.01 samples/sec Loss 3.7171 LearningRate 0.0080 Epoch: 14 Global Step: 178120 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:30,560-Speed 3054.81 samples/sec Loss 3.6576 LearningRate 0.0080 Epoch: 14 Global Step: 178130 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:33,978-Speed 2996.69 samples/sec Loss 3.7475 LearningRate 0.0080 Epoch: 14 Global 
Step: 178140 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:37,386-Speed 3006.08 samples/sec Loss 3.7406 LearningRate 0.0080 Epoch: 14 Global Step: 178150 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:40,811-Speed 2989.95 samples/sec Loss 3.7847 LearningRate 0.0080 Epoch: 14 Global Step: 178160 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:35:44,225-Speed 3000.37 samples/sec Loss 3.7123 LearningRate 0.0080 Epoch: 14 Global Step: 178170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:35:47,594-Speed 3040.50 samples/sec Loss 3.7696 LearningRate 0.0080 Epoch: 14 Global Step: 178180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:35:50,945-Speed 3057.60 samples/sec Loss 3.7157 LearningRate 0.0080 Epoch: 14 Global Step: 178190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:35:54,268-Speed 3081.65 samples/sec Loss 3.7368 LearningRate 0.0080 Epoch: 14 Global Step: 178200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:35:57,724-Speed 2964.36 samples/sec Loss 3.7054 LearningRate 0.0080 Epoch: 14 Global Step: 178210 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:01,027-Speed 3100.63 samples/sec Loss 3.6492 LearningRate 0.0080 Epoch: 14 Global Step: 178220 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:04,401-Speed 3035.68 samples/sec Loss 3.8512 LearningRate 0.0080 Epoch: 14 Global Step: 178230 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:07,813-Speed 3002.26 samples/sec Loss 3.7780 LearningRate 0.0080 Epoch: 14 Global Step: 178240 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:11,160-Speed 3059.90 samples/sec Loss 3.8280 LearningRate 0.0080 Epoch: 14 Global Step: 178250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:14,503-Speed 3064.29 samples/sec Loss 3.7041 LearningRate 0.0080 Epoch: 14 Global Step: 178260 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:36:17,876-Speed 3037.22 samples/sec Loss 3.7753 LearningRate 0.0080 Epoch: 14 Global Step: 178270 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:36:21,268-Speed 3019.77 samples/sec Loss 3.7123 LearningRate 0.0080 Epoch: 14 Global Step: 178280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:24,734-Speed 2954.65 samples/sec Loss 3.7579 LearningRate 0.0080 Epoch: 14 Global Step: 178290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:28,181-Speed 2971.96 samples/sec Loss 3.6747 LearningRate 0.0080 Epoch: 14 Global Step: 178300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:31,551-Speed 3039.32 samples/sec Loss 3.7567 LearningRate 0.0080 Epoch: 14 Global Step: 178310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:34,971-Speed 2994.66 samples/sec Loss 3.7644 LearningRate 0.0080 Epoch: 14 Global Step: 178320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:38,326-Speed 3052.54 samples/sec Loss 3.7312 LearningRate 0.0080 Epoch: 14 Global Step: 178330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:41,746-Speed 2995.52 samples/sec Loss 3.7185 LearningRate 0.0080 Epoch: 14 Global Step: 178340 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:45,138-Speed 3019.70 samples/sec Loss 3.7420 LearningRate 0.0080 Epoch: 14 Global Step: 178350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:48,535-Speed 3015.45 samples/sec Loss 3.7708 LearningRate 0.0080 Epoch: 14 Global Step: 178360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:51,898-Speed 3044.98 samples/sec Loss 3.7562 LearningRate 0.0080 Epoch: 14 Global Step: 178370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:36:55,207-Speed 3096.61 samples/sec Loss 3.7712 LearningRate 0.0079 Epoch: 14 Global Step: 178380 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 
18:36:58,521-Speed 3090.81 samples/sec Loss 3.7681 LearningRate 0.0079 Epoch: 14 Global Step: 178390 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:37:01,846-Speed 3080.25 samples/sec Loss 3.7896 LearningRate 0.0079 Epoch: 14 Global Step: 178400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:05,277-Speed 2985.44 samples/sec Loss 3.6920 LearningRate 0.0079 Epoch: 14 Global Step: 178410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:08,620-Speed 3064.17 samples/sec Loss 3.8231 LearningRate 0.0079 Epoch: 14 Global Step: 178420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:12,007-Speed 3024.73 samples/sec Loss 3.7543 LearningRate 0.0079 Epoch: 14 Global Step: 178430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:15,433-Speed 2989.33 samples/sec Loss 3.8285 LearningRate 0.0079 Epoch: 14 Global Step: 178440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:18,761-Speed 3077.20 samples/sec Loss 3.7782 LearningRate 0.0079 Epoch: 14 Global Step: 178450 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:22,226-Speed 2956.89 samples/sec Loss 3.7709 LearningRate 0.0079 Epoch: 14 Global Step: 178460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:25,592-Speed 3042.85 samples/sec Loss 3.7554 LearningRate 0.0079 Epoch: 14 Global Step: 178470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:28,986-Speed 3018.02 samples/sec Loss 3.8565 LearningRate 0.0079 Epoch: 14 Global Step: 178480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:32,373-Speed 3023.66 samples/sec Loss 3.7635 LearningRate 0.0079 Epoch: 14 Global Step: 178490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:35,765-Speed 3019.37 samples/sec Loss 3.7580 LearningRate 0.0079 Epoch: 14 Global Step: 178500 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:37:39,159-Speed 3018.12 samples/sec Loss 
3.7520 LearningRate 0.0079 Epoch: 14 Global Step: 178510 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:37:42,519-Speed 3049.15 samples/sec Loss 3.7361 LearningRate 0.0079 Epoch: 14 Global Step: 178520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:37:45,874-Speed 3053.14 samples/sec Loss 3.8037 LearningRate 0.0079 Epoch: 14 Global Step: 178530 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:37:49,204-Speed 3075.78 samples/sec Loss 3.8248 LearningRate 0.0079 Epoch: 14 Global Step: 178540 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:37:52,591-Speed 3023.67 samples/sec Loss 3.7857 LearningRate 0.0079 Epoch: 14 Global Step: 178550 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:37:55,940-Speed 3058.39 samples/sec Loss 3.7271 LearningRate 0.0079 Epoch: 14 Global Step: 178560 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:37:59,269-Speed 3077.77 samples/sec Loss 3.7649 LearningRate 0.0079 Epoch: 14 Global Step: 178570 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:38:02,641-Speed 3038.00 samples/sec Loss 3.7137 LearningRate 0.0079 Epoch: 14 Global Step: 178580 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:38:06,009-Speed 3041.22 samples/sec Loss 3.7584 LearningRate 0.0079 Epoch: 14 Global Step: 178590 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:38:09,396-Speed 3023.64 samples/sec Loss 3.7646 LearningRate 0.0079 Epoch: 14 Global Step: 178600 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:38:12,703-Speed 3097.33 samples/sec Loss 3.7349 LearningRate 0.0079 Epoch: 14 Global Step: 178610 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:38:16,026-Speed 3082.67 samples/sec Loss 3.7840 LearningRate 0.0079 Epoch: 14 Global Step: 178620 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:38:19,503-Speed 2945.20 samples/sec Loss 3.7730 LearningRate 0.0079 Epoch: 14 Global 
Step: 178630 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:22,849-Speed 3061.14 samples/sec Loss 3.8323 LearningRate 0.0079 Epoch: 14 Global Step: 178640 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:26,227-Speed 3032.92 samples/sec Loss 3.7272 LearningRate 0.0079 Epoch: 14 Global Step: 178650 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:29,565-Speed 3068.10 samples/sec Loss 3.7733 LearningRate 0.0079 Epoch: 14 Global Step: 178660 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:32,947-Speed 3028.35 samples/sec Loss 3.6748 LearningRate 0.0079 Epoch: 14 Global Step: 178670 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:36,333-Speed 3024.85 samples/sec Loss 3.7495 LearningRate 0.0079 Epoch: 14 Global Step: 178680 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:39,672-Speed 3068.05 samples/sec Loss 3.8217 LearningRate 0.0079 Epoch: 14 Global Step: 178690 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:43,061-Speed 3021.97 samples/sec Loss 3.8015 LearningRate 0.0079 Epoch: 14 Global Step: 178700 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:46,417-Speed 3052.49 samples/sec Loss 3.7739 LearningRate 0.0079 Epoch: 14 Global Step: 178710 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:49,812-Speed 3016.21 samples/sec Loss 3.7033 LearningRate 0.0079 Epoch: 14 Global Step: 178720 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:38:53,144-Speed 3074.12 samples/sec Loss 3.8389 LearningRate 0.0079 Epoch: 14 Global Step: 178730 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:38:56,509-Speed 3044.40 samples/sec Loss 3.7141 LearningRate 0.0079 Epoch: 14 Global Step: 178740 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:38:59,842-Speed 3072.39 samples/sec Loss 3.7742 LearningRate 0.0079 Epoch: 14 Global Step: 178750 Fp16 Grad Scale: 65536 
Required: 7 hours Training: 2022-04-27 18:39:03,157-Speed 3089.93 samples/sec Loss 3.8081 LearningRate 0.0079 Epoch: 14 Global Step: 178760 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:06,479-Speed 3083.40 samples/sec Loss 3.8728 LearningRate 0.0079 Epoch: 14 Global Step: 178770 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:09,943-Speed 2956.78 samples/sec Loss 3.7501 LearningRate 0.0079 Epoch: 14 Global Step: 178780 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:13,314-Speed 3038.71 samples/sec Loss 3.7410 LearningRate 0.0079 Epoch: 14 Global Step: 178790 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:16,617-Speed 3101.55 samples/sec Loss 3.8342 LearningRate 0.0079 Epoch: 14 Global Step: 178800 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:20,051-Speed 2982.08 samples/sec Loss 3.6917 LearningRate 0.0079 Epoch: 14 Global Step: 178810 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:23,505-Speed 2965.70 samples/sec Loss 3.7788 LearningRate 0.0078 Epoch: 14 Global Step: 178820 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:26,863-Speed 3050.00 samples/sec Loss 3.7778 LearningRate 0.0078 Epoch: 14 Global Step: 178830 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:39:30,240-Speed 3033.21 samples/sec Loss 3.6494 LearningRate 0.0078 Epoch: 14 Global Step: 178840 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:33,618-Speed 3032.40 samples/sec Loss 3.7966 LearningRate 0.0078 Epoch: 14 Global Step: 178850 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:36,974-Speed 3052.33 samples/sec Loss 3.8853 LearningRate 0.0078 Epoch: 14 Global Step: 178860 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:40,405-Speed 2984.53 samples/sec Loss 3.7332 LearningRate 0.0078 Epoch: 14 Global Step: 178870 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 
18:39:43,793-Speed 3023.04 samples/sec Loss 3.8546 LearningRate 0.0078 Epoch: 14 Global Step: 178880 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:47,205-Speed 3002.40 samples/sec Loss 3.7548 LearningRate 0.0078 Epoch: 14 Global Step: 178890 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:50,565-Speed 3048.30 samples/sec Loss 3.7768 LearningRate 0.0078 Epoch: 14 Global Step: 178900 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:53,971-Speed 3007.20 samples/sec Loss 3.7747 LearningRate 0.0078 Epoch: 14 Global Step: 178910 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:39:57,384-Speed 3001.15 samples/sec Loss 3.7895 LearningRate 0.0078 Epoch: 14 Global Step: 178920 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:00,724-Speed 3067.05 samples/sec Loss 3.8571 LearningRate 0.0078 Epoch: 14 Global Step: 178930 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:04,070-Speed 3061.29 samples/sec Loss 3.7576 LearningRate 0.0078 Epoch: 14 Global Step: 178940 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:07,403-Speed 3072.49 samples/sec Loss 3.8110 LearningRate 0.0078 Epoch: 14 Global Step: 178950 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:10,833-Speed 2986.23 samples/sec Loss 3.7378 LearningRate 0.0078 Epoch: 14 Global Step: 178960 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:14,165-Speed 3074.72 samples/sec Loss 3.7952 LearningRate 0.0078 Epoch: 14 Global Step: 178970 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:17,513-Speed 3059.18 samples/sec Loss 3.7570 LearningRate 0.0078 Epoch: 14 Global Step: 178980 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:20,904-Speed 3020.14 samples/sec Loss 3.7153 LearningRate 0.0078 Epoch: 14 Global Step: 178990 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:24,237-Speed 3073.55 samples/sec Loss 
3.8232 LearningRate 0.0078 Epoch: 14 Global Step: 179000 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:40:27,617-Speed 3030.41 samples/sec Loss 3.8494 LearningRate 0.0078 Epoch: 14 Global Step: 179010 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:31,009-Speed 3020.17 samples/sec Loss 3.8173 LearningRate 0.0078 Epoch: 14 Global Step: 179020 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:34,529-Speed 2910.23 samples/sec Loss 3.7534 LearningRate 0.0078 Epoch: 14 Global Step: 179030 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:37,871-Speed 3064.20 samples/sec Loss 3.6976 LearningRate 0.0078 Epoch: 14 Global Step: 179040 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:41,213-Speed 3064.71 samples/sec Loss 3.6885 LearningRate 0.0078 Epoch: 14 Global Step: 179050 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:44,547-Speed 3072.39 samples/sec Loss 3.8557 LearningRate 0.0078 Epoch: 14 Global Step: 179060 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:47,923-Speed 3034.30 samples/sec Loss 3.8402 LearningRate 0.0078 Epoch: 14 Global Step: 179070 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:51,296-Speed 3036.80 samples/sec Loss 3.7026 LearningRate 0.0078 Epoch: 14 Global Step: 179080 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:54,721-Speed 2990.18 samples/sec Loss 3.7300 LearningRate 0.0078 Epoch: 14 Global Step: 179090 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:40:58,091-Speed 3039.23 samples/sec Loss 3.7835 LearningRate 0.0078 Epoch: 14 Global Step: 179100 Fp16 Grad Scale: 16384 Required: 7 hours Training: 2022-04-27 18:41:01,455-Speed 3045.15 samples/sec Loss 3.7854 LearningRate 0.0078 Epoch: 14 Global Step: 179110 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:04,776-Speed 3084.12 samples/sec Loss 3.7713 LearningRate 0.0078 Epoch: 14 Global 
Step: 179120 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:08,130-Speed 3054.17 samples/sec Loss 3.7810 LearningRate 0.0078 Epoch: 14 Global Step: 179130 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:11,448-Speed 3086.82 samples/sec Loss 3.7837 LearningRate 0.0078 Epoch: 14 Global Step: 179140 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:14,793-Speed 3061.51 samples/sec Loss 3.8474 LearningRate 0.0078 Epoch: 14 Global Step: 179150 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:18,171-Speed 3032.34 samples/sec Loss 3.7342 LearningRate 0.0078 Epoch: 14 Global Step: 179160 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:21,547-Speed 3034.17 samples/sec Loss 3.8545 LearningRate 0.0078 Epoch: 14 Global Step: 179170 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:24,972-Speed 2990.07 samples/sec Loss 3.7700 LearningRate 0.0078 Epoch: 14 Global Step: 179180 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:28,414-Speed 2976.09 samples/sec Loss 3.8307 LearningRate 0.0078 Epoch: 14 Global Step: 179190 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:31,764-Speed 3057.38 samples/sec Loss 3.7315 LearningRate 0.0078 Epoch: 14 Global Step: 179200 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:35,117-Speed 3055.20 samples/sec Loss 3.7265 LearningRate 0.0078 Epoch: 14 Global Step: 179210 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:41:38,453-Speed 3070.63 samples/sec Loss 3.8436 LearningRate 0.0078 Epoch: 14 Global Step: 179220 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:41:41,824-Speed 3037.76 samples/sec Loss 3.8577 LearningRate 0.0078 Epoch: 14 Global Step: 179230 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:41:45,155-Speed 3074.84 samples/sec Loss 3.8389 LearningRate 0.0078 Epoch: 14 Global Step: 179240 Fp16 Grad Scale: 32768 
Required: 7 hours Training: 2022-04-27 18:41:48,467-Speed 3092.81 samples/sec Loss 3.8662 LearningRate 0.0078 Epoch: 14 Global Step: 179250 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:51,790-Speed 3082.74 samples/sec Loss 3.7185 LearningRate 0.0078 Epoch: 14 Global Step: 179260 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:55,158-Speed 3040.48 samples/sec Loss 3.7281 LearningRate 0.0077 Epoch: 14 Global Step: 179270 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:41:58,591-Speed 2984.26 samples/sec Loss 3.7683 LearningRate 0.0077 Epoch: 14 Global Step: 179280 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:01,977-Speed 3024.74 samples/sec Loss 3.7953 LearningRate 0.0077 Epoch: 14 Global Step: 179290 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:05,340-Speed 3045.76 samples/sec Loss 3.7809 LearningRate 0.0077 Epoch: 14 Global Step: 179300 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:08,663-Speed 3082.62 samples/sec Loss 3.7395 LearningRate 0.0077 Epoch: 14 Global Step: 179310 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:12,014-Speed 3055.88 samples/sec Loss 3.7032 LearningRate 0.0077 Epoch: 14 Global Step: 179320 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:15,412-Speed 3014.50 samples/sec Loss 3.8021 LearningRate 0.0077 Epoch: 14 Global Step: 179330 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:18,798-Speed 3025.12 samples/sec Loss 3.7645 LearningRate 0.0077 Epoch: 14 Global Step: 179340 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:42:22,125-Speed 3078.40 samples/sec Loss 3.8489 LearningRate 0.0077 Epoch: 14 Global Step: 179350 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:25,467-Speed 3064.87 samples/sec Loss 3.8013 LearningRate 0.0077 Epoch: 14 Global Step: 179360 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 
18:42:28,804-Speed 3069.70 samples/sec Loss 3.7719 LearningRate 0.0077 Epoch: 14 Global Step: 179370 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:32,177-Speed 3036.71 samples/sec Loss 3.7452 LearningRate 0.0077 Epoch: 14 Global Step: 179380 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:35,503-Speed 3079.87 samples/sec Loss 3.7513 LearningRate 0.0077 Epoch: 14 Global Step: 179390 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:38,867-Speed 3045.08 samples/sec Loss 3.8895 LearningRate 0.0077 Epoch: 14 Global Step: 179400 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:42,193-Speed 3079.17 samples/sec Loss 3.8034 LearningRate 0.0077 Epoch: 14 Global Step: 179410 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:45,645-Speed 2967.09 samples/sec Loss 3.6867 LearningRate 0.0077 Epoch: 14 Global Step: 179420 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:49,083-Speed 2979.51 samples/sec Loss 3.9076 LearningRate 0.0077 Epoch: 14 Global Step: 179430 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:52,482-Speed 3013.50 samples/sec Loss 3.8369 LearningRate 0.0077 Epoch: 14 Global Step: 179440 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:42:55,855-Speed 3036.66 samples/sec Loss 3.6731 LearningRate 0.0077 Epoch: 14 Global Step: 179450 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:42:59,227-Speed 3037.39 samples/sec Loss 3.7029 LearningRate 0.0077 Epoch: 14 Global Step: 179460 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:02,570-Speed 3064.88 samples/sec Loss 3.8273 LearningRate 0.0077 Epoch: 14 Global Step: 179470 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:05,970-Speed 3013.63 samples/sec Loss 3.7687 LearningRate 0.0077 Epoch: 14 Global Step: 179480 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:09,382-Speed 3002.17 samples/sec Loss 
3.8914 LearningRate 0.0077 Epoch: 14 Global Step: 179490 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:12,707-Speed 3079.80 samples/sec Loss 3.7307 LearningRate 0.0077 Epoch: 14 Global Step: 179500 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:16,113-Speed 3007.46 samples/sec Loss 3.8048 LearningRate 0.0077 Epoch: 14 Global Step: 179510 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:19,567-Speed 2966.32 samples/sec Loss 3.7205 LearningRate 0.0077 Epoch: 14 Global Step: 179520 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:22,972-Speed 3007.94 samples/sec Loss 3.8604 LearningRate 0.0077 Epoch: 14 Global Step: 179530 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:26,352-Speed 3030.56 samples/sec Loss 3.8053 LearningRate 0.0077 Epoch: 14 Global Step: 179540 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:29,720-Speed 3041.29 samples/sec Loss 3.7920 LearningRate 0.0077 Epoch: 14 Global Step: 179550 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:33,123-Speed 3009.41 samples/sec Loss 3.7262 LearningRate 0.0077 Epoch: 14 Global Step: 179560 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:43:36,440-Speed 3088.51 samples/sec Loss 3.8861 LearningRate 0.0077 Epoch: 14 Global Step: 179570 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:43:39,833-Speed 3019.07 samples/sec Loss 3.7416 LearningRate 0.0077 Epoch: 14 Global Step: 179580 Fp16 Grad Scale: 65536 Required: 7 hours Training: 2022-04-27 18:43:43,209-Speed 3034.11 samples/sec Loss 3.7620 LearningRate 0.0077 Epoch: 14 Global Step: 179590 Fp16 Grad Scale: 32768 Required: 7 hours Training: 2022-04-27 18:43:46,610-Speed 3011.02 samples/sec Loss 3.7690 LearningRate 0.0077 Epoch: 14 Global Step: 179600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:43:50,005-Speed 3017.38 samples/sec Loss 3.7666 LearningRate 0.0077 Epoch: 14 Global 
Step: 179610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:43:53,386-Speed 3029.08 samples/sec Loss 3.7774 LearningRate 0.0077 Epoch: 14 Global Step: 179620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:43:56,763-Speed 3033.10 samples/sec Loss 3.8668 LearningRate 0.0077 Epoch: 14 Global Step: 179630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:00,114-Speed 3056.69 samples/sec Loss 3.8580 LearningRate 0.0077 Epoch: 14 Global Step: 179640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:03,534-Speed 2996.11 samples/sec Loss 3.7332 LearningRate 0.0077 Epoch: 14 Global Step: 179650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:06,955-Speed 2993.94 samples/sec Loss 3.8313 LearningRate 0.0077 Epoch: 14 Global Step: 179660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:10,452-Speed 2928.84 samples/sec Loss 3.8014 LearningRate 0.0077 Epoch: 14 Global Step: 179670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:13,850-Speed 3014.78 samples/sec Loss 3.7981 LearningRate 0.0077 Epoch: 14 Global Step: 179680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:17,256-Speed 3007.30 samples/sec Loss 3.8044 LearningRate 0.0077 Epoch: 14 Global Step: 179690 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:44:20,653-Speed 3015.54 samples/sec Loss 3.8146 LearningRate 0.0077 Epoch: 14 Global Step: 179700 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:44:24,042-Speed 3022.30 samples/sec Loss 3.7681 LearningRate 0.0077 Epoch: 14 Global Step: 179710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:27,478-Speed 2980.50 samples/sec Loss 3.8260 LearningRate 0.0076 Epoch: 14 Global Step: 179720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:30,898-Speed 2995.40 samples/sec Loss 3.7947 LearningRate 0.0076 Epoch: 14 Global Step: 179730 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 18:44:34,293-Speed 3017.30 samples/sec Loss 3.7394 LearningRate 0.0076 Epoch: 14 Global Step: 179740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:37,686-Speed 3018.63 samples/sec Loss 3.8153 LearningRate 0.0076 Epoch: 14 Global Step: 179750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:41,123-Speed 2979.60 samples/sec Loss 3.8087 LearningRate 0.0076 Epoch: 14 Global Step: 179760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:44,564-Speed 2976.90 samples/sec Loss 3.7771 LearningRate 0.0076 Epoch: 14 Global Step: 179770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:47,928-Speed 3045.12 samples/sec Loss 3.8985 LearningRate 0.0076 Epoch: 14 Global Step: 179780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:51,244-Speed 3088.28 samples/sec Loss 3.7888 LearningRate 0.0076 Epoch: 14 Global Step: 179790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:54,619-Speed 3034.81 samples/sec Loss 3.7879 LearningRate 0.0076 Epoch: 14 Global Step: 179800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:44:58,032-Speed 3001.89 samples/sec Loss 3.8246 LearningRate 0.0076 Epoch: 14 Global Step: 179810 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:01,384-Speed 3055.66 samples/sec Loss 3.8297 LearningRate 0.0076 Epoch: 14 Global Step: 179820 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:04,801-Speed 2997.29 samples/sec Loss 3.8225 LearningRate 0.0076 Epoch: 14 Global Step: 179830 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:08,178-Speed 3033.06 samples/sec Loss 3.7580 LearningRate 0.0076 Epoch: 14 Global Step: 179840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:11,575-Speed 3015.46 samples/sec Loss 3.7891 LearningRate 0.0076 Epoch: 14 Global Step: 179850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
18:45:14,968-Speed 3018.99 samples/sec Loss 3.8007 LearningRate 0.0076 Epoch: 14 Global Step: 179860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:18,330-Speed 3046.31 samples/sec Loss 3.7846 LearningRate 0.0076 Epoch: 14 Global Step: 179870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:21,647-Speed 3088.48 samples/sec Loss 3.7598 LearningRate 0.0076 Epoch: 14 Global Step: 179880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:25,067-Speed 2994.35 samples/sec Loss 3.8482 LearningRate 0.0076 Epoch: 14 Global Step: 179890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:28,529-Speed 2959.30 samples/sec Loss 3.8504 LearningRate 0.0076 Epoch: 14 Global Step: 179900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:32,005-Speed 2946.71 samples/sec Loss 3.8575 LearningRate 0.0076 Epoch: 14 Global Step: 179910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:35,337-Speed 3073.83 samples/sec Loss 3.7530 LearningRate 0.0076 Epoch: 14 Global Step: 179920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:38,706-Speed 3040.87 samples/sec Loss 3.7977 LearningRate 0.0076 Epoch: 14 Global Step: 179930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:45:42,041-Speed 3070.55 samples/sec Loss 3.8136 LearningRate 0.0076 Epoch: 14 Global Step: 179940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:45,392-Speed 3056.86 samples/sec Loss 3.8067 LearningRate 0.0076 Epoch: 14 Global Step: 179950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:48,808-Speed 2998.27 samples/sec Loss 3.9815 LearningRate 0.0076 Epoch: 14 Global Step: 179960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:52,249-Speed 2977.24 samples/sec Loss 3.8097 LearningRate 0.0076 Epoch: 14 Global Step: 179970 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:55,601-Speed 3055.67 samples/sec Loss 
3.7939 LearningRate 0.0076 Epoch: 14 Global Step: 179980 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:45:58,965-Speed 3044.82 samples/sec Loss 3.9180 LearningRate 0.0076 Epoch: 14 Global Step: 179990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:02,349-Speed 3027.14 samples/sec Loss 3.8647 LearningRate 0.0076 Epoch: 14 Global Step: 180000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:05,739-Speed 3021.05 samples/sec Loss 3.7974 LearningRate 0.0076 Epoch: 14 Global Step: 180010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:09,096-Speed 3051.49 samples/sec Loss 3.8220 LearningRate 0.0076 Epoch: 14 Global Step: 180020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:12,535-Speed 2978.23 samples/sec Loss 3.7914 LearningRate 0.0076 Epoch: 14 Global Step: 180030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:16,029-Speed 2931.49 samples/sec Loss 3.9199 LearningRate 0.0076 Epoch: 14 Global Step: 180040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:19,412-Speed 3027.90 samples/sec Loss 3.8481 LearningRate 0.0076 Epoch: 14 Global Step: 180050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:22,726-Speed 3091.04 samples/sec Loss 3.8450 LearningRate 0.0076 Epoch: 14 Global Step: 180060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:26,134-Speed 3005.29 samples/sec Loss 3.6889 LearningRate 0.0076 Epoch: 14 Global Step: 180070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:29,477-Speed 3063.64 samples/sec Loss 3.7927 LearningRate 0.0076 Epoch: 14 Global Step: 180080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:32,833-Speed 3052.78 samples/sec Loss 3.7322 LearningRate 0.0076 Epoch: 14 Global Step: 180090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:36,210-Speed 3032.80 samples/sec Loss 3.8396 LearningRate 0.0076 Epoch: 14 Global 
Step: 180100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:39,606-Speed 3016.44 samples/sec Loss 3.8257 LearningRate 0.0076 Epoch: 14 Global Step: 180110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:42,979-Speed 3036.57 samples/sec Loss 3.9163 LearningRate 0.0076 Epoch: 14 Global Step: 180120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:46,435-Speed 2963.65 samples/sec Loss 3.7648 LearningRate 0.0076 Epoch: 14 Global Step: 180130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:49,790-Speed 3053.00 samples/sec Loss 3.7577 LearningRate 0.0076 Epoch: 14 Global Step: 180140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:53,182-Speed 3020.23 samples/sec Loss 3.7322 LearningRate 0.0076 Epoch: 14 Global Step: 180150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:56,560-Speed 3032.02 samples/sec Loss 3.8762 LearningRate 0.0076 Epoch: 14 Global Step: 180160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:46:59,952-Speed 3020.20 samples/sec Loss 3.9180 LearningRate 0.0075 Epoch: 14 Global Step: 180170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:03,375-Speed 2992.51 samples/sec Loss 3.8097 LearningRate 0.0075 Epoch: 14 Global Step: 180180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:06,789-Speed 2999.94 samples/sec Loss 3.8855 LearningRate 0.0075 Epoch: 14 Global Step: 180190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:47:10,253-Speed 2956.75 samples/sec Loss 3.7909 LearningRate 0.0075 Epoch: 14 Global Step: 180200 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:47:13,609-Speed 3052.25 samples/sec Loss 3.9014 LearningRate 0.0075 Epoch: 14 Global Step: 180210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:16,966-Speed 3051.32 samples/sec Loss 3.8079 LearningRate 0.0075 Epoch: 14 Global Step: 180220 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 18:47:20,347-Speed 3029.22 samples/sec Loss 3.9415 LearningRate 0.0075 Epoch: 14 Global Step: 180230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:23,827-Speed 2943.23 samples/sec Loss 3.9097 LearningRate 0.0075 Epoch: 14 Global Step: 180240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:27,244-Speed 2997.39 samples/sec Loss 3.8222 LearningRate 0.0075 Epoch: 14 Global Step: 180250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:30,591-Speed 3060.71 samples/sec Loss 3.8298 LearningRate 0.0075 Epoch: 14 Global Step: 180260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:33,970-Speed 3031.13 samples/sec Loss 3.8217 LearningRate 0.0075 Epoch: 14 Global Step: 180270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:37,301-Speed 3075.40 samples/sec Loss 3.7461 LearningRate 0.0075 Epoch: 14 Global Step: 180280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:40,727-Speed 2989.94 samples/sec Loss 3.8721 LearningRate 0.0075 Epoch: 14 Global Step: 180290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:44,133-Speed 3007.33 samples/sec Loss 3.8295 LearningRate 0.0075 Epoch: 14 Global Step: 180300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:47:47,552-Speed 2995.10 samples/sec Loss 3.9276 LearningRate 0.0075 Epoch: 14 Global Step: 180310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:47:50,976-Speed 2992.13 samples/sec Loss 3.7979 LearningRate 0.0075 Epoch: 14 Global Step: 180320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:47:54,393-Speed 2997.84 samples/sec Loss 3.8777 LearningRate 0.0075 Epoch: 14 Global Step: 180330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:47:57,810-Speed 2997.72 samples/sec Loss 4.0004 LearningRate 0.0075 Epoch: 14 Global Step: 180340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
18:48:01,199-Speed 3021.92 samples/sec Loss 3.9074 LearningRate 0.0075 Epoch: 14 Global Step: 180350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:04,550-Speed 3056.71 samples/sec Loss 3.8353 LearningRate 0.0075 Epoch: 14 Global Step: 180360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:07,889-Speed 3067.49 samples/sec Loss 3.9097 LearningRate 0.0075 Epoch: 14 Global Step: 180370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:11,243-Speed 3053.97 samples/sec Loss 3.7003 LearningRate 0.0075 Epoch: 14 Global Step: 180380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:14,601-Speed 3050.44 samples/sec Loss 3.8941 LearningRate 0.0075 Epoch: 14 Global Step: 180390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:18,065-Speed 2956.80 samples/sec Loss 3.8731 LearningRate 0.0075 Epoch: 14 Global Step: 180400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:21,531-Speed 2954.91 samples/sec Loss 3.8482 LearningRate 0.0075 Epoch: 14 Global Step: 180410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:48:24,971-Speed 2978.61 samples/sec Loss 3.8147 LearningRate 0.0075 Epoch: 14 Global Step: 180420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:28,395-Speed 2990.69 samples/sec Loss 3.8437 LearningRate 0.0075 Epoch: 14 Global Step: 180430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:31,822-Speed 2989.09 samples/sec Loss 3.9193 LearningRate 0.0075 Epoch: 14 Global Step: 180440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:35,178-Speed 3051.72 samples/sec Loss 3.8340 LearningRate 0.0075 Epoch: 14 Global Step: 180450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:38,646-Speed 2953.70 samples/sec Loss 3.7156 LearningRate 0.0075 Epoch: 14 Global Step: 180460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:42,064-Speed 2997.12 samples/sec Loss 
3.8208 LearningRate 0.0075 Epoch: 14 Global Step: 180470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:45,482-Speed 2996.50 samples/sec Loss 3.7919 LearningRate 0.0075 Epoch: 14 Global Step: 180480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:48,925-Speed 2974.94 samples/sec Loss 3.8862 LearningRate 0.0075 Epoch: 14 Global Step: 180490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:52,310-Speed 3026.01 samples/sec Loss 3.8443 LearningRate 0.0075 Epoch: 14 Global Step: 180500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:55,736-Speed 2990.05 samples/sec Loss 3.7845 LearningRate 0.0075 Epoch: 14 Global Step: 180510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:48:59,157-Speed 2993.42 samples/sec Loss 3.8800 LearningRate 0.0075 Epoch: 14 Global Step: 180520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:49:02,562-Speed 3008.61 samples/sec Loss 3.7901 LearningRate 0.0075 Epoch: 14 Global Step: 180530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:05,928-Speed 3043.38 samples/sec Loss 3.8066 LearningRate 0.0075 Epoch: 14 Global Step: 180540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:09,318-Speed 3020.98 samples/sec Loss 3.8260 LearningRate 0.0075 Epoch: 14 Global Step: 180550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:12,629-Speed 3093.77 samples/sec Loss 3.9058 LearningRate 0.0075 Epoch: 14 Global Step: 180560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:15,997-Speed 3040.94 samples/sec Loss 3.7413 LearningRate 0.0075 Epoch: 14 Global Step: 180570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:19,346-Speed 3058.29 samples/sec Loss 3.8026 LearningRate 0.0075 Epoch: 14 Global Step: 180580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:22,783-Speed 2980.89 samples/sec Loss 3.8827 LearningRate 0.0075 Epoch: 14 Global 
Step: 180590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:26,214-Speed 2984.46 samples/sec Loss 3.7166 LearningRate 0.0075 Epoch: 14 Global Step: 180600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:29,637-Speed 2992.50 samples/sec Loss 3.9005 LearningRate 0.0075 Epoch: 14 Global Step: 180610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:33,036-Speed 3014.09 samples/sec Loss 3.8259 LearningRate 0.0074 Epoch: 14 Global Step: 180620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:36,457-Speed 2993.22 samples/sec Loss 3.8829 LearningRate 0.0074 Epoch: 14 Global Step: 180630 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:49:39,975-Speed 2912.34 samples/sec Loss 3.8735 LearningRate 0.0074 Epoch: 14 Global Step: 180640 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:49:43,446-Speed 2950.97 samples/sec Loss 3.7401 LearningRate 0.0074 Epoch: 14 Global Step: 180650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:46,833-Speed 3024.39 samples/sec Loss 3.9718 LearningRate 0.0074 Epoch: 14 Global Step: 180660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:50,171-Speed 3068.06 samples/sec Loss 3.7744 LearningRate 0.0074 Epoch: 14 Global Step: 180670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:53,645-Speed 2948.76 samples/sec Loss 3.8868 LearningRate 0.0074 Epoch: 14 Global Step: 180680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:49:57,050-Speed 3007.76 samples/sec Loss 3.9325 LearningRate 0.0074 Epoch: 14 Global Step: 180690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:00,475-Speed 2990.61 samples/sec Loss 3.8027 LearningRate 0.0074 Epoch: 14 Global Step: 180700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:03,867-Speed 3019.78 samples/sec Loss 3.8524 LearningRate 0.0074 Epoch: 14 Global Step: 180710 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 18:50:07,283-Speed 2998.59 samples/sec Loss 3.7930 LearningRate 0.0074 Epoch: 14 Global Step: 180720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:10,667-Speed 3026.70 samples/sec Loss 3.8481 LearningRate 0.0074 Epoch: 14 Global Step: 180730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:13,973-Speed 3098.12 samples/sec Loss 3.8484 LearningRate 0.0074 Epoch: 14 Global Step: 180740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:17,332-Speed 3048.96 samples/sec Loss 3.8422 LearningRate 0.0074 Epoch: 14 Global Step: 180750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:20,719-Speed 3024.33 samples/sec Loss 3.7613 LearningRate 0.0074 Epoch: 14 Global Step: 180760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:24,146-Speed 2988.83 samples/sec Loss 3.8791 LearningRate 0.0074 Epoch: 14 Global Step: 180770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:27,579-Speed 2983.68 samples/sec Loss 3.8632 LearningRate 0.0074 Epoch: 14 Global Step: 180780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:50:30,930-Speed 3056.90 samples/sec Loss 3.8453 LearningRate 0.0074 Epoch: 14 Global Step: 180790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:34,364-Speed 2982.64 samples/sec Loss 3.8740 LearningRate 0.0074 Epoch: 14 Global Step: 180800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:37,714-Speed 3057.10 samples/sec Loss 3.9030 LearningRate 0.0074 Epoch: 14 Global Step: 180810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:41,104-Speed 3021.97 samples/sec Loss 3.8576 LearningRate 0.0074 Epoch: 14 Global Step: 180820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:44,504-Speed 3012.84 samples/sec Loss 3.9316 LearningRate 0.0074 Epoch: 14 Global Step: 180830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
18:50:47,854-Speed 3057.23 samples/sec Loss 3.9195 LearningRate 0.0074 Epoch: 14 Global Step: 180840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:51,268-Speed 3000.10 samples/sec Loss 3.8249 LearningRate 0.0074 Epoch: 14 Global Step: 180850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:54,639-Speed 3038.77 samples/sec Loss 3.8997 LearningRate 0.0074 Epoch: 14 Global Step: 180860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:50:57,984-Speed 3062.34 samples/sec Loss 3.9393 LearningRate 0.0074 Epoch: 14 Global Step: 180870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:01,366-Speed 3028.01 samples/sec Loss 3.7622 LearningRate 0.0074 Epoch: 14 Global Step: 180880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:04,745-Speed 3031.42 samples/sec Loss 3.7346 LearningRate 0.0074 Epoch: 14 Global Step: 180890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:08,090-Speed 3062.41 samples/sec Loss 3.7973 LearningRate 0.0074 Epoch: 14 Global Step: 180900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:11,452-Speed 3046.80 samples/sec Loss 3.7955 LearningRate 0.0074 Epoch: 14 Global Step: 180910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:14,895-Speed 2974.78 samples/sec Loss 3.8909 LearningRate 0.0074 Epoch: 14 Global Step: 180920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:18,293-Speed 3014.76 samples/sec Loss 3.7925 LearningRate 0.0074 Epoch: 14 Global Step: 180930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:21,706-Speed 3000.62 samples/sec Loss 3.8124 LearningRate 0.0074 Epoch: 14 Global Step: 180940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:51:25,080-Speed 3036.12 samples/sec Loss 3.7337 LearningRate 0.0074 Epoch: 14 Global Step: 180950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:28,475-Speed 3017.42 samples/sec Loss 
3.8435 LearningRate 0.0074 Epoch: 14 Global Step: 180960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:31,820-Speed 3062.21 samples/sec Loss 3.8093 LearningRate 0.0074 Epoch: 14 Global Step: 180970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:35,161-Speed 3066.29 samples/sec Loss 3.8510 LearningRate 0.0074 Epoch: 14 Global Step: 180980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:38,519-Speed 3050.56 samples/sec Loss 3.7718 LearningRate 0.0074 Epoch: 14 Global Step: 180990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:41,847-Speed 3077.54 samples/sec Loss 3.8000 LearningRate 0.0074 Epoch: 14 Global Step: 181000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:45,172-Speed 3080.30 samples/sec Loss 3.7667 LearningRate 0.0074 Epoch: 14 Global Step: 181010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:48,499-Speed 3079.23 samples/sec Loss 3.7846 LearningRate 0.0074 Epoch: 14 Global Step: 181020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:51,850-Speed 3056.91 samples/sec Loss 3.8365 LearningRate 0.0074 Epoch: 14 Global Step: 181030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:55,219-Speed 3040.24 samples/sec Loss 3.8547 LearningRate 0.0074 Epoch: 14 Global Step: 181040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:51:58,600-Speed 3029.47 samples/sec Loss 3.8391 LearningRate 0.0074 Epoch: 14 Global Step: 181050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:01,982-Speed 3029.82 samples/sec Loss 3.8983 LearningRate 0.0074 Epoch: 14 Global Step: 181060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:05,412-Speed 2986.46 samples/sec Loss 3.8819 LearningRate 0.0074 Epoch: 14 Global Step: 181070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:08,753-Speed 3066.71 samples/sec Loss 3.7908 LearningRate 0.0073 Epoch: 14 Global 
Step: 181080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:12,122-Speed 3040.06 samples/sec Loss 3.8503 LearningRate 0.0073 Epoch: 14 Global Step: 181090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:15,492-Speed 3039.89 samples/sec Loss 3.9363 LearningRate 0.0073 Epoch: 14 Global Step: 181100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:18,861-Speed 3040.34 samples/sec Loss 3.8478 LearningRate 0.0073 Epoch: 14 Global Step: 181110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:22,253-Speed 3019.30 samples/sec Loss 3.8104 LearningRate 0.0073 Epoch: 14 Global Step: 181120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:25,585-Speed 3074.25 samples/sec Loss 3.9628 LearningRate 0.0073 Epoch: 14 Global Step: 181130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:29,002-Speed 2997.54 samples/sec Loss 3.8416 LearningRate 0.0073 Epoch: 14 Global Step: 181140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:52:32,403-Speed 3011.49 samples/sec Loss 3.7971 LearningRate 0.0073 Epoch: 14 Global Step: 181150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:35,791-Speed 3023.80 samples/sec Loss 3.8850 LearningRate 0.0073 Epoch: 14 Global Step: 181160 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:39,189-Speed 3014.71 samples/sec Loss 3.8146 LearningRate 0.0073 Epoch: 14 Global Step: 181170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:42,548-Speed 3049.28 samples/sec Loss 3.8573 LearningRate 0.0073 Epoch: 14 Global Step: 181180 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:45,911-Speed 3045.63 samples/sec Loss 3.8933 LearningRate 0.0073 Epoch: 14 Global Step: 181190 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:49,283-Speed 3037.59 samples/sec Loss 3.8663 LearningRate 0.0073 Epoch: 14 Global Step: 181200 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-27 18:52:52,716-Speed 2983.40 samples/sec Loss 3.7668 LearningRate 0.0073 Epoch: 14 Global Step: 181210 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:56,046-Speed 3076.23 samples/sec Loss 3.8547 LearningRate 0.0073 Epoch: 14 Global Step: 181220 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:52:59,407-Speed 3047.81 samples/sec Loss 3.9038 LearningRate 0.0073 Epoch: 14 Global Step: 181230 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:02,823-Speed 2998.35 samples/sec Loss 3.8135 LearningRate 0.0073 Epoch: 14 Global Step: 181240 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:06,186-Speed 3044.96 samples/sec Loss 3.8259 LearningRate 0.0073 Epoch: 14 Global Step: 181250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:09,581-Speed 3017.46 samples/sec Loss 3.8982 LearningRate 0.0073 Epoch: 14 Global Step: 181260 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:12,929-Speed 3059.32 samples/sec Loss 3.8686 LearningRate 0.0073 Epoch: 14 Global Step: 181270 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:16,334-Speed 3008.44 samples/sec Loss 3.8917 LearningRate 0.0073 Epoch: 14 Global Step: 181280 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:19,701-Speed 3042.74 samples/sec Loss 4.0029 LearningRate 0.0073 Epoch: 14 Global Step: 181290 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:23,046-Speed 3062.01 samples/sec Loss 3.7996 LearningRate 0.0073 Epoch: 14 Global Step: 181300 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:53:26,364-Speed 3087.08 samples/sec Loss 3.7948 LearningRate 0.0073 Epoch: 14 Global Step: 181310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:29,741-Speed 3033.21 samples/sec Loss 3.7668 LearningRate 0.0073 Epoch: 14 Global Step: 181320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
18:53:33,139-Speed 3014.88 samples/sec Loss 3.9027 LearningRate 0.0073 Epoch: 14 Global Step: 181330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:36,633-Speed 2930.94 samples/sec Loss 3.7825 LearningRate 0.0073 Epoch: 14 Global Step: 181340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:40,029-Speed 3016.44 samples/sec Loss 3.8796 LearningRate 0.0073 Epoch: 14 Global Step: 181350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:43,418-Speed 3022.52 samples/sec Loss 3.8772 LearningRate 0.0073 Epoch: 14 Global Step: 181360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:46,785-Speed 3041.71 samples/sec Loss 3.8925 LearningRate 0.0073 Epoch: 14 Global Step: 181370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:50,156-Speed 3039.28 samples/sec Loss 3.7835 LearningRate 0.0073 Epoch: 14 Global Step: 181380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:53,542-Speed 3024.57 samples/sec Loss 3.7764 LearningRate 0.0073 Epoch: 14 Global Step: 181390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:53:56,893-Speed 3056.99 samples/sec Loss 3.8377 LearningRate 0.0073 Epoch: 14 Global Step: 181400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:00,301-Speed 3006.05 samples/sec Loss 3.8537 LearningRate 0.0073 Epoch: 14 Global Step: 181410 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:54:03,663-Speed 3046.97 samples/sec Loss 3.8749 LearningRate 0.0073 Epoch: 14 Global Step: 181420 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:54:07,089-Speed 2989.31 samples/sec Loss 3.8655 LearningRate 0.0073 Epoch: 14 Global Step: 181430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:10,600-Speed 2917.28 samples/sec Loss 3.9109 LearningRate 0.0073 Epoch: 14 Global Step: 181440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:13,996-Speed 3016.09 samples/sec Loss 
3.8710 LearningRate 0.0073 Epoch: 14 Global Step: 181450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:17,354-Speed 3050.34 samples/sec Loss 3.9072 LearningRate 0.0073 Epoch: 14 Global Step: 181460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:20,701-Speed 3061.19 samples/sec Loss 3.8955 LearningRate 0.0073 Epoch: 14 Global Step: 181470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:24,132-Speed 2984.80 samples/sec Loss 3.8356 LearningRate 0.0073 Epoch: 14 Global Step: 181480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:27,502-Speed 3039.32 samples/sec Loss 3.9398 LearningRate 0.0073 Epoch: 14 Global Step: 181490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:30,897-Speed 3016.95 samples/sec Loss 3.8035 LearningRate 0.0073 Epoch: 14 Global Step: 181500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:34,279-Speed 3028.77 samples/sec Loss 3.7515 LearningRate 0.0073 Epoch: 14 Global Step: 181510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:37,655-Speed 3034.07 samples/sec Loss 3.9065 LearningRate 0.0073 Epoch: 14 Global Step: 181520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:41,024-Speed 3040.47 samples/sec Loss 3.8540 LearningRate 0.0073 Epoch: 14 Global Step: 181530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:54:44,449-Speed 2990.61 samples/sec Loss 3.8312 LearningRate 0.0072 Epoch: 14 Global Step: 181540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:54:47,839-Speed 3021.91 samples/sec Loss 3.8491 LearningRate 0.0072 Epoch: 14 Global Step: 181550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:51,150-Speed 3093.28 samples/sec Loss 3.7690 LearningRate 0.0072 Epoch: 14 Global Step: 181560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:54,522-Speed 3037.62 samples/sec Loss 3.8495 LearningRate 0.0072 Epoch: 14 Global 
Step: 181570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:54:57,853-Speed 3074.92 samples/sec Loss 3.7092 LearningRate 0.0072 Epoch: 14 Global Step: 181580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:01,164-Speed 3093.87 samples/sec Loss 3.8976 LearningRate 0.0072 Epoch: 14 Global Step: 181590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:04,531-Speed 3041.77 samples/sec Loss 3.8134 LearningRate 0.0072 Epoch: 14 Global Step: 181600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:07,905-Speed 3036.31 samples/sec Loss 3.8490 LearningRate 0.0072 Epoch: 14 Global Step: 181610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:11,375-Speed 2952.00 samples/sec Loss 3.8598 LearningRate 0.0072 Epoch: 14 Global Step: 181620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:14,740-Speed 3043.92 samples/sec Loss 3.8377 LearningRate 0.0072 Epoch: 14 Global Step: 181630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:18,235-Speed 2930.70 samples/sec Loss 3.7708 LearningRate 0.0072 Epoch: 14 Global Step: 181640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:21,600-Speed 3043.91 samples/sec Loss 3.8299 LearningRate 0.0072 Epoch: 14 Global Step: 181650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:24,990-Speed 3021.67 samples/sec Loss 3.8221 LearningRate 0.0072 Epoch: 14 Global Step: 181660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:28,408-Speed 2996.61 samples/sec Loss 3.8885 LearningRate 0.0072 Epoch: 14 Global Step: 181670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:55:31,816-Speed 3005.56 samples/sec Loss 3.7826 LearningRate 0.0072 Epoch: 14 Global Step: 181680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:35,172-Speed 3051.49 samples/sec Loss 3.8625 LearningRate 0.0072 Epoch: 14 Global Step: 181690 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 18:55:38,570-Speed 3014.34 samples/sec Loss 3.8241 LearningRate 0.0072 Epoch: 14 Global Step: 181700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:41,948-Speed 3032.55 samples/sec Loss 3.7330 LearningRate 0.0072 Epoch: 14 Global Step: 181710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:45,442-Speed 2931.93 samples/sec Loss 3.8200 LearningRate 0.0072 Epoch: 14 Global Step: 181720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:48,834-Speed 3019.31 samples/sec Loss 3.8552 LearningRate 0.0072 Epoch: 14 Global Step: 181730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:52,324-Speed 2935.56 samples/sec Loss 3.8376 LearningRate 0.0072 Epoch: 14 Global Step: 181740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:55,711-Speed 3023.50 samples/sec Loss 3.8380 LearningRate 0.0072 Epoch: 14 Global Step: 181750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:55:59,073-Speed 3046.46 samples/sec Loss 3.9047 LearningRate 0.0072 Epoch: 14 Global Step: 181760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:02,451-Speed 3032.69 samples/sec Loss 3.8483 LearningRate 0.0072 Epoch: 14 Global Step: 181770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:05,856-Speed 3008.33 samples/sec Loss 3.8237 LearningRate 0.0072 Epoch: 14 Global Step: 181780 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:56:09,305-Speed 2969.60 samples/sec Loss 3.8512 LearningRate 0.0072 Epoch: 14 Global Step: 181790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:12,781-Speed 2947.40 samples/sec Loss 3.8756 LearningRate 0.0072 Epoch: 14 Global Step: 181800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:16,102-Speed 3083.89 samples/sec Loss 3.7846 LearningRate 0.0072 Epoch: 14 Global Step: 181810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
18:56:19,503-Speed 3011.69 samples/sec Loss 3.8692 LearningRate 0.0072 Epoch: 14 Global Step: 181820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:22,850-Speed 3060.36 samples/sec Loss 3.9153 LearningRate 0.0072 Epoch: 14 Global Step: 181830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:26,274-Speed 2992.02 samples/sec Loss 3.8776 LearningRate 0.0072 Epoch: 14 Global Step: 181840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:29,615-Speed 3065.15 samples/sec Loss 3.8512 LearningRate 0.0072 Epoch: 14 Global Step: 181850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:32,956-Speed 3066.11 samples/sec Loss 3.8787 LearningRate 0.0072 Epoch: 14 Global Step: 181860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:36,423-Speed 2954.47 samples/sec Loss 3.8824 LearningRate 0.0072 Epoch: 14 Global Step: 181870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:39,793-Speed 3039.18 samples/sec Loss 3.8790 LearningRate 0.0072 Epoch: 14 Global Step: 181880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:43,124-Speed 3075.58 samples/sec Loss 3.8753 LearningRate 0.0072 Epoch: 14 Global Step: 181890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:46,547-Speed 2992.15 samples/sec Loss 3.8943 LearningRate 0.0072 Epoch: 14 Global Step: 181900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:56:49,892-Speed 3063.05 samples/sec Loss 3.8688 LearningRate 0.0072 Epoch: 14 Global Step: 181910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:53,263-Speed 3038.42 samples/sec Loss 3.7868 LearningRate 0.0072 Epoch: 14 Global Step: 181920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:56,568-Speed 3098.62 samples/sec Loss 3.9332 LearningRate 0.0072 Epoch: 14 Global Step: 181930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:56:59,871-Speed 3100.64 samples/sec Loss 
3.9109 LearningRate 0.0072 Epoch: 14 Global Step: 181940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:03,287-Speed 2999.33 samples/sec Loss 3.8482 LearningRate 0.0072 Epoch: 14 Global Step: 181950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:06,786-Speed 2927.14 samples/sec Loss 3.9380 LearningRate 0.0072 Epoch: 14 Global Step: 181960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:10,172-Speed 3025.76 samples/sec Loss 3.8541 LearningRate 0.0072 Epoch: 14 Global Step: 181970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:13,566-Speed 3017.77 samples/sec Loss 3.7033 LearningRate 0.0072 Epoch: 14 Global Step: 181980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:16,910-Speed 3062.62 samples/sec Loss 3.9132 LearningRate 0.0072 Epoch: 14 Global Step: 181990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:20,263-Speed 3054.95 samples/sec Loss 3.8173 LearningRate 0.0071 Epoch: 14 Global Step: 182000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:23,601-Speed 3068.78 samples/sec Loss 3.9388 LearningRate 0.0071 Epoch: 14 Global Step: 182010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:27,063-Speed 2958.38 samples/sec Loss 3.8462 LearningRate 0.0071 Epoch: 14 Global Step: 182020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:30,455-Speed 3019.28 samples/sec Loss 3.9260 LearningRate 0.0071 Epoch: 14 Global Step: 182030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:33,781-Speed 3080.09 samples/sec Loss 3.8798 LearningRate 0.0071 Epoch: 14 Global Step: 182040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:37,147-Speed 3042.65 samples/sec Loss 3.8145 LearningRate 0.0071 Epoch: 14 Global Step: 182050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:40,457-Speed 3094.89 samples/sec Loss 3.8500 LearningRate 0.0071 Epoch: 14 Global 
Step: 182060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:43,822-Speed 3043.36 samples/sec Loss 3.9110 LearningRate 0.0071 Epoch: 14 Global Step: 182070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:47,129-Speed 3097.77 samples/sec Loss 3.8276 LearningRate 0.0071 Epoch: 14 Global Step: 182080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:50,471-Speed 3064.71 samples/sec Loss 3.9180 LearningRate 0.0071 Epoch: 14 Global Step: 182090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:53,874-Speed 3010.23 samples/sec Loss 3.8305 LearningRate 0.0071 Epoch: 14 Global Step: 182100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:57:57,290-Speed 2997.65 samples/sec Loss 3.7933 LearningRate 0.0071 Epoch: 14 Global Step: 182110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:58:00,639-Speed 3058.93 samples/sec Loss 3.8173 LearningRate 0.0071 Epoch: 14 Global Step: 182120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:58:04,024-Speed 3026.22 samples/sec Loss 3.7803 LearningRate 0.0071 Epoch: 14 Global Step: 182130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:58:07,438-Speed 3000.41 samples/sec Loss 3.8584 LearningRate 0.0071 Epoch: 14 Global Step: 182140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:58:10,757-Speed 3086.56 samples/sec Loss 3.9090 LearningRate 0.0071 Epoch: 14 Global Step: 182150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:58:14,142-Speed 3025.35 samples/sec Loss 3.7602 LearningRate 0.0071 Epoch: 14 Global Step: 182160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:58:17,502-Speed 3048.55 samples/sec Loss 3.7363 LearningRate 0.0071 Epoch: 14 Global Step: 182170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:20,833-Speed 3075.47 samples/sec Loss 3.8142 LearningRate 0.0071 Epoch: 14 Global Step: 182180 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 18:58:24,176-Speed 3063.39 samples/sec Loss 3.8441 LearningRate 0.0071 Epoch: 14 Global Step: 182190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:27,579-Speed 3009.90 samples/sec Loss 3.8369 LearningRate 0.0071 Epoch: 14 Global Step: 182200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:31,002-Speed 2993.22 samples/sec Loss 3.7760 LearningRate 0.0071 Epoch: 14 Global Step: 182210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:34,343-Speed 3065.80 samples/sec Loss 3.8365 LearningRate 0.0071 Epoch: 14 Global Step: 182220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:37,775-Speed 2984.36 samples/sec Loss 3.9815 LearningRate 0.0071 Epoch: 14 Global Step: 182230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:41,165-Speed 3020.85 samples/sec Loss 3.8433 LearningRate 0.0071 Epoch: 14 Global Step: 182240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:44,533-Speed 3041.77 samples/sec Loss 3.8656 LearningRate 0.0071 Epoch: 14 Global Step: 182250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:47,865-Speed 3074.29 samples/sec Loss 3.8595 LearningRate 0.0071 Epoch: 14 Global Step: 182260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 18:58:51,252-Speed 3024.24 samples/sec Loss 3.8383 LearningRate 0.0071 Epoch: 14 Global Step: 182270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:58:54,569-Speed 3087.63 samples/sec Loss 3.8888 LearningRate 0.0071 Epoch: 14 Global Step: 182280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:58:57,961-Speed 3020.18 samples/sec Loss 3.7793 LearningRate 0.0071 Epoch: 14 Global Step: 182290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:01,327-Speed 3042.99 samples/sec Loss 3.8422 LearningRate 0.0071 Epoch: 14 Global Step: 182300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
18:59:04,737-Speed 3004.09 samples/sec Loss 3.8173 LearningRate 0.0071 Epoch: 14 Global Step: 182310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:08,210-Speed 2949.72 samples/sec Loss 3.8049 LearningRate 0.0071 Epoch: 14 Global Step: 182320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:11,625-Speed 2998.96 samples/sec Loss 3.8287 LearningRate 0.0071 Epoch: 14 Global Step: 182330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:15,004-Speed 3031.24 samples/sec Loss 3.8303 LearningRate 0.0071 Epoch: 14 Global Step: 182340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:18,514-Speed 2918.01 samples/sec Loss 3.8610 LearningRate 0.0071 Epoch: 14 Global Step: 182350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:21,824-Speed 3095.06 samples/sec Loss 3.7808 LearningRate 0.0071 Epoch: 14 Global Step: 182360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:25,154-Speed 3076.04 samples/sec Loss 3.7877 LearningRate 0.0071 Epoch: 14 Global Step: 182370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:59:28,501-Speed 3060.11 samples/sec Loss 3.8961 LearningRate 0.0071 Epoch: 14 Global Step: 182380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:59:31,887-Speed 3024.53 samples/sec Loss 3.8875 LearningRate 0.0071 Epoch: 14 Global Step: 182390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 18:59:35,288-Speed 3012.54 samples/sec Loss 3.9040 LearningRate 0.0071 Epoch: 14 Global Step: 182400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:38,661-Speed 3036.61 samples/sec Loss 3.8549 LearningRate 0.0071 Epoch: 14 Global Step: 182410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:42,043-Speed 3028.66 samples/sec Loss 3.8397 LearningRate 0.0071 Epoch: 14 Global Step: 182420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:45,382-Speed 3067.81 samples/sec Loss 
3.8754 LearningRate 0.0071 Epoch: 14 Global Step: 182430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:48,757-Speed 3035.36 samples/sec Loss 3.9450 LearningRate 0.0071 Epoch: 14 Global Step: 182440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:52,102-Speed 3062.45 samples/sec Loss 3.8182 LearningRate 0.0071 Epoch: 14 Global Step: 182450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:55,544-Speed 2975.95 samples/sec Loss 3.7240 LearningRate 0.0070 Epoch: 14 Global Step: 182460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 18:59:58,915-Speed 3038.65 samples/sec Loss 3.8711 LearningRate 0.0070 Epoch: 14 Global Step: 182470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:02,288-Speed 3036.20 samples/sec Loss 3.8781 LearningRate 0.0070 Epoch: 14 Global Step: 182480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:05,679-Speed 3020.85 samples/sec Loss 3.8933 LearningRate 0.0070 Epoch: 14 Global Step: 182490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:09,113-Speed 2983.01 samples/sec Loss 3.8662 LearningRate 0.0070 Epoch: 14 Global Step: 182500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:12,642-Speed 2902.46 samples/sec Loss 3.8406 LearningRate 0.0070 Epoch: 14 Global Step: 182510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:16,015-Speed 3036.77 samples/sec Loss 3.9324 LearningRate 0.0070 Epoch: 14 Global Step: 182520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:19,444-Speed 2987.62 samples/sec Loss 3.8213 LearningRate 0.0070 Epoch: 14 Global Step: 182530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:22,890-Speed 2972.34 samples/sec Loss 3.9242 LearningRate 0.0070 Epoch: 14 Global Step: 182540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:26,366-Speed 2947.04 samples/sec Loss 3.9492 LearningRate 0.0070 Epoch: 14 Global 
Step: 182550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:29,817-Speed 2967.81 samples/sec Loss 3.8721 LearningRate 0.0070 Epoch: 14 Global Step: 182560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:33,191-Speed 3035.89 samples/sec Loss 3.8647 LearningRate 0.0070 Epoch: 14 Global Step: 182570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:36,605-Speed 3000.52 samples/sec Loss 3.9270 LearningRate 0.0070 Epoch: 14 Global Step: 182580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:39,956-Speed 3057.37 samples/sec Loss 3.8446 LearningRate 0.0070 Epoch: 14 Global Step: 182590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:43,284-Speed 3077.38 samples/sec Loss 3.8844 LearningRate 0.0070 Epoch: 14 Global Step: 182600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:46,677-Speed 3019.37 samples/sec Loss 3.9411 LearningRate 0.0070 Epoch: 14 Global Step: 182610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:50,055-Speed 3032.02 samples/sec Loss 3.8094 LearningRate 0.0070 Epoch: 14 Global Step: 182620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:00:53,423-Speed 3041.53 samples/sec Loss 3.8499 LearningRate 0.0070 Epoch: 14 Global Step: 182630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:00:56,770-Speed 3060.91 samples/sec Loss 3.8916 LearningRate 0.0070 Epoch: 14 Global Step: 182640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:00,120-Speed 3057.58 samples/sec Loss 3.8839 LearningRate 0.0070 Epoch: 14 Global Step: 182650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:03,527-Speed 3006.55 samples/sec Loss 3.8359 LearningRate 0.0070 Epoch: 14 Global Step: 182660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:06,935-Speed 3005.47 samples/sec Loss 3.8594 LearningRate 0.0070 Epoch: 14 Global Step: 182670 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 19:01:10,341-Speed 3007.08 samples/sec Loss 3.7980 LearningRate 0.0070 Epoch: 14 Global Step: 182680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:13,765-Speed 2991.23 samples/sec Loss 3.9573 LearningRate 0.0070 Epoch: 14 Global Step: 182690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:17,149-Speed 3026.87 samples/sec Loss 3.8111 LearningRate 0.0070 Epoch: 14 Global Step: 182700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:20,500-Speed 3057.06 samples/sec Loss 3.7589 LearningRate 0.0070 Epoch: 14 Global Step: 182710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:23,885-Speed 3025.72 samples/sec Loss 3.8115 LearningRate 0.0070 Epoch: 14 Global Step: 182720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:27,283-Speed 3014.33 samples/sec Loss 3.9053 LearningRate 0.0070 Epoch: 14 Global Step: 182730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:30,710-Speed 2989.25 samples/sec Loss 3.8871 LearningRate 0.0070 Epoch: 14 Global Step: 182740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:01:34,089-Speed 3031.49 samples/sec Loss 3.9157 LearningRate 0.0070 Epoch: 14 Global Step: 182750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:37,437-Speed 3059.32 samples/sec Loss 3.8474 LearningRate 0.0070 Epoch: 14 Global Step: 182760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:40,859-Speed 2993.20 samples/sec Loss 3.8832 LearningRate 0.0070 Epoch: 14 Global Step: 182770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:44,311-Speed 2967.27 samples/sec Loss 3.8478 LearningRate 0.0070 Epoch: 14 Global Step: 182780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:47,685-Speed 3035.16 samples/sec Loss 3.9550 LearningRate 0.0070 Epoch: 14 Global Step: 182790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
19:01:51,085-Speed 3012.91 samples/sec Loss 3.8411 LearningRate 0.0070 Epoch: 14 Global Step: 182800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:54,471-Speed 3024.96 samples/sec Loss 3.8155 LearningRate 0.0070 Epoch: 14 Global Step: 182810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:01:57,973-Speed 2924.78 samples/sec Loss 3.8958 LearningRate 0.0070 Epoch: 14 Global Step: 182820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:01,372-Speed 3013.73 samples/sec Loss 3.9065 LearningRate 0.0070 Epoch: 14 Global Step: 182830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:04,745-Speed 3036.75 samples/sec Loss 3.8677 LearningRate 0.0070 Epoch: 14 Global Step: 182840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:08,165-Speed 2994.95 samples/sec Loss 3.9142 LearningRate 0.0070 Epoch: 14 Global Step: 182850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:11,592-Speed 2989.36 samples/sec Loss 3.8381 LearningRate 0.0070 Epoch: 14 Global Step: 182860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:14,933-Speed 3065.53 samples/sec Loss 3.9547 LearningRate 0.0070 Epoch: 14 Global Step: 182870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:18,270-Speed 3069.19 samples/sec Loss 3.7845 LearningRate 0.0070 Epoch: 14 Global Step: 182880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:21,730-Speed 2960.57 samples/sec Loss 3.7809 LearningRate 0.0070 Epoch: 14 Global Step: 182890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:25,111-Speed 3029.32 samples/sec Loss 3.7990 LearningRate 0.0070 Epoch: 14 Global Step: 182900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:28,488-Speed 3033.60 samples/sec Loss 3.8583 LearningRate 0.0070 Epoch: 14 Global Step: 182910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:02:31,865-Speed 3032.33 samples/sec Loss 
3.8518 LearningRate 0.0070 Epoch: 14 Global Step: 182920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:35,286-Speed 2993.91 samples/sec Loss 3.8528 LearningRate 0.0069 Epoch: 14 Global Step: 182930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:38,651-Speed 3044.20 samples/sec Loss 3.8493 LearningRate 0.0069 Epoch: 14 Global Step: 182940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:42,026-Speed 3035.54 samples/sec Loss 3.7630 LearningRate 0.0069 Epoch: 14 Global Step: 182950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:45,401-Speed 3034.79 samples/sec Loss 3.8947 LearningRate 0.0069 Epoch: 14 Global Step: 182960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:48,771-Speed 3039.40 samples/sec Loss 3.8817 LearningRate 0.0069 Epoch: 14 Global Step: 182970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:52,138-Speed 3042.15 samples/sec Loss 3.7808 LearningRate 0.0069 Epoch: 14 Global Step: 182980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:55,498-Speed 3049.48 samples/sec Loss 3.8904 LearningRate 0.0069 Epoch: 14 Global Step: 182990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:02:58,858-Speed 3048.37 samples/sec Loss 3.9087 LearningRate 0.0069 Epoch: 14 Global Step: 183000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:03:02,262-Speed 3009.27 samples/sec Loss 3.8339 LearningRate 0.0069 Epoch: 14 Global Step: 183010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:03:05,643-Speed 3028.90 samples/sec Loss 3.9383 LearningRate 0.0069 Epoch: 14 Global Step: 183020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:09,041-Speed 3015.00 samples/sec Loss 3.7696 LearningRate 0.0069 Epoch: 14 Global Step: 183030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:12,435-Speed 3017.58 samples/sec Loss 3.8123 LearningRate 0.0069 Epoch: 14 Global 
Step: 183040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:15,791-Speed 3052.74 samples/sec Loss 3.8339 LearningRate 0.0069 Epoch: 14 Global Step: 183050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:19,225-Speed 2982.67 samples/sec Loss 3.8752 LearningRate 0.0069 Epoch: 14 Global Step: 183060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:22,614-Speed 3021.94 samples/sec Loss 3.8047 LearningRate 0.0069 Epoch: 14 Global Step: 183070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:26,049-Speed 2982.08 samples/sec Loss 4.0022 LearningRate 0.0069 Epoch: 14 Global Step: 183080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:29,443-Speed 3018.42 samples/sec Loss 3.8596 LearningRate 0.0069 Epoch: 14 Global Step: 183090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:32,892-Speed 2969.09 samples/sec Loss 3.9289 LearningRate 0.0069 Epoch: 14 Global Step: 183100 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:36,315-Speed 2992.37 samples/sec Loss 3.7572 LearningRate 0.0069 Epoch: 14 Global Step: 183110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:03:39,767-Speed 2967.20 samples/sec Loss 3.8569 LearningRate 0.0069 Epoch: 14 Global Step: 183120 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:03:43,213-Speed 2973.08 samples/sec Loss 3.8288 LearningRate 0.0069 Epoch: 14 Global Step: 183130 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:03:46,646-Speed 2983.10 samples/sec Loss 3.9191 LearningRate 0.0069 Epoch: 14 Global Step: 183140 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:03:50,071-Speed 2991.64 samples/sec Loss 3.8061 LearningRate 0.0069 Epoch: 14 Global Step: 183150 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:03:53,412-Speed 3065.79 samples/sec Loss 3.8279 LearningRate 0.0069 Epoch: 14 Global Step: 183160 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-27 19:03:56,738-Speed 3079.64 samples/sec Loss 3.8244 LearningRate 0.0069 Epoch: 14 Global Step: 183170 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:04:00,138-Speed 3012.35 samples/sec Loss 3.8225 LearningRate 0.0069 Epoch: 14 Global Step: 183180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:03,530-Speed 3019.58 samples/sec Loss 3.8367 LearningRate 0.0069 Epoch: 14 Global Step: 183190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:06,854-Speed 3082.26 samples/sec Loss 3.8799 LearningRate 0.0069 Epoch: 14 Global Step: 183200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:10,168-Speed 3091.15 samples/sec Loss 3.8683 LearningRate 0.0069 Epoch: 14 Global Step: 183210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:13,568-Speed 3011.95 samples/sec Loss 3.8203 LearningRate 0.0069 Epoch: 14 Global Step: 183220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:16,942-Speed 3035.57 samples/sec Loss 3.8145 LearningRate 0.0069 Epoch: 14 Global Step: 183230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:20,311-Speed 3040.44 samples/sec Loss 3.9331 LearningRate 0.0069 Epoch: 14 Global Step: 183240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:23,665-Speed 3054.45 samples/sec Loss 3.8110 LearningRate 0.0069 Epoch: 14 Global Step: 183250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:27,057-Speed 3019.90 samples/sec Loss 3.9250 LearningRate 0.0069 Epoch: 14 Global Step: 183260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:04:30,407-Speed 3057.03 samples/sec Loss 3.8234 LearningRate 0.0069 Epoch: 14 Global Step: 183270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:33,793-Speed 3024.84 samples/sec Loss 3.9289 LearningRate 0.0069 Epoch: 14 Global Step: 183280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
19:04:37,162-Speed 3040.74 samples/sec Loss 3.9569 LearningRate 0.0069 Epoch: 14 Global Step: 183290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:40,597-Speed 2982.15 samples/sec Loss 3.8550 LearningRate 0.0069 Epoch: 14 Global Step: 183300 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:44,003-Speed 3006.76 samples/sec Loss 3.8609 LearningRate 0.0069 Epoch: 14 Global Step: 183310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:47,428-Speed 2990.18 samples/sec Loss 3.7757 LearningRate 0.0069 Epoch: 14 Global Step: 183320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:50,769-Speed 3066.78 samples/sec Loss 3.8867 LearningRate 0.0069 Epoch: 14 Global Step: 183330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:54,107-Speed 3068.12 samples/sec Loss 3.9091 LearningRate 0.0069 Epoch: 14 Global Step: 183340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:04:57,498-Speed 3021.37 samples/sec Loss 3.8727 LearningRate 0.0069 Epoch: 14 Global Step: 183350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:00,866-Speed 3040.66 samples/sec Loss 3.8500 LearningRate 0.0069 Epoch: 14 Global Step: 183360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:04,219-Speed 3055.02 samples/sec Loss 3.9544 LearningRate 0.0069 Epoch: 14 Global Step: 183370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:07,554-Speed 3071.24 samples/sec Loss 3.8142 LearningRate 0.0069 Epoch: 14 Global Step: 183380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:10,902-Speed 3061.05 samples/sec Loss 3.8878 LearningRate 0.0069 Epoch: 14 Global Step: 183390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:14,353-Speed 2968.04 samples/sec Loss 3.8089 LearningRate 0.0069 Epoch: 14 Global Step: 183400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:17,772-Speed 2996.48 samples/sec Loss 
3.8219 LearningRate 0.0068 Epoch: 14 Global Step: 183410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:21,160-Speed 3023.45 samples/sec Loss 3.7835 LearningRate 0.0068 Epoch: 14 Global Step: 183420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:24,618-Speed 2962.87 samples/sec Loss 3.8375 LearningRate 0.0068 Epoch: 14 Global Step: 183430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:28,055-Speed 2980.22 samples/sec Loss 3.8373 LearningRate 0.0068 Epoch: 14 Global Step: 183440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:31,522-Speed 2954.40 samples/sec Loss 3.7660 LearningRate 0.0068 Epoch: 14 Global Step: 183450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:05:34,880-Speed 3050.78 samples/sec Loss 3.8476 LearningRate 0.0068 Epoch: 14 Global Step: 183460 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:38,249-Speed 3041.07 samples/sec Loss 3.8225 LearningRate 0.0068 Epoch: 14 Global Step: 183470 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:41,727-Speed 2944.48 samples/sec Loss 3.8155 LearningRate 0.0068 Epoch: 14 Global Step: 183480 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:45,148-Speed 2994.08 samples/sec Loss 3.8888 LearningRate 0.0068 Epoch: 14 Global Step: 183490 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:48,608-Speed 2960.48 samples/sec Loss 3.8476 LearningRate 0.0068 Epoch: 14 Global Step: 183500 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:51,999-Speed 3020.41 samples/sec Loss 3.9243 LearningRate 0.0068 Epoch: 14 Global Step: 183510 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:55,357-Speed 3050.27 samples/sec Loss 3.8475 LearningRate 0.0068 Epoch: 14 Global Step: 183520 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:05:58,709-Speed 3055.97 samples/sec Loss 3.8490 LearningRate 0.0068 Epoch: 14 Global 
Step: 183530 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:06:02,086-Speed 3033.32 samples/sec Loss 3.8152 LearningRate 0.0068 Epoch: 14 Global Step: 183540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:06:05,507-Speed 2993.71 samples/sec Loss 3.8857 LearningRate 0.0068 Epoch: 14 Global Step: 183550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:06:08,869-Speed 3046.50 samples/sec Loss 3.8663 LearningRate 0.0068 Epoch: 14 Global Step: 183560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:12,262-Speed 3018.83 samples/sec Loss 3.8472 LearningRate 0.0068 Epoch: 14 Global Step: 183570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:15,673-Speed 3003.41 samples/sec Loss 3.8921 LearningRate 0.0068 Epoch: 14 Global Step: 183580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:19,046-Speed 3037.02 samples/sec Loss 3.9113 LearningRate 0.0068 Epoch: 14 Global Step: 183590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:22,464-Speed 2996.76 samples/sec Loss 3.8538 LearningRate 0.0068 Epoch: 14 Global Step: 183600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:25,778-Speed 3090.13 samples/sec Loss 3.8822 LearningRate 0.0068 Epoch: 14 Global Step: 183610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:29,133-Speed 3053.96 samples/sec Loss 3.8770 LearningRate 0.0068 Epoch: 14 Global Step: 183620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:32,532-Speed 3013.01 samples/sec Loss 3.7951 LearningRate 0.0068 Epoch: 14 Global Step: 183630 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:35,958-Speed 2990.25 samples/sec Loss 3.8543 LearningRate 0.0068 Epoch: 14 Global Step: 183640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:39,312-Speed 3054.12 samples/sec Loss 3.8978 LearningRate 0.0068 Epoch: 14 Global Step: 183650 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 19:06:42,727-Speed 2999.84 samples/sec Loss 3.9243 LearningRate 0.0068 Epoch: 14 Global Step: 183660 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:06:46,115-Speed 3022.62 samples/sec Loss 3.8046 LearningRate 0.0068 Epoch: 14 Global Step: 183670 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:06:49,455-Speed 3066.84 samples/sec Loss 3.8129 LearningRate 0.0068 Epoch: 14 Global Step: 183680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:52,851-Speed 3016.70 samples/sec Loss 3.8317 LearningRate 0.0068 Epoch: 14 Global Step: 183690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:56,192-Speed 3065.54 samples/sec Loss 3.9097 LearningRate 0.0068 Epoch: 14 Global Step: 183700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:06:59,643-Speed 2967.90 samples/sec Loss 3.8539 LearningRate 0.0068 Epoch: 14 Global Step: 183710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:07:02,989-Speed 3061.28 samples/sec Loss 3.8970 LearningRate 0.0068 Epoch: 14 Global Step: 183720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:07:06,466-Speed 2946.10 samples/sec Loss 3.9556 LearningRate 0.0068 Epoch: 14 Global Step: 183730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:07:09,860-Speed 3018.50 samples/sec Loss 3.8601 LearningRate 0.0068 Epoch: 14 Global Step: 183740 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:07:13,254-Speed 3018.11 samples/sec Loss 3.8429 LearningRate 0.0068 Epoch: 14 Global Step: 183750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:07:16,613-Speed 3048.62 samples/sec Loss 3.7719 LearningRate 0.0068 Epoch: 14 Global Step: 183760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:07:19,942-Speed 3077.47 samples/sec Loss 3.8233 LearningRate 0.0068 Epoch: 14 Global Step: 183770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:07:23,343-Speed 3011.92 samples/sec Loss 4.0004 LearningRate 0.0068 Epoch: 14 Global Step: 183780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:26,708-Speed 3044.27 samples/sec Loss 3.8132 LearningRate 0.0068 Epoch: 14 Global Step: 183790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:30,156-Speed 2970.29 samples/sec Loss 3.8002 LearningRate 0.0068 Epoch: 14 Global Step: 183800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:33,572-Speed 2998.85 samples/sec Loss 3.8128 LearningRate 0.0068 Epoch: 14 Global Step: 183810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:37,070-Speed 2928.51 samples/sec Loss 3.8811 LearningRate 0.0068 Epoch: 14 Global Step: 183820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:40,484-Speed 3000.20 samples/sec Loss 3.8911 LearningRate 0.0068 Epoch: 14 Global Step: 183830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:43,853-Speed 3040.54 samples/sec Loss 3.7227 LearningRate 0.0068 Epoch: 14 Global Step: 183840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:47,279-Speed 2989.10 samples/sec Loss 3.8246 LearningRate 0.0068 Epoch: 14 Global Step: 183850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:50,685-Speed 3007.86 samples/sec Loss 3.9585 LearningRate 0.0068 Epoch: 14 Global Step: 183860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:54,031-Speed 3060.73 samples/sec Loss 3.8618 LearningRate 0.0068 Epoch: 14 Global Step: 183870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:07:57,369-Speed 3069.47 samples/sec Loss 3.8268 LearningRate 0.0067 Epoch: 14 Global Step: 183880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:00,778-Speed 3004.48 samples/sec Loss 3.7887 LearningRate 0.0067 Epoch: 14 Global Step: 183890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:04,111-Speed 3072.72 samples/sec Loss 
3.7488 LearningRate 0.0067 Epoch: 14 Global Step: 183900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:07,456-Speed 3062.92 samples/sec Loss 3.8327 LearningRate 0.0067 Epoch: 14 Global Step: 183910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:10,888-Speed 2984.03 samples/sec Loss 3.8834 LearningRate 0.0067 Epoch: 14 Global Step: 183920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:14,292-Speed 3009.03 samples/sec Loss 3.9003 LearningRate 0.0067 Epoch: 14 Global Step: 183930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:17,687-Speed 3017.09 samples/sec Loss 3.8310 LearningRate 0.0067 Epoch: 14 Global Step: 183940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:21,068-Speed 3030.19 samples/sec Loss 3.7694 LearningRate 0.0067 Epoch: 14 Global Step: 183950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:24,434-Speed 3042.87 samples/sec Loss 3.7950 LearningRate 0.0067 Epoch: 14 Global Step: 183960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:27,830-Speed 3015.78 samples/sec Loss 3.8106 LearningRate 0.0067 Epoch: 14 Global Step: 183970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:31,286-Speed 2964.77 samples/sec Loss 3.7842 LearningRate 0.0067 Epoch: 14 Global Step: 183980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:08:34,661-Speed 3035.40 samples/sec Loss 3.8745 LearningRate 0.0067 Epoch: 14 Global Step: 183990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:08:38,023-Speed 3047.00 samples/sec Loss 3.8465 LearningRate 0.0067 Epoch: 14 Global Step: 184000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:08:41,507-Speed 2939.79 samples/sec Loss 3.7814 LearningRate 0.0067 Epoch: 14 Global Step: 184010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:08:44,838-Speed 3075.27 samples/sec Loss 3.8607 LearningRate 0.0067 Epoch: 14 Global 
Step: 184020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:48,230-Speed 3019.01 samples/sec Loss 3.8061 LearningRate 0.0067 Epoch: 14 Global Step: 184030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:51,558-Speed 3078.55 samples/sec Loss 3.9081 LearningRate 0.0067 Epoch: 14 Global Step: 184040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:54,973-Speed 2998.93 samples/sec Loss 3.9057 LearningRate 0.0067 Epoch: 14 Global Step: 184050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:08:58,307-Speed 3072.23 samples/sec Loss 3.8652 LearningRate 0.0067 Epoch: 14 Global Step: 184060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:01,717-Speed 3003.51 samples/sec Loss 3.7094 LearningRate 0.0067 Epoch: 14 Global Step: 184070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:05,060-Speed 3064.27 samples/sec Loss 3.7867 LearningRate 0.0067 Epoch: 14 Global Step: 184080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:08,431-Speed 3038.98 samples/sec Loss 3.7705 LearningRate 0.0067 Epoch: 14 Global Step: 184090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:11,795-Speed 3045.62 samples/sec Loss 3.8635 LearningRate 0.0067 Epoch: 14 Global Step: 184100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:15,101-Speed 3098.43 samples/sec Loss 3.8815 LearningRate 0.0067 Epoch: 14 Global Step: 184110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:18,482-Speed 3029.85 samples/sec Loss 3.8030 LearningRate 0.0067 Epoch: 14 Global Step: 184120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:09:21,784-Speed 3101.40 samples/sec Loss 3.8717 LearningRate 0.0067 Epoch: 14 Global Step: 184130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:09:25,121-Speed 3069.84 samples/sec Loss 3.8563 LearningRate 0.0067 Epoch: 14 Global Step: 184140 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 19:09:28,511-Speed 3021.33 samples/sec Loss 3.9493 LearningRate 0.0067 Epoch: 14 Global Step: 184150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:31,966-Speed 2964.42 samples/sec Loss 3.7873 LearningRate 0.0067 Epoch: 14 Global Step: 184160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:35,302-Speed 3070.50 samples/sec Loss 3.8443 LearningRate 0.0067 Epoch: 14 Global Step: 184170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:38,714-Speed 3002.95 samples/sec Loss 3.9012 LearningRate 0.0067 Epoch: 14 Global Step: 184180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:42,090-Speed 3033.63 samples/sec Loss 3.7897 LearningRate 0.0067 Epoch: 14 Global Step: 184190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:45,431-Speed 3065.70 samples/sec Loss 3.8246 LearningRate 0.0067 Epoch: 14 Global Step: 184200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:48,760-Speed 3077.21 samples/sec Loss 3.9548 LearningRate 0.0067 Epoch: 14 Global Step: 184210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:52,102-Speed 3064.97 samples/sec Loss 3.8575 LearningRate 0.0067 Epoch: 14 Global Step: 184220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:55,554-Speed 2966.71 samples/sec Loss 3.8482 LearningRate 0.0067 Epoch: 14 Global Step: 184230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:09:58,901-Speed 3060.86 samples/sec Loss 3.8256 LearningRate 0.0067 Epoch: 14 Global Step: 184240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:10:02,343-Speed 2975.35 samples/sec Loss 3.8628 LearningRate 0.0067 Epoch: 14 Global Step: 184250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:05,728-Speed 3025.96 samples/sec Loss 3.8810 LearningRate 0.0067 Epoch: 14 Global Step: 184260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:10:09,168-Speed 2977.51 samples/sec Loss 3.7535 LearningRate 0.0067 Epoch: 14 Global Step: 184270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:12,501-Speed 3074.01 samples/sec Loss 3.8856 LearningRate 0.0067 Epoch: 14 Global Step: 184280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:15,829-Speed 3076.95 samples/sec Loss 3.8003 LearningRate 0.0067 Epoch: 14 Global Step: 184290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:19,201-Speed 3037.74 samples/sec Loss 3.7690 LearningRate 0.0067 Epoch: 14 Global Step: 184300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:22,537-Speed 3070.62 samples/sec Loss 3.8635 LearningRate 0.0067 Epoch: 14 Global Step: 184310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:25,892-Speed 3052.81 samples/sec Loss 3.9096 LearningRate 0.0067 Epoch: 14 Global Step: 184320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:29,327-Speed 2981.70 samples/sec Loss 3.7378 LearningRate 0.0067 Epoch: 14 Global Step: 184330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:32,684-Speed 3051.48 samples/sec Loss 3.8161 LearningRate 0.0067 Epoch: 14 Global Step: 184340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:35,996-Speed 3093.12 samples/sec Loss 3.7862 LearningRate 0.0067 Epoch: 14 Global Step: 184350 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:10:39,340-Speed 3062.45 samples/sec Loss 3.8358 LearningRate 0.0066 Epoch: 14 Global Step: 184360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:10:42,709-Speed 3040.88 samples/sec Loss 3.8958 LearningRate 0.0066 Epoch: 14 Global Step: 184370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:10:46,118-Speed 3004.04 samples/sec Loss 3.7081 LearningRate 0.0066 Epoch: 14 Global Step: 184380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:49,485-Speed 3042.42 samples/sec Loss 
3.8208 LearningRate 0.0066 Epoch: 14 Global Step: 184390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:52,875-Speed 3021.27 samples/sec Loss 3.7641 LearningRate 0.0066 Epoch: 14 Global Step: 184400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:56,340-Speed 2956.64 samples/sec Loss 3.8833 LearningRate 0.0066 Epoch: 14 Global Step: 184410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:10:59,713-Speed 3036.68 samples/sec Loss 3.9187 LearningRate 0.0066 Epoch: 14 Global Step: 184420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:03,160-Speed 2970.99 samples/sec Loss 3.8619 LearningRate 0.0066 Epoch: 14 Global Step: 184430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:06,594-Speed 2982.87 samples/sec Loss 3.8938 LearningRate 0.0066 Epoch: 14 Global Step: 184440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:10,031-Speed 2979.81 samples/sec Loss 3.8015 LearningRate 0.0066 Epoch: 14 Global Step: 184450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:13,443-Speed 3002.44 samples/sec Loss 3.8155 LearningRate 0.0066 Epoch: 14 Global Step: 184460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:16,886-Speed 2975.16 samples/sec Loss 3.8230 LearningRate 0.0066 Epoch: 14 Global Step: 184470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:20,303-Speed 2997.80 samples/sec Loss 3.7380 LearningRate 0.0066 Epoch: 14 Global Step: 184480 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:23,735-Speed 2984.46 samples/sec Loss 3.6967 LearningRate 0.0066 Epoch: 14 Global Step: 184490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:27,206-Speed 2950.43 samples/sec Loss 3.7862 LearningRate 0.0066 Epoch: 14 Global Step: 184500 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:30,641-Speed 2982.59 samples/sec Loss 3.9237 LearningRate 0.0066 Epoch: 14 Global 
Step: 184510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:34,033-Speed 3019.33 samples/sec Loss 3.7552 LearningRate 0.0066 Epoch: 14 Global Step: 184520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:37,535-Speed 2924.79 samples/sec Loss 3.8714 LearningRate 0.0066 Epoch: 14 Global Step: 184530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:40,920-Speed 3026.08 samples/sec Loss 3.8426 LearningRate 0.0066 Epoch: 14 Global Step: 184540 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:44,335-Speed 2999.51 samples/sec Loss 3.9242 LearningRate 0.0066 Epoch: 14 Global Step: 184550 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:47,744-Speed 3004.34 samples/sec Loss 3.8193 LearningRate 0.0066 Epoch: 14 Global Step: 184560 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:51,173-Speed 2987.89 samples/sec Loss 3.8062 LearningRate 0.0066 Epoch: 14 Global Step: 184570 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:11:54,547-Speed 3035.55 samples/sec Loss 3.8279 LearningRate 0.0066 Epoch: 14 Global Step: 184580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:11:57,859-Speed 3091.91 samples/sec Loss 3.8708 LearningRate 0.0066 Epoch: 14 Global Step: 184590 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:12:01,205-Speed 3061.27 samples/sec Loss 3.8665 LearningRate 0.0066 Epoch: 14 Global Step: 184600 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:12:04,623-Speed 2997.21 samples/sec Loss 3.8098 LearningRate 0.0066 Epoch: 14 Global Step: 184610 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:12:07,965-Speed 3064.16 samples/sec Loss 3.7995 LearningRate 0.0066 Epoch: 14 Global Step: 184620 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:12:11,335-Speed 3039.64 samples/sec Loss 3.8185 LearningRate 0.0066 Epoch: 14 Global Step: 184630 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 19:12:14,752-Speed 2997.91 samples/sec Loss 3.7989 LearningRate 0.0066 Epoch: 14 Global Step: 184640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:12:18,108-Speed 3051.98 samples/sec Loss 3.9229 LearningRate 0.0066 Epoch: 14 Global Step: 184650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:21,574-Speed 2955.61 samples/sec Loss 3.8250 LearningRate 0.0066 Epoch: 14 Global Step: 184660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:25,004-Speed 2985.50 samples/sec Loss 3.8260 LearningRate 0.0066 Epoch: 14 Global Step: 184670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:28,355-Speed 3057.13 samples/sec Loss 3.7272 LearningRate 0.0066 Epoch: 14 Global Step: 184680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:31,723-Speed 3040.97 samples/sec Loss 3.7663 LearningRate 0.0066 Epoch: 14 Global Step: 184690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:35,071-Speed 3059.23 samples/sec Loss 3.9349 LearningRate 0.0066 Epoch: 14 Global Step: 184700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:38,511-Speed 2977.02 samples/sec Loss 3.8220 LearningRate 0.0066 Epoch: 14 Global Step: 184710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:41,833-Speed 3083.96 samples/sec Loss 3.8359 LearningRate 0.0066 Epoch: 14 Global Step: 184720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:45,268-Speed 2982.04 samples/sec Loss 3.8491 LearningRate 0.0066 Epoch: 14 Global Step: 184730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:48,659-Speed 3020.15 samples/sec Loss 3.8414 LearningRate 0.0066 Epoch: 14 Global Step: 184740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:12:52,027-Speed 3041.81 samples/sec Loss 3.8205 LearningRate 0.0066 Epoch: 14 Global Step: 184750 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:12:55,440-Speed 3001.23 samples/sec Loss 3.8674 LearningRate 0.0066 Epoch: 14 Global Step: 184760 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:12:58,872-Speed 2984.15 samples/sec Loss 3.7317 LearningRate 0.0066 Epoch: 14 Global Step: 184770 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:02,202-Speed 3076.06 samples/sec Loss 3.7683 LearningRate 0.0066 Epoch: 14 Global Step: 184780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:05,647-Speed 2973.72 samples/sec Loss 3.8672 LearningRate 0.0066 Epoch: 14 Global Step: 184790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:09,118-Speed 2950.94 samples/sec Loss 3.9063 LearningRate 0.0066 Epoch: 14 Global Step: 184800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:12,468-Speed 3057.28 samples/sec Loss 3.8697 LearningRate 0.0066 Epoch: 14 Global Step: 184810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:15,926-Speed 2961.75 samples/sec Loss 3.8932 LearningRate 0.0066 Epoch: 14 Global Step: 184820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:19,371-Speed 2973.61 samples/sec Loss 3.8572 LearningRate 0.0066 Epoch: 14 Global Step: 184830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:22,754-Speed 3027.62 samples/sec Loss 3.7746 LearningRate 0.0066 Epoch: 14 Global Step: 184840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:26,096-Speed 3064.55 samples/sec Loss 3.8534 LearningRate 0.0065 Epoch: 14 Global Step: 184850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:29,469-Speed 3036.80 samples/sec Loss 3.8459 LearningRate 0.0065 Epoch: 14 Global Step: 184860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:32,843-Speed 3036.32 samples/sec Loss 3.8446 LearningRate 0.0065 Epoch: 14 Global Step: 184870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:36,206-Speed 3045.63 samples/sec Loss 
3.7947 LearningRate 0.0065 Epoch: 14 Global Step: 184880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:13:39,515-Speed 3094.51 samples/sec Loss 3.8968 LearningRate 0.0065 Epoch: 14 Global Step: 184890 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:13:42,837-Speed 3083.93 samples/sec Loss 3.6810 LearningRate 0.0065 Epoch: 14 Global Step: 184900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:13:46,163-Speed 3079.21 samples/sec Loss 3.8382 LearningRate 0.0065 Epoch: 14 Global Step: 184910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:13:49,499-Speed 3071.33 samples/sec Loss 3.8619 LearningRate 0.0065 Epoch: 14 Global Step: 184920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:13:52,850-Speed 3056.69 samples/sec Loss 3.8093 LearningRate 0.0065 Epoch: 14 Global Step: 184930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:13:56,185-Speed 3071.17 samples/sec Loss 3.7121 LearningRate 0.0065 Epoch: 14 Global Step: 184940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:13:59,536-Speed 3056.95 samples/sec Loss 3.9184 LearningRate 0.0065 Epoch: 14 Global Step: 184950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:14:02,913-Speed 3032.81 samples/sec Loss 3.7676 LearningRate 0.0065 Epoch: 14 Global Step: 184960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:14:06,368-Speed 2964.42 samples/sec Loss 3.8023 LearningRate 0.0065 Epoch: 14 Global Step: 184970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:14:09,820-Speed 2967.51 samples/sec Loss 3.8878 LearningRate 0.0065 Epoch: 14 Global Step: 184980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:14:13,183-Speed 3045.29 samples/sec Loss 3.7842 LearningRate 0.0065 Epoch: 14 Global Step: 184990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:16,601-Speed 2997.27 samples/sec Loss 3.8437 LearningRate 0.0065 Epoch: 14 Global 
Step: 185000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:20,021-Speed 2995.01 samples/sec Loss 3.7923 LearningRate 0.0065 Epoch: 14 Global Step: 185010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:23,398-Speed 3032.53 samples/sec Loss 3.7333 LearningRate 0.0065 Epoch: 14 Global Step: 185020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:26,814-Speed 3001.55 samples/sec Loss 3.9127 LearningRate 0.0065 Epoch: 14 Global Step: 185030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:30,278-Speed 2956.69 samples/sec Loss 3.8492 LearningRate 0.0065 Epoch: 14 Global Step: 185040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:33,653-Speed 3035.23 samples/sec Loss 3.8547 LearningRate 0.0065 Epoch: 14 Global Step: 185050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:37,062-Speed 3004.55 samples/sec Loss 3.9550 LearningRate 0.0065 Epoch: 14 Global Step: 185060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:40,425-Speed 3045.89 samples/sec Loss 3.7876 LearningRate 0.0065 Epoch: 14 Global Step: 185070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:43,785-Speed 3048.95 samples/sec Loss 3.8467 LearningRate 0.0065 Epoch: 14 Global Step: 185080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:14:47,128-Speed 3063.79 samples/sec Loss 3.8328 LearningRate 0.0065 Epoch: 14 Global Step: 185090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:14:50,630-Speed 2924.18 samples/sec Loss 3.8232 LearningRate 0.0065 Epoch: 14 Global Step: 185100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:14:53,984-Speed 3054.14 samples/sec Loss 3.7930 LearningRate 0.0065 Epoch: 14 Global Step: 185110 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:14:57,389-Speed 3008.21 samples/sec Loss 3.7693 LearningRate 0.0065 Epoch: 14 Global Step: 185120 Fp16 Grad Scale: 65536 
Required: 6 hours Training: 2022-04-27 19:15:00,770-Speed 3030.08 samples/sec Loss 3.8757 LearningRate 0.0065 Epoch: 14 Global Step: 185130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:04,187-Speed 2997.56 samples/sec Loss 3.7304 LearningRate 0.0065 Epoch: 14 Global Step: 185140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:07,592-Speed 3007.71 samples/sec Loss 3.9081 LearningRate 0.0065 Epoch: 14 Global Step: 185150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:10,946-Speed 3054.54 samples/sec Loss 3.7363 LearningRate 0.0065 Epoch: 14 Global Step: 185160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:14,293-Speed 3060.26 samples/sec Loss 3.8644 LearningRate 0.0065 Epoch: 14 Global Step: 185170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:17,658-Speed 3043.46 samples/sec Loss 3.8473 LearningRate 0.0065 Epoch: 14 Global Step: 185180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:21,067-Speed 3005.01 samples/sec Loss 3.7910 LearningRate 0.0065 Epoch: 14 Global Step: 185190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:15:24,449-Speed 3028.15 samples/sec Loss 3.7944 LearningRate 0.0065 Epoch: 14 Global Step: 185200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:27,819-Speed 3039.68 samples/sec Loss 3.8958 LearningRate 0.0065 Epoch: 14 Global Step: 185210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:31,152-Speed 3073.94 samples/sec Loss 3.8623 LearningRate 0.0065 Epoch: 14 Global Step: 185220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:34,637-Speed 2939.21 samples/sec Loss 3.8380 LearningRate 0.0065 Epoch: 14 Global Step: 185230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:38,039-Speed 3010.63 samples/sec Loss 3.8376 LearningRate 0.0065 Epoch: 14 Global Step: 185240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
19:15:41,363-Speed 3081.10 samples/sec Loss 3.7308 LearningRate 0.0065 Epoch: 14 Global Step: 185250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:44,747-Speed 3026.77 samples/sec Loss 3.8801 LearningRate 0.0065 Epoch: 14 Global Step: 185260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:48,233-Speed 2938.39 samples/sec Loss 3.8090 LearningRate 0.0065 Epoch: 14 Global Step: 185270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:51,684-Speed 2968.52 samples/sec Loss 3.8140 LearningRate 0.0065 Epoch: 14 Global Step: 185280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:55,062-Speed 3031.69 samples/sec Loss 3.7721 LearningRate 0.0065 Epoch: 14 Global Step: 185290 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:15:58,470-Speed 3006.10 samples/sec Loss 3.8131 LearningRate 0.0065 Epoch: 14 Global Step: 185300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:01,873-Speed 3009.41 samples/sec Loss 3.7855 LearningRate 0.0065 Epoch: 14 Global Step: 185310 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:05,216-Speed 3063.69 samples/sec Loss 3.8685 LearningRate 0.0065 Epoch: 14 Global Step: 185320 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:08,641-Speed 2991.41 samples/sec Loss 3.8219 LearningRate 0.0064 Epoch: 14 Global Step: 185330 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:12,067-Speed 2989.56 samples/sec Loss 3.8493 LearningRate 0.0064 Epoch: 14 Global Step: 185340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:15,460-Speed 3018.61 samples/sec Loss 3.7890 LearningRate 0.0064 Epoch: 14 Global Step: 185350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:18,843-Speed 3027.51 samples/sec Loss 3.7664 LearningRate 0.0064 Epoch: 14 Global Step: 185360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:22,201-Speed 3050.56 samples/sec Loss 
3.7543 LearningRate 0.0064 Epoch: 14 Global Step: 185370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:25,594-Speed 3019.18 samples/sec Loss 3.7822 LearningRate 0.0064 Epoch: 14 Global Step: 185380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:29,069-Speed 2947.70 samples/sec Loss 3.7497 LearningRate 0.0064 Epoch: 14 Global Step: 185390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:32,509-Speed 2977.14 samples/sec Loss 3.8540 LearningRate 0.0064 Epoch: 14 Global Step: 185400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:16:35,884-Speed 3035.20 samples/sec Loss 3.8020 LearningRate 0.0064 Epoch: 14 Global Step: 185410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:39,284-Speed 3012.48 samples/sec Loss 3.7536 LearningRate 0.0064 Epoch: 14 Global Step: 185420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:42,740-Speed 2964.59 samples/sec Loss 3.8021 LearningRate 0.0064 Epoch: 14 Global Step: 185430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:46,209-Speed 2952.55 samples/sec Loss 3.8764 LearningRate 0.0064 Epoch: 14 Global Step: 185440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:49,593-Speed 3026.98 samples/sec Loss 3.8061 LearningRate 0.0064 Epoch: 14 Global Step: 185450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:53,112-Speed 2910.40 samples/sec Loss 3.8820 LearningRate 0.0064 Epoch: 14 Global Step: 185460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:56,558-Speed 2972.46 samples/sec Loss 3.8013 LearningRate 0.0064 Epoch: 14 Global Step: 185470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:16:59,928-Speed 3039.93 samples/sec Loss 3.7439 LearningRate 0.0064 Epoch: 14 Global Step: 185480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:03,319-Speed 3020.80 samples/sec Loss 3.8560 LearningRate 0.0064 Epoch: 14 Global 
Step: 185490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:06,795-Speed 2946.71 samples/sec Loss 3.8864 LearningRate 0.0064 Epoch: 14 Global Step: 185500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:10,166-Speed 3038.36 samples/sec Loss 3.7766 LearningRate 0.0064 Epoch: 14 Global Step: 185510 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:17:13,621-Speed 2964.87 samples/sec Loss 3.7609 LearningRate 0.0064 Epoch: 14 Global Step: 185520 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:17:17,063-Speed 2975.81 samples/sec Loss 3.7885 LearningRate 0.0064 Epoch: 14 Global Step: 185530 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:17:20,418-Speed 3052.44 samples/sec Loss 3.8438 LearningRate 0.0064 Epoch: 14 Global Step: 185540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:23,744-Speed 3080.20 samples/sec Loss 3.7666 LearningRate 0.0064 Epoch: 14 Global Step: 185550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:27,119-Speed 3034.57 samples/sec Loss 3.7763 LearningRate 0.0064 Epoch: 14 Global Step: 185560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:30,472-Speed 3054.78 samples/sec Loss 3.8362 LearningRate 0.0064 Epoch: 14 Global Step: 185570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:33,878-Speed 3008.17 samples/sec Loss 3.7979 LearningRate 0.0064 Epoch: 14 Global Step: 185580 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:17:37,209-Speed 3074.93 samples/sec Loss 3.7840 LearningRate 0.0064 Epoch: 14 Global Step: 185590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:17:40,521-Speed 3092.92 samples/sec Loss 3.8219 LearningRate 0.0064 Epoch: 14 Global Step: 185600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:17:43,897-Speed 3033.45 samples/sec Loss 3.8380 LearningRate 0.0064 Epoch: 14 Global Step: 185610 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 19:17:47,244-Speed 3060.54 samples/sec Loss 3.8196 LearningRate 0.0064 Epoch: 14 Global Step: 185620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:17:50,694-Speed 2968.61 samples/sec Loss 3.8784 LearningRate 0.0064 Epoch: 14 Global Step: 185630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:17:54,022-Speed 3077.83 samples/sec Loss 3.6924 LearningRate 0.0064 Epoch: 14 Global Step: 185640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:17:57,425-Speed 3010.03 samples/sec Loss 3.9275 LearningRate 0.0064 Epoch: 14 Global Step: 185650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:00,774-Speed 3058.22 samples/sec Loss 3.8386 LearningRate 0.0064 Epoch: 14 Global Step: 185660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:04,133-Speed 3050.07 samples/sec Loss 3.7484 LearningRate 0.0064 Epoch: 14 Global Step: 185670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:07,500-Speed 3042.46 samples/sec Loss 3.8456 LearningRate 0.0064 Epoch: 14 Global Step: 185680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:10,867-Speed 3041.89 samples/sec Loss 3.7974 LearningRate 0.0064 Epoch: 14 Global Step: 185690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:18:14,255-Speed 3022.83 samples/sec Loss 3.7002 LearningRate 0.0064 Epoch: 14 Global Step: 185700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:18:17,602-Speed 3060.15 samples/sec Loss 3.8840 LearningRate 0.0064 Epoch: 14 Global Step: 185710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:18:20,976-Speed 3036.46 samples/sec Loss 3.8496 LearningRate 0.0064 Epoch: 14 Global Step: 185720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:18:24,413-Speed 2980.49 samples/sec Loss 3.7278 LearningRate 0.0064 Epoch: 14 Global Step: 185730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:18:27,824-Speed 3002.76 samples/sec Loss 3.7540 LearningRate 0.0064 Epoch: 14 Global Step: 185740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:31,181-Speed 3051.25 samples/sec Loss 3.8216 LearningRate 0.0064 Epoch: 14 Global Step: 185750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:34,612-Speed 2985.28 samples/sec Loss 3.8347 LearningRate 0.0064 Epoch: 14 Global Step: 185760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:38,087-Speed 2947.10 samples/sec Loss 3.7753 LearningRate 0.0064 Epoch: 14 Global Step: 185770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:41,442-Speed 3053.68 samples/sec Loss 3.7469 LearningRate 0.0064 Epoch: 14 Global Step: 185780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:44,761-Speed 3085.26 samples/sec Loss 3.7676 LearningRate 0.0064 Epoch: 14 Global Step: 185790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:48,160-Speed 3013.75 samples/sec Loss 3.7504 LearningRate 0.0064 Epoch: 14 Global Step: 185800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:51,491-Speed 3075.22 samples/sec Loss 3.7840 LearningRate 0.0064 Epoch: 14 Global Step: 185810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:54,999-Speed 2919.80 samples/sec Loss 3.8347 LearningRate 0.0064 Epoch: 14 Global Step: 185820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:18:58,373-Speed 3036.08 samples/sec Loss 3.8198 LearningRate 0.0063 Epoch: 14 Global Step: 185830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:19:01,740-Speed 3041.81 samples/sec Loss 3.8573 LearningRate 0.0063 Epoch: 14 Global Step: 185840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:05,157-Speed 2997.92 samples/sec Loss 3.8260 LearningRate 0.0063 Epoch: 14 Global Step: 185850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:08,498-Speed 3065.72 samples/sec Loss 
3.8453 LearningRate 0.0063 Epoch: 14 Global Step: 185860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:11,865-Speed 3042.14 samples/sec Loss 3.8632 LearningRate 0.0063 Epoch: 14 Global Step: 185870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:15,191-Speed 3079.90 samples/sec Loss 3.6983 LearningRate 0.0063 Epoch: 14 Global Step: 185880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:18,551-Speed 3047.89 samples/sec Loss 3.7956 LearningRate 0.0063 Epoch: 14 Global Step: 185890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:22,048-Speed 2929.09 samples/sec Loss 3.8691 LearningRate 0.0063 Epoch: 14 Global Step: 185900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:25,449-Speed 3011.96 samples/sec Loss 3.7536 LearningRate 0.0063 Epoch: 14 Global Step: 185910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:28,796-Speed 3060.39 samples/sec Loss 3.7292 LearningRate 0.0063 Epoch: 14 Global Step: 185920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:32,204-Speed 3006.03 samples/sec Loss 3.6972 LearningRate 0.0063 Epoch: 14 Global Step: 185930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:35,537-Speed 3072.61 samples/sec Loss 3.7933 LearningRate 0.0063 Epoch: 14 Global Step: 185940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:19:38,851-Speed 3090.96 samples/sec Loss 3.7670 LearningRate 0.0063 Epoch: 14 Global Step: 185950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:19:42,236-Speed 3025.54 samples/sec Loss 3.7737 LearningRate 0.0063 Epoch: 14 Global Step: 185960 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:19:45,576-Speed 3066.75 samples/sec Loss 3.7318 LearningRate 0.0063 Epoch: 14 Global Step: 185970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:48,924-Speed 3059.81 samples/sec Loss 3.7998 LearningRate 0.0063 Epoch: 14 Global 
Step: 185980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:52,300-Speed 3033.76 samples/sec Loss 3.7594 LearningRate 0.0063 Epoch: 14 Global Step: 185990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:19:55,663-Speed 3045.30 samples/sec Loss 3.7579 LearningRate 0.0063 Epoch: 14 Global Step: 186000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:19:59,040-Speed 3033.63 samples/sec Loss 3.8663 LearningRate 0.0063 Epoch: 14 Global Step: 186010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:02,414-Speed 3035.55 samples/sec Loss 3.8105 LearningRate 0.0063 Epoch: 14 Global Step: 186020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:05,785-Speed 3038.83 samples/sec Loss 3.8887 LearningRate 0.0063 Epoch: 14 Global Step: 186030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:09,124-Speed 3067.97 samples/sec Loss 3.7787 LearningRate 0.0063 Epoch: 14 Global Step: 186040 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:12,470-Speed 3060.59 samples/sec Loss 3.7846 LearningRate 0.0063 Epoch: 14 Global Step: 186050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:15,827-Speed 3051.88 samples/sec Loss 3.7933 LearningRate 0.0063 Epoch: 14 Global Step: 186060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:19,252-Speed 2990.32 samples/sec Loss 3.7250 LearningRate 0.0063 Epoch: 14 Global Step: 186070 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:22,675-Speed 2992.24 samples/sec Loss 3.7872 LearningRate 0.0063 Epoch: 14 Global Step: 186080 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:26,069-Speed 3018.26 samples/sec Loss 3.7371 LearningRate 0.0063 Epoch: 14 Global Step: 186090 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:29,536-Speed 2954.36 samples/sec Loss 3.8532 LearningRate 0.0063 Epoch: 14 Global Step: 186100 Fp16 Grad Scale: 8192 Required: 6 
hours Training: 2022-04-27 19:20:32,905-Speed 3040.68 samples/sec Loss 3.7403 LearningRate 0.0063 Epoch: 14 Global Step: 186110 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:36,283-Speed 3032.32 samples/sec Loss 3.6690 LearningRate 0.0063 Epoch: 14 Global Step: 186120 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:39,756-Speed 2948.82 samples/sec Loss 3.8180 LearningRate 0.0063 Epoch: 14 Global Step: 186130 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:43,166-Speed 3003.76 samples/sec Loss 3.8133 LearningRate 0.0063 Epoch: 14 Global Step: 186140 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:46,501-Speed 3071.89 samples/sec Loss 3.7874 LearningRate 0.0063 Epoch: 14 Global Step: 186150 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:49,979-Speed 2945.27 samples/sec Loss 3.7841 LearningRate 0.0063 Epoch: 14 Global Step: 186160 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:20:53,372-Speed 3018.86 samples/sec Loss 3.8024 LearningRate 0.0063 Epoch: 14 Global Step: 186170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:20:56,734-Speed 3046.37 samples/sec Loss 3.7269 LearningRate 0.0063 Epoch: 14 Global Step: 186180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:00,140-Speed 3007.52 samples/sec Loss 3.7098 LearningRate 0.0063 Epoch: 14 Global Step: 186190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:03,476-Speed 3069.82 samples/sec Loss 3.8216 LearningRate 0.0063 Epoch: 14 Global Step: 186200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:06,843-Speed 3042.69 samples/sec Loss 3.7771 LearningRate 0.0063 Epoch: 14 Global Step: 186210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:10,201-Speed 3050.32 samples/sec Loss 3.7749 LearningRate 0.0063 Epoch: 14 Global Step: 186220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:13,567-Speed 
3043.10 samples/sec Loss 3.7779 LearningRate 0.0063 Epoch: 14 Global Step: 186230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:16,976-Speed 3003.81 samples/sec Loss 3.7975 LearningRate 0.0063 Epoch: 14 Global Step: 186240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:20,367-Speed 3021.18 samples/sec Loss 3.7063 LearningRate 0.0063 Epoch: 14 Global Step: 186250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:23,733-Speed 3043.06 samples/sec Loss 3.7437 LearningRate 0.0063 Epoch: 14 Global Step: 186260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:21:27,121-Speed 3022.72 samples/sec Loss 3.8310 LearningRate 0.0063 Epoch: 14 Global Step: 186270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:21:30,457-Speed 3070.44 samples/sec Loss 3.6376 LearningRate 0.0063 Epoch: 14 Global Step: 186280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:21:33,881-Speed 2991.28 samples/sec Loss 3.8506 LearningRate 0.0063 Epoch: 14 Global Step: 186290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:21:37,291-Speed 3004.27 samples/sec Loss 3.7802 LearningRate 0.0063 Epoch: 14 Global Step: 186300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:21:41,106-Speed 2684.75 samples/sec Loss 3.8073 LearningRate 0.0063 Epoch: 14 Global Step: 186310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:14,102-Speed 310.35 samples/sec Loss 3.2311 LearningRate 0.0062 Epoch: 15 Global Step: 186320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:17,534-Speed 2984.77 samples/sec Loss 2.5887 LearningRate 0.0062 Epoch: 15 Global Step: 186330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:21,220-Speed 2779.21 samples/sec Loss 2.4440 LearningRate 0.0062 Epoch: 15 Global Step: 186340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:24,628-Speed 3005.23 samples/sec Loss 2.6139 LearningRate 
0.0062 Epoch: 15 Global Step: 186350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:27,986-Speed 3051.19 samples/sec Loss 2.5951 LearningRate 0.0062 Epoch: 15 Global Step: 186360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:31,403-Speed 2997.06 samples/sec Loss 2.5864 LearningRate 0.0062 Epoch: 15 Global Step: 186370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:22:34,779-Speed 3033.77 samples/sec Loss 2.4867 LearningRate 0.0062 Epoch: 15 Global Step: 186380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:22:38,128-Speed 3059.14 samples/sec Loss 2.5750 LearningRate 0.0062 Epoch: 15 Global Step: 186390 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:22:41,542-Speed 3000.55 samples/sec Loss 2.4762 LearningRate 0.0062 Epoch: 15 Global Step: 186400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:44,930-Speed 3022.72 samples/sec Loss 2.5658 LearningRate 0.0062 Epoch: 15 Global Step: 186410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:48,350-Speed 2994.90 samples/sec Loss 2.5354 LearningRate 0.0062 Epoch: 15 Global Step: 186420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:51,745-Speed 3017.92 samples/sec Loss 2.5427 LearningRate 0.0062 Epoch: 15 Global Step: 186430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:55,116-Speed 3037.97 samples/sec Loss 2.5589 LearningRate 0.0062 Epoch: 15 Global Step: 186440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:22:58,478-Speed 3047.26 samples/sec Loss 2.4892 LearningRate 0.0062 Epoch: 15 Global Step: 186450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:01,865-Speed 3024.47 samples/sec Loss 2.6255 LearningRate 0.0062 Epoch: 15 Global Step: 186460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:05,249-Speed 3026.34 samples/sec Loss 2.5667 LearningRate 0.0062 Epoch: 15 Global Step: 186470 Fp16 
Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:08,626-Speed 3033.25 samples/sec Loss 2.5607 LearningRate 0.0062 Epoch: 15 Global Step: 186480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:12,067-Speed 2977.15 samples/sec Loss 2.5827 LearningRate 0.0062 Epoch: 15 Global Step: 186490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:15,381-Speed 3090.76 samples/sec Loss 2.5766 LearningRate 0.0062 Epoch: 15 Global Step: 186500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:18,771-Speed 3021.16 samples/sec Loss 2.5400 LearningRate 0.0062 Epoch: 15 Global Step: 186510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:22,124-Speed 3054.78 samples/sec Loss 2.6483 LearningRate 0.0062 Epoch: 15 Global Step: 186520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:25,501-Speed 3032.64 samples/sec Loss 2.5840 LearningRate 0.0062 Epoch: 15 Global Step: 186530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:28,933-Speed 2985.37 samples/sec Loss 2.5566 LearningRate 0.0062 Epoch: 15 Global Step: 186540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:32,306-Speed 3036.39 samples/sec Loss 2.5131 LearningRate 0.0062 Epoch: 15 Global Step: 186550 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:35,730-Speed 2991.32 samples/sec Loss 2.5415 LearningRate 0.0062 Epoch: 15 Global Step: 186560 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:39,121-Speed 3020.82 samples/sec Loss 2.6124 LearningRate 0.0062 Epoch: 15 Global Step: 186570 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:23:42,507-Speed 3025.31 samples/sec Loss 2.5562 LearningRate 0.0062 Epoch: 15 Global Step: 186580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:23:45,854-Speed 3060.57 samples/sec Loss 2.5923 LearningRate 0.0062 Epoch: 15 Global Step: 186590 Fp16 Grad Scale: 16384 Required: 6 hours 
Training: 2022-04-27 19:23:49,347-Speed 2932.24 samples/sec Loss 2.5442 LearningRate 0.0062 Epoch: 15 Global Step: 186600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:23:52,740-Speed 3018.98 samples/sec Loss 2.6755 LearningRate 0.0062 Epoch: 15 Global Step: 186610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:23:56,188-Speed 2970.57 samples/sec Loss 2.6054 LearningRate 0.0062 Epoch: 15 Global Step: 186620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:23:59,629-Speed 2977.14 samples/sec Loss 2.6216 LearningRate 0.0062 Epoch: 15 Global Step: 186630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:03,103-Speed 2949.14 samples/sec Loss 2.5277 LearningRate 0.0062 Epoch: 15 Global Step: 186640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:06,581-Speed 2944.72 samples/sec Loss 2.5714 LearningRate 0.0062 Epoch: 15 Global Step: 186650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:10,643-Speed 2521.46 samples/sec Loss 2.6195 LearningRate 0.0062 Epoch: 15 Global Step: 186660 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:14,435-Speed 2701.76 samples/sec Loss 2.6524 LearningRate 0.0062 Epoch: 15 Global Step: 186670 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:18,423-Speed 2568.37 samples/sec Loss 2.5897 LearningRate 0.0062 Epoch: 15 Global Step: 186680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:24:22,151-Speed 2748.54 samples/sec Loss 2.5928 LearningRate 0.0062 Epoch: 15 Global Step: 186690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:24:25,482-Speed 3075.43 samples/sec Loss 2.6192 LearningRate 0.0062 Epoch: 15 Global Step: 186700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:24:29,492-Speed 2554.46 samples/sec Loss 2.6320 LearningRate 0.0062 Epoch: 15 Global Step: 186710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:24:32,837-Speed 
3061.37 samples/sec Loss 2.6746 LearningRate 0.0062 Epoch: 15 Global Step: 186720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:36,185-Speed 3059.85 samples/sec Loss 2.6434 LearningRate 0.0062 Epoch: 15 Global Step: 186730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:39,592-Speed 3006.36 samples/sec Loss 2.6170 LearningRate 0.0062 Epoch: 15 Global Step: 186740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:43,016-Speed 2991.65 samples/sec Loss 2.5857 LearningRate 0.0062 Epoch: 15 Global Step: 186750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:46,439-Speed 2992.33 samples/sec Loss 2.6328 LearningRate 0.0062 Epoch: 15 Global Step: 186760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:49,813-Speed 3036.28 samples/sec Loss 2.6184 LearningRate 0.0062 Epoch: 15 Global Step: 186770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:53,209-Speed 3016.07 samples/sec Loss 2.6576 LearningRate 0.0062 Epoch: 15 Global Step: 186780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:56,606-Speed 3015.17 samples/sec Loss 2.5563 LearningRate 0.0062 Epoch: 15 Global Step: 186790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:24:59,985-Speed 3031.41 samples/sec Loss 2.6394 LearningRate 0.0062 Epoch: 15 Global Step: 186800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:03,462-Speed 2945.46 samples/sec Loss 2.6533 LearningRate 0.0062 Epoch: 15 Global Step: 186810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:06,827-Speed 3044.63 samples/sec Loss 2.6508 LearningRate 0.0061 Epoch: 15 Global Step: 186820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:10,231-Speed 3008.78 samples/sec Loss 2.6105 LearningRate 0.0061 Epoch: 15 Global Step: 186830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:13,688-Speed 2963.28 samples/sec Loss 2.6092 
LearningRate 0.0061 Epoch: 15 Global Step: 186840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:17,056-Speed 3040.96 samples/sec Loss 2.6081 LearningRate 0.0061 Epoch: 15 Global Step: 186850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:20,456-Speed 3012.45 samples/sec Loss 2.6253 LearningRate 0.0061 Epoch: 15 Global Step: 186860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:23,805-Speed 3059.07 samples/sec Loss 2.5562 LearningRate 0.0061 Epoch: 15 Global Step: 186870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:27,210-Speed 3008.15 samples/sec Loss 2.6361 LearningRate 0.0061 Epoch: 15 Global Step: 186880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:30,659-Speed 2969.70 samples/sec Loss 2.6448 LearningRate 0.0061 Epoch: 15 Global Step: 186890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:25:34,069-Speed 3003.81 samples/sec Loss 2.6746 LearningRate 0.0061 Epoch: 15 Global Step: 186900 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:37,445-Speed 3034.23 samples/sec Loss 2.6480 LearningRate 0.0061 Epoch: 15 Global Step: 186910 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:40,836-Speed 3020.87 samples/sec Loss 2.7014 LearningRate 0.0061 Epoch: 15 Global Step: 186920 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:44,257-Speed 2993.90 samples/sec Loss 2.7329 LearningRate 0.0061 Epoch: 15 Global Step: 186930 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:47,731-Speed 2948.92 samples/sec Loss 2.6036 LearningRate 0.0061 Epoch: 15 Global Step: 186940 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:51,149-Speed 2996.21 samples/sec Loss 2.6404 LearningRate 0.0061 Epoch: 15 Global Step: 186950 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:54,529-Speed 3031.18 samples/sec Loss 2.6403 LearningRate 0.0061 Epoch: 15 Global Step: 
186960 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:25:57,994-Speed 2955.91 samples/sec Loss 2.6116 LearningRate 0.0061 Epoch: 15 Global Step: 186970 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:26:01,403-Speed 3004.11 samples/sec Loss 2.6131 LearningRate 0.0061 Epoch: 15 Global Step: 186980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:26:04,850-Speed 2971.84 samples/sec Loss 2.6404 LearningRate 0.0061 Epoch: 15 Global Step: 186990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:26:08,204-Speed 3053.84 samples/sec Loss 2.6895 LearningRate 0.0061 Epoch: 15 Global Step: 187000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:11,638-Speed 2983.27 samples/sec Loss 2.6651 LearningRate 0.0061 Epoch: 15 Global Step: 187010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:15,006-Speed 3041.08 samples/sec Loss 2.6056 LearningRate 0.0061 Epoch: 15 Global Step: 187020 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:19,001-Speed 2563.64 samples/sec Loss 2.6387 LearningRate 0.0061 Epoch: 15 Global Step: 187030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:22,349-Speed 3059.72 samples/sec Loss 2.6490 LearningRate 0.0061 Epoch: 15 Global Step: 187040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:25,777-Speed 2988.15 samples/sec Loss 2.7026 LearningRate 0.0061 Epoch: 15 Global Step: 187050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:29,783-Speed 2556.95 samples/sec Loss 2.6205 LearningRate 0.0061 Epoch: 15 Global Step: 187060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:33,198-Speed 2998.29 samples/sec Loss 2.6593 LearningRate 0.0061 Epoch: 15 Global Step: 187070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:36,555-Speed 3051.70 samples/sec Loss 2.7245 LearningRate 0.0061 Epoch: 15 Global Step: 187080 Fp16 Grad Scale: 32768 Required: 6 
hours Training: 2022-04-27 19:26:39,938-Speed 3027.95 samples/sec Loss 2.7041 LearningRate 0.0061 Epoch: 15 Global Step: 187090 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:43,315-Speed 3034.50 samples/sec Loss 2.6878 LearningRate 0.0061 Epoch: 15 Global Step: 187100 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:26:46,717-Speed 3011.09 samples/sec Loss 2.6693 LearningRate 0.0061 Epoch: 15 Global Step: 187110 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:50,216-Speed 2927.18 samples/sec Loss 2.6720 LearningRate 0.0061 Epoch: 15 Global Step: 187120 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:53,607-Speed 3020.81 samples/sec Loss 2.6143 LearningRate 0.0061 Epoch: 15 Global Step: 187130 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:26:56,968-Speed 3047.89 samples/sec Loss 2.6269 LearningRate 0.0061 Epoch: 15 Global Step: 187140 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:00,403-Speed 2982.39 samples/sec Loss 2.6708 LearningRate 0.0061 Epoch: 15 Global Step: 187150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:03,801-Speed 3014.26 samples/sec Loss 2.6138 LearningRate 0.0061 Epoch: 15 Global Step: 187160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:07,196-Speed 3016.80 samples/sec Loss 2.7614 LearningRate 0.0061 Epoch: 15 Global Step: 187170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:10,556-Speed 3048.70 samples/sec Loss 2.7156 LearningRate 0.0061 Epoch: 15 Global Step: 187180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:13,913-Speed 3051.54 samples/sec Loss 2.6746 LearningRate 0.0061 Epoch: 15 Global Step: 187190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:17,271-Speed 3049.62 samples/sec Loss 2.5863 LearningRate 0.0061 Epoch: 15 Global Step: 187200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:27:20,678-Speed 3006.55 samples/sec Loss 2.6192 LearningRate 0.0061 Epoch: 15 Global Step: 187210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:24,089-Speed 3003.67 samples/sec Loss 2.7005 LearningRate 0.0061 Epoch: 15 Global Step: 187220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:27,535-Speed 2971.68 samples/sec Loss 2.6408 LearningRate 0.0061 Epoch: 15 Global Step: 187230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:30,961-Speed 2989.69 samples/sec Loss 2.6259 LearningRate 0.0061 Epoch: 15 Global Step: 187240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:34,388-Speed 2989.17 samples/sec Loss 2.7086 LearningRate 0.0061 Epoch: 15 Global Step: 187250 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:37,776-Speed 3022.99 samples/sec Loss 2.7852 LearningRate 0.0061 Epoch: 15 Global Step: 187260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:41,094-Speed 3087.11 samples/sec Loss 2.6973 LearningRate 0.0061 Epoch: 15 Global Step: 187270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:44,484-Speed 3021.70 samples/sec Loss 2.7074 LearningRate 0.0061 Epoch: 15 Global Step: 187280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:47,854-Speed 3039.22 samples/sec Loss 2.7144 LearningRate 0.0061 Epoch: 15 Global Step: 187290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:51,248-Speed 3017.96 samples/sec Loss 2.6552 LearningRate 0.0061 Epoch: 15 Global Step: 187300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:27:54,637-Speed 3022.52 samples/sec Loss 2.6280 LearningRate 0.0061 Epoch: 15 Global Step: 187310 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:27:57,965-Speed 3077.15 samples/sec Loss 2.6616 LearningRate 0.0060 Epoch: 15 Global Step: 187320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:01,321-Speed 3052.03 samples/sec Loss 
2.6632 LearningRate 0.0060 Epoch: 15 Global Step: 187330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:04,719-Speed 3014.29 samples/sec Loss 2.6667 LearningRate 0.0060 Epoch: 15 Global Step: 187340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:08,220-Speed 2926.06 samples/sec Loss 2.7062 LearningRate 0.0060 Epoch: 15 Global Step: 187350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:11,635-Speed 2998.69 samples/sec Loss 2.7247 LearningRate 0.0060 Epoch: 15 Global Step: 187360 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:14,993-Speed 3050.77 samples/sec Loss 2.7056 LearningRate 0.0060 Epoch: 15 Global Step: 187370 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:18,376-Speed 3028.03 samples/sec Loss 2.6992 LearningRate 0.0060 Epoch: 15 Global Step: 187380 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:21,740-Speed 3045.08 samples/sec Loss 2.7527 LearningRate 0.0060 Epoch: 15 Global Step: 187390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:28:25,090-Speed 3057.65 samples/sec Loss 2.6462 LearningRate 0.0060 Epoch: 15 Global Step: 187400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:28:28,433-Speed 3064.33 samples/sec Loss 2.6736 LearningRate 0.0060 Epoch: 15 Global Step: 187410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:28:31,831-Speed 3014.32 samples/sec Loss 2.7263 LearningRate 0.0060 Epoch: 15 Global Step: 187420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:28:35,189-Speed 3049.61 samples/sec Loss 2.7938 LearningRate 0.0060 Epoch: 15 Global Step: 187430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:28:38,558-Speed 3041.05 samples/sec Loss 2.7292 LearningRate 0.0060 Epoch: 15 Global Step: 187440 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:28:41,949-Speed 3019.81 samples/sec Loss 2.6928 LearningRate 0.0060 Epoch: 15 Global 
Step: 187450 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:28:45,304-Speed 3053.49 samples/sec Loss 2.7281 LearningRate 0.0060 Epoch: 15 Global Step: 187460 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:28:48,757-Speed 2965.98 samples/sec Loss 2.6885 LearningRate 0.0060 Epoch: 15 Global Step: 187470 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:28:52,116-Speed 3049.64 samples/sec Loss 2.7348 LearningRate 0.0060 Epoch: 15 Global Step: 187480 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:28:55,490-Speed 3036.31 samples/sec Loss 2.7311 LearningRate 0.0060 Epoch: 15 Global Step: 187490 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:28:58,833-Speed 3064.46 samples/sec Loss 2.7314 LearningRate 0.0060 Epoch: 15 Global Step: 187500 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:29:02,187-Speed 3053.28 samples/sec Loss 2.7072 LearningRate 0.0060 Epoch: 15 Global Step: 187510 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:29:05,555-Speed 3041.29 samples/sec Loss 2.7037 LearningRate 0.0060 Epoch: 15 Global Step: 187520 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:29:08,941-Speed 3026.44 samples/sec Loss 2.7928 LearningRate 0.0060 Epoch: 15 Global Step: 187530 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:29:12,376-Speed 2981.05 samples/sec Loss 2.6898 LearningRate 0.0060 Epoch: 15 Global Step: 187540 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:29:15,789-Speed 3001.18 samples/sec Loss 2.7467 LearningRate 0.0060 Epoch: 15 Global Step: 187550 Fp16 Grad Scale: 8192 Required: 6 hours Training: 2022-04-27 19:29:19,237-Speed 2971.01 samples/sec Loss 2.7152 LearningRate 0.0060 Epoch: 15 Global Step: 187560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:22,624-Speed 3024.48 samples/sec Loss 2.7186 LearningRate 0.0060 Epoch: 15 Global Step: 187570 Fp16 Grad Scale: 16384 Required: 6 hours 
Training: 2022-04-27 19:29:26,034-Speed 3003.49 samples/sec Loss 2.8079 LearningRate 0.0060 Epoch: 15 Global Step: 187580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:29,397-Speed 3046.17 samples/sec Loss 2.7236 LearningRate 0.0060 Epoch: 15 Global Step: 187590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:32,708-Speed 3092.99 samples/sec Loss 2.7054 LearningRate 0.0060 Epoch: 15 Global Step: 187600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:36,114-Speed 3007.57 samples/sec Loss 2.7483 LearningRate 0.0060 Epoch: 15 Global Step: 187610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:39,558-Speed 2973.58 samples/sec Loss 2.7585 LearningRate 0.0060 Epoch: 15 Global Step: 187620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:42,918-Speed 3048.51 samples/sec Loss 2.7133 LearningRate 0.0060 Epoch: 15 Global Step: 187630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:46,367-Speed 2970.24 samples/sec Loss 2.6792 LearningRate 0.0060 Epoch: 15 Global Step: 187640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:49,732-Speed 3043.57 samples/sec Loss 2.6685 LearningRate 0.0060 Epoch: 15 Global Step: 187650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:29:53,171-Speed 2978.84 samples/sec Loss 2.7264 LearningRate 0.0060 Epoch: 15 Global Step: 187660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:29:56,595-Speed 2991.77 samples/sec Loss 2.6828 LearningRate 0.0060 Epoch: 15 Global Step: 187670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:29:59,959-Speed 3045.01 samples/sec Loss 2.7938 LearningRate 0.0060 Epoch: 15 Global Step: 187680 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:03,310-Speed 3057.00 samples/sec Loss 2.7071 LearningRate 0.0060 Epoch: 15 Global Step: 187690 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:06,640-Speed 
3075.63 samples/sec Loss 2.7805 LearningRate 0.0060 Epoch: 15 Global Step: 187700 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:10,039-Speed 3013.89 samples/sec Loss 2.7935 LearningRate 0.0060 Epoch: 15 Global Step: 187710 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:13,397-Speed 3050.48 samples/sec Loss 2.7229 LearningRate 0.0060 Epoch: 15 Global Step: 187720 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:16,756-Speed 3049.01 samples/sec Loss 2.7339 LearningRate 0.0060 Epoch: 15 Global Step: 187730 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:20,127-Speed 3038.92 samples/sec Loss 2.7740 LearningRate 0.0060 Epoch: 15 Global Step: 187740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:23,537-Speed 3003.77 samples/sec Loss 2.7178 LearningRate 0.0060 Epoch: 15 Global Step: 187750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:26,972-Speed 2981.97 samples/sec Loss 2.7510 LearningRate 0.0060 Epoch: 15 Global Step: 187760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:30,380-Speed 3006.90 samples/sec Loss 2.6870 LearningRate 0.0060 Epoch: 15 Global Step: 187770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:30:33,837-Speed 2963.14 samples/sec Loss 2.7640 LearningRate 0.0060 Epoch: 15 Global Step: 187780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:37,178-Speed 3066.34 samples/sec Loss 2.7220 LearningRate 0.0060 Epoch: 15 Global Step: 187790 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:40,591-Speed 3001.08 samples/sec Loss 2.8013 LearningRate 0.0060 Epoch: 15 Global Step: 187800 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:44,022-Speed 2984.98 samples/sec Loss 2.7935 LearningRate 0.0060 Epoch: 15 Global Step: 187810 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:47,335-Speed 3091.89 samples/sec Loss 2.7501 
LearningRate 0.0060 Epoch: 15 Global Step: 187820 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:50,775-Speed 2978.10 samples/sec Loss 2.7241 LearningRate 0.0059 Epoch: 15 Global Step: 187830 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:54,146-Speed 3038.57 samples/sec Loss 2.8471 LearningRate 0.0059 Epoch: 15 Global Step: 187840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:30:57,506-Speed 3048.06 samples/sec Loss 2.7652 LearningRate 0.0059 Epoch: 15 Global Step: 187850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:00,901-Speed 3016.89 samples/sec Loss 2.7216 LearningRate 0.0059 Epoch: 15 Global Step: 187860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:04,245-Speed 3063.50 samples/sec Loss 2.7138 LearningRate 0.0059 Epoch: 15 Global Step: 187870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:07,563-Speed 3086.93 samples/sec Loss 2.8272 LearningRate 0.0059 Epoch: 15 Global Step: 187880 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:31:10,916-Speed 3055.06 samples/sec Loss 2.7993 LearningRate 0.0059 Epoch: 15 Global Step: 187890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:14,280-Speed 3044.56 samples/sec Loss 2.7942 LearningRate 0.0059 Epoch: 15 Global Step: 187900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:17,657-Speed 3033.42 samples/sec Loss 2.8220 LearningRate 0.0059 Epoch: 15 Global Step: 187910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:21,084-Speed 2988.72 samples/sec Loss 2.7778 LearningRate 0.0059 Epoch: 15 Global Step: 187920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:24,579-Speed 2931.38 samples/sec Loss 2.8601 LearningRate 0.0059 Epoch: 15 Global Step: 187930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:27,967-Speed 3022.94 samples/sec Loss 2.7977 LearningRate 0.0059 Epoch: 15 Global Step: 
187940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:31,388-Speed 2994.01 samples/sec Loss 2.7777 LearningRate 0.0059 Epoch: 15 Global Step: 187950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:34,729-Speed 3066.56 samples/sec Loss 2.7568 LearningRate 0.0059 Epoch: 15 Global Step: 187960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:38,116-Speed 3024.38 samples/sec Loss 2.7863 LearningRate 0.0059 Epoch: 15 Global Step: 187970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:41,568-Speed 2967.21 samples/sec Loss 2.8627 LearningRate 0.0059 Epoch: 15 Global Step: 187980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:31:44,898-Speed 3075.50 samples/sec Loss 2.7777 LearningRate 0.0059 Epoch: 15 Global Step: 187990 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:31:48,234-Speed 3070.35 samples/sec Loss 2.6840 LearningRate 0.0059 Epoch: 15 Global Step: 188000 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:31:51,579-Speed 3062.19 samples/sec Loss 2.7223 LearningRate 0.0059 Epoch: 15 Global Step: 188010 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:31:55,019-Speed 2977.84 samples/sec Loss 2.7462 LearningRate 0.0059 Epoch: 15 Global Step: 188020 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:31:58,480-Speed 2959.32 samples/sec Loss 2.7592 LearningRate 0.0059 Epoch: 15 Global Step: 188030 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:32:01,776-Speed 3107.98 samples/sec Loss 2.6994 LearningRate 0.0059 Epoch: 15 Global Step: 188040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:32:05,152-Speed 3034.11 samples/sec Loss 2.8597 LearningRate 0.0059 Epoch: 15 Global Step: 188050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:08,564-Speed 3001.84 samples/sec Loss 2.8398 LearningRate 0.0059 Epoch: 15 Global Step: 188060 Fp16 Grad Scale: 16384 Required: 6 
hours Training: 2022-04-27 19:32:11,904-Speed 3066.58 samples/sec Loss 2.7211 LearningRate 0.0059 Epoch: 15 Global Step: 188070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:15,310-Speed 3007.56 samples/sec Loss 2.7584 LearningRate 0.0059 Epoch: 15 Global Step: 188080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:18,710-Speed 3012.33 samples/sec Loss 2.7833 LearningRate 0.0059 Epoch: 15 Global Step: 188090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:22,043-Speed 3074.11 samples/sec Loss 2.7978 LearningRate 0.0059 Epoch: 15 Global Step: 188100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:25,416-Speed 3036.26 samples/sec Loss 2.8018 LearningRate 0.0059 Epoch: 15 Global Step: 188110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:28,826-Speed 3004.53 samples/sec Loss 2.7610 LearningRate 0.0059 Epoch: 15 Global Step: 188120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:32,258-Speed 2984.21 samples/sec Loss 2.8701 LearningRate 0.0059 Epoch: 15 Global Step: 188130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:35,681-Speed 2991.81 samples/sec Loss 2.7274 LearningRate 0.0059 Epoch: 15 Global Step: 188140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:32:39,054-Speed 3036.86 samples/sec Loss 2.8047 LearningRate 0.0059 Epoch: 15 Global Step: 188150 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:32:42,455-Speed 3012.40 samples/sec Loss 2.7376 LearningRate 0.0059 Epoch: 15 Global Step: 188160 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:32:45,913-Speed 2961.68 samples/sec Loss 2.8719 LearningRate 0.0059 Epoch: 15 Global Step: 188170 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:32:49,329-Speed 2998.30 samples/sec Loss 2.8023 LearningRate 0.0059 Epoch: 15 Global Step: 188180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:32:52,741-Speed 3002.03 samples/sec Loss 2.8410 LearningRate 0.0059 Epoch: 15 Global Step: 188190 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:32:56,151-Speed 3003.61 samples/sec Loss 2.8474 LearningRate 0.0059 Epoch: 15 Global Step: 188200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:32:59,502-Speed 3057.29 samples/sec Loss 2.7496 LearningRate 0.0059 Epoch: 15 Global Step: 188210 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:02,885-Speed 3027.11 samples/sec Loss 2.7745 LearningRate 0.0059 Epoch: 15 Global Step: 188220 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:06,370-Speed 2939.24 samples/sec Loss 2.7959 LearningRate 0.0059 Epoch: 15 Global Step: 188230 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:09,806-Speed 2981.66 samples/sec Loss 2.7995 LearningRate 0.0059 Epoch: 15 Global Step: 188240 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:13,244-Speed 2979.43 samples/sec Loss 2.8307 LearningRate 0.0059 Epoch: 15 Global Step: 188250 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:33:16,632-Speed 3022.84 samples/sec Loss 2.7693 LearningRate 0.0059 Epoch: 15 Global Step: 188260 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:19,993-Speed 3047.93 samples/sec Loss 2.8084 LearningRate 0.0059 Epoch: 15 Global Step: 188270 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:23,334-Speed 3066.23 samples/sec Loss 2.8010 LearningRate 0.0059 Epoch: 15 Global Step: 188280 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:26,716-Speed 3028.80 samples/sec Loss 2.8743 LearningRate 0.0059 Epoch: 15 Global Step: 188290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:30,078-Speed 3046.04 samples/sec Loss 2.8495 LearningRate 0.0059 Epoch: 15 Global Step: 188300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:33,422-Speed 3062.97 samples/sec Loss 
2.8807 LearningRate 0.0059 Epoch: 15 Global Step: 188310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:36,801-Speed 3031.42 samples/sec Loss 2.8195 LearningRate 0.0059 Epoch: 15 Global Step: 188320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:40,209-Speed 3006.04 samples/sec Loss 2.8565 LearningRate 0.0059 Epoch: 15 Global Step: 188330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:43,616-Speed 3006.73 samples/sec Loss 2.8439 LearningRate 0.0058 Epoch: 15 Global Step: 188340 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:47,036-Speed 2994.33 samples/sec Loss 2.7573 LearningRate 0.0058 Epoch: 15 Global Step: 188350 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:33:50,453-Speed 2998.19 samples/sec Loss 2.8887 LearningRate 0.0058 Epoch: 15 Global Step: 188360 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:33:53,826-Speed 3036.66 samples/sec Loss 2.8700 LearningRate 0.0058 Epoch: 15 Global Step: 188370 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:33:57,241-Speed 2999.85 samples/sec Loss 2.7705 LearningRate 0.0058 Epoch: 15 Global Step: 188380 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:34:00,647-Speed 3007.53 samples/sec Loss 2.8108 LearningRate 0.0058 Epoch: 15 Global Step: 188390 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:03,984-Speed 3069.18 samples/sec Loss 2.8094 LearningRate 0.0058 Epoch: 15 Global Step: 188400 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:07,362-Speed 3031.33 samples/sec Loss 2.9225 LearningRate 0.0058 Epoch: 15 Global Step: 188410 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:10,714-Speed 3055.99 samples/sec Loss 2.7835 LearningRate 0.0058 Epoch: 15 Global Step: 188420 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:14,136-Speed 2993.10 samples/sec Loss 2.9056 LearningRate 0.0058 Epoch: 15 Global 
Step: 188430 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:17,560-Speed 2991.29 samples/sec Loss 2.8169 LearningRate 0.0058 Epoch: 15 Global Step: 188440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:21,068-Speed 2920.70 samples/sec Loss 2.8589 LearningRate 0.0058 Epoch: 15 Global Step: 188450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:24,474-Speed 3006.59 samples/sec Loss 2.7661 LearningRate 0.0058 Epoch: 15 Global Step: 188460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:27,880-Speed 3007.42 samples/sec Loss 2.8469 LearningRate 0.0058 Epoch: 15 Global Step: 188470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:31,310-Speed 2987.06 samples/sec Loss 2.8291 LearningRate 0.0058 Epoch: 15 Global Step: 188480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:34,740-Speed 2985.84 samples/sec Loss 2.8203 LearningRate 0.0058 Epoch: 15 Global Step: 188490 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:34:38,126-Speed 3024.86 samples/sec Loss 2.8309 LearningRate 0.0058 Epoch: 15 Global Step: 188500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:41,474-Speed 3059.32 samples/sec Loss 2.7895 LearningRate 0.0058 Epoch: 15 Global Step: 188510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:44,881-Speed 3006.55 samples/sec Loss 2.8098 LearningRate 0.0058 Epoch: 15 Global Step: 188520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:48,272-Speed 3020.45 samples/sec Loss 2.9114 LearningRate 0.0058 Epoch: 15 Global Step: 188530 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:51,712-Speed 2977.65 samples/sec Loss 2.8851 LearningRate 0.0058 Epoch: 15 Global Step: 188540 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:34:55,110-Speed 3014.09 samples/sec Loss 2.8535 LearningRate 0.0058 Epoch: 15 Global Step: 188550 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 19:34:58,452-Speed 3065.63 samples/sec Loss 2.8205 LearningRate 0.0058 Epoch: 15 Global Step: 188560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:01,848-Speed 3016.14 samples/sec Loss 2.7928 LearningRate 0.0058 Epoch: 15 Global Step: 188570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:05,257-Speed 3003.91 samples/sec Loss 2.8725 LearningRate 0.0058 Epoch: 15 Global Step: 188580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:08,636-Speed 3031.52 samples/sec Loss 2.8725 LearningRate 0.0058 Epoch: 15 Global Step: 188590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:12,092-Speed 2964.58 samples/sec Loss 2.8257 LearningRate 0.0058 Epoch: 15 Global Step: 188600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:15,517-Speed 2990.06 samples/sec Loss 2.8404 LearningRate 0.0058 Epoch: 15 Global Step: 188610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:18,934-Speed 2997.74 samples/sec Loss 2.8032 LearningRate 0.0058 Epoch: 15 Global Step: 188620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:22,342-Speed 3006.11 samples/sec Loss 2.8620 LearningRate 0.0058 Epoch: 15 Global Step: 188630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:25,828-Speed 2937.73 samples/sec Loss 2.8324 LearningRate 0.0058 Epoch: 15 Global Step: 188640 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:29,281-Speed 2966.81 samples/sec Loss 2.8108 LearningRate 0.0058 Epoch: 15 Global Step: 188650 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:35:32,692-Speed 3002.78 samples/sec Loss 2.8407 LearningRate 0.0058 Epoch: 15 Global Step: 188660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:35:36,097-Speed 3008.17 samples/sec Loss 2.7965 LearningRate 0.0058 Epoch: 15 Global Step: 188670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:35:39,574-Speed 2946.17 samples/sec Loss 2.8462 LearningRate 0.0058 Epoch: 15 Global Step: 188680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:35:42,982-Speed 3005.71 samples/sec Loss 2.7930 LearningRate 0.0058 Epoch: 15 Global Step: 188690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:35:46,366-Speed 3027.03 samples/sec Loss 2.8059 LearningRate 0.0058 Epoch: 15 Global Step: 188700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:35:49,810-Speed 2973.60 samples/sec Loss 2.8608 LearningRate 0.0058 Epoch: 15 Global Step: 188710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:35:53,263-Speed 2966.70 samples/sec Loss 2.9484 LearningRate 0.0058 Epoch: 15 Global Step: 188720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:35:56,699-Speed 2980.98 samples/sec Loss 2.8248 LearningRate 0.0058 Epoch: 15 Global Step: 188730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:00,128-Speed 2987.61 samples/sec Loss 2.8993 LearningRate 0.0058 Epoch: 15 Global Step: 188740 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:03,606-Speed 2944.82 samples/sec Loss 2.9431 LearningRate 0.0058 Epoch: 15 Global Step: 188750 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:06,999-Speed 3018.29 samples/sec Loss 2.8852 LearningRate 0.0058 Epoch: 15 Global Step: 188760 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:10,396-Speed 3015.50 samples/sec Loss 2.7891 LearningRate 0.0058 Epoch: 15 Global Step: 188770 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:13,806-Speed 3004.54 samples/sec Loss 2.7934 LearningRate 0.0058 Epoch: 15 Global Step: 188780 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:17,219-Speed 3001.08 samples/sec Loss 2.9069 LearningRate 0.0058 Epoch: 15 Global Step: 188790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:20,618-Speed 3014.16 samples/sec Loss 
2.9364 LearningRate 0.0058 Epoch: 15 Global Step: 188800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:23,962-Speed 3064.49 samples/sec Loss 2.8079 LearningRate 0.0058 Epoch: 15 Global Step: 188810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:27,358-Speed 3015.86 samples/sec Loss 2.9031 LearningRate 0.0058 Epoch: 15 Global Step: 188820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:30,772-Speed 3003.30 samples/sec Loss 2.8029 LearningRate 0.0058 Epoch: 15 Global Step: 188830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:36:34,112-Speed 3066.44 samples/sec Loss 2.9220 LearningRate 0.0058 Epoch: 15 Global Step: 188840 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:37,529-Speed 2996.89 samples/sec Loss 2.8663 LearningRate 0.0058 Epoch: 15 Global Step: 188850 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:40,905-Speed 3034.12 samples/sec Loss 2.9010 LearningRate 0.0057 Epoch: 15 Global Step: 188860 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:44,284-Speed 3032.03 samples/sec Loss 2.9063 LearningRate 0.0057 Epoch: 15 Global Step: 188870 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:47,646-Speed 3046.62 samples/sec Loss 2.8612 LearningRate 0.0057 Epoch: 15 Global Step: 188880 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:51,059-Speed 3001.13 samples/sec Loss 2.8761 LearningRate 0.0057 Epoch: 15 Global Step: 188890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:54,578-Speed 2910.63 samples/sec Loss 2.8743 LearningRate 0.0057 Epoch: 15 Global Step: 188900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:36:57,915-Speed 3069.34 samples/sec Loss 2.9206 LearningRate 0.0057 Epoch: 15 Global Step: 188910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:37:01,276-Speed 3048.03 samples/sec Loss 2.9271 LearningRate 0.0057 Epoch: 15 Global 
Step: 188920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:37:04,672-Speed 3016.42 samples/sec Loss 2.9515 LearningRate 0.0057 Epoch: 15 Global Step: 188930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:37:08,043-Speed 3038.52 samples/sec Loss 2.9252 LearningRate 0.0057 Epoch: 15 Global Step: 188940 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:37:11,498-Speed 2965.68 samples/sec Loss 2.8529 LearningRate 0.0057 Epoch: 15 Global Step: 188950 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:37:14,876-Speed 3031.69 samples/sec Loss 2.8974 LearningRate 0.0057 Epoch: 15 Global Step: 188960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:37:18,256-Speed 3030.36 samples/sec Loss 2.8806 LearningRate 0.0057 Epoch: 15 Global Step: 188970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:37:21,710-Speed 2966.03 samples/sec Loss 2.8755 LearningRate 0.0057 Epoch: 15 Global Step: 188980 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:25,194-Speed 2939.55 samples/sec Loss 2.9018 LearningRate 0.0057 Epoch: 15 Global Step: 188990 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:28,618-Speed 2992.09 samples/sec Loss 2.8960 LearningRate 0.0057 Epoch: 15 Global Step: 189000 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:32,008-Speed 3022.02 samples/sec Loss 2.8511 LearningRate 0.0057 Epoch: 15 Global Step: 189010 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:35,460-Speed 2967.18 samples/sec Loss 2.9116 LearningRate 0.0057 Epoch: 15 Global Step: 189020 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:38,920-Speed 2961.38 samples/sec Loss 2.8689 LearningRate 0.0057 Epoch: 15 Global Step: 189030 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:42,297-Speed 3033.02 samples/sec Loss 2.8610 LearningRate 0.0057 Epoch: 15 Global Step: 189040 Fp16 Grad Scale: 16384 
Required: 6 hours Training: 2022-04-27 19:37:45,744-Speed 2972.10 samples/sec Loss 2.8480 LearningRate 0.0057 Epoch: 15 Global Step: 189050 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:49,165-Speed 2993.90 samples/sec Loss 2.8895 LearningRate 0.0057 Epoch: 15 Global Step: 189060 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:52,586-Speed 2994.48 samples/sec Loss 2.8941 LearningRate 0.0057 Epoch: 15 Global Step: 189070 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:55,991-Speed 3007.82 samples/sec Loss 2.8930 LearningRate 0.0057 Epoch: 15 Global Step: 189080 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:37:59,379-Speed 3023.01 samples/sec Loss 2.9874 LearningRate 0.0057 Epoch: 15 Global Step: 189090 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:02,818-Speed 2978.98 samples/sec Loss 2.9016 LearningRate 0.0057 Epoch: 15 Global Step: 189100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:06,135-Speed 3087.80 samples/sec Loss 2.8122 LearningRate 0.0057 Epoch: 15 Global Step: 189110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:09,586-Speed 2968.05 samples/sec Loss 2.9039 LearningRate 0.0057 Epoch: 15 Global Step: 189120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:12,984-Speed 3014.09 samples/sec Loss 2.8560 LearningRate 0.0057 Epoch: 15 Global Step: 189130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:16,396-Speed 3002.52 samples/sec Loss 2.9040 LearningRate 0.0057 Epoch: 15 Global Step: 189140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:19,714-Speed 3086.55 samples/sec Loss 2.8507 LearningRate 0.0057 Epoch: 15 Global Step: 189150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:23,126-Speed 3001.90 samples/sec Loss 2.8619 LearningRate 0.0057 Epoch: 15 Global Step: 189160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
19:38:26,510-Speed 3027.19 samples/sec Loss 2.9056 LearningRate 0.0057 Epoch: 15 Global Step: 189170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:29,875-Speed 3044.18 samples/sec Loss 2.8515 LearningRate 0.0057 Epoch: 15 Global Step: 189180 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:38:33,254-Speed 3030.78 samples/sec Loss 2.8657 LearningRate 0.0057 Epoch: 15 Global Step: 189190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:36,703-Speed 2969.61 samples/sec Loss 2.9267 LearningRate 0.0057 Epoch: 15 Global Step: 189200 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:40,070-Speed 3042.47 samples/sec Loss 2.8798 LearningRate 0.0057 Epoch: 15 Global Step: 189210 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:43,487-Speed 2997.70 samples/sec Loss 2.8434 LearningRate 0.0057 Epoch: 15 Global Step: 189220 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:46,838-Speed 3057.12 samples/sec Loss 2.9664 LearningRate 0.0057 Epoch: 15 Global Step: 189230 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:50,217-Speed 3031.11 samples/sec Loss 2.8148 LearningRate 0.0057 Epoch: 15 Global Step: 189240 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:53,584-Speed 3041.71 samples/sec Loss 2.8752 LearningRate 0.0057 Epoch: 15 Global Step: 189250 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:38:56,930-Speed 3061.18 samples/sec Loss 2.8806 LearningRate 0.0057 Epoch: 15 Global Step: 189260 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:00,261-Speed 3074.89 samples/sec Loss 2.8802 LearningRate 0.0057 Epoch: 15 Global Step: 189270 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:03,623-Speed 3046.84 samples/sec Loss 2.8552 LearningRate 0.0057 Epoch: 15 Global Step: 189280 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:07,018-Speed 3016.85 samples/sec Loss 
2.8600 LearningRate 0.0057 Epoch: 15 Global Step: 189290 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:39:10,418-Speed 3012.62 samples/sec Loss 2.8806 LearningRate 0.0057 Epoch: 15 Global Step: 189300 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:39:13,890-Speed 2950.59 samples/sec Loss 2.9366 LearningRate 0.0057 Epoch: 15 Global Step: 189310 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:39:17,326-Speed 2981.14 samples/sec Loss 2.8506 LearningRate 0.0057 Epoch: 15 Global Step: 189320 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:39:20,704-Speed 3032.35 samples/sec Loss 2.9660 LearningRate 0.0057 Epoch: 15 Global Step: 189330 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:39:24,110-Speed 3007.36 samples/sec Loss 2.9193 LearningRate 0.0057 Epoch: 15 Global Step: 189340 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:27,552-Speed 2975.77 samples/sec Loss 2.8884 LearningRate 0.0057 Epoch: 15 Global Step: 189350 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:30,935-Speed 3027.49 samples/sec Loss 2.9279 LearningRate 0.0057 Epoch: 15 Global Step: 189360 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:34,351-Speed 2998.91 samples/sec Loss 2.9006 LearningRate 0.0057 Epoch: 15 Global Step: 189370 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:37,795-Speed 2973.41 samples/sec Loss 2.8923 LearningRate 0.0056 Epoch: 15 Global Step: 189380 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:41,204-Speed 3004.76 samples/sec Loss 2.8865 LearningRate 0.0056 Epoch: 15 Global Step: 189390 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:44,615-Speed 3003.12 samples/sec Loss 2.8235 LearningRate 0.0056 Epoch: 15 Global Step: 189400 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:47,995-Speed 3030.18 samples/sec Loss 2.8843 LearningRate 0.0056 Epoch: 15 Global 
Step: 189410 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:51,480-Speed 2939.36 samples/sec Loss 2.9374 LearningRate 0.0056 Epoch: 15 Global Step: 189420 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:54,899-Speed 2995.63 samples/sec Loss 2.8477 LearningRate 0.0056 Epoch: 15 Global Step: 189430 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:39:58,267-Speed 3041.32 samples/sec Loss 2.9583 LearningRate 0.0056 Epoch: 15 Global Step: 189440 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:01,639-Speed 3037.24 samples/sec Loss 2.9773 LearningRate 0.0056 Epoch: 15 Global Step: 189450 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:05,066-Speed 2989.21 samples/sec Loss 2.9149 LearningRate 0.0056 Epoch: 15 Global Step: 189460 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:08,378-Speed 3092.82 samples/sec Loss 2.8128 LearningRate 0.0056 Epoch: 15 Global Step: 189470 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:11,909-Speed 2901.03 samples/sec Loss 2.9907 LearningRate 0.0056 Epoch: 15 Global Step: 189480 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:15,312-Speed 3009.83 samples/sec Loss 2.9114 LearningRate 0.0056 Epoch: 15 Global Step: 189490 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:18,724-Speed 3002.06 samples/sec Loss 2.8892 LearningRate 0.0056 Epoch: 15 Global Step: 189500 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:22,158-Speed 2983.02 samples/sec Loss 2.8813 LearningRate 0.0056 Epoch: 15 Global Step: 189510 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:25,517-Speed 3048.74 samples/sec Loss 2.9500 LearningRate 0.0056 Epoch: 15 Global Step: 189520 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:40:28,921-Speed 3009.61 samples/sec Loss 2.9068 LearningRate 0.0056 Epoch: 15 Global Step: 189530 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 19:40:32,315-Speed 3017.30 samples/sec Loss 2.9468 LearningRate 0.0056 Epoch: 15 Global Step: 189540 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:35,672-Speed 3051.67 samples/sec Loss 2.8769 LearningRate 0.0056 Epoch: 15 Global Step: 189550 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:39,089-Speed 2998.06 samples/sec Loss 2.9071 LearningRate 0.0056 Epoch: 15 Global Step: 189560 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:42,444-Speed 3052.71 samples/sec Loss 2.9696 LearningRate 0.0056 Epoch: 15 Global Step: 189570 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:45,864-Speed 2994.77 samples/sec Loss 2.8951 LearningRate 0.0056 Epoch: 15 Global Step: 189580 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:49,269-Speed 3008.16 samples/sec Loss 2.9602 LearningRate 0.0056 Epoch: 15 Global Step: 189590 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:52,646-Speed 3033.24 samples/sec Loss 2.9731 LearningRate 0.0056 Epoch: 15 Global Step: 189600 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:56,117-Speed 2950.55 samples/sec Loss 2.8891 LearningRate 0.0056 Epoch: 15 Global Step: 189610 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:40:59,532-Speed 2999.36 samples/sec Loss 2.9503 LearningRate 0.0056 Epoch: 15 Global Step: 189620 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:41:02,919-Speed 3024.88 samples/sec Loss 2.9380 LearningRate 0.0056 Epoch: 15 Global Step: 189630 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:41:06,283-Speed 3043.88 samples/sec Loss 2.8293 LearningRate 0.0056 Epoch: 15 Global Step: 189640 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:09,585-Speed 3102.45 samples/sec Loss 2.9765 LearningRate 0.0056 Epoch: 15 Global Step: 189650 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 
19:41:12,935-Speed 3057.96 samples/sec Loss 2.9439 LearningRate 0.0056 Epoch: 15 Global Step: 189660 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:16,306-Speed 3039.70 samples/sec Loss 2.9463 LearningRate 0.0056 Epoch: 15 Global Step: 189670 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:19,654-Speed 3060.08 samples/sec Loss 2.9845 LearningRate 0.0056 Epoch: 15 Global Step: 189680 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:23,084-Speed 2985.58 samples/sec Loss 2.9140 LearningRate 0.0056 Epoch: 15 Global Step: 189690 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:26,415-Speed 3074.97 samples/sec Loss 2.9288 LearningRate 0.0056 Epoch: 15 Global Step: 189700 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:29,778-Speed 3046.21 samples/sec Loss 2.9300 LearningRate 0.0056 Epoch: 15 Global Step: 189710 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:33,167-Speed 3022.89 samples/sec Loss 2.9603 LearningRate 0.0056 Epoch: 15 Global Step: 189720 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:36,612-Speed 2973.06 samples/sec Loss 2.8758 LearningRate 0.0056 Epoch: 15 Global Step: 189730 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:40,040-Speed 2988.15 samples/sec Loss 2.9697 LearningRate 0.0056 Epoch: 15 Global Step: 189740 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:41:43,476-Speed 2981.02 samples/sec Loss 2.9217 LearningRate 0.0056 Epoch: 15 Global Step: 189750 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:41:46,953-Speed 2946.05 samples/sec Loss 2.8924 LearningRate 0.0056 Epoch: 15 Global Step: 189760 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:41:50,356-Speed 3009.10 samples/sec Loss 3.0379 LearningRate 0.0056 Epoch: 15 Global Step: 189770 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:41:53,720-Speed 3045.08 samples/sec Loss 
2.9839 LearningRate 0.0056 Epoch: 15 Global Step: 189780 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:41:57,009-Speed 3114.51 samples/sec Loss 2.8862 LearningRate 0.0056 Epoch: 15 Global Step: 189790 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:00,465-Speed 2963.32 samples/sec Loss 2.9339 LearningRate 0.0056 Epoch: 15 Global Step: 189800 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:03,801-Speed 3070.40 samples/sec Loss 2.9817 LearningRate 0.0056 Epoch: 15 Global Step: 189810 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:07,216-Speed 2999.42 samples/sec Loss 2.9870 LearningRate 0.0056 Epoch: 15 Global Step: 189820 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:10,580-Speed 3045.28 samples/sec Loss 2.8137 LearningRate 0.0056 Epoch: 15 Global Step: 189830 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:13,934-Speed 3053.56 samples/sec Loss 2.9507 LearningRate 0.0056 Epoch: 15 Global Step: 189840 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:17,245-Speed 3093.38 samples/sec Loss 2.9523 LearningRate 0.0056 Epoch: 15 Global Step: 189850 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:20,596-Speed 3056.79 samples/sec Loss 3.0023 LearningRate 0.0056 Epoch: 15 Global Step: 189860 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:23,962-Speed 3043.63 samples/sec Loss 2.9006 LearningRate 0.0056 Epoch: 15 Global Step: 189870 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:27,354-Speed 3019.35 samples/sec Loss 3.0141 LearningRate 0.0056 Epoch: 15 Global Step: 189880 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:42:30,804-Speed 2969.32 samples/sec Loss 2.8805 LearningRate 0.0056 Epoch: 15 Global Step: 189890 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:34,197-Speed 3018.71 samples/sec Loss 2.9282 LearningRate 0.0055 Epoch: 15 Global 
Step: 189900 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:37,515-Speed 3087.41 samples/sec Loss 2.9768 LearningRate 0.0055 Epoch: 15 Global Step: 189910 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:40,828-Speed 3091.05 samples/sec Loss 3.0011 LearningRate 0.0055 Epoch: 15 Global Step: 189920 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:44,170-Speed 3065.17 samples/sec Loss 3.0029 LearningRate 0.0055 Epoch: 15 Global Step: 189930 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:47,569-Speed 3013.73 samples/sec Loss 2.9646 LearningRate 0.0055 Epoch: 15 Global Step: 189940 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:50,975-Speed 3007.23 samples/sec Loss 2.9903 LearningRate 0.0055 Epoch: 15 Global Step: 189950 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:54,360-Speed 3026.11 samples/sec Loss 3.0375 LearningRate 0.0055 Epoch: 15 Global Step: 189960 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:42:57,733-Speed 3036.75 samples/sec Loss 3.0134 LearningRate 0.0055 Epoch: 15 Global Step: 189970 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:01,134-Speed 3011.61 samples/sec Loss 3.0183 LearningRate 0.0055 Epoch: 15 Global Step: 189980 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:04,516-Speed 3028.46 samples/sec Loss 2.8924 LearningRate 0.0055 Epoch: 15 Global Step: 189990 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:07,883-Speed 3042.87 samples/sec Loss 2.9600 LearningRate 0.0055 Epoch: 15 Global Step: 190000 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:11,296-Speed 3000.64 samples/sec Loss 2.9184 LearningRate 0.0055 Epoch: 15 Global Step: 190010 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:14,696-Speed 3012.88 samples/sec Loss 2.9585 LearningRate 0.0055 Epoch: 15 Global Step: 190020 Fp16 Grad Scale: 32768 
Required: 6 hours Training: 2022-04-27 19:43:18,082-Speed 3025.25 samples/sec Loss 2.9576 LearningRate 0.0055 Epoch: 15 Global Step: 190030 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:21,457-Speed 3035.20 samples/sec Loss 2.8943 LearningRate 0.0055 Epoch: 15 Global Step: 190040 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:24,818-Speed 3046.64 samples/sec Loss 2.9398 LearningRate 0.0055 Epoch: 15 Global Step: 190050 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:28,225-Speed 3007.16 samples/sec Loss 2.8755 LearningRate 0.0055 Epoch: 15 Global Step: 190060 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:31,635-Speed 3003.26 samples/sec Loss 3.0154 LearningRate 0.0055 Epoch: 15 Global Step: 190070 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:35,064-Speed 2986.90 samples/sec Loss 2.9527 LearningRate 0.0055 Epoch: 15 Global Step: 190080 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:43:38,521-Speed 2963.08 samples/sec Loss 2.9888 LearningRate 0.0055 Epoch: 15 Global Step: 190090 Fp16 Grad Scale: 65536 Required: 6 hours Training: 2022-04-27 19:43:41,892-Speed 3038.89 samples/sec Loss 2.9247 LearningRate 0.0055 Epoch: 15 Global Step: 190100 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:43:45,345-Speed 2966.12 samples/sec Loss 2.9040 LearningRate 0.0055 Epoch: 15 Global Step: 190110 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:43:48,748-Speed 3009.87 samples/sec Loss 2.9097 LearningRate 0.0055 Epoch: 15 Global Step: 190120 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:43:52,213-Speed 2956.45 samples/sec Loss 2.9937 LearningRate 0.0055 Epoch: 15 Global Step: 190130 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:43:55,592-Speed 3032.20 samples/sec Loss 2.9687 LearningRate 0.0055 Epoch: 15 Global Step: 190140 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 
19:43:58,927-Speed 3071.26 samples/sec Loss 2.9828 LearningRate 0.0055 Epoch: 15 Global Step: 190150 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:44:02,339-Speed 3001.87 samples/sec Loss 2.9356 LearningRate 0.0055 Epoch: 15 Global Step: 190160 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:44:05,710-Speed 3038.52 samples/sec Loss 3.0362 LearningRate 0.0055 Epoch: 15 Global Step: 190170 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:44:09,078-Speed 3041.13 samples/sec Loss 2.9346 LearningRate 0.0055 Epoch: 15 Global Step: 190180 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:44:12,431-Speed 3055.04 samples/sec Loss 2.9459 LearningRate 0.0055 Epoch: 15 Global Step: 190190 Fp16 Grad Scale: 16384 Required: 6 hours Training: 2022-04-27 19:44:15,841-Speed 3004.28 samples/sec Loss 3.0600 LearningRate 0.0055 Epoch: 15 Global Step: 190200 Fp16 Grad Scale: 32768 Required: 6 hours Training: 2022-04-27 19:44:19,179-Speed 3068.37 samples/sec Loss 3.0355 LearningRate 0.0055 Epoch: 15 Global Step: 190210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:22,508-Speed 3077.05 samples/sec Loss 2.9548 LearningRate 0.0055 Epoch: 15 Global Step: 190220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:25,961-Speed 2966.09 samples/sec Loss 2.9755 LearningRate 0.0055 Epoch: 15 Global Step: 190230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:29,401-Speed 2977.86 samples/sec Loss 3.0551 LearningRate 0.0055 Epoch: 15 Global Step: 190240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:32,768-Speed 3042.00 samples/sec Loss 3.0046 LearningRate 0.0055 Epoch: 15 Global Step: 190250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:36,155-Speed 3024.26 samples/sec Loss 2.9499 LearningRate 0.0055 Epoch: 15 Global Step: 190260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:39,478-Speed 3081.72 samples/sec Loss 
2.9786 LearningRate 0.0055 Epoch: 15 Global Step: 190270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:42,922-Speed 2974.81 samples/sec Loss 2.9701 LearningRate 0.0055 Epoch: 15 Global Step: 190280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:46,309-Speed 3023.89 samples/sec Loss 3.0331 LearningRate 0.0055 Epoch: 15 Global Step: 190290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:49,639-Speed 3076.46 samples/sec Loss 2.9703 LearningRate 0.0055 Epoch: 15 Global Step: 190300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:52,962-Speed 3082.85 samples/sec Loss 2.8782 LearningRate 0.0055 Epoch: 15 Global Step: 190310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:56,354-Speed 3019.53 samples/sec Loss 3.0404 LearningRate 0.0055 Epoch: 15 Global Step: 190320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:44:59,736-Speed 3028.45 samples/sec Loss 3.0124 LearningRate 0.0055 Epoch: 15 Global Step: 190330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:03,081-Speed 3062.24 samples/sec Loss 2.9768 LearningRate 0.0055 Epoch: 15 Global Step: 190340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:06,415-Speed 3071.64 samples/sec Loss 3.0095 LearningRate 0.0055 Epoch: 15 Global Step: 190350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:09,785-Speed 3039.86 samples/sec Loss 2.9752 LearningRate 0.0055 Epoch: 15 Global Step: 190360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:13,115-Speed 3075.40 samples/sec Loss 3.0084 LearningRate 0.0055 Epoch: 15 Global Step: 190370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:16,573-Speed 2962.35 samples/sec Loss 2.9586 LearningRate 0.0055 Epoch: 15 Global Step: 190380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:19,896-Speed 3082.54 samples/sec Loss 3.0727 LearningRate 0.0055 Epoch: 15 Global 
Step: 190390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:23,239-Speed 3063.25 samples/sec Loss 3.0048 LearningRate 0.0055 Epoch: 15 Global Step: 190400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:26,566-Speed 3079.36 samples/sec Loss 2.9276 LearningRate 0.0055 Epoch: 15 Global Step: 190410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:29,887-Speed 3084.58 samples/sec Loss 2.9998 LearningRate 0.0055 Epoch: 15 Global Step: 190420 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:33,276-Speed 3022.11 samples/sec Loss 2.9932 LearningRate 0.0054 Epoch: 15 Global Step: 190430 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:36,729-Speed 2966.43 samples/sec Loss 3.0077 LearningRate 0.0054 Epoch: 15 Global Step: 190440 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:40,171-Speed 2975.35 samples/sec Loss 2.9244 LearningRate 0.0054 Epoch: 15 Global Step: 190450 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:43,512-Speed 3066.67 samples/sec Loss 3.0056 LearningRate 0.0054 Epoch: 15 Global Step: 190460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:46,884-Speed 3036.84 samples/sec Loss 2.9201 LearningRate 0.0054 Epoch: 15 Global Step: 190470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:50,267-Speed 3028.30 samples/sec Loss 2.9782 LearningRate 0.0054 Epoch: 15 Global Step: 190480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:45:53,669-Speed 3010.46 samples/sec Loss 2.9886 LearningRate 0.0054 Epoch: 15 Global Step: 190490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:45:57,008-Speed 3068.03 samples/sec Loss 2.9819 LearningRate 0.0054 Epoch: 15 Global Step: 190500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:00,484-Speed 2947.19 samples/sec Loss 3.0755 LearningRate 0.0054 Epoch: 15 Global Step: 190510 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-04-27 19:46:03,831-Speed 3060.26 samples/sec Loss 3.0420 LearningRate 0.0054 Epoch: 15 Global Step: 190520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:07,214-Speed 3027.47 samples/sec Loss 2.9713 LearningRate 0.0054 Epoch: 15 Global Step: 190530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:10,624-Speed 3004.23 samples/sec Loss 2.9651 LearningRate 0.0054 Epoch: 15 Global Step: 190540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:14,083-Speed 2960.83 samples/sec Loss 2.9294 LearningRate 0.0054 Epoch: 15 Global Step: 190550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:17,536-Speed 2966.72 samples/sec Loss 2.9136 LearningRate 0.0054 Epoch: 15 Global Step: 190560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:20,923-Speed 3024.22 samples/sec Loss 2.9549 LearningRate 0.0054 Epoch: 15 Global Step: 190570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:24,252-Speed 3077.12 samples/sec Loss 2.9760 LearningRate 0.0054 Epoch: 15 Global Step: 190580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:27,563-Speed 3093.41 samples/sec Loss 3.0188 LearningRate 0.0054 Epoch: 15 Global Step: 190590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:30,944-Speed 3030.09 samples/sec Loss 2.9719 LearningRate 0.0054 Epoch: 15 Global Step: 190600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:34,340-Speed 3015.74 samples/sec Loss 2.9834 LearningRate 0.0054 Epoch: 15 Global Step: 190610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:37,793-Speed 2966.17 samples/sec Loss 2.9866 LearningRate 0.0054 Epoch: 15 Global Step: 190620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:41,211-Speed 2997.21 samples/sec Loss 2.9679 LearningRate 0.0054 Epoch: 15 Global Step: 190630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
19:46:44,597-Speed 3024.93 samples/sec Loss 3.0506 LearningRate 0.0054 Epoch: 15 Global Step: 190640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:48,001-Speed 3008.51 samples/sec Loss 2.9114 LearningRate 0.0054 Epoch: 15 Global Step: 190650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:51,381-Speed 3030.25 samples/sec Loss 3.0135 LearningRate 0.0054 Epoch: 15 Global Step: 190660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:54,727-Speed 3061.92 samples/sec Loss 2.9888 LearningRate 0.0054 Epoch: 15 Global Step: 190670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:46:58,062-Speed 3071.47 samples/sec Loss 3.0065 LearningRate 0.0054 Epoch: 15 Global Step: 190680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:01,547-Speed 2939.29 samples/sec Loss 3.0220 LearningRate 0.0054 Epoch: 15 Global Step: 190690 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:47:05,041-Speed 2931.11 samples/sec Loss 2.9956 LearningRate 0.0054 Epoch: 15 Global Step: 190700 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:47:08,510-Speed 2952.46 samples/sec Loss 2.9842 LearningRate 0.0054 Epoch: 15 Global Step: 190710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:11,893-Speed 3028.15 samples/sec Loss 3.0167 LearningRate 0.0054 Epoch: 15 Global Step: 190720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:15,300-Speed 3006.58 samples/sec Loss 2.9116 LearningRate 0.0054 Epoch: 15 Global Step: 190730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:18,702-Speed 3010.35 samples/sec Loss 3.0478 LearningRate 0.0054 Epoch: 15 Global Step: 190740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:22,139-Speed 2980.58 samples/sec Loss 2.9591 LearningRate 0.0054 Epoch: 15 Global Step: 190750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:25,663-Speed 2906.91 samples/sec Loss 
2.9943 LearningRate 0.0054 Epoch: 15 Global Step: 190760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:29,064-Speed 3011.24 samples/sec Loss 2.9190 LearningRate 0.0054 Epoch: 15 Global Step: 190770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:32,492-Speed 2988.75 samples/sec Loss 3.0649 LearningRate 0.0054 Epoch: 15 Global Step: 190780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:35,893-Speed 3012.97 samples/sec Loss 2.9575 LearningRate 0.0054 Epoch: 15 Global Step: 190790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:39,242-Speed 3058.11 samples/sec Loss 2.9970 LearningRate 0.0054 Epoch: 15 Global Step: 190800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:42,691-Speed 2969.90 samples/sec Loss 2.9739 LearningRate 0.0054 Epoch: 15 Global Step: 190810 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:47:46,046-Speed 3052.95 samples/sec Loss 2.9692 LearningRate 0.0054 Epoch: 15 Global Step: 190820 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:47:49,457-Speed 3002.64 samples/sec Loss 3.0110 LearningRate 0.0054 Epoch: 15 Global Step: 190830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:52,864-Speed 3006.56 samples/sec Loss 2.9771 LearningRate 0.0054 Epoch: 15 Global Step: 190840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:56,253-Speed 3022.54 samples/sec Loss 3.0407 LearningRate 0.0054 Epoch: 15 Global Step: 190850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:47:59,613-Speed 3048.69 samples/sec Loss 2.9778 LearningRate 0.0054 Epoch: 15 Global Step: 190860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:48:02,965-Speed 3055.84 samples/sec Loss 2.9962 LearningRate 0.0054 Epoch: 15 Global Step: 190870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:48:06,263-Speed 3105.98 samples/sec Loss 3.0443 LearningRate 0.0054 Epoch: 15 Global 
Step: 190880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:09,647-Speed 3026.97 samples/sec Loss 3.0605 LearningRate 0.0054 Epoch: 15 Global Step: 190890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:13,008-Speed 3047.18 samples/sec Loss 3.0001 LearningRate 0.0054 Epoch: 15 Global Step: 190900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:16,457-Speed 2970.46 samples/sec Loss 2.9728 LearningRate 0.0054 Epoch: 15 Global Step: 190910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:19,898-Speed 2976.82 samples/sec Loss 2.9985 LearningRate 0.0054 Epoch: 15 Global Step: 190920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:23,244-Speed 3061.28 samples/sec Loss 2.9982 LearningRate 0.0054 Epoch: 15 Global Step: 190930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:26,651-Speed 3006.22 samples/sec Loss 3.0466 LearningRate 0.0054 Epoch: 15 Global Step: 190940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:30,059-Speed 3005.16 samples/sec Loss 3.0637 LearningRate 0.0054 Epoch: 15 Global Step: 190950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:33,464-Speed 3008.90 samples/sec Loss 3.0313 LearningRate 0.0054 Epoch: 15 Global Step: 190960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:36,791-Speed 3077.90 samples/sec Loss 2.9899 LearningRate 0.0053 Epoch: 15 Global Step: 190970 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:48:40,202-Speed 3003.80 samples/sec Loss 3.0228 LearningRate 0.0053 Epoch: 15 Global Step: 190980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:48:43,532-Speed 3075.87 samples/sec Loss 3.0532 LearningRate 0.0053 Epoch: 15 Global Step: 190990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:48:47,003-Speed 2951.05 samples/sec Loss 3.0496 LearningRate 0.0053 Epoch: 15 Global Step: 191000 Fp16 Grad Scale: 16384 Required: 5 hours 
Training: 2022-04-27 19:48:50,412-Speed 3004.26 samples/sec Loss 3.0359 LearningRate 0.0053 Epoch: 15 Global Step: 191010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:48:53,877-Speed 2956.06 samples/sec Loss 2.9545 LearningRate 0.0053 Epoch: 15 Global Step: 191020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:48:57,247-Speed 3039.63 samples/sec Loss 3.0905 LearningRate 0.0053 Epoch: 15 Global Step: 191030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:00,677-Speed 2986.48 samples/sec Loss 2.9925 LearningRate 0.0053 Epoch: 15 Global Step: 191040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:04,174-Speed 2929.17 samples/sec Loss 3.0305 LearningRate 0.0053 Epoch: 15 Global Step: 191050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:07,540-Speed 3043.30 samples/sec Loss 3.0276 LearningRate 0.0053 Epoch: 15 Global Step: 191060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:10,897-Speed 3051.01 samples/sec Loss 3.0170 LearningRate 0.0053 Epoch: 15 Global Step: 191070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:14,285-Speed 3023.14 samples/sec Loss 3.0439 LearningRate 0.0053 Epoch: 15 Global Step: 191080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:49:17,688-Speed 3010.56 samples/sec Loss 2.9754 LearningRate 0.0053 Epoch: 15 Global Step: 191090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:49:21,013-Speed 3080.43 samples/sec Loss 3.0194 LearningRate 0.0053 Epoch: 15 Global Step: 191100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:49:24,365-Speed 3055.24 samples/sec Loss 2.9645 LearningRate 0.0053 Epoch: 15 Global Step: 191110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:49:27,725-Speed 3048.38 samples/sec Loss 3.0682 LearningRate 0.0053 Epoch: 15 Global Step: 191120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:49:31,087-Speed 
3047.43 samples/sec Loss 2.9263 LearningRate 0.0053 Epoch: 15 Global Step: 191130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:49:34,392-Speed 3098.74 samples/sec Loss 3.0530 LearningRate 0.0053 Epoch: 15 Global Step: 191140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:37,774-Speed 3029.10 samples/sec Loss 3.0142 LearningRate 0.0053 Epoch: 15 Global Step: 191150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:41,210-Speed 2980.87 samples/sec Loss 2.9478 LearningRate 0.0053 Epoch: 15 Global Step: 191160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:44,701-Speed 2934.16 samples/sec Loss 3.0324 LearningRate 0.0053 Epoch: 15 Global Step: 191170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:48,185-Speed 2939.65 samples/sec Loss 2.9663 LearningRate 0.0053 Epoch: 15 Global Step: 191180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:51,549-Speed 3044.97 samples/sec Loss 3.0244 LearningRate 0.0053 Epoch: 15 Global Step: 191190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:54,938-Speed 3022.26 samples/sec Loss 3.0559 LearningRate 0.0053 Epoch: 15 Global Step: 191200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:49:59,144-Speed 2435.43 samples/sec Loss 3.0265 LearningRate 0.0053 Epoch: 15 Global Step: 191210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:02,497-Speed 3055.72 samples/sec Loss 2.9661 LearningRate 0.0053 Epoch: 15 Global Step: 191220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:05,868-Speed 3038.55 samples/sec Loss 3.0781 LearningRate 0.0053 Epoch: 15 Global Step: 191230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:09,335-Speed 2954.18 samples/sec Loss 2.9913 LearningRate 0.0053 Epoch: 15 Global Step: 191240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:50:12,764-Speed 2987.62 samples/sec Loss 3.0742 
LearningRate 0.0053 Epoch: 15 Global Step: 191250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:50:16,195-Speed 2985.23 samples/sec Loss 3.0097 LearningRate 0.0053 Epoch: 15 Global Step: 191260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:50:19,559-Speed 3044.82 samples/sec Loss 2.9464 LearningRate 0.0053 Epoch: 15 Global Step: 191270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:50:22,884-Speed 3080.51 samples/sec Loss 3.0275 LearningRate 0.0053 Epoch: 15 Global Step: 191280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:26,254-Speed 3039.31 samples/sec Loss 3.0736 LearningRate 0.0053 Epoch: 15 Global Step: 191290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:29,618-Speed 3045.36 samples/sec Loss 3.0009 LearningRate 0.0053 Epoch: 15 Global Step: 191300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:33,059-Speed 2977.19 samples/sec Loss 3.0467 LearningRate 0.0053 Epoch: 15 Global Step: 191310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:50:36,475-Speed 2998.10 samples/sec Loss 3.0409 LearningRate 0.0053 Epoch: 15 Global Step: 191320 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:50:39,812-Speed 3070.05 samples/sec Loss 2.9776 LearningRate 0.0053 Epoch: 15 Global Step: 191330 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:50:43,143-Speed 3074.53 samples/sec Loss 3.0175 LearningRate 0.0053 Epoch: 15 Global Step: 191340 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:50:46,511-Speed 3041.17 samples/sec Loss 3.0180 LearningRate 0.0053 Epoch: 15 Global Step: 191350 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:50:49,873-Speed 3047.22 samples/sec Loss 3.0336 LearningRate 0.0053 Epoch: 15 Global Step: 191360 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:50:53,327-Speed 2965.65 samples/sec Loss 3.1276 LearningRate 0.0053 Epoch: 15 Global Step: 
191370 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:50:56,706-Speed 3030.88 samples/sec Loss 2.9845 LearningRate 0.0053 Epoch: 15 Global Step: 191380 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:51:00,127-Speed 2993.80 samples/sec Loss 2.9174 LearningRate 0.0053 Epoch: 15 Global Step: 191390 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:51:03,532-Speed 3008.57 samples/sec Loss 3.0175 LearningRate 0.0053 Epoch: 15 Global Step: 191400 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:51:06,922-Speed 3021.29 samples/sec Loss 2.9882 LearningRate 0.0053 Epoch: 15 Global Step: 191410 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:51:10,291-Speed 3040.12 samples/sec Loss 3.0189 LearningRate 0.0053 Epoch: 15 Global Step: 191420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:13,645-Speed 3053.99 samples/sec Loss 3.0341 LearningRate 0.0053 Epoch: 15 Global Step: 191430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:17,034-Speed 3023.15 samples/sec Loss 2.9875 LearningRate 0.0053 Epoch: 15 Global Step: 191440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:20,384-Speed 3057.02 samples/sec Loss 3.0257 LearningRate 0.0053 Epoch: 15 Global Step: 191450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:23,746-Speed 3046.79 samples/sec Loss 3.0030 LearningRate 0.0053 Epoch: 15 Global Step: 191460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:27,085-Speed 3068.23 samples/sec Loss 2.9577 LearningRate 0.0053 Epoch: 15 Global Step: 191470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:30,407-Speed 3082.44 samples/sec Loss 2.9923 LearningRate 0.0053 Epoch: 15 Global Step: 191480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:33,754-Speed 3060.96 samples/sec Loss 3.0452 LearningRate 0.0053 Epoch: 15 Global Step: 191490 Fp16 Grad Scale: 16384 Required: 5 hours 
Training: 2022-04-27 19:51:37,122-Speed 3040.48 samples/sec Loss 3.0444 LearningRate 0.0052 Epoch: 15 Global Step: 191500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:40,529-Speed 3006.47 samples/sec Loss 2.9829 LearningRate 0.0052 Epoch: 15 Global Step: 191510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:51:43,987-Speed 2963.07 samples/sec Loss 3.0401 LearningRate 0.0052 Epoch: 15 Global Step: 191520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:51:47,395-Speed 3005.09 samples/sec Loss 2.9759 LearningRate 0.0052 Epoch: 15 Global Step: 191530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:51:50,733-Speed 3069.03 samples/sec Loss 3.1009 LearningRate 0.0052 Epoch: 15 Global Step: 191540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:51:54,046-Speed 3091.13 samples/sec Loss 3.0728 LearningRate 0.0052 Epoch: 15 Global Step: 191550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:51:57,463-Speed 2997.84 samples/sec Loss 3.0059 LearningRate 0.0052 Epoch: 15 Global Step: 191560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:00,828-Speed 3044.30 samples/sec Loss 2.9901 LearningRate 0.0052 Epoch: 15 Global Step: 191570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:04,249-Speed 2994.10 samples/sec Loss 3.0588 LearningRate 0.0052 Epoch: 15 Global Step: 191580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:07,680-Speed 2985.28 samples/sec Loss 2.9605 LearningRate 0.0052 Epoch: 15 Global Step: 191590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:11,031-Speed 3056.96 samples/sec Loss 3.0614 LearningRate 0.0052 Epoch: 15 Global Step: 191600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:14,444-Speed 3001.30 samples/sec Loss 3.0507 LearningRate 0.0052 Epoch: 15 Global Step: 191610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:17,758-Speed 
3091.34 samples/sec Loss 3.0139 LearningRate 0.0052 Epoch: 15 Global Step: 191620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:52:21,072-Speed 3090.25 samples/sec Loss 3.0942 LearningRate 0.0052 Epoch: 15 Global Step: 191630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:24,463-Speed 3020.79 samples/sec Loss 2.9298 LearningRate 0.0052 Epoch: 15 Global Step: 191640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:52:27,835-Speed 3037.53 samples/sec Loss 3.0190 LearningRate 0.0052 Epoch: 15 Global Step: 191650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:52:31,235-Speed 3013.12 samples/sec Loss 2.9957 LearningRate 0.0052 Epoch: 15 Global Step: 191660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:52:34,585-Speed 3058.13 samples/sec Loss 3.0112 LearningRate 0.0052 Epoch: 15 Global Step: 191670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:52:37,926-Speed 3065.49 samples/sec Loss 3.0935 LearningRate 0.0052 Epoch: 15 Global Step: 191680 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:52:41,368-Speed 2975.45 samples/sec Loss 3.0393 LearningRate 0.0052 Epoch: 15 Global Step: 191690 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:52:44,780-Speed 3001.86 samples/sec Loss 3.0798 LearningRate 0.0052 Epoch: 15 Global Step: 191700 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:52:48,211-Speed 2985.27 samples/sec Loss 3.0830 LearningRate 0.0052 Epoch: 15 Global Step: 191710 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:52:51,683-Speed 2951.15 samples/sec Loss 3.0279 LearningRate 0.0052 Epoch: 15 Global Step: 191720 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:52:55,117-Speed 2982.61 samples/sec Loss 3.0468 LearningRate 0.0052 Epoch: 15 Global Step: 191730 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:52:58,616-Speed 2927.04 samples/sec Loss 3.0368 LearningRate 
0.0052 Epoch: 15 Global Step: 191740 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:53:02,064-Speed 2971.08 samples/sec Loss 3.0592 LearningRate 0.0052 Epoch: 15 Global Step: 191750 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:53:05,415-Speed 3056.74 samples/sec Loss 2.9972 LearningRate 0.0052 Epoch: 15 Global Step: 191760 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:53:08,886-Speed 2951.09 samples/sec Loss 2.9904 LearningRate 0.0052 Epoch: 15 Global Step: 191770 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 19:53:12,347-Speed 2958.85 samples/sec Loss 3.0193 LearningRate 0.0052 Epoch: 15 Global Step: 191780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:15,755-Speed 3005.75 samples/sec Loss 3.0416 LearningRate 0.0052 Epoch: 15 Global Step: 191790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:19,171-Speed 2998.13 samples/sec Loss 3.0199 LearningRate 0.0052 Epoch: 15 Global Step: 191800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:22,595-Speed 2992.19 samples/sec Loss 3.0606 LearningRate 0.0052 Epoch: 15 Global Step: 191810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:26,009-Speed 3000.27 samples/sec Loss 3.0470 LearningRate 0.0052 Epoch: 15 Global Step: 191820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:29,523-Speed 2914.94 samples/sec Loss 3.0476 LearningRate 0.0052 Epoch: 15 Global Step: 191830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:32,895-Speed 3037.61 samples/sec Loss 3.0326 LearningRate 0.0052 Epoch: 15 Global Step: 191840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:36,320-Speed 2989.67 samples/sec Loss 3.0165 LearningRate 0.0052 Epoch: 15 Global Step: 191850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:39,758-Speed 2979.70 samples/sec Loss 3.0360 LearningRate 0.0052 Epoch: 15 Global Step: 191860 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:43,176-Speed 2996.74 samples/sec Loss 2.9880 LearningRate 0.0052 Epoch: 15 Global Step: 191870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:46,510-Speed 3072.60 samples/sec Loss 3.0624 LearningRate 0.0052 Epoch: 15 Global Step: 191880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:53:49,986-Speed 2946.71 samples/sec Loss 3.0286 LearningRate 0.0052 Epoch: 15 Global Step: 191890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:53:53,411-Speed 2990.94 samples/sec Loss 3.1164 LearningRate 0.0052 Epoch: 15 Global Step: 191900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:53:56,920-Speed 2918.90 samples/sec Loss 3.0896 LearningRate 0.0052 Epoch: 15 Global Step: 191910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:00,277-Speed 3051.34 samples/sec Loss 2.9910 LearningRate 0.0052 Epoch: 15 Global Step: 191920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:03,587-Speed 3093.95 samples/sec Loss 3.0110 LearningRate 0.0052 Epoch: 15 Global Step: 191930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:06,897-Speed 3094.98 samples/sec Loss 2.9898 LearningRate 0.0052 Epoch: 15 Global Step: 191940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:10,241-Speed 3062.56 samples/sec Loss 3.0563 LearningRate 0.0052 Epoch: 15 Global Step: 191950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:13,637-Speed 3016.60 samples/sec Loss 3.0330 LearningRate 0.0052 Epoch: 15 Global Step: 191960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:17,055-Speed 2996.49 samples/sec Loss 2.9994 LearningRate 0.0052 Epoch: 15 Global Step: 191970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:20,492-Speed 2980.07 samples/sec Loss 2.9689 LearningRate 0.0052 Epoch: 15 Global Step: 191980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 19:54:23,924-Speed 2985.16 samples/sec Loss 2.9781 LearningRate 0.0052 Epoch: 15 Global Step: 191990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:27,285-Speed 3047.65 samples/sec Loss 3.0541 LearningRate 0.0052 Epoch: 15 Global Step: 192000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:30,714-Speed 2987.48 samples/sec Loss 3.0008 LearningRate 0.0052 Epoch: 15 Global Step: 192010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:34,149-Speed 2981.90 samples/sec Loss 2.9227 LearningRate 0.0052 Epoch: 15 Global Step: 192020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:37,597-Speed 2969.94 samples/sec Loss 3.0968 LearningRate 0.0052 Epoch: 15 Global Step: 192030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:40,967-Speed 3039.77 samples/sec Loss 2.9320 LearningRate 0.0052 Epoch: 15 Global Step: 192040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:44,301-Speed 3072.43 samples/sec Loss 3.0049 LearningRate 0.0051 Epoch: 15 Global Step: 192050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:47,641-Speed 3066.97 samples/sec Loss 3.0505 LearningRate 0.0051 Epoch: 15 Global Step: 192060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:54:51,076-Speed 2981.45 samples/sec Loss 3.0534 LearningRate 0.0051 Epoch: 15 Global Step: 192070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:54,542-Speed 2955.22 samples/sec Loss 3.0302 LearningRate 0.0051 Epoch: 15 Global Step: 192080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:54:57,939-Speed 3014.89 samples/sec Loss 2.9983 LearningRate 0.0051 Epoch: 15 Global Step: 192090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:01,306-Speed 3043.21 samples/sec Loss 3.0823 LearningRate 0.0051 Epoch: 15 Global Step: 192100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:04,654-Speed 3059.02 
samples/sec Loss 3.0030 LearningRate 0.0051 Epoch: 15 Global Step: 192110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:08,034-Speed 3030.36 samples/sec Loss 3.0983 LearningRate 0.0051 Epoch: 15 Global Step: 192120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:11,424-Speed 3021.13 samples/sec Loss 3.0340 LearningRate 0.0051 Epoch: 15 Global Step: 192130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:14,857-Speed 2984.59 samples/sec Loss 3.0781 LearningRate 0.0051 Epoch: 15 Global Step: 192140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:18,251-Speed 3017.89 samples/sec Loss 3.0786 LearningRate 0.0051 Epoch: 15 Global Step: 192150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:21,663-Speed 3001.61 samples/sec Loss 3.0045 LearningRate 0.0051 Epoch: 15 Global Step: 192160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:55:25,059-Speed 3016.15 samples/sec Loss 3.0162 LearningRate 0.0051 Epoch: 15 Global Step: 192170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:28,406-Speed 3060.22 samples/sec Loss 3.0848 LearningRate 0.0051 Epoch: 15 Global Step: 192180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:31,759-Speed 3055.25 samples/sec Loss 2.9879 LearningRate 0.0051 Epoch: 15 Global Step: 192190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:35,162-Speed 3009.48 samples/sec Loss 3.0542 LearningRate 0.0051 Epoch: 15 Global Step: 192200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:38,572-Speed 3003.92 samples/sec Loss 2.9822 LearningRate 0.0051 Epoch: 15 Global Step: 192210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:41,921-Speed 3058.39 samples/sec Loss 3.0286 LearningRate 0.0051 Epoch: 15 Global Step: 192220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:45,382-Speed 2960.63 samples/sec Loss 3.0462 LearningRate 0.0051 
Epoch: 15 Global Step: 192230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:48,792-Speed 3003.48 samples/sec Loss 3.1283 LearningRate 0.0051 Epoch: 15 Global Step: 192240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:52,122-Speed 3076.19 samples/sec Loss 3.0153 LearningRate 0.0051 Epoch: 15 Global Step: 192250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:55,515-Speed 3018.85 samples/sec Loss 3.0730 LearningRate 0.0051 Epoch: 15 Global Step: 192260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:55:58,843-Speed 3078.10 samples/sec Loss 3.1022 LearningRate 0.0051 Epoch: 15 Global Step: 192270 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:56:02,216-Speed 3036.69 samples/sec Loss 3.0392 LearningRate 0.0051 Epoch: 15 Global Step: 192280 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:56:05,583-Speed 3042.32 samples/sec Loss 3.0980 LearningRate 0.0051 Epoch: 15 Global Step: 192290 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:56:08,962-Speed 3030.70 samples/sec Loss 3.0339 LearningRate 0.0051 Epoch: 15 Global Step: 192300 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:56:12,295-Speed 3073.53 samples/sec Loss 3.1058 LearningRate 0.0051 Epoch: 15 Global Step: 192310 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:56:15,669-Speed 3036.41 samples/sec Loss 3.0169 LearningRate 0.0051 Epoch: 15 Global Step: 192320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:56:19,045-Speed 3033.54 samples/sec Loss 3.0526 LearningRate 0.0051 Epoch: 15 Global Step: 192330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:22,435-Speed 3021.26 samples/sec Loss 3.0189 LearningRate 0.0051 Epoch: 15 Global Step: 192340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:25,861-Speed 2990.50 samples/sec Loss 3.0469 LearningRate 0.0051 Epoch: 15 Global Step: 192350 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:29,203-Speed 3064.67 samples/sec Loss 3.0629 LearningRate 0.0051 Epoch: 15 Global Step: 192360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:32,610-Speed 3006.29 samples/sec Loss 3.0367 LearningRate 0.0051 Epoch: 15 Global Step: 192370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:36,069-Speed 2961.20 samples/sec Loss 3.0403 LearningRate 0.0051 Epoch: 15 Global Step: 192380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:39,477-Speed 3005.22 samples/sec Loss 3.0578 LearningRate 0.0051 Epoch: 15 Global Step: 192390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:42,893-Speed 2998.54 samples/sec Loss 3.0536 LearningRate 0.0051 Epoch: 15 Global Step: 192400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:46,373-Speed 2944.01 samples/sec Loss 3.0679 LearningRate 0.0051 Epoch: 15 Global Step: 192410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:49,788-Speed 2999.60 samples/sec Loss 3.0707 LearningRate 0.0051 Epoch: 15 Global Step: 192420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:56:53,115-Speed 3078.44 samples/sec Loss 3.0249 LearningRate 0.0051 Epoch: 15 Global Step: 192430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:56:56,534-Speed 2995.61 samples/sec Loss 3.1245 LearningRate 0.0051 Epoch: 15 Global Step: 192440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:56:59,964-Speed 2986.18 samples/sec Loss 3.0840 LearningRate 0.0051 Epoch: 15 Global Step: 192450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:03,371-Speed 3006.78 samples/sec Loss 3.0382 LearningRate 0.0051 Epoch: 15 Global Step: 192460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:06,756-Speed 3025.59 samples/sec Loss 3.0683 LearningRate 0.0051 Epoch: 15 Global Step: 192470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-27 19:57:10,147-Speed 3021.14 samples/sec Loss 2.9906 LearningRate 0.0051 Epoch: 15 Global Step: 192480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:13,592-Speed 2972.37 samples/sec Loss 3.1347 LearningRate 0.0051 Epoch: 15 Global Step: 192490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:17,059-Speed 2955.15 samples/sec Loss 3.0565 LearningRate 0.0051 Epoch: 15 Global Step: 192500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:20,520-Speed 2959.30 samples/sec Loss 3.0151 LearningRate 0.0051 Epoch: 15 Global Step: 192510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:23,921-Speed 3012.13 samples/sec Loss 3.0753 LearningRate 0.0051 Epoch: 15 Global Step: 192520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:27,378-Speed 2962.73 samples/sec Loss 3.0211 LearningRate 0.0051 Epoch: 15 Global Step: 192530 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:57:30,858-Speed 2942.56 samples/sec Loss 2.9843 LearningRate 0.0051 Epoch: 15 Global Step: 192540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:34,227-Speed 3040.95 samples/sec Loss 3.0332 LearningRate 0.0051 Epoch: 15 Global Step: 192550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:37,658-Speed 2985.58 samples/sec Loss 3.0619 LearningRate 0.0051 Epoch: 15 Global Step: 192560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:41,026-Speed 3041.23 samples/sec Loss 2.9449 LearningRate 0.0051 Epoch: 15 Global Step: 192570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:44,500-Speed 2948.65 samples/sec Loss 3.0488 LearningRate 0.0051 Epoch: 15 Global Step: 192580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:47,929-Speed 2986.49 samples/sec Loss 3.0892 LearningRate 0.0051 Epoch: 15 Global Step: 192590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:57:51,271-Speed 3065.36 
samples/sec Loss 3.0796 LearningRate 0.0050 Epoch: 15 Global Step: 192600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:57:54,608-Speed 3069.45 samples/sec Loss 3.0392 LearningRate 0.0050 Epoch: 15 Global Step: 192610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:57:57,997-Speed 3022.18 samples/sec Loss 3.0916 LearningRate 0.0050 Epoch: 15 Global Step: 192620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:01,427-Speed 2986.82 samples/sec Loss 3.0261 LearningRate 0.0050 Epoch: 15 Global Step: 192630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:04,802-Speed 3034.63 samples/sec Loss 3.0722 LearningRate 0.0050 Epoch: 15 Global Step: 192640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:08,217-Speed 2998.99 samples/sec Loss 3.0544 LearningRate 0.0050 Epoch: 15 Global Step: 192650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:11,600-Speed 3027.90 samples/sec Loss 3.0472 LearningRate 0.0050 Epoch: 15 Global Step: 192660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:14,970-Speed 3039.41 samples/sec Loss 2.9985 LearningRate 0.0050 Epoch: 15 Global Step: 192670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:18,327-Speed 3051.93 samples/sec Loss 2.9931 LearningRate 0.0050 Epoch: 15 Global Step: 192680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:21,720-Speed 3018.31 samples/sec Loss 3.0812 LearningRate 0.0050 Epoch: 15 Global Step: 192690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:25,094-Speed 3035.88 samples/sec Loss 3.0660 LearningRate 0.0050 Epoch: 15 Global Step: 192700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:58:28,530-Speed 2981.55 samples/sec Loss 3.0237 LearningRate 0.0050 Epoch: 15 Global Step: 192710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:58:31,971-Speed 2978.25 samples/sec Loss 3.0221 LearningRate 0.0050 
Epoch: 15 Global Step: 192720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:58:35,333-Speed 3046.71 samples/sec Loss 3.0140 LearningRate 0.0050 Epoch: 15 Global Step: 192730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:58:38,648-Speed 3089.95 samples/sec Loss 3.0650 LearningRate 0.0050 Epoch: 15 Global Step: 192740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:58:42,097-Speed 2969.73 samples/sec Loss 3.0665 LearningRate 0.0050 Epoch: 15 Global Step: 192750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:45,536-Speed 2978.13 samples/sec Loss 3.0547 LearningRate 0.0050 Epoch: 15 Global Step: 192760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:48,890-Speed 3054.08 samples/sec Loss 3.0459 LearningRate 0.0050 Epoch: 15 Global Step: 192770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:52,298-Speed 3005.43 samples/sec Loss 3.0629 LearningRate 0.0050 Epoch: 15 Global Step: 192780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:55,645-Speed 3060.41 samples/sec Loss 3.1328 LearningRate 0.0050 Epoch: 15 Global Step: 192790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:58:59,076-Speed 2985.31 samples/sec Loss 2.9715 LearningRate 0.0050 Epoch: 15 Global Step: 192800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:59:02,473-Speed 3014.95 samples/sec Loss 3.0742 LearningRate 0.0050 Epoch: 15 Global Step: 192810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:59:05,815-Speed 3064.68 samples/sec Loss 3.0420 LearningRate 0.0050 Epoch: 15 Global Step: 192820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:59:09,227-Speed 3002.03 samples/sec Loss 3.0561 LearningRate 0.0050 Epoch: 15 Global Step: 192830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 19:59:12,559-Speed 3074.92 samples/sec Loss 2.9652 LearningRate 0.0050 Epoch: 15 Global Step: 192840 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 19:59:15,972-Speed 3000.90 samples/sec Loss 3.0669 LearningRate 0.0050 Epoch: 15 Global Step: 192850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:19,433-Speed 2959.93 samples/sec Loss 3.0850 LearningRate 0.0050 Epoch: 15 Global Step: 192860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:22,786-Speed 3054.25 samples/sec Loss 3.0980 LearningRate 0.0050 Epoch: 15 Global Step: 192870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:26,123-Speed 3070.12 samples/sec Loss 3.1032 LearningRate 0.0050 Epoch: 15 Global Step: 192880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:29,519-Speed 3015.28 samples/sec Loss 3.0360 LearningRate 0.0050 Epoch: 15 Global Step: 192890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:32,877-Speed 3050.57 samples/sec Loss 3.0934 LearningRate 0.0050 Epoch: 15 Global Step: 192900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:36,223-Speed 3061.30 samples/sec Loss 3.0075 LearningRate 0.0050 Epoch: 15 Global Step: 192910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:39,586-Speed 3045.43 samples/sec Loss 3.1512 LearningRate 0.0050 Epoch: 15 Global Step: 192920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:43,015-Speed 2987.97 samples/sec Loss 3.0697 LearningRate 0.0050 Epoch: 15 Global Step: 192930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:46,379-Speed 3044.62 samples/sec Loss 3.1073 LearningRate 0.0050 Epoch: 15 Global Step: 192940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:49,736-Speed 3051.02 samples/sec Loss 3.1100 LearningRate 0.0050 Epoch: 15 Global Step: 192950 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 19:59:53,104-Speed 3040.63 samples/sec Loss 3.0845 LearningRate 0.0050 Epoch: 15 Global Step: 192960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-27 19:59:56,454-Speed 3057.96 samples/sec Loss 3.0215 LearningRate 0.0050 Epoch: 15 Global Step: 192970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 19:59:59,807-Speed 3054.73 samples/sec Loss 3.1349 LearningRate 0.0050 Epoch: 15 Global Step: 192980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:03,248-Speed 2976.29 samples/sec Loss 3.0668 LearningRate 0.0050 Epoch: 15 Global Step: 192990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:06,708-Speed 2960.57 samples/sec Loss 3.0226 LearningRate 0.0050 Epoch: 15 Global Step: 193000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:10,147-Speed 2978.35 samples/sec Loss 3.0634 LearningRate 0.0050 Epoch: 15 Global Step: 193010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:13,543-Speed 3017.06 samples/sec Loss 3.0956 LearningRate 0.0050 Epoch: 15 Global Step: 193020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:16,933-Speed 3020.36 samples/sec Loss 3.0241 LearningRate 0.0050 Epoch: 15 Global Step: 193030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:20,386-Speed 2967.35 samples/sec Loss 3.1173 LearningRate 0.0050 Epoch: 15 Global Step: 193040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:23,742-Speed 3051.50 samples/sec Loss 3.1202 LearningRate 0.0050 Epoch: 15 Global Step: 193050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:27,075-Speed 3073.98 samples/sec Loss 3.0198 LearningRate 0.0050 Epoch: 15 Global Step: 193060 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:00:30,504-Speed 2986.59 samples/sec Loss 3.0563 LearningRate 0.0050 Epoch: 15 Global Step: 193070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:00:33,851-Speed 3060.78 samples/sec Loss 3.0260 LearningRate 0.0050 Epoch: 15 Global Step: 193080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:00:37,245-Speed 3018.15 
samples/sec Loss 3.0655 LearningRate 0.0050 Epoch: 15 Global Step: 193090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:40,580-Speed 3071.23 samples/sec Loss 2.9820 LearningRate 0.0050 Epoch: 15 Global Step: 193100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:00:43,950-Speed 3039.84 samples/sec Loss 3.1218 LearningRate 0.0050 Epoch: 15 Global Step: 193110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:00:47,350-Speed 3012.63 samples/sec Loss 3.0547 LearningRate 0.0050 Epoch: 15 Global Step: 193120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:00:50,754-Speed 3008.50 samples/sec Loss 3.0195 LearningRate 0.0050 Epoch: 15 Global Step: 193130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:00:54,099-Speed 3061.93 samples/sec Loss 3.0543 LearningRate 0.0050 Epoch: 15 Global Step: 193140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:00:57,524-Speed 2991.45 samples/sec Loss 3.0829 LearningRate 0.0050 Epoch: 15 Global Step: 193150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:00,951-Speed 2988.93 samples/sec Loss 3.0821 LearningRate 0.0049 Epoch: 15 Global Step: 193160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:04,383-Speed 2983.88 samples/sec Loss 3.1515 LearningRate 0.0049 Epoch: 15 Global Step: 193170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:07,824-Speed 2976.72 samples/sec Loss 3.0069 LearningRate 0.0049 Epoch: 15 Global Step: 193180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:11,172-Speed 3059.89 samples/sec Loss 3.0307 LearningRate 0.0049 Epoch: 15 Global Step: 193190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:14,545-Speed 3036.49 samples/sec Loss 3.1668 LearningRate 0.0049 Epoch: 15 Global Step: 193200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:17,920-Speed 3035.65 samples/sec Loss 3.1382 LearningRate 0.0049 
Epoch: 15 Global Step: 193210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:21,299-Speed 3030.98 samples/sec Loss 3.1441 LearningRate 0.0049 Epoch: 15 Global Step: 193220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:24,756-Speed 2962.42 samples/sec Loss 3.0239 LearningRate 0.0049 Epoch: 15 Global Step: 193230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:28,258-Speed 2925.50 samples/sec Loss 3.0454 LearningRate 0.0049 Epoch: 15 Global Step: 193240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:31,747-Speed 2935.62 samples/sec Loss 3.0953 LearningRate 0.0049 Epoch: 15 Global Step: 193250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:35,151-Speed 3008.72 samples/sec Loss 3.0206 LearningRate 0.0049 Epoch: 15 Global Step: 193260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:38,641-Speed 2935.39 samples/sec Loss 3.1097 LearningRate 0.0049 Epoch: 15 Global Step: 193270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:42,051-Speed 3003.43 samples/sec Loss 3.0866 LearningRate 0.0049 Epoch: 15 Global Step: 193280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:45,470-Speed 2995.72 samples/sec Loss 3.0100 LearningRate 0.0049 Epoch: 15 Global Step: 193290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:48,879-Speed 3004.32 samples/sec Loss 3.1259 LearningRate 0.0049 Epoch: 15 Global Step: 193300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:52,329-Speed 2969.32 samples/sec Loss 3.0313 LearningRate 0.0049 Epoch: 15 Global Step: 193310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:55,682-Speed 3055.48 samples/sec Loss 3.0689 LearningRate 0.0049 Epoch: 15 Global Step: 193320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:01:59,149-Speed 2953.77 samples/sec Loss 3.1302 LearningRate 0.0049 Epoch: 15 Global Step: 193330 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:02,545-Speed 3016.61 samples/sec Loss 3.0870 LearningRate 0.0049 Epoch: 15 Global Step: 193340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:05,952-Speed 3006.25 samples/sec Loss 3.0590 LearningRate 0.0049 Epoch: 15 Global Step: 193350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:09,358-Speed 3008.02 samples/sec Loss 3.0737 LearningRate 0.0049 Epoch: 15 Global Step: 193360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:12,765-Speed 3006.03 samples/sec Loss 3.1166 LearningRate 0.0049 Epoch: 15 Global Step: 193370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:16,275-Speed 2918.57 samples/sec Loss 3.0612 LearningRate 0.0049 Epoch: 15 Global Step: 193380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:19,633-Speed 3050.59 samples/sec Loss 3.1057 LearningRate 0.0049 Epoch: 15 Global Step: 193390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:23,016-Speed 3026.97 samples/sec Loss 3.0449 LearningRate 0.0049 Epoch: 15 Global Step: 193400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:26,381-Speed 3044.06 samples/sec Loss 3.0199 LearningRate 0.0049 Epoch: 15 Global Step: 193410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:02:29,765-Speed 3027.32 samples/sec Loss 3.0972 LearningRate 0.0049 Epoch: 15 Global Step: 193420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:02:33,121-Speed 3051.39 samples/sec Loss 3.1155 LearningRate 0.0049 Epoch: 15 Global Step: 193430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:02:36,524-Speed 3010.03 samples/sec Loss 3.0845 LearningRate 0.0049 Epoch: 15 Global Step: 193440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:39,855-Speed 3075.91 samples/sec Loss 2.9705 LearningRate 0.0049 Epoch: 15 Global Step: 193450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 20:02:43,171-Speed 3088.77 samples/sec Loss 3.0599 LearningRate 0.0049 Epoch: 15 Global Step: 193460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:46,517-Speed 3060.90 samples/sec Loss 3.0772 LearningRate 0.0049 Epoch: 15 Global Step: 193470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:49,967-Speed 2968.50 samples/sec Loss 3.0786 LearningRate 0.0049 Epoch: 15 Global Step: 193480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:53,358-Speed 3020.56 samples/sec Loss 3.0682 LearningRate 0.0049 Epoch: 15 Global Step: 193490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:02:56,745-Speed 3024.88 samples/sec Loss 3.0883 LearningRate 0.0049 Epoch: 15 Global Step: 193500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:00,155-Speed 3003.53 samples/sec Loss 3.0897 LearningRate 0.0049 Epoch: 15 Global Step: 193510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:03,580-Speed 2990.57 samples/sec Loss 3.0910 LearningRate 0.0049 Epoch: 15 Global Step: 193520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:06,970-Speed 3020.81 samples/sec Loss 3.0338 LearningRate 0.0049 Epoch: 15 Global Step: 193530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:10,362-Speed 3020.11 samples/sec Loss 3.0939 LearningRate 0.0049 Epoch: 15 Global Step: 193540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:03:13,763-Speed 3012.10 samples/sec Loss 3.1034 LearningRate 0.0049 Epoch: 15 Global Step: 193550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:03:17,109-Speed 3060.58 samples/sec Loss 3.1676 LearningRate 0.0049 Epoch: 15 Global Step: 193560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:03:20,498-Speed 3022.48 samples/sec Loss 3.0733 LearningRate 0.0049 Epoch: 15 Global Step: 193570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:03:23,865-Speed 3041.63 
samples/sec Loss 3.1187 LearningRate 0.0049 Epoch: 15 Global Step: 193580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:03:27,254-Speed 3022.44 samples/sec Loss 3.0340 LearningRate 0.0049 Epoch: 15 Global Step: 193590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:30,628-Speed 3036.04 samples/sec Loss 3.0906 LearningRate 0.0049 Epoch: 15 Global Step: 193600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:33,974-Speed 3061.40 samples/sec Loss 3.0341 LearningRate 0.0049 Epoch: 15 Global Step: 193610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:37,411-Speed 2979.94 samples/sec Loss 3.0687 LearningRate 0.0049 Epoch: 15 Global Step: 193620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:40,791-Speed 3030.73 samples/sec Loss 3.1256 LearningRate 0.0049 Epoch: 15 Global Step: 193630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:44,202-Speed 3002.77 samples/sec Loss 3.1272 LearningRate 0.0049 Epoch: 15 Global Step: 193640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:47,565-Speed 3045.66 samples/sec Loss 3.0699 LearningRate 0.0049 Epoch: 15 Global Step: 193650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:50,981-Speed 2998.71 samples/sec Loss 3.0811 LearningRate 0.0049 Epoch: 15 Global Step: 193660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:54,372-Speed 3020.35 samples/sec Loss 3.0445 LearningRate 0.0049 Epoch: 15 Global Step: 193670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:03:57,751-Speed 3032.06 samples/sec Loss 3.0202 LearningRate 0.0049 Epoch: 15 Global Step: 193680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:04:01,141-Speed 3020.73 samples/sec Loss 3.1394 LearningRate 0.0049 Epoch: 15 Global Step: 193690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:04,566-Speed 2990.64 samples/sec Loss 3.0377 LearningRate 0.0049 
Epoch: 15 Global Step: 193700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:07,932-Speed 3042.92 samples/sec Loss 3.0539 LearningRate 0.0049 Epoch: 15 Global Step: 193710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:11,367-Speed 2982.93 samples/sec Loss 3.0371 LearningRate 0.0048 Epoch: 15 Global Step: 193720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:14,750-Speed 3027.52 samples/sec Loss 3.0474 LearningRate 0.0048 Epoch: 15 Global Step: 193730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:18,116-Speed 3042.44 samples/sec Loss 3.0644 LearningRate 0.0048 Epoch: 15 Global Step: 193740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:21,444-Speed 3078.54 samples/sec Loss 2.9837 LearningRate 0.0048 Epoch: 15 Global Step: 193750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:24,841-Speed 3014.85 samples/sec Loss 3.0169 LearningRate 0.0048 Epoch: 15 Global Step: 193760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:28,253-Speed 3002.34 samples/sec Loss 3.0376 LearningRate 0.0048 Epoch: 15 Global Step: 193770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:31,648-Speed 3017.37 samples/sec Loss 2.9976 LearningRate 0.0048 Epoch: 15 Global Step: 193780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:35,028-Speed 3029.89 samples/sec Loss 3.0397 LearningRate 0.0048 Epoch: 15 Global Step: 193790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:38,390-Speed 3046.95 samples/sec Loss 2.9960 LearningRate 0.0048 Epoch: 15 Global Step: 193800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:04:41,783-Speed 3019.16 samples/sec Loss 3.1239 LearningRate 0.0048 Epoch: 15 Global Step: 193810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:04:45,096-Speed 3091.00 samples/sec Loss 3.1083 LearningRate 0.0048 Epoch: 15 Global Step: 193820 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 20:04:48,444-Speed 3059.61 samples/sec Loss 3.0137 LearningRate 0.0048 Epoch: 15 Global Step: 193830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:04:51,920-Speed 2946.43 samples/sec Loss 3.1282 LearningRate 0.0048 Epoch: 15 Global Step: 193840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:04:55,346-Speed 2990.31 samples/sec Loss 3.0707 LearningRate 0.0048 Epoch: 15 Global Step: 193850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:04:58,762-Speed 2997.67 samples/sec Loss 3.0334 LearningRate 0.0048 Epoch: 15 Global Step: 193860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:05:02,190-Speed 2988.88 samples/sec Loss 3.0449 LearningRate 0.0048 Epoch: 15 Global Step: 193870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:05:05,516-Speed 3079.40 samples/sec Loss 3.1045 LearningRate 0.0048 Epoch: 15 Global Step: 193880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:05:08,903-Speed 3024.45 samples/sec Loss 3.0389 LearningRate 0.0048 Epoch: 15 Global Step: 193890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:05:12,275-Speed 3037.91 samples/sec Loss 3.0383 LearningRate 0.0048 Epoch: 15 Global Step: 193900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:05:15,597-Speed 3082.81 samples/sec Loss 3.0447 LearningRate 0.0048 Epoch: 15 Global Step: 193910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:19,026-Speed 2986.80 samples/sec Loss 3.0983 LearningRate 0.0048 Epoch: 15 Global Step: 193920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:22,368-Speed 3064.94 samples/sec Loss 3.0496 LearningRate 0.0048 Epoch: 15 Global Step: 193930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:25,709-Speed 3065.91 samples/sec Loss 3.1423 LearningRate 0.0048 Epoch: 15 Global Step: 193940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 
2022-04-27 20:05:29,058-Speed 3058.27 samples/sec Loss 3.0991 LearningRate 0.0048 Epoch: 15 Global Step: 193950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:32,414-Speed 3052.29 samples/sec Loss 3.0875 LearningRate 0.0048 Epoch: 15 Global Step: 193960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:35,784-Speed 3040.26 samples/sec Loss 3.0986 LearningRate 0.0048 Epoch: 15 Global Step: 193970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:39,126-Speed 3064.66 samples/sec Loss 3.0354 LearningRate 0.0048 Epoch: 15 Global Step: 193980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:42,498-Speed 3037.22 samples/sec Loss 3.0750 LearningRate 0.0048 Epoch: 15 Global Step: 193990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:45,966-Speed 2953.66 samples/sec Loss 3.1062 LearningRate 0.0048 Epoch: 15 Global Step: 194000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:49,498-Speed 2900.36 samples/sec Loss 2.9639 LearningRate 0.0048 Epoch: 15 Global Step: 194010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:05:52,848-Speed 3057.94 samples/sec Loss 3.0657 LearningRate 0.0048 Epoch: 15 Global Step: 194020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:56,240-Speed 3019.74 samples/sec Loss 3.0933 LearningRate 0.0048 Epoch: 15 Global Step: 194030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:05:59,574-Speed 3072.18 samples/sec Loss 3.0793 LearningRate 0.0048 Epoch: 15 Global Step: 194040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:03,054-Speed 2944.30 samples/sec Loss 2.9928 LearningRate 0.0048 Epoch: 15 Global Step: 194050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:06,417-Speed 3045.73 samples/sec Loss 3.0845 LearningRate 0.0048 Epoch: 15 Global Step: 194060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:09,876-Speed 2961.23 
samples/sec Loss 3.0587 LearningRate 0.0048 Epoch: 15 Global Step: 194070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:13,248-Speed 3037.94 samples/sec Loss 3.0553 LearningRate 0.0048 Epoch: 15 Global Step: 194080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:16,634-Speed 3024.98 samples/sec Loss 3.0666 LearningRate 0.0048 Epoch: 15 Global Step: 194090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:19,968-Speed 3072.21 samples/sec Loss 3.0724 LearningRate 0.0048 Epoch: 15 Global Step: 194100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:23,436-Speed 2954.01 samples/sec Loss 3.0769 LearningRate 0.0048 Epoch: 15 Global Step: 194110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:26,753-Speed 3087.65 samples/sec Loss 3.0581 LearningRate 0.0048 Epoch: 15 Global Step: 194120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:30,173-Speed 2994.80 samples/sec Loss 3.1162 LearningRate 0.0048 Epoch: 15 Global Step: 194130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:06:33,527-Speed 3054.26 samples/sec Loss 3.1077 LearningRate 0.0048 Epoch: 15 Global Step: 194140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:36,901-Speed 3036.03 samples/sec Loss 3.0720 LearningRate 0.0048 Epoch: 15 Global Step: 194150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:40,239-Speed 3069.00 samples/sec Loss 3.0646 LearningRate 0.0048 Epoch: 15 Global Step: 194160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:43,709-Speed 2951.69 samples/sec Loss 3.0309 LearningRate 0.0048 Epoch: 15 Global Step: 194170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:47,058-Speed 3058.00 samples/sec Loss 3.0490 LearningRate 0.0048 Epoch: 15 Global Step: 194180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:50,553-Speed 2931.13 samples/sec Loss 3.0464 LearningRate 0.0048 
Epoch: 15 Global Step: 194190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:54,084-Speed 2900.97 samples/sec Loss 2.9701 LearningRate 0.0048 Epoch: 15 Global Step: 194200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:06:57,480-Speed 3015.56 samples/sec Loss 3.0427 LearningRate 0.0048 Epoch: 15 Global Step: 194210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:00,871-Speed 3021.15 samples/sec Loss 3.1577 LearningRate 0.0048 Epoch: 15 Global Step: 194220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:04,195-Speed 3081.44 samples/sec Loss 3.0767 LearningRate 0.0048 Epoch: 15 Global Step: 194230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:07,562-Speed 3041.67 samples/sec Loss 3.0995 LearningRate 0.0048 Epoch: 15 Global Step: 194240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:10,891-Speed 3077.13 samples/sec Loss 3.0654 LearningRate 0.0048 Epoch: 15 Global Step: 194250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:14,239-Speed 3059.83 samples/sec Loss 3.1194 LearningRate 0.0048 Epoch: 15 Global Step: 194260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:17,596-Speed 3050.95 samples/sec Loss 3.0589 LearningRate 0.0048 Epoch: 15 Global Step: 194270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:20,978-Speed 3028.85 samples/sec Loss 3.1103 LearningRate 0.0047 Epoch: 15 Global Step: 194280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:24,403-Speed 2990.85 samples/sec Loss 3.1472 LearningRate 0.0047 Epoch: 15 Global Step: 194290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:27,746-Speed 3063.22 samples/sec Loss 3.0867 LearningRate 0.0047 Epoch: 15 Global Step: 194300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:31,126-Speed 3030.60 samples/sec Loss 3.0965 LearningRate 0.0047 Epoch: 15 Global Step: 194310 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:34,551-Speed 2990.60 samples/sec Loss 3.0595 LearningRate 0.0047 Epoch: 15 Global Step: 194320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:37,882-Speed 3075.77 samples/sec Loss 3.0667 LearningRate 0.0047 Epoch: 15 Global Step: 194330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:07:41,222-Speed 3065.99 samples/sec Loss 3.0374 LearningRate 0.0047 Epoch: 15 Global Step: 194340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:44,586-Speed 3045.19 samples/sec Loss 3.0730 LearningRate 0.0047 Epoch: 15 Global Step: 194350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:48,031-Speed 2973.18 samples/sec Loss 3.1545 LearningRate 0.0047 Epoch: 15 Global Step: 194360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:51,467-Speed 2980.63 samples/sec Loss 3.0251 LearningRate 0.0047 Epoch: 15 Global Step: 194370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:54,797-Speed 3076.61 samples/sec Loss 3.0562 LearningRate 0.0047 Epoch: 15 Global Step: 194380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:07:58,141-Speed 3063.23 samples/sec Loss 2.9823 LearningRate 0.0047 Epoch: 15 Global Step: 194390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:01,502-Speed 3047.59 samples/sec Loss 2.9896 LearningRate 0.0047 Epoch: 15 Global Step: 194400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:04,865-Speed 3045.94 samples/sec Loss 3.1502 LearningRate 0.0047 Epoch: 15 Global Step: 194410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:08,285-Speed 2995.49 samples/sec Loss 3.0040 LearningRate 0.0047 Epoch: 15 Global Step: 194420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:11,757-Speed 2949.79 samples/sec Loss 3.0692 LearningRate 0.0047 Epoch: 15 Global Step: 194430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 20:08:15,130-Speed 3036.36 samples/sec Loss 3.0873 LearningRate 0.0047 Epoch: 15 Global Step: 194440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:18,514-Speed 3027.10 samples/sec Loss 3.0587 LearningRate 0.0047 Epoch: 15 Global Step: 194450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:21,892-Speed 3032.99 samples/sec Loss 3.0908 LearningRate 0.0047 Epoch: 15 Global Step: 194460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:25,273-Speed 3029.54 samples/sec Loss 3.1322 LearningRate 0.0047 Epoch: 15 Global Step: 194470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:28,588-Speed 3089.53 samples/sec Loss 3.1222 LearningRate 0.0047 Epoch: 15 Global Step: 194480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:31,975-Speed 3024.72 samples/sec Loss 3.1020 LearningRate 0.0047 Epoch: 15 Global Step: 194490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:35,433-Speed 2962.78 samples/sec Loss 3.0603 LearningRate 0.0047 Epoch: 15 Global Step: 194500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:08:38,945-Speed 2916.79 samples/sec Loss 3.1171 LearningRate 0.0047 Epoch: 15 Global Step: 194510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:42,272-Speed 3078.46 samples/sec Loss 3.0712 LearningRate 0.0047 Epoch: 15 Global Step: 194520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:45,729-Speed 2963.18 samples/sec Loss 3.0529 LearningRate 0.0047 Epoch: 15 Global Step: 194530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:49,155-Speed 2988.91 samples/sec Loss 3.1230 LearningRate 0.0047 Epoch: 15 Global Step: 194540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:52,544-Speed 3022.81 samples/sec Loss 3.0955 LearningRate 0.0047 Epoch: 15 Global Step: 194550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:55,891-Speed 3060.14 
samples/sec Loss 3.1852 LearningRate 0.0047 Epoch: 15 Global Step: 194560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:08:59,261-Speed 3039.22 samples/sec Loss 3.0724 LearningRate 0.0047 Epoch: 15 Global Step: 194570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:02,659-Speed 3014.32 samples/sec Loss 3.1321 LearningRate 0.0047 Epoch: 15 Global Step: 194580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:05,993-Speed 3072.48 samples/sec Loss 3.0714 LearningRate 0.0047 Epoch: 15 Global Step: 194590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:09,476-Speed 2940.62 samples/sec Loss 3.0841 LearningRate 0.0047 Epoch: 15 Global Step: 194600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:12,930-Speed 2965.39 samples/sec Loss 3.0701 LearningRate 0.0047 Epoch: 15 Global Step: 194610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:16,305-Speed 3035.90 samples/sec Loss 3.1482 LearningRate 0.0047 Epoch: 15 Global Step: 194620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:19,619-Speed 3090.44 samples/sec Loss 3.1592 LearningRate 0.0047 Epoch: 15 Global Step: 194630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:22,976-Speed 3050.82 samples/sec Loss 2.9595 LearningRate 0.0047 Epoch: 15 Global Step: 194640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:26,340-Speed 3044.81 samples/sec Loss 3.0384 LearningRate 0.0047 Epoch: 15 Global Step: 194650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:29,750-Speed 3004.50 samples/sec Loss 3.1257 LearningRate 0.0047 Epoch: 15 Global Step: 194660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:33,222-Speed 2949.82 samples/sec Loss 3.0288 LearningRate 0.0047 Epoch: 15 Global Step: 194670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:09:36,656-Speed 2984.16 samples/sec Loss 3.0698 LearningRate 0.0047 
Epoch: 15 Global Step: 194680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:09:40,076-Speed 2994.69 samples/sec Loss 3.1973 LearningRate 0.0047 Epoch: 15 Global Step: 194690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:09:43,448-Speed 3037.61 samples/sec Loss 3.1261 LearningRate 0.0047 Epoch: 15 Global Step: 194700 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:09:46,837-Speed 3022.78 samples/sec Loss 3.0866 LearningRate 0.0047 Epoch: 15 Global Step: 194710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:09:50,203-Speed 3042.31 samples/sec Loss 2.9421 LearningRate 0.0047 Epoch: 15 Global Step: 194720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:09:53,599-Speed 3016.91 samples/sec Loss 3.0687 LearningRate 0.0047 Epoch: 15 Global Step: 194730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:09:56,972-Speed 3036.98 samples/sec Loss 3.0307 LearningRate 0.0047 Epoch: 15 Global Step: 194740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:00,412-Speed 2976.76 samples/sec Loss 3.0500 LearningRate 0.0047 Epoch: 15 Global Step: 194750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:03,923-Speed 2917.41 samples/sec Loss 3.0207 LearningRate 0.0047 Epoch: 15 Global Step: 194760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:07,343-Speed 2995.53 samples/sec Loss 3.0377 LearningRate 0.0047 Epoch: 15 Global Step: 194770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:10,662-Speed 3085.56 samples/sec Loss 3.1695 LearningRate 0.0047 Epoch: 15 Global Step: 194780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:14,048-Speed 3025.59 samples/sec Loss 3.1177 LearningRate 0.0047 Epoch: 15 Global Step: 194790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:17,487-Speed 2978.00 samples/sec Loss 3.1270 LearningRate 0.0047 Epoch: 15 Global Step: 194800 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:20,837-Speed 3057.86 samples/sec Loss 2.9949 LearningRate 0.0047 Epoch: 15 Global Step: 194810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:24,212-Speed 3034.98 samples/sec Loss 3.1294 LearningRate 0.0047 Epoch: 15 Global Step: 194820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:27,663-Speed 2968.41 samples/sec Loss 3.1059 LearningRate 0.0047 Epoch: 15 Global Step: 194830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:10:31,044-Speed 3029.67 samples/sec Loss 3.0520 LearningRate 0.0047 Epoch: 15 Global Step: 194840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:10:34,469-Speed 2990.39 samples/sec Loss 3.0400 LearningRate 0.0047 Epoch: 15 Global Step: 194850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:10:37,872-Speed 3009.94 samples/sec Loss 3.0845 LearningRate 0.0046 Epoch: 15 Global Step: 194860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:10:41,233-Speed 3047.85 samples/sec Loss 3.1461 LearningRate 0.0046 Epoch: 15 Global Step: 194870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:44,623-Speed 3021.60 samples/sec Loss 3.0291 LearningRate 0.0046 Epoch: 15 Global Step: 194880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:48,004-Speed 3029.24 samples/sec Loss 3.0828 LearningRate 0.0046 Epoch: 15 Global Step: 194890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:51,451-Speed 2971.10 samples/sec Loss 3.0776 LearningRate 0.0046 Epoch: 15 Global Step: 194900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:54,839-Speed 3023.81 samples/sec Loss 3.1229 LearningRate 0.0046 Epoch: 15 Global Step: 194910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:10:58,189-Speed 3057.41 samples/sec Loss 3.0331 LearningRate 0.0046 Epoch: 15 Global Step: 194920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 20:11:01,570-Speed 3029.67 samples/sec Loss 3.1019 LearningRate 0.0046 Epoch: 15 Global Step: 194930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:11:04,957-Speed 3024.22 samples/sec Loss 3.0483 LearningRate 0.0046 Epoch: 15 Global Step: 194940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:11:08,414-Speed 2962.57 samples/sec Loss 2.9976 LearningRate 0.0046 Epoch: 15 Global Step: 194950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:11:11,885-Speed 2951.08 samples/sec Loss 3.0809 LearningRate 0.0046 Epoch: 15 Global Step: 194960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:11:15,272-Speed 3023.93 samples/sec Loss 3.0999 LearningRate 0.0046 Epoch: 15 Global Step: 194970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:18,634-Speed 3047.41 samples/sec Loss 3.0496 LearningRate 0.0046 Epoch: 15 Global Step: 194980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:21,982-Speed 3059.09 samples/sec Loss 3.1158 LearningRate 0.0046 Epoch: 15 Global Step: 194990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:25,415-Speed 2984.02 samples/sec Loss 3.1193 LearningRate 0.0046 Epoch: 15 Global Step: 195000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:28,751-Speed 3069.84 samples/sec Loss 3.0681 LearningRate 0.0046 Epoch: 15 Global Step: 195010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:32,135-Speed 3026.92 samples/sec Loss 3.0662 LearningRate 0.0046 Epoch: 15 Global Step: 195020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:35,456-Speed 3084.20 samples/sec Loss 3.0576 LearningRate 0.0046 Epoch: 15 Global Step: 195030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:38,835-Speed 3031.49 samples/sec Loss 3.0796 LearningRate 0.0046 Epoch: 15 Global Step: 195040 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:42,254-Speed 2996.05 
samples/sec Loss 3.0503 LearningRate 0.0046 Epoch: 15 Global Step: 195050 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:45,618-Speed 3044.35 samples/sec Loss 3.1401 LearningRate 0.0046 Epoch: 15 Global Step: 195060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:49,049-Speed 2985.94 samples/sec Loss 3.0011 LearningRate 0.0046 Epoch: 15 Global Step: 195070 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:11:52,444-Speed 3016.65 samples/sec Loss 3.1510 LearningRate 0.0046 Epoch: 15 Global Step: 195080 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:11:55,843-Speed 3013.69 samples/sec Loss 3.0336 LearningRate 0.0046 Epoch: 15 Global Step: 195090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:11:59,253-Speed 3003.62 samples/sec Loss 3.0906 LearningRate 0.0046 Epoch: 15 Global Step: 195100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:02,607-Speed 3053.51 samples/sec Loss 3.1322 LearningRate 0.0046 Epoch: 15 Global Step: 195110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:06,004-Speed 3015.14 samples/sec Loss 3.0830 LearningRate 0.0046 Epoch: 15 Global Step: 195120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:09,366-Speed 3046.94 samples/sec Loss 3.0464 LearningRate 0.0046 Epoch: 15 Global Step: 195130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:12,753-Speed 3024.36 samples/sec Loss 2.9971 LearningRate 0.0046 Epoch: 15 Global Step: 195140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:16,098-Speed 3062.69 samples/sec Loss 3.1239 LearningRate 0.0046 Epoch: 15 Global Step: 195150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:19,458-Speed 3048.43 samples/sec Loss 3.0852 LearningRate 0.0046 Epoch: 15 Global Step: 195160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:22,833-Speed 3035.20 samples/sec Loss 3.0882 LearningRate 0.0046 
Epoch: 15 Global Step: 195170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:26,290-Speed 2963.04 samples/sec Loss 3.1033 LearningRate 0.0046 Epoch: 15 Global Step: 195180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:29,665-Speed 3035.01 samples/sec Loss 3.1168 LearningRate 0.0046 Epoch: 15 Global Step: 195190 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:12:33,141-Speed 2946.95 samples/sec Loss 3.0205 LearningRate 0.0046 Epoch: 15 Global Step: 195200 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:12:36,552-Speed 3002.40 samples/sec Loss 3.0362 LearningRate 0.0046 Epoch: 15 Global Step: 195210 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:12:39,914-Speed 3047.14 samples/sec Loss 3.0701 LearningRate 0.0046 Epoch: 15 Global Step: 195220 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:12:43,366-Speed 2967.04 samples/sec Loss 3.0817 LearningRate 0.0046 Epoch: 15 Global Step: 195230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:46,743-Speed 3032.49 samples/sec Loss 3.1358 LearningRate 0.0046 Epoch: 15 Global Step: 195240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:50,111-Speed 3041.60 samples/sec Loss 3.1056 LearningRate 0.0046 Epoch: 15 Global Step: 195250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:53,439-Speed 3077.94 samples/sec Loss 3.0277 LearningRate 0.0046 Epoch: 15 Global Step: 195260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:12:56,807-Speed 3040.92 samples/sec Loss 3.0756 LearningRate 0.0046 Epoch: 15 Global Step: 195270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:00,195-Speed 3023.26 samples/sec Loss 2.9842 LearningRate 0.0046 Epoch: 15 Global Step: 195280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:03,609-Speed 3000.87 samples/sec Loss 3.0631 LearningRate 0.0046 Epoch: 15 Global Step: 195290 Fp16 Grad 
Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:06,976-Speed 3041.95 samples/sec Loss 3.0258 LearningRate 0.0046 Epoch: 15 Global Step: 195300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:10,359-Speed 3027.27 samples/sec Loss 3.0868 LearningRate 0.0046 Epoch: 15 Global Step: 195310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:13,769-Speed 3004.22 samples/sec Loss 3.1228 LearningRate 0.0046 Epoch: 15 Global Step: 195320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:17,194-Speed 2990.80 samples/sec Loss 3.0302 LearningRate 0.0046 Epoch: 15 Global Step: 195330 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:13:20,582-Speed 3023.26 samples/sec Loss 3.1448 LearningRate 0.0046 Epoch: 15 Global Step: 195340 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:13:23,901-Speed 3085.39 samples/sec Loss 3.0650 LearningRate 0.0046 Epoch: 15 Global Step: 195350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:27,310-Speed 3005.20 samples/sec Loss 3.1627 LearningRate 0.0046 Epoch: 15 Global Step: 195360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:30,694-Speed 3026.42 samples/sec Loss 3.1468 LearningRate 0.0046 Epoch: 15 Global Step: 195370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:34,093-Speed 3013.48 samples/sec Loss 3.0201 LearningRate 0.0046 Epoch: 15 Global Step: 195380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:13:37,390-Speed 3107.40 samples/sec Loss 3.1797 LearningRate 0.0046 Epoch: 15 Global Step: 195390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:13:40,896-Speed 2921.28 samples/sec Loss 3.0912 LearningRate 0.0046 Epoch: 15 Global Step: 195400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:13:44,297-Speed 3011.38 samples/sec Loss 3.1095 LearningRate 0.0046 Epoch: 15 Global Step: 195410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 20:13:47,675-Speed 3032.41 samples/sec Loss 3.1413 LearningRate 0.0046 Epoch: 15 Global Step: 195420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:13:51,031-Speed 3052.00 samples/sec Loss 3.1256 LearningRate 0.0046 Epoch: 15 Global Step: 195430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:13:54,393-Speed 3046.86 samples/sec Loss 3.1569 LearningRate 0.0045 Epoch: 15 Global Step: 195440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:13:57,735-Speed 3064.77 samples/sec Loss 3.0198 LearningRate 0.0045 Epoch: 15 Global Step: 195450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:01,097-Speed 3046.50 samples/sec Loss 3.1040 LearningRate 0.0045 Epoch: 15 Global Step: 195460 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:04,555-Speed 2962.54 samples/sec Loss 2.9599 LearningRate 0.0045 Epoch: 15 Global Step: 195470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:08,030-Speed 2947.73 samples/sec Loss 3.0630 LearningRate 0.0045 Epoch: 15 Global Step: 195480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:11,500-Speed 2951.87 samples/sec Loss 3.1019 LearningRate 0.0045 Epoch: 15 Global Step: 195490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:14:14,906-Speed 3006.87 samples/sec Loss 3.0904 LearningRate 0.0045 Epoch: 15 Global Step: 195500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:14:18,342-Speed 2981.22 samples/sec Loss 3.0530 LearningRate 0.0045 Epoch: 15 Global Step: 195510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:14:21,727-Speed 3026.22 samples/sec Loss 3.0447 LearningRate 0.0045 Epoch: 15 Global Step: 195520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:14:25,093-Speed 3042.69 samples/sec Loss 3.0638 LearningRate 0.0045 Epoch: 15 Global Step: 195530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:28,502-Speed 3004.61 
samples/sec Loss 3.0708 LearningRate 0.0045 Epoch: 15 Global Step: 195540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:31,959-Speed 2963.27 samples/sec Loss 2.9918 LearningRate 0.0045 Epoch: 15 Global Step: 195550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:35,395-Speed 2980.83 samples/sec Loss 3.1296 LearningRate 0.0045 Epoch: 15 Global Step: 195560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:38,846-Speed 2967.69 samples/sec Loss 3.0568 LearningRate 0.0045 Epoch: 15 Global Step: 195570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:42,279-Speed 2984.22 samples/sec Loss 3.0334 LearningRate 0.0045 Epoch: 15 Global Step: 195580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:45,632-Speed 3054.58 samples/sec Loss 3.0853 LearningRate 0.0045 Epoch: 15 Global Step: 195590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:49,020-Speed 3022.78 samples/sec Loss 3.0952 LearningRate 0.0045 Epoch: 15 Global Step: 195600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:52,424-Speed 3009.61 samples/sec Loss 3.0430 LearningRate 0.0045 Epoch: 15 Global Step: 195610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:55,836-Speed 3001.88 samples/sec Loss 3.0278 LearningRate 0.0045 Epoch: 15 Global Step: 195620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:14:59,178-Speed 3065.16 samples/sec Loss 3.0702 LearningRate 0.0045 Epoch: 15 Global Step: 195630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:15:02,566-Speed 3022.87 samples/sec Loss 3.0848 LearningRate 0.0045 Epoch: 15 Global Step: 195640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:05,910-Speed 3063.68 samples/sec Loss 3.0801 LearningRate 0.0045 Epoch: 15 Global Step: 195650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:09,268-Speed 3049.91 samples/sec Loss 3.0575 LearningRate 0.0045 
Epoch: 15 Global Step: 195660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:12,651-Speed 3028.08 samples/sec Loss 2.9953 LearningRate 0.0045 Epoch: 15 Global Step: 195670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:16,075-Speed 2991.26 samples/sec Loss 3.0220 LearningRate 0.0045 Epoch: 15 Global Step: 195680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:19,472-Speed 3014.85 samples/sec Loss 3.0009 LearningRate 0.0045 Epoch: 15 Global Step: 195690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:22,960-Speed 2937.17 samples/sec Loss 2.9626 LearningRate 0.0045 Epoch: 15 Global Step: 195700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:26,357-Speed 3015.52 samples/sec Loss 3.0484 LearningRate 0.0045 Epoch: 15 Global Step: 195710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:29,791-Speed 2982.91 samples/sec Loss 3.0833 LearningRate 0.0045 Epoch: 15 Global Step: 195720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:33,269-Speed 2945.60 samples/sec Loss 3.0268 LearningRate 0.0045 Epoch: 15 Global Step: 195730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:36,578-Speed 3095.22 samples/sec Loss 3.0914 LearningRate 0.0045 Epoch: 15 Global Step: 195740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:15:39,935-Speed 3051.12 samples/sec Loss 3.0045 LearningRate 0.0045 Epoch: 15 Global Step: 195750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:43,443-Speed 2920.27 samples/sec Loss 3.1150 LearningRate 0.0045 Epoch: 15 Global Step: 195760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:46,849-Speed 3007.22 samples/sec Loss 3.0727 LearningRate 0.0045 Epoch: 15 Global Step: 195770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:50,336-Speed 2937.61 samples/sec Loss 2.9847 LearningRate 0.0045 Epoch: 15 Global Step: 195780 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:53,687-Speed 3056.38 samples/sec Loss 3.1317 LearningRate 0.0045 Epoch: 15 Global Step: 195790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:15:57,097-Speed 3004.35 samples/sec Loss 3.0386 LearningRate 0.0045 Epoch: 15 Global Step: 195800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:00,421-Speed 3081.66 samples/sec Loss 3.1134 LearningRate 0.0045 Epoch: 15 Global Step: 195810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:03,841-Speed 2994.75 samples/sec Loss 3.0826 LearningRate 0.0045 Epoch: 15 Global Step: 195820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:07,156-Speed 3089.98 samples/sec Loss 3.0895 LearningRate 0.0045 Epoch: 15 Global Step: 195830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:10,501-Speed 3062.53 samples/sec Loss 3.0592 LearningRate 0.0045 Epoch: 15 Global Step: 195840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:13,867-Speed 3043.28 samples/sec Loss 3.0698 LearningRate 0.0045 Epoch: 15 Global Step: 195850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:16:17,193-Speed 3079.12 samples/sec Loss 3.0438 LearningRate 0.0045 Epoch: 15 Global Step: 195860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:16:20,568-Speed 3035.47 samples/sec Loss 3.0981 LearningRate 0.0045 Epoch: 15 Global Step: 195870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:16:23,950-Speed 3028.43 samples/sec Loss 3.0601 LearningRate 0.0045 Epoch: 15 Global Step: 195880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:16:27,324-Speed 3035.52 samples/sec Loss 3.0881 LearningRate 0.0045 Epoch: 15 Global Step: 195890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:30,752-Speed 2988.54 samples/sec Loss 3.0514 LearningRate 0.0045 Epoch: 15 Global Step: 195900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 20:16:34,161-Speed 3004.36 samples/sec Loss 3.0657 LearningRate 0.0045 Epoch: 15 Global Step: 195910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:37,496-Speed 3071.89 samples/sec Loss 3.0985 LearningRate 0.0045 Epoch: 15 Global Step: 195920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:40,836-Speed 3066.89 samples/sec Loss 3.0366 LearningRate 0.0045 Epoch: 15 Global Step: 195930 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:44,231-Speed 3016.66 samples/sec Loss 3.0444 LearningRate 0.0045 Epoch: 15 Global Step: 195940 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:47,581-Speed 3057.60 samples/sec Loss 3.0397 LearningRate 0.0045 Epoch: 15 Global Step: 195950 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:50,969-Speed 3023.73 samples/sec Loss 3.0605 LearningRate 0.0045 Epoch: 15 Global Step: 195960 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:54,368-Speed 3013.29 samples/sec Loss 3.1018 LearningRate 0.0045 Epoch: 15 Global Step: 195970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:16:57,721-Speed 3055.55 samples/sec Loss 3.0579 LearningRate 0.0045 Epoch: 15 Global Step: 195980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:01,083-Speed 3046.22 samples/sec Loss 3.0894 LearningRate 0.0045 Epoch: 15 Global Step: 195990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:04,430-Speed 3059.84 samples/sec Loss 3.0614 LearningRate 0.0045 Epoch: 15 Global Step: 196000 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:07,767-Speed 3070.05 samples/sec Loss 3.0589 LearningRate 0.0045 Epoch: 15 Global Step: 196010 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:11,205-Speed 2979.34 samples/sec Loss 3.0657 LearningRate 0.0044 Epoch: 15 Global Step: 196020 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:14,590-Speed 3025.73 
samples/sec Loss 3.0373 LearningRate 0.0044 Epoch: 15 Global Step: 196030 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:17,992-Speed 3013.64 samples/sec Loss 3.0900 LearningRate 0.0044 Epoch: 15 Global Step: 196040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:21,345-Speed 3055.16 samples/sec Loss 3.0572 LearningRate 0.0044 Epoch: 15 Global Step: 196050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:24,742-Speed 3014.91 samples/sec Loss 3.1274 LearningRate 0.0044 Epoch: 15 Global Step: 196060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:28,104-Speed 3046.14 samples/sec Loss 3.1602 LearningRate 0.0044 Epoch: 15 Global Step: 196070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:31,481-Speed 3033.88 samples/sec Loss 3.0738 LearningRate 0.0044 Epoch: 15 Global Step: 196080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:34,870-Speed 3022.22 samples/sec Loss 3.0930 LearningRate 0.0044 Epoch: 15 Global Step: 196090 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:38,269-Speed 3013.95 samples/sec Loss 3.0902 LearningRate 0.0044 Epoch: 15 Global Step: 196100 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:41,596-Speed 3078.21 samples/sec Loss 3.0919 LearningRate 0.0044 Epoch: 15 Global Step: 196110 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:44,949-Speed 3054.76 samples/sec Loss 3.1485 LearningRate 0.0044 Epoch: 15 Global Step: 196120 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:48,330-Speed 3029.95 samples/sec Loss 2.9874 LearningRate 0.0044 Epoch: 15 Global Step: 196130 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:17:51,714-Speed 3026.40 samples/sec Loss 3.0605 LearningRate 0.0044 Epoch: 15 Global Step: 196140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:55,116-Speed 3010.89 samples/sec Loss 3.1014 LearningRate 0.0044 
Epoch: 15 Global Step: 196150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:17:58,583-Speed 2954.54 samples/sec Loss 2.9878 LearningRate 0.0044 Epoch: 15 Global Step: 196160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:02,020-Speed 2980.75 samples/sec Loss 3.0680 LearningRate 0.0044 Epoch: 15 Global Step: 196170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:05,339-Speed 3085.61 samples/sec Loss 3.0864 LearningRate 0.0044 Epoch: 15 Global Step: 196180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:08,760-Speed 2994.81 samples/sec Loss 3.0836 LearningRate 0.0044 Epoch: 15 Global Step: 196190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:12,159-Speed 3013.17 samples/sec Loss 3.1404 LearningRate 0.0044 Epoch: 15 Global Step: 196200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:15,552-Speed 3018.54 samples/sec Loss 3.1040 LearningRate 0.0044 Epoch: 15 Global Step: 196210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:18,984-Speed 2985.42 samples/sec Loss 3.0940 LearningRate 0.0044 Epoch: 15 Global Step: 196220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:22,400-Speed 2998.57 samples/sec Loss 3.0729 LearningRate 0.0044 Epoch: 15 Global Step: 196230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:25,734-Speed 3071.74 samples/sec Loss 3.0466 LearningRate 0.0044 Epoch: 15 Global Step: 196240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:29,123-Speed 3021.98 samples/sec Loss 3.0515 LearningRate 0.0044 Epoch: 15 Global Step: 196250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:32,504-Speed 3029.83 samples/sec Loss 3.0724 LearningRate 0.0044 Epoch: 15 Global Step: 196260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:35,978-Speed 2948.85 samples/sec Loss 3.1294 LearningRate 0.0044 Epoch: 15 Global Step: 196270 Fp16 Grad 
Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:39,326-Speed 3059.28 samples/sec Loss 3.0181 LearningRate 0.0044 Epoch: 15 Global Step: 196280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:42,699-Speed 3036.91 samples/sec Loss 2.9989 LearningRate 0.0044 Epoch: 15 Global Step: 196290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:46,107-Speed 3005.29 samples/sec Loss 3.1274 LearningRate 0.0044 Epoch: 15 Global Step: 196300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:49,511-Speed 3008.62 samples/sec Loss 2.9668 LearningRate 0.0044 Epoch: 15 Global Step: 196310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:52,962-Speed 2968.35 samples/sec Loss 3.0515 LearningRate 0.0044 Epoch: 15 Global Step: 196320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:56,358-Speed 3016.30 samples/sec Loss 3.0738 LearningRate 0.0044 Epoch: 15 Global Step: 196330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:18:59,672-Speed 3091.08 samples/sec Loss 3.1473 LearningRate 0.0044 Epoch: 15 Global Step: 196340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:03,113-Speed 2976.86 samples/sec Loss 3.1166 LearningRate 0.0044 Epoch: 15 Global Step: 196350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:06,552-Speed 2978.47 samples/sec Loss 3.0985 LearningRate 0.0044 Epoch: 15 Global Step: 196360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:10,008-Speed 2963.97 samples/sec Loss 3.0551 LearningRate 0.0044 Epoch: 15 Global Step: 196370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:13,479-Speed 2951.43 samples/sec Loss 3.0690 LearningRate 0.0044 Epoch: 15 Global Step: 196380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:16,893-Speed 3000.17 samples/sec Loss 3.0702 LearningRate 0.0044 Epoch: 15 Global Step: 196390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 
2022-04-27 20:19:20,245-Speed 3055.69 samples/sec Loss 3.0463 LearningRate 0.0044 Epoch: 15 Global Step: 196400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:23,654-Speed 3004.20 samples/sec Loss 3.0997 LearningRate 0.0044 Epoch: 15 Global Step: 196410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:27,086-Speed 2984.38 samples/sec Loss 3.1095 LearningRate 0.0044 Epoch: 15 Global Step: 196420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:30,462-Speed 3033.93 samples/sec Loss 3.0799 LearningRate 0.0044 Epoch: 15 Global Step: 196430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:33,824-Speed 3046.37 samples/sec Loss 3.0540 LearningRate 0.0044 Epoch: 15 Global Step: 196440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:19:37,163-Speed 3068.21 samples/sec Loss 3.0268 LearningRate 0.0044 Epoch: 15 Global Step: 196450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:19:40,576-Speed 3000.96 samples/sec Loss 3.0688 LearningRate 0.0044 Epoch: 15 Global Step: 196460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:19:43,953-Speed 3033.19 samples/sec Loss 2.9914 LearningRate 0.0044 Epoch: 15 Global Step: 196470 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:47,307-Speed 3053.83 samples/sec Loss 3.0079 LearningRate 0.0044 Epoch: 15 Global Step: 196480 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:50,685-Speed 3032.12 samples/sec Loss 3.0488 LearningRate 0.0044 Epoch: 15 Global Step: 196490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:54,070-Speed 3026.38 samples/sec Loss 3.0568 LearningRate 0.0044 Epoch: 15 Global Step: 196500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:19:57,402-Speed 3073.97 samples/sec Loss 3.0679 LearningRate 0.0044 Epoch: 15 Global Step: 196510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:00,755-Speed 3054.76 
samples/sec Loss 3.0791 LearningRate 0.0044 Epoch: 15 Global Step: 196520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:04,149-Speed 3018.09 samples/sec Loss 3.0244 LearningRate 0.0044 Epoch: 15 Global Step: 196530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:07,534-Speed 3025.76 samples/sec Loss 3.1166 LearningRate 0.0044 Epoch: 15 Global Step: 196540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:10,854-Speed 3085.42 samples/sec Loss 3.0758 LearningRate 0.0044 Epoch: 15 Global Step: 196550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:14,212-Speed 3050.51 samples/sec Loss 3.0896 LearningRate 0.0044 Epoch: 15 Global Step: 196560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:17,545-Speed 3073.87 samples/sec Loss 3.0727 LearningRate 0.0044 Epoch: 15 Global Step: 196570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:20:20,926-Speed 3029.27 samples/sec Loss 3.0707 LearningRate 0.0044 Epoch: 15 Global Step: 196580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:20:24,349-Speed 2992.18 samples/sec Loss 3.1149 LearningRate 0.0044 Epoch: 15 Global Step: 196590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:20:27,740-Speed 3021.12 samples/sec Loss 3.0603 LearningRate 0.0044 Epoch: 15 Global Step: 196600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:20:31,111-Speed 3038.38 samples/sec Loss 3.0777 LearningRate 0.0043 Epoch: 15 Global Step: 196610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:20:34,494-Speed 3028.44 samples/sec Loss 3.0848 LearningRate 0.0043 Epoch: 15 Global Step: 196620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:20:37,900-Speed 3007.00 samples/sec Loss 3.1357 LearningRate 0.0043 Epoch: 15 Global Step: 196630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:41,247-Speed 3060.20 samples/sec Loss 3.1408 LearningRate 0.0043 
Epoch: 15 Global Step: 196640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:44,660-Speed 3001.02 samples/sec Loss 3.0569 LearningRate 0.0043 Epoch: 15 Global Step: 196650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:48,015-Speed 3053.84 samples/sec Loss 3.0620 LearningRate 0.0043 Epoch: 15 Global Step: 196660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:51,373-Speed 3050.06 samples/sec Loss 3.1172 LearningRate 0.0043 Epoch: 15 Global Step: 196670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:54,737-Speed 3045.45 samples/sec Loss 3.1086 LearningRate 0.0043 Epoch: 15 Global Step: 196680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:20:58,131-Speed 3017.82 samples/sec Loss 3.0255 LearningRate 0.0043 Epoch: 15 Global Step: 196690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:01,618-Speed 2936.67 samples/sec Loss 3.0067 LearningRate 0.0043 Epoch: 15 Global Step: 196700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:05,121-Speed 2924.14 samples/sec Loss 3.0602 LearningRate 0.0043 Epoch: 15 Global Step: 196710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:08,454-Speed 3073.14 samples/sec Loss 3.0349 LearningRate 0.0043 Epoch: 15 Global Step: 196720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:11,817-Speed 3045.59 samples/sec Loss 3.0703 LearningRate 0.0043 Epoch: 15 Global Step: 196730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:21:15,148-Speed 3075.52 samples/sec Loss 3.0861 LearningRate 0.0043 Epoch: 15 Global Step: 196740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:18,545-Speed 3015.01 samples/sec Loss 3.0663 LearningRate 0.0043 Epoch: 15 Global Step: 196750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:21,938-Speed 3019.26 samples/sec Loss 2.9997 LearningRate 0.0043 Epoch: 15 Global Step: 196760 Fp16 Grad 
Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:25,379-Speed 2976.37 samples/sec Loss 3.0490 LearningRate 0.0043 Epoch: 15 Global Step: 196770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:28,789-Speed 3003.91 samples/sec Loss 3.0898 LearningRate 0.0043 Epoch: 15 Global Step: 196780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:32,119-Speed 3075.83 samples/sec Loss 3.1050 LearningRate 0.0043 Epoch: 15 Global Step: 196790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:35,568-Speed 2970.15 samples/sec Loss 3.0922 LearningRate 0.0043 Epoch: 15 Global Step: 196800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:39,078-Speed 2918.12 samples/sec Loss 3.0822 LearningRate 0.0043 Epoch: 15 Global Step: 196810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:42,465-Speed 3024.38 samples/sec Loss 3.0417 LearningRate 0.0043 Epoch: 15 Global Step: 196820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:45,832-Speed 3041.75 samples/sec Loss 3.0539 LearningRate 0.0043 Epoch: 15 Global Step: 196830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:49,201-Speed 3040.50 samples/sec Loss 3.1570 LearningRate 0.0043 Epoch: 15 Global Step: 196840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:21:52,542-Speed 3066.60 samples/sec Loss 3.0729 LearningRate 0.0043 Epoch: 15 Global Step: 196850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:55,958-Speed 2999.54 samples/sec Loss 3.1109 LearningRate 0.0043 Epoch: 15 Global Step: 196860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:21:59,300-Speed 3064.72 samples/sec Loss 3.0850 LearningRate 0.0043 Epoch: 15 Global Step: 196870 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:02,697-Speed 3014.87 samples/sec Loss 3.0266 LearningRate 0.0043 Epoch: 15 Global Step: 196880 Fp16 Grad Scale: 8192 Required: 5 hours Training: 
2022-04-27 20:22:06,125-Speed 2987.90 samples/sec Loss 2.9587 LearningRate 0.0043 Epoch: 15 Global Step: 196890 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:09,472-Speed 3060.50 samples/sec Loss 2.9991 LearningRate 0.0043 Epoch: 15 Global Step: 196900 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:12,867-Speed 3017.37 samples/sec Loss 3.0165 LearningRate 0.0043 Epoch: 15 Global Step: 196910 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:16,209-Speed 3064.82 samples/sec Loss 3.1122 LearningRate 0.0043 Epoch: 15 Global Step: 196920 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:19,592-Speed 3027.40 samples/sec Loss 3.1140 LearningRate 0.0043 Epoch: 15 Global Step: 196930 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:23,033-Speed 2976.98 samples/sec Loss 3.0594 LearningRate 0.0043 Epoch: 15 Global Step: 196940 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:26,396-Speed 3046.17 samples/sec Loss 2.9963 LearningRate 0.0043 Epoch: 15 Global Step: 196950 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:29,826-Speed 2985.94 samples/sec Loss 3.0769 LearningRate 0.0043 Epoch: 15 Global Step: 196960 Fp16 Grad Scale: 8192 Required: 5 hours Training: 2022-04-27 20:22:33,298-Speed 2949.98 samples/sec Loss 3.0482 LearningRate 0.0043 Epoch: 15 Global Step: 196970 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:36,648-Speed 3057.45 samples/sec Loss 3.0027 LearningRate 0.0043 Epoch: 15 Global Step: 196980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:40,032-Speed 3027.03 samples/sec Loss 3.0567 LearningRate 0.0043 Epoch: 15 Global Step: 196990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:43,492-Speed 2960.66 samples/sec Loss 3.0872 LearningRate 0.0043 Epoch: 15 Global Step: 197000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:46,831-Speed 3067.58 samples/sec 
Loss 3.0660 LearningRate 0.0043 Epoch: 15 Global Step: 197010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:50,228-Speed 3014.96 samples/sec Loss 3.0881 LearningRate 0.0043 Epoch: 15 Global Step: 197020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:53,696-Speed 2953.51 samples/sec Loss 3.0311 LearningRate 0.0043 Epoch: 15 Global Step: 197030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:22:57,077-Speed 3029.36 samples/sec Loss 3.1421 LearningRate 0.0043 Epoch: 15 Global Step: 197040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:00,435-Speed 3049.97 samples/sec Loss 3.0965 LearningRate 0.0043 Epoch: 15 Global Step: 197050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:03,932-Speed 2929.38 samples/sec Loss 2.9900 LearningRate 0.0043 Epoch: 15 Global Step: 197060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:07,337-Speed 3007.29 samples/sec Loss 3.0580 LearningRate 0.0043 Epoch: 15 Global Step: 197070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:10,740-Speed 3010.43 samples/sec Loss 3.0579 LearningRate 0.0043 Epoch: 15 Global Step: 197080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:14,185-Speed 2973.57 samples/sec Loss 3.0887 LearningRate 0.0043 Epoch: 15 Global Step: 197090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:17,646-Speed 2959.09 samples/sec Loss 3.1065 LearningRate 0.0043 Epoch: 15 Global Step: 197100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:21,063-Speed 2997.28 samples/sec Loss 3.0855 LearningRate 0.0043 Epoch: 15 Global Step: 197110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:24,473-Speed 3003.57 samples/sec Loss 3.1105 LearningRate 0.0043 Epoch: 15 Global Step: 197120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:27,856-Speed 3027.80 samples/sec Loss 3.0087 LearningRate 0.0043 Epoch: 15 
Global Step: 197130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:31,291-Speed 2981.95 samples/sec Loss 3.0475 LearningRate 0.0043 Epoch: 15 Global Step: 197140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:34,617-Speed 3079.54 samples/sec Loss 3.0578 LearningRate 0.0043 Epoch: 15 Global Step: 197150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:23:37,951-Speed 3072.46 samples/sec Loss 3.0544 LearningRate 0.0043 Epoch: 15 Global Step: 197160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:41,379-Speed 2987.86 samples/sec Loss 3.0797 LearningRate 0.0043 Epoch: 15 Global Step: 197170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:44,825-Speed 2971.65 samples/sec Loss 3.0547 LearningRate 0.0043 Epoch: 15 Global Step: 197180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:48,317-Speed 2933.26 samples/sec Loss 3.0310 LearningRate 0.0043 Epoch: 15 Global Step: 197190 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:51,731-Speed 3001.02 samples/sec Loss 3.0673 LearningRate 0.0043 Epoch: 15 Global Step: 197200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:55,175-Speed 2974.05 samples/sec Loss 3.0539 LearningRate 0.0042 Epoch: 15 Global Step: 197210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:23:58,571-Speed 3015.73 samples/sec Loss 3.0414 LearningRate 0.0042 Epoch: 15 Global Step: 197220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:24:02,021-Speed 2969.22 samples/sec Loss 2.9969 LearningRate 0.0042 Epoch: 15 Global Step: 197230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:24:05,384-Speed 3044.96 samples/sec Loss 3.0831 LearningRate 0.0042 Epoch: 15 Global Step: 197240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:24:08,741-Speed 3052.28 samples/sec Loss 3.0477 LearningRate 0.0042 Epoch: 15 Global Step: 197250 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 20:24:12,158-Speed 2997.09 samples/sec Loss 3.0369 LearningRate 0.0042 Epoch: 15 Global Step: 197260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:15,568-Speed 3004.27 samples/sec Loss 3.1077 LearningRate 0.0042 Epoch: 15 Global Step: 197270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:18,999-Speed 2984.54 samples/sec Loss 3.0305 LearningRate 0.0042 Epoch: 15 Global Step: 197280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:22,498-Speed 2928.36 samples/sec Loss 3.0809 LearningRate 0.0042 Epoch: 15 Global Step: 197290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:25,916-Speed 2995.92 samples/sec Loss 3.0217 LearningRate 0.0042 Epoch: 15 Global Step: 197300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:29,408-Speed 2933.70 samples/sec Loss 3.0705 LearningRate 0.0042 Epoch: 15 Global Step: 197310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:32,777-Speed 3039.97 samples/sec Loss 3.1147 LearningRate 0.0042 Epoch: 15 Global Step: 197320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:36,219-Speed 2976.20 samples/sec Loss 3.0470 LearningRate 0.0042 Epoch: 15 Global Step: 197330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:39,530-Speed 3093.83 samples/sec Loss 2.9840 LearningRate 0.0042 Epoch: 15 Global Step: 197340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:42,904-Speed 3035.07 samples/sec Loss 2.9781 LearningRate 0.0042 Epoch: 15 Global Step: 197350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:46,349-Speed 2973.13 samples/sec Loss 3.0775 LearningRate 0.0042 Epoch: 15 Global Step: 197360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:49,801-Speed 2966.91 samples/sec Loss 3.0545 LearningRate 0.0042 Epoch: 15 Global Step: 197370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
20:24:53,221-Speed 2996.04 samples/sec Loss 3.0917 LearningRate 0.0042 Epoch: 15 Global Step: 197380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:56,571-Speed 3057.05 samples/sec Loss 3.1400 LearningRate 0.0042 Epoch: 15 Global Step: 197390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:24:59,984-Speed 3001.24 samples/sec Loss 3.1064 LearningRate 0.0042 Epoch: 15 Global Step: 197400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:03,441-Speed 2963.31 samples/sec Loss 3.0625 LearningRate 0.0042 Epoch: 15 Global Step: 197410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:06,917-Speed 2945.85 samples/sec Loss 3.0179 LearningRate 0.0042 Epoch: 15 Global Step: 197420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:10,364-Speed 2972.60 samples/sec Loss 3.0682 LearningRate 0.0042 Epoch: 15 Global Step: 197430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:13,747-Speed 3027.38 samples/sec Loss 3.0479 LearningRate 0.0042 Epoch: 15 Global Step: 197440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:17,051-Speed 3099.93 samples/sec Loss 3.0702 LearningRate 0.0042 Epoch: 15 Global Step: 197450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:20,382-Speed 3075.75 samples/sec Loss 3.0701 LearningRate 0.0042 Epoch: 15 Global Step: 197460 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:25:23,784-Speed 3010.30 samples/sec Loss 3.1121 LearningRate 0.0042 Epoch: 15 Global Step: 197470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:27,254-Speed 2951.53 samples/sec Loss 3.1157 LearningRate 0.0042 Epoch: 15 Global Step: 197480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:25:30,632-Speed 3032.37 samples/sec Loss 3.1127 LearningRate 0.0042 Epoch: 15 Global Step: 197490 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:34,064-Speed 2985.32 samples/sec Loss 
3.0419 LearningRate 0.0042 Epoch: 15 Global Step: 197500 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:37,408-Speed 3062.76 samples/sec Loss 3.1187 LearningRate 0.0042 Epoch: 15 Global Step: 197510 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:40,757-Speed 3058.68 samples/sec Loss 3.0036 LearningRate 0.0042 Epoch: 15 Global Step: 197520 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:44,127-Speed 3038.59 samples/sec Loss 2.9618 LearningRate 0.0042 Epoch: 15 Global Step: 197530 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:47,489-Speed 3047.16 samples/sec Loss 3.0170 LearningRate 0.0042 Epoch: 15 Global Step: 197540 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:50,923-Speed 2982.77 samples/sec Loss 3.0672 LearningRate 0.0042 Epoch: 15 Global Step: 197550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:54,283-Speed 3048.46 samples/sec Loss 2.9563 LearningRate 0.0042 Epoch: 15 Global Step: 197560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:25:57,653-Speed 3038.95 samples/sec Loss 3.0456 LearningRate 0.0042 Epoch: 15 Global Step: 197570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:01,023-Speed 3039.30 samples/sec Loss 3.0974 LearningRate 0.0042 Epoch: 15 Global Step: 197580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:04,410-Speed 3024.48 samples/sec Loss 3.0619 LearningRate 0.0042 Epoch: 15 Global Step: 197590 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:07,802-Speed 3019.94 samples/sec Loss 3.0962 LearningRate 0.0042 Epoch: 15 Global Step: 197600 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:11,170-Speed 3040.54 samples/sec Loss 2.9898 LearningRate 0.0042 Epoch: 15 Global Step: 197610 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:14,556-Speed 3025.51 samples/sec Loss 3.0081 LearningRate 0.0042 Epoch: 15 Global 
Step: 197620 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:17,870-Speed 3090.87 samples/sec Loss 3.0133 LearningRate 0.0042 Epoch: 15 Global Step: 197630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:21,275-Speed 3007.59 samples/sec Loss 3.0230 LearningRate 0.0042 Epoch: 15 Global Step: 197640 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:24,677-Speed 3010.83 samples/sec Loss 3.1344 LearningRate 0.0042 Epoch: 15 Global Step: 197650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:28,019-Speed 3065.42 samples/sec Loss 3.0929 LearningRate 0.0042 Epoch: 15 Global Step: 197660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:26:31,377-Speed 3050.35 samples/sec Loss 3.1397 LearningRate 0.0042 Epoch: 15 Global Step: 197670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:34,788-Speed 3002.82 samples/sec Loss 3.0338 LearningRate 0.0042 Epoch: 15 Global Step: 197680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:38,184-Speed 3016.22 samples/sec Loss 3.0014 LearningRate 0.0042 Epoch: 15 Global Step: 197690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:41,569-Speed 3026.13 samples/sec Loss 3.0505 LearningRate 0.0042 Epoch: 15 Global Step: 197700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:44,954-Speed 3025.33 samples/sec Loss 3.0442 LearningRate 0.0042 Epoch: 15 Global Step: 197710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:48,384-Speed 2986.57 samples/sec Loss 3.0602 LearningRate 0.0042 Epoch: 15 Global Step: 197720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:51,808-Speed 2991.30 samples/sec Loss 3.1712 LearningRate 0.0042 Epoch: 15 Global Step: 197730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:26:55,122-Speed 3091.13 samples/sec Loss 2.9913 LearningRate 0.0042 Epoch: 15 Global Step: 197740 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 20:26:58,562-Speed 2977.58 samples/sec Loss 3.0972 LearningRate 0.0042 Epoch: 15 Global Step: 197750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:01,957-Speed 3016.82 samples/sec Loss 3.0166 LearningRate 0.0042 Epoch: 15 Global Step: 197760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:05,336-Speed 3031.45 samples/sec Loss 3.0343 LearningRate 0.0042 Epoch: 15 Global Step: 197770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:27:08,746-Speed 3004.18 samples/sec Loss 2.9737 LearningRate 0.0042 Epoch: 15 Global Step: 197780 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:12,209-Speed 2957.04 samples/sec Loss 3.0525 LearningRate 0.0042 Epoch: 15 Global Step: 197790 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:15,630-Speed 2994.10 samples/sec Loss 3.0134 LearningRate 0.0042 Epoch: 15 Global Step: 197800 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:19,100-Speed 2952.38 samples/sec Loss 2.9714 LearningRate 0.0042 Epoch: 15 Global Step: 197810 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:22,491-Speed 3020.75 samples/sec Loss 3.0154 LearningRate 0.0041 Epoch: 15 Global Step: 197820 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:25,964-Speed 2949.45 samples/sec Loss 3.1010 LearningRate 0.0041 Epoch: 15 Global Step: 197830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:29,377-Speed 3000.63 samples/sec Loss 3.0076 LearningRate 0.0041 Epoch: 15 Global Step: 197840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:32,772-Speed 3017.06 samples/sec Loss 3.1433 LearningRate 0.0041 Epoch: 15 Global Step: 197850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:36,144-Speed 3037.50 samples/sec Loss 3.0643 LearningRate 0.0041 Epoch: 15 Global Step: 197860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
20:27:39,560-Speed 2998.88 samples/sec Loss 2.9854 LearningRate 0.0041 Epoch: 15 Global Step: 197870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:27:42,917-Speed 3051.43 samples/sec Loss 3.0720 LearningRate 0.0041 Epoch: 15 Global Step: 197880 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:27:46,391-Speed 2948.14 samples/sec Loss 3.0836 LearningRate 0.0041 Epoch: 15 Global Step: 197890 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:27:49,878-Speed 2937.93 samples/sec Loss 3.0146 LearningRate 0.0041 Epoch: 15 Global Step: 197900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:27:53,337-Speed 2960.57 samples/sec Loss 3.0437 LearningRate 0.0041 Epoch: 15 Global Step: 197910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:27:56,742-Speed 3008.72 samples/sec Loss 3.1107 LearningRate 0.0041 Epoch: 15 Global Step: 197920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:00,158-Speed 2997.72 samples/sec Loss 3.0340 LearningRate 0.0041 Epoch: 15 Global Step: 197930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:03,543-Speed 3025.99 samples/sec Loss 3.0438 LearningRate 0.0041 Epoch: 15 Global Step: 197940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:06,864-Speed 3084.68 samples/sec Loss 3.0252 LearningRate 0.0041 Epoch: 15 Global Step: 197950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:10,195-Speed 3074.94 samples/sec Loss 3.0556 LearningRate 0.0041 Epoch: 15 Global Step: 197960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:13,519-Speed 3081.33 samples/sec Loss 2.9667 LearningRate 0.0041 Epoch: 15 Global Step: 197970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:16,818-Speed 3104.25 samples/sec Loss 3.0950 LearningRate 0.0041 Epoch: 15 Global Step: 197980 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:20,262-Speed 2974.56 samples/sec Loss 
3.0592 LearningRate 0.0041 Epoch: 15 Global Step: 197990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:23,712-Speed 2969.88 samples/sec Loss 2.9403 LearningRate 0.0041 Epoch: 15 Global Step: 198000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:27,120-Speed 3005.67 samples/sec Loss 3.0413 LearningRate 0.0041 Epoch: 15 Global Step: 198010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:30,612-Speed 2932.93 samples/sec Loss 3.1143 LearningRate 0.0041 Epoch: 15 Global Step: 198020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:34,013-Speed 3011.59 samples/sec Loss 3.1426 LearningRate 0.0041 Epoch: 15 Global Step: 198030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:37,383-Speed 3039.52 samples/sec Loss 3.0427 LearningRate 0.0041 Epoch: 15 Global Step: 198040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:40,785-Speed 3010.76 samples/sec Loss 3.1729 LearningRate 0.0041 Epoch: 15 Global Step: 198050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:44,169-Speed 3027.42 samples/sec Loss 3.0542 LearningRate 0.0041 Epoch: 15 Global Step: 198060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:47,573-Speed 3008.24 samples/sec Loss 3.0435 LearningRate 0.0041 Epoch: 15 Global Step: 198070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:28:50,944-Speed 3038.57 samples/sec Loss 2.9685 LearningRate 0.0041 Epoch: 15 Global Step: 198080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:54,328-Speed 3026.78 samples/sec Loss 3.0237 LearningRate 0.0041 Epoch: 15 Global Step: 198090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:28:57,756-Speed 2988.23 samples/sec Loss 3.0884 LearningRate 0.0041 Epoch: 15 Global Step: 198100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:01,160-Speed 3009.34 samples/sec Loss 3.1207 LearningRate 0.0041 Epoch: 15 Global 
Step: 198110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:04,595-Speed 2982.54 samples/sec Loss 3.0956 LearningRate 0.0041 Epoch: 15 Global Step: 198120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:07,978-Speed 3027.69 samples/sec Loss 3.0914 LearningRate 0.0041 Epoch: 15 Global Step: 198130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:11,386-Speed 3005.93 samples/sec Loss 3.0145 LearningRate 0.0041 Epoch: 15 Global Step: 198140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:14,770-Speed 3026.32 samples/sec Loss 3.0099 LearningRate 0.0041 Epoch: 15 Global Step: 198150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:18,132-Speed 3047.21 samples/sec Loss 3.0619 LearningRate 0.0041 Epoch: 15 Global Step: 198160 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:21,496-Speed 3044.39 samples/sec Loss 3.0337 LearningRate 0.0041 Epoch: 15 Global Step: 198170 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:24,864-Speed 3041.43 samples/sec Loss 2.9657 LearningRate 0.0041 Epoch: 15 Global Step: 198180 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:29:28,242-Speed 3031.94 samples/sec Loss 3.0440 LearningRate 0.0041 Epoch: 15 Global Step: 198190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:31,664-Speed 2993.34 samples/sec Loss 3.1012 LearningRate 0.0041 Epoch: 15 Global Step: 198200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:35,035-Speed 3038.76 samples/sec Loss 2.9444 LearningRate 0.0041 Epoch: 15 Global Step: 198210 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:38,363-Speed 3078.35 samples/sec Loss 3.0283 LearningRate 0.0041 Epoch: 15 Global Step: 198220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:29:41,723-Speed 3049.09 samples/sec Loss 2.9612 LearningRate 0.0041 Epoch: 15 Global Step: 198230 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 20:29:45,153-Speed 2986.14 samples/sec Loss 3.0148 LearningRate 0.0041 Epoch: 15 Global Step: 198240 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:29:48,608-Speed 2964.16 samples/sec Loss 3.0541 LearningRate 0.0041 Epoch: 15 Global Step: 198250 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:29:51,968-Speed 3049.14 samples/sec Loss 3.1204 LearningRate 0.0041 Epoch: 15 Global Step: 198260 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:29:55,399-Speed 2985.58 samples/sec Loss 3.0501 LearningRate 0.0041 Epoch: 15 Global Step: 198270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:29:58,799-Speed 3011.96 samples/sec Loss 3.0242 LearningRate 0.0041 Epoch: 15 Global Step: 198280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:02,193-Speed 3018.48 samples/sec Loss 3.1167 LearningRate 0.0041 Epoch: 15 Global Step: 198290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:05,594-Speed 3011.45 samples/sec Loss 2.9232 LearningRate 0.0041 Epoch: 15 Global Step: 198300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:08,970-Speed 3033.86 samples/sec Loss 3.0627 LearningRate 0.0041 Epoch: 15 Global Step: 198310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:12,320-Speed 3057.64 samples/sec Loss 3.0418 LearningRate 0.0041 Epoch: 15 Global Step: 198320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:15,762-Speed 2975.88 samples/sec Loss 3.0444 LearningRate 0.0041 Epoch: 15 Global Step: 198330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:30:19,084-Speed 3083.13 samples/sec Loss 3.0800 LearningRate 0.0041 Epoch: 15 Global Step: 198340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:30:22,409-Speed 3080.77 samples/sec Loss 3.0252 LearningRate 0.0041 Epoch: 15 Global Step: 198350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
20:30:25,710-Speed 3103.00 samples/sec Loss 3.0271 LearningRate 0.0041 Epoch: 15 Global Step: 198360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:29,044-Speed 3072.51 samples/sec Loss 3.0856 LearningRate 0.0041 Epoch: 15 Global Step: 198370 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:32,416-Speed 3037.59 samples/sec Loss 3.0669 LearningRate 0.0041 Epoch: 15 Global Step: 198380 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:35,813-Speed 3014.42 samples/sec Loss 3.0680 LearningRate 0.0041 Epoch: 15 Global Step: 198390 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:39,171-Speed 3050.81 samples/sec Loss 2.9964 LearningRate 0.0041 Epoch: 15 Global Step: 198400 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:42,601-Speed 2986.38 samples/sec Loss 3.0960 LearningRate 0.0041 Epoch: 15 Global Step: 198410 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:45,944-Speed 3063.91 samples/sec Loss 3.0979 LearningRate 0.0041 Epoch: 15 Global Step: 198420 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:49,322-Speed 3032.58 samples/sec Loss 2.9165 LearningRate 0.0040 Epoch: 15 Global Step: 198430 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:52,711-Speed 3033.40 samples/sec Loss 2.9721 LearningRate 0.0040 Epoch: 15 Global Step: 198440 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:56,092-Speed 3029.51 samples/sec Loss 2.9670 LearningRate 0.0040 Epoch: 15 Global Step: 198450 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:30:59,469-Speed 3033.77 samples/sec Loss 2.9626 LearningRate 0.0040 Epoch: 15 Global Step: 198460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:02,871-Speed 3010.83 samples/sec Loss 2.9993 LearningRate 0.0040 Epoch: 15 Global Step: 198470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:06,257-Speed 3025.23 samples/sec Loss 
2.9274 LearningRate 0.0040 Epoch: 15 Global Step: 198480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:09,770-Speed 2915.57 samples/sec Loss 2.9496 LearningRate 0.0040 Epoch: 15 Global Step: 198490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:13,154-Speed 3026.93 samples/sec Loss 3.0761 LearningRate 0.0040 Epoch: 15 Global Step: 198500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:16,595-Speed 2976.64 samples/sec Loss 3.0194 LearningRate 0.0040 Epoch: 15 Global Step: 198510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:20,048-Speed 2966.38 samples/sec Loss 3.0972 LearningRate 0.0040 Epoch: 15 Global Step: 198520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:23,363-Speed 3089.62 samples/sec Loss 2.9628 LearningRate 0.0040 Epoch: 15 Global Step: 198530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:26,764-Speed 3011.98 samples/sec Loss 3.0197 LearningRate 0.0040 Epoch: 15 Global Step: 198540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:31:30,130-Speed 3043.10 samples/sec Loss 2.9549 LearningRate 0.0040 Epoch: 15 Global Step: 198550 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:33,503-Speed 3036.84 samples/sec Loss 2.9811 LearningRate 0.0040 Epoch: 15 Global Step: 198560 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:36,917-Speed 2999.82 samples/sec Loss 3.1068 LearningRate 0.0040 Epoch: 15 Global Step: 198570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:40,325-Speed 3006.16 samples/sec Loss 3.0658 LearningRate 0.0040 Epoch: 15 Global Step: 198580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:43,752-Speed 2988.66 samples/sec Loss 3.0847 LearningRate 0.0040 Epoch: 15 Global Step: 198590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:47,075-Speed 3081.77 samples/sec Loss 3.0376 LearningRate 0.0040 Epoch: 15 Global 
Step: 198600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:50,420-Speed 3062.48 samples/sec Loss 3.0110 LearningRate 0.0040 Epoch: 15 Global Step: 198610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:53,777-Speed 3051.05 samples/sec Loss 3.0501 LearningRate 0.0040 Epoch: 15 Global Step: 198620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:31:57,184-Speed 3006.42 samples/sec Loss 3.0303 LearningRate 0.0040 Epoch: 15 Global Step: 198630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:32:00,564-Speed 3030.71 samples/sec Loss 3.0632 LearningRate 0.0040 Epoch: 15 Global Step: 198640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:32:04,039-Speed 2947.47 samples/sec Loss 3.0745 LearningRate 0.0040 Epoch: 15 Global Step: 198650 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:32:07,444-Speed 3008.49 samples/sec Loss 3.0606 LearningRate 0.0040 Epoch: 15 Global Step: 198660 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:32:10,833-Speed 3022.36 samples/sec Loss 3.0202 LearningRate 0.0040 Epoch: 15 Global Step: 198670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:32:14,148-Speed 3090.07 samples/sec Loss 3.0642 LearningRate 0.0040 Epoch: 15 Global Step: 198680 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:32:17,576-Speed 2987.60 samples/sec Loss 3.0540 LearningRate 0.0040 Epoch: 15 Global Step: 198690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:32:21,000-Speed 2991.56 samples/sec Loss 3.0443 LearningRate 0.0040 Epoch: 15 Global Step: 198700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:32:24,408-Speed 3006.77 samples/sec Loss 3.0192 LearningRate 0.0040 Epoch: 15 Global Step: 198710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:32:27,799-Speed 3020.47 samples/sec Loss 3.0681 LearningRate 0.0040 Epoch: 15 Global Step: 198720 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 20:32:31,471-Speed 2789.63 samples/sec Loss 2.9739 LearningRate 0.0040 Epoch: 15 Global Step: 198730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:33:03,994-Speed 314.87 samples/sec Loss 2.5891 LearningRate 0.0040 Epoch: 16 Global Step: 198740 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:33:07,356-Speed 3046.50 samples/sec Loss 1.9987 LearningRate 0.0040 Epoch: 16 Global Step: 198750 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:33:10,899-Speed 2891.60 samples/sec Loss 2.0033 LearningRate 0.0040 Epoch: 16 Global Step: 198760 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:33:14,297-Speed 3014.18 samples/sec Loss 2.0097 LearningRate 0.0040 Epoch: 16 Global Step: 198770 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:33:17,696-Speed 3014.14 samples/sec Loss 1.9355 LearningRate 0.0040 Epoch: 16 Global Step: 198780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:21,066-Speed 3039.68 samples/sec Loss 1.9545 LearningRate 0.0040 Epoch: 16 Global Step: 198790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:24,556-Speed 2934.78 samples/sec Loss 1.9735 LearningRate 0.0040 Epoch: 16 Global Step: 198800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:27,977-Speed 2994.61 samples/sec Loss 1.9689 LearningRate 0.0040 Epoch: 16 Global Step: 198810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:31,419-Speed 2976.48 samples/sec Loss 2.0019 LearningRate 0.0040 Epoch: 16 Global Step: 198820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:35,224-Speed 2691.91 samples/sec Loss 1.9245 LearningRate 0.0040 Epoch: 16 Global Step: 198830 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:38,643-Speed 2996.51 samples/sec Loss 1.9598 LearningRate 0.0040 Epoch: 16 Global Step: 198840 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
20:33:42,028-Speed 3025.94 samples/sec Loss 1.9370 LearningRate 0.0040 Epoch: 16 Global Step: 198850 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:45,642-Speed 2834.45 samples/sec Loss 1.9799 LearningRate 0.0040 Epoch: 16 Global Step: 198860 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:48,997-Speed 3053.17 samples/sec Loss 1.9380 LearningRate 0.0040 Epoch: 16 Global Step: 198870 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:33:52,493-Speed 2929.77 samples/sec Loss 1.9739 LearningRate 0.0040 Epoch: 16 Global Step: 198880 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:33:55,885-Speed 3020.27 samples/sec Loss 1.9504 LearningRate 0.0040 Epoch: 16 Global Step: 198890 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:33:59,328-Speed 2974.81 samples/sec Loss 1.9501 LearningRate 0.0040 Epoch: 16 Global Step: 198900 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:02,783-Speed 2964.72 samples/sec Loss 1.9244 LearningRate 0.0040 Epoch: 16 Global Step: 198910 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:06,126-Speed 3063.64 samples/sec Loss 1.9064 LearningRate 0.0040 Epoch: 16 Global Step: 198920 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:09,457-Speed 3074.71 samples/sec Loss 1.9806 LearningRate 0.0040 Epoch: 16 Global Step: 198930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:12,824-Speed 3043.12 samples/sec Loss 1.9493 LearningRate 0.0040 Epoch: 16 Global Step: 198940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:16,247-Speed 2992.23 samples/sec Loss 1.9922 LearningRate 0.0040 Epoch: 16 Global Step: 198950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:19,735-Speed 2936.09 samples/sec Loss 1.9799 LearningRate 0.0040 Epoch: 16 Global Step: 198960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:23,138-Speed 3010.74 samples/sec Loss 
2.0003 LearningRate 0.0040 Epoch: 16 Global Step: 198970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:26,498-Speed 3048.64 samples/sec Loss 1.9889 LearningRate 0.0040 Epoch: 16 Global Step: 198980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:29,830-Speed 3073.53 samples/sec Loss 1.9501 LearningRate 0.0040 Epoch: 16 Global Step: 198990 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:33,251-Speed 2994.51 samples/sec Loss 1.9211 LearningRate 0.0040 Epoch: 16 Global Step: 199000 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:34:36,596-Speed 3061.65 samples/sec Loss 1.9562 LearningRate 0.0040 Epoch: 16 Global Step: 199010 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:34:39,979-Speed 3028.32 samples/sec Loss 1.9902 LearningRate 0.0040 Epoch: 16 Global Step: 199020 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:34:43,299-Speed 3085.58 samples/sec Loss 2.0287 LearningRate 0.0040 Epoch: 16 Global Step: 199030 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:34:46,635-Speed 3069.89 samples/sec Loss 2.0304 LearningRate 0.0040 Epoch: 16 Global Step: 199040 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:34:50,035-Speed 3012.36 samples/sec Loss 2.0102 LearningRate 0.0039 Epoch: 16 Global Step: 199050 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:34:53,481-Speed 2972.63 samples/sec Loss 1.9799 LearningRate 0.0039 Epoch: 16 Global Step: 199060 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:34:56,943-Speed 2958.94 samples/sec Loss 1.9644 LearningRate 0.0039 Epoch: 16 Global Step: 199070 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:00,331-Speed 3023.01 samples/sec Loss 2.0197 LearningRate 0.0039 Epoch: 16 Global Step: 199080 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:03,812-Speed 2943.09 samples/sec Loss 1.9396 LearningRate 0.0039 Epoch: 16 Global 
Step: 199090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:07,120-Speed 3096.01 samples/sec Loss 1.9240 LearningRate 0.0039 Epoch: 16 Global Step: 199100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:10,475-Speed 3052.84 samples/sec Loss 1.9510 LearningRate 0.0039 Epoch: 16 Global Step: 199110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:13,877-Speed 3010.53 samples/sec Loss 2.0006 LearningRate 0.0039 Epoch: 16 Global Step: 199120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:17,263-Speed 3025.93 samples/sec Loss 1.9473 LearningRate 0.0039 Epoch: 16 Global Step: 199130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:20,597-Speed 3071.85 samples/sec Loss 2.0190 LearningRate 0.0039 Epoch: 16 Global Step: 199140 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:24,004-Speed 3006.86 samples/sec Loss 1.9448 LearningRate 0.0039 Epoch: 16 Global Step: 199150 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:27,329-Speed 3079.85 samples/sec Loss 1.9311 LearningRate 0.0039 Epoch: 16 Global Step: 199160 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:35:30,736-Speed 3006.92 samples/sec Loss 2.0387 LearningRate 0.0039 Epoch: 16 Global Step: 199170 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:35:34,036-Speed 3103.60 samples/sec Loss 2.0298 LearningRate 0.0039 Epoch: 16 Global Step: 199180 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:37,358-Speed 3083.69 samples/sec Loss 1.9778 LearningRate 0.0039 Epoch: 16 Global Step: 199190 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:40,799-Speed 2976.48 samples/sec Loss 1.9971 LearningRate 0.0039 Epoch: 16 Global Step: 199200 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:44,180-Speed 3029.71 samples/sec Loss 1.9820 LearningRate 0.0039 Epoch: 16 Global Step: 199210 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-04-27 20:35:47,548-Speed 3041.58 samples/sec Loss 1.9844 LearningRate 0.0039 Epoch: 16 Global Step: 199220 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:50,934-Speed 3024.76 samples/sec Loss 1.9584 LearningRate 0.0039 Epoch: 16 Global Step: 199230 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:54,354-Speed 2995.46 samples/sec Loss 2.0317 LearningRate 0.0039 Epoch: 16 Global Step: 199240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:35:57,833-Speed 2944.00 samples/sec Loss 1.9811 LearningRate 0.0039 Epoch: 16 Global Step: 199250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:01,235-Speed 3011.61 samples/sec Loss 1.9362 LearningRate 0.0039 Epoch: 16 Global Step: 199260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:04,648-Speed 3001.14 samples/sec Loss 1.9404 LearningRate 0.0039 Epoch: 16 Global Step: 199270 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:08,049-Speed 3012.00 samples/sec Loss 1.9934 LearningRate 0.0039 Epoch: 16 Global Step: 199280 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:11,499-Speed 2968.94 samples/sec Loss 1.9761 LearningRate 0.0039 Epoch: 16 Global Step: 199290 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:14,883-Speed 3027.24 samples/sec Loss 1.9961 LearningRate 0.0039 Epoch: 16 Global Step: 199300 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:18,318-Speed 2981.59 samples/sec Loss 1.9713 LearningRate 0.0039 Epoch: 16 Global Step: 199310 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:21,681-Speed 3045.94 samples/sec Loss 2.0378 LearningRate 0.0039 Epoch: 16 Global Step: 199320 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:25,090-Speed 3004.81 samples/sec Loss 2.0853 LearningRate 0.0039 Epoch: 16 Global Step: 199330 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
20:36:28,485-Speed 3017.08 samples/sec Loss 2.0343 LearningRate 0.0039 Epoch: 16 Global Step: 199340 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:31,901-Speed 2998.24 samples/sec Loss 2.0256 LearningRate 0.0039 Epoch: 16 Global Step: 199350 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:35,252-Speed 3056.89 samples/sec Loss 2.0131 LearningRate 0.0039 Epoch: 16 Global Step: 199360 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:38,658-Speed 3007.68 samples/sec Loss 2.0052 LearningRate 0.0039 Epoch: 16 Global Step: 199370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:42,117-Speed 2961.62 samples/sec Loss 1.9758 LearningRate 0.0039 Epoch: 16 Global Step: 199380 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:36:45,504-Speed 3024.29 samples/sec Loss 2.0409 LearningRate 0.0039 Epoch: 16 Global Step: 199390 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:36:48,911-Speed 3005.77 samples/sec Loss 2.0182 LearningRate 0.0039 Epoch: 16 Global Step: 199400 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:36:52,362-Speed 2968.04 samples/sec Loss 2.0421 LearningRate 0.0039 Epoch: 16 Global Step: 199410 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:36:55,836-Speed 2948.41 samples/sec Loss 1.9997 LearningRate 0.0039 Epoch: 16 Global Step: 199420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:36:59,293-Speed 2963.59 samples/sec Loss 2.0519 LearningRate 0.0039 Epoch: 16 Global Step: 199430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:02,725-Speed 2984.32 samples/sec Loss 2.0213 LearningRate 0.0039 Epoch: 16 Global Step: 199440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:06,141-Speed 2998.63 samples/sec Loss 2.0144 LearningRate 0.0039 Epoch: 16 Global Step: 199450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:09,601-Speed 2959.97 samples/sec Loss 
2.0368 LearningRate 0.0039 Epoch: 16 Global Step: 199460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:13,037-Speed 2981.59 samples/sec Loss 2.0411 LearningRate 0.0039 Epoch: 16 Global Step: 199470 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:16,517-Speed 2943.47 samples/sec Loss 2.0208 LearningRate 0.0039 Epoch: 16 Global Step: 199480 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:19,879-Speed 3046.16 samples/sec Loss 2.0172 LearningRate 0.0039 Epoch: 16 Global Step: 199490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:23,291-Speed 3002.48 samples/sec Loss 2.0493 LearningRate 0.0039 Epoch: 16 Global Step: 199500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:26,694-Speed 3009.59 samples/sec Loss 1.9517 LearningRate 0.0039 Epoch: 16 Global Step: 199510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:30,082-Speed 3023.34 samples/sec Loss 2.0129 LearningRate 0.0039 Epoch: 16 Global Step: 199520 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:37:33,493-Speed 3002.48 samples/sec Loss 2.0699 LearningRate 0.0039 Epoch: 16 Global Step: 199530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:36,886-Speed 3019.43 samples/sec Loss 2.0195 LearningRate 0.0039 Epoch: 16 Global Step: 199540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:40,315-Speed 2987.04 samples/sec Loss 2.0992 LearningRate 0.0039 Epoch: 16 Global Step: 199550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:43,670-Speed 3053.41 samples/sec Loss 2.1023 LearningRate 0.0039 Epoch: 16 Global Step: 199560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:37:47,078-Speed 3005.73 samples/sec Loss 2.0214 LearningRate 0.0039 Epoch: 16 Global Step: 199570 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:37:50,508-Speed 2985.81 samples/sec Loss 2.0878 LearningRate 0.0039 Epoch: 16 Global 
Step: 199580 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:37:53,942-Speed 2982.84 samples/sec Loss 2.0405 LearningRate 0.0039 Epoch: 16 Global Step: 199590 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:37:57,411-Speed 2952.91 samples/sec Loss 2.0722 LearningRate 0.0039 Epoch: 16 Global Step: 199600 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:00,795-Speed 3026.63 samples/sec Loss 2.0222 LearningRate 0.0039 Epoch: 16 Global Step: 199610 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:04,185-Speed 3021.34 samples/sec Loss 2.0021 LearningRate 0.0039 Epoch: 16 Global Step: 199620 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:07,574-Speed 3022.70 samples/sec Loss 1.9513 LearningRate 0.0039 Epoch: 16 Global Step: 199630 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:10,927-Speed 3055.15 samples/sec Loss 2.0236 LearningRate 0.0039 Epoch: 16 Global Step: 199640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:14,251-Speed 3081.29 samples/sec Loss 2.0533 LearningRate 0.0039 Epoch: 16 Global Step: 199650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:17,651-Speed 3012.67 samples/sec Loss 1.9919 LearningRate 0.0039 Epoch: 16 Global Step: 199660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:38:21,050-Speed 3013.35 samples/sec Loss 2.0023 LearningRate 0.0039 Epoch: 16 Global Step: 199670 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:24,488-Speed 2979.12 samples/sec Loss 2.0394 LearningRate 0.0038 Epoch: 16 Global Step: 199680 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:27,815-Speed 3078.35 samples/sec Loss 2.0246 LearningRate 0.0038 Epoch: 16 Global Step: 199690 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:31,191-Speed 3034.20 samples/sec Loss 1.9500 LearningRate 0.0038 Epoch: 16 Global Step: 199700 Fp16 Grad Scale: 32768 
Required: 5 hours Training: 2022-04-27 20:38:34,556-Speed 3044.14 samples/sec Loss 2.0656 LearningRate 0.0038 Epoch: 16 Global Step: 199710 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:37,892-Speed 3070.32 samples/sec Loss 2.0520 LearningRate 0.0038 Epoch: 16 Global Step: 199720 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:41,296-Speed 3009.70 samples/sec Loss 2.0573 LearningRate 0.0038 Epoch: 16 Global Step: 199730 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:44,682-Speed 3024.86 samples/sec Loss 2.0814 LearningRate 0.0038 Epoch: 16 Global Step: 199740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:48,033-Speed 3056.27 samples/sec Loss 2.0479 LearningRate 0.0038 Epoch: 16 Global Step: 199750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:51,470-Speed 2980.17 samples/sec Loss 2.0366 LearningRate 0.0038 Epoch: 16 Global Step: 199760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:38:54,845-Speed 3034.94 samples/sec Loss 2.0662 LearningRate 0.0038 Epoch: 16 Global Step: 199770 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:38:58,148-Speed 3101.25 samples/sec Loss 1.9697 LearningRate 0.0038 Epoch: 16 Global Step: 199780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:01,478-Speed 3075.84 samples/sec Loss 2.0421 LearningRate 0.0038 Epoch: 16 Global Step: 199790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:04,878-Speed 3013.57 samples/sec Loss 2.0197 LearningRate 0.0038 Epoch: 16 Global Step: 199800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:08,225-Speed 3059.98 samples/sec Loss 2.0816 LearningRate 0.0038 Epoch: 16 Global Step: 199810 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:11,635-Speed 3004.64 samples/sec Loss 2.0286 LearningRate 0.0038 Epoch: 16 Global Step: 199820 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
20:39:14,992-Speed 3050.69 samples/sec Loss 2.0586 LearningRate 0.0038 Epoch: 16 Global Step: 199830 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:18,399-Speed 3006.72 samples/sec Loss 2.1058 LearningRate 0.0038 Epoch: 16 Global Step: 199840 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:21,762-Speed 3045.50 samples/sec Loss 2.1136 LearningRate 0.0038 Epoch: 16 Global Step: 199850 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:25,179-Speed 2997.42 samples/sec Loss 2.0337 LearningRate 0.0038 Epoch: 16 Global Step: 199860 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:28,616-Speed 2980.46 samples/sec Loss 2.0642 LearningRate 0.0038 Epoch: 16 Global Step: 199870 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:32,021-Speed 3008.45 samples/sec Loss 2.1094 LearningRate 0.0038 Epoch: 16 Global Step: 199880 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:35,481-Speed 2960.01 samples/sec Loss 2.0604 LearningRate 0.0038 Epoch: 16 Global Step: 199890 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:38,915-Speed 2983.11 samples/sec Loss 2.0755 LearningRate 0.0038 Epoch: 16 Global Step: 199900 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:42,294-Speed 3030.72 samples/sec Loss 1.9904 LearningRate 0.0038 Epoch: 16 Global Step: 199910 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:45,713-Speed 2996.65 samples/sec Loss 2.0467 LearningRate 0.0038 Epoch: 16 Global Step: 199920 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:39:49,080-Speed 3041.71 samples/sec Loss 1.9791 LearningRate 0.0038 Epoch: 16 Global Step: 199930 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:52,477-Speed 3015.25 samples/sec Loss 2.0004 LearningRate 0.0038 Epoch: 16 Global Step: 199940 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:55,836-Speed 3049.61 samples/sec Loss 
2.0834 LearningRate 0.0038 Epoch: 16 Global Step: 199950 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:39:59,155-Speed 3087.84 samples/sec Loss 2.0576 LearningRate 0.0038 Epoch: 16 Global Step: 199960 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:02,532-Speed 3032.68 samples/sec Loss 2.0257 LearningRate 0.0038 Epoch: 16 Global Step: 199970 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:05,932-Speed 3012.50 samples/sec Loss 2.0987 LearningRate 0.0038 Epoch: 16 Global Step: 199980 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:09,245-Speed 3092.14 samples/sec Loss 2.0421 LearningRate 0.0038 Epoch: 16 Global Step: 199990 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:12,640-Speed 3017.01 samples/sec Loss 2.0218 LearningRate 0.0038 Epoch: 16 Global Step: 200000 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:16,082-Speed 2975.31 samples/sec Loss 1.9768 LearningRate 0.0038 Epoch: 16 Global Step: 200010 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:19,476-Speed 3018.77 samples/sec Loss 2.0719 LearningRate 0.0038 Epoch: 16 Global Step: 200020 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:22,872-Speed 3015.53 samples/sec Loss 2.0817 LearningRate 0.0038 Epoch: 16 Global Step: 200030 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:26,245-Speed 3036.95 samples/sec Loss 2.0549 LearningRate 0.0038 Epoch: 16 Global Step: 200040 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:29,659-Speed 3000.06 samples/sec Loss 2.0076 LearningRate 0.0038 Epoch: 16 Global Step: 200050 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:33,013-Speed 3053.93 samples/sec Loss 2.0926 LearningRate 0.0038 Epoch: 16 Global Step: 200060 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:36,366-Speed 3055.09 samples/sec Loss 2.1482 LearningRate 0.0038 Epoch: 16 Global 
Step: 200070 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:39,821-Speed 2964.41 samples/sec Loss 2.0936 LearningRate 0.0038 Epoch: 16 Global Step: 200080 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:40:43,287-Speed 2955.94 samples/sec Loss 2.0929 LearningRate 0.0038 Epoch: 16 Global Step: 200090 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:46,714-Speed 2988.45 samples/sec Loss 2.0717 LearningRate 0.0038 Epoch: 16 Global Step: 200100 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:50,151-Speed 2980.19 samples/sec Loss 2.0877 LearningRate 0.0038 Epoch: 16 Global Step: 200110 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:53,579-Speed 2988.01 samples/sec Loss 2.0738 LearningRate 0.0038 Epoch: 16 Global Step: 200120 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:40:56,994-Speed 2999.64 samples/sec Loss 2.0807 LearningRate 0.0038 Epoch: 16 Global Step: 200130 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:41:00,394-Speed 3012.49 samples/sec Loss 2.1095 LearningRate 0.0038 Epoch: 16 Global Step: 200140 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:03,785-Speed 3021.14 samples/sec Loss 2.1381 LearningRate 0.0038 Epoch: 16 Global Step: 200150 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:07,152-Speed 3041.56 samples/sec Loss 2.1115 LearningRate 0.0038 Epoch: 16 Global Step: 200160 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:10,629-Speed 2946.21 samples/sec Loss 2.1134 LearningRate 0.0038 Epoch: 16 Global Step: 200170 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:14,077-Speed 2970.47 samples/sec Loss 2.0748 LearningRate 0.0038 Epoch: 16 Global Step: 200180 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:17,476-Speed 3013.75 samples/sec Loss 2.1703 LearningRate 0.0038 Epoch: 16 Global Step: 200190 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 20:41:20,879-Speed 3010.46 samples/sec Loss 2.0707 LearningRate 0.0038 Epoch: 16 Global Step: 200200 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:24,279-Speed 3011.85 samples/sec Loss 2.0865 LearningRate 0.0038 Epoch: 16 Global Step: 200210 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:27,678-Speed 3014.12 samples/sec Loss 2.1714 LearningRate 0.0038 Epoch: 16 Global Step: 200220 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:31,100-Speed 2992.89 samples/sec Loss 2.0870 LearningRate 0.0038 Epoch: 16 Global Step: 200230 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:34,465-Speed 3043.83 samples/sec Loss 2.1486 LearningRate 0.0038 Epoch: 16 Global Step: 200240 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:41:37,809-Speed 3062.89 samples/sec Loss 2.0791 LearningRate 0.0038 Epoch: 16 Global Step: 200250 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:41:41,216-Speed 3007.06 samples/sec Loss 2.0994 LearningRate 0.0038 Epoch: 16 Global Step: 200260 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:41:44,606-Speed 3021.00 samples/sec Loss 2.0870 LearningRate 0.0038 Epoch: 16 Global Step: 200270 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:47,958-Speed 3056.28 samples/sec Loss 2.0420 LearningRate 0.0038 Epoch: 16 Global Step: 200280 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:51,340-Speed 3028.26 samples/sec Loss 2.0482 LearningRate 0.0038 Epoch: 16 Global Step: 200290 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:54,707-Speed 3042.03 samples/sec Loss 2.1250 LearningRate 0.0038 Epoch: 16 Global Step: 200300 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:41:58,121-Speed 3000.47 samples/sec Loss 2.1273 LearningRate 0.0038 Epoch: 16 Global Step: 200310 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 
20:42:01,513-Speed 3019.41 samples/sec Loss 2.0940 LearningRate 0.0037 Epoch: 16 Global Step: 200320 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:42:04,947-Speed 2983.28 samples/sec Loss 2.1083 LearningRate 0.0037 Epoch: 16 Global Step: 200330 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:42:08,415-Speed 2953.75 samples/sec Loss 2.1022 LearningRate 0.0037 Epoch: 16 Global Step: 200340 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:42:11,842-Speed 2988.26 samples/sec Loss 2.0513 LearningRate 0.0037 Epoch: 16 Global Step: 200350 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:42:15,183-Speed 3065.85 samples/sec Loss 2.0879 LearningRate 0.0037 Epoch: 16 Global Step: 200360 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:42:18,538-Speed 3053.67 samples/sec Loss 2.0697 LearningRate 0.0037 Epoch: 16 Global Step: 200370 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:21,863-Speed 3080.59 samples/sec Loss 2.0448 LearningRate 0.0037 Epoch: 16 Global Step: 200380 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:25,302-Speed 2978.27 samples/sec Loss 2.1181 LearningRate 0.0037 Epoch: 16 Global Step: 200390 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:28,723-Speed 2993.61 samples/sec Loss 2.1443 LearningRate 0.0037 Epoch: 16 Global Step: 200400 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:32,150-Speed 2989.05 samples/sec Loss 2.0861 LearningRate 0.0037 Epoch: 16 Global Step: 200410 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:35,534-Speed 3026.93 samples/sec Loss 2.1159 LearningRate 0.0037 Epoch: 16 Global Step: 200420 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:38,894-Speed 3048.72 samples/sec Loss 2.1197 LearningRate 0.0037 Epoch: 16 Global Step: 200430 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:42,304-Speed 3003.69 samples/sec Loss 
2.1232 LearningRate 0.0037 Epoch: 16 Global Step: 200440 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:45,709-Speed 3008.42 samples/sec Loss 2.0737 LearningRate 0.0037 Epoch: 16 Global Step: 200450 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:49,101-Speed 3019.46 samples/sec Loss 2.1253 LearningRate 0.0037 Epoch: 16 Global Step: 200460 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:42:52,484-Speed 3028.11 samples/sec Loss 2.1103 LearningRate 0.0037 Epoch: 16 Global Step: 200470 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:42:55,874-Speed 3021.73 samples/sec Loss 2.0558 LearningRate 0.0037 Epoch: 16 Global Step: 200480 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:42:59,223-Speed 3058.74 samples/sec Loss 2.0703 LearningRate 0.0037 Epoch: 16 Global Step: 200490 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:02,546-Speed 3082.33 samples/sec Loss 2.0839 LearningRate 0.0037 Epoch: 16 Global Step: 200500 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:05,857-Speed 3093.80 samples/sec Loss 2.0630 LearningRate 0.0037 Epoch: 16 Global Step: 200510 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:09,327-Speed 2952.53 samples/sec Loss 2.1244 LearningRate 0.0037 Epoch: 16 Global Step: 200520 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:12,690-Speed 3045.96 samples/sec Loss 2.0921 LearningRate 0.0037 Epoch: 16 Global Step: 200530 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:16,017-Speed 3078.21 samples/sec Loss 2.1231 LearningRate 0.0037 Epoch: 16 Global Step: 200540 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:19,431-Speed 3000.82 samples/sec Loss 2.1061 LearningRate 0.0037 Epoch: 16 Global Step: 200550 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:22,860-Speed 2987.37 samples/sec Loss 2.0934 LearningRate 0.0037 Epoch: 16 Global 
Step: 200560 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:26,236-Speed 3033.62 samples/sec Loss 2.0824 LearningRate 0.0037 Epoch: 16 Global Step: 200570 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:29,598-Speed 3047.33 samples/sec Loss 2.1720 LearningRate 0.0037 Epoch: 16 Global Step: 200580 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:32,908-Speed 3094.51 samples/sec Loss 2.1531 LearningRate 0.0037 Epoch: 16 Global Step: 200590 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:43:36,270-Speed 3046.19 samples/sec Loss 2.0939 LearningRate 0.0037 Epoch: 16 Global Step: 200600 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:43:39,580-Speed 3094.92 samples/sec Loss 2.0969 LearningRate 0.0037 Epoch: 16 Global Step: 200610 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:43:42,975-Speed 3017.49 samples/sec Loss 2.0823 LearningRate 0.0037 Epoch: 16 Global Step: 200620 Fp16 Grad Scale: 65536 Required: 5 hours Training: 2022-04-27 20:43:46,370-Speed 3016.73 samples/sec Loss 2.0510 LearningRate 0.0037 Epoch: 16 Global Step: 200630 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:43:49,722-Speed 3055.54 samples/sec Loss 2.1134 LearningRate 0.0037 Epoch: 16 Global Step: 200640 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:43:53,103-Speed 3030.26 samples/sec Loss 2.1302 LearningRate 0.0037 Epoch: 16 Global Step: 200650 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:43:56,475-Speed 3037.76 samples/sec Loss 2.0757 LearningRate 0.0037 Epoch: 16 Global Step: 200660 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:43:59,873-Speed 3014.48 samples/sec Loss 2.1882 LearningRate 0.0037 Epoch: 16 Global Step: 200670 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:44:03,257-Speed 3027.00 samples/sec Loss 2.1721 LearningRate 0.0037 Epoch: 16 Global Step: 200680 Fp16 Grad Scale: 16384 
Required: 5 hours Training: 2022-04-27 20:44:06,646-Speed 3021.71 samples/sec Loss 2.0819 LearningRate 0.0037 Epoch: 16 Global Step: 200690 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:44:10,057-Speed 3002.89 samples/sec Loss 2.1781 LearningRate 0.0037 Epoch: 16 Global Step: 200700 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:44:13,421-Speed 3045.10 samples/sec Loss 2.1461 LearningRate 0.0037 Epoch: 16 Global Step: 200710 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:44:16,735-Speed 3090.62 samples/sec Loss 2.0947 LearningRate 0.0037 Epoch: 16 Global Step: 200720 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:44:20,247-Speed 2917.27 samples/sec Loss 2.1161 LearningRate 0.0037 Epoch: 16 Global Step: 200730 Fp16 Grad Scale: 16384 Required: 5 hours Training: 2022-04-27 20:44:23,669-Speed 2992.72 samples/sec Loss 2.0671 LearningRate 0.0037 Epoch: 16 Global Step: 200740 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:44:27,035-Speed 3043.00 samples/sec Loss 2.1239 LearningRate 0.0037 Epoch: 16 Global Step: 200750 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:44:30,416-Speed 3029.60 samples/sec Loss 2.1102 LearningRate 0.0037 Epoch: 16 Global Step: 200760 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:44:33,835-Speed 2995.53 samples/sec Loss 2.1841 LearningRate 0.0037 Epoch: 16 Global Step: 200770 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:44:37,190-Speed 3053.27 samples/sec Loss 2.2120 LearningRate 0.0037 Epoch: 16 Global Step: 200780 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:44:40,630-Speed 2977.61 samples/sec Loss 2.1787 LearningRate 0.0037 Epoch: 16 Global Step: 200790 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 20:44:44,054-Speed 2991.17 samples/sec Loss 2.1792 LearningRate 0.0037 Epoch: 16 Global Step: 200800 Fp16 Grad Scale: 32768 Required: 5 hours Training: 2022-04-27 
20:44:47,456-Speed 3011.38 samples/sec Loss 2.1537 LearningRate 0.0037 Epoch: 16 Global Step: 200810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:44:50,856-Speed 3012.83 samples/sec Loss 2.1613 LearningRate 0.0037 Epoch: 16 Global Step: 200820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:44:54,259-Speed 3009.53 samples/sec Loss 2.1532 LearningRate 0.0037 Epoch: 16 Global Step: 200830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:44:57,580-Speed 3084.24 samples/sec Loss 2.0772 LearningRate 0.0037 Epoch: 16 Global Step: 200840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:00,970-Speed 3021.30 samples/sec Loss 2.1543 LearningRate 0.0037 Epoch: 16 Global Step: 200850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:04,317-Speed 3060.58 samples/sec Loss 2.1163 LearningRate 0.0037 Epoch: 16 Global Step: 200860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:07,756-Speed 2978.43 samples/sec Loss 2.1398 LearningRate 0.0037 Epoch: 16 Global Step: 200870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:11,085-Speed 3077.23 samples/sec Loss 2.1340 LearningRate 0.0037 Epoch: 16 Global Step: 200880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:14,455-Speed 3039.51 samples/sec Loss 2.1114 LearningRate 0.0037 Epoch: 16 Global Step: 200890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:17,789-Speed 3072.43 samples/sec Loss 2.1708 LearningRate 0.0037 Epoch: 16 Global Step: 200900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:21,272-Speed 2941.19 samples/sec Loss 2.1109 LearningRate 0.0037 Epoch: 16 Global Step: 200910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:45:24,653-Speed 3028.88 samples/sec Loss 2.0922 LearningRate 0.0037 Epoch: 16 Global Step: 200920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:45:28,012-Speed 3050.06 samples/sec Loss 
2.1680 LearningRate 0.0037 Epoch: 16 Global Step: 200930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:45:31,386-Speed 3034.93 samples/sec Loss 2.1245 LearningRate 0.0037 Epoch: 16 Global Step: 200940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:45:34,708-Speed 3084.55 samples/sec Loss 2.1629 LearningRate 0.0037 Epoch: 16 Global Step: 200950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:38,039-Speed 3074.79 samples/sec Loss 2.0895 LearningRate 0.0036 Epoch: 16 Global Step: 200960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:41,463-Speed 2991.83 samples/sec Loss 2.2004 LearningRate 0.0036 Epoch: 16 Global Step: 200970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:44,871-Speed 3005.00 samples/sec Loss 2.1089 LearningRate 0.0036 Epoch: 16 Global Step: 200980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:48,271-Speed 3012.59 samples/sec Loss 2.1032 LearningRate 0.0036 Epoch: 16 Global Step: 200990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:51,621-Speed 3058.12 samples/sec Loss 2.1500 LearningRate 0.0036 Epoch: 16 Global Step: 201000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:55,012-Speed 3020.15 samples/sec Loss 2.1170 LearningRate 0.0036 Epoch: 16 Global Step: 201010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:45:58,385-Speed 3036.83 samples/sec Loss 2.1493 LearningRate 0.0036 Epoch: 16 Global Step: 201020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:46:01,832-Speed 2971.70 samples/sec Loss 2.1985 LearningRate 0.0036 Epoch: 16 Global Step: 201030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:46:05,192-Speed 3048.75 samples/sec Loss 2.0868 LearningRate 0.0036 Epoch: 16 Global Step: 201040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:46:08,540-Speed 3059.01 samples/sec Loss 2.1441 LearningRate 0.0036 Epoch: 16 Global 
Step: 201050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:11,945-Speed 3008.70 samples/sec Loss 2.1198 LearningRate 0.0036 Epoch: 16 Global Step: 201060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:15,300-Speed 3052.88 samples/sec Loss 2.1897 LearningRate 0.0036 Epoch: 16 Global Step: 201070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:18,670-Speed 3039.49 samples/sec Loss 2.2597 LearningRate 0.0036 Epoch: 16 Global Step: 201080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:21,995-Speed 3083.15 samples/sec Loss 2.1887 LearningRate 0.0036 Epoch: 16 Global Step: 201090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:25,439-Speed 2974.14 samples/sec Loss 2.1616 LearningRate 0.0036 Epoch: 16 Global Step: 201100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:28,820-Speed 3029.70 samples/sec Loss 2.2096 LearningRate 0.0036 Epoch: 16 Global Step: 201110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:32,256-Speed 2980.30 samples/sec Loss 2.1545 LearningRate 0.0036 Epoch: 16 Global Step: 201120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:35,701-Speed 2973.16 samples/sec Loss 2.1587 LearningRate 0.0036 Epoch: 16 Global Step: 201130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:39,126-Speed 2990.96 samples/sec Loss 2.2308 LearningRate 0.0036 Epoch: 16 Global Step: 201140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:42,550-Speed 2991.82 samples/sec Loss 2.2249 LearningRate 0.0036 Epoch: 16 Global Step: 201150 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:46:45,970-Speed 2994.88 samples/sec Loss 2.1648 LearningRate 0.0036 Epoch: 16 Global Step: 201160 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:46:49,402-Speed 2984.64 samples/sec Loss 2.1433 LearningRate 0.0036 Epoch: 16 Global Step: 201170 Fp16 Grad Scale: 65536 
Required: 4 hours Training: 2022-04-27 20:46:52,834-Speed 2984.05 samples/sec Loss 2.1517 LearningRate 0.0036 Epoch: 16 Global Step: 201180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:56,189-Speed 3053.38 samples/sec Loss 2.2086 LearningRate 0.0036 Epoch: 16 Global Step: 201190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:46:59,580-Speed 3020.43 samples/sec Loss 2.1109 LearningRate 0.0036 Epoch: 16 Global Step: 201200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:02,887-Speed 3097.54 samples/sec Loss 2.1581 LearningRate 0.0036 Epoch: 16 Global Step: 201210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:06,223-Speed 3070.99 samples/sec Loss 2.2045 LearningRate 0.0036 Epoch: 16 Global Step: 201220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:09,557-Speed 3072.33 samples/sec Loss 2.1390 LearningRate 0.0036 Epoch: 16 Global Step: 201230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:12,901-Speed 3063.14 samples/sec Loss 2.2256 LearningRate 0.0036 Epoch: 16 Global Step: 201240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:16,215-Speed 3090.68 samples/sec Loss 2.1712 LearningRate 0.0036 Epoch: 16 Global Step: 201250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:19,628-Speed 3001.03 samples/sec Loss 2.1499 LearningRate 0.0036 Epoch: 16 Global Step: 201260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:22,958-Speed 3076.79 samples/sec Loss 2.1804 LearningRate 0.0036 Epoch: 16 Global Step: 201270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:26,326-Speed 3041.12 samples/sec Loss 2.2307 LearningRate 0.0036 Epoch: 16 Global Step: 201280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:47:29,645-Speed 3086.04 samples/sec Loss 2.1768 LearningRate 0.0036 Epoch: 16 Global Step: 201290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
20:47:33,031-Speed 3025.65 samples/sec Loss 2.1432 LearningRate 0.0036 Epoch: 16 Global Step: 201300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:36,398-Speed 3042.36 samples/sec Loss 2.2272 LearningRate 0.0036 Epoch: 16 Global Step: 201310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:39,802-Speed 3008.36 samples/sec Loss 2.1723 LearningRate 0.0036 Epoch: 16 Global Step: 201320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:43,140-Speed 3069.13 samples/sec Loss 2.1437 LearningRate 0.0036 Epoch: 16 Global Step: 201330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:46,539-Speed 3013.58 samples/sec Loss 2.1771 LearningRate 0.0036 Epoch: 16 Global Step: 201340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:49,875-Speed 3069.99 samples/sec Loss 2.2399 LearningRate 0.0036 Epoch: 16 Global Step: 201350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:53,308-Speed 2983.32 samples/sec Loss 2.1917 LearningRate 0.0036 Epoch: 16 Global Step: 201360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:47:56,659-Speed 3056.94 samples/sec Loss 2.1697 LearningRate 0.0036 Epoch: 16 Global Step: 201370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:00,075-Speed 2998.73 samples/sec Loss 2.1543 LearningRate 0.0036 Epoch: 16 Global Step: 201380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:03,484-Speed 3004.00 samples/sec Loss 2.1905 LearningRate 0.0036 Epoch: 16 Global Step: 201390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:06,853-Speed 3041.07 samples/sec Loss 2.1844 LearningRate 0.0036 Epoch: 16 Global Step: 201400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:10,221-Speed 3041.48 samples/sec Loss 2.2061 LearningRate 0.0036 Epoch: 16 Global Step: 201410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:13,582-Speed 3047.75 samples/sec Loss 
2.1623 LearningRate 0.0036 Epoch: 16 Global Step: 201420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:16,992-Speed 3003.75 samples/sec Loss 2.0827 LearningRate 0.0036 Epoch: 16 Global Step: 201430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:20,325-Speed 3073.37 samples/sec Loss 2.1936 LearningRate 0.0036 Epoch: 16 Global Step: 201440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:23,700-Speed 3034.52 samples/sec Loss 2.1413 LearningRate 0.0036 Epoch: 16 Global Step: 201450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:27,051-Speed 3056.56 samples/sec Loss 2.1733 LearningRate 0.0036 Epoch: 16 Global Step: 201460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:48:30,435-Speed 3026.49 samples/sec Loss 2.1640 LearningRate 0.0036 Epoch: 16 Global Step: 201470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:33,832-Speed 3015.36 samples/sec Loss 2.1319 LearningRate 0.0036 Epoch: 16 Global Step: 201480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:37,304-Speed 2950.14 samples/sec Loss 2.1952 LearningRate 0.0036 Epoch: 16 Global Step: 201490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:40,786-Speed 2941.74 samples/sec Loss 2.1448 LearningRate 0.0036 Epoch: 16 Global Step: 201500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:44,191-Speed 3008.59 samples/sec Loss 2.1914 LearningRate 0.0036 Epoch: 16 Global Step: 201510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:47,513-Speed 3083.47 samples/sec Loss 2.1784 LearningRate 0.0036 Epoch: 16 Global Step: 201520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:50,883-Speed 3038.99 samples/sec Loss 2.2367 LearningRate 0.0036 Epoch: 16 Global Step: 201530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:54,271-Speed 3023.41 samples/sec Loss 2.1502 LearningRate 0.0036 Epoch: 16 Global 
Step: 201540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:48:57,744-Speed 2949.76 samples/sec Loss 2.1156 LearningRate 0.0036 Epoch: 16 Global Step: 201550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:49:01,097-Speed 3054.89 samples/sec Loss 2.2150 LearningRate 0.0036 Epoch: 16 Global Step: 201560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:49:04,492-Speed 3017.08 samples/sec Loss 2.1812 LearningRate 0.0036 Epoch: 16 Global Step: 201570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:49:07,830-Speed 3067.96 samples/sec Loss 2.0595 LearningRate 0.0036 Epoch: 16 Global Step: 201580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:49:11,272-Speed 2975.92 samples/sec Loss 2.2335 LearningRate 0.0036 Epoch: 16 Global Step: 201590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:49:14,708-Speed 2981.10 samples/sec Loss 2.2077 LearningRate 0.0036 Epoch: 16 Global Step: 201600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:49:18,155-Speed 2971.90 samples/sec Loss 2.1570 LearningRate 0.0036 Epoch: 16 Global Step: 201610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:49:21,556-Speed 3011.70 samples/sec Loss 2.1811 LearningRate 0.0035 Epoch: 16 Global Step: 201620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:49:24,973-Speed 2997.51 samples/sec Loss 2.1569 LearningRate 0.0035 Epoch: 16 Global Step: 201630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:28,378-Speed 3008.38 samples/sec Loss 2.1876 LearningRate 0.0035 Epoch: 16 Global Step: 201640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:31,728-Speed 3057.95 samples/sec Loss 2.1890 LearningRate 0.0035 Epoch: 16 Global Step: 201650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:35,138-Speed 3003.43 samples/sec Loss 2.1629 LearningRate 0.0035 Epoch: 16 Global Step: 201660 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 20:49:38,633-Speed 2930.40 samples/sec Loss 2.2146 LearningRate 0.0035 Epoch: 16 Global Step: 201670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:42,062-Speed 2987.65 samples/sec Loss 2.1398 LearningRate 0.0035 Epoch: 16 Global Step: 201680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:45,417-Speed 3052.45 samples/sec Loss 2.1294 LearningRate 0.0035 Epoch: 16 Global Step: 201690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:48,822-Speed 3008.77 samples/sec Loss 2.1987 LearningRate 0.0035 Epoch: 16 Global Step: 201700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:52,199-Speed 3032.90 samples/sec Loss 2.1773 LearningRate 0.0035 Epoch: 16 Global Step: 201710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:55,642-Speed 2975.04 samples/sec Loss 2.2095 LearningRate 0.0035 Epoch: 16 Global Step: 201720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:49:59,067-Speed 2990.40 samples/sec Loss 2.1701 LearningRate 0.0035 Epoch: 16 Global Step: 201730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:50:02,425-Speed 3049.69 samples/sec Loss 2.1066 LearningRate 0.0035 Epoch: 16 Global Step: 201740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:50:05,753-Speed 3077.94 samples/sec Loss 2.1955 LearningRate 0.0035 Epoch: 16 Global Step: 201750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:50:09,076-Speed 3083.41 samples/sec Loss 2.1808 LearningRate 0.0035 Epoch: 16 Global Step: 201760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:50:12,470-Speed 3017.68 samples/sec Loss 2.1606 LearningRate 0.0035 Epoch: 16 Global Step: 201770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:15,901-Speed 2985.39 samples/sec Loss 2.1523 LearningRate 0.0035 Epoch: 16 Global Step: 201780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
20:50:19,299-Speed 3014.13 samples/sec Loss 2.1087 LearningRate 0.0035 Epoch: 16 Global Step: 201790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:22,677-Speed 3032.33 samples/sec Loss 2.2575 LearningRate 0.0035 Epoch: 16 Global Step: 201800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:26,095-Speed 2997.04 samples/sec Loss 2.0748 LearningRate 0.0035 Epoch: 16 Global Step: 201810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:29,559-Speed 2956.62 samples/sec Loss 2.2106 LearningRate 0.0035 Epoch: 16 Global Step: 201820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:32,908-Speed 3059.04 samples/sec Loss 2.2711 LearningRate 0.0035 Epoch: 16 Global Step: 201830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:36,274-Speed 3042.74 samples/sec Loss 2.2350 LearningRate 0.0035 Epoch: 16 Global Step: 201840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:39,684-Speed 3003.71 samples/sec Loss 2.1919 LearningRate 0.0035 Epoch: 16 Global Step: 201850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:43,063-Speed 3031.04 samples/sec Loss 2.2283 LearningRate 0.0035 Epoch: 16 Global Step: 201860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:46,400-Speed 3069.72 samples/sec Loss 2.1583 LearningRate 0.0035 Epoch: 16 Global Step: 201870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:50:49,738-Speed 3069.06 samples/sec Loss 2.2094 LearningRate 0.0035 Epoch: 16 Global Step: 201880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:53,065-Speed 3078.22 samples/sec Loss 2.1478 LearningRate 0.0035 Epoch: 16 Global Step: 201890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:56,430-Speed 3044.29 samples/sec Loss 2.1823 LearningRate 0.0035 Epoch: 16 Global Step: 201900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:50:59,857-Speed 2988.33 samples/sec Loss 
2.1749 LearningRate 0.0035 Epoch: 16 Global Step: 201910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:03,266-Speed 3004.54 samples/sec Loss 2.2529 LearningRate 0.0035 Epoch: 16 Global Step: 201920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:06,637-Speed 3038.28 samples/sec Loss 2.2070 LearningRate 0.0035 Epoch: 16 Global Step: 201930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:10,041-Speed 3009.45 samples/sec Loss 2.1572 LearningRate 0.0035 Epoch: 16 Global Step: 201940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:13,374-Speed 3073.55 samples/sec Loss 2.1354 LearningRate 0.0035 Epoch: 16 Global Step: 201950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:16,719-Speed 3061.20 samples/sec Loss 2.2474 LearningRate 0.0035 Epoch: 16 Global Step: 201960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:20,075-Speed 3053.08 samples/sec Loss 2.2302 LearningRate 0.0035 Epoch: 16 Global Step: 201970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:51:23,446-Speed 3038.84 samples/sec Loss 2.2096 LearningRate 0.0035 Epoch: 16 Global Step: 201980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:26,804-Speed 3049.79 samples/sec Loss 2.1778 LearningRate 0.0035 Epoch: 16 Global Step: 201990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:30,147-Speed 3063.81 samples/sec Loss 2.1592 LearningRate 0.0035 Epoch: 16 Global Step: 202000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:33,458-Speed 3094.09 samples/sec Loss 2.2727 LearningRate 0.0035 Epoch: 16 Global Step: 202010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:36,836-Speed 3031.78 samples/sec Loss 2.2583 LearningRate 0.0035 Epoch: 16 Global Step: 202020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:40,165-Speed 3076.63 samples/sec Loss 2.2034 LearningRate 0.0035 Epoch: 16 Global 
Step: 202030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:43,505-Speed 3067.30 samples/sec Loss 2.2367 LearningRate 0.0035 Epoch: 16 Global Step: 202040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:46,930-Speed 2990.26 samples/sec Loss 2.1453 LearningRate 0.0035 Epoch: 16 Global Step: 202050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:50,292-Speed 3046.71 samples/sec Loss 2.2167 LearningRate 0.0035 Epoch: 16 Global Step: 202060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:53,666-Speed 3035.61 samples/sec Loss 2.2735 LearningRate 0.0035 Epoch: 16 Global Step: 202070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:51:56,975-Speed 3095.00 samples/sec Loss 2.2506 LearningRate 0.0035 Epoch: 16 Global Step: 202080 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:52:00,337-Speed 3046.79 samples/sec Loss 2.1944 LearningRate 0.0035 Epoch: 16 Global Step: 202090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:03,729-Speed 3020.21 samples/sec Loss 2.2320 LearningRate 0.0035 Epoch: 16 Global Step: 202100 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:07,062-Speed 3073.05 samples/sec Loss 2.2610 LearningRate 0.0035 Epoch: 16 Global Step: 202110 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:10,477-Speed 2999.26 samples/sec Loss 2.2391 LearningRate 0.0035 Epoch: 16 Global Step: 202120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:13,820-Speed 3063.83 samples/sec Loss 2.2527 LearningRate 0.0035 Epoch: 16 Global Step: 202130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:17,234-Speed 2999.99 samples/sec Loss 2.1815 LearningRate 0.0035 Epoch: 16 Global Step: 202140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:20,624-Speed 3021.66 samples/sec Loss 2.1651 LearningRate 0.0035 Epoch: 16 Global Step: 202150 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 20:52:24,038-Speed 3000.31 samples/sec Loss 2.2155 LearningRate 0.0035 Epoch: 16 Global Step: 202160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:27,404-Speed 3042.57 samples/sec Loss 2.2060 LearningRate 0.0035 Epoch: 16 Global Step: 202170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:30,752-Speed 3059.90 samples/sec Loss 2.2000 LearningRate 0.0035 Epoch: 16 Global Step: 202180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:34,067-Speed 3089.33 samples/sec Loss 2.2314 LearningRate 0.0035 Epoch: 16 Global Step: 202190 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:52:37,380-Speed 3091.60 samples/sec Loss 2.3126 LearningRate 0.0035 Epoch: 16 Global Step: 202200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:40,736-Speed 3052.60 samples/sec Loss 2.2413 LearningRate 0.0035 Epoch: 16 Global Step: 202210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:44,080-Speed 3063.04 samples/sec Loss 2.2166 LearningRate 0.0035 Epoch: 16 Global Step: 202220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:47,427-Speed 3060.00 samples/sec Loss 2.1653 LearningRate 0.0035 Epoch: 16 Global Step: 202230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:50,801-Speed 3036.30 samples/sec Loss 2.2542 LearningRate 0.0035 Epoch: 16 Global Step: 202240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:54,157-Speed 3053.12 samples/sec Loss 2.2191 LearningRate 0.0035 Epoch: 16 Global Step: 202250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:52:57,498-Speed 3065.49 samples/sec Loss 2.2541 LearningRate 0.0035 Epoch: 16 Global Step: 202260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:00,970-Speed 2950.14 samples/sec Loss 2.1921 LearningRate 0.0035 Epoch: 16 Global Step: 202270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
20:53:04,380-Speed 3003.87 samples/sec Loss 2.2345 LearningRate 0.0034 Epoch: 16 Global Step: 202280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:07,768-Speed 3022.68 samples/sec Loss 2.2270 LearningRate 0.0034 Epoch: 16 Global Step: 202290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:11,131-Speed 3045.34 samples/sec Loss 2.1980 LearningRate 0.0034 Epoch: 16 Global Step: 202300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:53:14,478-Speed 3061.05 samples/sec Loss 2.1950 LearningRate 0.0034 Epoch: 16 Global Step: 202310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:17,900-Speed 2993.48 samples/sec Loss 2.2751 LearningRate 0.0034 Epoch: 16 Global Step: 202320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:21,278-Speed 3031.96 samples/sec Loss 2.1791 LearningRate 0.0034 Epoch: 16 Global Step: 202330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:24,635-Speed 3051.41 samples/sec Loss 2.2322 LearningRate 0.0034 Epoch: 16 Global Step: 202340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:28,044-Speed 3004.08 samples/sec Loss 2.2549 LearningRate 0.0034 Epoch: 16 Global Step: 202350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:31,516-Speed 2950.62 samples/sec Loss 2.1911 LearningRate 0.0034 Epoch: 16 Global Step: 202360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:34,971-Speed 2964.53 samples/sec Loss 2.1953 LearningRate 0.0034 Epoch: 16 Global Step: 202370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:38,455-Speed 2939.89 samples/sec Loss 2.1754 LearningRate 0.0034 Epoch: 16 Global Step: 202380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:41,878-Speed 2993.09 samples/sec Loss 2.2417 LearningRate 0.0034 Epoch: 16 Global Step: 202390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:45,329-Speed 2968.87 samples/sec Loss 
2.1764 LearningRate 0.0034 Epoch: 16 Global Step: 202400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:48,706-Speed 3032.64 samples/sec Loss 2.3088 LearningRate 0.0034 Epoch: 16 Global Step: 202410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:52,082-Speed 3034.52 samples/sec Loss 2.2863 LearningRate 0.0034 Epoch: 16 Global Step: 202420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:55,420-Speed 3068.37 samples/sec Loss 2.2238 LearningRate 0.0034 Epoch: 16 Global Step: 202430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:53:58,782-Speed 3046.37 samples/sec Loss 2.2027 LearningRate 0.0034 Epoch: 16 Global Step: 202440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:02,115-Speed 3073.19 samples/sec Loss 2.2259 LearningRate 0.0034 Epoch: 16 Global Step: 202450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:05,455-Speed 3066.12 samples/sec Loss 2.1762 LearningRate 0.0034 Epoch: 16 Global Step: 202460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:08,836-Speed 3030.04 samples/sec Loss 2.2283 LearningRate 0.0034 Epoch: 16 Global Step: 202470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:12,181-Speed 3062.70 samples/sec Loss 2.2003 LearningRate 0.0034 Epoch: 16 Global Step: 202480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:15,531-Speed 3057.40 samples/sec Loss 2.2755 LearningRate 0.0034 Epoch: 16 Global Step: 202490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:18,898-Speed 3041.93 samples/sec Loss 2.2732 LearningRate 0.0034 Epoch: 16 Global Step: 202500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:22,245-Speed 3060.58 samples/sec Loss 2.2663 LearningRate 0.0034 Epoch: 16 Global Step: 202510 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:54:25,593-Speed 3059.94 samples/sec Loss 2.2238 LearningRate 0.0034 Epoch: 16 Global 
Step: 202520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:28,936-Speed 3063.58 samples/sec Loss 2.1694 LearningRate 0.0034 Epoch: 16 Global Step: 202530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:32,327-Speed 3021.02 samples/sec Loss 2.2276 LearningRate 0.0034 Epoch: 16 Global Step: 202540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:35,755-Speed 2987.72 samples/sec Loss 2.1924 LearningRate 0.0034 Epoch: 16 Global Step: 202550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:39,102-Speed 3059.92 samples/sec Loss 2.2344 LearningRate 0.0034 Epoch: 16 Global Step: 202560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:42,591-Speed 2935.85 samples/sec Loss 2.1872 LearningRate 0.0034 Epoch: 16 Global Step: 202570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:45,976-Speed 3026.05 samples/sec Loss 2.2105 LearningRate 0.0034 Epoch: 16 Global Step: 202580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:49,376-Speed 3012.48 samples/sec Loss 2.2930 LearningRate 0.0034 Epoch: 16 Global Step: 202590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:52,790-Speed 3001.05 samples/sec Loss 2.2242 LearningRate 0.0034 Epoch: 16 Global Step: 202600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:56,185-Speed 3016.88 samples/sec Loss 2.2647 LearningRate 0.0034 Epoch: 16 Global Step: 202610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:54:59,536-Speed 3056.61 samples/sec Loss 2.2278 LearningRate 0.0034 Epoch: 16 Global Step: 202620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:55:02,947-Speed 3003.02 samples/sec Loss 2.2135 LearningRate 0.0034 Epoch: 16 Global Step: 202630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:06,365-Speed 2997.36 samples/sec Loss 2.2595 LearningRate 0.0034 Epoch: 16 Global Step: 202640 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 20:55:09,716-Speed 3055.94 samples/sec Loss 2.2120 LearningRate 0.0034 Epoch: 16 Global Step: 202650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:13,128-Speed 3002.89 samples/sec Loss 2.2163 LearningRate 0.0034 Epoch: 16 Global Step: 202660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:16,429-Speed 3102.47 samples/sec Loss 2.3007 LearningRate 0.0034 Epoch: 16 Global Step: 202670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:19,759-Speed 3076.23 samples/sec Loss 2.1757 LearningRate 0.0034 Epoch: 16 Global Step: 202680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:23,072-Speed 3092.32 samples/sec Loss 2.1885 LearningRate 0.0034 Epoch: 16 Global Step: 202690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:26,409-Speed 3069.22 samples/sec Loss 2.2708 LearningRate 0.0034 Epoch: 16 Global Step: 202700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:29,844-Speed 2982.17 samples/sec Loss 2.1881 LearningRate 0.0034 Epoch: 16 Global Step: 202710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:33,317-Speed 2948.96 samples/sec Loss 2.1900 LearningRate 0.0034 Epoch: 16 Global Step: 202720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:36,725-Speed 3005.68 samples/sec Loss 2.2357 LearningRate 0.0034 Epoch: 16 Global Step: 202730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:55:40,077-Speed 3055.62 samples/sec Loss 2.2567 LearningRate 0.0034 Epoch: 16 Global Step: 202740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:55:43,484-Speed 3007.44 samples/sec Loss 2.2734 LearningRate 0.0034 Epoch: 16 Global Step: 202750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:55:46,851-Speed 3041.77 samples/sec Loss 2.2727 LearningRate 0.0034 Epoch: 16 Global Step: 202760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
20:55:50,191-Speed 3066.97 samples/sec Loss 2.2023 LearningRate 0.0034 Epoch: 16 Global Step: 202770 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:53,504-Speed 3091.29 samples/sec Loss 2.2346 LearningRate 0.0034 Epoch: 16 Global Step: 202780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:55:56,881-Speed 3035.04 samples/sec Loss 2.2674 LearningRate 0.0034 Epoch: 16 Global Step: 202790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:00,380-Speed 2927.08 samples/sec Loss 2.2370 LearningRate 0.0034 Epoch: 16 Global Step: 202800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:03,770-Speed 3021.34 samples/sec Loss 2.2076 LearningRate 0.0034 Epoch: 16 Global Step: 202810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:07,134-Speed 3044.76 samples/sec Loss 2.2768 LearningRate 0.0034 Epoch: 16 Global Step: 202820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:10,535-Speed 3012.43 samples/sec Loss 2.2010 LearningRate 0.0034 Epoch: 16 Global Step: 202830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:13,853-Speed 3086.95 samples/sec Loss 2.2178 LearningRate 0.0034 Epoch: 16 Global Step: 202840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:17,173-Speed 3085.47 samples/sec Loss 2.2290 LearningRate 0.0034 Epoch: 16 Global Step: 202850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:56:20,555-Speed 3028.58 samples/sec Loss 2.2131 LearningRate 0.0034 Epoch: 16 Global Step: 202860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:23,926-Speed 3037.73 samples/sec Loss 2.2185 LearningRate 0.0034 Epoch: 16 Global Step: 202870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:27,282-Speed 3052.84 samples/sec Loss 2.2385 LearningRate 0.0034 Epoch: 16 Global Step: 202880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:30,690-Speed 3004.98 samples/sec Loss 
2.2856 LearningRate 0.0034 Epoch: 16 Global Step: 202890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:34,020-Speed 3076.07 samples/sec Loss 2.2628 LearningRate 0.0034 Epoch: 16 Global Step: 202900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:37,455-Speed 2982.03 samples/sec Loss 2.2148 LearningRate 0.0034 Epoch: 16 Global Step: 202910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:40,788-Speed 3073.01 samples/sec Loss 2.2594 LearningRate 0.0034 Epoch: 16 Global Step: 202920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:44,116-Speed 3078.16 samples/sec Loss 2.2441 LearningRate 0.0034 Epoch: 16 Global Step: 202930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:47,487-Speed 3037.92 samples/sec Loss 2.2292 LearningRate 0.0034 Epoch: 16 Global Step: 202940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:50,888-Speed 3011.85 samples/sec Loss 2.2753 LearningRate 0.0034 Epoch: 16 Global Step: 202950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:56:54,236-Speed 3059.55 samples/sec Loss 2.1611 LearningRate 0.0033 Epoch: 16 Global Step: 202960 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:56:57,595-Speed 3049.16 samples/sec Loss 2.1904 LearningRate 0.0033 Epoch: 16 Global Step: 202970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:00,999-Speed 3009.79 samples/sec Loss 2.2367 LearningRate 0.0033 Epoch: 16 Global Step: 202980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:04,458-Speed 2960.98 samples/sec Loss 2.1506 LearningRate 0.0033 Epoch: 16 Global Step: 202990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:08,496-Speed 2536.81 samples/sec Loss 2.2850 LearningRate 0.0033 Epoch: 16 Global Step: 203000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:11,869-Speed 3036.71 samples/sec Loss 2.3546 LearningRate 0.0033 Epoch: 16 Global 
Step: 203010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:15,299-Speed 2985.67 samples/sec Loss 2.3189 LearningRate 0.0033 Epoch: 16 Global Step: 203020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:20,535-Speed 1955.99 samples/sec Loss 2.2591 LearningRate 0.0033 Epoch: 16 Global Step: 203030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:23,980-Speed 2973.73 samples/sec Loss 2.2514 LearningRate 0.0033 Epoch: 16 Global Step: 203040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:27,472-Speed 2932.86 samples/sec Loss 2.2773 LearningRate 0.0033 Epoch: 16 Global Step: 203050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:57:31,361-Speed 2633.73 samples/sec Loss 2.2972 LearningRate 0.0033 Epoch: 16 Global Step: 203060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:34,723-Speed 3046.30 samples/sec Loss 2.2264 LearningRate 0.0033 Epoch: 16 Global Step: 203070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:38,107-Speed 3026.91 samples/sec Loss 2.2305 LearningRate 0.0033 Epoch: 16 Global Step: 203080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:41,454-Speed 3060.54 samples/sec Loss 2.3494 LearningRate 0.0033 Epoch: 16 Global Step: 203090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:44,867-Speed 3001.12 samples/sec Loss 2.3066 LearningRate 0.0033 Epoch: 16 Global Step: 203100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:48,222-Speed 3053.50 samples/sec Loss 2.2434 LearningRate 0.0033 Epoch: 16 Global Step: 203110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:51,551-Speed 3076.25 samples/sec Loss 2.2290 LearningRate 0.0033 Epoch: 16 Global Step: 203120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:57:54,951-Speed 3013.09 samples/sec Loss 2.2220 LearningRate 0.0033 Epoch: 16 Global Step: 203130 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 20:57:58,364-Speed 3001.26 samples/sec Loss 2.2013 LearningRate 0.0033 Epoch: 16 Global Step: 203140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:58:01,777-Speed 3000.20 samples/sec Loss 2.2211 LearningRate 0.0033 Epoch: 16 Global Step: 203150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 20:58:05,158-Speed 3031.13 samples/sec Loss 2.2787 LearningRate 0.0033 Epoch: 16 Global Step: 203160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:08,514-Speed 3052.14 samples/sec Loss 2.2198 LearningRate 0.0033 Epoch: 16 Global Step: 203170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:11,873-Speed 3049.22 samples/sec Loss 2.2268 LearningRate 0.0033 Epoch: 16 Global Step: 203180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:15,199-Speed 3079.59 samples/sec Loss 2.2367 LearningRate 0.0033 Epoch: 16 Global Step: 203190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:18,573-Speed 3035.97 samples/sec Loss 2.2998 LearningRate 0.0033 Epoch: 16 Global Step: 203200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:21,937-Speed 3045.11 samples/sec Loss 2.2873 LearningRate 0.0033 Epoch: 16 Global Step: 203210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:25,313-Speed 3033.90 samples/sec Loss 2.2493 LearningRate 0.0033 Epoch: 16 Global Step: 203220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:28,723-Speed 3004.06 samples/sec Loss 2.2835 LearningRate 0.0033 Epoch: 16 Global Step: 203230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:32,091-Speed 3040.73 samples/sec Loss 2.1393 LearningRate 0.0033 Epoch: 16 Global Step: 203240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:35,479-Speed 3023.77 samples/sec Loss 2.3126 LearningRate 0.0033 Epoch: 16 Global Step: 203250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
20:58:38,842-Speed 3044.99 samples/sec Loss 2.2546 LearningRate 0.0033 Epoch: 16 Global Step: 203260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:42,204-Speed 3047.16 samples/sec Loss 2.3158 LearningRate 0.0033 Epoch: 16 Global Step: 203270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:45,552-Speed 3059.14 samples/sec Loss 2.1863 LearningRate 0.0033 Epoch: 16 Global Step: 203280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:48,954-Speed 3011.06 samples/sec Loss 2.2645 LearningRate 0.0033 Epoch: 16 Global Step: 203290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:52,295-Speed 3065.52 samples/sec Loss 2.2044 LearningRate 0.0033 Epoch: 16 Global Step: 203300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:55,621-Speed 3079.46 samples/sec Loss 2.3247 LearningRate 0.0033 Epoch: 16 Global Step: 203310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:58:59,031-Speed 3004.02 samples/sec Loss 2.2854 LearningRate 0.0033 Epoch: 16 Global Step: 203320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:02,497-Speed 2955.67 samples/sec Loss 2.1985 LearningRate 0.0033 Epoch: 16 Global Step: 203330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:05,891-Speed 3017.52 samples/sec Loss 2.2634 LearningRate 0.0033 Epoch: 16 Global Step: 203340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:09,268-Speed 3033.36 samples/sec Loss 2.2918 LearningRate 0.0033 Epoch: 16 Global Step: 203350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:12,604-Speed 3070.83 samples/sec Loss 2.3423 LearningRate 0.0033 Epoch: 16 Global Step: 203360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:59:15,923-Speed 3086.29 samples/sec Loss 2.2751 LearningRate 0.0033 Epoch: 16 Global Step: 203370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:19,916-Speed 2564.52 samples/sec Loss 
2.2451 LearningRate 0.0033 Epoch: 16 Global Step: 203380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:23,296-Speed 3030.96 samples/sec Loss 2.2175 LearningRate 0.0033 Epoch: 16 Global Step: 203390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:26,724-Speed 2987.76 samples/sec Loss 2.3047 LearningRate 0.0033 Epoch: 16 Global Step: 203400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:30,784-Speed 2522.73 samples/sec Loss 2.2690 LearningRate 0.0033 Epoch: 16 Global Step: 203410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:34,164-Speed 3030.55 samples/sec Loss 2.3443 LearningRate 0.0033 Epoch: 16 Global Step: 203420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:37,539-Speed 3035.10 samples/sec Loss 2.3136 LearningRate 0.0033 Epoch: 16 Global Step: 203430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:40,962-Speed 2992.37 samples/sec Loss 2.2786 LearningRate 0.0033 Epoch: 16 Global Step: 203440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:44,438-Speed 2947.57 samples/sec Loss 2.2525 LearningRate 0.0033 Epoch: 16 Global Step: 203450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:47,775-Speed 3069.23 samples/sec Loss 2.2534 LearningRate 0.0033 Epoch: 16 Global Step: 203460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:51,151-Speed 3033.55 samples/sec Loss 2.2947 LearningRate 0.0033 Epoch: 16 Global Step: 203470 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 20:59:54,502-Speed 3056.51 samples/sec Loss 2.3255 LearningRate 0.0033 Epoch: 16 Global Step: 203480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 20:59:57,879-Speed 3033.38 samples/sec Loss 2.2443 LearningRate 0.0033 Epoch: 16 Global Step: 203490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:01,250-Speed 3039.41 samples/sec Loss 2.2852 LearningRate 0.0033 Epoch: 16 Global 
Step: 203500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:04,645-Speed 3016.86 samples/sec Loss 2.2903 LearningRate 0.0033 Epoch: 16 Global Step: 203510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:08,037-Speed 3019.78 samples/sec Loss 2.2777 LearningRate 0.0033 Epoch: 16 Global Step: 203520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:11,445-Speed 3005.16 samples/sec Loss 2.2203 LearningRate 0.0033 Epoch: 16 Global Step: 203530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:14,886-Speed 2977.02 samples/sec Loss 2.2836 LearningRate 0.0033 Epoch: 16 Global Step: 203540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:18,327-Speed 2976.41 samples/sec Loss 2.1860 LearningRate 0.0033 Epoch: 16 Global Step: 203550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:21,662-Speed 3071.62 samples/sec Loss 2.2420 LearningRate 0.0033 Epoch: 16 Global Step: 203560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:25,105-Speed 2975.35 samples/sec Loss 2.2817 LearningRate 0.0033 Epoch: 16 Global Step: 203570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:28,450-Speed 3062.05 samples/sec Loss 2.2729 LearningRate 0.0033 Epoch: 16 Global Step: 203580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:00:31,776-Speed 3079.83 samples/sec Loss 2.2516 LearningRate 0.0033 Epoch: 16 Global Step: 203590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:35,231-Speed 2963.90 samples/sec Loss 2.2853 LearningRate 0.0033 Epoch: 16 Global Step: 203600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:38,609-Speed 3032.04 samples/sec Loss 2.2297 LearningRate 0.0033 Epoch: 16 Global Step: 203610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:42,005-Speed 3016.89 samples/sec Loss 2.3836 LearningRate 0.0033 Epoch: 16 Global Step: 203620 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 21:00:45,414-Speed 3004.63 samples/sec Loss 2.2157 LearningRate 0.0033 Epoch: 16 Global Step: 203630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:48,751-Speed 3068.89 samples/sec Loss 2.2207 LearningRate 0.0032 Epoch: 16 Global Step: 203640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:52,074-Speed 3083.41 samples/sec Loss 2.2642 LearningRate 0.0032 Epoch: 16 Global Step: 203650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:55,442-Speed 3041.05 samples/sec Loss 2.3242 LearningRate 0.0032 Epoch: 16 Global Step: 203660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:00:58,796-Speed 3053.38 samples/sec Loss 2.3477 LearningRate 0.0032 Epoch: 16 Global Step: 203670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:02,230-Speed 2983.04 samples/sec Loss 2.2584 LearningRate 0.0032 Epoch: 16 Global Step: 203680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:05,627-Speed 3014.94 samples/sec Loss 2.2506 LearningRate 0.0032 Epoch: 16 Global Step: 203690 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:01:09,030-Speed 3009.89 samples/sec Loss 2.2735 LearningRate 0.0032 Epoch: 16 Global Step: 203700 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:01:12,399-Speed 3040.26 samples/sec Loss 2.2137 LearningRate 0.0032 Epoch: 16 Global Step: 203710 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:01:15,802-Speed 3010.35 samples/sec Loss 2.2972 LearningRate 0.0032 Epoch: 16 Global Step: 203720 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:01:19,177-Speed 3035.37 samples/sec Loss 2.2631 LearningRate 0.0032 Epoch: 16 Global Step: 203730 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:01:22,523-Speed 3061.12 samples/sec Loss 2.2362 LearningRate 0.0032 Epoch: 16 Global Step: 203740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
21:01:25,879-Speed 3052.08 samples/sec Loss 2.2891 LearningRate 0.0032 Epoch: 16 Global Step: 203750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:29,282-Speed 3010.68 samples/sec Loss 2.2680 LearningRate 0.0032 Epoch: 16 Global Step: 203760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:32,725-Speed 2974.72 samples/sec Loss 2.2515 LearningRate 0.0032 Epoch: 16 Global Step: 203770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:36,111-Speed 3024.94 samples/sec Loss 2.2657 LearningRate 0.0032 Epoch: 16 Global Step: 203780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:39,466-Speed 3053.48 samples/sec Loss 2.2424 LearningRate 0.0032 Epoch: 16 Global Step: 203790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:42,840-Speed 3036.32 samples/sec Loss 2.3012 LearningRate 0.0032 Epoch: 16 Global Step: 203800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:46,237-Speed 3014.80 samples/sec Loss 2.2584 LearningRate 0.0032 Epoch: 16 Global Step: 203810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:49,703-Speed 2955.45 samples/sec Loss 2.2580 LearningRate 0.0032 Epoch: 16 Global Step: 203820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:53,108-Speed 3008.06 samples/sec Loss 2.3344 LearningRate 0.0032 Epoch: 16 Global Step: 203830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:01:56,461-Speed 3056.12 samples/sec Loss 2.3565 LearningRate 0.0032 Epoch: 16 Global Step: 203840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:01:59,834-Speed 3036.66 samples/sec Loss 2.2516 LearningRate 0.0032 Epoch: 16 Global Step: 203850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:03,213-Speed 3031.61 samples/sec Loss 2.3029 LearningRate 0.0032 Epoch: 16 Global Step: 203860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:06,546-Speed 3073.66 samples/sec Loss 
2.3014 LearningRate 0.0032 Epoch: 16 Global Step: 203870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:09,886-Speed 3065.83 samples/sec Loss 2.3204 LearningRate 0.0032 Epoch: 16 Global Step: 203880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:13,308-Speed 2994.04 samples/sec Loss 2.2665 LearningRate 0.0032 Epoch: 16 Global Step: 203890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:16,657-Speed 3057.96 samples/sec Loss 2.2163 LearningRate 0.0032 Epoch: 16 Global Step: 203900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:20,077-Speed 2995.62 samples/sec Loss 2.2080 LearningRate 0.0032 Epoch: 16 Global Step: 203910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:23,507-Speed 2986.42 samples/sec Loss 2.3189 LearningRate 0.0032 Epoch: 16 Global Step: 203920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:26,939-Speed 2984.30 samples/sec Loss 2.3044 LearningRate 0.0032 Epoch: 16 Global Step: 203930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:02:30,276-Speed 3069.74 samples/sec Loss 2.2425 LearningRate 0.0032 Epoch: 16 Global Step: 203940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:33,709-Speed 2983.81 samples/sec Loss 2.3223 LearningRate 0.0032 Epoch: 16 Global Step: 203950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:37,131-Speed 2992.45 samples/sec Loss 2.2297 LearningRate 0.0032 Epoch: 16 Global Step: 203960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:40,591-Speed 2961.27 samples/sec Loss 2.3127 LearningRate 0.0032 Epoch: 16 Global Step: 203970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:43,934-Speed 3063.85 samples/sec Loss 2.2426 LearningRate 0.0032 Epoch: 16 Global Step: 203980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:47,340-Speed 3006.91 samples/sec Loss 2.2573 LearningRate 0.0032 Epoch: 16 Global 
Step: 203990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:50,652-Speed 3092.89 samples/sec Loss 2.2666 LearningRate 0.0032 Epoch: 16 Global Step: 204000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:53,991-Speed 3068.04 samples/sec Loss 2.2604 LearningRate 0.0032 Epoch: 16 Global Step: 204010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:02:57,463-Speed 2949.63 samples/sec Loss 2.2702 LearningRate 0.0032 Epoch: 16 Global Step: 204020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:03:00,918-Speed 2965.05 samples/sec Loss 2.3414 LearningRate 0.0032 Epoch: 16 Global Step: 204030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:03:04,274-Speed 3052.26 samples/sec Loss 2.2609 LearningRate 0.0032 Epoch: 16 Global Step: 204040 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:03:07,652-Speed 3031.55 samples/sec Loss 2.2938 LearningRate 0.0032 Epoch: 16 Global Step: 204050 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:03:10,965-Speed 3092.01 samples/sec Loss 2.2278 LearningRate 0.0032 Epoch: 16 Global Step: 204060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:03:14,325-Speed 3048.70 samples/sec Loss 2.2701 LearningRate 0.0032 Epoch: 16 Global Step: 204070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:03:17,764-Speed 2977.76 samples/sec Loss 2.2341 LearningRate 0.0032 Epoch: 16 Global Step: 204080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:21,160-Speed 3016.89 samples/sec Loss 2.2784 LearningRate 0.0032 Epoch: 16 Global Step: 204090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:24,483-Speed 3081.65 samples/sec Loss 2.2123 LearningRate 0.0032 Epoch: 16 Global Step: 204100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:27,862-Speed 3031.60 samples/sec Loss 2.2594 LearningRate 0.0032 Epoch: 16 Global Step: 204110 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 21:03:31,315-Speed 2966.62 samples/sec Loss 2.2798 LearningRate 0.0032 Epoch: 16 Global Step: 204120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:34,684-Speed 3039.88 samples/sec Loss 2.2701 LearningRate 0.0032 Epoch: 16 Global Step: 204130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:38,061-Speed 3033.07 samples/sec Loss 2.2844 LearningRate 0.0032 Epoch: 16 Global Step: 204140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:41,408-Speed 3060.93 samples/sec Loss 2.2299 LearningRate 0.0032 Epoch: 16 Global Step: 204150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:44,778-Speed 3038.80 samples/sec Loss 2.2787 LearningRate 0.0032 Epoch: 16 Global Step: 204160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:48,138-Speed 3048.49 samples/sec Loss 2.3064 LearningRate 0.0032 Epoch: 16 Global Step: 204170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:03:51,516-Speed 3032.26 samples/sec Loss 2.2762 LearningRate 0.0032 Epoch: 16 Global Step: 204180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:03:54,895-Speed 3031.62 samples/sec Loss 2.2756 LearningRate 0.0032 Epoch: 16 Global Step: 204190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:03:58,302-Speed 3006.48 samples/sec Loss 2.2814 LearningRate 0.0032 Epoch: 16 Global Step: 204200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:01,707-Speed 3008.19 samples/sec Loss 2.2828 LearningRate 0.0032 Epoch: 16 Global Step: 204210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:05,166-Speed 2961.48 samples/sec Loss 2.2824 LearningRate 0.0032 Epoch: 16 Global Step: 204220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:08,567-Speed 3011.63 samples/sec Loss 2.2362 LearningRate 0.0032 Epoch: 16 Global Step: 204230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
21:04:11,983-Speed 2998.04 samples/sec Loss 2.2184 LearningRate 0.0032 Epoch: 16 Global Step: 204240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:15,391-Speed 3005.73 samples/sec Loss 2.3477 LearningRate 0.0032 Epoch: 16 Global Step: 204250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:18,807-Speed 2998.64 samples/sec Loss 2.2144 LearningRate 0.0032 Epoch: 16 Global Step: 204260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:22,176-Speed 3040.82 samples/sec Loss 2.2475 LearningRate 0.0032 Epoch: 16 Global Step: 204270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:25,538-Speed 3046.44 samples/sec Loss 2.2687 LearningRate 0.0032 Epoch: 16 Global Step: 204280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:04:28,929-Speed 3020.18 samples/sec Loss 2.3220 LearningRate 0.0032 Epoch: 16 Global Step: 204290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:04:32,275-Speed 3061.97 samples/sec Loss 2.2844 LearningRate 0.0032 Epoch: 16 Global Step: 204300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:35,651-Speed 3033.72 samples/sec Loss 2.3029 LearningRate 0.0032 Epoch: 16 Global Step: 204310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:38,977-Speed 3080.48 samples/sec Loss 2.2994 LearningRate 0.0032 Epoch: 16 Global Step: 204320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:42,379-Speed 3010.40 samples/sec Loss 2.3596 LearningRate 0.0031 Epoch: 16 Global Step: 204330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:45,722-Speed 3063.69 samples/sec Loss 2.3350 LearningRate 0.0031 Epoch: 16 Global Step: 204340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:04:49,082-Speed 3048.48 samples/sec Loss 2.3192 LearningRate 0.0031 Epoch: 16 Global Step: 204350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:04:52,502-Speed 2995.61 samples/sec Loss 
2.2337 LearningRate 0.0031 Epoch: 16 Global Step: 204360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:04:55,915-Speed 3000.44 samples/sec Loss 2.3048 LearningRate 0.0031 Epoch: 16 Global Step: 204370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:04:59,314-Speed 3014.07 samples/sec Loss 2.2724 LearningRate 0.0031 Epoch: 16 Global Step: 204380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:02,724-Speed 3004.08 samples/sec Loss 2.2794 LearningRate 0.0031 Epoch: 16 Global Step: 204390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:06,074-Speed 3057.50 samples/sec Loss 2.2926 LearningRate 0.0031 Epoch: 16 Global Step: 204400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:09,425-Speed 3056.68 samples/sec Loss 2.2045 LearningRate 0.0031 Epoch: 16 Global Step: 204410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:12,753-Speed 3077.74 samples/sec Loss 2.3383 LearningRate 0.0031 Epoch: 16 Global Step: 204420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:16,096-Speed 3064.11 samples/sec Loss 2.3298 LearningRate 0.0031 Epoch: 16 Global Step: 204430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:19,505-Speed 3003.73 samples/sec Loss 2.3501 LearningRate 0.0031 Epoch: 16 Global Step: 204440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:05:22,910-Speed 3008.82 samples/sec Loss 2.2762 LearningRate 0.0031 Epoch: 16 Global Step: 204450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:26,302-Speed 3019.75 samples/sec Loss 2.2369 LearningRate 0.0031 Epoch: 16 Global Step: 204460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:29,655-Speed 3054.26 samples/sec Loss 2.3066 LearningRate 0.0031 Epoch: 16 Global Step: 204470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:33,027-Speed 3038.04 samples/sec Loss 2.2611 LearningRate 0.0031 Epoch: 16 Global 
Step: 204480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:36,402-Speed 3035.07 samples/sec Loss 2.3251 LearningRate 0.0031 Epoch: 16 Global Step: 204490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:39,775-Speed 3036.32 samples/sec Loss 2.3018 LearningRate 0.0031 Epoch: 16 Global Step: 204500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:43,288-Speed 2915.59 samples/sec Loss 2.2885 LearningRate 0.0031 Epoch: 16 Global Step: 204510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:46,762-Speed 2948.57 samples/sec Loss 2.3325 LearningRate 0.0031 Epoch: 16 Global Step: 204520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:50,203-Speed 2976.93 samples/sec Loss 2.2792 LearningRate 0.0031 Epoch: 16 Global Step: 204530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:53,689-Speed 2938.10 samples/sec Loss 2.2510 LearningRate 0.0031 Epoch: 16 Global Step: 204540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:05:57,106-Speed 2997.12 samples/sec Loss 2.3028 LearningRate 0.0031 Epoch: 16 Global Step: 204550 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:06:00,494-Speed 3024.26 samples/sec Loss 2.3497 LearningRate 0.0031 Epoch: 16 Global Step: 204560 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:06:03,797-Speed 3100.90 samples/sec Loss 2.2914 LearningRate 0.0031 Epoch: 16 Global Step: 204570 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:06:07,266-Speed 2952.12 samples/sec Loss 2.2821 LearningRate 0.0031 Epoch: 16 Global Step: 204580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:06:10,647-Speed 3029.41 samples/sec Loss 2.3262 LearningRate 0.0031 Epoch: 16 Global Step: 204590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:14,006-Speed 3049.31 samples/sec Loss 2.3155 LearningRate 0.0031 Epoch: 16 Global Step: 204600 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 21:06:17,425-Speed 2996.14 samples/sec Loss 2.2819 LearningRate 0.0031 Epoch: 16 Global Step: 204610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:20,869-Speed 2973.78 samples/sec Loss 2.2486 LearningRate 0.0031 Epoch: 16 Global Step: 204620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:24,327-Speed 2962.42 samples/sec Loss 2.2321 LearningRate 0.0031 Epoch: 16 Global Step: 204630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:27,772-Speed 2973.05 samples/sec Loss 2.2781 LearningRate 0.0031 Epoch: 16 Global Step: 204640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:31,135-Speed 3046.67 samples/sec Loss 2.2400 LearningRate 0.0031 Epoch: 16 Global Step: 204650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:34,520-Speed 3025.16 samples/sec Loss 2.2732 LearningRate 0.0031 Epoch: 16 Global Step: 204660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:37,904-Speed 3027.47 samples/sec Loss 2.2914 LearningRate 0.0031 Epoch: 16 Global Step: 204670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:41,233-Speed 3077.34 samples/sec Loss 2.3569 LearningRate 0.0031 Epoch: 16 Global Step: 204680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:44,553-Speed 3085.23 samples/sec Loss 2.3455 LearningRate 0.0031 Epoch: 16 Global Step: 204690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:47,955-Speed 3010.63 samples/sec Loss 2.3180 LearningRate 0.0031 Epoch: 16 Global Step: 204700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:51,334-Speed 3032.51 samples/sec Loss 2.2557 LearningRate 0.0031 Epoch: 16 Global Step: 204710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:06:54,719-Speed 3026.32 samples/sec Loss 2.2977 LearningRate 0.0031 Epoch: 16 Global Step: 204720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
21:06:58,101-Speed 3028.37 samples/sec Loss 2.2847 LearningRate 0.0031 Epoch: 16 Global Step: 204730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:01,494-Speed 3018.87 samples/sec Loss 2.2635 LearningRate 0.0031 Epoch: 16 Global Step: 204740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:04,816-Speed 3083.77 samples/sec Loss 2.3125 LearningRate 0.0031 Epoch: 16 Global Step: 204750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:08,122-Speed 3098.12 samples/sec Loss 2.2968 LearningRate 0.0031 Epoch: 16 Global Step: 204760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:11,534-Speed 3001.93 samples/sec Loss 2.3120 LearningRate 0.0031 Epoch: 16 Global Step: 204770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:14,872-Speed 3068.71 samples/sec Loss 2.3100 LearningRate 0.0031 Epoch: 16 Global Step: 204780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:18,206-Speed 3072.18 samples/sec Loss 2.3547 LearningRate 0.0031 Epoch: 16 Global Step: 204790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:21,593-Speed 3024.41 samples/sec Loss 2.2552 LearningRate 0.0031 Epoch: 16 Global Step: 204800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:24,935-Speed 3064.57 samples/sec Loss 2.3043 LearningRate 0.0031 Epoch: 16 Global Step: 204810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:28,302-Speed 3041.72 samples/sec Loss 2.3527 LearningRate 0.0031 Epoch: 16 Global Step: 204820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:31,744-Speed 2976.48 samples/sec Loss 2.3371 LearningRate 0.0031 Epoch: 16 Global Step: 204830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:35,125-Speed 3029.16 samples/sec Loss 2.2639 LearningRate 0.0031 Epoch: 16 Global Step: 204840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:38,489-Speed 3045.01 samples/sec Loss 
2.3521 LearningRate 0.0031 Epoch: 16 Global Step: 204850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:41,844-Speed 3052.66 samples/sec Loss 2.2957 LearningRate 0.0031 Epoch: 16 Global Step: 204860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:45,207-Speed 3046.26 samples/sec Loss 2.3084 LearningRate 0.0031 Epoch: 16 Global Step: 204870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:48,531-Speed 3081.24 samples/sec Loss 2.2794 LearningRate 0.0031 Epoch: 16 Global Step: 204880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:51,870-Speed 3068.00 samples/sec Loss 2.3227 LearningRate 0.0031 Epoch: 16 Global Step: 204890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:07:55,195-Speed 3080.32 samples/sec Loss 2.2713 LearningRate 0.0031 Epoch: 16 Global Step: 204900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:07:58,554-Speed 3048.87 samples/sec Loss 2.2274 LearningRate 0.0031 Epoch: 16 Global Step: 204910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:01,974-Speed 2995.76 samples/sec Loss 2.2898 LearningRate 0.0031 Epoch: 16 Global Step: 204920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:05,340-Speed 3042.50 samples/sec Loss 2.3949 LearningRate 0.0031 Epoch: 16 Global Step: 204930 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:08,707-Speed 3042.23 samples/sec Loss 2.3984 LearningRate 0.0031 Epoch: 16 Global Step: 204940 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:12,115-Speed 3005.72 samples/sec Loss 2.2617 LearningRate 0.0031 Epoch: 16 Global Step: 204950 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:15,449-Speed 3071.45 samples/sec Loss 2.2719 LearningRate 0.0031 Epoch: 16 Global Step: 204960 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:18,792-Speed 3063.96 samples/sec Loss 2.2518 LearningRate 0.0031 Epoch: 16 Global 
Step: 204970 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:22,140-Speed 3059.52 samples/sec Loss 2.3152 LearningRate 0.0031 Epoch: 16 Global Step: 204980 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:25,478-Speed 3068.98 samples/sec Loss 2.3621 LearningRate 0.0031 Epoch: 16 Global Step: 204990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:28,844-Speed 3043.22 samples/sec Loss 2.3149 LearningRate 0.0031 Epoch: 16 Global Step: 205000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:32,330-Speed 2937.86 samples/sec Loss 2.2313 LearningRate 0.0031 Epoch: 16 Global Step: 205010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:08:35,807-Speed 2945.81 samples/sec Loss 2.3269 LearningRate 0.0031 Epoch: 16 Global Step: 205020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:39,131-Speed 3082.35 samples/sec Loss 2.3236 LearningRate 0.0031 Epoch: 16 Global Step: 205030 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:42,496-Speed 3043.59 samples/sec Loss 2.3880 LearningRate 0.0030 Epoch: 16 Global Step: 205040 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:45,883-Speed 3023.97 samples/sec Loss 2.3251 LearningRate 0.0030 Epoch: 16 Global Step: 205050 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:49,227-Speed 3063.07 samples/sec Loss 2.2325 LearningRate 0.0030 Epoch: 16 Global Step: 205060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:52,542-Speed 3089.37 samples/sec Loss 2.3964 LearningRate 0.0030 Epoch: 16 Global Step: 205070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:55,907-Speed 3044.81 samples/sec Loss 2.2818 LearningRate 0.0030 Epoch: 16 Global Step: 205080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:08:59,299-Speed 3019.43 samples/sec Loss 2.3080 LearningRate 0.0030 Epoch: 16 Global Step: 205090 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 21:09:02,739-Speed 2977.55 samples/sec Loss 2.3102 LearningRate 0.0030 Epoch: 16 Global Step: 205100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:09:06,089-Speed 3057.59 samples/sec Loss 2.2987 LearningRate 0.0030 Epoch: 16 Global Step: 205110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:09:09,449-Speed 3048.49 samples/sec Loss 2.2686 LearningRate 0.0030 Epoch: 16 Global Step: 205120 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:12,847-Speed 3014.36 samples/sec Loss 2.3017 LearningRate 0.0030 Epoch: 16 Global Step: 205130 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:16,178-Speed 3075.66 samples/sec Loss 2.3454 LearningRate 0.0030 Epoch: 16 Global Step: 205140 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:19,581-Speed 3009.69 samples/sec Loss 2.3790 LearningRate 0.0030 Epoch: 16 Global Step: 205150 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:22,987-Speed 3006.95 samples/sec Loss 2.3083 LearningRate 0.0030 Epoch: 16 Global Step: 205160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:26,396-Speed 3004.73 samples/sec Loss 2.3152 LearningRate 0.0030 Epoch: 16 Global Step: 205170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:29,801-Speed 3008.41 samples/sec Loss 2.1981 LearningRate 0.0030 Epoch: 16 Global Step: 205180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:33,123-Speed 3083.13 samples/sec Loss 2.2894 LearningRate 0.0030 Epoch: 16 Global Step: 205190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:36,502-Speed 3031.34 samples/sec Loss 2.2770 LearningRate 0.0030 Epoch: 16 Global Step: 205200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:39,886-Speed 3026.80 samples/sec Loss 2.2184 LearningRate 0.0030 Epoch: 16 Global Step: 205210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
21:09:43,233-Speed 3060.30 samples/sec Loss 2.2920 LearningRate 0.0030 Epoch: 16 Global Step: 205220 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:09:46,633-Speed 3012.87 samples/sec Loss 2.3292 LearningRate 0.0030 Epoch: 16 Global Step: 205230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:50,058-Speed 2990.38 samples/sec Loss 2.2816 LearningRate 0.0030 Epoch: 16 Global Step: 205240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:53,409-Speed 3056.65 samples/sec Loss 2.3002 LearningRate 0.0030 Epoch: 16 Global Step: 205250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:09:56,745-Speed 3069.77 samples/sec Loss 2.2514 LearningRate 0.0030 Epoch: 16 Global Step: 205260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:00,111-Speed 3043.64 samples/sec Loss 2.3173 LearningRate 0.0030 Epoch: 16 Global Step: 205270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:03,483-Speed 3037.35 samples/sec Loss 2.2985 LearningRate 0.0030 Epoch: 16 Global Step: 205280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:06,847-Speed 3045.19 samples/sec Loss 2.2676 LearningRate 0.0030 Epoch: 16 Global Step: 205290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:10,159-Speed 3092.13 samples/sec Loss 2.3528 LearningRate 0.0030 Epoch: 16 Global Step: 205300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:13,562-Speed 3010.29 samples/sec Loss 2.3263 LearningRate 0.0030 Epoch: 16 Global Step: 205310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:17,047-Speed 2938.66 samples/sec Loss 2.3086 LearningRate 0.0030 Epoch: 16 Global Step: 205320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:20,439-Speed 3019.63 samples/sec Loss 2.2715 LearningRate 0.0030 Epoch: 16 Global Step: 205330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:23,795-Speed 3052.27 samples/sec Loss 
2.2243 LearningRate 0.0030 Epoch: 16 Global Step: 205340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:27,116-Speed 3084.32 samples/sec Loss 2.3396 LearningRate 0.0030 Epoch: 16 Global Step: 205350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:30,503-Speed 3024.69 samples/sec Loss 2.2578 LearningRate 0.0030 Epoch: 16 Global Step: 205360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:10:33,979-Speed 2946.28 samples/sec Loss 2.3248 LearningRate 0.0030 Epoch: 16 Global Step: 205370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:10:37,337-Speed 3050.44 samples/sec Loss 2.2768 LearningRate 0.0030 Epoch: 16 Global Step: 205380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:10:40,723-Speed 3025.15 samples/sec Loss 2.2702 LearningRate 0.0030 Epoch: 16 Global Step: 205390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:10:44,053-Speed 3075.85 samples/sec Loss 2.2932 LearningRate 0.0030 Epoch: 16 Global Step: 205400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:47,471-Speed 2996.85 samples/sec Loss 2.3308 LearningRate 0.0030 Epoch: 16 Global Step: 205410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:50,855-Speed 3026.90 samples/sec Loss 2.2919 LearningRate 0.0030 Epoch: 16 Global Step: 205420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:54,208-Speed 3054.84 samples/sec Loss 2.3141 LearningRate 0.0030 Epoch: 16 Global Step: 205430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:10:57,574-Speed 3042.83 samples/sec Loss 2.2542 LearningRate 0.0030 Epoch: 16 Global Step: 205440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:00,954-Speed 3030.85 samples/sec Loss 2.2802 LearningRate 0.0030 Epoch: 16 Global Step: 205450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:04,370-Speed 2999.08 samples/sec Loss 2.3227 LearningRate 0.0030 Epoch: 16 Global 
Step: 205460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:07,775-Speed 3007.81 samples/sec Loss 2.3456 LearningRate 0.0030 Epoch: 16 Global Step: 205470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:11,113-Speed 3068.61 samples/sec Loss 2.3141 LearningRate 0.0030 Epoch: 16 Global Step: 205480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:14,511-Speed 3014.41 samples/sec Loss 2.2879 LearningRate 0.0030 Epoch: 16 Global Step: 205490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:17,870-Speed 3049.66 samples/sec Loss 2.3141 LearningRate 0.0030 Epoch: 16 Global Step: 205500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:11:21,216-Speed 3061.34 samples/sec Loss 2.3199 LearningRate 0.0030 Epoch: 16 Global Step: 205510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:24,604-Speed 3022.97 samples/sec Loss 2.2957 LearningRate 0.0030 Epoch: 16 Global Step: 205520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:28,015-Speed 3003.07 samples/sec Loss 2.2876 LearningRate 0.0030 Epoch: 16 Global Step: 205530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:31,423-Speed 3006.09 samples/sec Loss 2.3089 LearningRate 0.0030 Epoch: 16 Global Step: 205540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:34,785-Speed 3046.22 samples/sec Loss 2.2961 LearningRate 0.0030 Epoch: 16 Global Step: 205550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:38,122-Speed 3069.58 samples/sec Loss 2.2715 LearningRate 0.0030 Epoch: 16 Global Step: 205560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:41,497-Speed 3034.80 samples/sec Loss 2.3016 LearningRate 0.0030 Epoch: 16 Global Step: 205570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:44,832-Speed 3071.12 samples/sec Loss 2.2974 LearningRate 0.0030 Epoch: 16 Global Step: 205580 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 21:11:48,247-Speed 2999.70 samples/sec Loss 2.2831 LearningRate 0.0030 Epoch: 16 Global Step: 205590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:51,705-Speed 2962.53 samples/sec Loss 2.2880 LearningRate 0.0030 Epoch: 16 Global Step: 205600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:11:55,156-Speed 2967.58 samples/sec Loss 2.3031 LearningRate 0.0030 Epoch: 16 Global Step: 205610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:11:58,607-Speed 2968.64 samples/sec Loss 2.3024 LearningRate 0.0030 Epoch: 16 Global Step: 205620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:02,027-Speed 2995.06 samples/sec Loss 2.3027 LearningRate 0.0030 Epoch: 16 Global Step: 205630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:05,461-Speed 2982.81 samples/sec Loss 2.3577 LearningRate 0.0030 Epoch: 16 Global Step: 205640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:08,916-Speed 2963.97 samples/sec Loss 2.2917 LearningRate 0.0030 Epoch: 16 Global Step: 205650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:12,273-Speed 3051.74 samples/sec Loss 2.3139 LearningRate 0.0030 Epoch: 16 Global Step: 205660 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:15,599-Speed 3079.56 samples/sec Loss 2.2617 LearningRate 0.0030 Epoch: 16 Global Step: 205670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:18,927-Speed 3077.42 samples/sec Loss 2.3129 LearningRate 0.0030 Epoch: 16 Global Step: 205680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:22,249-Speed 3083.65 samples/sec Loss 2.3045 LearningRate 0.0030 Epoch: 16 Global Step: 205690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:25,639-Speed 3021.96 samples/sec Loss 2.2591 LearningRate 0.0030 Epoch: 16 Global Step: 205700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
21:12:29,097-Speed 2961.46 samples/sec Loss 2.3594 LearningRate 0.0030 Epoch: 16 Global Step: 205710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:32,534-Speed 2980.21 samples/sec Loss 2.3334 LearningRate 0.0030 Epoch: 16 Global Step: 205720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:35,927-Speed 3019.56 samples/sec Loss 2.2663 LearningRate 0.0030 Epoch: 16 Global Step: 205730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:39,359-Speed 2984.39 samples/sec Loss 2.3243 LearningRate 0.0030 Epoch: 16 Global Step: 205740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:42,778-Speed 2996.21 samples/sec Loss 2.2768 LearningRate 0.0030 Epoch: 16 Global Step: 205750 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:46,175-Speed 3014.31 samples/sec Loss 2.3197 LearningRate 0.0029 Epoch: 16 Global Step: 205760 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:49,601-Speed 2990.54 samples/sec Loss 2.3338 LearningRate 0.0029 Epoch: 16 Global Step: 205770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:52,994-Speed 3018.69 samples/sec Loss 2.3330 LearningRate 0.0029 Epoch: 16 Global Step: 205780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:12:56,358-Speed 3045.22 samples/sec Loss 2.2947 LearningRate 0.0029 Epoch: 16 Global Step: 205790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:12:59,744-Speed 3024.71 samples/sec Loss 2.2566 LearningRate 0.0029 Epoch: 16 Global Step: 205800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:03,172-Speed 2987.87 samples/sec Loss 2.2725 LearningRate 0.0029 Epoch: 16 Global Step: 205810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:06,562-Speed 3021.58 samples/sec Loss 2.2715 LearningRate 0.0029 Epoch: 16 Global Step: 205820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:09,974-Speed 3002.68 samples/sec Loss 
2.2845 LearningRate 0.0029 Epoch: 16 Global Step: 205830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:13,363-Speed 3021.96 samples/sec Loss 2.3351 LearningRate 0.0029 Epoch: 16 Global Step: 205840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:16,742-Speed 3031.02 samples/sec Loss 2.3793 LearningRate 0.0029 Epoch: 16 Global Step: 205850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:20,065-Speed 3082.65 samples/sec Loss 2.3038 LearningRate 0.0029 Epoch: 16 Global Step: 205860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:23,393-Speed 3078.26 samples/sec Loss 2.3303 LearningRate 0.0029 Epoch: 16 Global Step: 205870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:26,746-Speed 3054.57 samples/sec Loss 2.2989 LearningRate 0.0029 Epoch: 16 Global Step: 205880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:30,209-Speed 2957.53 samples/sec Loss 2.2905 LearningRate 0.0029 Epoch: 16 Global Step: 205890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:13:33,566-Speed 3051.89 samples/sec Loss 2.3342 LearningRate 0.0029 Epoch: 16 Global Step: 205900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:13:36,935-Speed 3040.16 samples/sec Loss 2.2471 LearningRate 0.0029 Epoch: 16 Global Step: 205910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:13:40,352-Speed 2997.22 samples/sec Loss 2.2541 LearningRate 0.0029 Epoch: 16 Global Step: 205920 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:13:43,699-Speed 3061.02 samples/sec Loss 2.2947 LearningRate 0.0029 Epoch: 16 Global Step: 205930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:47,101-Speed 3010.18 samples/sec Loss 2.3462 LearningRate 0.0029 Epoch: 16 Global Step: 205940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:50,453-Speed 3055.91 samples/sec Loss 2.3164 LearningRate 0.0029 Epoch: 16 Global 
Step: 205950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:53,831-Speed 3032.50 samples/sec Loss 2.2932 LearningRate 0.0029 Epoch: 16 Global Step: 205960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:13:57,206-Speed 3035.15 samples/sec Loss 2.3156 LearningRate 0.0029 Epoch: 16 Global Step: 205970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:00,606-Speed 3011.82 samples/sec Loss 2.2408 LearningRate 0.0029 Epoch: 16 Global Step: 205980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:04,002-Speed 3016.92 samples/sec Loss 2.3177 LearningRate 0.0029 Epoch: 16 Global Step: 205990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:07,389-Speed 3024.22 samples/sec Loss 2.2550 LearningRate 0.0029 Epoch: 16 Global Step: 206000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:10,754-Speed 3043.54 samples/sec Loss 2.3082 LearningRate 0.0029 Epoch: 16 Global Step: 206010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:14,131-Speed 3033.60 samples/sec Loss 2.3075 LearningRate 0.0029 Epoch: 16 Global Step: 206020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:17,557-Speed 2989.18 samples/sec Loss 2.3063 LearningRate 0.0029 Epoch: 16 Global Step: 206030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:14:20,981-Speed 2992.30 samples/sec Loss 2.2990 LearningRate 0.0029 Epoch: 16 Global Step: 206040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:14:24,397-Speed 2997.72 samples/sec Loss 2.3124 LearningRate 0.0029 Epoch: 16 Global Step: 206050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:14:27,885-Speed 2936.73 samples/sec Loss 2.3967 LearningRate 0.0029 Epoch: 16 Global Step: 206060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:14:31,263-Speed 3032.60 samples/sec Loss 2.2831 LearningRate 0.0029 Epoch: 16 Global Step: 206070 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 21:14:34,679-Speed 2997.91 samples/sec Loss 2.3249 LearningRate 0.0029 Epoch: 16 Global Step: 206080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:14:38,102-Speed 2992.87 samples/sec Loss 2.3134 LearningRate 0.0029 Epoch: 16 Global Step: 206090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:14:41,532-Speed 2986.63 samples/sec Loss 2.3232 LearningRate 0.0029 Epoch: 16 Global Step: 206100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:44,977-Speed 2972.73 samples/sec Loss 2.3237 LearningRate 0.0029 Epoch: 16 Global Step: 206110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:48,322-Speed 3062.54 samples/sec Loss 2.2957 LearningRate 0.0029 Epoch: 16 Global Step: 206120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:51,756-Speed 2982.23 samples/sec Loss 2.2858 LearningRate 0.0029 Epoch: 16 Global Step: 206130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:55,167-Speed 3003.42 samples/sec Loss 2.3679 LearningRate 0.0029 Epoch: 16 Global Step: 206140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:14:58,576-Speed 3005.02 samples/sec Loss 2.2956 LearningRate 0.0029 Epoch: 16 Global Step: 206150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:15:01,981-Speed 3007.99 samples/sec Loss 2.2548 LearningRate 0.0029 Epoch: 16 Global Step: 206160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:15:05,440-Speed 2960.74 samples/sec Loss 2.2961 LearningRate 0.0029 Epoch: 16 Global Step: 206170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:15:08,829-Speed 3022.92 samples/sec Loss 2.3926 LearningRate 0.0029 Epoch: 16 Global Step: 206180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:15:12,214-Speed 3025.70 samples/sec Loss 2.2119 LearningRate 0.0029 Epoch: 16 Global Step: 206190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
21:15:15,591-Speed 3033.72 samples/sec Loss 2.2999 LearningRate 0.0029 Epoch: 16 Global Step: 206200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:18,991-Speed 3012.24 samples/sec Loss 2.3077 LearningRate 0.0029 Epoch: 16 Global Step: 206210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:22,376-Speed 3026.24 samples/sec Loss 2.2677 LearningRate 0.0029 Epoch: 16 Global Step: 206220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:25,739-Speed 3045.46 samples/sec Loss 2.2568 LearningRate 0.0029 Epoch: 16 Global Step: 206230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:29,182-Speed 2974.86 samples/sec Loss 2.3158 LearningRate 0.0029 Epoch: 16 Global Step: 206240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:32,630-Speed 2970.50 samples/sec Loss 2.3366 LearningRate 0.0029 Epoch: 16 Global Step: 206250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:36,037-Speed 3006.84 samples/sec Loss 2.2952 LearningRate 0.0029 Epoch: 16 Global Step: 206260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:15:39,379-Speed 3064.76 samples/sec Loss 2.2591 LearningRate 0.0029 Epoch: 16 Global Step: 206270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:15:42,829-Speed 2969.52 samples/sec Loss 2.2886 LearningRate 0.0029 Epoch: 16 Global Step: 206280 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:15:46,246-Speed 2998.00 samples/sec Loss 2.3265 LearningRate 0.0029 Epoch: 16 Global Step: 206290 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:15:49,706-Speed 2960.08 samples/sec Loss 2.2955 LearningRate 0.0029 Epoch: 16 Global Step: 206300 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:15:53,169-Speed 2957.99 samples/sec Loss 2.2893 LearningRate 0.0029 Epoch: 16 Global Step: 206310 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:15:56,545-Speed 3033.81 samples/sec Loss 
2.3019 LearningRate 0.0029 Epoch: 16 Global Step: 206320 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:15:59,912-Speed 3042.98 samples/sec Loss 2.3374 LearningRate 0.0029 Epoch: 16 Global Step: 206330 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:16:03,273-Speed 3047.05 samples/sec Loss 2.2277 LearningRate 0.0029 Epoch: 16 Global Step: 206340 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:16:06,633-Speed 3048.54 samples/sec Loss 2.3030 LearningRate 0.0029 Epoch: 16 Global Step: 206350 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:16:10,019-Speed 3024.95 samples/sec Loss 2.3414 LearningRate 0.0029 Epoch: 16 Global Step: 206360 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:16:13,485-Speed 2955.73 samples/sec Loss 2.3294 LearningRate 0.0029 Epoch: 16 Global Step: 206370 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:16:16,912-Speed 2988.61 samples/sec Loss 2.3304 LearningRate 0.0029 Epoch: 16 Global Step: 206380 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:20,385-Speed 2948.91 samples/sec Loss 2.3118 LearningRate 0.0029 Epoch: 16 Global Step: 206390 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:23,764-Speed 3031.66 samples/sec Loss 2.4101 LearningRate 0.0029 Epoch: 16 Global Step: 206400 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:27,128-Speed 3044.88 samples/sec Loss 2.3267 LearningRate 0.0029 Epoch: 16 Global Step: 206410 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:30,514-Speed 3024.89 samples/sec Loss 2.2868 LearningRate 0.0029 Epoch: 16 Global Step: 206420 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:33,945-Speed 2986.01 samples/sec Loss 2.2583 LearningRate 0.0029 Epoch: 16 Global Step: 206430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:37,284-Speed 3067.50 samples/sec Loss 2.2985 LearningRate 0.0029 Epoch: 16 Global Step: 
206440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:40,761-Speed 2945.39 samples/sec Loss 2.3062 LearningRate 0.0029 Epoch: 16 Global Step: 206450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:44,160-Speed 3014.32 samples/sec Loss 2.3448 LearningRate 0.0029 Epoch: 16 Global Step: 206460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:47,501-Speed 3065.92 samples/sec Loss 2.3290 LearningRate 0.0029 Epoch: 16 Global Step: 206470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:16:50,935-Speed 2982.59 samples/sec Loss 2.3193 LearningRate 0.0029 Epoch: 16 Global Step: 206480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:16:54,406-Speed 2951.09 samples/sec Loss 2.2727 LearningRate 0.0028 Epoch: 16 Global Step: 206490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:16:57,857-Speed 2967.50 samples/sec Loss 2.2476 LearningRate 0.0028 Epoch: 16 Global Step: 206500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:01,249-Speed 3019.54 samples/sec Loss 2.3112 LearningRate 0.0028 Epoch: 16 Global Step: 206510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:04,673-Speed 2992.19 samples/sec Loss 2.2911 LearningRate 0.0028 Epoch: 16 Global Step: 206520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:08,120-Speed 2971.18 samples/sec Loss 2.2962 LearningRate 0.0028 Epoch: 16 Global Step: 206530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:11,501-Speed 3029.31 samples/sec Loss 2.2715 LearningRate 0.0028 Epoch: 16 Global Step: 206540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:14,848-Speed 3061.08 samples/sec Loss 2.3246 LearningRate 0.0028 Epoch: 16 Global Step: 206550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:18,273-Speed 2990.50 samples/sec Loss 2.3439 LearningRate 0.0028 Epoch: 16 Global Step: 206560 Fp16 Grad Scale: 32768 Required: 4 
hours Training: 2022-04-27 21:17:21,686-Speed 3000.75 samples/sec Loss 2.2681 LearningRate 0.0028 Epoch: 16 Global Step: 206570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:25,059-Speed 3037.42 samples/sec Loss 2.3731 LearningRate 0.0028 Epoch: 16 Global Step: 206580 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:17:28,487-Speed 2988.05 samples/sec Loss 2.3381 LearningRate 0.0028 Epoch: 16 Global Step: 206590 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:17:31,851-Speed 3044.52 samples/sec Loss 2.3020 LearningRate 0.0028 Epoch: 16 Global Step: 206600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:17:35,274-Speed 2992.85 samples/sec Loss 2.3039 LearningRate 0.0028 Epoch: 16 Global Step: 206610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:38,699-Speed 2990.87 samples/sec Loss 2.2568 LearningRate 0.0028 Epoch: 16 Global Step: 206620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:42,032-Speed 3073.15 samples/sec Loss 2.3309 LearningRate 0.0028 Epoch: 16 Global Step: 206630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:17:46,208-Speed 2452.59 samples/sec Loss 2.3457 LearningRate 0.0028 Epoch: 16 Global Step: 206640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:17:49,534-Speed 3080.25 samples/sec Loss 2.3143 LearningRate 0.0028 Epoch: 16 Global Step: 206650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:17:52,910-Speed 3033.63 samples/sec Loss 2.3022 LearningRate 0.0028 Epoch: 16 Global Step: 206660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:17:56,257-Speed 3060.10 samples/sec Loss 2.3306 LearningRate 0.0028 Epoch: 16 Global Step: 206670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:17:59,703-Speed 2972.67 samples/sec Loss 2.3143 LearningRate 0.0028 Epoch: 16 Global Step: 206680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
21:18:03,067-Speed 3044.96 samples/sec Loss 2.3317 LearningRate 0.0028 Epoch: 16 Global Step: 206690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:18:06,435-Speed 3041.11 samples/sec Loss 2.3780 LearningRate 0.0028 Epoch: 16 Global Step: 206700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:18:09,842-Speed 3006.77 samples/sec Loss 2.3168 LearningRate 0.0028 Epoch: 16 Global Step: 206710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:18:13,216-Speed 3036.18 samples/sec Loss 2.2683 LearningRate 0.0028 Epoch: 16 Global Step: 206720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:18:16,616-Speed 3012.63 samples/sec Loss 2.3508 LearningRate 0.0028 Epoch: 16 Global Step: 206730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:18:20,052-Speed 2981.24 samples/sec Loss 2.3231 LearningRate 0.0028 Epoch: 16 Global Step: 206740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:23,429-Speed 3033.14 samples/sec Loss 2.2960 LearningRate 0.0028 Epoch: 16 Global Step: 206750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:26,791-Speed 3046.76 samples/sec Loss 2.3245 LearningRate 0.0028 Epoch: 16 Global Step: 206760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:30,135-Speed 3062.87 samples/sec Loss 2.3238 LearningRate 0.0028 Epoch: 16 Global Step: 206770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:33,543-Speed 3006.19 samples/sec Loss 2.3271 LearningRate 0.0028 Epoch: 16 Global Step: 206780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:36,896-Speed 3054.07 samples/sec Loss 2.2880 LearningRate 0.0028 Epoch: 16 Global Step: 206790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:40,324-Speed 2988.13 samples/sec Loss 2.3424 LearningRate 0.0028 Epoch: 16 Global Step: 206800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:43,706-Speed 3029.59 samples/sec Loss 
2.3505 LearningRate 0.0028 Epoch: 16 Global Step: 206810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:47,084-Speed 3031.58 samples/sec Loss 2.3439 LearningRate 0.0028 Epoch: 16 Global Step: 206820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:50,460-Speed 3034.51 samples/sec Loss 2.3378 LearningRate 0.0028 Epoch: 16 Global Step: 206830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:18:53,837-Speed 3032.85 samples/sec Loss 2.3389 LearningRate 0.0028 Epoch: 16 Global Step: 206840 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:18:57,153-Speed 3088.32 samples/sec Loss 2.2872 LearningRate 0.0028 Epoch: 16 Global Step: 206850 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:00,600-Speed 2971.62 samples/sec Loss 2.2674 LearningRate 0.0028 Epoch: 16 Global Step: 206860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:03,981-Speed 3029.71 samples/sec Loss 2.3204 LearningRate 0.0028 Epoch: 16 Global Step: 206870 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:07,371-Speed 3022.04 samples/sec Loss 2.3045 LearningRate 0.0028 Epoch: 16 Global Step: 206880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:10,748-Speed 3032.92 samples/sec Loss 2.3450 LearningRate 0.0028 Epoch: 16 Global Step: 206890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:14,172-Speed 2991.24 samples/sec Loss 2.2872 LearningRate 0.0028 Epoch: 16 Global Step: 206900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:17,579-Speed 3006.08 samples/sec Loss 2.3689 LearningRate 0.0028 Epoch: 16 Global Step: 206910 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:20,929-Speed 3058.34 samples/sec Loss 2.3007 LearningRate 0.0028 Epoch: 16 Global Step: 206920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:24,345-Speed 2997.84 samples/sec Loss 2.2919 LearningRate 0.0028 Epoch: 16 Global 
Step: 206930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:27,730-Speed 3026.12 samples/sec Loss 2.2934 LearningRate 0.0028 Epoch: 16 Global Step: 206940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:31,209-Speed 2944.53 samples/sec Loss 2.3484 LearningRate 0.0028 Epoch: 16 Global Step: 206950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:34,613-Speed 3008.79 samples/sec Loss 2.2716 LearningRate 0.0028 Epoch: 16 Global Step: 206960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:37,966-Speed 3055.48 samples/sec Loss 2.2835 LearningRate 0.0028 Epoch: 16 Global Step: 206970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:41,346-Speed 3030.03 samples/sec Loss 2.2505 LearningRate 0.0028 Epoch: 16 Global Step: 206980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:44,750-Speed 3009.45 samples/sec Loss 2.3068 LearningRate 0.0028 Epoch: 16 Global Step: 206990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:48,134-Speed 3027.53 samples/sec Loss 2.2920 LearningRate 0.0028 Epoch: 16 Global Step: 207000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:51,473-Speed 3067.56 samples/sec Loss 2.3350 LearningRate 0.0028 Epoch: 16 Global Step: 207010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:19:54,875-Speed 3010.73 samples/sec Loss 2.2726 LearningRate 0.0028 Epoch: 16 Global Step: 207020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:19:58,251-Speed 3034.22 samples/sec Loss 2.3593 LearningRate 0.0028 Epoch: 16 Global Step: 207030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:01,651-Speed 3011.84 samples/sec Loss 2.3014 LearningRate 0.0028 Epoch: 16 Global Step: 207040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:05,052-Speed 3011.57 samples/sec Loss 2.2824 LearningRate 0.0028 Epoch: 16 Global Step: 207050 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 21:20:08,511-Speed 2961.63 samples/sec Loss 2.3685 LearningRate 0.0028 Epoch: 16 Global Step: 207060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:11,928-Speed 2997.37 samples/sec Loss 2.2685 LearningRate 0.0028 Epoch: 16 Global Step: 207070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:15,231-Speed 3101.43 samples/sec Loss 2.2611 LearningRate 0.0028 Epoch: 16 Global Step: 207080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:18,676-Speed 2973.13 samples/sec Loss 2.2926 LearningRate 0.0028 Epoch: 16 Global Step: 207090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:22,044-Speed 3041.25 samples/sec Loss 2.3799 LearningRate 0.0028 Epoch: 16 Global Step: 207100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:25,474-Speed 2986.35 samples/sec Loss 2.2781 LearningRate 0.0028 Epoch: 16 Global Step: 207110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:28,859-Speed 3025.97 samples/sec Loss 2.3407 LearningRate 0.0028 Epoch: 16 Global Step: 207120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:32,237-Speed 3032.85 samples/sec Loss 2.3408 LearningRate 0.0028 Epoch: 16 Global Step: 207130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:35,574-Speed 3068.73 samples/sec Loss 2.3841 LearningRate 0.0028 Epoch: 16 Global Step: 207140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:38,937-Speed 3046.49 samples/sec Loss 2.3407 LearningRate 0.0028 Epoch: 16 Global Step: 207150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:42,300-Speed 3045.59 samples/sec Loss 2.2239 LearningRate 0.0028 Epoch: 16 Global Step: 207160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:20:45,697-Speed 3015.30 samples/sec Loss 2.2415 LearningRate 0.0028 Epoch: 16 Global Step: 207170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
21:20:49,074-Speed 3033.17 samples/sec Loss 2.2912 LearningRate 0.0028 Epoch: 16 Global Step: 207180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:52,496-Speed 2992.62 samples/sec Loss 2.2690 LearningRate 0.0028 Epoch: 16 Global Step: 207190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:55,850-Speed 3053.95 samples/sec Loss 2.2918 LearningRate 0.0028 Epoch: 16 Global Step: 207200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:20:59,285-Speed 2982.48 samples/sec Loss 2.2675 LearningRate 0.0028 Epoch: 16 Global Step: 207210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:02,766-Speed 2942.18 samples/sec Loss 2.3359 LearningRate 0.0028 Epoch: 16 Global Step: 207220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:06,195-Speed 2987.19 samples/sec Loss 2.3546 LearningRate 0.0027 Epoch: 16 Global Step: 207230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:09,589-Speed 3018.08 samples/sec Loss 2.2874 LearningRate 0.0027 Epoch: 16 Global Step: 207240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:12,996-Speed 3006.69 samples/sec Loss 2.3016 LearningRate 0.0027 Epoch: 16 Global Step: 207250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:16,321-Speed 3080.26 samples/sec Loss 2.2859 LearningRate 0.0027 Epoch: 16 Global Step: 207260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:19,764-Speed 2974.92 samples/sec Loss 2.2840 LearningRate 0.0027 Epoch: 16 Global Step: 207270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:23,115-Speed 3056.52 samples/sec Loss 2.2172 LearningRate 0.0027 Epoch: 16 Global Step: 207280 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:21:26,476-Speed 3048.50 samples/sec Loss 2.2788 LearningRate 0.0027 Epoch: 16 Global Step: 207290 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:21:29,902-Speed 2989.23 samples/sec Loss 
2.3227 LearningRate 0.0027 Epoch: 16 Global Step: 207300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:33,315-Speed 3000.65 samples/sec Loss 2.2877 LearningRate 0.0027 Epoch: 16 Global Step: 207310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:36,729-Speed 3000.49 samples/sec Loss 2.2705 LearningRate 0.0027 Epoch: 16 Global Step: 207320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:40,191-Speed 2958.63 samples/sec Loss 2.3333 LearningRate 0.0027 Epoch: 16 Global Step: 207330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:43,615-Speed 2992.19 samples/sec Loss 2.2936 LearningRate 0.0027 Epoch: 16 Global Step: 207340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:46,972-Speed 3050.65 samples/sec Loss 2.3471 LearningRate 0.0027 Epoch: 16 Global Step: 207350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:50,384-Speed 3002.43 samples/sec Loss 2.2394 LearningRate 0.0027 Epoch: 16 Global Step: 207360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:53,794-Speed 3003.92 samples/sec Loss 2.3378 LearningRate 0.0027 Epoch: 16 Global Step: 207370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:21:57,195-Speed 3011.61 samples/sec Loss 2.3013 LearningRate 0.0027 Epoch: 16 Global Step: 207380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:00,567-Speed 3037.59 samples/sec Loss 2.3221 LearningRate 0.0027 Epoch: 16 Global Step: 207390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:03,887-Speed 3084.65 samples/sec Loss 2.2942 LearningRate 0.0027 Epoch: 16 Global Step: 207400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:07,244-Speed 3051.15 samples/sec Loss 2.3016 LearningRate 0.0027 Epoch: 16 Global Step: 207410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:10,612-Speed 3041.40 samples/sec Loss 2.3344 LearningRate 0.0027 Epoch: 16 Global 
Step: 207420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:13,974-Speed 3047.10 samples/sec Loss 2.2542 LearningRate 0.0027 Epoch: 16 Global Step: 207430 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:17,424-Speed 2969.31 samples/sec Loss 2.2739 LearningRate 0.0027 Epoch: 16 Global Step: 207440 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:20,768-Speed 3063.16 samples/sec Loss 2.3243 LearningRate 0.0027 Epoch: 16 Global Step: 207450 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:24,175-Speed 3006.90 samples/sec Loss 2.3513 LearningRate 0.0027 Epoch: 16 Global Step: 207460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:27,515-Speed 3066.75 samples/sec Loss 2.3071 LearningRate 0.0027 Epoch: 16 Global Step: 207470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:30,881-Speed 3043.50 samples/sec Loss 2.2576 LearningRate 0.0027 Epoch: 16 Global Step: 207480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:34,252-Speed 3038.27 samples/sec Loss 2.2408 LearningRate 0.0027 Epoch: 16 Global Step: 207490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:37,587-Speed 3071.26 samples/sec Loss 2.3379 LearningRate 0.0027 Epoch: 16 Global Step: 207500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:40,899-Speed 3092.65 samples/sec Loss 2.3035 LearningRate 0.0027 Epoch: 16 Global Step: 207510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:44,221-Speed 3082.80 samples/sec Loss 2.2449 LearningRate 0.0027 Epoch: 16 Global Step: 207520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:22:47,633-Speed 3002.70 samples/sec Loss 2.2797 LearningRate 0.0027 Epoch: 16 Global Step: 207530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:51,028-Speed 3016.52 samples/sec Loss 2.3031 LearningRate 0.0027 Epoch: 16 Global Step: 207540 Fp16 Grad Scale: 32768 
Required: 4 hours Training: 2022-04-27 21:22:54,447-Speed 2996.12 samples/sec Loss 2.2769 LearningRate 0.0027 Epoch: 16 Global Step: 207550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:22:57,760-Speed 3091.97 samples/sec Loss 2.4068 LearningRate 0.0027 Epoch: 16 Global Step: 207560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:01,122-Speed 3046.29 samples/sec Loss 2.2436 LearningRate 0.0027 Epoch: 16 Global Step: 207570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:04,462-Speed 3066.80 samples/sec Loss 2.3079 LearningRate 0.0027 Epoch: 16 Global Step: 207580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:07,775-Speed 3091.70 samples/sec Loss 2.3331 LearningRate 0.0027 Epoch: 16 Global Step: 207590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:11,123-Speed 3059.63 samples/sec Loss 2.3228 LearningRate 0.0027 Epoch: 16 Global Step: 207600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:14,556-Speed 2984.32 samples/sec Loss 2.2569 LearningRate 0.0027 Epoch: 16 Global Step: 207610 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:17,928-Speed 3036.96 samples/sec Loss 2.2991 LearningRate 0.0027 Epoch: 16 Global Step: 207620 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:21,310-Speed 3029.54 samples/sec Loss 2.3377 LearningRate 0.0027 Epoch: 16 Global Step: 207630 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:24,696-Speed 3024.81 samples/sec Loss 2.3344 LearningRate 0.0027 Epoch: 16 Global Step: 207640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:28,087-Speed 3020.90 samples/sec Loss 2.3183 LearningRate 0.0027 Epoch: 16 Global Step: 207650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:31,437-Speed 3057.61 samples/sec Loss 2.3175 LearningRate 0.0027 Epoch: 16 Global Step: 207660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
21:23:34,866-Speed 2986.95 samples/sec Loss 2.2651 LearningRate 0.0027 Epoch: 16 Global Step: 207670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:38,233-Speed 3042.96 samples/sec Loss 2.3655 LearningRate 0.0027 Epoch: 16 Global Step: 207680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:41,694-Speed 2959.16 samples/sec Loss 2.2898 LearningRate 0.0027 Epoch: 16 Global Step: 207690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:23:45,110-Speed 2998.21 samples/sec Loss 2.3169 LearningRate 0.0027 Epoch: 16 Global Step: 207700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:48,537-Speed 2988.86 samples/sec Loss 2.3471 LearningRate 0.0027 Epoch: 16 Global Step: 207710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:51,894-Speed 3051.92 samples/sec Loss 2.2556 LearningRate 0.0027 Epoch: 16 Global Step: 207720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:55,248-Speed 3053.82 samples/sec Loss 2.2586 LearningRate 0.0027 Epoch: 16 Global Step: 207730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:23:58,706-Speed 2962.35 samples/sec Loss 2.3405 LearningRate 0.0027 Epoch: 16 Global Step: 207740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:02,113-Speed 3006.34 samples/sec Loss 2.2153 LearningRate 0.0027 Epoch: 16 Global Step: 207750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:05,513-Speed 3012.45 samples/sec Loss 2.2778 LearningRate 0.0027 Epoch: 16 Global Step: 207760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:08,903-Speed 3021.79 samples/sec Loss 2.3161 LearningRate 0.0027 Epoch: 16 Global Step: 207770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:12,284-Speed 3029.33 samples/sec Loss 2.3665 LearningRate 0.0027 Epoch: 16 Global Step: 207780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:15,736-Speed 2967.30 samples/sec Loss 
2.3555 LearningRate 0.0027 Epoch: 16 Global Step: 207790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:19,230-Speed 2931.58 samples/sec Loss 2.2792 LearningRate 0.0027 Epoch: 16 Global Step: 207800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:22,682-Speed 2967.11 samples/sec Loss 2.3014 LearningRate 0.0027 Epoch: 16 Global Step: 207810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:26,141-Speed 2961.35 samples/sec Loss 2.3326 LearningRate 0.0027 Epoch: 16 Global Step: 207820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:29,495-Speed 3054.57 samples/sec Loss 2.3316 LearningRate 0.0027 Epoch: 16 Global Step: 207830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:24:32,865-Speed 3039.40 samples/sec Loss 2.3075 LearningRate 0.0027 Epoch: 16 Global Step: 207840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:36,267-Speed 3010.70 samples/sec Loss 2.2774 LearningRate 0.0027 Epoch: 16 Global Step: 207850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:39,662-Speed 3017.22 samples/sec Loss 2.3099 LearningRate 0.0027 Epoch: 16 Global Step: 207860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:43,022-Speed 3048.01 samples/sec Loss 2.2866 LearningRate 0.0027 Epoch: 16 Global Step: 207870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:46,385-Speed 3045.79 samples/sec Loss 2.2791 LearningRate 0.0027 Epoch: 16 Global Step: 207880 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:49,775-Speed 3021.89 samples/sec Loss 2.3108 LearningRate 0.0027 Epoch: 16 Global Step: 207890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:53,124-Speed 3058.40 samples/sec Loss 2.3286 LearningRate 0.0027 Epoch: 16 Global Step: 207900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:56,490-Speed 3043.15 samples/sec Loss 2.3092 LearningRate 0.0027 Epoch: 16 Global 
Step: 207910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:24:59,831-Speed 3065.63 samples/sec Loss 2.2320 LearningRate 0.0027 Epoch: 16 Global Step: 207920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:03,308-Speed 2945.70 samples/sec Loss 2.3993 LearningRate 0.0027 Epoch: 16 Global Step: 207930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:06,682-Speed 3035.27 samples/sec Loss 2.2760 LearningRate 0.0027 Epoch: 16 Global Step: 207940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:10,157-Speed 2947.66 samples/sec Loss 2.3024 LearningRate 0.0027 Epoch: 16 Global Step: 207950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:13,640-Speed 2941.38 samples/sec Loss 2.3068 LearningRate 0.0027 Epoch: 16 Global Step: 207960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:17,048-Speed 3005.33 samples/sec Loss 2.3940 LearningRate 0.0027 Epoch: 16 Global Step: 207970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:20,493-Speed 2973.77 samples/sec Loss 2.3324 LearningRate 0.0027 Epoch: 16 Global Step: 207980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:23,925-Speed 2984.53 samples/sec Loss 2.2771 LearningRate 0.0026 Epoch: 16 Global Step: 207990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:27,244-Speed 3085.70 samples/sec Loss 2.2696 LearningRate 0.0026 Epoch: 16 Global Step: 208000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:30,656-Speed 3002.15 samples/sec Loss 2.3118 LearningRate 0.0026 Epoch: 16 Global Step: 208010 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:34,109-Speed 2966.76 samples/sec Loss 2.3321 LearningRate 0.0026 Epoch: 16 Global Step: 208020 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:37,433-Speed 3081.06 samples/sec Loss 2.3571 LearningRate 0.0026 Epoch: 16 Global Step: 208030 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 21:25:40,783-Speed 3057.25 samples/sec Loss 2.3606 LearningRate 0.0026 Epoch: 16 Global Step: 208040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:25:44,162-Speed 3031.54 samples/sec Loss 2.3448 LearningRate 0.0026 Epoch: 16 Global Step: 208050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:25:47,475-Speed 3091.52 samples/sec Loss 2.2806 LearningRate 0.0026 Epoch: 16 Global Step: 208060 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:50,828-Speed 3055.24 samples/sec Loss 2.3157 LearningRate 0.0026 Epoch: 16 Global Step: 208070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:54,244-Speed 2998.63 samples/sec Loss 2.2817 LearningRate 0.0026 Epoch: 16 Global Step: 208080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:25:57,596-Speed 3055.02 samples/sec Loss 2.2801 LearningRate 0.0026 Epoch: 16 Global Step: 208090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:26:01,081-Speed 2939.74 samples/sec Loss 2.2013 LearningRate 0.0026 Epoch: 16 Global Step: 208100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:26:04,534-Speed 2966.36 samples/sec Loss 2.2889 LearningRate 0.0026 Epoch: 16 Global Step: 208110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:26:07,888-Speed 3053.42 samples/sec Loss 2.2898 LearningRate 0.0026 Epoch: 16 Global Step: 208120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:26:11,252-Speed 3045.30 samples/sec Loss 2.3392 LearningRate 0.0026 Epoch: 16 Global Step: 208130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:26:14,671-Speed 2994.90 samples/sec Loss 2.3089 LearningRate 0.0026 Epoch: 16 Global Step: 208140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:26:18,099-Speed 2988.94 samples/sec Loss 2.2283 LearningRate 0.0026 Epoch: 16 Global Step: 208150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 
21:26:21,436-Speed 3069.30 samples/sec Loss 2.2973 LearningRate 0.0026 Epoch: 16 Global Step: 208160 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:24,825-Speed 3025.35 samples/sec Loss 2.3423 LearningRate 0.0026 Epoch: 16 Global Step: 208170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:28,256-Speed 2985.72 samples/sec Loss 2.3168 LearningRate 0.0026 Epoch: 16 Global Step: 208180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:31,702-Speed 2972.84 samples/sec Loss 2.2863 LearningRate 0.0026 Epoch: 16 Global Step: 208190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:35,182-Speed 2943.39 samples/sec Loss 2.3400 LearningRate 0.0026 Epoch: 16 Global Step: 208200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:38,613-Speed 2985.36 samples/sec Loss 2.3492 LearningRate 0.0026 Epoch: 16 Global Step: 208210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:41,966-Speed 3054.11 samples/sec Loss 2.2876 LearningRate 0.0026 Epoch: 16 Global Step: 208220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:45,396-Speed 2986.55 samples/sec Loss 2.3584 LearningRate 0.0026 Epoch: 16 Global Step: 208230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:48,714-Speed 3087.36 samples/sec Loss 2.2769 LearningRate 0.0026 Epoch: 16 Global Step: 208240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:52,071-Speed 3050.76 samples/sec Loss 2.2947 LearningRate 0.0026 Epoch: 16 Global Step: 208250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:55,459-Speed 3023.30 samples/sec Loss 2.3188 LearningRate 0.0026 Epoch: 16 Global Step: 208260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:26:58,904-Speed 2974.03 samples/sec Loss 2.3297 LearningRate 0.0026 Epoch: 16 Global Step: 208270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:02,261-Speed 3050.47 samples/sec Loss 
2.2652 LearningRate 0.0026 Epoch: 16 Global Step: 208280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:05,643-Speed 3028.66 samples/sec Loss 2.3215 LearningRate 0.0026 Epoch: 16 Global Step: 208290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:09,034-Speed 3021.20 samples/sec Loss 2.3725 LearningRate 0.0026 Epoch: 16 Global Step: 208300 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:12,394-Speed 3048.81 samples/sec Loss 2.2366 LearningRate 0.0026 Epoch: 16 Global Step: 208310 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:15,824-Speed 2985.69 samples/sec Loss 2.3519 LearningRate 0.0026 Epoch: 16 Global Step: 208320 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:19,234-Speed 3004.38 samples/sec Loss 2.3271 LearningRate 0.0026 Epoch: 16 Global Step: 208330 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:22,580-Speed 3061.59 samples/sec Loss 2.3150 LearningRate 0.0026 Epoch: 16 Global Step: 208340 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:25,988-Speed 3005.05 samples/sec Loss 2.2191 LearningRate 0.0026 Epoch: 16 Global Step: 208350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:29,386-Speed 3014.39 samples/sec Loss 2.3170 LearningRate 0.0026 Epoch: 16 Global Step: 208360 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:27:32,742-Speed 3052.35 samples/sec Loss 2.2791 LearningRate 0.0026 Epoch: 16 Global Step: 208370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:36,110-Speed 3041.08 samples/sec Loss 2.3192 LearningRate 0.0026 Epoch: 16 Global Step: 208380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:39,524-Speed 3000.08 samples/sec Loss 2.3741 LearningRate 0.0026 Epoch: 16 Global Step: 208390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:42,959-Speed 2982.25 samples/sec Loss 2.3000 LearningRate 0.0026 Epoch: 16 Global 
Step: 208400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:46,373-Speed 3000.07 samples/sec Loss 2.2494 LearningRate 0.0026 Epoch: 16 Global Step: 208410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:49,859-Speed 2937.62 samples/sec Loss 2.2843 LearningRate 0.0026 Epoch: 16 Global Step: 208420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:53,244-Speed 3026.42 samples/sec Loss 2.3674 LearningRate 0.0026 Epoch: 16 Global Step: 208430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:56,546-Speed 3101.53 samples/sec Loss 2.3110 LearningRate 0.0026 Epoch: 16 Global Step: 208440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:27:59,933-Speed 3025.12 samples/sec Loss 2.2716 LearningRate 0.0026 Epoch: 16 Global Step: 208450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:03,298-Speed 3043.46 samples/sec Loss 2.3042 LearningRate 0.0026 Epoch: 16 Global Step: 208460 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:06,751-Speed 2966.67 samples/sec Loss 2.2706 LearningRate 0.0026 Epoch: 16 Global Step: 208470 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:10,137-Speed 3025.04 samples/sec Loss 2.3086 LearningRate 0.0026 Epoch: 16 Global Step: 208480 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:13,566-Speed 2986.59 samples/sec Loss 2.3512 LearningRate 0.0026 Epoch: 16 Global Step: 208490 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:16,897-Speed 3074.88 samples/sec Loss 2.2546 LearningRate 0.0026 Epoch: 16 Global Step: 208500 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:20,310-Speed 3001.26 samples/sec Loss 2.3205 LearningRate 0.0026 Epoch: 16 Global Step: 208510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:23,797-Speed 2937.15 samples/sec Loss 2.2874 LearningRate 0.0026 Epoch: 16 Global Step: 208520 Fp16 Grad Scale: 16384 
Required: 4 hours Training: 2022-04-27 21:28:27,208-Speed 3003.64 samples/sec Loss 2.3965 LearningRate 0.0026 Epoch: 16 Global Step: 208530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:30,589-Speed 3029.68 samples/sec Loss 2.4210 LearningRate 0.0026 Epoch: 16 Global Step: 208540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:34,097-Speed 2919.85 samples/sec Loss 2.3841 LearningRate 0.0026 Epoch: 16 Global Step: 208550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:28:37,531-Speed 2982.39 samples/sec Loss 2.3093 LearningRate 0.0026 Epoch: 16 Global Step: 208560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:40,967-Speed 2981.14 samples/sec Loss 2.3320 LearningRate 0.0026 Epoch: 16 Global Step: 208570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:44,292-Speed 3080.80 samples/sec Loss 2.3497 LearningRate 0.0026 Epoch: 16 Global Step: 208580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:47,608-Speed 3089.29 samples/sec Loss 2.2917 LearningRate 0.0026 Epoch: 16 Global Step: 208590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:51,027-Speed 2996.06 samples/sec Loss 2.2606 LearningRate 0.0026 Epoch: 16 Global Step: 208600 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:54,418-Speed 3020.62 samples/sec Loss 2.2194 LearningRate 0.0026 Epoch: 16 Global Step: 208610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:28:57,875-Speed 2962.84 samples/sec Loss 2.2555 LearningRate 0.0026 Epoch: 16 Global Step: 208620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:01,269-Speed 3018.44 samples/sec Loss 2.3894 LearningRate 0.0026 Epoch: 16 Global Step: 208630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:04,636-Speed 3042.32 samples/sec Loss 2.2807 LearningRate 0.0026 Epoch: 16 Global Step: 208640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 
21:29:08,012-Speed 3033.73 samples/sec Loss 2.2742 LearningRate 0.0026 Epoch: 16 Global Step: 208650 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:11,380-Speed 3041.56 samples/sec Loss 2.2724 LearningRate 0.0026 Epoch: 16 Global Step: 208660 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:29:14,846-Speed 2954.60 samples/sec Loss 2.2812 LearningRate 0.0026 Epoch: 16 Global Step: 208670 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:18,174-Speed 3078.49 samples/sec Loss 2.3158 LearningRate 0.0026 Epoch: 16 Global Step: 208680 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:21,514-Speed 3066.66 samples/sec Loss 2.2929 LearningRate 0.0026 Epoch: 16 Global Step: 208690 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:24,879-Speed 3043.21 samples/sec Loss 2.3412 LearningRate 0.0026 Epoch: 16 Global Step: 208700 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:28,310-Speed 2986.48 samples/sec Loss 2.2862 LearningRate 0.0026 Epoch: 16 Global Step: 208710 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:31,681-Speed 3038.14 samples/sec Loss 2.3261 LearningRate 0.0026 Epoch: 16 Global Step: 208720 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:35,110-Speed 2987.13 samples/sec Loss 2.3353 LearningRate 0.0026 Epoch: 16 Global Step: 208730 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:38,578-Speed 2954.24 samples/sec Loss 2.2784 LearningRate 0.0026 Epoch: 16 Global Step: 208740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:41,995-Speed 2997.83 samples/sec Loss 2.3065 LearningRate 0.0026 Epoch: 16 Global Step: 208750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:45,364-Speed 3040.06 samples/sec Loss 2.3101 LearningRate 0.0025 Epoch: 16 Global Step: 208760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:48,708-Speed 3063.10 samples/sec Loss 
2.3181 LearningRate 0.0025 Epoch: 16 Global Step: 208770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:29:52,025-Speed 3088.13 samples/sec Loss 2.3130 LearningRate 0.0025 Epoch: 16 Global Step: 208780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:29:55,377-Speed 3056.15 samples/sec Loss 2.2908 LearningRate 0.0025 Epoch: 16 Global Step: 208790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:29:58,819-Speed 2976.43 samples/sec Loss 2.3229 LearningRate 0.0025 Epoch: 16 Global Step: 208800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:02,211-Speed 3019.99 samples/sec Loss 2.3352 LearningRate 0.0025 Epoch: 16 Global Step: 208810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:05,620-Speed 3004.72 samples/sec Loss 2.3490 LearningRate 0.0025 Epoch: 16 Global Step: 208820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:09,079-Speed 2961.15 samples/sec Loss 2.3408 LearningRate 0.0025 Epoch: 16 Global Step: 208830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:12,517-Speed 2979.22 samples/sec Loss 2.3121 LearningRate 0.0025 Epoch: 16 Global Step: 208840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:15,942-Speed 2990.32 samples/sec Loss 2.2678 LearningRate 0.0025 Epoch: 16 Global Step: 208850 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:19,373-Speed 2985.69 samples/sec Loss 2.3524 LearningRate 0.0025 Epoch: 16 Global Step: 208860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:22,820-Speed 2971.85 samples/sec Loss 2.3777 LearningRate 0.0025 Epoch: 16 Global Step: 208870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:26,293-Speed 2949.28 samples/sec Loss 2.3144 LearningRate 0.0025 Epoch: 16 Global Step: 208880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:30:29,716-Speed 2992.75 samples/sec Loss 2.2908 LearningRate 0.0025 Epoch: 16 Global 
Step: 208890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:30:33,113-Speed 3014.95 samples/sec Loss 2.3536 LearningRate 0.0025 Epoch: 16 Global Step: 208900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:36,547-Speed 2982.92 samples/sec Loss 2.2824 LearningRate 0.0025 Epoch: 16 Global Step: 208910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:40,018-Speed 2951.22 samples/sec Loss 2.2716 LearningRate 0.0025 Epoch: 16 Global Step: 208920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:43,424-Speed 3006.45 samples/sec Loss 2.2116 LearningRate 0.0025 Epoch: 16 Global Step: 208930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:46,852-Speed 2988.29 samples/sec Loss 2.3290 LearningRate 0.0025 Epoch: 16 Global Step: 208940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:50,254-Speed 3010.63 samples/sec Loss 2.3558 LearningRate 0.0025 Epoch: 16 Global Step: 208950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:53,696-Speed 2975.95 samples/sec Loss 2.3437 LearningRate 0.0025 Epoch: 16 Global Step: 208960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:30:57,109-Speed 3001.71 samples/sec Loss 2.3111 LearningRate 0.0025 Epoch: 16 Global Step: 208970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:00,439-Speed 3075.94 samples/sec Loss 2.3555 LearningRate 0.0025 Epoch: 16 Global Step: 208980 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:03,916-Speed 2945.65 samples/sec Loss 2.3419 LearningRate 0.0025 Epoch: 16 Global Step: 208990 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:07,309-Speed 3018.99 samples/sec Loss 2.3416 LearningRate 0.0025 Epoch: 16 Global Step: 209000 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:10,718-Speed 3003.99 samples/sec Loss 2.3216 LearningRate 0.0025 Epoch: 16 Global Step: 209010 Fp16 Grad Scale: 8192 Required: 4 
hours Training: 2022-04-27 21:31:14,120-Speed 3011.34 samples/sec Loss 2.3330 LearningRate 0.0025 Epoch: 16 Global Step: 209020 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:17,532-Speed 3001.86 samples/sec Loss 2.3043 LearningRate 0.0025 Epoch: 16 Global Step: 209030 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:20,908-Speed 3033.81 samples/sec Loss 2.2289 LearningRate 0.0025 Epoch: 16 Global Step: 209040 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:24,288-Speed 3030.43 samples/sec Loss 2.3023 LearningRate 0.0025 Epoch: 16 Global Step: 209050 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:27,673-Speed 3025.76 samples/sec Loss 2.3286 LearningRate 0.0025 Epoch: 16 Global Step: 209060 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:31,159-Speed 2938.41 samples/sec Loss 2.3185 LearningRate 0.0025 Epoch: 16 Global Step: 209070 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:31:34,596-Speed 2980.86 samples/sec Loss 2.3440 LearningRate 0.0025 Epoch: 16 Global Step: 209080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:37,932-Speed 3070.20 samples/sec Loss 2.2980 LearningRate 0.0025 Epoch: 16 Global Step: 209090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:41,281-Speed 3058.03 samples/sec Loss 2.2879 LearningRate 0.0025 Epoch: 16 Global Step: 209100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:44,702-Speed 2994.54 samples/sec Loss 2.2418 LearningRate 0.0025 Epoch: 16 Global Step: 209110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:48,097-Speed 3017.11 samples/sec Loss 2.3010 LearningRate 0.0025 Epoch: 16 Global Step: 209120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:51,486-Speed 3023.02 samples/sec Loss 2.2659 LearningRate 0.0025 Epoch: 16 Global Step: 209130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:54,930-Speed 
2973.19 samples/sec Loss 2.2618 LearningRate 0.0025 Epoch: 16 Global Step: 209140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:31:58,287-Speed 3051.92 samples/sec Loss 2.3654 LearningRate 0.0025 Epoch: 16 Global Step: 209150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:01,607-Speed 3085.67 samples/sec Loss 2.3057 LearningRate 0.0025 Epoch: 16 Global Step: 209160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:04,972-Speed 3043.55 samples/sec Loss 2.3243 LearningRate 0.0025 Epoch: 16 Global Step: 209170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:08,357-Speed 3025.65 samples/sec Loss 2.2817 LearningRate 0.0025 Epoch: 16 Global Step: 209180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:32:11,646-Speed 3116.69 samples/sec Loss 2.3201 LearningRate 0.0025 Epoch: 16 Global Step: 209190 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:15,093-Speed 2971.53 samples/sec Loss 2.2496 LearningRate 0.0025 Epoch: 16 Global Step: 209200 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:18,524-Speed 2985.33 samples/sec Loss 2.3684 LearningRate 0.0025 Epoch: 16 Global Step: 209210 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:21,913-Speed 3022.79 samples/sec Loss 2.2344 LearningRate 0.0025 Epoch: 16 Global Step: 209220 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:25,255-Speed 3064.63 samples/sec Loss 2.2976 LearningRate 0.0025 Epoch: 16 Global Step: 209230 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:28,675-Speed 2995.38 samples/sec Loss 2.3030 LearningRate 0.0025 Epoch: 16 Global Step: 209240 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:32,063-Speed 3022.93 samples/sec Loss 2.3307 LearningRate 0.0025 Epoch: 16 Global Step: 209250 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:35,466-Speed 3009.93 samples/sec Loss 2.2692 LearningRate 
0.0025 Epoch: 16 Global Step: 209260 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:38,939-Speed 2949.46 samples/sec Loss 2.3305 LearningRate 0.0025 Epoch: 16 Global Step: 209270 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:42,399-Speed 2960.46 samples/sec Loss 2.3525 LearningRate 0.0025 Epoch: 16 Global Step: 209280 Fp16 Grad Scale: 8192 Required: 4 hours Training: 2022-04-27 21:32:45,842-Speed 2974.90 samples/sec Loss 2.2109 LearningRate 0.0025 Epoch: 16 Global Step: 209290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:49,200-Speed 3050.39 samples/sec Loss 2.3172 LearningRate 0.0025 Epoch: 16 Global Step: 209300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:52,566-Speed 3043.30 samples/sec Loss 2.2835 LearningRate 0.0025 Epoch: 16 Global Step: 209310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:55,922-Speed 3052.00 samples/sec Loss 2.2641 LearningRate 0.0025 Epoch: 16 Global Step: 209320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:32:59,336-Speed 2999.62 samples/sec Loss 2.2925 LearningRate 0.0025 Epoch: 16 Global Step: 209330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:33:02,766-Speed 2986.53 samples/sec Loss 2.2995 LearningRate 0.0025 Epoch: 16 Global Step: 209340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:33:06,090-Speed 3081.28 samples/sec Loss 2.2850 LearningRate 0.0025 Epoch: 16 Global Step: 209350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:33:09,450-Speed 3048.63 samples/sec Loss 2.2237 LearningRate 0.0025 Epoch: 16 Global Step: 209360 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:33:12,790-Speed 3066.97 samples/sec Loss 2.2793 LearningRate 0.0025 Epoch: 16 Global Step: 209370 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:33:16,095-Speed 3098.55 samples/sec Loss 2.3470 LearningRate 0.0025 Epoch: 16 Global Step: 209380 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 21:33:19,424-Speed 3077.24 samples/sec Loss 2.2479 LearningRate 0.0025 Epoch: 16 Global Step: 209390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:22,820-Speed 3016.56 samples/sec Loss 2.2936 LearningRate 0.0025 Epoch: 16 Global Step: 209400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:26,223-Speed 3009.14 samples/sec Loss 2.3102 LearningRate 0.0025 Epoch: 16 Global Step: 209410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:29,586-Speed 3046.26 samples/sec Loss 2.2840 LearningRate 0.0025 Epoch: 16 Global Step: 209420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:32,920-Speed 3072.18 samples/sec Loss 2.3316 LearningRate 0.0025 Epoch: 16 Global Step: 209430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:36,256-Speed 3070.65 samples/sec Loss 2.3414 LearningRate 0.0025 Epoch: 16 Global Step: 209440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:39,600-Speed 3063.32 samples/sec Loss 2.2851 LearningRate 0.0025 Epoch: 16 Global Step: 209450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:42,953-Speed 3054.24 samples/sec Loss 2.2724 LearningRate 0.0025 Epoch: 16 Global Step: 209460 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:46,403-Speed 2968.62 samples/sec Loss 2.3322 LearningRate 0.0025 Epoch: 16 Global Step: 209470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:49,776-Speed 3037.04 samples/sec Loss 2.2702 LearningRate 0.0025 Epoch: 16 Global Step: 209480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:33:53,168-Speed 3020.15 samples/sec Loss 2.2794 LearningRate 0.0025 Epoch: 16 Global Step: 209490 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:33:56,499-Speed 3074.45 samples/sec Loss 2.2760 LearningRate 0.0025 Epoch: 16 Global Step: 209500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-27 21:33:59,869-Speed 3039.89 samples/sec Loss 2.3888 LearningRate 0.0025 Epoch: 16 Global Step: 209510 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:03,192-Speed 3082.69 samples/sec Loss 2.2480 LearningRate 0.0025 Epoch: 16 Global Step: 209520 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:06,601-Speed 3004.70 samples/sec Loss 2.2816 LearningRate 0.0025 Epoch: 16 Global Step: 209530 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:10,017-Speed 2997.94 samples/sec Loss 2.2955 LearningRate 0.0024 Epoch: 16 Global Step: 209540 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:13,390-Speed 3037.16 samples/sec Loss 2.2628 LearningRate 0.0024 Epoch: 16 Global Step: 209550 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:16,787-Speed 3014.91 samples/sec Loss 2.3690 LearningRate 0.0024 Epoch: 16 Global Step: 209560 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:20,225-Speed 2979.34 samples/sec Loss 2.3630 LearningRate 0.0024 Epoch: 16 Global Step: 209570 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:23,576-Speed 3056.75 samples/sec Loss 2.2885 LearningRate 0.0024 Epoch: 16 Global Step: 209580 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:26,895-Speed 3085.47 samples/sec Loss 2.2211 LearningRate 0.0024 Epoch: 16 Global Step: 209590 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:30,213-Speed 3088.68 samples/sec Loss 2.2795 LearningRate 0.0024 Epoch: 16 Global Step: 209600 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:34:33,727-Speed 2914.89 samples/sec Loss 2.2761 LearningRate 0.0024 Epoch: 16 Global Step: 209610 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:34:37,045-Speed 3087.72 samples/sec Loss 2.3358 LearningRate 0.0024 Epoch: 16 Global Step: 209620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:40,439-Speed 3017.72 
samples/sec Loss 2.2335 LearningRate 0.0024 Epoch: 16 Global Step: 209630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:43,868-Speed 2987.32 samples/sec Loss 2.2770 LearningRate 0.0024 Epoch: 16 Global Step: 209640 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:34:47,315-Speed 2971.49 samples/sec Loss 2.2846 LearningRate 0.0024 Epoch: 16 Global Step: 209650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:34:50,660-Speed 3062.24 samples/sec Loss 2.2760 LearningRate 0.0024 Epoch: 16 Global Step: 209660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:34:54,124-Speed 2956.70 samples/sec Loss 2.3774 LearningRate 0.0024 Epoch: 16 Global Step: 209670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:34:57,451-Speed 3078.15 samples/sec Loss 2.2790 LearningRate 0.0024 Epoch: 16 Global Step: 209680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:00,792-Speed 3066.20 samples/sec Loss 2.3126 LearningRate 0.0024 Epoch: 16 Global Step: 209690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:04,234-Speed 2975.62 samples/sec Loss 2.2751 LearningRate 0.0024 Epoch: 16 Global Step: 209700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:07,631-Speed 3016.01 samples/sec Loss 2.2806 LearningRate 0.0024 Epoch: 16 Global Step: 209710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:10,998-Speed 3042.22 samples/sec Loss 2.2649 LearningRate 0.0024 Epoch: 16 Global Step: 209720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:14,452-Speed 2965.42 samples/sec Loss 2.2647 LearningRate 0.0024 Epoch: 16 Global Step: 209730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:17,854-Speed 3010.18 samples/sec Loss 2.2817 LearningRate 0.0024 Epoch: 16 Global Step: 209740 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:35:21,296-Speed 2976.14 samples/sec Loss 2.2801 LearningRate 0.0024 
Epoch: 16 Global Step: 209750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:24,685-Speed 3022.14 samples/sec Loss 2.3031 LearningRate 0.0024 Epoch: 16 Global Step: 209760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:28,134-Speed 2970.19 samples/sec Loss 2.3219 LearningRate 0.0024 Epoch: 16 Global Step: 209770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:31,557-Speed 2992.42 samples/sec Loss 2.3039 LearningRate 0.0024 Epoch: 16 Global Step: 209780 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:34,934-Speed 3033.54 samples/sec Loss 2.3187 LearningRate 0.0024 Epoch: 16 Global Step: 209790 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:38,405-Speed 2951.32 samples/sec Loss 2.2862 LearningRate 0.0024 Epoch: 16 Global Step: 209800 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:41,878-Speed 2949.37 samples/sec Loss 2.3109 LearningRate 0.0024 Epoch: 16 Global Step: 209810 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:45,239-Speed 3046.78 samples/sec Loss 2.3156 LearningRate 0.0024 Epoch: 16 Global Step: 209820 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:48,606-Speed 3041.82 samples/sec Loss 2.2619 LearningRate 0.0024 Epoch: 16 Global Step: 209830 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:51,907-Speed 3103.18 samples/sec Loss 2.2222 LearningRate 0.0024 Epoch: 16 Global Step: 209840 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:35:55,359-Speed 2967.32 samples/sec Loss 2.2971 LearningRate 0.0024 Epoch: 16 Global Step: 209850 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:35:58,707-Speed 3059.55 samples/sec Loss 2.3163 LearningRate 0.0024 Epoch: 16 Global Step: 209860 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:02,037-Speed 3076.19 samples/sec Loss 2.2297 LearningRate 0.0024 Epoch: 16 Global Step: 209870 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:05,452-Speed 2999.30 samples/sec Loss 2.2872 LearningRate 0.0024 Epoch: 16 Global Step: 209880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:08,765-Speed 3091.46 samples/sec Loss 2.3100 LearningRate 0.0024 Epoch: 16 Global Step: 209890 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:12,133-Speed 3041.46 samples/sec Loss 2.3064 LearningRate 0.0024 Epoch: 16 Global Step: 209900 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:15,477-Speed 3062.65 samples/sec Loss 2.2554 LearningRate 0.0024 Epoch: 16 Global Step: 209910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:18,872-Speed 3017.31 samples/sec Loss 2.2752 LearningRate 0.0024 Epoch: 16 Global Step: 209920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:22,301-Speed 2987.50 samples/sec Loss 2.2744 LearningRate 0.0024 Epoch: 16 Global Step: 209930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:25,644-Speed 3064.03 samples/sec Loss 2.2958 LearningRate 0.0024 Epoch: 16 Global Step: 209940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:29,113-Speed 2952.92 samples/sec Loss 2.2542 LearningRate 0.0024 Epoch: 16 Global Step: 209950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:32,483-Speed 3039.38 samples/sec Loss 2.4116 LearningRate 0.0024 Epoch: 16 Global Step: 209960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:35,917-Speed 2983.35 samples/sec Loss 2.3086 LearningRate 0.0024 Epoch: 16 Global Step: 209970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:39,273-Speed 3051.72 samples/sec Loss 2.2390 LearningRate 0.0024 Epoch: 16 Global Step: 209980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:36:42,648-Speed 3034.74 samples/sec Loss 2.2821 LearningRate 0.0024 Epoch: 16 Global Step: 209990 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-27 21:36:46,059-Speed 3002.74 samples/sec Loss 2.3332 LearningRate 0.0024 Epoch: 16 Global Step: 210000 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:49,467-Speed 3006.05 samples/sec Loss 2.2258 LearningRate 0.0024 Epoch: 16 Global Step: 210010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:52,922-Speed 2966.00 samples/sec Loss 2.3376 LearningRate 0.0024 Epoch: 16 Global Step: 210020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:56,387-Speed 2955.69 samples/sec Loss 2.3270 LearningRate 0.0024 Epoch: 16 Global Step: 210030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:36:59,732-Speed 3061.95 samples/sec Loss 2.2910 LearningRate 0.0024 Epoch: 16 Global Step: 210040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:03,045-Speed 3092.50 samples/sec Loss 2.2904 LearningRate 0.0024 Epoch: 16 Global Step: 210050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:06,413-Speed 3041.41 samples/sec Loss 2.3492 LearningRate 0.0024 Epoch: 16 Global Step: 210060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:09,827-Speed 3000.66 samples/sec Loss 2.2247 LearningRate 0.0024 Epoch: 16 Global Step: 210070 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:13,215-Speed 3022.57 samples/sec Loss 2.2525 LearningRate 0.0024 Epoch: 16 Global Step: 210080 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:16,601-Speed 3024.85 samples/sec Loss 2.3394 LearningRate 0.0024 Epoch: 16 Global Step: 210090 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:19,958-Speed 3052.07 samples/sec Loss 2.2666 LearningRate 0.0024 Epoch: 16 Global Step: 210100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:23,342-Speed 3026.67 samples/sec Loss 2.2524 LearningRate 0.0024 Epoch: 16 Global Step: 210110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:26,671-Speed 3076.68 
samples/sec Loss 2.2678 LearningRate 0.0024 Epoch: 16 Global Step: 210120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:30,043-Speed 3038.04 samples/sec Loss 2.2282 LearningRate 0.0024 Epoch: 16 Global Step: 210130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:33,370-Speed 3079.00 samples/sec Loss 2.3883 LearningRate 0.0024 Epoch: 16 Global Step: 210140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:36,710-Speed 3065.96 samples/sec Loss 2.2116 LearningRate 0.0024 Epoch: 16 Global Step: 210150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:40,144-Speed 2982.75 samples/sec Loss 2.2703 LearningRate 0.0024 Epoch: 16 Global Step: 210160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:37:43,458-Speed 3091.00 samples/sec Loss 2.2563 LearningRate 0.0024 Epoch: 16 Global Step: 210170 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:46,845-Speed 3024.26 samples/sec Loss 2.2729 LearningRate 0.0024 Epoch: 16 Global Step: 210180 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:50,243-Speed 3014.60 samples/sec Loss 2.3195 LearningRate 0.0024 Epoch: 16 Global Step: 210190 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:53,589-Speed 3061.02 samples/sec Loss 2.3106 LearningRate 0.0024 Epoch: 16 Global Step: 210200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:37:56,945-Speed 3051.73 samples/sec Loss 2.2773 LearningRate 0.0024 Epoch: 16 Global Step: 210210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:00,422-Speed 2946.49 samples/sec Loss 2.2645 LearningRate 0.0024 Epoch: 16 Global Step: 210220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:03,858-Speed 2981.32 samples/sec Loss 2.2884 LearningRate 0.0024 Epoch: 16 Global Step: 210230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:07,213-Speed 3053.33 samples/sec Loss 2.3418 LearningRate 0.0024 
Epoch: 16 Global Step: 210240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:10,614-Speed 3011.21 samples/sec Loss 2.2844 LearningRate 0.0024 Epoch: 16 Global Step: 210250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:14,039-Speed 2990.60 samples/sec Loss 2.3132 LearningRate 0.0024 Epoch: 16 Global Step: 210260 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:17,422-Speed 3027.95 samples/sec Loss 2.3569 LearningRate 0.0024 Epoch: 16 Global Step: 210270 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:20,937-Speed 2914.25 samples/sec Loss 2.2386 LearningRate 0.0024 Epoch: 16 Global Step: 210280 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:24,325-Speed 3023.14 samples/sec Loss 2.2230 LearningRate 0.0024 Epoch: 16 Global Step: 210290 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:27,715-Speed 3021.78 samples/sec Loss 2.2308 LearningRate 0.0024 Epoch: 16 Global Step: 210300 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:31,089-Speed 3035.62 samples/sec Loss 2.3209 LearningRate 0.0024 Epoch: 16 Global Step: 210310 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:34,474-Speed 3026.58 samples/sec Loss 2.2985 LearningRate 0.0024 Epoch: 16 Global Step: 210320 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:37,841-Speed 3042.13 samples/sec Loss 2.2476 LearningRate 0.0024 Epoch: 16 Global Step: 210330 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:41,147-Speed 3098.21 samples/sec Loss 2.3215 LearningRate 0.0023 Epoch: 16 Global Step: 210340 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:44,543-Speed 3015.84 samples/sec Loss 2.3237 LearningRate 0.0023 Epoch: 16 Global Step: 210350 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:38:47,877-Speed 3072.00 samples/sec Loss 2.2829 LearningRate 0.0023 Epoch: 16 Global Step: 210360 Fp16 Grad 
Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:51,231-Speed 3054.18 samples/sec Loss 2.3153 LearningRate 0.0023 Epoch: 16 Global Step: 210370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:54,554-Speed 3082.80 samples/sec Loss 2.3051 LearningRate 0.0023 Epoch: 16 Global Step: 210380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:38:57,929-Speed 3034.30 samples/sec Loss 2.2511 LearningRate 0.0023 Epoch: 16 Global Step: 210390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:01,320-Speed 3021.47 samples/sec Loss 2.2619 LearningRate 0.0023 Epoch: 16 Global Step: 210400 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:04,731-Speed 3002.46 samples/sec Loss 2.2575 LearningRate 0.0023 Epoch: 16 Global Step: 210410 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:08,094-Speed 3045.87 samples/sec Loss 2.1983 LearningRate 0.0023 Epoch: 16 Global Step: 210420 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:11,464-Speed 3039.32 samples/sec Loss 2.2999 LearningRate 0.0023 Epoch: 16 Global Step: 210430 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:14,844-Speed 3030.53 samples/sec Loss 2.2525 LearningRate 0.0023 Epoch: 16 Global Step: 210440 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:18,202-Speed 3050.36 samples/sec Loss 2.2922 LearningRate 0.0023 Epoch: 16 Global Step: 210450 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:21,573-Speed 3038.43 samples/sec Loss 2.3685 LearningRate 0.0023 Epoch: 16 Global Step: 210460 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:39:24,931-Speed 3050.40 samples/sec Loss 2.3153 LearningRate 0.0023 Epoch: 16 Global Step: 210470 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:28,288-Speed 3051.54 samples/sec Loss 2.2875 LearningRate 0.0023 Epoch: 16 Global Step: 210480 Fp16 Grad Scale: 32768 Required: 4 hours Training: 
2022-04-27 21:39:31,733-Speed 2973.69 samples/sec Loss 2.2527 LearningRate 0.0023 Epoch: 16 Global Step: 210490 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:35,066-Speed 3072.51 samples/sec Loss 2.3164 LearningRate 0.0023 Epoch: 16 Global Step: 210500 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:39:38,400-Speed 3072.97 samples/sec Loss 2.3225 LearningRate 0.0023 Epoch: 16 Global Step: 210510 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:39:41,760-Speed 3048.76 samples/sec Loss 2.3061 LearningRate 0.0023 Epoch: 16 Global Step: 210520 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:39:45,158-Speed 3013.68 samples/sec Loss 2.2870 LearningRate 0.0023 Epoch: 16 Global Step: 210530 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:39:48,550-Speed 3019.47 samples/sec Loss 2.3084 LearningRate 0.0023 Epoch: 16 Global Step: 210540 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:39:51,931-Speed 3030.18 samples/sec Loss 2.2467 LearningRate 0.0023 Epoch: 16 Global Step: 210550 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:39:55,350-Speed 2996.30 samples/sec Loss 2.2696 LearningRate 0.0023 Epoch: 16 Global Step: 210560 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:39:58,804-Speed 2966.15 samples/sec Loss 2.3175 LearningRate 0.0023 Epoch: 16 Global Step: 210570 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:02,157-Speed 3054.44 samples/sec Loss 2.2848 LearningRate 0.0023 Epoch: 16 Global Step: 210580 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:05,563-Speed 3007.47 samples/sec Loss 2.2600 LearningRate 0.0023 Epoch: 16 Global Step: 210590 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:08,978-Speed 2999.53 samples/sec Loss 2.2368 LearningRate 0.0023 Epoch: 16 Global Step: 210600 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:12,360-Speed 3028.60 
samples/sec Loss 2.3678 LearningRate 0.0023 Epoch: 16 Global Step: 210610 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:40:15,831-Speed 2951.22 samples/sec Loss 2.2798 LearningRate 0.0023 Epoch: 16 Global Step: 210620 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:40:19,141-Speed 3094.33 samples/sec Loss 2.3189 LearningRate 0.0023 Epoch: 16 Global Step: 210630 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:40:22,452-Speed 3093.46 samples/sec Loss 2.2040 LearningRate 0.0023 Epoch: 16 Global Step: 210640 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:25,789-Speed 3068.91 samples/sec Loss 2.3079 LearningRate 0.0023 Epoch: 16 Global Step: 210650 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:29,203-Speed 3001.15 samples/sec Loss 2.2506 LearningRate 0.0023 Epoch: 16 Global Step: 210660 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:32,529-Speed 3079.39 samples/sec Loss 2.2258 LearningRate 0.0023 Epoch: 16 Global Step: 210670 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:35,898-Speed 3039.81 samples/sec Loss 2.2719 LearningRate 0.0023 Epoch: 16 Global Step: 210680 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:39,238-Speed 3066.54 samples/sec Loss 2.3959 LearningRate 0.0023 Epoch: 16 Global Step: 210690 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:42,632-Speed 3018.39 samples/sec Loss 2.2368 LearningRate 0.0023 Epoch: 16 Global Step: 210700 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:45,969-Speed 3069.45 samples/sec Loss 2.2568 LearningRate 0.0023 Epoch: 16 Global Step: 210710 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:49,341-Speed 3037.54 samples/sec Loss 2.2828 LearningRate 0.0023 Epoch: 16 Global Step: 210720 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:52,786-Speed 2973.71 samples/sec Loss 2.2338 LearningRate 0.0023 
Epoch: 16 Global Step: 210730 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:40:56,157-Speed 3038.07 samples/sec Loss 2.2423 LearningRate 0.0023 Epoch: 16 Global Step: 210740 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:40:59,628-Speed 2951.46 samples/sec Loss 2.2721 LearningRate 0.0023 Epoch: 16 Global Step: 210750 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:41:02,979-Speed 3055.88 samples/sec Loss 2.2736 LearningRate 0.0023 Epoch: 16 Global Step: 210760 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:41:06,342-Speed 3045.57 samples/sec Loss 2.2416 LearningRate 0.0023 Epoch: 16 Global Step: 210770 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:41:09,643-Speed 3103.57 samples/sec Loss 2.3273 LearningRate 0.0023 Epoch: 16 Global Step: 210780 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:13,009-Speed 3042.98 samples/sec Loss 2.3388 LearningRate 0.0023 Epoch: 16 Global Step: 210790 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:16,448-Speed 2978.00 samples/sec Loss 2.2494 LearningRate 0.0023 Epoch: 16 Global Step: 210800 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:19,803-Speed 3053.03 samples/sec Loss 2.2549 LearningRate 0.0023 Epoch: 16 Global Step: 210810 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:23,171-Speed 3041.12 samples/sec Loss 2.2701 LearningRate 0.0023 Epoch: 16 Global Step: 210820 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:26,500-Speed 3077.22 samples/sec Loss 2.2465 LearningRate 0.0023 Epoch: 16 Global Step: 210830 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:29,864-Speed 3045.20 samples/sec Loss 2.3468 LearningRate 0.0023 Epoch: 16 Global Step: 210840 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:33,232-Speed 3040.53 samples/sec Loss 2.2576 LearningRate 0.0023 Epoch: 16 Global Step: 210850 Fp16 Grad 
Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:36,659-Speed 2989.22 samples/sec Loss 2.2554 LearningRate 0.0023 Epoch: 16 Global Step: 210860 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:40,031-Speed 3037.35 samples/sec Loss 2.2754 LearningRate 0.0023 Epoch: 16 Global Step: 210870 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:43,457-Speed 2989.69 samples/sec Loss 2.3200 LearningRate 0.0023 Epoch: 16 Global Step: 210880 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:41:46,815-Speed 3049.67 samples/sec Loss 2.2562 LearningRate 0.0023 Epoch: 16 Global Step: 210890 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:41:50,183-Speed 3041.92 samples/sec Loss 2.2744 LearningRate 0.0023 Epoch: 16 Global Step: 210900 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:41:53,558-Speed 3034.88 samples/sec Loss 2.3121 LearningRate 0.0023 Epoch: 16 Global Step: 210910 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:41:56,911-Speed 3054.76 samples/sec Loss 2.2301 LearningRate 0.0023 Epoch: 16 Global Step: 210920 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:00,288-Speed 3033.53 samples/sec Loss 2.3178 LearningRate 0.0023 Epoch: 16 Global Step: 210930 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:03,681-Speed 3018.60 samples/sec Loss 2.3466 LearningRate 0.0023 Epoch: 16 Global Step: 210940 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:07,090-Speed 3004.64 samples/sec Loss 2.2567 LearningRate 0.0023 Epoch: 16 Global Step: 210950 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:10,423-Speed 3073.45 samples/sec Loss 2.2783 LearningRate 0.0023 Epoch: 16 Global Step: 210960 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:13,890-Speed 2954.10 samples/sec Loss 2.3295 LearningRate 0.0023 Epoch: 16 Global Step: 210970 Fp16 Grad Scale: 16384 Required: 4 hours Training: 
2022-04-27 21:42:17,261-Speed 3038.61 samples/sec Loss 2.2386 LearningRate 0.0023 Epoch: 16 Global Step: 210980 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:20,585-Speed 3081.52 samples/sec Loss 2.2818 LearningRate 0.0023 Epoch: 16 Global Step: 210990 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:24,039-Speed 2965.45 samples/sec Loss 2.2272 LearningRate 0.0023 Epoch: 16 Global Step: 211000 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:42:27,417-Speed 3031.97 samples/sec Loss 2.3156 LearningRate 0.0023 Epoch: 16 Global Step: 211010 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:30,854-Speed 2979.96 samples/sec Loss 2.3141 LearningRate 0.0023 Epoch: 16 Global Step: 211020 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:34,212-Speed 3050.38 samples/sec Loss 2.2042 LearningRate 0.0023 Epoch: 16 Global Step: 211030 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:37,622-Speed 3004.17 samples/sec Loss 2.3451 LearningRate 0.0023 Epoch: 16 Global Step: 211040 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:40,966-Speed 3062.69 samples/sec Loss 2.3103 LearningRate 0.0023 Epoch: 16 Global Step: 211050 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:44,340-Speed 3036.11 samples/sec Loss 2.2138 LearningRate 0.0023 Epoch: 16 Global Step: 211060 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:47,742-Speed 3011.29 samples/sec Loss 2.2193 LearningRate 0.0023 Epoch: 16 Global Step: 211070 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:51,056-Speed 3090.94 samples/sec Loss 2.2910 LearningRate 0.0023 Epoch: 16 Global Step: 211080 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:54,408-Speed 3055.06 samples/sec Loss 2.3557 LearningRate 0.0023 Epoch: 16 Global Step: 211090 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:42:57,725-Speed 3089.20 
samples/sec Loss 2.2670 LearningRate 0.0023 Epoch: 16 Global Step: 211100 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:01,073-Speed 3058.72 samples/sec Loss 2.2669 LearningRate 0.0023 Epoch: 16 Global Step: 211110 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:04,448-Speed 3035.16 samples/sec Loss 2.2525 LearningRate 0.0023 Epoch: 16 Global Step: 211120 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:07,842-Speed 3017.73 samples/sec Loss 2.3168 LearningRate 0.0023 Epoch: 16 Global Step: 211130 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:11,165-Speed 3082.39 samples/sec Loss 2.2866 LearningRate 0.0023 Epoch: 16 Global Step: 211140 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:14,785-Speed 2829.24 samples/sec Loss 2.3285 LearningRate 0.0023 Epoch: 16 Global Step: 211150 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:45,436-Speed 334.10 samples/sec Loss 2.0658 LearningRate 0.0022 Epoch: 17 Global Step: 211160 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:48,973-Speed 2896.92 samples/sec Loss 1.4357 LearningRate 0.0022 Epoch: 17 Global Step: 211170 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:52,544-Speed 2868.51 samples/sec Loss 1.4037 LearningRate 0.0022 Epoch: 17 Global Step: 211180 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:55,931-Speed 3025.26 samples/sec Loss 1.4843 LearningRate 0.0022 Epoch: 17 Global Step: 211190 Fp16 Grad Scale: 16384 Required: 4 hours Training: 2022-04-27 21:43:59,309-Speed 3032.41 samples/sec Loss 1.3992 LearningRate 0.0022 Epoch: 17 Global Step: 211200 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:02,736-Speed 2988.74 samples/sec Loss 1.4171 LearningRate 0.0022 Epoch: 17 Global Step: 211210 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:06,242-Speed 2921.52 samples/sec Loss 1.4186 LearningRate 0.0022 
Epoch: 17 Global Step: 211220 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:09,689-Speed 2971.29 samples/sec Loss 1.3853 LearningRate 0.0022 Epoch: 17 Global Step: 211230 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:13,068-Speed 3032.15 samples/sec Loss 1.4291 LearningRate 0.0022 Epoch: 17 Global Step: 211240 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:16,393-Speed 3080.15 samples/sec Loss 1.4220 LearningRate 0.0022 Epoch: 17 Global Step: 211250 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:19,891-Speed 2928.69 samples/sec Loss 1.4336 LearningRate 0.0022 Epoch: 17 Global Step: 211260 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:23,231-Speed 3066.80 samples/sec Loss 1.3711 LearningRate 0.0022 Epoch: 17 Global Step: 211270 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:26,646-Speed 2998.71 samples/sec Loss 1.3598 LearningRate 0.0022 Epoch: 17 Global Step: 211280 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:30,023-Speed 3033.74 samples/sec Loss 1.3697 LearningRate 0.0022 Epoch: 17 Global Step: 211290 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:33,427-Speed 3009.33 samples/sec Loss 1.3671 LearningRate 0.0022 Epoch: 17 Global Step: 211300 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:44:36,776-Speed 3057.92 samples/sec Loss 1.4721 LearningRate 0.0022 Epoch: 17 Global Step: 211310 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:44:40,252-Speed 2947.22 samples/sec Loss 1.3738 LearningRate 0.0022 Epoch: 17 Global Step: 211320 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:44:43,624-Speed 3037.78 samples/sec Loss 1.4142 LearningRate 0.0022 Epoch: 17 Global Step: 211330 Fp16 Grad Scale: 65536 Required: 4 hours Training: 2022-04-27 21:44:47,138-Speed 2914.29 samples/sec Loss 1.4376 LearningRate 0.0022 Epoch: 17 Global Step: 211340 Fp16 Grad 
Scale: 65536 Required: 4 hours Training: 2022-04-27 21:44:50,571-Speed 2984.36 samples/sec Loss 1.3867 LearningRate 0.0022 Epoch: 17 Global Step: 211350 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:54,154-Speed 2858.32 samples/sec Loss 1.4066 LearningRate 0.0022 Epoch: 17 Global Step: 211360 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:44:57,660-Speed 2922.19 samples/sec Loss 1.4205 LearningRate 0.0022 Epoch: 17 Global Step: 211370 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:45:01,069-Speed 3004.09 samples/sec Loss 1.3549 LearningRate 0.0022 Epoch: 17 Global Step: 211380 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:45:04,550-Speed 2943.26 samples/sec Loss 1.4018 LearningRate 0.0022 Epoch: 17 Global Step: 211390 Fp16 Grad Scale: 32768 Required: 4 hours Training: 2022-04-27 21:45:07,962-Speed 3002.45 samples/sec Loss 1.4148 LearningRate 0.0022 Epoch: 17 Global Step: 211400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:11,386-Speed 2991.69 samples/sec Loss 1.3872 LearningRate 0.0022 Epoch: 17 Global Step: 211410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:14,805-Speed 2995.46 samples/sec Loss 1.4436 LearningRate 0.0022 Epoch: 17 Global Step: 211420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:18,236-Speed 2986.10 samples/sec Loss 1.4141 LearningRate 0.0022 Epoch: 17 Global Step: 211430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:21,710-Speed 2948.13 samples/sec Loss 1.4034 LearningRate 0.0022 Epoch: 17 Global Step: 211440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:25,105-Speed 3017.31 samples/sec Loss 1.4596 LearningRate 0.0022 Epoch: 17 Global Step: 211450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:45:28,482-Speed 3033.33 samples/sec Loss 1.4498 LearningRate 0.0022 Epoch: 17 Global Step: 211460 Fp16 Grad Scale: 65536 Required: 3 hours Training: 
2022-04-27 21:45:31,939-Speed 2963.23 samples/sec Loss 1.4937 LearningRate 0.0022 Epoch: 17 Global Step: 211470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:35,356-Speed 2998.22 samples/sec Loss 1.4180 LearningRate 0.0022 Epoch: 17 Global Step: 211480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:38,797-Speed 2976.07 samples/sec Loss 1.4269 LearningRate 0.0022 Epoch: 17 Global Step: 211490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:45:42,136-Speed 3067.79 samples/sec Loss 1.4403 LearningRate 0.0022 Epoch: 17 Global Step: 211500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:45:45,533-Speed 3015.98 samples/sec Loss 1.4266 LearningRate 0.0022 Epoch: 17 Global Step: 211510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:45:48,990-Speed 2962.89 samples/sec Loss 1.4040 LearningRate 0.0022 Epoch: 17 Global Step: 211520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:45:52,414-Speed 2990.65 samples/sec Loss 1.4824 LearningRate 0.0022 Epoch: 17 Global Step: 211530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:45:55,820-Speed 3007.94 samples/sec Loss 1.4414 LearningRate 0.0022 Epoch: 17 Global Step: 211540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:45:59,315-Speed 2930.90 samples/sec Loss 1.4788 LearningRate 0.0022 Epoch: 17 Global Step: 211550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:02,745-Speed 2985.54 samples/sec Loss 1.4205 LearningRate 0.0022 Epoch: 17 Global Step: 211560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:06,173-Speed 2988.65 samples/sec Loss 1.4223 LearningRate 0.0022 Epoch: 17 Global Step: 211570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:09,577-Speed 3008.66 samples/sec Loss 1.4177 LearningRate 0.0022 Epoch: 17 Global Step: 211580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:13,009-Speed 2984.31 
samples/sec Loss 1.4466 LearningRate 0.0022 Epoch: 17 Global Step: 211590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:16,446-Speed 2980.52 samples/sec Loss 1.3622 LearningRate 0.0022 Epoch: 17 Global Step: 211600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:19,906-Speed 2960.31 samples/sec Loss 1.4649 LearningRate 0.0022 Epoch: 17 Global Step: 211610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:23,281-Speed 3035.33 samples/sec Loss 1.4361 LearningRate 0.0022 Epoch: 17 Global Step: 211620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:26,744-Speed 2957.50 samples/sec Loss 1.4265 LearningRate 0.0022 Epoch: 17 Global Step: 211630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:30,182-Speed 2979.50 samples/sec Loss 1.4643 LearningRate 0.0022 Epoch: 17 Global Step: 211640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:33,583-Speed 3011.94 samples/sec Loss 1.4038 LearningRate 0.0022 Epoch: 17 Global Step: 211650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:36,966-Speed 3028.25 samples/sec Loss 1.4841 LearningRate 0.0022 Epoch: 17 Global Step: 211660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:40,348-Speed 3027.98 samples/sec Loss 1.4484 LearningRate 0.0022 Epoch: 17 Global Step: 211670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:43,745-Speed 3016.06 samples/sec Loss 1.4986 LearningRate 0.0022 Epoch: 17 Global Step: 211680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:46:47,143-Speed 3013.62 samples/sec Loss 1.4406 LearningRate 0.0022 Epoch: 17 Global Step: 211690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:50,531-Speed 3023.43 samples/sec Loss 1.4479 LearningRate 0.0022 Epoch: 17 Global Step: 211700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:53,989-Speed 2962.04 samples/sec Loss 1.4649 LearningRate 0.0022 
Epoch: 17 Global Step: 211710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:46:57,418-Speed 2987.29 samples/sec Loss 1.4524 LearningRate 0.0022 Epoch: 17 Global Step: 211720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:00,785-Speed 3042.61 samples/sec Loss 1.3912 LearningRate 0.0022 Epoch: 17 Global Step: 211730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:04,187-Speed 3011.09 samples/sec Loss 1.4792 LearningRate 0.0022 Epoch: 17 Global Step: 211740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:07,652-Speed 2955.54 samples/sec Loss 1.4871 LearningRate 0.0022 Epoch: 17 Global Step: 211750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:11,022-Speed 3039.82 samples/sec Loss 1.4564 LearningRate 0.0022 Epoch: 17 Global Step: 211760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:14,420-Speed 3014.08 samples/sec Loss 1.4311 LearningRate 0.0022 Epoch: 17 Global Step: 211770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:17,824-Speed 3009.83 samples/sec Loss 1.3999 LearningRate 0.0022 Epoch: 17 Global Step: 211780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:21,147-Speed 3082.22 samples/sec Loss 1.4012 LearningRate 0.0022 Epoch: 17 Global Step: 211790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:47:24,485-Speed 3069.17 samples/sec Loss 1.4676 LearningRate 0.0022 Epoch: 17 Global Step: 211800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:47:27,881-Speed 3015.18 samples/sec Loss 1.4008 LearningRate 0.0022 Epoch: 17 Global Step: 211810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:47:31,290-Speed 3005.35 samples/sec Loss 1.5171 LearningRate 0.0022 Epoch: 17 Global Step: 211820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:47:34,605-Speed 3089.78 samples/sec Loss 1.4681 LearningRate 0.0022 Epoch: 17 Global Step: 211830 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-27 21:47:38,027-Speed 2992.86 samples/sec Loss 1.4559 LearningRate 0.0022 Epoch: 17 Global Step: 211840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:41,421-Speed 3017.91 samples/sec Loss 1.4363 LearningRate 0.0022 Epoch: 17 Global Step: 211850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:44,834-Speed 3001.68 samples/sec Loss 1.4142 LearningRate 0.0022 Epoch: 17 Global Step: 211860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:48,270-Speed 2980.07 samples/sec Loss 1.4697 LearningRate 0.0022 Epoch: 17 Global Step: 211870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:51,681-Speed 3003.28 samples/sec Loss 1.4595 LearningRate 0.0022 Epoch: 17 Global Step: 211880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:55,062-Speed 3029.64 samples/sec Loss 1.4453 LearningRate 0.0022 Epoch: 17 Global Step: 211890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:47:58,487-Speed 2990.49 samples/sec Loss 1.4320 LearningRate 0.0022 Epoch: 17 Global Step: 211900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:01,926-Speed 2978.49 samples/sec Loss 1.5065 LearningRate 0.0022 Epoch: 17 Global Step: 211910 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:05,318-Speed 3020.47 samples/sec Loss 1.4193 LearningRate 0.0022 Epoch: 17 Global Step: 211920 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:08,695-Speed 3032.76 samples/sec Loss 1.4561 LearningRate 0.0022 Epoch: 17 Global Step: 211930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:12,079-Speed 3026.98 samples/sec Loss 1.4572 LearningRate 0.0022 Epoch: 17 Global Step: 211940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:15,463-Speed 3026.77 samples/sec Loss 1.4904 LearningRate 0.0022 Epoch: 17 Global Step: 211950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 
2022-04-27 21:48:18,897-Speed 2982.15 samples/sec Loss 1.4256 LearningRate 0.0022 Epoch: 17 Global Step: 211960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:22,246-Speed 3059.09 samples/sec Loss 1.4072 LearningRate 0.0022 Epoch: 17 Global Step: 211970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:25,660-Speed 3000.20 samples/sec Loss 1.4895 LearningRate 0.0022 Epoch: 17 Global Step: 211980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:29,112-Speed 2967.30 samples/sec Loss 1.3886 LearningRate 0.0022 Epoch: 17 Global Step: 211990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:32,561-Speed 2969.97 samples/sec Loss 1.4383 LearningRate 0.0021 Epoch: 17 Global Step: 212000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:35,936-Speed 3035.21 samples/sec Loss 1.4755 LearningRate 0.0021 Epoch: 17 Global Step: 212010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:39,318-Speed 3028.37 samples/sec Loss 1.4513 LearningRate 0.0021 Epoch: 17 Global Step: 212020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:42,695-Speed 3033.86 samples/sec Loss 1.4532 LearningRate 0.0021 Epoch: 17 Global Step: 212030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:48:46,067-Speed 3037.08 samples/sec Loss 1.4027 LearningRate 0.0021 Epoch: 17 Global Step: 212040 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:48:49,469-Speed 3010.94 samples/sec Loss 1.4738 LearningRate 0.0021 Epoch: 17 Global Step: 212050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:52,885-Speed 2998.57 samples/sec Loss 1.5014 LearningRate 0.0021 Epoch: 17 Global Step: 212060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:56,289-Speed 3008.45 samples/sec Loss 1.5075 LearningRate 0.0021 Epoch: 17 Global Step: 212070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:48:59,722-Speed 2984.43 
samples/sec Loss 1.4700 LearningRate 0.0021 Epoch: 17 Global Step: 212080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:03,090-Speed 3040.83 samples/sec Loss 1.4939 LearningRate 0.0021 Epoch: 17 Global Step: 212090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:06,498-Speed 3005.52 samples/sec Loss 1.5269 LearningRate 0.0021 Epoch: 17 Global Step: 212100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:09,916-Speed 2997.54 samples/sec Loss 1.4768 LearningRate 0.0021 Epoch: 17 Global Step: 212110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:13,366-Speed 2968.64 samples/sec Loss 1.4558 LearningRate 0.0021 Epoch: 17 Global Step: 212120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:16,702-Speed 3070.67 samples/sec Loss 1.4537 LearningRate 0.0021 Epoch: 17 Global Step: 212130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:20,177-Speed 2947.76 samples/sec Loss 1.4349 LearningRate 0.0021 Epoch: 17 Global Step: 212140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:49:23,620-Speed 2974.96 samples/sec Loss 1.4642 LearningRate 0.0021 Epoch: 17 Global Step: 212150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:26,975-Speed 3053.06 samples/sec Loss 1.5006 LearningRate 0.0021 Epoch: 17 Global Step: 212160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:30,375-Speed 3011.84 samples/sec Loss 1.4480 LearningRate 0.0021 Epoch: 17 Global Step: 212170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:33,734-Speed 3049.64 samples/sec Loss 1.5333 LearningRate 0.0021 Epoch: 17 Global Step: 212180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:37,055-Speed 3084.59 samples/sec Loss 1.4575 LearningRate 0.0021 Epoch: 17 Global Step: 212190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:40,427-Speed 3036.93 samples/sec Loss 1.5002 LearningRate 0.0021 
Epoch: 17 Global Step: 212200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:43,835-Speed 3005.88 samples/sec Loss 1.4352 LearningRate 0.0021 Epoch: 17 Global Step: 212210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:47,265-Speed 2986.70 samples/sec Loss 1.4275 LearningRate 0.0021 Epoch: 17 Global Step: 212220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:50,665-Speed 3012.30 samples/sec Loss 1.4590 LearningRate 0.0021 Epoch: 17 Global Step: 212230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:54,052-Speed 3024.61 samples/sec Loss 1.4451 LearningRate 0.0021 Epoch: 17 Global Step: 212240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:49:57,459-Speed 3005.69 samples/sec Loss 1.5011 LearningRate 0.0021 Epoch: 17 Global Step: 212250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:00,887-Speed 2988.13 samples/sec Loss 1.5125 LearningRate 0.0021 Epoch: 17 Global Step: 212260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:04,310-Speed 2992.58 samples/sec Loss 1.4270 LearningRate 0.0021 Epoch: 17 Global Step: 212270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:07,806-Speed 2929.71 samples/sec Loss 1.4941 LearningRate 0.0021 Epoch: 17 Global Step: 212280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:11,168-Speed 3046.71 samples/sec Loss 1.4666 LearningRate 0.0021 Epoch: 17 Global Step: 212290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:14,563-Speed 3016.75 samples/sec Loss 1.5231 LearningRate 0.0021 Epoch: 17 Global Step: 212300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:17,922-Speed 3049.60 samples/sec Loss 1.4714 LearningRate 0.0021 Epoch: 17 Global Step: 212310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:21,361-Speed 2978.89 samples/sec Loss 1.5415 LearningRate 0.0021 Epoch: 17 Global Step: 212320 Fp16 Grad 
Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:24,771-Speed 3003.70 samples/sec Loss 1.5150 LearningRate 0.0021 Epoch: 17 Global Step: 212330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:28,103-Speed 3073.58 samples/sec Loss 1.5055 LearningRate 0.0021 Epoch: 17 Global Step: 212340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:31,477-Speed 3036.53 samples/sec Loss 1.4809 LearningRate 0.0021 Epoch: 17 Global Step: 212350 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:50:34,856-Speed 3030.87 samples/sec Loss 1.5158 LearningRate 0.0021 Epoch: 17 Global Step: 212360 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:50:38,166-Speed 3094.60 samples/sec Loss 1.4991 LearningRate 0.0021 Epoch: 17 Global Step: 212370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:41,495-Speed 3076.92 samples/sec Loss 1.4581 LearningRate 0.0021 Epoch: 17 Global Step: 212380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:44,810-Speed 3090.32 samples/sec Loss 1.4998 LearningRate 0.0021 Epoch: 17 Global Step: 212390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:48,135-Speed 3079.95 samples/sec Loss 1.4400 LearningRate 0.0021 Epoch: 17 Global Step: 212400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:50:51,474-Speed 3068.23 samples/sec Loss 1.5176 LearningRate 0.0021 Epoch: 17 Global Step: 212410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:50:54,925-Speed 2967.71 samples/sec Loss 1.5154 LearningRate 0.0021 Epoch: 17 Global Step: 212420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:50:58,318-Speed 3018.75 samples/sec Loss 1.4878 LearningRate 0.0021 Epoch: 17 Global Step: 212430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:01,694-Speed 3034.05 samples/sec Loss 1.4988 LearningRate 0.0021 Epoch: 17 Global Step: 212440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 
2022-04-27 21:51:05,101-Speed 3007.03 samples/sec Loss 1.4409 LearningRate 0.0021 Epoch: 17 Global Step: 212450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:08,401-Speed 3104.55 samples/sec Loss 1.5465 LearningRate 0.0021 Epoch: 17 Global Step: 212460 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:11,726-Speed 3080.26 samples/sec Loss 1.4854 LearningRate 0.0021 Epoch: 17 Global Step: 212470 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:15,050-Speed 3082.03 samples/sec Loss 1.4749 LearningRate 0.0021 Epoch: 17 Global Step: 212480 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:18,352-Speed 3101.52 samples/sec Loss 1.4653 LearningRate 0.0021 Epoch: 17 Global Step: 212490 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:21,814-Speed 2958.27 samples/sec Loss 1.4878 LearningRate 0.0021 Epoch: 17 Global Step: 212500 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:25,164-Speed 3058.11 samples/sec Loss 1.5225 LearningRate 0.0021 Epoch: 17 Global Step: 212510 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:28,534-Speed 3039.85 samples/sec Loss 1.4643 LearningRate 0.0021 Epoch: 17 Global Step: 212520 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:31,904-Speed 3039.33 samples/sec Loss 1.4725 LearningRate 0.0021 Epoch: 17 Global Step: 212530 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:35,243-Speed 3068.94 samples/sec Loss 1.4634 LearningRate 0.0021 Epoch: 17 Global Step: 212540 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:38,651-Speed 3005.11 samples/sec Loss 1.4290 LearningRate 0.0021 Epoch: 17 Global Step: 212550 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:51:42,086-Speed 2982.62 samples/sec Loss 1.4888 LearningRate 0.0021 Epoch: 17 Global Step: 212560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:45,465-Speed 3031.09 samples/sec 
Loss 1.5369 LearningRate 0.0021 Epoch: 17 Global Step: 212570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:48,770-Speed 3098.84 samples/sec Loss 1.5682 LearningRate 0.0021 Epoch: 17 Global Step: 212580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:52,116-Speed 3062.14 samples/sec Loss 1.4777 LearningRate 0.0021 Epoch: 17 Global Step: 212590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:55,541-Speed 2990.35 samples/sec Loss 1.5523 LearningRate 0.0021 Epoch: 17 Global Step: 212600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:51:58,912-Speed 3038.61 samples/sec Loss 1.4996 LearningRate 0.0021 Epoch: 17 Global Step: 212610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:02,253-Speed 3065.88 samples/sec Loss 1.4696 LearningRate 0.0021 Epoch: 17 Global Step: 212620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:05,573-Speed 3085.13 samples/sec Loss 1.4878 LearningRate 0.0021 Epoch: 17 Global Step: 212630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:08,911-Speed 3068.60 samples/sec Loss 1.4646 LearningRate 0.0021 Epoch: 17 Global Step: 212640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:12,328-Speed 2997.70 samples/sec Loss 1.4818 LearningRate 0.0021 Epoch: 17 Global Step: 212650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:15,708-Speed 3030.89 samples/sec Loss 1.4609 LearningRate 0.0021 Epoch: 17 Global Step: 212660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:52:19,080-Speed 3037.54 samples/sec Loss 1.4871 LearningRate 0.0021 Epoch: 17 Global Step: 212670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:52:22,438-Speed 3050.13 samples/sec Loss 1.4793 LearningRate 0.0021 Epoch: 17 Global Step: 212680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:52:25,753-Speed 3089.55 samples/sec Loss 1.5168 LearningRate 0.0021 Epoch: 17 
Global Step: 212690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:29,166-Speed 3001.55 samples/sec Loss 1.5081 LearningRate 0.0021 Epoch: 17 Global Step: 212700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:52:32,460-Speed 3109.14 samples/sec Loss 1.4764 LearningRate 0.0021 Epoch: 17 Global Step: 212710 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:35,795-Speed 3072.03 samples/sec Loss 1.4470 LearningRate 0.0021 Epoch: 17 Global Step: 212720 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:39,125-Speed 3076.11 samples/sec Loss 1.5119 LearningRate 0.0021 Epoch: 17 Global Step: 212730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:42,448-Speed 3081.93 samples/sec Loss 1.5319 LearningRate 0.0021 Epoch: 17 Global Step: 212740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:45,868-Speed 2995.16 samples/sec Loss 1.5349 LearningRate 0.0021 Epoch: 17 Global Step: 212750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:49,234-Speed 3043.01 samples/sec Loss 1.4766 LearningRate 0.0021 Epoch: 17 Global Step: 212760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:52,733-Speed 2927.62 samples/sec Loss 1.5100 LearningRate 0.0021 Epoch: 17 Global Step: 212770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:56,157-Speed 2991.60 samples/sec Loss 1.5252 LearningRate 0.0021 Epoch: 17 Global Step: 212780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:52:59,506-Speed 3057.77 samples/sec Loss 1.4953 LearningRate 0.0021 Epoch: 17 Global Step: 212790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:53:02,883-Speed 3033.66 samples/sec Loss 1.5045 LearningRate 0.0021 Epoch: 17 Global Step: 212800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 21:53:06,307-Speed 2991.74 samples/sec Loss 1.4617 LearningRate 0.0021 Epoch: 17 Global Step: 212810 Fp16 Grad Scale: 16384 Required: 
3 hours Training: 2022-04-27 21:53:09,690-Speed 3027.94 samples/sec Loss 1.4739 LearningRate 0.0021 Epoch: 17 Global Step: 212820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:13,030-Speed 3066.87 samples/sec Loss 1.5419 LearningRate 0.0021 Epoch: 17 Global Step: 212830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:16,364-Speed 3071.71 samples/sec Loss 1.4787 LearningRate 0.0021 Epoch: 17 Global Step: 212840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:19,753-Speed 3022.53 samples/sec Loss 1.4997 LearningRate 0.0021 Epoch: 17 Global Step: 212850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:23,130-Speed 3032.98 samples/sec Loss 1.4752 LearningRate 0.0020 Epoch: 17 Global Step: 212860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:26,609-Speed 2944.14 samples/sec Loss 1.5173 LearningRate 0.0020 Epoch: 17 Global Step: 212870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:30,000-Speed 3020.61 samples/sec Loss 1.5202 LearningRate 0.0020 Epoch: 17 Global Step: 212880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:33,442-Speed 2975.82 samples/sec Loss 1.4798 LearningRate 0.0020 Epoch: 17 Global Step: 212890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:36,922-Speed 2943.35 samples/sec Loss 1.5517 LearningRate 0.0020 Epoch: 17 Global Step: 212900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:53:40,373-Speed 2968.41 samples/sec Loss 1.4630 LearningRate 0.0020 Epoch: 17 Global Step: 212910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:53:43,767-Speed 3017.75 samples/sec Loss 1.5229 LearningRate 0.0020 Epoch: 17 Global Step: 212920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:53:47,204-Speed 2981.21 samples/sec Loss 1.5084 LearningRate 0.0020 Epoch: 17 Global Step: 212930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
21:53:50,597-Speed 3018.67 samples/sec Loss 1.5273 LearningRate 0.0020 Epoch: 17 Global Step: 212940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:53:53,999-Speed 3010.79 samples/sec Loss 1.5392 LearningRate 0.0020 Epoch: 17 Global Step: 212950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:53:57,394-Speed 3016.88 samples/sec Loss 1.4854 LearningRate 0.0020 Epoch: 17 Global Step: 212960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:00,769-Speed 3035.56 samples/sec Loss 1.5363 LearningRate 0.0020 Epoch: 17 Global Step: 212970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:04,141-Speed 3038.04 samples/sec Loss 1.4679 LearningRate 0.0020 Epoch: 17 Global Step: 212980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:07,515-Speed 3035.09 samples/sec Loss 1.4843 LearningRate 0.0020 Epoch: 17 Global Step: 212990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:10,872-Speed 3051.42 samples/sec Loss 1.4783 LearningRate 0.0020 Epoch: 17 Global Step: 213000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:14,265-Speed 3019.01 samples/sec Loss 1.4785 LearningRate 0.0020 Epoch: 17 Global Step: 213010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:17,602-Speed 3069.16 samples/sec Loss 1.4886 LearningRate 0.0020 Epoch: 17 Global Step: 213020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:21,062-Speed 2960.65 samples/sec Loss 1.5404 LearningRate 0.0020 Epoch: 17 Global Step: 213030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:24,378-Speed 3088.96 samples/sec Loss 1.4904 LearningRate 0.0020 Epoch: 17 Global Step: 213040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:27,725-Speed 3059.95 samples/sec Loss 1.5415 LearningRate 0.0020 Epoch: 17 Global Step: 213050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:54:31,108-Speed 3028.55 samples/sec Loss 
1.5228 LearningRate 0.0020 Epoch: 17 Global Step: 213060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:34,557-Speed 2969.98 samples/sec Loss 1.5032 LearningRate 0.0020 Epoch: 17 Global Step: 213070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:38,127-Speed 2869.20 samples/sec Loss 1.5513 LearningRate 0.0020 Epoch: 17 Global Step: 213080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:41,597-Speed 2952.33 samples/sec Loss 1.5779 LearningRate 0.0020 Epoch: 17 Global Step: 213090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:45,019-Speed 2993.30 samples/sec Loss 1.4548 LearningRate 0.0020 Epoch: 17 Global Step: 213100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:48,372-Speed 3054.47 samples/sec Loss 1.5211 LearningRate 0.0020 Epoch: 17 Global Step: 213110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:51,763-Speed 3021.08 samples/sec Loss 1.4839 LearningRate 0.0020 Epoch: 17 Global Step: 213120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:55,151-Speed 3023.67 samples/sec Loss 1.5216 LearningRate 0.0020 Epoch: 17 Global Step: 213130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:54:58,489-Speed 3068.13 samples/sec Loss 1.4664 LearningRate 0.0020 Epoch: 17 Global Step: 213140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:55:01,934-Speed 2974.12 samples/sec Loss 1.5391 LearningRate 0.0020 Epoch: 17 Global Step: 213150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:55:05,332-Speed 3014.46 samples/sec Loss 1.5556 LearningRate 0.0020 Epoch: 17 Global Step: 213160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:55:08,704-Speed 3037.23 samples/sec Loss 1.4914 LearningRate 0.0020 Epoch: 17 Global Step: 213170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:55:12,067-Speed 3046.01 samples/sec Loss 1.5195 LearningRate 0.0020 Epoch: 17 Global 
Step: 213180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:15,488-Speed 2994.47 samples/sec Loss 1.5109 LearningRate 0.0020 Epoch: 17 Global Step: 213190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:18,941-Speed 2966.39 samples/sec Loss 1.5319 LearningRate 0.0020 Epoch: 17 Global Step: 213200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:22,314-Speed 3036.18 samples/sec Loss 1.5869 LearningRate 0.0020 Epoch: 17 Global Step: 213210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:25,734-Speed 2995.24 samples/sec Loss 1.5645 LearningRate 0.0020 Epoch: 17 Global Step: 213220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:29,198-Speed 2957.01 samples/sec Loss 1.5103 LearningRate 0.0020 Epoch: 17 Global Step: 213230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:32,654-Speed 2963.25 samples/sec Loss 1.5551 LearningRate 0.0020 Epoch: 17 Global Step: 213240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:36,030-Speed 3033.76 samples/sec Loss 1.4931 LearningRate 0.0020 Epoch: 17 Global Step: 213250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:39,438-Speed 3006.34 samples/sec Loss 1.5154 LearningRate 0.0020 Epoch: 17 Global Step: 213260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:42,881-Speed 2974.73 samples/sec Loss 1.4590 LearningRate 0.0020 Epoch: 17 Global Step: 213270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:46,268-Speed 3024.17 samples/sec Loss 1.4810 LearningRate 0.0020 Epoch: 17 Global Step: 213280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:55:49,720-Speed 2966.85 samples/sec Loss 1.5166 LearningRate 0.0020 Epoch: 17 Global Step: 213290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:55:53,080-Speed 3048.89 samples/sec Loss 1.5183 LearningRate 0.0020 Epoch: 17 Global Step: 213300 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 21:55:56,494-Speed 2999.55 samples/sec Loss 1.5072 LearningRate 0.0020 Epoch: 17 Global Step: 213310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:55:59,872-Speed 3032.02 samples/sec Loss 1.4911 LearningRate 0.0020 Epoch: 17 Global Step: 213320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:03,243-Speed 3044.00 samples/sec Loss 1.5708 LearningRate 0.0020 Epoch: 17 Global Step: 213330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:06,627-Speed 3026.12 samples/sec Loss 1.4920 LearningRate 0.0020 Epoch: 17 Global Step: 213340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:10,085-Speed 2962.55 samples/sec Loss 1.5183 LearningRate 0.0020 Epoch: 17 Global Step: 213350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:13,414-Speed 3076.10 samples/sec Loss 1.5356 LearningRate 0.0020 Epoch: 17 Global Step: 213360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:16,730-Speed 3088.69 samples/sec Loss 1.4869 LearningRate 0.0020 Epoch: 17 Global Step: 213370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:20,210-Speed 2944.57 samples/sec Loss 1.4801 LearningRate 0.0020 Epoch: 17 Global Step: 213380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:23,615-Speed 3007.51 samples/sec Loss 1.5364 LearningRate 0.0020 Epoch: 17 Global Step: 213390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:27,038-Speed 2991.95 samples/sec Loss 1.5679 LearningRate 0.0020 Epoch: 17 Global Step: 213400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:30,396-Speed 3050.33 samples/sec Loss 1.4618 LearningRate 0.0020 Epoch: 17 Global Step: 213410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:33,790-Speed 3019.17 samples/sec Loss 1.5374 LearningRate 0.0020 Epoch: 17 Global Step: 213420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
21:56:37,165-Speed 3034.09 samples/sec Loss 1.4998 LearningRate 0.0020 Epoch: 17 Global Step: 213430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:40,572-Speed 3006.77 samples/sec Loss 1.4937 LearningRate 0.0020 Epoch: 17 Global Step: 213440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:43,883-Speed 3094.01 samples/sec Loss 1.5271 LearningRate 0.0020 Epoch: 17 Global Step: 213450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:47,210-Speed 3078.08 samples/sec Loss 1.5214 LearningRate 0.0020 Epoch: 17 Global Step: 213460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:50,543-Speed 3073.20 samples/sec Loss 1.4684 LearningRate 0.0020 Epoch: 17 Global Step: 213470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:53,906-Speed 3046.08 samples/sec Loss 1.4925 LearningRate 0.0020 Epoch: 17 Global Step: 213480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:56:57,386-Speed 2943.35 samples/sec Loss 1.5551 LearningRate 0.0020 Epoch: 17 Global Step: 213490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:00,878-Speed 2933.15 samples/sec Loss 1.4772 LearningRate 0.0020 Epoch: 17 Global Step: 213500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:04,280-Speed 3010.59 samples/sec Loss 1.5432 LearningRate 0.0020 Epoch: 17 Global Step: 213510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:07,690-Speed 3004.04 samples/sec Loss 1.5526 LearningRate 0.0020 Epoch: 17 Global Step: 213520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:11,027-Speed 3069.29 samples/sec Loss 1.4798 LearningRate 0.0020 Epoch: 17 Global Step: 213530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:14,446-Speed 2995.55 samples/sec Loss 1.5624 LearningRate 0.0020 Epoch: 17 Global Step: 213540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:17,893-Speed 2972.31 samples/sec Loss 
1.5667 LearningRate 0.0020 Epoch: 17 Global Step: 213550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:21,322-Speed 2987.07 samples/sec Loss 1.5902 LearningRate 0.0020 Epoch: 17 Global Step: 213560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:24,786-Speed 2956.44 samples/sec Loss 1.4984 LearningRate 0.0020 Epoch: 17 Global Step: 213570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:28,107-Speed 3084.72 samples/sec Loss 1.4962 LearningRate 0.0020 Epoch: 17 Global Step: 213580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:31,430-Speed 3082.11 samples/sec Loss 1.5283 LearningRate 0.0020 Epoch: 17 Global Step: 213590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:34,842-Speed 3001.60 samples/sec Loss 1.5612 LearningRate 0.0020 Epoch: 17 Global Step: 213600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:38,275-Speed 2983.71 samples/sec Loss 1.5425 LearningRate 0.0020 Epoch: 17 Global Step: 213610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:41,648-Speed 3037.72 samples/sec Loss 1.5837 LearningRate 0.0020 Epoch: 17 Global Step: 213620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:45,035-Speed 3023.87 samples/sec Loss 1.5040 LearningRate 0.0020 Epoch: 17 Global Step: 213630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:48,378-Speed 3064.56 samples/sec Loss 1.5008 LearningRate 0.0020 Epoch: 17 Global Step: 213640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:51,766-Speed 3022.61 samples/sec Loss 1.5357 LearningRate 0.0020 Epoch: 17 Global Step: 213650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:57:55,127-Speed 3047.54 samples/sec Loss 1.5475 LearningRate 0.0020 Epoch: 17 Global Step: 213660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:57:58,483-Speed 3052.80 samples/sec Loss 1.5770 LearningRate 0.0020 Epoch: 17 Global 
Step: 213670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:02,001-Speed 2911.65 samples/sec Loss 1.5719 LearningRate 0.0020 Epoch: 17 Global Step: 213680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:05,393-Speed 3019.59 samples/sec Loss 1.5666 LearningRate 0.0020 Epoch: 17 Global Step: 213690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:08,771-Speed 3032.34 samples/sec Loss 1.5313 LearningRate 0.0020 Epoch: 17 Global Step: 213700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:12,129-Speed 3050.65 samples/sec Loss 1.4748 LearningRate 0.0020 Epoch: 17 Global Step: 213710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:15,529-Speed 3012.69 samples/sec Loss 1.5425 LearningRate 0.0020 Epoch: 17 Global Step: 213720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:18,936-Speed 3006.24 samples/sec Loss 1.5535 LearningRate 0.0020 Epoch: 17 Global Step: 213730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:22,367-Speed 2985.44 samples/sec Loss 1.5480 LearningRate 0.0019 Epoch: 17 Global Step: 213740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:25,749-Speed 3028.84 samples/sec Loss 1.5021 LearningRate 0.0019 Epoch: 17 Global Step: 213750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:29,086-Speed 3068.84 samples/sec Loss 1.5274 LearningRate 0.0019 Epoch: 17 Global Step: 213760 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:58:32,472-Speed 3025.79 samples/sec Loss 1.5762 LearningRate 0.0019 Epoch: 17 Global Step: 213770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:35,950-Speed 2944.68 samples/sec Loss 1.5217 LearningRate 0.0019 Epoch: 17 Global Step: 213780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:39,271-Speed 3084.01 samples/sec Loss 1.5184 LearningRate 0.0019 Epoch: 17 Global Step: 213790 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-27 21:58:42,612-Speed 3066.52 samples/sec Loss 1.4598 LearningRate 0.0019 Epoch: 17 Global Step: 213800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:45,931-Speed 3085.22 samples/sec Loss 1.5122 LearningRate 0.0019 Epoch: 17 Global Step: 213810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:49,301-Speed 3040.01 samples/sec Loss 1.5862 LearningRate 0.0019 Epoch: 17 Global Step: 213820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:52,667-Speed 3043.26 samples/sec Loss 1.5213 LearningRate 0.0019 Epoch: 17 Global Step: 213830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:56,102-Speed 2981.62 samples/sec Loss 1.5740 LearningRate 0.0019 Epoch: 17 Global Step: 213840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:58:59,543-Speed 2976.92 samples/sec Loss 1.5894 LearningRate 0.0019 Epoch: 17 Global Step: 213850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:02,989-Speed 2972.76 samples/sec Loss 1.5796 LearningRate 0.0019 Epoch: 17 Global Step: 213860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:06,355-Speed 3042.68 samples/sec Loss 1.5011 LearningRate 0.0019 Epoch: 17 Global Step: 213870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:59:09,740-Speed 3025.69 samples/sec Loss 1.5236 LearningRate 0.0019 Epoch: 17 Global Step: 213880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 21:59:13,079-Speed 3067.65 samples/sec Loss 1.5204 LearningRate 0.0019 Epoch: 17 Global Step: 213890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:16,407-Speed 3077.42 samples/sec Loss 1.5962 LearningRate 0.0019 Epoch: 17 Global Step: 213900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:19,775-Speed 3041.50 samples/sec Loss 1.5563 LearningRate 0.0019 Epoch: 17 Global Step: 213910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
21:59:23,140-Speed 3043.32 samples/sec Loss 1.6017 LearningRate 0.0019 Epoch: 17 Global Step: 213920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:26,481-Speed 3066.39 samples/sec Loss 1.5179 LearningRate 0.0019 Epoch: 17 Global Step: 213930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:29,873-Speed 3019.90 samples/sec Loss 1.5415 LearningRate 0.0019 Epoch: 17 Global Step: 213940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:33,279-Speed 3007.04 samples/sec Loss 1.5315 LearningRate 0.0019 Epoch: 17 Global Step: 213950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 21:59:36,619-Speed 3066.92 samples/sec Loss 1.5627 LearningRate 0.0019 Epoch: 17 Global Step: 213960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:59:39,998-Speed 3030.92 samples/sec Loss 1.5310 LearningRate 0.0019 Epoch: 17 Global Step: 213970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:59:43,380-Speed 3029.33 samples/sec Loss 1.5770 LearningRate 0.0019 Epoch: 17 Global Step: 213980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:59:46,821-Speed 2976.47 samples/sec Loss 1.5254 LearningRate 0.0019 Epoch: 17 Global Step: 213990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:59:50,252-Speed 2985.36 samples/sec Loss 1.5838 LearningRate 0.0019 Epoch: 17 Global Step: 214000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:59:53,636-Speed 3026.72 samples/sec Loss 1.5254 LearningRate 0.0019 Epoch: 17 Global Step: 214010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 21:59:56,964-Speed 3078.11 samples/sec Loss 1.5493 LearningRate 0.0019 Epoch: 17 Global Step: 214020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:00:00,297-Speed 3073.49 samples/sec Loss 1.5485 LearningRate 0.0019 Epoch: 17 Global Step: 214030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:00:03,700-Speed 3009.82 samples/sec Loss 
1.5323 LearningRate 0.0019 Epoch: 17 Global Step: 214040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:00:07,063-Speed 3045.81 samples/sec Loss 1.5361 LearningRate 0.0019 Epoch: 17 Global Step: 214050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:00:10,515-Speed 2967.00 samples/sec Loss 1.4974 LearningRate 0.0019 Epoch: 17 Global Step: 214060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:13,863-Speed 3060.75 samples/sec Loss 1.5197 LearningRate 0.0019 Epoch: 17 Global Step: 214070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:17,288-Speed 2990.84 samples/sec Loss 1.5451 LearningRate 0.0019 Epoch: 17 Global Step: 214080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:20,687-Speed 3013.15 samples/sec Loss 1.4919 LearningRate 0.0019 Epoch: 17 Global Step: 214090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:24,025-Speed 3069.09 samples/sec Loss 1.5445 LearningRate 0.0019 Epoch: 17 Global Step: 214100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:27,404-Speed 3031.01 samples/sec Loss 1.5803 LearningRate 0.0019 Epoch: 17 Global Step: 214110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:30,841-Speed 2980.56 samples/sec Loss 1.5519 LearningRate 0.0019 Epoch: 17 Global Step: 214120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:34,185-Speed 3063.32 samples/sec Loss 1.5654 LearningRate 0.0019 Epoch: 17 Global Step: 214130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:37,592-Speed 3006.27 samples/sec Loss 1.5553 LearningRate 0.0019 Epoch: 17 Global Step: 214140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:40,970-Speed 3031.82 samples/sec Loss 1.5291 LearningRate 0.0019 Epoch: 17 Global Step: 214150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:44,466-Speed 2929.95 samples/sec Loss 1.4982 LearningRate 0.0019 Epoch: 17 Global 
Step: 214160 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:00:47,803-Speed 3070.07 samples/sec Loss 1.6213 LearningRate 0.0019 Epoch: 17 Global Step: 214170 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:00:51,206-Speed 3009.72 samples/sec Loss 1.5029 LearningRate 0.0019 Epoch: 17 Global Step: 214180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:54,586-Speed 3030.96 samples/sec Loss 1.5370 LearningRate 0.0019 Epoch: 17 Global Step: 214190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:00:57,963-Speed 3033.01 samples/sec Loss 1.5259 LearningRate 0.0019 Epoch: 17 Global Step: 214200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:01,300-Speed 3069.27 samples/sec Loss 1.5453 LearningRate 0.0019 Epoch: 17 Global Step: 214210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:04,725-Speed 2991.02 samples/sec Loss 1.5381 LearningRate 0.0019 Epoch: 17 Global Step: 214220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:08,270-Speed 2888.83 samples/sec Loss 1.5545 LearningRate 0.0019 Epoch: 17 Global Step: 214230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:11,715-Speed 2974.28 samples/sec Loss 1.5413 LearningRate 0.0019 Epoch: 17 Global Step: 214240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:15,065-Speed 3056.88 samples/sec Loss 1.5301 LearningRate 0.0019 Epoch: 17 Global Step: 214250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:18,508-Speed 2975.61 samples/sec Loss 1.5322 LearningRate 0.0019 Epoch: 17 Global Step: 214260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:21,937-Speed 2986.55 samples/sec Loss 1.5673 LearningRate 0.0019 Epoch: 17 Global Step: 214270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:25,348-Speed 3003.29 samples/sec Loss 1.5407 LearningRate 0.0019 Epoch: 17 Global Step: 214280 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 22:01:28,714-Speed 3043.39 samples/sec Loss 1.5964 LearningRate 0.0019 Epoch: 17 Global Step: 214290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:32,088-Speed 3035.68 samples/sec Loss 1.5744 LearningRate 0.0019 Epoch: 17 Global Step: 214300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:35,524-Speed 2980.93 samples/sec Loss 1.5600 LearningRate 0.0019 Epoch: 17 Global Step: 214310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:39,027-Speed 2924.98 samples/sec Loss 1.5019 LearningRate 0.0019 Epoch: 17 Global Step: 214320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:42,481-Speed 2965.91 samples/sec Loss 1.5305 LearningRate 0.0019 Epoch: 17 Global Step: 214330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:45,918-Speed 2980.18 samples/sec Loss 1.5018 LearningRate 0.0019 Epoch: 17 Global Step: 214340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:01:49,336-Speed 2996.65 samples/sec Loss 1.6165 LearningRate 0.0019 Epoch: 17 Global Step: 214350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:52,767-Speed 2985.84 samples/sec Loss 1.4976 LearningRate 0.0019 Epoch: 17 Global Step: 214360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:56,229-Speed 2958.88 samples/sec Loss 1.5021 LearningRate 0.0019 Epoch: 17 Global Step: 214370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:01:59,684-Speed 2964.87 samples/sec Loss 1.5758 LearningRate 0.0019 Epoch: 17 Global Step: 214380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:03,073-Speed 3022.19 samples/sec Loss 1.5578 LearningRate 0.0019 Epoch: 17 Global Step: 214390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:06,478-Speed 3008.18 samples/sec Loss 1.5600 LearningRate 0.0019 Epoch: 17 Global Step: 214400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
22:02:09,851-Speed 3036.83 samples/sec Loss 1.5933 LearningRate 0.0019 Epoch: 17 Global Step: 214410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:13,318-Speed 2954.68 samples/sec Loss 1.6079 LearningRate 0.0019 Epoch: 17 Global Step: 214420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:16,730-Speed 3001.58 samples/sec Loss 1.5964 LearningRate 0.0019 Epoch: 17 Global Step: 214430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:20,088-Speed 3050.50 samples/sec Loss 1.5734 LearningRate 0.0019 Epoch: 17 Global Step: 214440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:23,489-Speed 3011.88 samples/sec Loss 1.4973 LearningRate 0.0019 Epoch: 17 Global Step: 214450 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:02:26,807-Speed 3086.95 samples/sec Loss 1.6253 LearningRate 0.0019 Epoch: 17 Global Step: 214460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:30,213-Speed 3007.43 samples/sec Loss 1.5964 LearningRate 0.0019 Epoch: 17 Global Step: 214470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:02:33,536-Speed 3081.80 samples/sec Loss 1.6045 LearningRate 0.0019 Epoch: 17 Global Step: 214480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:36,893-Speed 3051.88 samples/sec Loss 1.5464 LearningRate 0.0019 Epoch: 17 Global Step: 214490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:40,315-Speed 2992.71 samples/sec Loss 1.5826 LearningRate 0.0019 Epoch: 17 Global Step: 214500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:43,638-Speed 3083.31 samples/sec Loss 1.4854 LearningRate 0.0019 Epoch: 17 Global Step: 214510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:47,057-Speed 2995.05 samples/sec Loss 1.5245 LearningRate 0.0019 Epoch: 17 Global Step: 214520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:50,413-Speed 3052.98 samples/sec Loss 
1.5752 LearningRate 0.0019 Epoch: 17 Global Step: 214530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:53,793-Speed 3030.27 samples/sec Loss 1.6013 LearningRate 0.0019 Epoch: 17 Global Step: 214540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:02:57,155-Speed 3046.21 samples/sec Loss 1.5624 LearningRate 0.0019 Epoch: 17 Global Step: 214550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:00,534-Speed 3032.30 samples/sec Loss 1.5354 LearningRate 0.0019 Epoch: 17 Global Step: 214560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:03,912-Speed 3031.82 samples/sec Loss 1.6318 LearningRate 0.0019 Epoch: 17 Global Step: 214570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:07,304-Speed 3019.99 samples/sec Loss 1.5822 LearningRate 0.0019 Epoch: 17 Global Step: 214580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:10,725-Speed 2993.94 samples/sec Loss 1.5365 LearningRate 0.0019 Epoch: 17 Global Step: 214590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:14,069-Speed 3062.68 samples/sec Loss 1.5827 LearningRate 0.0019 Epoch: 17 Global Step: 214600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:17,401-Speed 3074.59 samples/sec Loss 1.5818 LearningRate 0.0019 Epoch: 17 Global Step: 214610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:20,838-Speed 2980.23 samples/sec Loss 1.5579 LearningRate 0.0019 Epoch: 17 Global Step: 214620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:24,278-Speed 2977.68 samples/sec Loss 1.5353 LearningRate 0.0019 Epoch: 17 Global Step: 214630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:27,698-Speed 2994.66 samples/sec Loss 1.5760 LearningRate 0.0018 Epoch: 17 Global Step: 214640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:31,066-Speed 3041.38 samples/sec Loss 1.5600 LearningRate 0.0018 Epoch: 17 Global Step: 
214650 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:34,463-Speed 3015.37 samples/sec Loss 1.5456 LearningRate 0.0018 Epoch: 17 Global Step: 214660 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:37,859-Speed 3016.49 samples/sec Loss 1.5534 LearningRate 0.0018 Epoch: 17 Global Step: 214670 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:03:41,214-Speed 3053.61 samples/sec Loss 1.5196 LearningRate 0.0018 Epoch: 17 Global Step: 214680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:44,625-Speed 3002.49 samples/sec Loss 1.5958 LearningRate 0.0018 Epoch: 17 Global Step: 214690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:48,018-Speed 3019.17 samples/sec Loss 1.5255 LearningRate 0.0018 Epoch: 17 Global Step: 214700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:51,415-Speed 3015.32 samples/sec Loss 1.5757 LearningRate 0.0018 Epoch: 17 Global Step: 214710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:54,832-Speed 2997.15 samples/sec Loss 1.5971 LearningRate 0.0018 Epoch: 17 Global Step: 214720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:03:58,180-Speed 3059.59 samples/sec Loss 1.5790 LearningRate 0.0018 Epoch: 17 Global Step: 214730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:01,568-Speed 3023.52 samples/sec Loss 1.5409 LearningRate 0.0018 Epoch: 17 Global Step: 214740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:04,969-Speed 3011.79 samples/sec Loss 1.5441 LearningRate 0.0018 Epoch: 17 Global Step: 214750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:08,357-Speed 3022.79 samples/sec Loss 1.5295 LearningRate 0.0018 Epoch: 17 Global Step: 214760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:11,731-Speed 3035.94 samples/sec Loss 1.5497 LearningRate 0.0018 Epoch: 17 Global Step: 214770 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 22:04:15,074-Speed 3064.60 samples/sec Loss 1.5905 LearningRate 0.0018 Epoch: 17 Global Step: 214780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:04:18,491-Speed 2997.06 samples/sec Loss 1.5920 LearningRate 0.0018 Epoch: 17 Global Step: 214790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:04:21,877-Speed 3025.80 samples/sec Loss 1.5908 LearningRate 0.0018 Epoch: 17 Global Step: 214800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:04:25,249-Speed 3037.00 samples/sec Loss 1.5499 LearningRate 0.0018 Epoch: 17 Global Step: 214810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:28,684-Speed 2982.60 samples/sec Loss 1.5334 LearningRate 0.0018 Epoch: 17 Global Step: 214820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:32,184-Speed 2926.64 samples/sec Loss 1.5873 LearningRate 0.0018 Epoch: 17 Global Step: 214830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:35,511-Speed 3078.13 samples/sec Loss 1.5669 LearningRate 0.0018 Epoch: 17 Global Step: 214840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:38,925-Speed 2999.87 samples/sec Loss 1.5920 LearningRate 0.0018 Epoch: 17 Global Step: 214850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:42,345-Speed 2995.88 samples/sec Loss 1.5402 LearningRate 0.0018 Epoch: 17 Global Step: 214860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:45,819-Speed 2948.16 samples/sec Loss 1.5570 LearningRate 0.0018 Epoch: 17 Global Step: 214870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:49,291-Speed 2949.92 samples/sec Loss 1.5856 LearningRate 0.0018 Epoch: 17 Global Step: 214880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:52,698-Speed 3006.38 samples/sec Loss 1.6055 LearningRate 0.0018 Epoch: 17 Global Step: 214890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
22:04:56,114-Speed 2998.75 samples/sec Loss 1.5465 LearningRate 0.0018 Epoch: 17 Global Step: 214900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:04:59,513-Speed 3014.13 samples/sec Loss 1.5652 LearningRate 0.0018 Epoch: 17 Global Step: 214910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:02,883-Speed 3039.25 samples/sec Loss 1.5849 LearningRate 0.0018 Epoch: 17 Global Step: 214920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:06,204-Speed 3083.55 samples/sec Loss 1.5966 LearningRate 0.0018 Epoch: 17 Global Step: 214930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:09,574-Speed 3038.92 samples/sec Loss 1.5188 LearningRate 0.0018 Epoch: 17 Global Step: 214940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:13,075-Speed 2926.47 samples/sec Loss 1.5460 LearningRate 0.0018 Epoch: 17 Global Step: 214950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:16,407-Speed 3073.89 samples/sec Loss 1.5719 LearningRate 0.0018 Epoch: 17 Global Step: 214960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:19,796-Speed 3022.24 samples/sec Loss 1.5493 LearningRate 0.0018 Epoch: 17 Global Step: 214970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:23,126-Speed 3075.89 samples/sec Loss 1.5827 LearningRate 0.0018 Epoch: 17 Global Step: 214980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:26,569-Speed 2974.71 samples/sec Loss 1.6277 LearningRate 0.0018 Epoch: 17 Global Step: 214990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:29,965-Speed 3017.11 samples/sec Loss 1.5452 LearningRate 0.0018 Epoch: 17 Global Step: 215000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:33,436-Speed 2950.94 samples/sec Loss 1.5934 LearningRate 0.0018 Epoch: 17 Global Step: 215010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:36,830-Speed 3018.06 samples/sec Loss 
1.5605 LearningRate 0.0018 Epoch: 17 Global Step: 215020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:05:40,184-Speed 3053.55 samples/sec Loss 1.6513 LearningRate 0.0018 Epoch: 17 Global Step: 215030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:43,521-Speed 3069.80 samples/sec Loss 1.5698 LearningRate 0.0018 Epoch: 17 Global Step: 215040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:46,842-Speed 3083.84 samples/sec Loss 1.5849 LearningRate 0.0018 Epoch: 17 Global Step: 215050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:50,204-Speed 3046.60 samples/sec Loss 1.5911 LearningRate 0.0018 Epoch: 17 Global Step: 215060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:53,551-Speed 3061.18 samples/sec Loss 1.6195 LearningRate 0.0018 Epoch: 17 Global Step: 215070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:05:56,893-Speed 3064.33 samples/sec Loss 1.5877 LearningRate 0.0018 Epoch: 17 Global Step: 215080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:00,283-Speed 3022.08 samples/sec Loss 1.5752 LearningRate 0.0018 Epoch: 17 Global Step: 215090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:03,686-Speed 3010.33 samples/sec Loss 1.5743 LearningRate 0.0018 Epoch: 17 Global Step: 215100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:07,045-Speed 3048.67 samples/sec Loss 1.5475 LearningRate 0.0018 Epoch: 17 Global Step: 215110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:10,431-Speed 3025.83 samples/sec Loss 1.5551 LearningRate 0.0018 Epoch: 17 Global Step: 215120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:13,846-Speed 2998.89 samples/sec Loss 1.5527 LearningRate 0.0018 Epoch: 17 Global Step: 215130 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:06:17,198-Speed 3055.48 samples/sec Loss 1.5608 LearningRate 0.0018 Epoch: 17 Global 
Step: 215140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:20,591-Speed 3019.69 samples/sec Loss 1.5830 LearningRate 0.0018 Epoch: 17 Global Step: 215150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:23,968-Speed 3032.99 samples/sec Loss 1.5652 LearningRate 0.0018 Epoch: 17 Global Step: 215160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:06:27,333-Speed 3043.99 samples/sec Loss 1.5599 LearningRate 0.0018 Epoch: 17 Global Step: 215170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:30,689-Speed 3052.13 samples/sec Loss 1.6053 LearningRate 0.0018 Epoch: 17 Global Step: 215180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:34,085-Speed 3016.26 samples/sec Loss 1.5722 LearningRate 0.0018 Epoch: 17 Global Step: 215190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:37,437-Speed 3056.54 samples/sec Loss 1.6236 LearningRate 0.0018 Epoch: 17 Global Step: 215200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:40,866-Speed 2986.92 samples/sec Loss 1.5815 LearningRate 0.0018 Epoch: 17 Global Step: 215210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:44,215-Speed 3058.24 samples/sec Loss 1.6019 LearningRate 0.0018 Epoch: 17 Global Step: 215220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:47,560-Speed 3062.07 samples/sec Loss 1.5873 LearningRate 0.0018 Epoch: 17 Global Step: 215230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:50,910-Speed 3057.62 samples/sec Loss 1.5624 LearningRate 0.0018 Epoch: 17 Global Step: 215240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:54,307-Speed 3015.78 samples/sec Loss 1.5866 LearningRate 0.0018 Epoch: 17 Global Step: 215250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:06:57,705-Speed 3014.70 samples/sec Loss 1.5784 LearningRate 0.0018 Epoch: 17 Global Step: 215260 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 22:07:01,107-Speed 3010.75 samples/sec Loss 1.5600 LearningRate 0.0018 Epoch: 17 Global Step: 215270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:04,504-Speed 3015.77 samples/sec Loss 1.6165 LearningRate 0.0018 Epoch: 17 Global Step: 215280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:07,889-Speed 3025.21 samples/sec Loss 1.5796 LearningRate 0.0018 Epoch: 17 Global Step: 215290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:11,237-Speed 3059.62 samples/sec Loss 1.6462 LearningRate 0.0018 Epoch: 17 Global Step: 215300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:14,618-Speed 3029.75 samples/sec Loss 1.5630 LearningRate 0.0018 Epoch: 17 Global Step: 215310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:17,983-Speed 3043.66 samples/sec Loss 1.5724 LearningRate 0.0018 Epoch: 17 Global Step: 215320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:21,373-Speed 3021.64 samples/sec Loss 1.5361 LearningRate 0.0018 Epoch: 17 Global Step: 215330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:24,828-Speed 2964.39 samples/sec Loss 1.5627 LearningRate 0.0018 Epoch: 17 Global Step: 215340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:28,246-Speed 2996.89 samples/sec Loss 1.5771 LearningRate 0.0018 Epoch: 17 Global Step: 215350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:31,631-Speed 3026.31 samples/sec Loss 1.6320 LearningRate 0.0018 Epoch: 17 Global Step: 215360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:35,037-Speed 3007.53 samples/sec Loss 1.5662 LearningRate 0.0018 Epoch: 17 Global Step: 215370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:07:38,478-Speed 2977.19 samples/sec Loss 1.5940 LearningRate 0.0018 Epoch: 17 Global Step: 215380 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 
22:07:41,820-Speed 3064.75 samples/sec Loss 1.5655 LearningRate 0.0018 Epoch: 17 Global Step: 215390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:45,192-Speed 3036.70 samples/sec Loss 1.6267 LearningRate 0.0018 Epoch: 17 Global Step: 215400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:48,524-Speed 3074.75 samples/sec Loss 1.6303 LearningRate 0.0018 Epoch: 17 Global Step: 215410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:51,905-Speed 3029.87 samples/sec Loss 1.5818 LearningRate 0.0018 Epoch: 17 Global Step: 215420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:55,232-Speed 3077.88 samples/sec Loss 1.5974 LearningRate 0.0018 Epoch: 17 Global Step: 215430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:07:58,597-Speed 3043.90 samples/sec Loss 1.6178 LearningRate 0.0018 Epoch: 17 Global Step: 215440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:01,945-Speed 3059.63 samples/sec Loss 1.5480 LearningRate 0.0018 Epoch: 17 Global Step: 215450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:05,319-Speed 3035.85 samples/sec Loss 1.5654 LearningRate 0.0018 Epoch: 17 Global Step: 215460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:08,709-Speed 3021.83 samples/sec Loss 1.5172 LearningRate 0.0018 Epoch: 17 Global Step: 215470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:12,063-Speed 3053.93 samples/sec Loss 1.5595 LearningRate 0.0018 Epoch: 17 Global Step: 215480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:15,389-Speed 3079.87 samples/sec Loss 1.6328 LearningRate 0.0018 Epoch: 17 Global Step: 215490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:18,808-Speed 2995.44 samples/sec Loss 1.5624 LearningRate 0.0018 Epoch: 17 Global Step: 215500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:22,194-Speed 3025.53 samples/sec Loss 
1.6307 LearningRate 0.0018 Epoch: 17 Global Step: 215510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:25,562-Speed 3041.49 samples/sec Loss 1.6286 LearningRate 0.0018 Epoch: 17 Global Step: 215520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:28,944-Speed 3029.14 samples/sec Loss 1.6128 LearningRate 0.0018 Epoch: 17 Global Step: 215530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:32,324-Speed 3030.07 samples/sec Loss 1.5438 LearningRate 0.0018 Epoch: 17 Global Step: 215540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:35,721-Speed 3015.67 samples/sec Loss 1.6331 LearningRate 0.0018 Epoch: 17 Global Step: 215550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:39,140-Speed 2996.28 samples/sec Loss 1.6108 LearningRate 0.0017 Epoch: 17 Global Step: 215560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:42,518-Speed 3032.04 samples/sec Loss 1.6102 LearningRate 0.0017 Epoch: 17 Global Step: 215570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:45,894-Speed 3034.44 samples/sec Loss 1.5919 LearningRate 0.0017 Epoch: 17 Global Step: 215580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:49,313-Speed 2995.90 samples/sec Loss 1.5541 LearningRate 0.0017 Epoch: 17 Global Step: 215590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:08:52,622-Speed 3095.19 samples/sec Loss 1.6092 LearningRate 0.0017 Epoch: 17 Global Step: 215600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:55,964-Speed 3064.82 samples/sec Loss 1.5904 LearningRate 0.0017 Epoch: 17 Global Step: 215610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:08:59,347-Speed 3027.42 samples/sec Loss 1.5930 LearningRate 0.0017 Epoch: 17 Global Step: 215620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:09:02,817-Speed 2952.45 samples/sec Loss 1.5493 LearningRate 0.0017 Epoch: 17 Global 
Step: 215630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:09:06,164-Speed 3059.92 samples/sec Loss 1.6542 LearningRate 0.0017 Epoch: 17 Global Step: 215640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:09,540-Speed 3033.78 samples/sec Loss 1.6150 LearningRate 0.0017 Epoch: 17 Global Step: 215650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:12,927-Speed 3024.43 samples/sec Loss 1.5989 LearningRate 0.0017 Epoch: 17 Global Step: 215660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:16,377-Speed 2969.02 samples/sec Loss 1.6362 LearningRate 0.0017 Epoch: 17 Global Step: 215670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:19,758-Speed 3029.69 samples/sec Loss 1.5957 LearningRate 0.0017 Epoch: 17 Global Step: 215680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:23,103-Speed 3062.11 samples/sec Loss 1.5252 LearningRate 0.0017 Epoch: 17 Global Step: 215690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:26,560-Speed 2962.57 samples/sec Loss 1.6035 LearningRate 0.0017 Epoch: 17 Global Step: 215700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:29,962-Speed 3011.71 samples/sec Loss 1.6295 LearningRate 0.0017 Epoch: 17 Global Step: 215710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:33,370-Speed 3005.04 samples/sec Loss 1.6366 LearningRate 0.0017 Epoch: 17 Global Step: 215720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:36,789-Speed 2996.51 samples/sec Loss 1.5946 LearningRate 0.0017 Epoch: 17 Global Step: 215730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:40,201-Speed 3001.60 samples/sec Loss 1.6363 LearningRate 0.0017 Epoch: 17 Global Step: 215740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:09:43,558-Speed 3051.81 samples/sec Loss 1.6439 LearningRate 0.0017 Epoch: 17 Global Step: 215750 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 22:09:46,969-Speed 3002.43 samples/sec Loss 1.5924 LearningRate 0.0017 Epoch: 17 Global Step: 215760 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:50,413-Speed 2974.16 samples/sec Loss 1.5381 LearningRate 0.0017 Epoch: 17 Global Step: 215770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:53,838-Speed 2991.37 samples/sec Loss 1.5454 LearningRate 0.0017 Epoch: 17 Global Step: 215780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:09:57,211-Speed 3035.87 samples/sec Loss 1.6042 LearningRate 0.0017 Epoch: 17 Global Step: 215790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:10:00,610-Speed 3014.20 samples/sec Loss 1.6143 LearningRate 0.0017 Epoch: 17 Global Step: 215800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:10:04,017-Speed 3005.86 samples/sec Loss 1.6495 LearningRate 0.0017 Epoch: 17 Global Step: 215810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:10:07,375-Speed 3050.57 samples/sec Loss 1.6093 LearningRate 0.0017 Epoch: 17 Global Step: 215820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:10:10,838-Speed 2958.11 samples/sec Loss 1.5300 LearningRate 0.0017 Epoch: 17 Global Step: 215830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:10:14,232-Speed 3017.74 samples/sec Loss 1.6082 LearningRate 0.0017 Epoch: 17 Global Step: 215840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:10:17,669-Speed 2979.82 samples/sec Loss 1.5530 LearningRate 0.0017 Epoch: 17 Global Step: 215850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:21,066-Speed 3015.81 samples/sec Loss 1.5977 LearningRate 0.0017 Epoch: 17 Global Step: 215860 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:24,429-Speed 3046.16 samples/sec Loss 1.5574 LearningRate 0.0017 Epoch: 17 Global Step: 215870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
22:10:27,822-Speed 3017.96 samples/sec Loss 1.5650 LearningRate 0.0017 Epoch: 17 Global Step: 215880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:31,258-Speed 2981.32 samples/sec Loss 1.6064 LearningRate 0.0017 Epoch: 17 Global Step: 215890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:34,643-Speed 3026.37 samples/sec Loss 1.6150 LearningRate 0.0017 Epoch: 17 Global Step: 215900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:37,977-Speed 3072.30 samples/sec Loss 1.6307 LearningRate 0.0017 Epoch: 17 Global Step: 215910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:41,307-Speed 3076.50 samples/sec Loss 1.5440 LearningRate 0.0017 Epoch: 17 Global Step: 215920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:44,691-Speed 3026.59 samples/sec Loss 1.5536 LearningRate 0.0017 Epoch: 17 Global Step: 215930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:48,089-Speed 3013.64 samples/sec Loss 1.5992 LearningRate 0.0017 Epoch: 17 Global Step: 215940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:51,519-Speed 2986.33 samples/sec Loss 1.5631 LearningRate 0.0017 Epoch: 17 Global Step: 215950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:54,897-Speed 3032.59 samples/sec Loss 1.6319 LearningRate 0.0017 Epoch: 17 Global Step: 215960 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:10:58,252-Speed 3053.11 samples/sec Loss 1.6076 LearningRate 0.0017 Epoch: 17 Global Step: 215970 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:01,662-Speed 3003.65 samples/sec Loss 1.5860 LearningRate 0.0017 Epoch: 17 Global Step: 215980 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:04,991-Speed 3077.03 samples/sec Loss 1.5836 LearningRate 0.0017 Epoch: 17 Global Step: 215990 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:08,397-Speed 3007.97 samples/sec Loss 
1.6335 LearningRate 0.0017 Epoch: 17 Global Step: 216000 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:11,893-Speed 2929.37 samples/sec Loss 1.6211 LearningRate 0.0017 Epoch: 17 Global Step: 216010 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:15,339-Speed 2972.10 samples/sec Loss 1.5669 LearningRate 0.0017 Epoch: 17 Global Step: 216020 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:18,803-Speed 2957.20 samples/sec Loss 1.5949 LearningRate 0.0017 Epoch: 17 Global Step: 216030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:22,197-Speed 3018.62 samples/sec Loss 1.6447 LearningRate 0.0017 Epoch: 17 Global Step: 216040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:25,572-Speed 3034.99 samples/sec Loss 1.6042 LearningRate 0.0017 Epoch: 17 Global Step: 216050 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:11:28,904-Speed 3073.95 samples/sec Loss 1.6018 LearningRate 0.0017 Epoch: 17 Global Step: 216060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:32,276-Speed 3037.91 samples/sec Loss 1.5561 LearningRate 0.0017 Epoch: 17 Global Step: 216070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:35,603-Speed 3077.55 samples/sec Loss 1.5915 LearningRate 0.0017 Epoch: 17 Global Step: 216080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:39,032-Speed 2987.45 samples/sec Loss 1.5732 LearningRate 0.0017 Epoch: 17 Global Step: 216090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:42,404-Speed 3037.75 samples/sec Loss 1.5279 LearningRate 0.0017 Epoch: 17 Global Step: 216100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:11:45,742-Speed 3068.37 samples/sec Loss 1.5305 LearningRate 0.0017 Epoch: 17 Global Step: 216110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:11:49,099-Speed 3051.35 samples/sec Loss 1.6463 LearningRate 0.0017 Epoch: 17 Global 
Step: 216120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:11:52,456-Speed 3051.35 samples/sec Loss 1.6306 LearningRate 0.0017 Epoch: 17 Global Step: 216130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:11:55,793-Speed 3069.52 samples/sec Loss 1.5845 LearningRate 0.0017 Epoch: 17 Global Step: 216140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:11:59,148-Speed 3053.37 samples/sec Loss 1.6086 LearningRate 0.0017 Epoch: 17 Global Step: 216150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:12:02,552-Speed 3009.15 samples/sec Loss 1.5527 LearningRate 0.0017 Epoch: 17 Global Step: 216160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:12:05,980-Speed 2987.99 samples/sec Loss 1.6142 LearningRate 0.0017 Epoch: 17 Global Step: 216170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:12:09,354-Speed 3036.11 samples/sec Loss 1.6266 LearningRate 0.0017 Epoch: 17 Global Step: 216180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:12:12,748-Speed 3017.86 samples/sec Loss 1.6085 LearningRate 0.0017 Epoch: 17 Global Step: 216190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:12:16,240-Speed 2933.17 samples/sec Loss 1.5929 LearningRate 0.0017 Epoch: 17 Global Step: 216200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:12:19,630-Speed 3021.89 samples/sec Loss 1.5691 LearningRate 0.0017 Epoch: 17 Global Step: 216210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:23,033-Speed 3009.55 samples/sec Loss 1.5562 LearningRate 0.0017 Epoch: 17 Global Step: 216220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:26,375-Speed 3065.04 samples/sec Loss 1.6029 LearningRate 0.0017 Epoch: 17 Global Step: 216230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:29,729-Speed 3053.70 samples/sec Loss 1.6364 LearningRate 0.0017 Epoch: 17 Global Step: 216240 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-27 22:12:33,120-Speed 3020.94 samples/sec Loss 1.6006 LearningRate 0.0017 Epoch: 17 Global Step: 216250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:36,484-Speed 3045.27 samples/sec Loss 1.5755 LearningRate 0.0017 Epoch: 17 Global Step: 216260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:39,848-Speed 3044.79 samples/sec Loss 1.6318 LearningRate 0.0017 Epoch: 17 Global Step: 216270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:43,239-Speed 3020.68 samples/sec Loss 1.5677 LearningRate 0.0017 Epoch: 17 Global Step: 216280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:46,641-Speed 3010.59 samples/sec Loss 1.5346 LearningRate 0.0017 Epoch: 17 Global Step: 216290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:50,008-Speed 3042.14 samples/sec Loss 1.5993 LearningRate 0.0017 Epoch: 17 Global Step: 216300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:53,406-Speed 3014.79 samples/sec Loss 1.5969 LearningRate 0.0017 Epoch: 17 Global Step: 216310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:12:56,801-Speed 3016.80 samples/sec Loss 1.5946 LearningRate 0.0017 Epoch: 17 Global Step: 216320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:00,138-Speed 3069.82 samples/sec Loss 1.6620 LearningRate 0.0017 Epoch: 17 Global Step: 216330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:03,533-Speed 3018.03 samples/sec Loss 1.6509 LearningRate 0.0017 Epoch: 17 Global Step: 216340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:06,936-Speed 3008.97 samples/sec Loss 1.5515 LearningRate 0.0017 Epoch: 17 Global Step: 216350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:10,414-Speed 2946.14 samples/sec Loss 1.5517 LearningRate 0.0017 Epoch: 17 Global Step: 216360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
22:13:13,822-Speed 3005.30 samples/sec Loss 1.5444 LearningRate 0.0017 Epoch: 17 Global Step: 216370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:17,240-Speed 2996.39 samples/sec Loss 1.5965 LearningRate 0.0017 Epoch: 17 Global Step: 216380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:20,633-Speed 3020.86 samples/sec Loss 1.5847 LearningRate 0.0017 Epoch: 17 Global Step: 216390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:24,157-Speed 2906.34 samples/sec Loss 1.6304 LearningRate 0.0017 Epoch: 17 Global Step: 216400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:27,611-Speed 2965.47 samples/sec Loss 1.6069 LearningRate 0.0017 Epoch: 17 Global Step: 216410 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:13:31,107-Speed 2929.37 samples/sec Loss 1.6331 LearningRate 0.0017 Epoch: 17 Global Step: 216420 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:13:34,513-Speed 3008.04 samples/sec Loss 1.5990 LearningRate 0.0017 Epoch: 17 Global Step: 216430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:37,957-Speed 2973.78 samples/sec Loss 1.6480 LearningRate 0.0017 Epoch: 17 Global Step: 216440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:41,361-Speed 3009.43 samples/sec Loss 1.5868 LearningRate 0.0017 Epoch: 17 Global Step: 216450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:44,816-Speed 2964.57 samples/sec Loss 1.5553 LearningRate 0.0017 Epoch: 17 Global Step: 216460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:48,221-Speed 3007.97 samples/sec Loss 1.5680 LearningRate 0.0017 Epoch: 17 Global Step: 216470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:51,691-Speed 2952.20 samples/sec Loss 1.5731 LearningRate 0.0017 Epoch: 17 Global Step: 216480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:54,998-Speed 3097.60 samples/sec Loss 
1.5979 LearningRate 0.0017 Epoch: 17 Global Step: 216490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:13:58,466-Speed 2952.83 samples/sec Loss 1.6026 LearningRate 0.0017 Epoch: 17 Global Step: 216500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:14:01,906-Speed 2977.91 samples/sec Loss 1.5581 LearningRate 0.0016 Epoch: 17 Global Step: 216510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:14:05,263-Speed 3051.31 samples/sec Loss 1.5972 LearningRate 0.0016 Epoch: 17 Global Step: 216520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:14:08,627-Speed 3044.47 samples/sec Loss 1.5873 LearningRate 0.0016 Epoch: 17 Global Step: 216530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:14:11,983-Speed 3052.61 samples/sec Loss 1.5869 LearningRate 0.0016 Epoch: 17 Global Step: 216540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:14:15,352-Speed 3039.94 samples/sec Loss 1.6090 LearningRate 0.0016 Epoch: 17 Global Step: 216550 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:18,827-Speed 2947.69 samples/sec Loss 1.5901 LearningRate 0.0016 Epoch: 17 Global Step: 216560 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:22,212-Speed 3026.63 samples/sec Loss 1.6487 LearningRate 0.0016 Epoch: 17 Global Step: 216570 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:25,627-Speed 2998.65 samples/sec Loss 1.5880 LearningRate 0.0016 Epoch: 17 Global Step: 216580 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:28,995-Speed 3041.21 samples/sec Loss 1.5993 LearningRate 0.0016 Epoch: 17 Global Step: 216590 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:32,439-Speed 2975.06 samples/sec Loss 1.6507 LearningRate 0.0016 Epoch: 17 Global Step: 216600 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:35,891-Speed 2966.33 samples/sec Loss 1.6065 LearningRate 0.0016 Epoch: 17 Global Step: 
216610 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:39,276-Speed 3026.38 samples/sec Loss 1.5931 LearningRate 0.0016 Epoch: 17 Global Step: 216620 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:42,634-Speed 3051.26 samples/sec Loss 1.6188 LearningRate 0.0016 Epoch: 17 Global Step: 216630 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:46,070-Speed 2980.30 samples/sec Loss 1.5950 LearningRate 0.0016 Epoch: 17 Global Step: 216640 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:14:49,495-Speed 2991.22 samples/sec Loss 1.6100 LearningRate 0.0016 Epoch: 17 Global Step: 216650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:14:52,838-Speed 3063.73 samples/sec Loss 1.5983 LearningRate 0.0016 Epoch: 17 Global Step: 216660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:14:56,205-Speed 3042.24 samples/sec Loss 1.6545 LearningRate 0.0016 Epoch: 17 Global Step: 216670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:14:59,583-Speed 3031.66 samples/sec Loss 1.6201 LearningRate 0.0016 Epoch: 17 Global Step: 216680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:15:02,987-Speed 3009.48 samples/sec Loss 1.6061 LearningRate 0.0016 Epoch: 17 Global Step: 216690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:15:06,389-Speed 3010.94 samples/sec Loss 1.6331 LearningRate 0.0016 Epoch: 17 Global Step: 216700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:15:09,865-Speed 2946.78 samples/sec Loss 1.6303 LearningRate 0.0016 Epoch: 17 Global Step: 216710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:15:13,227-Speed 3046.73 samples/sec Loss 1.6692 LearningRate 0.0016 Epoch: 17 Global Step: 216720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:15:16,568-Speed 3064.87 samples/sec Loss 1.6211 LearningRate 0.0016 Epoch: 17 Global Step: 216730 Fp16 Grad Scale: 16384 Required: 3 hours 
Training: 2022-04-27 22:15:20,033-Speed 2956.54 samples/sec Loss 1.5530 LearningRate 0.0016 Epoch: 17 Global Step: 216740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:15:23,394-Speed 3049.15 samples/sec Loss 1.6255 LearningRate 0.0016 Epoch: 17 Global Step: 216750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:26,750-Speed 3052.08 samples/sec Loss 1.5919 LearningRate 0.0016 Epoch: 17 Global Step: 216760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:30,218-Speed 2953.08 samples/sec Loss 1.5790 LearningRate 0.0016 Epoch: 17 Global Step: 216770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:33,608-Speed 3022.55 samples/sec Loss 1.5801 LearningRate 0.0016 Epoch: 17 Global Step: 216780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:37,034-Speed 2989.60 samples/sec Loss 1.5939 LearningRate 0.0016 Epoch: 17 Global Step: 216790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:40,466-Speed 2983.96 samples/sec Loss 1.6403 LearningRate 0.0016 Epoch: 17 Global Step: 216800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:43,867-Speed 3012.30 samples/sec Loss 1.6013 LearningRate 0.0016 Epoch: 17 Global Step: 216810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:47,225-Speed 3050.57 samples/sec Loss 1.5618 LearningRate 0.0016 Epoch: 17 Global Step: 216820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:50,588-Speed 3045.09 samples/sec Loss 1.5825 LearningRate 0.0016 Epoch: 17 Global Step: 216830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:54,001-Speed 3001.54 samples/sec Loss 1.5865 LearningRate 0.0016 Epoch: 17 Global Step: 216840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:15:57,454-Speed 2966.30 samples/sec Loss 1.5721 LearningRate 0.0016 Epoch: 17 Global Step: 216850 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:16:00,885-Speed 
2985.51 samples/sec Loss 1.6448 LearningRate 0.0016 Epoch: 17 Global Step: 216860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:16:04,254-Speed 3040.20 samples/sec Loss 1.6759 LearningRate 0.0016 Epoch: 17 Global Step: 216870 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:16:07,693-Speed 2978.05 samples/sec Loss 1.5985 LearningRate 0.0016 Epoch: 17 Global Step: 216880 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:16:11,036-Speed 3064.37 samples/sec Loss 1.5730 LearningRate 0.0016 Epoch: 17 Global Step: 216890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:16:14,557-Speed 2908.81 samples/sec Loss 1.6693 LearningRate 0.0016 Epoch: 17 Global Step: 216900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:16:17,936-Speed 3031.24 samples/sec Loss 1.5861 LearningRate 0.0016 Epoch: 17 Global Step: 216910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:16:21,377-Speed 2976.57 samples/sec Loss 1.5930 LearningRate 0.0016 Epoch: 17 Global Step: 216920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:16:24,696-Speed 3085.88 samples/sec Loss 1.5860 LearningRate 0.0016 Epoch: 17 Global Step: 216930 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:28,110-Speed 3000.04 samples/sec Loss 1.5744 LearningRate 0.0016 Epoch: 17 Global Step: 216940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:31,470-Speed 3049.20 samples/sec Loss 1.6100 LearningRate 0.0016 Epoch: 17 Global Step: 216950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:34,822-Speed 3055.61 samples/sec Loss 1.6108 LearningRate 0.0016 Epoch: 17 Global Step: 216960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:38,187-Speed 3043.56 samples/sec Loss 1.6609 LearningRate 0.0016 Epoch: 17 Global Step: 216970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:41,605-Speed 2997.16 samples/sec Loss 1.5952 
LearningRate 0.0016 Epoch: 17 Global Step: 216980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:44,916-Speed 3093.80 samples/sec Loss 1.6285 LearningRate 0.0016 Epoch: 17 Global Step: 216990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:48,228-Speed 3092.65 samples/sec Loss 1.5897 LearningRate 0.0016 Epoch: 17 Global Step: 217000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:51,604-Speed 3034.09 samples/sec Loss 1.6634 LearningRate 0.0016 Epoch: 17 Global Step: 217010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:54,939-Speed 3070.89 samples/sec Loss 1.6383 LearningRate 0.0016 Epoch: 17 Global Step: 217020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:16:58,332-Speed 3018.66 samples/sec Loss 1.6104 LearningRate 0.0016 Epoch: 17 Global Step: 217030 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:01,691-Speed 3050.15 samples/sec Loss 1.6360 LearningRate 0.0016 Epoch: 17 Global Step: 217040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:05,097-Speed 3007.33 samples/sec Loss 1.6596 LearningRate 0.0016 Epoch: 17 Global Step: 217050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:08,574-Speed 2945.02 samples/sec Loss 1.6310 LearningRate 0.0016 Epoch: 17 Global Step: 217060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:11,941-Speed 3042.52 samples/sec Loss 1.5797 LearningRate 0.0016 Epoch: 17 Global Step: 217070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:15,339-Speed 3014.26 samples/sec Loss 1.6180 LearningRate 0.0016 Epoch: 17 Global Step: 217080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:18,770-Speed 2985.76 samples/sec Loss 1.6226 LearningRate 0.0016 Epoch: 17 Global Step: 217090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:22,190-Speed 2994.56 samples/sec Loss 1.6027 LearningRate 0.0016 Epoch: 17 Global Step: 
217100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:17:25,557-Speed 3042.05 samples/sec Loss 1.6376 LearningRate 0.0016 Epoch: 17 Global Step: 217110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:28,946-Speed 3023.07 samples/sec Loss 1.6375 LearningRate 0.0016 Epoch: 17 Global Step: 217120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:32,452-Speed 2921.33 samples/sec Loss 1.5295 LearningRate 0.0016 Epoch: 17 Global Step: 217130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:35,902-Speed 2968.36 samples/sec Loss 1.5914 LearningRate 0.0016 Epoch: 17 Global Step: 217140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:39,329-Speed 2988.68 samples/sec Loss 1.6505 LearningRate 0.0016 Epoch: 17 Global Step: 217150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:42,792-Speed 2958.99 samples/sec Loss 1.6229 LearningRate 0.0016 Epoch: 17 Global Step: 217160 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:46,183-Speed 3020.03 samples/sec Loss 1.6080 LearningRate 0.0016 Epoch: 17 Global Step: 217170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:49,555-Speed 3038.50 samples/sec Loss 1.5979 LearningRate 0.0016 Epoch: 17 Global Step: 217180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:52,906-Speed 3056.08 samples/sec Loss 1.5982 LearningRate 0.0016 Epoch: 17 Global Step: 217190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:56,317-Speed 3002.59 samples/sec Loss 1.5177 LearningRate 0.0016 Epoch: 17 Global Step: 217200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:17:59,672-Speed 3053.39 samples/sec Loss 1.6251 LearningRate 0.0016 Epoch: 17 Global Step: 217210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:03,023-Speed 3056.67 samples/sec Loss 1.6271 LearningRate 0.0016 Epoch: 17 Global Step: 217220 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-27 22:18:06,440-Speed 2996.64 samples/sec Loss 1.5581 LearningRate 0.0016 Epoch: 17 Global Step: 217230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:09,817-Speed 3033.21 samples/sec Loss 1.5935 LearningRate 0.0016 Epoch: 17 Global Step: 217240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:13,175-Speed 3051.17 samples/sec Loss 1.6035 LearningRate 0.0016 Epoch: 17 Global Step: 217250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:16,532-Speed 3050.83 samples/sec Loss 1.6554 LearningRate 0.0016 Epoch: 17 Global Step: 217260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:19,894-Speed 3047.26 samples/sec Loss 1.5839 LearningRate 0.0016 Epoch: 17 Global Step: 217270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:23,253-Speed 3049.80 samples/sec Loss 1.6262 LearningRate 0.0016 Epoch: 17 Global Step: 217280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:26,598-Speed 3061.53 samples/sec Loss 1.5972 LearningRate 0.0016 Epoch: 17 Global Step: 217290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:29,985-Speed 3024.41 samples/sec Loss 1.6242 LearningRate 0.0016 Epoch: 17 Global Step: 217300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:33,328-Speed 3063.48 samples/sec Loss 1.6054 LearningRate 0.0016 Epoch: 17 Global Step: 217310 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:18:36,686-Speed 3050.19 samples/sec Loss 1.5777 LearningRate 0.0016 Epoch: 17 Global Step: 217320 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:18:40,087-Speed 3012.40 samples/sec Loss 1.6268 LearningRate 0.0016 Epoch: 17 Global Step: 217330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:43,460-Speed 3036.54 samples/sec Loss 1.6618 LearningRate 0.0016 Epoch: 17 Global Step: 217340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
22:18:46,810-Speed 3057.23 samples/sec Loss 1.6350 LearningRate 0.0016 Epoch: 17 Global Step: 217350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:50,230-Speed 2995.21 samples/sec Loss 1.6131 LearningRate 0.0016 Epoch: 17 Global Step: 217360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:53,702-Speed 2949.78 samples/sec Loss 1.5441 LearningRate 0.0016 Epoch: 17 Global Step: 217370 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:18:57,014-Speed 3092.70 samples/sec Loss 1.6588 LearningRate 0.0016 Epoch: 17 Global Step: 217380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:00,410-Speed 3016.14 samples/sec Loss 1.5881 LearningRate 0.0016 Epoch: 17 Global Step: 217390 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:03,778-Speed 3041.58 samples/sec Loss 1.6208 LearningRate 0.0016 Epoch: 17 Global Step: 217400 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:07,147-Speed 3040.18 samples/sec Loss 1.5974 LearningRate 0.0016 Epoch: 17 Global Step: 217410 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:10,673-Speed 2904.89 samples/sec Loss 1.6028 LearningRate 0.0016 Epoch: 17 Global Step: 217420 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:14,061-Speed 3023.17 samples/sec Loss 1.5606 LearningRate 0.0016 Epoch: 17 Global Step: 217430 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:17,511-Speed 2968.67 samples/sec Loss 1.5629 LearningRate 0.0016 Epoch: 17 Global Step: 217440 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:20,906-Speed 3017.67 samples/sec Loss 1.6234 LearningRate 0.0016 Epoch: 17 Global Step: 217450 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:24,296-Speed 3020.90 samples/sec Loss 1.5982 LearningRate 0.0016 Epoch: 17 Global Step: 217460 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:27,711-Speed 2999.71 samples/sec Loss 
1.6022 LearningRate 0.0016 Epoch: 17 Global Step: 217470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:31,031-Speed 3085.10 samples/sec Loss 1.6248 LearningRate 0.0016 Epoch: 17 Global Step: 217480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:19:34,431-Speed 3012.26 samples/sec Loss 1.5774 LearningRate 0.0016 Epoch: 17 Global Step: 217490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:19:37,831-Speed 3012.91 samples/sec Loss 1.5580 LearningRate 0.0015 Epoch: 17 Global Step: 217500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:19:41,181-Speed 3057.90 samples/sec Loss 1.6111 LearningRate 0.0015 Epoch: 17 Global Step: 217510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:19:44,506-Speed 3079.99 samples/sec Loss 1.6234 LearningRate 0.0015 Epoch: 17 Global Step: 217520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:47,859-Speed 3054.70 samples/sec Loss 1.6514 LearningRate 0.0015 Epoch: 17 Global Step: 217530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:51,245-Speed 3026.32 samples/sec Loss 1.5702 LearningRate 0.0015 Epoch: 17 Global Step: 217540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:54,661-Speed 2998.45 samples/sec Loss 1.5836 LearningRate 0.0015 Epoch: 17 Global Step: 217550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:19:58,023-Speed 3045.88 samples/sec Loss 1.6284 LearningRate 0.0015 Epoch: 17 Global Step: 217560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:01,456-Speed 2983.59 samples/sec Loss 1.5847 LearningRate 0.0015 Epoch: 17 Global Step: 217570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:04,837-Speed 3029.40 samples/sec Loss 1.5553 LearningRate 0.0015 Epoch: 17 Global Step: 217580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:08,214-Speed 3033.72 samples/sec Loss 1.5858 LearningRate 0.0015 Epoch: 17 Global 
Step: 217590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:11,662-Speed 2970.65 samples/sec Loss 1.6143 LearningRate 0.0015 Epoch: 17 Global Step: 217600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:15,133-Speed 2950.38 samples/sec Loss 1.5881 LearningRate 0.0015 Epoch: 17 Global Step: 217610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:18,524-Speed 3021.34 samples/sec Loss 1.5953 LearningRate 0.0015 Epoch: 17 Global Step: 217620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:20:21,881-Speed 3050.86 samples/sec Loss 1.6878 LearningRate 0.0015 Epoch: 17 Global Step: 217630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:25,257-Speed 3034.26 samples/sec Loss 1.6389 LearningRate 0.0015 Epoch: 17 Global Step: 217640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:28,735-Speed 2945.01 samples/sec Loss 1.6094 LearningRate 0.0015 Epoch: 17 Global Step: 217650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:32,151-Speed 2998.23 samples/sec Loss 1.6232 LearningRate 0.0015 Epoch: 17 Global Step: 217660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:35,549-Speed 3014.67 samples/sec Loss 1.6380 LearningRate 0.0015 Epoch: 17 Global Step: 217670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:38,898-Speed 3058.69 samples/sec Loss 1.6507 LearningRate 0.0015 Epoch: 17 Global Step: 217680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:42,221-Speed 3081.76 samples/sec Loss 1.5947 LearningRate 0.0015 Epoch: 17 Global Step: 217690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:20:45,575-Speed 3053.80 samples/sec Loss 1.6301 LearningRate 0.0015 Epoch: 17 Global Step: 217700 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:20:48,903-Speed 3077.40 samples/sec Loss 1.6146 LearningRate 0.0015 Epoch: 17 Global Step: 217710 Fp16 Grad Scale: 8192 Required: 
3 hours Training: 2022-04-27 22:20:52,258-Speed 3053.55 samples/sec Loss 1.5861 LearningRate 0.0015 Epoch: 17 Global Step: 217720 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:20:55,632-Speed 3035.83 samples/sec Loss 1.5938 LearningRate 0.0015 Epoch: 17 Global Step: 217730 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:20:59,058-Speed 2989.64 samples/sec Loss 1.5834 LearningRate 0.0015 Epoch: 17 Global Step: 217740 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:21:02,460-Speed 3010.44 samples/sec Loss 1.6685 LearningRate 0.0015 Epoch: 17 Global Step: 217750 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:21:05,981-Speed 2909.63 samples/sec Loss 1.6375 LearningRate 0.0015 Epoch: 17 Global Step: 217760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:21:09,380-Speed 3013.24 samples/sec Loss 1.6703 LearningRate 0.0015 Epoch: 17 Global Step: 217770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:21:12,765-Speed 3026.79 samples/sec Loss 1.6166 LearningRate 0.0015 Epoch: 17 Global Step: 217780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:21:16,109-Speed 3062.83 samples/sec Loss 1.6303 LearningRate 0.0015 Epoch: 17 Global Step: 217790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:21:19,488-Speed 3031.14 samples/sec Loss 1.6124 LearningRate 0.0015 Epoch: 17 Global Step: 217800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:22,909-Speed 2993.94 samples/sec Loss 1.6075 LearningRate 0.0015 Epoch: 17 Global Step: 217810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:26,293-Speed 3027.16 samples/sec Loss 1.6107 LearningRate 0.0015 Epoch: 17 Global Step: 217820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:29,715-Speed 2992.81 samples/sec Loss 1.6395 LearningRate 0.0015 Epoch: 17 Global Step: 217830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:33,150-Speed 
2981.94 samples/sec Loss 1.5769 LearningRate 0.0015 Epoch: 17 Global Step: 217840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:36,518-Speed 3041.16 samples/sec Loss 1.5968 LearningRate 0.0015 Epoch: 17 Global Step: 217850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:39,888-Speed 3039.63 samples/sec Loss 1.5961 LearningRate 0.0015 Epoch: 17 Global Step: 217860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:43,283-Speed 3017.01 samples/sec Loss 1.5736 LearningRate 0.0015 Epoch: 17 Global Step: 217870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:46,713-Speed 2985.91 samples/sec Loss 1.6153 LearningRate 0.0015 Epoch: 17 Global Step: 217880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:50,145-Speed 2985.04 samples/sec Loss 1.6081 LearningRate 0.0015 Epoch: 17 Global Step: 217890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:21:53,540-Speed 3017.57 samples/sec Loss 1.5851 LearningRate 0.0015 Epoch: 17 Global Step: 217900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:21:57,002-Speed 2958.20 samples/sec Loss 1.6134 LearningRate 0.0015 Epoch: 17 Global Step: 217910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:00,509-Speed 2921.37 samples/sec Loss 1.6155 LearningRate 0.0015 Epoch: 17 Global Step: 217920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:03,840-Speed 3074.28 samples/sec Loss 1.5777 LearningRate 0.0015 Epoch: 17 Global Step: 217930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:07,281-Speed 2976.64 samples/sec Loss 1.6296 LearningRate 0.0015 Epoch: 17 Global Step: 217940 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:10,738-Speed 2963.42 samples/sec Loss 1.6495 LearningRate 0.0015 Epoch: 17 Global Step: 217950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:14,106-Speed 3040.82 samples/sec Loss 1.6447 
LearningRate 0.0015 Epoch: 17 Global Step: 217960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:17,500-Speed 3017.58 samples/sec Loss 1.5538 LearningRate 0.0015 Epoch: 17 Global Step: 217970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:20,892-Speed 3020.46 samples/sec Loss 1.6367 LearningRate 0.0015 Epoch: 17 Global Step: 217980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:24,264-Speed 3036.80 samples/sec Loss 1.6177 LearningRate 0.0015 Epoch: 17 Global Step: 217990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:27,590-Speed 3080.26 samples/sec Loss 1.6167 LearningRate 0.0015 Epoch: 17 Global Step: 218000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:31,064-Speed 2948.58 samples/sec Loss 1.5693 LearningRate 0.0015 Epoch: 17 Global Step: 218010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:34,573-Speed 2918.74 samples/sec Loss 1.6430 LearningRate 0.0015 Epoch: 17 Global Step: 218020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:37,940-Speed 3042.07 samples/sec Loss 1.6268 LearningRate 0.0015 Epoch: 17 Global Step: 218030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:22:41,368-Speed 2987.87 samples/sec Loss 1.5880 LearningRate 0.0015 Epoch: 17 Global Step: 218040 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:44,812-Speed 2973.72 samples/sec Loss 1.5729 LearningRate 0.0015 Epoch: 17 Global Step: 218050 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:48,275-Speed 2958.45 samples/sec Loss 1.6177 LearningRate 0.0015 Epoch: 17 Global Step: 218060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:51,644-Speed 3040.32 samples/sec Loss 1.5900 LearningRate 0.0015 Epoch: 17 Global Step: 218070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:55,063-Speed 2997.17 samples/sec Loss 1.6586 LearningRate 0.0015 Epoch: 17 Global Step: 
218080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:22:58,472-Speed 3004.93 samples/sec Loss 1.6414 LearningRate 0.0015 Epoch: 17 Global Step: 218090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:01,852-Speed 3030.28 samples/sec Loss 1.5995 LearningRate 0.0015 Epoch: 17 Global Step: 218100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:05,199-Speed 3059.83 samples/sec Loss 1.6262 LearningRate 0.0015 Epoch: 17 Global Step: 218110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:08,560-Speed 3048.27 samples/sec Loss 1.6338 LearningRate 0.0015 Epoch: 17 Global Step: 218120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:11,969-Speed 3003.81 samples/sec Loss 1.5760 LearningRate 0.0015 Epoch: 17 Global Step: 218130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:15,368-Speed 3014.05 samples/sec Loss 1.6304 LearningRate 0.0015 Epoch: 17 Global Step: 218140 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:18,715-Speed 3060.01 samples/sec Loss 1.6469 LearningRate 0.0015 Epoch: 17 Global Step: 218150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:22,087-Speed 3037.78 samples/sec Loss 1.5711 LearningRate 0.0015 Epoch: 17 Global Step: 218160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:23:25,388-Speed 3103.53 samples/sec Loss 1.5939 LearningRate 0.0015 Epoch: 17 Global Step: 218170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:28,809-Speed 2994.14 samples/sec Loss 1.6309 LearningRate 0.0015 Epoch: 17 Global Step: 218180 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:32,210-Speed 3011.52 samples/sec Loss 1.6165 LearningRate 0.0015 Epoch: 17 Global Step: 218190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:35,605-Speed 3016.12 samples/sec Loss 1.5751 LearningRate 0.0015 Epoch: 17 Global Step: 218200 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 22:23:38,997-Speed 3020.60 samples/sec Loss 1.5921 LearningRate 0.0015 Epoch: 17 Global Step: 218210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:42,332-Speed 3070.67 samples/sec Loss 1.6042 LearningRate 0.0015 Epoch: 17 Global Step: 218220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:45,655-Speed 3082.25 samples/sec Loss 1.5915 LearningRate 0.0015 Epoch: 17 Global Step: 218230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:49,058-Speed 3010.46 samples/sec Loss 1.6585 LearningRate 0.0015 Epoch: 17 Global Step: 218240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:52,482-Speed 2991.16 samples/sec Loss 1.6171 LearningRate 0.0015 Epoch: 17 Global Step: 218250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:55,821-Speed 3067.69 samples/sec Loss 1.5633 LearningRate 0.0015 Epoch: 17 Global Step: 218260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:23:59,168-Speed 3060.28 samples/sec Loss 1.5892 LearningRate 0.0015 Epoch: 17 Global Step: 218270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:02,526-Speed 3050.12 samples/sec Loss 1.6617 LearningRate 0.0015 Epoch: 17 Global Step: 218280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:05,870-Speed 3062.89 samples/sec Loss 1.5875 LearningRate 0.0015 Epoch: 17 Global Step: 218290 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:09,332-Speed 2959.43 samples/sec Loss 1.6498 LearningRate 0.0015 Epoch: 17 Global Step: 218300 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:12,738-Speed 3007.01 samples/sec Loss 1.6469 LearningRate 0.0015 Epoch: 17 Global Step: 218310 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:16,071-Speed 3073.31 samples/sec Loss 1.5688 LearningRate 0.0015 Epoch: 17 Global Step: 218320 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
22:24:19,452-Speed 3028.88 samples/sec Loss 1.6152 LearningRate 0.0015 Epoch: 17 Global Step: 218330 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:22,808-Speed 3052.15 samples/sec Loss 1.6509 LearningRate 0.0015 Epoch: 17 Global Step: 218340 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:26,152-Speed 3063.16 samples/sec Loss 1.6164 LearningRate 0.0015 Epoch: 17 Global Step: 218350 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:29,541-Speed 3022.45 samples/sec Loss 1.6402 LearningRate 0.0015 Epoch: 17 Global Step: 218360 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:33,025-Speed 2940.08 samples/sec Loss 1.6023 LearningRate 0.0015 Epoch: 17 Global Step: 218370 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:24:36,417-Speed 3019.58 samples/sec Loss 1.5925 LearningRate 0.0015 Epoch: 17 Global Step: 218380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:39,856-Speed 2978.23 samples/sec Loss 1.6867 LearningRate 0.0015 Epoch: 17 Global Step: 218390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:43,258-Speed 3010.79 samples/sec Loss 1.6610 LearningRate 0.0015 Epoch: 17 Global Step: 218400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:46,672-Speed 3000.50 samples/sec Loss 1.6540 LearningRate 0.0015 Epoch: 17 Global Step: 218410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:50,068-Speed 3016.61 samples/sec Loss 1.6625 LearningRate 0.0015 Epoch: 17 Global Step: 218420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:53,494-Speed 2989.38 samples/sec Loss 1.6459 LearningRate 0.0015 Epoch: 17 Global Step: 218430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:24:56,897-Speed 3010.15 samples/sec Loss 1.6651 LearningRate 0.0015 Epoch: 17 Global Step: 218440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:25:00,364-Speed 2954.18 samples/sec Loss 
1.5690 LearningRate 0.0015 Epoch: 17 Global Step: 218450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:25:03,771-Speed 3005.94 samples/sec Loss 1.5884 LearningRate 0.0015 Epoch: 17 Global Step: 218460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:25:07,124-Speed 3054.69 samples/sec Loss 1.6018 LearningRate 0.0015 Epoch: 17 Global Step: 218470 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:10,554-Speed 2986.86 samples/sec Loss 1.5867 LearningRate 0.0015 Epoch: 17 Global Step: 218480 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:14,000-Speed 2971.93 samples/sec Loss 1.6234 LearningRate 0.0015 Epoch: 17 Global Step: 218490 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:17,406-Speed 3007.63 samples/sec Loss 1.6065 LearningRate 0.0015 Epoch: 17 Global Step: 218500 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:20,799-Speed 3019.19 samples/sec Loss 1.6559 LearningRate 0.0014 Epoch: 17 Global Step: 218510 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:24,153-Speed 3053.84 samples/sec Loss 1.5985 LearningRate 0.0014 Epoch: 17 Global Step: 218520 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:27,551-Speed 3014.58 samples/sec Loss 1.4974 LearningRate 0.0014 Epoch: 17 Global Step: 218530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:30,965-Speed 3000.10 samples/sec Loss 1.5965 LearningRate 0.0014 Epoch: 17 Global Step: 218540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:34,423-Speed 2961.34 samples/sec Loss 1.6626 LearningRate 0.0014 Epoch: 17 Global Step: 218550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:37,820-Speed 3015.08 samples/sec Loss 1.5520 LearningRate 0.0014 Epoch: 17 Global Step: 218560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:41,268-Speed 2970.99 samples/sec Loss 1.6056 LearningRate 0.0014 Epoch: 17 Global 
Step: 218570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:25:44,640-Speed 3038.02 samples/sec Loss 1.6119 LearningRate 0.0014 Epoch: 17 Global Step: 218580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:25:47,996-Speed 3055.73 samples/sec Loss 1.5994 LearningRate 0.0014 Epoch: 17 Global Step: 218590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:51,322-Speed 3079.30 samples/sec Loss 1.5957 LearningRate 0.0014 Epoch: 17 Global Step: 218600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:54,732-Speed 3004.47 samples/sec Loss 1.6486 LearningRate 0.0014 Epoch: 17 Global Step: 218610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:25:58,109-Speed 3032.33 samples/sec Loss 1.6389 LearningRate 0.0014 Epoch: 17 Global Step: 218620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:01,547-Speed 2979.60 samples/sec Loss 1.6767 LearningRate 0.0014 Epoch: 17 Global Step: 218630 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:04,901-Speed 3054.09 samples/sec Loss 1.6080 LearningRate 0.0014 Epoch: 17 Global Step: 218640 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:08,318-Speed 2997.71 samples/sec Loss 1.6642 LearningRate 0.0014 Epoch: 17 Global Step: 218650 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:11,751-Speed 2983.63 samples/sec Loss 1.5590 LearningRate 0.0014 Epoch: 17 Global Step: 218660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:15,168-Speed 2997.45 samples/sec Loss 1.6608 LearningRate 0.0014 Epoch: 17 Global Step: 218670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:18,611-Speed 2974.96 samples/sec Loss 1.6089 LearningRate 0.0014 Epoch: 17 Global Step: 218680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:21,980-Speed 3040.15 samples/sec Loss 1.6453 LearningRate 0.0014 Epoch: 17 Global Step: 218690 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-27 22:26:25,372-Speed 3019.93 samples/sec Loss 1.5379 LearningRate 0.0014 Epoch: 17 Global Step: 218700 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:28,762-Speed 3021.64 samples/sec Loss 1.6167 LearningRate 0.0014 Epoch: 17 Global Step: 218710 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:32,143-Speed 3029.13 samples/sec Loss 1.5977 LearningRate 0.0014 Epoch: 17 Global Step: 218720 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:35,592-Speed 2969.95 samples/sec Loss 1.6067 LearningRate 0.0014 Epoch: 17 Global Step: 218730 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:39,033-Speed 2976.80 samples/sec Loss 1.5465 LearningRate 0.0014 Epoch: 17 Global Step: 218740 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:42,531-Speed 2928.51 samples/sec Loss 1.6202 LearningRate 0.0014 Epoch: 17 Global Step: 218750 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:45,971-Speed 2977.44 samples/sec Loss 1.5861 LearningRate 0.0014 Epoch: 17 Global Step: 218760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:26:49,385-Speed 3000.22 samples/sec Loss 1.6071 LearningRate 0.0014 Epoch: 17 Global Step: 218770 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:52,717-Speed 3074.11 samples/sec Loss 1.6372 LearningRate 0.0014 Epoch: 17 Global Step: 218780 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:56,109-Speed 3019.32 samples/sec Loss 1.6313 LearningRate 0.0014 Epoch: 17 Global Step: 218790 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:26:59,474-Speed 3043.89 samples/sec Loss 1.6000 LearningRate 0.0014 Epoch: 17 Global Step: 218800 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:27:02,832-Speed 3050.73 samples/sec Loss 1.6404 LearningRate 0.0014 Epoch: 17 Global Step: 218810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
22:27:06,172-Speed 3066.54 samples/sec Loss 1.6778 LearningRate 0.0014 Epoch: 17 Global Step: 218820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:27:09,562-Speed 3021.43 samples/sec Loss 1.6496 LearningRate 0.0014 Epoch: 17 Global Step: 218830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:27:12,954-Speed 3020.25 samples/sec Loss 1.6352 LearningRate 0.0014 Epoch: 17 Global Step: 218840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:27:16,291-Speed 3069.32 samples/sec Loss 1.6488 LearningRate 0.0014 Epoch: 17 Global Step: 218850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:19,738-Speed 2971.45 samples/sec Loss 1.6272 LearningRate 0.0014 Epoch: 17 Global Step: 218860 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:23,180-Speed 2976.59 samples/sec Loss 1.5846 LearningRate 0.0014 Epoch: 17 Global Step: 218870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:26,610-Speed 2985.53 samples/sec Loss 1.5800 LearningRate 0.0014 Epoch: 17 Global Step: 218880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:30,023-Speed 3001.64 samples/sec Loss 1.6270 LearningRate 0.0014 Epoch: 17 Global Step: 218890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:33,459-Speed 2980.83 samples/sec Loss 1.6282 LearningRate 0.0014 Epoch: 17 Global Step: 218900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:36,813-Speed 3053.60 samples/sec Loss 1.6397 LearningRate 0.0014 Epoch: 17 Global Step: 218910 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:40,229-Speed 2998.45 samples/sec Loss 1.6944 LearningRate 0.0014 Epoch: 17 Global Step: 218920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:43,632-Speed 3010.20 samples/sec Loss 1.6412 LearningRate 0.0014 Epoch: 17 Global Step: 218930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:47,087-Speed 2964.30 samples/sec Loss 1.6020 
LearningRate 0.0014 Epoch: 17 Global Step: 218940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:27:50,490-Speed 3010.57 samples/sec Loss 1.5865 LearningRate 0.0014 Epoch: 17 Global Step: 218950 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:27:53,972-Speed 2941.76 samples/sec Loss 1.5906 LearningRate 0.0014 Epoch: 17 Global Step: 218960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:27:57,340-Speed 3040.65 samples/sec Loss 1.6344 LearningRate 0.0014 Epoch: 17 Global Step: 218970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:00,765-Speed 2992.63 samples/sec Loss 1.6089 LearningRate 0.0014 Epoch: 17 Global Step: 218980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:04,256-Speed 2933.56 samples/sec Loss 1.5712 LearningRate 0.0014 Epoch: 17 Global Step: 218990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:07,678-Speed 2992.99 samples/sec Loss 1.5786 LearningRate 0.0014 Epoch: 17 Global Step: 219000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:11,108-Speed 2987.06 samples/sec Loss 1.6292 LearningRate 0.0014 Epoch: 17 Global Step: 219010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:14,472-Speed 3044.48 samples/sec Loss 1.6909 LearningRate 0.0014 Epoch: 17 Global Step: 219020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:17,845-Speed 3036.66 samples/sec Loss 1.6494 LearningRate 0.0014 Epoch: 17 Global Step: 219030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:21,287-Speed 2976.03 samples/sec Loss 1.6149 LearningRate 0.0014 Epoch: 17 Global Step: 219040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:24,612-Speed 3080.42 samples/sec Loss 1.5856 LearningRate 0.0014 Epoch: 17 Global Step: 219050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:28,007-Speed 3017.00 samples/sec Loss 1.6363 LearningRate 0.0014 Epoch: 17 Global Step: 
219060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:31,418-Speed 3002.94 samples/sec Loss 1.6480 LearningRate 0.0014 Epoch: 17 Global Step: 219070 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:28:34,803-Speed 3026.05 samples/sec Loss 1.6759 LearningRate 0.0014 Epoch: 17 Global Step: 219080 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:38,184-Speed 3029.03 samples/sec Loss 1.6080 LearningRate 0.0014 Epoch: 17 Global Step: 219090 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:41,617-Speed 2984.57 samples/sec Loss 1.6265 LearningRate 0.0014 Epoch: 17 Global Step: 219100 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:45,005-Speed 3022.60 samples/sec Loss 1.6297 LearningRate 0.0014 Epoch: 17 Global Step: 219110 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:48,347-Speed 3064.98 samples/sec Loss 1.6088 LearningRate 0.0014 Epoch: 17 Global Step: 219120 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:51,807-Speed 2960.69 samples/sec Loss 1.5775 LearningRate 0.0014 Epoch: 17 Global Step: 219130 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:55,182-Speed 3034.82 samples/sec Loss 1.6756 LearningRate 0.0014 Epoch: 17 Global Step: 219140 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:28:58,581-Speed 3013.84 samples/sec Loss 1.6418 LearningRate 0.0014 Epoch: 17 Global Step: 219150 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:29:01,970-Speed 3021.74 samples/sec Loss 1.6824 LearningRate 0.0014 Epoch: 17 Global Step: 219160 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:29:05,416-Speed 2972.65 samples/sec Loss 1.6186 LearningRate 0.0014 Epoch: 17 Global Step: 219170 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:29:08,913-Speed 2928.73 samples/sec Loss 1.6457 LearningRate 0.0014 Epoch: 17 Global Step: 219180 Fp16 Grad Scale: 16384 Required: 3 hours 
Training: 2022-04-27 22:29:12,352-Speed 2978.42 samples/sec Loss 1.5969 LearningRate 0.0014 Epoch: 17 Global Step: 219190 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:15,700-Speed 3060.09 samples/sec Loss 1.6489 LearningRate 0.0014 Epoch: 17 Global Step: 219200 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:19,090-Speed 3021.70 samples/sec Loss 1.6457 LearningRate 0.0014 Epoch: 17 Global Step: 219210 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:22,439-Speed 3057.79 samples/sec Loss 1.6520 LearningRate 0.0014 Epoch: 17 Global Step: 219220 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:25,789-Speed 3057.53 samples/sec Loss 1.6042 LearningRate 0.0014 Epoch: 17 Global Step: 219230 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:29,164-Speed 3035.51 samples/sec Loss 1.6579 LearningRate 0.0014 Epoch: 17 Global Step: 219240 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:32,577-Speed 3001.29 samples/sec Loss 1.6481 LearningRate 0.0014 Epoch: 17 Global Step: 219250 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:35,923-Speed 3061.35 samples/sec Loss 1.5417 LearningRate 0.0014 Epoch: 17 Global Step: 219260 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:39,272-Speed 3058.74 samples/sec Loss 1.5894 LearningRate 0.0014 Epoch: 17 Global Step: 219270 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:42,628-Speed 3052.06 samples/sec Loss 1.5974 LearningRate 0.0014 Epoch: 17 Global Step: 219280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:29:46,079-Speed 2967.67 samples/sec Loss 1.5981 LearningRate 0.0014 Epoch: 17 Global Step: 219290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:49,432-Speed 3054.81 samples/sec Loss 1.5664 LearningRate 0.0014 Epoch: 17 Global Step: 219300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:52,853-Speed 
2994.89 samples/sec Loss 1.6368 LearningRate 0.0014 Epoch: 17 Global Step: 219310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:56,283-Speed 2985.69 samples/sec Loss 1.6371 LearningRate 0.0014 Epoch: 17 Global Step: 219320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:29:59,679-Speed 3016.42 samples/sec Loss 1.6143 LearningRate 0.0014 Epoch: 17 Global Step: 219330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:30:03,027-Speed 3059.00 samples/sec Loss 1.6108 LearningRate 0.0014 Epoch: 17 Global Step: 219340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:30:07,156-Speed 2480.80 samples/sec Loss 1.5843 LearningRate 0.0014 Epoch: 17 Global Step: 219350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:30:10,534-Speed 3031.86 samples/sec Loss 1.6702 LearningRate 0.0014 Epoch: 17 Global Step: 219360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:30:14,029-Speed 2931.38 samples/sec Loss 1.6073 LearningRate 0.0014 Epoch: 17 Global Step: 219370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:30:18,862-Speed 2119.12 samples/sec Loss 1.6314 LearningRate 0.0014 Epoch: 17 Global Step: 219380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:30:22,317-Speed 2964.46 samples/sec Loss 1.5973 LearningRate 0.0014 Epoch: 17 Global Step: 219390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:26,377-Speed 2522.57 samples/sec Loss 1.5512 LearningRate 0.0014 Epoch: 17 Global Step: 219400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:30,416-Speed 2536.53 samples/sec Loss 1.6492 LearningRate 0.0014 Epoch: 17 Global Step: 219410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:33,738-Speed 3082.87 samples/sec Loss 1.6460 LearningRate 0.0014 Epoch: 17 Global Step: 219420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:37,087-Speed 3059.13 samples/sec Loss 1.5802 
LearningRate 0.0014 Epoch: 17 Global Step: 219430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:40,502-Speed 2999.20 samples/sec Loss 1.5962 LearningRate 0.0014 Epoch: 17 Global Step: 219440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:43,863-Speed 3047.60 samples/sec Loss 1.6479 LearningRate 0.0014 Epoch: 17 Global Step: 219450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:47,365-Speed 2925.27 samples/sec Loss 1.6400 LearningRate 0.0014 Epoch: 17 Global Step: 219460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:50,771-Speed 3007.16 samples/sec Loss 1.6823 LearningRate 0.0014 Epoch: 17 Global Step: 219470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:54,150-Speed 3031.26 samples/sec Loss 1.6063 LearningRate 0.0014 Epoch: 17 Global Step: 219480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:30:57,555-Speed 3008.08 samples/sec Loss 1.5954 LearningRate 0.0014 Epoch: 17 Global Step: 219490 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:31:00,895-Speed 3067.22 samples/sec Loss 1.5869 LearningRate 0.0014 Epoch: 17 Global Step: 219500 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:31:04,214-Speed 3085.62 samples/sec Loss 1.6304 LearningRate 0.0014 Epoch: 17 Global Step: 219510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:31:07,634-Speed 2994.84 samples/sec Loss 1.5468 LearningRate 0.0014 Epoch: 17 Global Step: 219520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:31:11,015-Speed 3029.24 samples/sec Loss 1.6179 LearningRate 0.0014 Epoch: 17 Global Step: 219530 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:14,371-Speed 3052.15 samples/sec Loss 1.6269 LearningRate 0.0014 Epoch: 17 Global Step: 219540 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:17,782-Speed 3002.91 samples/sec Loss 1.6112 LearningRate 0.0014 Epoch: 17 Global Step: 
219550 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:21,164-Speed 3028.10 samples/sec Loss 1.6685 LearningRate 0.0013 Epoch: 17 Global Step: 219560 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:24,602-Speed 2979.38 samples/sec Loss 1.5906 LearningRate 0.0013 Epoch: 17 Global Step: 219570 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:27,987-Speed 3026.28 samples/sec Loss 1.5939 LearningRate 0.0013 Epoch: 17 Global Step: 219580 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:31,499-Speed 2916.39 samples/sec Loss 1.6643 LearningRate 0.0013 Epoch: 17 Global Step: 219590 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:34,894-Speed 3016.95 samples/sec Loss 1.5869 LearningRate 0.0013 Epoch: 17 Global Step: 219600 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:38,351-Speed 2963.43 samples/sec Loss 1.5769 LearningRate 0.0013 Epoch: 17 Global Step: 219610 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:41,721-Speed 3039.09 samples/sec Loss 1.5994 LearningRate 0.0013 Epoch: 17 Global Step: 219620 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:45,045-Speed 3081.36 samples/sec Loss 1.6425 LearningRate 0.0013 Epoch: 17 Global Step: 219630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:31:48,458-Speed 3001.08 samples/sec Loss 1.5906 LearningRate 0.0013 Epoch: 17 Global Step: 219640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:31:51,817-Speed 3049.87 samples/sec Loss 1.6047 LearningRate 0.0013 Epoch: 17 Global Step: 219650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:31:55,116-Speed 3104.31 samples/sec Loss 1.5865 LearningRate 0.0013 Epoch: 17 Global Step: 219660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:31:58,486-Speed 3040.08 samples/sec Loss 1.6044 LearningRate 0.0013 Epoch: 17 Global Step: 219670 Fp16 Grad Scale: 16384 Required: 3 
hours Training: 2022-04-27 22:32:01,873-Speed 3023.39 samples/sec Loss 1.6268 LearningRate 0.0013 Epoch: 17 Global Step: 219680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:05,179-Speed 3098.81 samples/sec Loss 1.6038 LearningRate 0.0013 Epoch: 17 Global Step: 219690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:08,565-Speed 3025.28 samples/sec Loss 1.6215 LearningRate 0.0013 Epoch: 17 Global Step: 219700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:12,012-Speed 2971.67 samples/sec Loss 1.5422 LearningRate 0.0013 Epoch: 17 Global Step: 219710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:15,390-Speed 3031.61 samples/sec Loss 1.6630 LearningRate 0.0013 Epoch: 17 Global Step: 219720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:18,745-Speed 3053.62 samples/sec Loss 1.5976 LearningRate 0.0013 Epoch: 17 Global Step: 219730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:22,093-Speed 3059.33 samples/sec Loss 1.6500 LearningRate 0.0013 Epoch: 17 Global Step: 219740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:26,088-Speed 2563.79 samples/sec Loss 1.6081 LearningRate 0.0013 Epoch: 17 Global Step: 219750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:32:29,490-Speed 3010.39 samples/sec Loss 1.6601 LearningRate 0.0013 Epoch: 17 Global Step: 219760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:33,485-Speed 2564.24 samples/sec Loss 1.6243 LearningRate 0.0013 Epoch: 17 Global Step: 219770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:36,909-Speed 2991.48 samples/sec Loss 1.6316 LearningRate 0.0013 Epoch: 17 Global Step: 219780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:40,224-Speed 3090.38 samples/sec Loss 1.6283 LearningRate 0.0013 Epoch: 17 Global Step: 219790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 
22:32:43,635-Speed 3002.04 samples/sec Loss 1.6607 LearningRate 0.0013 Epoch: 17 Global Step: 219800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:47,068-Speed 2983.87 samples/sec Loss 1.5886 LearningRate 0.0013 Epoch: 17 Global Step: 219810 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:50,452-Speed 3026.86 samples/sec Loss 1.6565 LearningRate 0.0013 Epoch: 17 Global Step: 219820 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:53,835-Speed 3027.44 samples/sec Loss 1.6311 LearningRate 0.0013 Epoch: 17 Global Step: 219830 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:32:57,283-Speed 2970.63 samples/sec Loss 1.6185 LearningRate 0.0013 Epoch: 17 Global Step: 219840 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:00,682-Speed 3013.56 samples/sec Loss 1.6754 LearningRate 0.0013 Epoch: 17 Global Step: 219850 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:04,029-Speed 3060.44 samples/sec Loss 1.5956 LearningRate 0.0013 Epoch: 17 Global Step: 219860 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:33:07,492-Speed 2957.53 samples/sec Loss 1.6496 LearningRate 0.0013 Epoch: 17 Global Step: 219870 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:33:10,888-Speed 3016.61 samples/sec Loss 1.6168 LearningRate 0.0013 Epoch: 17 Global Step: 219880 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:33:14,248-Speed 3048.32 samples/sec Loss 1.6400 LearningRate 0.0013 Epoch: 17 Global Step: 219890 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:17,656-Speed 3005.13 samples/sec Loss 1.6508 LearningRate 0.0013 Epoch: 17 Global Step: 219900 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:21,103-Speed 2971.16 samples/sec Loss 1.6378 LearningRate 0.0013 Epoch: 17 Global Step: 219910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:24,428-Speed 3080.97 samples/sec Loss 
1.5976 LearningRate 0.0013 Epoch: 17 Global Step: 219920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:27,769-Speed 3065.66 samples/sec Loss 1.6696 LearningRate 0.0013 Epoch: 17 Global Step: 219930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:31,129-Speed 3048.62 samples/sec Loss 1.7067 LearningRate 0.0013 Epoch: 17 Global Step: 219940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:34,480-Speed 3056.64 samples/sec Loss 1.6247 LearningRate 0.0013 Epoch: 17 Global Step: 219950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:33:37,853-Speed 3037.23 samples/sec Loss 1.5699 LearningRate 0.0013 Epoch: 17 Global Step: 219960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:33:41,275-Speed 2992.69 samples/sec Loss 1.5830 LearningRate 0.0013 Epoch: 17 Global Step: 219970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:33:44,694-Speed 2995.62 samples/sec Loss 1.6079 LearningRate 0.0013 Epoch: 17 Global Step: 219980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:33:48,114-Speed 2994.91 samples/sec Loss 1.6383 LearningRate 0.0013 Epoch: 17 Global Step: 219990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:33:51,499-Speed 3026.02 samples/sec Loss 1.6419 LearningRate 0.0013 Epoch: 17 Global Step: 220000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:33:54,886-Speed 3024.55 samples/sec Loss 1.5617 LearningRate 0.0013 Epoch: 17 Global Step: 220010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:33:58,219-Speed 3072.81 samples/sec Loss 1.6395 LearningRate 0.0013 Epoch: 17 Global Step: 220020 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:01,612-Speed 3018.77 samples/sec Loss 1.6309 LearningRate 0.0013 Epoch: 17 Global Step: 220030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:04,990-Speed 3032.15 samples/sec Loss 1.6669 LearningRate 0.0013 Epoch: 17 Global 
Step: 220040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:08,417-Speed 2989.18 samples/sec Loss 1.5884 LearningRate 0.0013 Epoch: 17 Global Step: 220050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:11,814-Speed 3015.28 samples/sec Loss 1.5602 LearningRate 0.0013 Epoch: 17 Global Step: 220060 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:34:15,195-Speed 3029.51 samples/sec Loss 1.5441 LearningRate 0.0013 Epoch: 17 Global Step: 220070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:34:18,511-Speed 3089.20 samples/sec Loss 1.6163 LearningRate 0.0013 Epoch: 17 Global Step: 220080 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:21,870-Speed 3049.81 samples/sec Loss 1.6014 LearningRate 0.0013 Epoch: 17 Global Step: 220090 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:25,273-Speed 3010.19 samples/sec Loss 1.5851 LearningRate 0.0013 Epoch: 17 Global Step: 220100 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:28,616-Speed 3064.32 samples/sec Loss 1.6721 LearningRate 0.0013 Epoch: 17 Global Step: 220110 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:31,953-Speed 3069.17 samples/sec Loss 1.6278 LearningRate 0.0013 Epoch: 17 Global Step: 220120 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:35,393-Speed 2977.21 samples/sec Loss 1.5879 LearningRate 0.0013 Epoch: 17 Global Step: 220130 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:38,744-Speed 3056.85 samples/sec Loss 1.6230 LearningRate 0.0013 Epoch: 17 Global Step: 220140 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:42,077-Speed 3073.76 samples/sec Loss 1.5348 LearningRate 0.0013 Epoch: 17 Global Step: 220150 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:45,430-Speed 3053.86 samples/sec Loss 1.6206 LearningRate 0.0013 Epoch: 17 Global Step: 220160 Fp16 Grad Scale: 16384 
Required: 3 hours Training: 2022-04-27 22:34:48,835-Speed 3008.39 samples/sec Loss 1.6698 LearningRate 0.0013 Epoch: 17 Global Step: 220170 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:34:52,141-Speed 3099.97 samples/sec Loss 1.5767 LearningRate 0.0013 Epoch: 17 Global Step: 220180 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:34:55,523-Speed 3028.90 samples/sec Loss 1.5993 LearningRate 0.0013 Epoch: 17 Global Step: 220190 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:34:58,862-Speed 3067.31 samples/sec Loss 1.7041 LearningRate 0.0013 Epoch: 17 Global Step: 220200 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:02,352-Speed 2934.69 samples/sec Loss 1.6519 LearningRate 0.0013 Epoch: 17 Global Step: 220210 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:05,770-Speed 2997.17 samples/sec Loss 1.5858 LearningRate 0.0013 Epoch: 17 Global Step: 220220 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:09,171-Speed 3011.75 samples/sec Loss 1.6149 LearningRate 0.0013 Epoch: 17 Global Step: 220230 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:12,559-Speed 3023.11 samples/sec Loss 1.5907 LearningRate 0.0013 Epoch: 17 Global Step: 220240 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:15,939-Speed 3029.54 samples/sec Loss 1.6435 LearningRate 0.0013 Epoch: 17 Global Step: 220250 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:19,322-Speed 3028.30 samples/sec Loss 1.6497 LearningRate 0.0013 Epoch: 17 Global Step: 220260 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:22,701-Speed 3031.14 samples/sec Loss 1.6330 LearningRate 0.0013 Epoch: 17 Global Step: 220270 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:35:26,066-Speed 3043.88 samples/sec Loss 1.5785 LearningRate 0.0013 Epoch: 17 Global Step: 220280 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
22:35:29,462-Speed 3016.14 samples/sec Loss 1.6892 LearningRate 0.0013 Epoch: 17 Global Step: 220290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:32,807-Speed 3062.19 samples/sec Loss 1.6364 LearningRate 0.0013 Epoch: 17 Global Step: 220300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:36,261-Speed 2964.98 samples/sec Loss 1.6090 LearningRate 0.0013 Epoch: 17 Global Step: 220310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:39,617-Speed 3052.08 samples/sec Loss 1.6237 LearningRate 0.0013 Epoch: 17 Global Step: 220320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:42,972-Speed 3052.85 samples/sec Loss 1.6165 LearningRate 0.0013 Epoch: 17 Global Step: 220330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:46,355-Speed 3028.23 samples/sec Loss 1.6067 LearningRate 0.0013 Epoch: 17 Global Step: 220340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:49,708-Speed 3054.34 samples/sec Loss 1.6390 LearningRate 0.0013 Epoch: 17 Global Step: 220350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:53,135-Speed 2989.16 samples/sec Loss 1.6609 LearningRate 0.0013 Epoch: 17 Global Step: 220360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:56,509-Speed 3035.46 samples/sec Loss 1.5830 LearningRate 0.0013 Epoch: 17 Global Step: 220370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:35:59,849-Speed 3066.47 samples/sec Loss 1.6051 LearningRate 0.0013 Epoch: 17 Global Step: 220380 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:03,277-Speed 2987.98 samples/sec Loss 1.6115 LearningRate 0.0013 Epoch: 17 Global Step: 220390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:06,607-Speed 3076.48 samples/sec Loss 1.6671 LearningRate 0.0013 Epoch: 17 Global Step: 220400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:09,977-Speed 3039.61 samples/sec Loss 
1.6518 LearningRate 0.0013 Epoch: 17 Global Step: 220410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:13,332-Speed 3052.34 samples/sec Loss 1.5940 LearningRate 0.0013 Epoch: 17 Global Step: 220420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:16,784-Speed 2967.04 samples/sec Loss 1.6164 LearningRate 0.0013 Epoch: 17 Global Step: 220430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:20,175-Speed 3021.05 samples/sec Loss 1.6174 LearningRate 0.0013 Epoch: 17 Global Step: 220440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:23,530-Speed 3052.27 samples/sec Loss 1.6275 LearningRate 0.0013 Epoch: 17 Global Step: 220450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:26,952-Speed 2993.66 samples/sec Loss 1.5891 LearningRate 0.0013 Epoch: 17 Global Step: 220460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:30,341-Speed 3022.67 samples/sec Loss 1.6410 LearningRate 0.0013 Epoch: 17 Global Step: 220470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:33,801-Speed 2960.23 samples/sec Loss 1.5639 LearningRate 0.0013 Epoch: 17 Global Step: 220480 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:36:37,160-Speed 3049.47 samples/sec Loss 1.5378 LearningRate 0.0013 Epoch: 17 Global Step: 220490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:40,544-Speed 3027.26 samples/sec Loss 1.6142 LearningRate 0.0013 Epoch: 17 Global Step: 220500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:43,976-Speed 2985.07 samples/sec Loss 1.5921 LearningRate 0.0013 Epoch: 17 Global Step: 220510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:47,384-Speed 3005.14 samples/sec Loss 1.5990 LearningRate 0.0013 Epoch: 17 Global Step: 220520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:50,804-Speed 2995.58 samples/sec Loss 1.5552 LearningRate 0.0013 Epoch: 17 Global 
Step: 220530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:54,222-Speed 2996.64 samples/sec Loss 1.5972 LearningRate 0.0013 Epoch: 17 Global Step: 220540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:36:57,608-Speed 3024.76 samples/sec Loss 1.6110 LearningRate 0.0013 Epoch: 17 Global Step: 220550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:00,961-Speed 3055.01 samples/sec Loss 1.6048 LearningRate 0.0013 Epoch: 17 Global Step: 220560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:04,391-Speed 2985.68 samples/sec Loss 1.6943 LearningRate 0.0013 Epoch: 17 Global Step: 220570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:07,726-Speed 3071.44 samples/sec Loss 1.6082 LearningRate 0.0013 Epoch: 17 Global Step: 220580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:11,108-Speed 3028.71 samples/sec Loss 1.5917 LearningRate 0.0013 Epoch: 17 Global Step: 220590 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:37:14,438-Speed 3075.64 samples/sec Loss 1.6670 LearningRate 0.0013 Epoch: 17 Global Step: 220600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:17,829-Speed 3021.13 samples/sec Loss 1.6108 LearningRate 0.0013 Epoch: 17 Global Step: 220610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:21,200-Speed 3038.66 samples/sec Loss 1.6059 LearningRate 0.0013 Epoch: 17 Global Step: 220620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:24,629-Speed 2987.19 samples/sec Loss 1.5768 LearningRate 0.0013 Epoch: 17 Global Step: 220630 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:28,080-Speed 2967.76 samples/sec Loss 1.6021 LearningRate 0.0013 Epoch: 17 Global Step: 220640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:31,512-Speed 2984.38 samples/sec Loss 1.6343 LearningRate 0.0012 Epoch: 17 Global Step: 220650 Fp16 Grad Scale: 32768 
Required: 3 hours Training: 2022-04-27 22:37:34,992-Speed 2943.20 samples/sec Loss 1.6009 LearningRate 0.0012 Epoch: 17 Global Step: 220660 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:38,417-Speed 2990.97 samples/sec Loss 1.5785 LearningRate 0.0012 Epoch: 17 Global Step: 220670 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:41,844-Speed 2988.70 samples/sec Loss 1.6544 LearningRate 0.0012 Epoch: 17 Global Step: 220680 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:45,292-Speed 2971.29 samples/sec Loss 1.6215 LearningRate 0.0012 Epoch: 17 Global Step: 220690 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:37:48,631-Speed 3067.41 samples/sec Loss 1.5969 LearningRate 0.0012 Epoch: 17 Global Step: 220700 Fp16 Grad Scale: 65536 Required: 3 hours Training: 2022-04-27 22:37:51,993-Speed 3046.78 samples/sec Loss 1.6252 LearningRate 0.0012 Epoch: 17 Global Step: 220710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:37:55,464-Speed 2950.92 samples/sec Loss 1.6443 LearningRate 0.0012 Epoch: 17 Global Step: 220720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:37:58,826-Speed 3046.11 samples/sec Loss 1.6541 LearningRate 0.0012 Epoch: 17 Global Step: 220730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:38:02,231-Speed 3008.35 samples/sec Loss 1.5948 LearningRate 0.0012 Epoch: 17 Global Step: 220740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:38:05,596-Speed 3044.16 samples/sec Loss 1.6888 LearningRate 0.0012 Epoch: 17 Global Step: 220750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:38:09,021-Speed 2990.49 samples/sec Loss 1.5493 LearningRate 0.0012 Epoch: 17 Global Step: 220760 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:12,401-Speed 3031.14 samples/sec Loss 1.6138 LearningRate 0.0012 Epoch: 17 Global Step: 220770 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 
22:38:15,773-Speed 3037.03 samples/sec Loss 1.6244 LearningRate 0.0012 Epoch: 17 Global Step: 220780 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:19,094-Speed 3084.37 samples/sec Loss 1.6971 LearningRate 0.0012 Epoch: 17 Global Step: 220790 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:22,465-Speed 3038.11 samples/sec Loss 1.6429 LearningRate 0.0012 Epoch: 17 Global Step: 220800 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:25,836-Speed 3038.79 samples/sec Loss 1.6129 LearningRate 0.0012 Epoch: 17 Global Step: 220810 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:29,264-Speed 2988.11 samples/sec Loss 1.6110 LearningRate 0.0012 Epoch: 17 Global Step: 220820 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:32,636-Speed 3038.12 samples/sec Loss 1.5834 LearningRate 0.0012 Epoch: 17 Global Step: 220830 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:36,073-Speed 2979.93 samples/sec Loss 1.5871 LearningRate 0.0012 Epoch: 17 Global Step: 220840 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:39,467-Speed 3018.15 samples/sec Loss 1.6200 LearningRate 0.0012 Epoch: 17 Global Step: 220850 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:42,782-Speed 3089.81 samples/sec Loss 1.6617 LearningRate 0.0012 Epoch: 17 Global Step: 220860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:38:46,117-Speed 3071.14 samples/sec Loss 1.6027 LearningRate 0.0012 Epoch: 17 Global Step: 220870 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:49,533-Speed 2998.31 samples/sec Loss 1.6425 LearningRate 0.0012 Epoch: 17 Global Step: 220880 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:52,984-Speed 2967.41 samples/sec Loss 1.6111 LearningRate 0.0012 Epoch: 17 Global Step: 220890 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:56,330-Speed 3061.56 samples/sec Loss 1.6856 
LearningRate 0.0012 Epoch: 17 Global Step: 220900 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:38:59,777-Speed 2971.51 samples/sec Loss 1.5922 LearningRate 0.0012 Epoch: 17 Global Step: 220910 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:39:03,221-Speed 2975.31 samples/sec Loss 1.5221 LearningRate 0.0012 Epoch: 17 Global Step: 220920 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:39:06,613-Speed 3019.42 samples/sec Loss 1.6167 LearningRate 0.0012 Epoch: 17 Global Step: 220930 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:39:10,061-Speed 2971.08 samples/sec Loss 1.6207 LearningRate 0.0012 Epoch: 17 Global Step: 220940 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:39:13,491-Speed 2986.30 samples/sec Loss 1.6786 LearningRate 0.0012 Epoch: 17 Global Step: 220950 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:39:16,890-Speed 3013.73 samples/sec Loss 1.6610 LearningRate 0.0012 Epoch: 17 Global Step: 220960 Fp16 Grad Scale: 8192 Required: 3 hours Training: 2022-04-27 22:39:20,364-Speed 2947.72 samples/sec Loss 1.5632 LearningRate 0.0012 Epoch: 17 Global Step: 220970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:23,755-Speed 3020.94 samples/sec Loss 1.5974 LearningRate 0.0012 Epoch: 17 Global Step: 220980 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:27,131-Speed 3034.22 samples/sec Loss 1.6284 LearningRate 0.0012 Epoch: 17 Global Step: 220990 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:30,607-Speed 2946.74 samples/sec Loss 1.6012 LearningRate 0.0012 Epoch: 17 Global Step: 221000 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:33,950-Speed 3064.10 samples/sec Loss 1.6478 LearningRate 0.0012 Epoch: 17 Global Step: 221010 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:37,280-Speed 3075.42 samples/sec Loss 1.6527 LearningRate 0.0012 Epoch: 17 Global Step: 221020 
Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:40,725-Speed 2973.75 samples/sec Loss 1.6707 LearningRate 0.0012 Epoch: 17 Global Step: 221030 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:44,118-Speed 3018.88 samples/sec Loss 1.5697 LearningRate 0.0012 Epoch: 17 Global Step: 221040 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:47,476-Speed 3049.80 samples/sec Loss 1.6082 LearningRate 0.0012 Epoch: 17 Global Step: 221050 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:50,822-Speed 3060.88 samples/sec Loss 1.6017 LearningRate 0.0012 Epoch: 17 Global Step: 221060 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:39:54,202-Speed 3030.61 samples/sec Loss 1.5948 LearningRate 0.0012 Epoch: 17 Global Step: 221070 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:39:57,531-Speed 3077.03 samples/sec Loss 1.6262 LearningRate 0.0012 Epoch: 17 Global Step: 221080 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:00,884-Speed 3054.59 samples/sec Loss 1.6419 LearningRate 0.0012 Epoch: 17 Global Step: 221090 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:04,294-Speed 3003.94 samples/sec Loss 1.5818 LearningRate 0.0012 Epoch: 17 Global Step: 221100 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:07,764-Speed 2952.17 samples/sec Loss 1.5813 LearningRate 0.0012 Epoch: 17 Global Step: 221110 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:11,191-Speed 2988.93 samples/sec Loss 1.6038 LearningRate 0.0012 Epoch: 17 Global Step: 221120 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:14,578-Speed 3023.98 samples/sec Loss 1.6198 LearningRate 0.0012 Epoch: 17 Global Step: 221130 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:17,901-Speed 3082.33 samples/sec Loss 1.6312 LearningRate 0.0012 Epoch: 17 Global Step: 221140 Fp16 Grad Scale: 32768 Required: 3 hours 
Training: 2022-04-27 22:40:21,335-Speed 2983.01 samples/sec Loss 1.5972 LearningRate 0.0012 Epoch: 17 Global Step: 221150 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:24,660-Speed 3080.90 samples/sec Loss 1.6120 LearningRate 0.0012 Epoch: 17 Global Step: 221160 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:28,005-Speed 3061.47 samples/sec Loss 1.6165 LearningRate 0.0012 Epoch: 17 Global Step: 221170 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:31,437-Speed 2985.44 samples/sec Loss 1.6381 LearningRate 0.0012 Epoch: 17 Global Step: 221180 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:34,841-Speed 3008.77 samples/sec Loss 1.5453 LearningRate 0.0012 Epoch: 17 Global Step: 221190 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:38,207-Speed 3042.42 samples/sec Loss 1.6518 LearningRate 0.0012 Epoch: 17 Global Step: 221200 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:41,647-Speed 2977.79 samples/sec Loss 1.6780 LearningRate 0.0012 Epoch: 17 Global Step: 221210 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:45,052-Speed 3008.18 samples/sec Loss 1.5851 LearningRate 0.0012 Epoch: 17 Global Step: 221220 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:48,441-Speed 3023.05 samples/sec Loss 1.6669 LearningRate 0.0012 Epoch: 17 Global Step: 221230 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:51,914-Speed 2948.71 samples/sec Loss 1.6120 LearningRate 0.0012 Epoch: 17 Global Step: 221240 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:55,374-Speed 2960.50 samples/sec Loss 1.6093 LearningRate 0.0012 Epoch: 17 Global Step: 221250 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:40:58,794-Speed 2994.76 samples/sec Loss 1.5974 LearningRate 0.0012 Epoch: 17 Global Step: 221260 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:02,192-Speed 
3015.05 samples/sec Loss 1.5981 LearningRate 0.0012 Epoch: 17 Global Step: 221270 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:05,542-Speed 3057.03 samples/sec Loss 1.6316 LearningRate 0.0012 Epoch: 17 Global Step: 221280 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:08,894-Speed 3056.40 samples/sec Loss 1.5835 LearningRate 0.0012 Epoch: 17 Global Step: 221290 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:12,310-Speed 2998.93 samples/sec Loss 1.6674 LearningRate 0.0012 Epoch: 17 Global Step: 221300 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:15,668-Speed 3049.75 samples/sec Loss 1.6355 LearningRate 0.0012 Epoch: 17 Global Step: 221310 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:19,046-Speed 3032.36 samples/sec Loss 1.6041 LearningRate 0.0012 Epoch: 17 Global Step: 221320 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:22,482-Speed 2981.39 samples/sec Loss 1.6292 LearningRate 0.0012 Epoch: 17 Global Step: 221330 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:25,991-Speed 2918.79 samples/sec Loss 1.6265 LearningRate 0.0012 Epoch: 17 Global Step: 221340 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:29,395-Speed 3009.24 samples/sec Loss 1.5632 LearningRate 0.0012 Epoch: 17 Global Step: 221350 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:32,850-Speed 2965.51 samples/sec Loss 1.6277 LearningRate 0.0012 Epoch: 17 Global Step: 221360 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:36,250-Speed 3011.73 samples/sec Loss 1.6675 LearningRate 0.0012 Epoch: 17 Global Step: 221370 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:39,707-Speed 2963.31 samples/sec Loss 1.6162 LearningRate 0.0012 Epoch: 17 Global Step: 221380 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:41:43,114-Speed 3006.44 samples/sec Loss 1.6078 
LearningRate 0.0012 Epoch: 17 Global Step: 221390 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:46,561-Speed 2971.76 samples/sec Loss 1.6846 LearningRate 0.0012 Epoch: 17 Global Step: 221400 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:49,952-Speed 3021.04 samples/sec Loss 1.6272 LearningRate 0.0012 Epoch: 17 Global Step: 221410 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:53,386-Speed 2982.24 samples/sec Loss 1.6043 LearningRate 0.0012 Epoch: 17 Global Step: 221420 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:41:56,800-Speed 3000.54 samples/sec Loss 1.6049 LearningRate 0.0012 Epoch: 17 Global Step: 221430 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:00,271-Speed 2951.11 samples/sec Loss 1.5632 LearningRate 0.0012 Epoch: 17 Global Step: 221440 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:03,623-Speed 3055.84 samples/sec Loss 1.6574 LearningRate 0.0012 Epoch: 17 Global Step: 221450 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:06,970-Speed 3059.86 samples/sec Loss 1.6777 LearningRate 0.0012 Epoch: 17 Global Step: 221460 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:10,376-Speed 3008.19 samples/sec Loss 1.6640 LearningRate 0.0012 Epoch: 17 Global Step: 221470 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:13,860-Speed 2940.05 samples/sec Loss 1.6237 LearningRate 0.0012 Epoch: 17 Global Step: 221480 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:17,292-Speed 2983.96 samples/sec Loss 1.6087 LearningRate 0.0012 Epoch: 17 Global Step: 221490 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:20,662-Speed 3040.30 samples/sec Loss 1.6035 LearningRate 0.0012 Epoch: 17 Global Step: 221500 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:24,066-Speed 3008.57 samples/sec Loss 1.5820 LearningRate 0.0012 Epoch: 17 Global Step: 
221510 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:27,395-Speed 3076.77 samples/sec Loss 1.6090 LearningRate 0.0012 Epoch: 17 Global Step: 221520 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:30,716-Speed 3084.66 samples/sec Loss 1.6211 LearningRate 0.0012 Epoch: 17 Global Step: 221530 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:34,027-Speed 3093.62 samples/sec Loss 1.6111 LearningRate 0.0012 Epoch: 17 Global Step: 221540 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:37,363-Speed 3070.24 samples/sec Loss 1.6443 LearningRate 0.0012 Epoch: 17 Global Step: 221550 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:40,771-Speed 3005.83 samples/sec Loss 1.6470 LearningRate 0.0012 Epoch: 17 Global Step: 221560 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:44,199-Speed 2987.51 samples/sec Loss 1.6123 LearningRate 0.0012 Epoch: 17 Global Step: 221570 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:47,618-Speed 2996.22 samples/sec Loss 1.6149 LearningRate 0.0012 Epoch: 17 Global Step: 221580 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:51,020-Speed 3011.18 samples/sec Loss 1.5956 LearningRate 0.0012 Epoch: 17 Global Step: 221590 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:54,375-Speed 3052.39 samples/sec Loss 1.6198 LearningRate 0.0012 Epoch: 17 Global Step: 221600 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:42:57,813-Speed 2979.36 samples/sec Loss 1.6180 LearningRate 0.0012 Epoch: 17 Global Step: 221610 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:01,211-Speed 3014.78 samples/sec Loss 1.5575 LearningRate 0.0012 Epoch: 17 Global Step: 221620 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:04,594-Speed 3027.53 samples/sec Loss 1.6182 LearningRate 0.0012 Epoch: 17 Global Step: 221630 Fp16 Grad Scale: 32768 Required: 3 
hours Training: 2022-04-27 22:43:07,953-Speed 3049.35 samples/sec Loss 1.5613 LearningRate 0.0012 Epoch: 17 Global Step: 221640 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:11,318-Speed 3044.04 samples/sec Loss 1.5875 LearningRate 0.0012 Epoch: 17 Global Step: 221650 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:14,721-Speed 3009.52 samples/sec Loss 1.5808 LearningRate 0.0012 Epoch: 17 Global Step: 221660 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:18,138-Speed 2998.30 samples/sec Loss 1.6331 LearningRate 0.0012 Epoch: 17 Global Step: 221670 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:21,521-Speed 3027.81 samples/sec Loss 1.6335 LearningRate 0.0012 Epoch: 17 Global Step: 221680 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:24,905-Speed 3026.37 samples/sec Loss 1.6607 LearningRate 0.0012 Epoch: 17 Global Step: 221690 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:28,335-Speed 2986.89 samples/sec Loss 1.6539 LearningRate 0.0012 Epoch: 17 Global Step: 221700 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:31,772-Speed 2979.79 samples/sec Loss 1.6223 LearningRate 0.0012 Epoch: 17 Global Step: 221710 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:35,180-Speed 3005.32 samples/sec Loss 1.6048 LearningRate 0.0012 Epoch: 17 Global Step: 221720 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:38,616-Speed 2981.04 samples/sec Loss 1.6087 LearningRate 0.0012 Epoch: 17 Global Step: 221730 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:42,095-Speed 2945.10 samples/sec Loss 1.6459 LearningRate 0.0012 Epoch: 17 Global Step: 221740 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:43:45,506-Speed 3002.42 samples/sec Loss 1.5989 LearningRate 0.0012 Epoch: 17 Global Step: 221750 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 
22:43:48,875-Speed 3039.99 samples/sec Loss 1.5849 LearningRate 0.0012 Epoch: 17 Global Step: 221760 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:52,323-Speed 2971.33 samples/sec Loss 1.6566 LearningRate 0.0012 Epoch: 17 Global Step: 221770 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:55,701-Speed 3031.83 samples/sec Loss 1.6401 LearningRate 0.0011 Epoch: 17 Global Step: 221780 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:43:59,098-Speed 3014.99 samples/sec Loss 1.6509 LearningRate 0.0011 Epoch: 17 Global Step: 221790 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:02,456-Speed 3050.79 samples/sec Loss 1.5511 LearningRate 0.0011 Epoch: 17 Global Step: 221800 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:05,859-Speed 3010.01 samples/sec Loss 1.6286 LearningRate 0.0011 Epoch: 17 Global Step: 221810 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:09,353-Speed 2930.90 samples/sec Loss 1.5851 LearningRate 0.0011 Epoch: 17 Global Step: 221820 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:12,715-Speed 3047.36 samples/sec Loss 1.6744 LearningRate 0.0011 Epoch: 17 Global Step: 221830 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:16,027-Speed 3091.76 samples/sec Loss 1.6493 LearningRate 0.0011 Epoch: 17 Global Step: 221840 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:19,397-Speed 3039.80 samples/sec Loss 1.6090 LearningRate 0.0011 Epoch: 17 Global Step: 221850 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:22,870-Speed 2949.53 samples/sec Loss 1.5674 LearningRate 0.0011 Epoch: 17 Global Step: 221860 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:26,292-Speed 2992.61 samples/sec Loss 1.5922 LearningRate 0.0011 Epoch: 17 Global Step: 221870 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:29,699-Speed 3006.58 samples/sec Loss 
1.5805 LearningRate 0.0011 Epoch: 17 Global Step: 221880 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:33,068-Speed 3040.38 samples/sec Loss 1.6329 LearningRate 0.0011 Epoch: 17 Global Step: 221890 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:36,429-Speed 3047.92 samples/sec Loss 1.5936 LearningRate 0.0011 Epoch: 17 Global Step: 221900 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:44:39,785-Speed 3052.20 samples/sec Loss 1.6402 LearningRate 0.0011 Epoch: 17 Global Step: 221910 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:43,156-Speed 3038.82 samples/sec Loss 1.6063 LearningRate 0.0011 Epoch: 17 Global Step: 221920 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:46,538-Speed 3027.96 samples/sec Loss 1.5792 LearningRate 0.0011 Epoch: 17 Global Step: 221930 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:49,891-Speed 3055.18 samples/sec Loss 1.5809 LearningRate 0.0011 Epoch: 17 Global Step: 221940 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:53,343-Speed 2967.13 samples/sec Loss 1.6059 LearningRate 0.0011 Epoch: 17 Global Step: 221950 Fp16 Grad Scale: 32768 Required: 3 hours Training: 2022-04-27 22:44:56,743-Speed 3012.31 samples/sec Loss 1.6705 LearningRate 0.0011 Epoch: 17 Global Step: 221960 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:45:00,146-Speed 3010.52 samples/sec Loss 1.6543 LearningRate 0.0011 Epoch: 17 Global Step: 221970 Fp16 Grad Scale: 16384 Required: 3 hours Training: 2022-04-27 22:45:03,498-Speed 3055.64 samples/sec Loss 1.5948 LearningRate 0.0011 Epoch: 17 Global Step: 221980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:06,854-Speed 3051.81 samples/sec Loss 1.5985 LearningRate 0.0011 Epoch: 17 Global Step: 221990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:10,320-Speed 2955.61 samples/sec Loss 1.6581 LearningRate 0.0011 Epoch: 17 Global 
Step: 222000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:13,741-Speed 2994.26 samples/sec Loss 1.6249 LearningRate 0.0011 Epoch: 17 Global Step: 222010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:17,103-Speed 3046.04 samples/sec Loss 1.5985 LearningRate 0.0011 Epoch: 17 Global Step: 222020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:20,538-Speed 2982.59 samples/sec Loss 1.6662 LearningRate 0.0011 Epoch: 17 Global Step: 222030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:23,911-Speed 3036.78 samples/sec Loss 1.6499 LearningRate 0.0011 Epoch: 17 Global Step: 222040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:27,298-Speed 3023.55 samples/sec Loss 1.6421 LearningRate 0.0011 Epoch: 17 Global Step: 222050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:30,661-Speed 3046.02 samples/sec Loss 1.6179 LearningRate 0.0011 Epoch: 17 Global Step: 222060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:45:34,062-Speed 3011.52 samples/sec Loss 1.5877 LearningRate 0.0011 Epoch: 17 Global Step: 222070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:45:37,435-Speed 3036.80 samples/sec Loss 1.5912 LearningRate 0.0011 Epoch: 17 Global Step: 222080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:45:40,870-Speed 2981.94 samples/sec Loss 1.6330 LearningRate 0.0011 Epoch: 17 Global Step: 222090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:45:44,227-Speed 3051.67 samples/sec Loss 1.5667 LearningRate 0.0011 Epoch: 17 Global Step: 222100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:47,682-Speed 2964.18 samples/sec Loss 1.6179 LearningRate 0.0011 Epoch: 17 Global Step: 222110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:51,136-Speed 2965.27 samples/sec Loss 1.5955 LearningRate 0.0011 Epoch: 17 Global Step: 222120 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 22:45:54,526-Speed 3021.97 samples/sec Loss 1.6244 LearningRate 0.0011 Epoch: 17 Global Step: 222130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:45:57,932-Speed 3006.88 samples/sec Loss 1.6841 LearningRate 0.0011 Epoch: 17 Global Step: 222140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:46:01,266-Speed 3072.75 samples/sec Loss 1.6135 LearningRate 0.0011 Epoch: 17 Global Step: 222150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:46:04,623-Speed 3050.58 samples/sec Loss 1.6417 LearningRate 0.0011 Epoch: 17 Global Step: 222160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:46:07,972-Speed 3058.57 samples/sec Loss 1.6281 LearningRate 0.0011 Epoch: 17 Global Step: 222170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:46:11,382-Speed 3004.31 samples/sec Loss 1.6046 LearningRate 0.0011 Epoch: 17 Global Step: 222180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:46:14,775-Speed 3018.56 samples/sec Loss 1.5983 LearningRate 0.0011 Epoch: 17 Global Step: 222190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:46:18,140-Speed 3045.13 samples/sec Loss 1.6047 LearningRate 0.0011 Epoch: 17 Global Step: 222200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:21,609-Speed 2952.15 samples/sec Loss 1.6017 LearningRate 0.0011 Epoch: 17 Global Step: 222210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:25,011-Speed 3011.58 samples/sec Loss 1.5868 LearningRate 0.0011 Epoch: 17 Global Step: 222220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:28,456-Speed 2973.30 samples/sec Loss 1.5972 LearningRate 0.0011 Epoch: 17 Global Step: 222230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:31,792-Speed 3070.01 samples/sec Loss 1.6392 LearningRate 0.0011 Epoch: 17 Global Step: 222240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
22:46:35,153-Speed 3048.03 samples/sec Loss 1.5612 LearningRate 0.0011 Epoch: 17 Global Step: 222250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:38,529-Speed 3033.65 samples/sec Loss 1.6138 LearningRate 0.0011 Epoch: 17 Global Step: 222260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:41,902-Speed 3036.75 samples/sec Loss 1.6384 LearningRate 0.0011 Epoch: 17 Global Step: 222270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:45,260-Speed 3050.37 samples/sec Loss 1.5767 LearningRate 0.0011 Epoch: 17 Global Step: 222280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:48,605-Speed 3061.87 samples/sec Loss 1.5914 LearningRate 0.0011 Epoch: 17 Global Step: 222290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:52,067-Speed 2958.93 samples/sec Loss 1.6248 LearningRate 0.0011 Epoch: 17 Global Step: 222300 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:46:55,459-Speed 3019.08 samples/sec Loss 1.6132 LearningRate 0.0011 Epoch: 17 Global Step: 222310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:46:58,865-Speed 3007.76 samples/sec Loss 1.6469 LearningRate 0.0011 Epoch: 17 Global Step: 222320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:02,275-Speed 3003.40 samples/sec Loss 1.6340 LearningRate 0.0011 Epoch: 17 Global Step: 222330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:05,629-Speed 3054.82 samples/sec Loss 1.6237 LearningRate 0.0011 Epoch: 17 Global Step: 222340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:08,982-Speed 3054.78 samples/sec Loss 1.5743 LearningRate 0.0011 Epoch: 17 Global Step: 222350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:12,384-Speed 3010.64 samples/sec Loss 1.6240 LearningRate 0.0011 Epoch: 17 Global Step: 222360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:15,798-Speed 3000.28 samples/sec Loss 
1.6054 LearningRate 0.0011 Epoch: 17 Global Step: 222370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:19,213-Speed 3000.05 samples/sec Loss 1.6274 LearningRate 0.0011 Epoch: 17 Global Step: 222380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:22,604-Speed 3021.23 samples/sec Loss 1.6081 LearningRate 0.0011 Epoch: 17 Global Step: 222390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:25,979-Speed 3035.00 samples/sec Loss 1.6123 LearningRate 0.0011 Epoch: 17 Global Step: 222400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:29,368-Speed 3022.21 samples/sec Loss 1.6007 LearningRate 0.0011 Epoch: 17 Global Step: 222410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:47:32,748-Speed 3031.16 samples/sec Loss 1.6056 LearningRate 0.0011 Epoch: 17 Global Step: 222420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:36,127-Speed 3031.06 samples/sec Loss 1.6264 LearningRate 0.0011 Epoch: 17 Global Step: 222430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:39,528-Speed 3011.74 samples/sec Loss 1.5778 LearningRate 0.0011 Epoch: 17 Global Step: 222440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:42,958-Speed 2986.10 samples/sec Loss 1.6110 LearningRate 0.0011 Epoch: 17 Global Step: 222450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:46,342-Speed 3027.08 samples/sec Loss 1.6080 LearningRate 0.0011 Epoch: 17 Global Step: 222460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:49,686-Speed 3063.31 samples/sec Loss 1.6151 LearningRate 0.0011 Epoch: 17 Global Step: 222470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:53,118-Speed 2985.14 samples/sec Loss 1.5857 LearningRate 0.0011 Epoch: 17 Global Step: 222480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:56,467-Speed 3058.37 samples/sec Loss 1.6368 LearningRate 0.0011 Epoch: 17 Global 
Step: 222490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:47:59,854-Speed 3024.26 samples/sec Loss 1.5945 LearningRate 0.0011 Epoch: 17 Global Step: 222500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:48:03,269-Speed 2999.01 samples/sec Loss 1.6240 LearningRate 0.0011 Epoch: 17 Global Step: 222510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:48:06,644-Speed 3035.58 samples/sec Loss 1.6818 LearningRate 0.0011 Epoch: 17 Global Step: 222520 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:48:10,030-Speed 3024.90 samples/sec Loss 1.5985 LearningRate 0.0011 Epoch: 17 Global Step: 222530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:48:13,462-Speed 2984.33 samples/sec Loss 1.6165 LearningRate 0.0011 Epoch: 17 Global Step: 222540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:48:16,825-Speed 3047.84 samples/sec Loss 1.5852 LearningRate 0.0011 Epoch: 17 Global Step: 222550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:20,200-Speed 3034.91 samples/sec Loss 1.5897 LearningRate 0.0011 Epoch: 17 Global Step: 222560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:23,576-Speed 3034.17 samples/sec Loss 1.6127 LearningRate 0.0011 Epoch: 17 Global Step: 222570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:26,938-Speed 3046.00 samples/sec Loss 1.6649 LearningRate 0.0011 Epoch: 17 Global Step: 222580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:30,321-Speed 3027.94 samples/sec Loss 1.6450 LearningRate 0.0011 Epoch: 17 Global Step: 222590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:33,723-Speed 3010.66 samples/sec Loss 1.5816 LearningRate 0.0011 Epoch: 17 Global Step: 222600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:37,147-Speed 2991.90 samples/sec Loss 1.6229 LearningRate 0.0011 Epoch: 17 Global Step: 222610 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 22:48:40,641-Speed 2931.57 samples/sec Loss 1.5955 LearningRate 0.0011 Epoch: 17 Global Step: 222620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:44,080-Speed 2978.44 samples/sec Loss 1.5260 LearningRate 0.0011 Epoch: 17 Global Step: 222630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:47,473-Speed 3019.07 samples/sec Loss 1.5713 LearningRate 0.0011 Epoch: 17 Global Step: 222640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:48:50,796-Speed 3082.60 samples/sec Loss 1.6499 LearningRate 0.0011 Epoch: 17 Global Step: 222650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:48:54,150-Speed 3053.70 samples/sec Loss 1.6210 LearningRate 0.0011 Epoch: 17 Global Step: 222660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:48:57,514-Speed 3044.23 samples/sec Loss 1.6168 LearningRate 0.0011 Epoch: 17 Global Step: 222670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:00,858-Speed 3063.48 samples/sec Loss 1.5380 LearningRate 0.0011 Epoch: 17 Global Step: 222680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:04,251-Speed 3018.84 samples/sec Loss 1.6291 LearningRate 0.0011 Epoch: 17 Global Step: 222690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:07,669-Speed 2996.72 samples/sec Loss 1.5552 LearningRate 0.0011 Epoch: 17 Global Step: 222700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:11,077-Speed 3005.73 samples/sec Loss 1.6088 LearningRate 0.0011 Epoch: 17 Global Step: 222710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:14,419-Speed 3064.28 samples/sec Loss 1.6230 LearningRate 0.0011 Epoch: 17 Global Step: 222720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:17,768-Speed 3058.57 samples/sec Loss 1.6674 LearningRate 0.0011 Epoch: 17 Global Step: 222730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
22:49:21,142-Speed 3036.61 samples/sec Loss 1.5664 LearningRate 0.0011 Epoch: 17 Global Step: 222740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:24,496-Speed 3053.95 samples/sec Loss 1.6118 LearningRate 0.0011 Epoch: 17 Global Step: 222750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:27,933-Speed 2980.22 samples/sec Loss 1.6509 LearningRate 0.0011 Epoch: 17 Global Step: 222760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:31,331-Speed 3013.71 samples/sec Loss 1.6628 LearningRate 0.0011 Epoch: 17 Global Step: 222770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:35,319-Speed 2568.23 samples/sec Loss 1.6359 LearningRate 0.0011 Epoch: 17 Global Step: 222780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:38,681-Speed 3046.79 samples/sec Loss 1.6140 LearningRate 0.0011 Epoch: 17 Global Step: 222790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:42,074-Speed 3019.40 samples/sec Loss 1.5895 LearningRate 0.0011 Epoch: 17 Global Step: 222800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:45,453-Speed 3030.98 samples/sec Loss 1.6364 LearningRate 0.0011 Epoch: 17 Global Step: 222810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:48,798-Speed 3062.62 samples/sec Loss 1.5735 LearningRate 0.0011 Epoch: 17 Global Step: 222820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:49:52,169-Speed 3038.70 samples/sec Loss 1.6103 LearningRate 0.0011 Epoch: 17 Global Step: 222830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:55,507-Speed 3068.89 samples/sec Loss 1.5956 LearningRate 0.0011 Epoch: 17 Global Step: 222840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:49:58,879-Speed 3037.21 samples/sec Loss 1.5427 LearningRate 0.0011 Epoch: 17 Global Step: 222850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:02,201-Speed 3083.42 samples/sec Loss 
1.6540 LearningRate 0.0011 Epoch: 17 Global Step: 222860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:05,552-Speed 3057.09 samples/sec Loss 1.6061 LearningRate 0.0011 Epoch: 17 Global Step: 222870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:08,943-Speed 3020.99 samples/sec Loss 1.5987 LearningRate 0.0011 Epoch: 17 Global Step: 222880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:12,359-Speed 2998.77 samples/sec Loss 1.5974 LearningRate 0.0011 Epoch: 17 Global Step: 222890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:15,722-Speed 3045.56 samples/sec Loss 1.6251 LearningRate 0.0011 Epoch: 17 Global Step: 222900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:19,174-Speed 2967.53 samples/sec Loss 1.5622 LearningRate 0.0011 Epoch: 17 Global Step: 222910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:22,600-Speed 2989.64 samples/sec Loss 1.6505 LearningRate 0.0011 Epoch: 17 Global Step: 222920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:25,998-Speed 3014.24 samples/sec Loss 1.6152 LearningRate 0.0011 Epoch: 17 Global Step: 222930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:29,369-Speed 3038.72 samples/sec Loss 1.6102 LearningRate 0.0011 Epoch: 17 Global Step: 222940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:32,814-Speed 2973.48 samples/sec Loss 1.6115 LearningRate 0.0011 Epoch: 17 Global Step: 222950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:36,258-Speed 2973.90 samples/sec Loss 1.6062 LearningRate 0.0011 Epoch: 17 Global Step: 222960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:39,698-Speed 2977.64 samples/sec Loss 1.5570 LearningRate 0.0010 Epoch: 17 Global Step: 222970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:43,063-Speed 3043.62 samples/sec Loss 1.5709 LearningRate 0.0010 Epoch: 17 Global 
Step: 222980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:50:46,383-Speed 3085.01 samples/sec Loss 1.5802 LearningRate 0.0010 Epoch: 17 Global Step: 222990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:49,776-Speed 3019.03 samples/sec Loss 1.6140 LearningRate 0.0010 Epoch: 17 Global Step: 223000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:53,138-Speed 3046.20 samples/sec Loss 1.5713 LearningRate 0.0010 Epoch: 17 Global Step: 223010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:50:56,605-Speed 2954.64 samples/sec Loss 1.5581 LearningRate 0.0010 Epoch: 17 Global Step: 223020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:00,099-Speed 2931.35 samples/sec Loss 1.6203 LearningRate 0.0010 Epoch: 17 Global Step: 223030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:03,492-Speed 3019.27 samples/sec Loss 1.6180 LearningRate 0.0010 Epoch: 17 Global Step: 223040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:06,865-Speed 3037.13 samples/sec Loss 1.5776 LearningRate 0.0010 Epoch: 17 Global Step: 223050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:10,226-Speed 3047.09 samples/sec Loss 1.6406 LearningRate 0.0010 Epoch: 17 Global Step: 223060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:13,616-Speed 3021.64 samples/sec Loss 1.5823 LearningRate 0.0010 Epoch: 17 Global Step: 223070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:16,968-Speed 3056.14 samples/sec Loss 1.5603 LearningRate 0.0010 Epoch: 17 Global Step: 223080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:20,312-Speed 3062.57 samples/sec Loss 1.6672 LearningRate 0.0010 Epoch: 17 Global Step: 223090 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:51:23,667-Speed 3053.20 samples/sec Loss 1.5904 LearningRate 0.0010 Epoch: 17 Global Step: 223100 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 22:51:27,127-Speed 2960.07 samples/sec Loss 1.5651 LearningRate 0.0010 Epoch: 17 Global Step: 223110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:30,538-Speed 3003.28 samples/sec Loss 1.5322 LearningRate 0.0010 Epoch: 17 Global Step: 223120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:33,900-Speed 3046.89 samples/sec Loss 1.5961 LearningRate 0.0010 Epoch: 17 Global Step: 223130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:37,339-Speed 2978.03 samples/sec Loss 1.6181 LearningRate 0.0010 Epoch: 17 Global Step: 223140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:40,653-Speed 3090.77 samples/sec Loss 1.6016 LearningRate 0.0010 Epoch: 17 Global Step: 223150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:44,075-Speed 2993.26 samples/sec Loss 1.6072 LearningRate 0.0010 Epoch: 17 Global Step: 223160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:47,418-Speed 3064.32 samples/sec Loss 1.5894 LearningRate 0.0010 Epoch: 17 Global Step: 223170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:50,840-Speed 2993.12 samples/sec Loss 1.5923 LearningRate 0.0010 Epoch: 17 Global Step: 223180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:54,201-Speed 3048.70 samples/sec Loss 1.6384 LearningRate 0.0010 Epoch: 17 Global Step: 223190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:51:57,572-Speed 3037.84 samples/sec Loss 1.6155 LearningRate 0.0010 Epoch: 17 Global Step: 223200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:01,021-Speed 2970.67 samples/sec Loss 1.6889 LearningRate 0.0010 Epoch: 17 Global Step: 223210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:04,416-Speed 3016.53 samples/sec Loss 1.5852 LearningRate 0.0010 Epoch: 17 Global Step: 223220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
22:52:07,754-Speed 3068.99 samples/sec Loss 1.6729 LearningRate 0.0010 Epoch: 17 Global Step: 223230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:11,101-Speed 3060.06 samples/sec Loss 1.5842 LearningRate 0.0010 Epoch: 17 Global Step: 223240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:14,473-Speed 3037.74 samples/sec Loss 1.6069 LearningRate 0.0010 Epoch: 17 Global Step: 223250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:17,850-Speed 3033.36 samples/sec Loss 1.5350 LearningRate 0.0010 Epoch: 17 Global Step: 223260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:21,301-Speed 2968.40 samples/sec Loss 1.6060 LearningRate 0.0010 Epoch: 17 Global Step: 223270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:24,756-Speed 2964.95 samples/sec Loss 1.6083 LearningRate 0.0010 Epoch: 17 Global Step: 223280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:28,171-Speed 2999.02 samples/sec Loss 1.6179 LearningRate 0.0010 Epoch: 17 Global Step: 223290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:31,538-Speed 3042.87 samples/sec Loss 1.5595 LearningRate 0.0010 Epoch: 17 Global Step: 223300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:52:34,911-Speed 3036.12 samples/sec Loss 1.6989 LearningRate 0.0010 Epoch: 17 Global Step: 223310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:52:38,353-Speed 2976.42 samples/sec Loss 1.5609 LearningRate 0.0010 Epoch: 17 Global Step: 223320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:52:41,717-Speed 3044.61 samples/sec Loss 1.5723 LearningRate 0.0010 Epoch: 17 Global Step: 223330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:52:45,082-Speed 3043.92 samples/sec Loss 1.6360 LearningRate 0.0010 Epoch: 17 Global Step: 223340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:48,478-Speed 3016.86 samples/sec Loss 
1.5677 LearningRate 0.0010 Epoch: 17 Global Step: 223350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:51,893-Speed 2998.90 samples/sec Loss 1.6001 LearningRate 0.0010 Epoch: 17 Global Step: 223360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:55,286-Speed 3019.05 samples/sec Loss 1.5883 LearningRate 0.0010 Epoch: 17 Global Step: 223370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:52:58,661-Speed 3034.71 samples/sec Loss 1.6071 LearningRate 0.0010 Epoch: 17 Global Step: 223380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:02,100-Speed 2978.45 samples/sec Loss 1.6109 LearningRate 0.0010 Epoch: 17 Global Step: 223390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:05,443-Speed 3064.07 samples/sec Loss 1.5651 LearningRate 0.0010 Epoch: 17 Global Step: 223400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:08,830-Speed 3024.12 samples/sec Loss 1.5847 LearningRate 0.0010 Epoch: 17 Global Step: 223410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:12,279-Speed 2969.55 samples/sec Loss 1.5964 LearningRate 0.0010 Epoch: 17 Global Step: 223420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:15,704-Speed 2991.51 samples/sec Loss 1.6255 LearningRate 0.0010 Epoch: 17 Global Step: 223430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:19,038-Speed 3072.00 samples/sec Loss 1.6855 LearningRate 0.0010 Epoch: 17 Global Step: 223440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:53:22,421-Speed 3028.19 samples/sec Loss 1.6379 LearningRate 0.0010 Epoch: 17 Global Step: 223450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:53:25,794-Speed 3036.49 samples/sec Loss 1.5715 LearningRate 0.0010 Epoch: 17 Global Step: 223460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:29,142-Speed 3059.74 samples/sec Loss 1.5846 LearningRate 0.0010 Epoch: 17 Global 
Step: 223470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:32,500-Speed 3049.87 samples/sec Loss 1.5768 LearningRate 0.0010 Epoch: 17 Global Step: 223480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:35,859-Speed 3050.31 samples/sec Loss 1.5279 LearningRate 0.0010 Epoch: 17 Global Step: 223490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:39,231-Speed 3036.89 samples/sec Loss 1.5853 LearningRate 0.0010 Epoch: 17 Global Step: 223500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:42,615-Speed 3026.56 samples/sec Loss 1.5842 LearningRate 0.0010 Epoch: 17 Global Step: 223510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:46,048-Speed 2983.82 samples/sec Loss 1.5701 LearningRate 0.0010 Epoch: 17 Global Step: 223520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:49,439-Speed 3021.03 samples/sec Loss 1.5941 LearningRate 0.0010 Epoch: 17 Global Step: 223530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:52,848-Speed 3004.96 samples/sec Loss 1.6202 LearningRate 0.0010 Epoch: 17 Global Step: 223540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:56,254-Speed 3007.20 samples/sec Loss 1.6191 LearningRate 0.0010 Epoch: 17 Global Step: 223550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:53:59,647-Speed 3018.43 samples/sec Loss 1.5592 LearningRate 0.0010 Epoch: 17 Global Step: 223560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:03,261-Speed 2834.05 samples/sec Loss 1.6009 LearningRate 0.0010 Epoch: 17 Global Step: 223570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:36,514-Speed 307.96 samples/sec Loss 1.5028 LearningRate 0.0010 Epoch: 18 Global Step: 223580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:39,869-Speed 3052.87 samples/sec Loss 1.0186 LearningRate 0.0010 Epoch: 18 Global Step: 223590 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 22:54:43,473-Speed 2842.78 samples/sec Loss 1.0744 LearningRate 0.0010 Epoch: 18 Global Step: 223600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:46,860-Speed 3024.28 samples/sec Loss 1.0923 LearningRate 0.0010 Epoch: 18 Global Step: 223610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:50,255-Speed 3016.19 samples/sec Loss 1.0870 LearningRate 0.0010 Epoch: 18 Global Step: 223620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:53,615-Speed 3049.47 samples/sec Loss 1.0584 LearningRate 0.0010 Epoch: 18 Global Step: 223630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:54:57,052-Speed 2980.10 samples/sec Loss 1.0362 LearningRate 0.0010 Epoch: 18 Global Step: 223640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:00,383-Speed 3076.43 samples/sec Loss 1.0382 LearningRate 0.0010 Epoch: 18 Global Step: 223650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:03,781-Speed 3014.42 samples/sec Loss 1.0248 LearningRate 0.0010 Epoch: 18 Global Step: 223660 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:55:07,171-Speed 3021.04 samples/sec Loss 1.0418 LearningRate 0.0010 Epoch: 18 Global Step: 223670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:10,528-Speed 3051.60 samples/sec Loss 1.0276 LearningRate 0.0010 Epoch: 18 Global Step: 223680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:13,938-Speed 3004.20 samples/sec Loss 1.0236 LearningRate 0.0010 Epoch: 18 Global Step: 223690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:17,276-Speed 3067.94 samples/sec Loss 1.0607 LearningRate 0.0010 Epoch: 18 Global Step: 223700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:20,679-Speed 3010.28 samples/sec Loss 1.0488 LearningRate 0.0010 Epoch: 18 Global Step: 223710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
22:55:24,113-Speed 2984.02 samples/sec Loss 1.0765 LearningRate 0.0010 Epoch: 18 Global Step: 223720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:55:27,451-Speed 3068.23 samples/sec Loss 1.0895 LearningRate 0.0010 Epoch: 18 Global Step: 223730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:30,862-Speed 3003.49 samples/sec Loss 0.9874 LearningRate 0.0010 Epoch: 18 Global Step: 223740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:34,194-Speed 3073.54 samples/sec Loss 1.0460 LearningRate 0.0010 Epoch: 18 Global Step: 223750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:37,570-Speed 3034.30 samples/sec Loss 1.0821 LearningRate 0.0010 Epoch: 18 Global Step: 223760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:40,973-Speed 3009.77 samples/sec Loss 1.0897 LearningRate 0.0010 Epoch: 18 Global Step: 223770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:44,343-Speed 3038.95 samples/sec Loss 1.0823 LearningRate 0.0010 Epoch: 18 Global Step: 223780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:47,690-Speed 3060.58 samples/sec Loss 1.0314 LearningRate 0.0010 Epoch: 18 Global Step: 223790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:51,174-Speed 2940.65 samples/sec Loss 1.0480 LearningRate 0.0010 Epoch: 18 Global Step: 223800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:54,586-Speed 3002.19 samples/sec Loss 1.0398 LearningRate 0.0010 Epoch: 18 Global Step: 223810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:55:58,036-Speed 2968.25 samples/sec Loss 1.0921 LearningRate 0.0010 Epoch: 18 Global Step: 223820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:56:01,431-Speed 3017.38 samples/sec Loss 1.0117 LearningRate 0.0010 Epoch: 18 Global Step: 223830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:04,837-Speed 3007.25 samples/sec Loss 
1.0539 LearningRate 0.0010 Epoch: 18 Global Step: 223840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:08,397-Speed 2878.16 samples/sec Loss 1.0420 LearningRate 0.0010 Epoch: 18 Global Step: 223850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:11,724-Speed 3078.85 samples/sec Loss 1.0404 LearningRate 0.0010 Epoch: 18 Global Step: 223860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:15,246-Speed 2907.72 samples/sec Loss 1.0633 LearningRate 0.0010 Epoch: 18 Global Step: 223870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:18,687-Speed 2977.53 samples/sec Loss 1.0453 LearningRate 0.0010 Epoch: 18 Global Step: 223880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:22,135-Speed 2970.29 samples/sec Loss 1.0374 LearningRate 0.0010 Epoch: 18 Global Step: 223890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:25,591-Speed 2964.27 samples/sec Loss 1.0502 LearningRate 0.0010 Epoch: 18 Global Step: 223900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:28,999-Speed 3004.96 samples/sec Loss 1.0487 LearningRate 0.0010 Epoch: 18 Global Step: 223910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:32,471-Speed 2949.95 samples/sec Loss 1.0595 LearningRate 0.0010 Epoch: 18 Global Step: 223920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:35,949-Speed 2944.97 samples/sec Loss 1.0410 LearningRate 0.0010 Epoch: 18 Global Step: 223930 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:56:39,365-Speed 2999.45 samples/sec Loss 1.0324 LearningRate 0.0010 Epoch: 18 Global Step: 223940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:42,754-Speed 3021.83 samples/sec Loss 1.0420 LearningRate 0.0010 Epoch: 18 Global Step: 223950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:46,125-Speed 3039.02 samples/sec Loss 1.0318 LearningRate 0.0010 Epoch: 18 Global 
Step: 223960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:49,540-Speed 2999.65 samples/sec Loss 1.0662 LearningRate 0.0010 Epoch: 18 Global Step: 223970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:52,895-Speed 3052.95 samples/sec Loss 1.0209 LearningRate 0.0010 Epoch: 18 Global Step: 223980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:56,270-Speed 3034.47 samples/sec Loss 1.0428 LearningRate 0.0010 Epoch: 18 Global Step: 223990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:56:59,719-Speed 2970.03 samples/sec Loss 0.9975 LearningRate 0.0010 Epoch: 18 Global Step: 224000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:03,116-Speed 3015.82 samples/sec Loss 1.0383 LearningRate 0.0010 Epoch: 18 Global Step: 224010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:06,601-Speed 2939.31 samples/sec Loss 1.0710 LearningRate 0.0010 Epoch: 18 Global Step: 224020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:10,037-Speed 2980.93 samples/sec Loss 0.9964 LearningRate 0.0010 Epoch: 18 Global Step: 224030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:13,414-Speed 3032.87 samples/sec Loss 1.0580 LearningRate 0.0010 Epoch: 18 Global Step: 224040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:57:16,909-Speed 2931.18 samples/sec Loss 1.0164 LearningRate 0.0010 Epoch: 18 Global Step: 224050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:57:20,282-Speed 3036.86 samples/sec Loss 1.0568 LearningRate 0.0010 Epoch: 18 Global Step: 224060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:57:23,795-Speed 2915.35 samples/sec Loss 1.0536 LearningRate 0.0010 Epoch: 18 Global Step: 224070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:57:27,232-Speed 2980.07 samples/sec Loss 1.0230 LearningRate 0.0010 Epoch: 18 Global Step: 224080 Fp16 Grad Scale: 65536 
Required: 2 hours Training: 2022-04-27 22:57:30,652-Speed 2995.25 samples/sec Loss 1.0385 LearningRate 0.0010 Epoch: 18 Global Step: 224090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:34,099-Speed 2971.89 samples/sec Loss 1.0053 LearningRate 0.0010 Epoch: 18 Global Step: 224100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:37,472-Speed 3036.56 samples/sec Loss 1.0313 LearningRate 0.0010 Epoch: 18 Global Step: 224110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:40,816-Speed 3063.20 samples/sec Loss 1.0374 LearningRate 0.0010 Epoch: 18 Global Step: 224120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:44,246-Speed 2986.36 samples/sec Loss 1.0414 LearningRate 0.0010 Epoch: 18 Global Step: 224130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:57:47,639-Speed 3019.02 samples/sec Loss 1.0443 LearningRate 0.0010 Epoch: 18 Global Step: 224140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:57:51,054-Speed 2999.27 samples/sec Loss 1.0771 LearningRate 0.0010 Epoch: 18 Global Step: 224150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:57:54,460-Speed 3007.66 samples/sec Loss 1.0642 LearningRate 0.0010 Epoch: 18 Global Step: 224160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:57:57,806-Speed 3060.95 samples/sec Loss 1.0198 LearningRate 0.0010 Epoch: 18 Global Step: 224170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:01,121-Speed 3089.77 samples/sec Loss 1.0437 LearningRate 0.0010 Epoch: 18 Global Step: 224180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:04,474-Speed 3054.59 samples/sec Loss 1.0220 LearningRate 0.0010 Epoch: 18 Global Step: 224190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:07,848-Speed 3035.79 samples/sec Loss 1.0932 LearningRate 0.0010 Epoch: 18 Global Step: 224200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
22:58:11,192-Speed 3063.60 samples/sec Loss 1.0565 LearningRate 0.0009 Epoch: 18 Global Step: 224210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:14,609-Speed 2996.87 samples/sec Loss 1.0389 LearningRate 0.0009 Epoch: 18 Global Step: 224220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:17,928-Speed 3086.90 samples/sec Loss 1.0880 LearningRate 0.0009 Epoch: 18 Global Step: 224230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:21,243-Speed 3090.10 samples/sec Loss 1.0696 LearningRate 0.0009 Epoch: 18 Global Step: 224240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:24,644-Speed 3011.67 samples/sec Loss 1.1020 LearningRate 0.0009 Epoch: 18 Global Step: 224250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:28,088-Speed 2973.73 samples/sec Loss 1.0613 LearningRate 0.0009 Epoch: 18 Global Step: 224260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:31,487-Speed 3013.90 samples/sec Loss 1.0632 LearningRate 0.0009 Epoch: 18 Global Step: 224270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:34,872-Speed 3026.05 samples/sec Loss 1.0607 LearningRate 0.0009 Epoch: 18 Global Step: 224280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:38,375-Speed 2924.13 samples/sec Loss 1.0399 LearningRate 0.0009 Epoch: 18 Global Step: 224290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:41,807-Speed 2985.39 samples/sec Loss 1.0610 LearningRate 0.0009 Epoch: 18 Global Step: 224300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:58:45,237-Speed 2985.29 samples/sec Loss 1.0669 LearningRate 0.0009 Epoch: 18 Global Step: 224310 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:48,614-Speed 3033.51 samples/sec Loss 1.0453 LearningRate 0.0009 Epoch: 18 Global Step: 224320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:52,074-Speed 2960.85 samples/sec Loss 
1.0970 LearningRate 0.0009 Epoch: 18 Global Step: 224330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:55,499-Speed 2990.65 samples/sec Loss 1.0416 LearningRate 0.0009 Epoch: 18 Global Step: 224340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:58:59,009-Speed 2918.46 samples/sec Loss 1.0462 LearningRate 0.0009 Epoch: 18 Global Step: 224350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:59:02,326-Speed 3087.52 samples/sec Loss 1.0550 LearningRate 0.0009 Epoch: 18 Global Step: 224360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:59:05,732-Speed 3007.63 samples/sec Loss 1.0395 LearningRate 0.0009 Epoch: 18 Global Step: 224370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:59:09,095-Speed 3045.34 samples/sec Loss 1.0092 LearningRate 0.0009 Epoch: 18 Global Step: 224380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:59:12,458-Speed 3046.27 samples/sec Loss 1.0712 LearningRate 0.0009 Epoch: 18 Global Step: 224390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:59:15,840-Speed 3029.00 samples/sec Loss 1.0564 LearningRate 0.0009 Epoch: 18 Global Step: 224400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 22:59:19,166-Speed 3079.94 samples/sec Loss 1.0697 LearningRate 0.0009 Epoch: 18 Global Step: 224410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:22,528-Speed 3046.90 samples/sec Loss 1.0435 LearningRate 0.0009 Epoch: 18 Global Step: 224420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:25,971-Speed 2975.04 samples/sec Loss 1.0805 LearningRate 0.0009 Epoch: 18 Global Step: 224430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:29,318-Speed 3060.19 samples/sec Loss 1.0342 LearningRate 0.0009 Epoch: 18 Global Step: 224440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:32,722-Speed 3008.28 samples/sec Loss 1.0400 LearningRate 0.0009 Epoch: 18 Global 
Step: 224450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:36,064-Speed 3065.76 samples/sec Loss 1.0393 LearningRate 0.0009 Epoch: 18 Global Step: 224460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:39,458-Speed 3017.65 samples/sec Loss 1.0962 LearningRate 0.0009 Epoch: 18 Global Step: 224470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:42,830-Speed 3037.40 samples/sec Loss 1.0502 LearningRate 0.0009 Epoch: 18 Global Step: 224480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:46,373-Speed 2891.21 samples/sec Loss 1.0267 LearningRate 0.0009 Epoch: 18 Global Step: 224490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:49,694-Speed 3084.21 samples/sec Loss 1.0239 LearningRate 0.0009 Epoch: 18 Global Step: 224500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:53,099-Speed 3008.13 samples/sec Loss 1.0760 LearningRate 0.0009 Epoch: 18 Global Step: 224510 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 22:59:56,496-Speed 3015.46 samples/sec Loss 1.0489 LearningRate 0.0009 Epoch: 18 Global Step: 224520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 22:59:59,894-Speed 3014.31 samples/sec Loss 1.0908 LearningRate 0.0009 Epoch: 18 Global Step: 224530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:03,275-Speed 3030.50 samples/sec Loss 1.0401 LearningRate 0.0009 Epoch: 18 Global Step: 224540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:06,639-Speed 3044.03 samples/sec Loss 1.0684 LearningRate 0.0009 Epoch: 18 Global Step: 224550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:10,112-Speed 2949.73 samples/sec Loss 1.0968 LearningRate 0.0009 Epoch: 18 Global Step: 224560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:13,608-Speed 2929.48 samples/sec Loss 1.0736 LearningRate 0.0009 Epoch: 18 Global Step: 224570 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:00:17,049-Speed 2977.02 samples/sec Loss 1.0543 LearningRate 0.0009 Epoch: 18 Global Step: 224580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:20,491-Speed 2975.65 samples/sec Loss 1.0807 LearningRate 0.0009 Epoch: 18 Global Step: 224590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:23,964-Speed 2949.40 samples/sec Loss 1.0633 LearningRate 0.0009 Epoch: 18 Global Step: 224600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:27,400-Speed 2981.00 samples/sec Loss 1.0537 LearningRate 0.0009 Epoch: 18 Global Step: 224610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:30,787-Speed 3024.43 samples/sec Loss 1.0330 LearningRate 0.0009 Epoch: 18 Global Step: 224620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:34,112-Speed 3081.00 samples/sec Loss 1.0786 LearningRate 0.0009 Epoch: 18 Global Step: 224630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:00:37,466-Speed 3053.36 samples/sec Loss 1.0901 LearningRate 0.0009 Epoch: 18 Global Step: 224640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:00:40,872-Speed 3007.70 samples/sec Loss 1.0859 LearningRate 0.0009 Epoch: 18 Global Step: 224650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:00:44,257-Speed 3025.12 samples/sec Loss 1.0449 LearningRate 0.0009 Epoch: 18 Global Step: 224660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:00:47,584-Speed 3079.36 samples/sec Loss 1.0803 LearningRate 0.0009 Epoch: 18 Global Step: 224670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:50,917-Speed 3073.11 samples/sec Loss 1.0984 LearningRate 0.0009 Epoch: 18 Global Step: 224680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:00:54,217-Speed 3103.55 samples/sec Loss 1.0798 LearningRate 0.0009 Epoch: 18 Global Step: 224690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
23:00:57,622-Speed 3008.49 samples/sec Loss 1.0639 LearningRate 0.0009 Epoch: 18 Global Step: 224700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:01,027-Speed 3008.49 samples/sec Loss 1.0822 LearningRate 0.0009 Epoch: 18 Global Step: 224710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:04,395-Speed 3040.41 samples/sec Loss 1.0927 LearningRate 0.0009 Epoch: 18 Global Step: 224720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:07,766-Speed 3038.38 samples/sec Loss 1.0659 LearningRate 0.0009 Epoch: 18 Global Step: 224730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:11,094-Speed 3078.02 samples/sec Loss 1.0474 LearningRate 0.0009 Epoch: 18 Global Step: 224740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:14,490-Speed 3016.09 samples/sec Loss 1.0149 LearningRate 0.0009 Epoch: 18 Global Step: 224750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:17,835-Speed 3062.37 samples/sec Loss 1.0754 LearningRate 0.0009 Epoch: 18 Global Step: 224760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:21,160-Speed 3080.40 samples/sec Loss 1.0701 LearningRate 0.0009 Epoch: 18 Global Step: 224770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:24,467-Speed 3098.04 samples/sec Loss 1.0262 LearningRate 0.0009 Epoch: 18 Global Step: 224780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:27,854-Speed 3024.01 samples/sec Loss 1.0933 LearningRate 0.0009 Epoch: 18 Global Step: 224790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:31,227-Speed 3036.49 samples/sec Loss 1.0312 LearningRate 0.0009 Epoch: 18 Global Step: 224800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:34,570-Speed 3064.48 samples/sec Loss 1.0683 LearningRate 0.0009 Epoch: 18 Global Step: 224810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:37,919-Speed 3058.12 samples/sec Loss 
1.0773 LearningRate 0.0009 Epoch: 18 Global Step: 224820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:41,255-Speed 3070.18 samples/sec Loss 1.0900 LearningRate 0.0009 Epoch: 18 Global Step: 224830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:01:44,592-Speed 3070.07 samples/sec Loss 1.1083 LearningRate 0.0009 Epoch: 18 Global Step: 224840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:47,945-Speed 3054.93 samples/sec Loss 1.0758 LearningRate 0.0009 Epoch: 18 Global Step: 224850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:51,301-Speed 3052.03 samples/sec Loss 1.0477 LearningRate 0.0009 Epoch: 18 Global Step: 224860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:54,654-Speed 3054.62 samples/sec Loss 1.1191 LearningRate 0.0009 Epoch: 18 Global Step: 224870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:01:58,061-Speed 3006.44 samples/sec Loss 1.0911 LearningRate 0.0009 Epoch: 18 Global Step: 224880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:01,426-Speed 3043.72 samples/sec Loss 1.0875 LearningRate 0.0009 Epoch: 18 Global Step: 224890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:04,820-Speed 3018.06 samples/sec Loss 1.0529 LearningRate 0.0009 Epoch: 18 Global Step: 224900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:08,146-Speed 3079.82 samples/sec Loss 1.0575 LearningRate 0.0009 Epoch: 18 Global Step: 224910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:11,509-Speed 3045.85 samples/sec Loss 0.9983 LearningRate 0.0009 Epoch: 18 Global Step: 224920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:14,896-Speed 3023.89 samples/sec Loss 1.0634 LearningRate 0.0009 Epoch: 18 Global Step: 224930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:18,233-Speed 3069.72 samples/sec Loss 1.0654 LearningRate 0.0009 Epoch: 18 Global 
Step: 224940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:02:21,629-Speed 3016.25 samples/sec Loss 1.1057 LearningRate 0.0009 Epoch: 18 Global Step: 224950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:02:25,067-Speed 2979.07 samples/sec Loss 1.1036 LearningRate 0.0009 Epoch: 18 Global Step: 224960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:02:28,453-Speed 3024.99 samples/sec Loss 1.0666 LearningRate 0.0009 Epoch: 18 Global Step: 224970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:31,816-Speed 3046.39 samples/sec Loss 1.0423 LearningRate 0.0009 Epoch: 18 Global Step: 224980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:35,121-Speed 3099.36 samples/sec Loss 1.0474 LearningRate 0.0009 Epoch: 18 Global Step: 224990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:38,459-Speed 3068.05 samples/sec Loss 1.0582 LearningRate 0.0009 Epoch: 18 Global Step: 225000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:41,855-Speed 3016.46 samples/sec Loss 1.0699 LearningRate 0.0009 Epoch: 18 Global Step: 225010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:45,189-Speed 3072.29 samples/sec Loss 1.0413 LearningRate 0.0009 Epoch: 18 Global Step: 225020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:48,671-Speed 2941.44 samples/sec Loss 1.0010 LearningRate 0.0009 Epoch: 18 Global Step: 225030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:52,029-Speed 3050.75 samples/sec Loss 1.0957 LearningRate 0.0009 Epoch: 18 Global Step: 225040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:55,410-Speed 3028.90 samples/sec Loss 1.0648 LearningRate 0.0009 Epoch: 18 Global Step: 225050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:02:58,831-Speed 2994.26 samples/sec Loss 1.0542 LearningRate 0.0009 Epoch: 18 Global Step: 225060 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:03:02,240-Speed 3004.78 samples/sec Loss 1.1048 LearningRate 0.0009 Epoch: 18 Global Step: 225070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:05,699-Speed 2961.20 samples/sec Loss 1.0672 LearningRate 0.0009 Epoch: 18 Global Step: 225080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:09,170-Speed 2950.49 samples/sec Loss 0.9809 LearningRate 0.0009 Epoch: 18 Global Step: 225090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:12,583-Speed 3000.88 samples/sec Loss 1.0907 LearningRate 0.0009 Epoch: 18 Global Step: 225100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:15,928-Speed 3062.33 samples/sec Loss 1.0410 LearningRate 0.0009 Epoch: 18 Global Step: 225110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:19,293-Speed 3044.66 samples/sec Loss 1.0524 LearningRate 0.0009 Epoch: 18 Global Step: 225120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:22,643-Speed 3056.96 samples/sec Loss 1.0710 LearningRate 0.0009 Epoch: 18 Global Step: 225130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:26,043-Speed 3012.92 samples/sec Loss 1.0939 LearningRate 0.0009 Epoch: 18 Global Step: 225140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:29,426-Speed 3028.15 samples/sec Loss 1.1032 LearningRate 0.0009 Epoch: 18 Global Step: 225150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:32,777-Speed 3056.24 samples/sec Loss 1.0506 LearningRate 0.0009 Epoch: 18 Global Step: 225160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:36,142-Speed 3044.40 samples/sec Loss 1.1030 LearningRate 0.0009 Epoch: 18 Global Step: 225170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:39,605-Speed 2957.64 samples/sec Loss 1.0372 LearningRate 0.0009 Epoch: 18 Global Step: 225180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:03:43,080-Speed 2947.37 samples/sec Loss 1.0374 LearningRate 0.0009 Epoch: 18 Global Step: 225190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:46,470-Speed 3021.68 samples/sec Loss 1.0321 LearningRate 0.0009 Epoch: 18 Global Step: 225200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:49,808-Speed 3068.59 samples/sec Loss 1.0369 LearningRate 0.0009 Epoch: 18 Global Step: 225210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:53,194-Speed 3025.58 samples/sec Loss 1.0367 LearningRate 0.0009 Epoch: 18 Global Step: 225220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:56,529-Speed 3071.37 samples/sec Loss 1.0949 LearningRate 0.0009 Epoch: 18 Global Step: 225230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:03:59,938-Speed 3004.99 samples/sec Loss 1.0721 LearningRate 0.0009 Epoch: 18 Global Step: 225240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:03,323-Speed 3026.01 samples/sec Loss 1.1274 LearningRate 0.0009 Epoch: 18 Global Step: 225250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:06,721-Speed 3013.55 samples/sec Loss 1.1104 LearningRate 0.0009 Epoch: 18 Global Step: 225260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:10,131-Speed 3004.82 samples/sec Loss 1.0695 LearningRate 0.0009 Epoch: 18 Global Step: 225270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:04:13,530-Speed 3014.18 samples/sec Loss 1.0500 LearningRate 0.0009 Epoch: 18 Global Step: 225280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:16,895-Speed 3043.74 samples/sec Loss 1.0781 LearningRate 0.0009 Epoch: 18 Global Step: 225290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:20,277-Speed 3028.63 samples/sec Loss 1.0285 LearningRate 0.0009 Epoch: 18 Global Step: 225300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:23,714-Speed 2980.41 samples/sec Loss 
1.0662 LearningRate 0.0009 Epoch: 18 Global Step: 225310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:27,153-Speed 2978.63 samples/sec Loss 1.0202 LearningRate 0.0009 Epoch: 18 Global Step: 225320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:30,614-Speed 2959.37 samples/sec Loss 1.0588 LearningRate 0.0009 Epoch: 18 Global Step: 225330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:34,086-Speed 2949.95 samples/sec Loss 1.0812 LearningRate 0.0009 Epoch: 18 Global Step: 225340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:37,507-Speed 2994.41 samples/sec Loss 1.0952 LearningRate 0.0009 Epoch: 18 Global Step: 225350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:40,915-Speed 3004.82 samples/sec Loss 1.0713 LearningRate 0.0009 Epoch: 18 Global Step: 225360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:44,331-Speed 2998.66 samples/sec Loss 1.0931 LearningRate 0.0009 Epoch: 18 Global Step: 225370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:47,798-Speed 2954.32 samples/sec Loss 1.0877 LearningRate 0.0009 Epoch: 18 Global Step: 225380 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:04:51,234-Speed 2981.36 samples/sec Loss 1.0684 LearningRate 0.0009 Epoch: 18 Global Step: 225390 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:04:54,548-Speed 3090.34 samples/sec Loss 1.0900 LearningRate 0.0009 Epoch: 18 Global Step: 225400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:04:57,892-Speed 3062.65 samples/sec Loss 1.1153 LearningRate 0.0009 Epoch: 18 Global Step: 225410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:01,275-Speed 3028.06 samples/sec Loss 1.0998 LearningRate 0.0009 Epoch: 18 Global Step: 225420 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:04,697-Speed 2993.58 samples/sec Loss 1.1168 LearningRate 0.0009 Epoch: 18 Global 
Step: 225430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:08,147-Speed 2969.32 samples/sec Loss 1.1349 LearningRate 0.0009 Epoch: 18 Global Step: 225440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:11,556-Speed 3004.57 samples/sec Loss 1.0776 LearningRate 0.0009 Epoch: 18 Global Step: 225450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:14,923-Speed 3042.37 samples/sec Loss 1.1148 LearningRate 0.0009 Epoch: 18 Global Step: 225460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:18,324-Speed 3012.46 samples/sec Loss 1.0676 LearningRate 0.0009 Epoch: 18 Global Step: 225470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:21,734-Speed 3003.38 samples/sec Loss 1.0955 LearningRate 0.0009 Epoch: 18 Global Step: 225480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:25,148-Speed 3000.54 samples/sec Loss 1.0955 LearningRate 0.0009 Epoch: 18 Global Step: 225490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:28,536-Speed 3023.85 samples/sec Loss 1.1053 LearningRate 0.0009 Epoch: 18 Global Step: 225500 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:05:31,933-Speed 3015.59 samples/sec Loss 1.0724 LearningRate 0.0009 Epoch: 18 Global Step: 225510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:35,342-Speed 3004.56 samples/sec Loss 1.0971 LearningRate 0.0008 Epoch: 18 Global Step: 225520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:38,736-Speed 3018.30 samples/sec Loss 1.0482 LearningRate 0.0008 Epoch: 18 Global Step: 225530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:42,230-Speed 2931.12 samples/sec Loss 1.0164 LearningRate 0.0008 Epoch: 18 Global Step: 225540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:45,660-Speed 2986.33 samples/sec Loss 1.0613 LearningRate 0.0008 Epoch: 18 Global Step: 225550 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 23:05:49,096-Speed 2981.53 samples/sec Loss 1.1207 LearningRate 0.0008 Epoch: 18 Global Step: 225560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:52,544-Speed 2970.60 samples/sec Loss 1.0726 LearningRate 0.0008 Epoch: 18 Global Step: 225570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:05:55,953-Speed 3004.08 samples/sec Loss 1.0987 LearningRate 0.0008 Epoch: 18 Global Step: 225580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:05:59,338-Speed 3026.59 samples/sec Loss 1.0906 LearningRate 0.0008 Epoch: 18 Global Step: 225590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:02,761-Speed 2992.55 samples/sec Loss 1.1350 LearningRate 0.0008 Epoch: 18 Global Step: 225600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:06,170-Speed 3004.84 samples/sec Loss 1.1172 LearningRate 0.0008 Epoch: 18 Global Step: 225610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:09,541-Speed 3038.84 samples/sec Loss 1.0558 LearningRate 0.0008 Epoch: 18 Global Step: 225620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:12,944-Speed 3010.09 samples/sec Loss 1.1290 LearningRate 0.0008 Epoch: 18 Global Step: 225630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:16,248-Speed 3100.07 samples/sec Loss 1.0816 LearningRate 0.0008 Epoch: 18 Global Step: 225640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:19,638-Speed 3022.04 samples/sec Loss 1.0687 LearningRate 0.0008 Epoch: 18 Global Step: 225650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:23,049-Speed 3002.31 samples/sec Loss 1.0875 LearningRate 0.0008 Epoch: 18 Global Step: 225660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:06:26,381-Speed 3074.48 samples/sec Loss 1.0522 LearningRate 0.0008 Epoch: 18 Global Step: 225670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
23:06:29,798-Speed 2997.62 samples/sec Loss 1.1107 LearningRate 0.0008 Epoch: 18 Global Step: 225680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:33,225-Speed 2989.68 samples/sec Loss 1.0856 LearningRate 0.0008 Epoch: 18 Global Step: 225690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:36,665-Speed 2977.03 samples/sec Loss 1.0709 LearningRate 0.0008 Epoch: 18 Global Step: 225700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:40,065-Speed 3013.50 samples/sec Loss 1.0949 LearningRate 0.0008 Epoch: 18 Global Step: 225710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:43,454-Speed 3021.76 samples/sec Loss 1.1076 LearningRate 0.0008 Epoch: 18 Global Step: 225720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:46,802-Speed 3059.57 samples/sec Loss 1.0435 LearningRate 0.0008 Epoch: 18 Global Step: 225730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:50,246-Speed 2974.57 samples/sec Loss 1.0693 LearningRate 0.0008 Epoch: 18 Global Step: 225740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:53,691-Speed 2972.97 samples/sec Loss 1.0956 LearningRate 0.0008 Epoch: 18 Global Step: 225750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:06:57,120-Speed 2986.98 samples/sec Loss 1.0488 LearningRate 0.0008 Epoch: 18 Global Step: 225760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:07:00,474-Speed 3057.76 samples/sec Loss 1.0641 LearningRate 0.0008 Epoch: 18 Global Step: 225770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:07:03,808-Speed 3072.87 samples/sec Loss 1.0469 LearningRate 0.0008 Epoch: 18 Global Step: 225780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:07:07,192-Speed 3027.13 samples/sec Loss 1.0869 LearningRate 0.0008 Epoch: 18 Global Step: 225790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:07:10,555-Speed 3045.38 samples/sec Loss 
1.0968 LearningRate 0.0008 Epoch: 18 Global Step: 225800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:07:13,972-Speed 2998.45 samples/sec Loss 1.0450 LearningRate 0.0008 Epoch: 18 Global Step: 225810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:07:17,315-Speed 3064.14 samples/sec Loss 1.0794 LearningRate 0.0008 Epoch: 18 Global Step: 225820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:07:20,620-Speed 3099.52 samples/sec Loss 1.0821 LearningRate 0.0008 Epoch: 18 Global Step: 225830 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:24,007-Speed 3024.65 samples/sec Loss 1.0985 LearningRate 0.0008 Epoch: 18 Global Step: 225840 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:27,376-Speed 3040.62 samples/sec Loss 1.0769 LearningRate 0.0008 Epoch: 18 Global Step: 225850 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:30,716-Speed 3066.25 samples/sec Loss 1.0464 LearningRate 0.0008 Epoch: 18 Global Step: 225860 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:34,174-Speed 2961.81 samples/sec Loss 1.0820 LearningRate 0.0008 Epoch: 18 Global Step: 225870 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:37,520-Speed 3062.10 samples/sec Loss 1.0861 LearningRate 0.0008 Epoch: 18 Global Step: 225880 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:40,850-Speed 3076.43 samples/sec Loss 1.0690 LearningRate 0.0008 Epoch: 18 Global Step: 225890 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:44,235-Speed 3025.46 samples/sec Loss 1.0304 LearningRate 0.0008 Epoch: 18 Global Step: 225900 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:47,639-Speed 3009.18 samples/sec Loss 1.0289 LearningRate 0.0008 Epoch: 18 Global Step: 225910 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:51,021-Speed 3028.71 samples/sec Loss 1.0839 LearningRate 0.0008 Epoch: 18 Global Step: 
225920 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:07:54,373-Speed 3056.03 samples/sec Loss 1.1152 LearningRate 0.0008 Epoch: 18 Global Step: 225930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:07:57,771-Speed 3015.18 samples/sec Loss 1.0510 LearningRate 0.0008 Epoch: 18 Global Step: 225940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:01,209-Speed 2979.54 samples/sec Loss 1.0619 LearningRate 0.0008 Epoch: 18 Global Step: 225950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:04,576-Speed 3042.30 samples/sec Loss 1.0826 LearningRate 0.0008 Epoch: 18 Global Step: 225960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:07,946-Speed 3039.01 samples/sec Loss 1.0416 LearningRate 0.0008 Epoch: 18 Global Step: 225970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:11,319-Speed 3036.44 samples/sec Loss 1.1075 LearningRate 0.0008 Epoch: 18 Global Step: 225980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:14,726-Speed 3006.92 samples/sec Loss 1.1235 LearningRate 0.0008 Epoch: 18 Global Step: 225990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:18,141-Speed 2998.90 samples/sec Loss 1.0633 LearningRate 0.0008 Epoch: 18 Global Step: 226000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:21,571-Speed 2987.05 samples/sec Loss 1.0940 LearningRate 0.0008 Epoch: 18 Global Step: 226010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:25,002-Speed 2984.98 samples/sec Loss 1.0685 LearningRate 0.0008 Epoch: 18 Global Step: 226020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:08:28,381-Speed 3031.72 samples/sec Loss 1.0933 LearningRate 0.0008 Epoch: 18 Global Step: 226030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:31,870-Speed 2935.92 samples/sec Loss 1.0960 LearningRate 0.0008 Epoch: 18 Global Step: 226040 Fp16 Grad Scale: 32768 Required: 2 
hours Training: 2022-04-27 23:08:35,235-Speed 3043.38 samples/sec Loss 1.0466 LearningRate 0.0008 Epoch: 18 Global Step: 226050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:38,629-Speed 3018.37 samples/sec Loss 1.1115 LearningRate 0.0008 Epoch: 18 Global Step: 226060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:41,991-Speed 3046.30 samples/sec Loss 1.0667 LearningRate 0.0008 Epoch: 18 Global Step: 226070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:45,331-Speed 3067.36 samples/sec Loss 1.0966 LearningRate 0.0008 Epoch: 18 Global Step: 226080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:48,824-Speed 2932.26 samples/sec Loss 1.0951 LearningRate 0.0008 Epoch: 18 Global Step: 226090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:52,308-Speed 2940.10 samples/sec Loss 1.1161 LearningRate 0.0008 Epoch: 18 Global Step: 226100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:55,762-Speed 2965.23 samples/sec Loss 1.1029 LearningRate 0.0008 Epoch: 18 Global Step: 226110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:08:59,131-Speed 3040.13 samples/sec Loss 1.0545 LearningRate 0.0008 Epoch: 18 Global Step: 226120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:02,558-Speed 2989.67 samples/sec Loss 1.1308 LearningRate 0.0008 Epoch: 18 Global Step: 226130 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:09:05,951-Speed 3018.39 samples/sec Loss 1.0542 LearningRate 0.0008 Epoch: 18 Global Step: 226140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:09:09,337-Speed 3025.11 samples/sec Loss 1.0861 LearningRate 0.0008 Epoch: 18 Global Step: 226150 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:09:12,767-Speed 2986.96 samples/sec Loss 1.0780 LearningRate 0.0008 Epoch: 18 Global Step: 226160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:09:16,097-Speed 3075.83 samples/sec Loss 1.1034 LearningRate 0.0008 Epoch: 18 Global Step: 226170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:19,482-Speed 3026.55 samples/sec Loss 1.0785 LearningRate 0.0008 Epoch: 18 Global Step: 226180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:22,912-Speed 2986.49 samples/sec Loss 1.0893 LearningRate 0.0008 Epoch: 18 Global Step: 226190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:26,337-Speed 2990.89 samples/sec Loss 1.0984 LearningRate 0.0008 Epoch: 18 Global Step: 226200 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:29,744-Speed 3006.54 samples/sec Loss 1.0741 LearningRate 0.0008 Epoch: 18 Global Step: 226210 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:33,097-Speed 3054.81 samples/sec Loss 1.0750 LearningRate 0.0008 Epoch: 18 Global Step: 226220 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:36,452-Speed 3052.74 samples/sec Loss 1.1096 LearningRate 0.0008 Epoch: 18 Global Step: 226230 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:39,823-Speed 3038.70 samples/sec Loss 1.0785 LearningRate 0.0008 Epoch: 18 Global Step: 226240 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:43,234-Speed 3003.72 samples/sec Loss 1.0625 LearningRate 0.0008 Epoch: 18 Global Step: 226250 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:46,611-Speed 3032.58 samples/sec Loss 1.0963 LearningRate 0.0008 Epoch: 18 Global Step: 226260 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:09:49,962-Speed 3056.53 samples/sec Loss 1.1098 LearningRate 0.0008 Epoch: 18 Global Step: 226270 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:09:53,352-Speed 3021.49 samples/sec Loss 1.0910 LearningRate 0.0008 Epoch: 18 Global Step: 226280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:09:56,724-Speed 3037.90 samples/sec Loss 
1.0642 LearningRate 0.0008 Epoch: 18 Global Step: 226290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:00,158-Speed 2982.67 samples/sec Loss 1.0994 LearningRate 0.0008 Epoch: 18 Global Step: 226300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:03,528-Speed 3039.80 samples/sec Loss 1.0569 LearningRate 0.0008 Epoch: 18 Global Step: 226310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:06,848-Speed 3084.25 samples/sec Loss 1.0585 LearningRate 0.0008 Epoch: 18 Global Step: 226320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:10,175-Speed 3079.50 samples/sec Loss 1.1120 LearningRate 0.0008 Epoch: 18 Global Step: 226330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:13,601-Speed 2989.95 samples/sec Loss 1.1084 LearningRate 0.0008 Epoch: 18 Global Step: 226340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:17,002-Speed 3011.90 samples/sec Loss 1.1350 LearningRate 0.0008 Epoch: 18 Global Step: 226350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:20,374-Speed 3037.22 samples/sec Loss 1.0924 LearningRate 0.0008 Epoch: 18 Global Step: 226360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:23,717-Speed 3064.26 samples/sec Loss 1.0926 LearningRate 0.0008 Epoch: 18 Global Step: 226370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:27,122-Speed 3007.58 samples/sec Loss 1.0853 LearningRate 0.0008 Epoch: 18 Global Step: 226380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:30,497-Speed 3036.02 samples/sec Loss 1.1006 LearningRate 0.0008 Epoch: 18 Global Step: 226390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:33,845-Speed 3058.55 samples/sec Loss 1.1319 LearningRate 0.0008 Epoch: 18 Global Step: 226400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:37,276-Speed 2985.96 samples/sec Loss 1.0778 LearningRate 0.0008 Epoch: 18 Global 
Step: 226410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:40,685-Speed 3004.72 samples/sec Loss 1.0920 LearningRate 0.0008 Epoch: 18 Global Step: 226420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:10:44,004-Speed 3086.74 samples/sec Loss 1.1013 LearningRate 0.0008 Epoch: 18 Global Step: 226430 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:47,387-Speed 3027.11 samples/sec Loss 1.0768 LearningRate 0.0008 Epoch: 18 Global Step: 226440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:50,783-Speed 3016.39 samples/sec Loss 1.1146 LearningRate 0.0008 Epoch: 18 Global Step: 226450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:54,113-Speed 3075.42 samples/sec Loss 1.1195 LearningRate 0.0008 Epoch: 18 Global Step: 226460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:10:57,495-Speed 3029.02 samples/sec Loss 1.0885 LearningRate 0.0008 Epoch: 18 Global Step: 226470 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:00,947-Speed 2967.07 samples/sec Loss 1.1325 LearningRate 0.0008 Epoch: 18 Global Step: 226480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:04,320-Speed 3036.66 samples/sec Loss 1.1211 LearningRate 0.0008 Epoch: 18 Global Step: 226490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:07,684-Speed 3044.92 samples/sec Loss 1.1050 LearningRate 0.0008 Epoch: 18 Global Step: 226500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:11,051-Speed 3041.98 samples/sec Loss 1.0907 LearningRate 0.0008 Epoch: 18 Global Step: 226510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:14,416-Speed 3044.74 samples/sec Loss 1.0936 LearningRate 0.0008 Epoch: 18 Global Step: 226520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:17,842-Speed 2989.31 samples/sec Loss 1.1096 LearningRate 0.0008 Epoch: 18 Global Step: 226530 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 23:11:21,268-Speed 2989.34 samples/sec Loss 1.0340 LearningRate 0.0008 Epoch: 18 Global Step: 226540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:24,713-Speed 2973.21 samples/sec Loss 1.0652 LearningRate 0.0008 Epoch: 18 Global Step: 226550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:28,089-Speed 3034.16 samples/sec Loss 1.0705 LearningRate 0.0008 Epoch: 18 Global Step: 226560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:31,476-Speed 3024.16 samples/sec Loss 1.1251 LearningRate 0.0008 Epoch: 18 Global Step: 226570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:11:34,771-Speed 3108.82 samples/sec Loss 1.1252 LearningRate 0.0008 Epoch: 18 Global Step: 226580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:38,215-Speed 2974.05 samples/sec Loss 1.0957 LearningRate 0.0008 Epoch: 18 Global Step: 226590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:41,690-Speed 2947.09 samples/sec Loss 1.1129 LearningRate 0.0008 Epoch: 18 Global Step: 226600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:45,079-Speed 3022.54 samples/sec Loss 1.0776 LearningRate 0.0008 Epoch: 18 Global Step: 226610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:48,534-Speed 2964.90 samples/sec Loss 1.1298 LearningRate 0.0008 Epoch: 18 Global Step: 226620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:51,879-Speed 3062.30 samples/sec Loss 1.0855 LearningRate 0.0008 Epoch: 18 Global Step: 226630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:55,229-Speed 3057.46 samples/sec Loss 1.1374 LearningRate 0.0008 Epoch: 18 Global Step: 226640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:11:58,612-Speed 3027.66 samples/sec Loss 1.0443 LearningRate 0.0008 Epoch: 18 Global Step: 226650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
23:12:02,031-Speed 2995.67 samples/sec Loss 1.0945 LearningRate 0.0008 Epoch: 18 Global Step: 226660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:05,399-Speed 3041.57 samples/sec Loss 1.1028 LearningRate 0.0008 Epoch: 18 Global Step: 226670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:08,731-Speed 3074.23 samples/sec Loss 1.0879 LearningRate 0.0008 Epoch: 18 Global Step: 226680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:12:12,102-Speed 3038.38 samples/sec Loss 1.0364 LearningRate 0.0008 Epoch: 18 Global Step: 226690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:15,472-Speed 3038.73 samples/sec Loss 1.0722 LearningRate 0.0008 Epoch: 18 Global Step: 226700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:18,891-Speed 2995.90 samples/sec Loss 1.1222 LearningRate 0.0008 Epoch: 18 Global Step: 226710 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:22,370-Speed 2944.53 samples/sec Loss 1.0972 LearningRate 0.0008 Epoch: 18 Global Step: 226720 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:25,741-Speed 3038.13 samples/sec Loss 1.1026 LearningRate 0.0008 Epoch: 18 Global Step: 226730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:29,096-Speed 3052.71 samples/sec Loss 1.1317 LearningRate 0.0008 Epoch: 18 Global Step: 226740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:32,489-Speed 3019.29 samples/sec Loss 1.0848 LearningRate 0.0008 Epoch: 18 Global Step: 226750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:35,908-Speed 2996.00 samples/sec Loss 1.1140 LearningRate 0.0008 Epoch: 18 Global Step: 226760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:39,286-Speed 3032.05 samples/sec Loss 1.0573 LearningRate 0.0008 Epoch: 18 Global Step: 226770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:42,661-Speed 3034.40 samples/sec Loss 
1.1497 LearningRate 0.0008 Epoch: 18 Global Step: 226780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:12:46,028-Speed 3042.29 samples/sec Loss 1.1069 LearningRate 0.0008 Epoch: 18 Global Step: 226790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:12:49,362-Speed 3073.21 samples/sec Loss 1.0661 LearningRate 0.0008 Epoch: 18 Global Step: 226800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:12:52,724-Speed 3046.67 samples/sec Loss 1.0589 LearningRate 0.0008 Epoch: 18 Global Step: 226810 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:12:56,114-Speed 3020.93 samples/sec Loss 1.1158 LearningRate 0.0008 Epoch: 18 Global Step: 226820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:12:59,462-Speed 3059.61 samples/sec Loss 1.0818 LearningRate 0.0008 Epoch: 18 Global Step: 226830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:02,807-Speed 3062.47 samples/sec Loss 1.0805 LearningRate 0.0008 Epoch: 18 Global Step: 226840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:06,227-Speed 2994.45 samples/sec Loss 1.0819 LearningRate 0.0008 Epoch: 18 Global Step: 226850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:09,662-Speed 2981.92 samples/sec Loss 1.0889 LearningRate 0.0008 Epoch: 18 Global Step: 226860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:12,981-Speed 3087.05 samples/sec Loss 1.0510 LearningRate 0.0008 Epoch: 18 Global Step: 226870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:16,342-Speed 3046.96 samples/sec Loss 1.1229 LearningRate 0.0008 Epoch: 18 Global Step: 226880 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:19,694-Speed 3056.45 samples/sec Loss 1.0941 LearningRate 0.0008 Epoch: 18 Global Step: 226890 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:13:23,013-Speed 3086.14 samples/sec Loss 1.0805 LearningRate 0.0008 Epoch: 18 Global 
Step: 226900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:26,358-Speed 3062.57 samples/sec Loss 1.0732 LearningRate 0.0007 Epoch: 18 Global Step: 226910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:29,689-Speed 3074.74 samples/sec Loss 1.0779 LearningRate 0.0007 Epoch: 18 Global Step: 226920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:33,061-Speed 3038.02 samples/sec Loss 1.0957 LearningRate 0.0007 Epoch: 18 Global Step: 226930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:13:36,393-Speed 3073.66 samples/sec Loss 1.1389 LearningRate 0.0007 Epoch: 18 Global Step: 226940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:13:39,757-Speed 3045.22 samples/sec Loss 1.1414 LearningRate 0.0007 Epoch: 18 Global Step: 226950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:13:43,104-Speed 3059.98 samples/sec Loss 1.0775 LearningRate 0.0007 Epoch: 18 Global Step: 226960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:13:46,467-Speed 3046.25 samples/sec Loss 1.1047 LearningRate 0.0007 Epoch: 18 Global Step: 226970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:13:49,883-Speed 2998.24 samples/sec Loss 1.1084 LearningRate 0.0007 Epoch: 18 Global Step: 226980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:13:53,311-Speed 2988.31 samples/sec Loss 1.0917 LearningRate 0.0007 Epoch: 18 Global Step: 226990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:13:56,819-Speed 2919.75 samples/sec Loss 1.0920 LearningRate 0.0007 Epoch: 18 Global Step: 227000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:14:00,152-Speed 3073.72 samples/sec Loss 1.0219 LearningRate 0.0007 Epoch: 18 Global Step: 227010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:14:03,523-Speed 3037.74 samples/sec Loss 1.0760 LearningRate 0.0007 Epoch: 18 Global Step: 227020 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:14:06,847-Speed 3081.83 samples/sec Loss 1.1339 LearningRate 0.0007 Epoch: 18 Global Step: 227030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:14:10,198-Speed 3056.70 samples/sec Loss 1.1078 LearningRate 0.0007 Epoch: 18 Global Step: 227040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:13,560-Speed 3046.28 samples/sec Loss 1.1200 LearningRate 0.0007 Epoch: 18 Global Step: 227050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:16,904-Speed 3063.32 samples/sec Loss 1.0603 LearningRate 0.0007 Epoch: 18 Global Step: 227060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:20,294-Speed 3020.90 samples/sec Loss 1.0709 LearningRate 0.0007 Epoch: 18 Global Step: 227070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:23,690-Speed 3016.30 samples/sec Loss 1.1086 LearningRate 0.0007 Epoch: 18 Global Step: 227080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:27,016-Speed 3079.77 samples/sec Loss 1.0866 LearningRate 0.0007 Epoch: 18 Global Step: 227090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:30,439-Speed 2992.68 samples/sec Loss 1.1070 LearningRate 0.0007 Epoch: 18 Global Step: 227100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:33,821-Speed 3028.29 samples/sec Loss 1.1365 LearningRate 0.0007 Epoch: 18 Global Step: 227110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:37,189-Speed 3040.99 samples/sec Loss 1.0601 LearningRate 0.0007 Epoch: 18 Global Step: 227120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:40,547-Speed 3050.37 samples/sec Loss 1.1241 LearningRate 0.0007 Epoch: 18 Global Step: 227130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:43,890-Speed 3064.53 samples/sec Loss 1.1100 LearningRate 0.0007 Epoch: 18 Global Step: 227140 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 
23:14:47,201-Speed 3093.33 samples/sec Loss 1.1183 LearningRate 0.0007 Epoch: 18 Global Step: 227150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:50,527-Speed 3079.57 samples/sec Loss 1.0298 LearningRate 0.0007 Epoch: 18 Global Step: 227160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:14:53,855-Speed 3078.44 samples/sec Loss 1.1200 LearningRate 0.0007 Epoch: 18 Global Step: 227170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:14:57,206-Speed 3056.90 samples/sec Loss 1.1168 LearningRate 0.0007 Epoch: 18 Global Step: 227180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:00,651-Speed 2972.63 samples/sec Loss 1.0942 LearningRate 0.0007 Epoch: 18 Global Step: 227190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:04,012-Speed 3047.93 samples/sec Loss 1.1368 LearningRate 0.0007 Epoch: 18 Global Step: 227200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:07,333-Speed 3084.54 samples/sec Loss 1.0885 LearningRate 0.0007 Epoch: 18 Global Step: 227210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:10,713-Speed 3030.51 samples/sec Loss 1.1181 LearningRate 0.0007 Epoch: 18 Global Step: 227220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:14,103-Speed 3022.15 samples/sec Loss 1.1359 LearningRate 0.0007 Epoch: 18 Global Step: 227230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:17,461-Speed 3050.23 samples/sec Loss 1.1225 LearningRate 0.0007 Epoch: 18 Global Step: 227240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:20,855-Speed 3018.16 samples/sec Loss 1.1087 LearningRate 0.0007 Epoch: 18 Global Step: 227250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:24,244-Speed 3022.49 samples/sec Loss 1.1090 LearningRate 0.0007 Epoch: 18 Global Step: 227260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:15:27,671-Speed 2988.29 samples/sec Loss 
1.0962 LearningRate 0.0007 Epoch: 18 Global Step: 227270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:31,045-Speed 3035.89 samples/sec Loss 1.1070 LearningRate 0.0007 Epoch: 18 Global Step: 227280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:34,501-Speed 2963.78 samples/sec Loss 1.1125 LearningRate 0.0007 Epoch: 18 Global Step: 227290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:37,950-Speed 2970.11 samples/sec Loss 1.1517 LearningRate 0.0007 Epoch: 18 Global Step: 227300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:41,358-Speed 3004.95 samples/sec Loss 1.1311 LearningRate 0.0007 Epoch: 18 Global Step: 227310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:44,837-Speed 2944.74 samples/sec Loss 1.0959 LearningRate 0.0007 Epoch: 18 Global Step: 227320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:48,280-Speed 2974.57 samples/sec Loss 1.0931 LearningRate 0.0007 Epoch: 18 Global Step: 227330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:51,714-Speed 2983.32 samples/sec Loss 1.1201 LearningRate 0.0007 Epoch: 18 Global Step: 227340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:55,084-Speed 3040.00 samples/sec Loss 1.1040 LearningRate 0.0007 Epoch: 18 Global Step: 227350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:15:58,558-Speed 2948.32 samples/sec Loss 1.0952 LearningRate 0.0007 Epoch: 18 Global Step: 227360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:01,974-Speed 2998.35 samples/sec Loss 1.1116 LearningRate 0.0007 Epoch: 18 Global Step: 227370 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:16:05,415-Speed 2976.95 samples/sec Loss 1.1071 LearningRate 0.0007 Epoch: 18 Global Step: 227380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:08,824-Speed 3004.56 samples/sec Loss 1.1021 LearningRate 0.0007 Epoch: 18 Global 
Step: 227390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:12,265-Speed 2977.05 samples/sec Loss 1.1405 LearningRate 0.0007 Epoch: 18 Global Step: 227400 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:15,692-Speed 2988.49 samples/sec Loss 1.0892 LearningRate 0.0007 Epoch: 18 Global Step: 227410 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:19,036-Speed 3063.35 samples/sec Loss 1.1317 LearningRate 0.0007 Epoch: 18 Global Step: 227420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:22,475-Speed 2978.66 samples/sec Loss 1.0832 LearningRate 0.0007 Epoch: 18 Global Step: 227430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:25,844-Speed 3040.11 samples/sec Loss 1.0876 LearningRate 0.0007 Epoch: 18 Global Step: 227440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:29,251-Speed 3007.56 samples/sec Loss 1.1259 LearningRate 0.0007 Epoch: 18 Global Step: 227450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:32,661-Speed 3003.33 samples/sec Loss 1.1633 LearningRate 0.0007 Epoch: 18 Global Step: 227460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:36,085-Speed 2991.29 samples/sec Loss 1.1020 LearningRate 0.0007 Epoch: 18 Global Step: 227470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:39,485-Speed 3012.61 samples/sec Loss 1.0629 LearningRate 0.0007 Epoch: 18 Global Step: 227480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:42,928-Speed 2975.25 samples/sec Loss 1.1044 LearningRate 0.0007 Epoch: 18 Global Step: 227490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:46,344-Speed 2998.49 samples/sec Loss 1.0958 LearningRate 0.0007 Epoch: 18 Global Step: 227500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:16:49,744-Speed 3012.03 samples/sec Loss 1.1279 LearningRate 0.0007 Epoch: 18 Global Step: 227510 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:16:53,176-Speed 2985.07 samples/sec Loss 1.1205 LearningRate 0.0007 Epoch: 18 Global Step: 227520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:56,569-Speed 3018.04 samples/sec Loss 1.0605 LearningRate 0.0007 Epoch: 18 Global Step: 227530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:16:59,957-Speed 3023.77 samples/sec Loss 1.0941 LearningRate 0.0007 Epoch: 18 Global Step: 227540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:03,459-Speed 2925.01 samples/sec Loss 1.1051 LearningRate 0.0007 Epoch: 18 Global Step: 227550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:06,816-Speed 3051.34 samples/sec Loss 1.0883 LearningRate 0.0007 Epoch: 18 Global Step: 227560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:10,202-Speed 3024.50 samples/sec Loss 1.1239 LearningRate 0.0007 Epoch: 18 Global Step: 227570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:13,562-Speed 3048.82 samples/sec Loss 1.0618 LearningRate 0.0007 Epoch: 18 Global Step: 227580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:16,977-Speed 2999.41 samples/sec Loss 1.1112 LearningRate 0.0007 Epoch: 18 Global Step: 227590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:20,328-Speed 3057.22 samples/sec Loss 1.0834 LearningRate 0.0007 Epoch: 18 Global Step: 227600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:23,710-Speed 3028.07 samples/sec Loss 1.1335 LearningRate 0.0007 Epoch: 18 Global Step: 227610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:27,136-Speed 2989.98 samples/sec Loss 1.0811 LearningRate 0.0007 Epoch: 18 Global Step: 227620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:30,499-Speed 3045.97 samples/sec Loss 1.0659 LearningRate 0.0007 Epoch: 18 Global Step: 227630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
23:17:33,813-Speed 3090.86 samples/sec Loss 1.1121 LearningRate 0.0007 Epoch: 18 Global Step: 227640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:37,125-Speed 3092.51 samples/sec Loss 1.1448 LearningRate 0.0007 Epoch: 18 Global Step: 227650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:40,454-Speed 3077.05 samples/sec Loss 1.1032 LearningRate 0.0007 Epoch: 18 Global Step: 227660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:43,804-Speed 3057.73 samples/sec Loss 1.0837 LearningRate 0.0007 Epoch: 18 Global Step: 227670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:17:47,134-Speed 3075.64 samples/sec Loss 1.0764 LearningRate 0.0007 Epoch: 18 Global Step: 227680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:50,532-Speed 3014.68 samples/sec Loss 1.0992 LearningRate 0.0007 Epoch: 18 Global Step: 227690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:53,896-Speed 3044.74 samples/sec Loss 1.0864 LearningRate 0.0007 Epoch: 18 Global Step: 227700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:17:57,280-Speed 3026.49 samples/sec Loss 1.0903 LearningRate 0.0007 Epoch: 18 Global Step: 227710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:18:00,720-Speed 2977.56 samples/sec Loss 1.1241 LearningRate 0.0007 Epoch: 18 Global Step: 227720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:18:04,114-Speed 3018.32 samples/sec Loss 1.0325 LearningRate 0.0007 Epoch: 18 Global Step: 227730 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:07,531-Speed 2996.70 samples/sec Loss 1.1037 LearningRate 0.0007 Epoch: 18 Global Step: 227740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:10,890-Speed 3049.79 samples/sec Loss 1.0998 LearningRate 0.0007 Epoch: 18 Global Step: 227750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:14,205-Speed 3089.82 samples/sec Loss 
1.0825 LearningRate 0.0007 Epoch: 18 Global Step: 227760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:17,606-Speed 3011.49 samples/sec Loss 1.1276 LearningRate 0.0007 Epoch: 18 Global Step: 227770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:21,009-Speed 3010.68 samples/sec Loss 1.0934 LearningRate 0.0007 Epoch: 18 Global Step: 227780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:24,454-Speed 2972.50 samples/sec Loss 1.0958 LearningRate 0.0007 Epoch: 18 Global Step: 227790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:27,847-Speed 3019.01 samples/sec Loss 1.0889 LearningRate 0.0007 Epoch: 18 Global Step: 227800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:31,224-Speed 3033.61 samples/sec Loss 1.0742 LearningRate 0.0007 Epoch: 18 Global Step: 227810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:34,598-Speed 3035.70 samples/sec Loss 1.0435 LearningRate 0.0007 Epoch: 18 Global Step: 227820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:37,985-Speed 3024.07 samples/sec Loss 1.1108 LearningRate 0.0007 Epoch: 18 Global Step: 227830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:18:41,437-Speed 2966.82 samples/sec Loss 1.1295 LearningRate 0.0007 Epoch: 18 Global Step: 227840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:18:44,809-Speed 3037.44 samples/sec Loss 1.1160 LearningRate 0.0007 Epoch: 18 Global Step: 227850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:48,196-Speed 3024.10 samples/sec Loss 1.1030 LearningRate 0.0007 Epoch: 18 Global Step: 227860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:51,603-Speed 3006.64 samples/sec Loss 1.1043 LearningRate 0.0007 Epoch: 18 Global Step: 227870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:54,941-Speed 3068.65 samples/sec Loss 1.0967 LearningRate 0.0007 Epoch: 18 Global 
Step: 227880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:18:58,318-Speed 3033.64 samples/sec Loss 1.1592 LearningRate 0.0007 Epoch: 18 Global Step: 227890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:19:01,735-Speed 2997.72 samples/sec Loss 1.1043 LearningRate 0.0007 Epoch: 18 Global Step: 227900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:19:05,153-Speed 2996.46 samples/sec Loss 1.1204 LearningRate 0.0007 Epoch: 18 Global Step: 227910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:19:08,625-Speed 2950.18 samples/sec Loss 1.0874 LearningRate 0.0007 Epoch: 18 Global Step: 227920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:19:12,026-Speed 3012.00 samples/sec Loss 1.0801 LearningRate 0.0007 Epoch: 18 Global Step: 227930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:19:15,373-Speed 3060.16 samples/sec Loss 1.0873 LearningRate 0.0007 Epoch: 18 Global Step: 227940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:19:18,766-Speed 3018.70 samples/sec Loss 1.1272 LearningRate 0.0007 Epoch: 18 Global Step: 227950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:22,111-Speed 3061.56 samples/sec Loss 1.1106 LearningRate 0.0007 Epoch: 18 Global Step: 227960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:25,445-Speed 3071.93 samples/sec Loss 1.1063 LearningRate 0.0007 Epoch: 18 Global Step: 227970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:28,805-Speed 3049.49 samples/sec Loss 1.0820 LearningRate 0.0007 Epoch: 18 Global Step: 227980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:32,183-Speed 3031.97 samples/sec Loss 1.1356 LearningRate 0.0007 Epoch: 18 Global Step: 227990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:35,562-Speed 3030.93 samples/sec Loss 1.0837 LearningRate 0.0007 Epoch: 18 Global Step: 228000 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 23:19:38,905-Speed 3064.55 samples/sec Loss 1.1311 LearningRate 0.0007 Epoch: 18 Global Step: 228010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:42,328-Speed 2991.79 samples/sec Loss 1.1039 LearningRate 0.0007 Epoch: 18 Global Step: 228020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:45,740-Speed 3002.33 samples/sec Loss 1.0975 LearningRate 0.0007 Epoch: 18 Global Step: 228030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:49,089-Speed 3058.58 samples/sec Loss 1.0938 LearningRate 0.0007 Epoch: 18 Global Step: 228040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:19:52,412-Speed 3081.88 samples/sec Loss 1.1067 LearningRate 0.0007 Epoch: 18 Global Step: 228050 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:19:55,751-Speed 3067.84 samples/sec Loss 1.1008 LearningRate 0.0007 Epoch: 18 Global Step: 228060 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:19:59,089-Speed 3069.19 samples/sec Loss 1.1273 LearningRate 0.0007 Epoch: 18 Global Step: 228070 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:20:02,442-Speed 3054.15 samples/sec Loss 1.0999 LearningRate 0.0007 Epoch: 18 Global Step: 228080 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:20:05,849-Speed 3007.25 samples/sec Loss 1.1094 LearningRate 0.0007 Epoch: 18 Global Step: 228090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:20:09,246-Speed 3014.90 samples/sec Loss 1.1008 LearningRate 0.0007 Epoch: 18 Global Step: 228100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:20:12,626-Speed 3030.03 samples/sec Loss 1.0867 LearningRate 0.0007 Epoch: 18 Global Step: 228110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:20:15,985-Speed 3050.00 samples/sec Loss 1.1044 LearningRate 0.0007 Epoch: 18 Global Step: 228120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:20:19,351-Speed 3043.05 samples/sec Loss 1.1463 LearningRate 0.0007 Epoch: 18 Global Step: 228130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:20:22,755-Speed 3009.20 samples/sec Loss 1.0831 LearningRate 0.0007 Epoch: 18 Global Step: 228140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:20:26,173-Speed 2996.08 samples/sec Loss 1.1236 LearningRate 0.0007 Epoch: 18 Global Step: 228150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:20:29,509-Speed 3070.88 samples/sec Loss 1.0343 LearningRate 0.0007 Epoch: 18 Global Step: 228160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:32,904-Speed 3017.31 samples/sec Loss 1.0673 LearningRate 0.0007 Epoch: 18 Global Step: 228170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:36,276-Speed 3036.75 samples/sec Loss 1.0937 LearningRate 0.0007 Epoch: 18 Global Step: 228180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:39,674-Speed 3014.79 samples/sec Loss 1.1095 LearningRate 0.0007 Epoch: 18 Global Step: 228190 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:43,062-Speed 3023.72 samples/sec Loss 1.1484 LearningRate 0.0007 Epoch: 18 Global Step: 228200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:46,436-Speed 3035.42 samples/sec Loss 1.1480 LearningRate 0.0007 Epoch: 18 Global Step: 228210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:49,961-Speed 2905.88 samples/sec Loss 1.0979 LearningRate 0.0007 Epoch: 18 Global Step: 228220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:53,415-Speed 2965.65 samples/sec Loss 1.1137 LearningRate 0.0007 Epoch: 18 Global Step: 228230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:20:56,776-Speed 3047.14 samples/sec Loss 1.0988 LearningRate 0.0007 Epoch: 18 Global Step: 228240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:00,096-Speed 3086.16 samples/sec Loss 
1.1296 LearningRate 0.0007 Epoch: 18 Global Step: 228250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:03,494-Speed 3014.10 samples/sec Loss 1.0759 LearningRate 0.0007 Epoch: 18 Global Step: 228260 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:06,846-Speed 3055.94 samples/sec Loss 1.1429 LearningRate 0.0007 Epoch: 18 Global Step: 228270 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:10,217-Speed 3038.95 samples/sec Loss 1.1332 LearningRate 0.0007 Epoch: 18 Global Step: 228280 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:13,579-Speed 3045.92 samples/sec Loss 1.0732 LearningRate 0.0007 Epoch: 18 Global Step: 228290 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:17,013-Speed 2983.40 samples/sec Loss 1.0788 LearningRate 0.0007 Epoch: 18 Global Step: 228300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:20,464-Speed 2968.67 samples/sec Loss 1.0862 LearningRate 0.0007 Epoch: 18 Global Step: 228310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:23,879-Speed 2999.17 samples/sec Loss 1.0588 LearningRate 0.0007 Epoch: 18 Global Step: 228320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:27,239-Speed 3048.22 samples/sec Loss 1.1324 LearningRate 0.0007 Epoch: 18 Global Step: 228330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:30,757-Speed 2911.96 samples/sec Loss 1.1176 LearningRate 0.0007 Epoch: 18 Global Step: 228340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:21:34,088-Speed 3075.31 samples/sec Loss 1.0886 LearningRate 0.0007 Epoch: 18 Global Step: 228350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:37,484-Speed 3015.82 samples/sec Loss 1.0895 LearningRate 0.0007 Epoch: 18 Global Step: 228360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:40,899-Speed 2999.78 samples/sec Loss 1.1194 LearningRate 0.0007 Epoch: 18 Global 
Step: 228370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:44,316-Speed 2997.24 samples/sec Loss 1.0668 LearningRate 0.0007 Epoch: 18 Global Step: 228380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:47,788-Speed 2950.23 samples/sec Loss 1.0860 LearningRate 0.0007 Epoch: 18 Global Step: 228390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:51,190-Speed 3023.08 samples/sec Loss 1.1127 LearningRate 0.0006 Epoch: 18 Global Step: 228400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:54,611-Speed 2994.08 samples/sec Loss 1.0815 LearningRate 0.0006 Epoch: 18 Global Step: 228410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:21:57,979-Speed 3040.93 samples/sec Loss 1.0808 LearningRate 0.0006 Epoch: 18 Global Step: 228420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:01,433-Speed 2965.78 samples/sec Loss 1.1133 LearningRate 0.0006 Epoch: 18 Global Step: 228430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:04,818-Speed 3025.85 samples/sec Loss 1.0857 LearningRate 0.0006 Epoch: 18 Global Step: 228440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:08,206-Speed 3023.27 samples/sec Loss 1.0969 LearningRate 0.0006 Epoch: 18 Global Step: 228450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:22:11,611-Speed 3008.27 samples/sec Loss 1.0903 LearningRate 0.0006 Epoch: 18 Global Step: 228460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:22:14,968-Speed 3051.82 samples/sec Loss 1.0803 LearningRate 0.0006 Epoch: 18 Global Step: 228470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:18,392-Speed 2991.22 samples/sec Loss 1.1423 LearningRate 0.0006 Epoch: 18 Global Step: 228480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:21,716-Speed 3082.12 samples/sec Loss 1.1475 LearningRate 0.0006 Epoch: 18 Global Step: 228490 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:22:25,149-Speed 2983.91 samples/sec Loss 1.0984 LearningRate 0.0006 Epoch: 18 Global Step: 228500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:28,533-Speed 3027.13 samples/sec Loss 1.1171 LearningRate 0.0006 Epoch: 18 Global Step: 228510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:31,982-Speed 2969.39 samples/sec Loss 1.0928 LearningRate 0.0006 Epoch: 18 Global Step: 228520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:35,396-Speed 3000.35 samples/sec Loss 1.0629 LearningRate 0.0006 Epoch: 18 Global Step: 228530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:38,845-Speed 2969.96 samples/sec Loss 1.1182 LearningRate 0.0006 Epoch: 18 Global Step: 228540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:42,207-Speed 3046.53 samples/sec Loss 1.1314 LearningRate 0.0006 Epoch: 18 Global Step: 228550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:45,644-Speed 2979.69 samples/sec Loss 1.1175 LearningRate 0.0006 Epoch: 18 Global Step: 228560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:22:48,954-Speed 3094.86 samples/sec Loss 1.1013 LearningRate 0.0006 Epoch: 18 Global Step: 228570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:22:52,357-Speed 3010.22 samples/sec Loss 1.1028 LearningRate 0.0006 Epoch: 18 Global Step: 228580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:22:55,774-Speed 2997.39 samples/sec Loss 1.1454 LearningRate 0.0006 Epoch: 18 Global Step: 228590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:22:59,101-Speed 3079.11 samples/sec Loss 1.1086 LearningRate 0.0006 Epoch: 18 Global Step: 228600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:02,498-Speed 3015.08 samples/sec Loss 1.0751 LearningRate 0.0006 Epoch: 18 Global Step: 228610 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:23:05,907-Speed 3005.45 samples/sec Loss 1.1307 LearningRate 0.0006 Epoch: 18 Global Step: 228620 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:09,275-Speed 3040.71 samples/sec Loss 1.1271 LearningRate 0.0006 Epoch: 18 Global Step: 228630 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:12,720-Speed 2973.81 samples/sec Loss 1.1370 LearningRate 0.0006 Epoch: 18 Global Step: 228640 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:16,169-Speed 2969.27 samples/sec Loss 1.1220 LearningRate 0.0006 Epoch: 18 Global Step: 228650 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:19,604-Speed 2983.41 samples/sec Loss 1.1360 LearningRate 0.0006 Epoch: 18 Global Step: 228660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:22,948-Speed 3062.89 samples/sec Loss 1.1588 LearningRate 0.0006 Epoch: 18 Global Step: 228670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:26,353-Speed 3008.05 samples/sec Loss 1.0996 LearningRate 0.0006 Epoch: 18 Global Step: 228680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:29,753-Speed 3012.92 samples/sec Loss 1.1262 LearningRate 0.0006 Epoch: 18 Global Step: 228690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:33,116-Speed 3045.58 samples/sec Loss 1.1787 LearningRate 0.0006 Epoch: 18 Global Step: 228700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:36,425-Speed 3095.38 samples/sec Loss 1.0673 LearningRate 0.0006 Epoch: 18 Global Step: 228710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:39,770-Speed 3062.18 samples/sec Loss 1.1175 LearningRate 0.0006 Epoch: 18 Global Step: 228720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:43,153-Speed 3027.04 samples/sec Loss 1.1057 LearningRate 0.0006 Epoch: 18 Global Step: 228730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:46,557-Speed 3010.36 samples/sec Loss 
1.1081 LearningRate 0.0006 Epoch: 18 Global Step: 228740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:49,917-Speed 3047.82 samples/sec Loss 1.1092 LearningRate 0.0006 Epoch: 18 Global Step: 228750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:53,367-Speed 2968.89 samples/sec Loss 1.1305 LearningRate 0.0006 Epoch: 18 Global Step: 228760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:23:56,714-Speed 3060.76 samples/sec Loss 1.0761 LearningRate 0.0006 Epoch: 18 Global Step: 228770 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:24:00,133-Speed 2995.42 samples/sec Loss 1.1108 LearningRate 0.0006 Epoch: 18 Global Step: 228780 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:24:03,432-Speed 3105.40 samples/sec Loss 1.0770 LearningRate 0.0006 Epoch: 18 Global Step: 228790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:06,801-Speed 3039.74 samples/sec Loss 1.0986 LearningRate 0.0006 Epoch: 18 Global Step: 228800 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:10,224-Speed 2992.91 samples/sec Loss 1.1296 LearningRate 0.0006 Epoch: 18 Global Step: 228810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:13,686-Speed 2958.60 samples/sec Loss 1.1741 LearningRate 0.0006 Epoch: 18 Global Step: 228820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:17,041-Speed 3053.07 samples/sec Loss 1.1246 LearningRate 0.0006 Epoch: 18 Global Step: 228830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:20,397-Speed 3052.56 samples/sec Loss 1.1690 LearningRate 0.0006 Epoch: 18 Global Step: 228840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:23,806-Speed 3004.65 samples/sec Loss 1.0845 LearningRate 0.0006 Epoch: 18 Global Step: 228850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:27,215-Speed 3004.21 samples/sec Loss 1.1382 LearningRate 0.0006 Epoch: 18 Global 
Step: 228860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:30,717-Speed 2925.20 samples/sec Loss 1.1052 LearningRate 0.0006 Epoch: 18 Global Step: 228870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:34,108-Speed 3020.46 samples/sec Loss 1.1060 LearningRate 0.0006 Epoch: 18 Global Step: 228880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:24:37,598-Speed 2934.39 samples/sec Loss 1.1254 LearningRate 0.0006 Epoch: 18 Global Step: 228890 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:24:41,063-Speed 2956.30 samples/sec Loss 1.0807 LearningRate 0.0006 Epoch: 18 Global Step: 228900 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:24:44,428-Speed 3043.56 samples/sec Loss 1.1460 LearningRate 0.0006 Epoch: 18 Global Step: 228910 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:24:47,815-Speed 3024.10 samples/sec Loss 1.0614 LearningRate 0.0006 Epoch: 18 Global Step: 228920 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:24:51,260-Speed 2973.91 samples/sec Loss 1.1098 LearningRate 0.0006 Epoch: 18 Global Step: 228930 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:24:54,603-Speed 3063.91 samples/sec Loss 1.1211 LearningRate 0.0006 Epoch: 18 Global Step: 228940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:24:58,054-Speed 2968.03 samples/sec Loss 1.1395 LearningRate 0.0006 Epoch: 18 Global Step: 228950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:01,402-Speed 3059.41 samples/sec Loss 1.1516 LearningRate 0.0006 Epoch: 18 Global Step: 228960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:04,859-Speed 2963.38 samples/sec Loss 1.0979 LearningRate 0.0006 Epoch: 18 Global Step: 228970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:08,313-Speed 2964.82 samples/sec Loss 1.0881 LearningRate 0.0006 Epoch: 18 Global Step: 228980 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 23:25:11,725-Speed 3001.49 samples/sec Loss 1.0973 LearningRate 0.0006 Epoch: 18 Global Step: 228990 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:25:15,146-Speed 2994.38 samples/sec Loss 1.1250 LearningRate 0.0006 Epoch: 18 Global Step: 229000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:18,613-Speed 2954.49 samples/sec Loss 1.0844 LearningRate 0.0006 Epoch: 18 Global Step: 229010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:21,977-Speed 3045.63 samples/sec Loss 1.1378 LearningRate 0.0006 Epoch: 18 Global Step: 229020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:25,469-Speed 2932.88 samples/sec Loss 1.0832 LearningRate 0.0006 Epoch: 18 Global Step: 229030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:28,972-Speed 2924.40 samples/sec Loss 1.1526 LearningRate 0.0006 Epoch: 18 Global Step: 229040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:32,419-Speed 2971.99 samples/sec Loss 1.1440 LearningRate 0.0006 Epoch: 18 Global Step: 229050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:35,820-Speed 3011.18 samples/sec Loss 1.1031 LearningRate 0.0006 Epoch: 18 Global Step: 229060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:39,163-Speed 3064.35 samples/sec Loss 1.1167 LearningRate 0.0006 Epoch: 18 Global Step: 229070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:25:42,498-Speed 3070.83 samples/sec Loss 1.1222 LearningRate 0.0006 Epoch: 18 Global Step: 229080 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:25:45,901-Speed 3011.15 samples/sec Loss 1.1674 LearningRate 0.0006 Epoch: 18 Global Step: 229090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:25:49,345-Speed 2974.08 samples/sec Loss 1.1529 LearningRate 0.0006 Epoch: 18 Global Step: 229100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 
23:25:52,768-Speed 2992.21 samples/sec Loss 1.1313 LearningRate 0.0006 Epoch: 18 Global Step: 229110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:25:56,184-Speed 2998.35 samples/sec Loss 1.0565 LearningRate 0.0006 Epoch: 18 Global Step: 229120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:25:59,516-Speed 3074.90 samples/sec Loss 1.1151 LearningRate 0.0006 Epoch: 18 Global Step: 229130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:02,867-Speed 3056.09 samples/sec Loss 1.0975 LearningRate 0.0006 Epoch: 18 Global Step: 229140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:06,231-Speed 3044.78 samples/sec Loss 1.1128 LearningRate 0.0006 Epoch: 18 Global Step: 229150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:09,683-Speed 2967.32 samples/sec Loss 1.1317 LearningRate 0.0006 Epoch: 18 Global Step: 229160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:13,077-Speed 3018.15 samples/sec Loss 1.1034 LearningRate 0.0006 Epoch: 18 Global Step: 229170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:16,507-Speed 2985.98 samples/sec Loss 1.0619 LearningRate 0.0006 Epoch: 18 Global Step: 229180 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:26:19,904-Speed 3015.81 samples/sec Loss 1.1038 LearningRate 0.0006 Epoch: 18 Global Step: 229190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:26:23,272-Speed 3040.42 samples/sec Loss 1.0875 LearningRate 0.0006 Epoch: 18 Global Step: 229200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:26,669-Speed 3015.41 samples/sec Loss 1.1136 LearningRate 0.0006 Epoch: 18 Global Step: 229210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:30,028-Speed 3049.29 samples/sec Loss 1.1179 LearningRate 0.0006 Epoch: 18 Global Step: 229220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:33,514-Speed 2938.70 samples/sec Loss 
1.1115 LearningRate 0.0006 Epoch: 18 Global Step: 229230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:36,862-Speed 3059.36 samples/sec Loss 1.1051 LearningRate 0.0006 Epoch: 18 Global Step: 229240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:40,229-Speed 3042.22 samples/sec Loss 1.1280 LearningRate 0.0006 Epoch: 18 Global Step: 229250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:43,533-Speed 3099.69 samples/sec Loss 1.0992 LearningRate 0.0006 Epoch: 18 Global Step: 229260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:46,947-Speed 3000.51 samples/sec Loss 1.0965 LearningRate 0.0006 Epoch: 18 Global Step: 229270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:50,438-Speed 2934.25 samples/sec Loss 1.1280 LearningRate 0.0006 Epoch: 18 Global Step: 229280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:53,824-Speed 3025.05 samples/sec Loss 1.0717 LearningRate 0.0006 Epoch: 18 Global Step: 229290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:26:57,258-Speed 2982.04 samples/sec Loss 1.1464 LearningRate 0.0006 Epoch: 18 Global Step: 229300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:00,650-Speed 3020.35 samples/sec Loss 1.1193 LearningRate 0.0006 Epoch: 18 Global Step: 229310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:04,021-Speed 3038.90 samples/sec Loss 1.1004 LearningRate 0.0006 Epoch: 18 Global Step: 229320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:07,361-Speed 3066.55 samples/sec Loss 1.1028 LearningRate 0.0006 Epoch: 18 Global Step: 229330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:10,677-Speed 3089.01 samples/sec Loss 1.0830 LearningRate 0.0006 Epoch: 18 Global Step: 229340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:14,019-Speed 3064.53 samples/sec Loss 1.1057 LearningRate 0.0006 Epoch: 18 Global 
Step: 229350 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:17,484-Speed 2956.43 samples/sec Loss 1.1154 LearningRate 0.0006 Epoch: 18 Global Step: 229360 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:20,914-Speed 2985.94 samples/sec Loss 1.1194 LearningRate 0.0006 Epoch: 18 Global Step: 229370 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:24,299-Speed 3026.25 samples/sec Loss 1.1597 LearningRate 0.0006 Epoch: 18 Global Step: 229380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:27,744-Speed 2972.91 samples/sec Loss 1.1183 LearningRate 0.0006 Epoch: 18 Global Step: 229390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:31,184-Speed 2978.61 samples/sec Loss 1.1769 LearningRate 0.0006 Epoch: 18 Global Step: 229400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:34,565-Speed 3028.59 samples/sec Loss 1.1143 LearningRate 0.0006 Epoch: 18 Global Step: 229410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:38,024-Speed 2961.74 samples/sec Loss 1.0920 LearningRate 0.0006 Epoch: 18 Global Step: 229420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:41,501-Speed 2945.95 samples/sec Loss 1.1437 LearningRate 0.0006 Epoch: 18 Global Step: 229430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:27:44,827-Speed 3078.93 samples/sec Loss 1.1313 LearningRate 0.0006 Epoch: 18 Global Step: 229440 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:48,153-Speed 3079.55 samples/sec Loss 1.0921 LearningRate 0.0006 Epoch: 18 Global Step: 229450 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:51,546-Speed 3020.22 samples/sec Loss 1.0737 LearningRate 0.0006 Epoch: 18 Global Step: 229460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:27:54,869-Speed 3083.12 samples/sec Loss 1.1462 LearningRate 0.0006 Epoch: 18 Global Step: 229470 Fp16 Grad Scale: 32768 
Required: 2 hours Training: 2022-04-27 23:27:58,220-Speed 3056.09 samples/sec Loss 1.1396 LearningRate 0.0006 Epoch: 18 Global Step: 229480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:01,649-Speed 2987.16 samples/sec Loss 1.1120 LearningRate 0.0006 Epoch: 18 Global Step: 229490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:05,043-Speed 3017.98 samples/sec Loss 1.0800 LearningRate 0.0006 Epoch: 18 Global Step: 229500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:08,481-Speed 2979.18 samples/sec Loss 1.1299 LearningRate 0.0006 Epoch: 18 Global Step: 229510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:11,848-Speed 3042.07 samples/sec Loss 1.1173 LearningRate 0.0006 Epoch: 18 Global Step: 229520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:15,328-Speed 2943.49 samples/sec Loss 1.1304 LearningRate 0.0006 Epoch: 18 Global Step: 229530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:18,765-Speed 2980.25 samples/sec Loss 1.1313 LearningRate 0.0006 Epoch: 18 Global Step: 229540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:22,128-Speed 3045.95 samples/sec Loss 1.1055 LearningRate 0.0006 Epoch: 18 Global Step: 229550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:25,502-Speed 3035.90 samples/sec Loss 1.0789 LearningRate 0.0006 Epoch: 18 Global Step: 229560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:28,918-Speed 2998.75 samples/sec Loss 1.1318 LearningRate 0.0006 Epoch: 18 Global Step: 229570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:32,364-Speed 2972.35 samples/sec Loss 1.1113 LearningRate 0.0006 Epoch: 18 Global Step: 229580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:35,781-Speed 2997.25 samples/sec Loss 1.0970 LearningRate 0.0006 Epoch: 18 Global Step: 229590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:28:39,215-Speed 2983.39 samples/sec Loss 1.1043 LearningRate 0.0006 Epoch: 18 Global Step: 229600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:28:42,619-Speed 3008.28 samples/sec Loss 1.0758 LearningRate 0.0006 Epoch: 18 Global Step: 229610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:28:45,971-Speed 3056.23 samples/sec Loss 1.0842 LearningRate 0.0006 Epoch: 18 Global Step: 229620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:28:49,393-Speed 2993.31 samples/sec Loss 1.0921 LearningRate 0.0006 Epoch: 18 Global Step: 229630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:28:52,799-Speed 3006.96 samples/sec Loss 1.1041 LearningRate 0.0006 Epoch: 18 Global Step: 229640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:28:56,158-Speed 3049.10 samples/sec Loss 1.1252 LearningRate 0.0006 Epoch: 18 Global Step: 229650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:28:59,540-Speed 3028.70 samples/sec Loss 1.1033 LearningRate 0.0006 Epoch: 18 Global Step: 229660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:29:02,964-Speed 2991.47 samples/sec Loss 1.1086 LearningRate 0.0006 Epoch: 18 Global Step: 229670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:29:06,325-Speed 3047.29 samples/sec Loss 1.0677 LearningRate 0.0006 Epoch: 18 Global Step: 229680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:29:09,792-Speed 2955.13 samples/sec Loss 1.1035 LearningRate 0.0006 Epoch: 18 Global Step: 229690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:29:13,223-Speed 2985.00 samples/sec Loss 1.0645 LearningRate 0.0006 Epoch: 18 Global Step: 229700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:29:16,595-Speed 3037.90 samples/sec Loss 1.0982 LearningRate 0.0006 Epoch: 18 Global Step: 229710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:19,986-Speed 3020.61 samples/sec Loss 
1.1068 LearningRate 0.0006 Epoch: 18 Global Step: 229720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:23,505-Speed 2910.68 samples/sec Loss 1.1565 LearningRate 0.0006 Epoch: 18 Global Step: 229730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:26,915-Speed 3003.99 samples/sec Loss 1.1110 LearningRate 0.0006 Epoch: 18 Global Step: 229740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:30,301-Speed 3024.96 samples/sec Loss 1.1240 LearningRate 0.0006 Epoch: 18 Global Step: 229750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:33,676-Speed 3034.89 samples/sec Loss 1.1014 LearningRate 0.0006 Epoch: 18 Global Step: 229760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:37,127-Speed 2968.59 samples/sec Loss 1.0969 LearningRate 0.0006 Epoch: 18 Global Step: 229770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:40,490-Speed 3045.84 samples/sec Loss 1.1308 LearningRate 0.0006 Epoch: 18 Global Step: 229780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:43,885-Speed 3017.09 samples/sec Loss 1.1413 LearningRate 0.0006 Epoch: 18 Global Step: 229790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:47,280-Speed 3017.25 samples/sec Loss 1.0503 LearningRate 0.0006 Epoch: 18 Global Step: 229800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:50,663-Speed 3027.34 samples/sec Loss 1.1360 LearningRate 0.0006 Epoch: 18 Global Step: 229810 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:29:54,060-Speed 3015.28 samples/sec Loss 1.1119 LearningRate 0.0006 Epoch: 18 Global Step: 229820 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:29:57,482-Speed 2993.33 samples/sec Loss 1.1111 LearningRate 0.0006 Epoch: 18 Global Step: 229830 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:30:00,976-Speed 2931.98 samples/sec Loss 1.1424 LearningRate 0.0006 Epoch: 18 Global 
Step: 229840 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:30:04,506-Speed 2901.37 samples/sec Loss 1.1403 LearningRate 0.0006 Epoch: 18 Global Step: 229850 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:30:07,935-Speed 2987.38 samples/sec Loss 1.1203 LearningRate 0.0006 Epoch: 18 Global Step: 229860 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:30:11,314-Speed 3031.35 samples/sec Loss 1.1300 LearningRate 0.0006 Epoch: 18 Global Step: 229870 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:30:14,771-Speed 2962.85 samples/sec Loss 1.1542 LearningRate 0.0006 Epoch: 18 Global Step: 229880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:18,217-Speed 2972.60 samples/sec Loss 1.1513 LearningRate 0.0006 Epoch: 18 Global Step: 229890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:21,665-Speed 2970.20 samples/sec Loss 1.1260 LearningRate 0.0006 Epoch: 18 Global Step: 229900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:25,116-Speed 2967.67 samples/sec Loss 1.1325 LearningRate 0.0006 Epoch: 18 Global Step: 229910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:28,665-Speed 2886.02 samples/sec Loss 1.1055 LearningRate 0.0006 Epoch: 18 Global Step: 229920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:32,051-Speed 3025.04 samples/sec Loss 1.0541 LearningRate 0.0006 Epoch: 18 Global Step: 229930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:35,435-Speed 3027.39 samples/sec Loss 1.0863 LearningRate 0.0006 Epoch: 18 Global Step: 229940 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:38,876-Speed 2976.15 samples/sec Loss 1.1085 LearningRate 0.0006 Epoch: 18 Global Step: 229950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:42,199-Speed 3083.08 samples/sec Loss 1.1020 LearningRate 0.0006 Epoch: 18 Global Step: 229960 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:30:45,639-Speed 2977.05 samples/sec Loss 1.0861 LearningRate 0.0006 Epoch: 18 Global Step: 229970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:49,007-Speed 3041.87 samples/sec Loss 1.1173 LearningRate 0.0006 Epoch: 18 Global Step: 229980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:52,406-Speed 3013.50 samples/sec Loss 1.1140 LearningRate 0.0006 Epoch: 18 Global Step: 229990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:55,827-Speed 2993.98 samples/sec Loss 1.0825 LearningRate 0.0005 Epoch: 18 Global Step: 230000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:30:59,203-Speed 3034.02 samples/sec Loss 1.1172 LearningRate 0.0005 Epoch: 18 Global Step: 230010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:02,552-Speed 3058.79 samples/sec Loss 1.1422 LearningRate 0.0005 Epoch: 18 Global Step: 230020 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:05,989-Speed 2980.24 samples/sec Loss 1.1389 LearningRate 0.0005 Epoch: 18 Global Step: 230030 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:09,330-Speed 3066.16 samples/sec Loss 1.0808 LearningRate 0.0005 Epoch: 18 Global Step: 230040 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:12,768-Speed 2979.29 samples/sec Loss 1.1279 LearningRate 0.0005 Epoch: 18 Global Step: 230050 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:16,173-Speed 3008.73 samples/sec Loss 1.1407 LearningRate 0.0005 Epoch: 18 Global Step: 230060 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:19,567-Speed 3017.54 samples/sec Loss 1.0831 LearningRate 0.0005 Epoch: 18 Global Step: 230070 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:31:22,914-Speed 3059.72 samples/sec Loss 1.1420 LearningRate 0.0005 Epoch: 18 Global Step: 230080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:31:26,259-Speed 3062.78 samples/sec Loss 1.0920 LearningRate 0.0005 Epoch: 18 Global Step: 230090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:29,602-Speed 3063.67 samples/sec Loss 1.1167 LearningRate 0.0005 Epoch: 18 Global Step: 230100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:32,966-Speed 3044.53 samples/sec Loss 1.1760 LearningRate 0.0005 Epoch: 18 Global Step: 230110 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:36,334-Speed 3041.42 samples/sec Loss 1.0900 LearningRate 0.0005 Epoch: 18 Global Step: 230120 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:39,750-Speed 2998.36 samples/sec Loss 1.1464 LearningRate 0.0005 Epoch: 18 Global Step: 230130 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:43,148-Speed 3014.29 samples/sec Loss 1.1197 LearningRate 0.0005 Epoch: 18 Global Step: 230140 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:46,511-Speed 3046.29 samples/sec Loss 1.1101 LearningRate 0.0005 Epoch: 18 Global Step: 230150 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:49,853-Speed 3064.61 samples/sec Loss 1.1068 LearningRate 0.0005 Epoch: 18 Global Step: 230160 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:53,190-Speed 3069.22 samples/sec Loss 1.1163 LearningRate 0.0005 Epoch: 18 Global Step: 230170 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:31:56,571-Speed 3029.53 samples/sec Loss 1.1236 LearningRate 0.0005 Epoch: 18 Global Step: 230180 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:31:59,964-Speed 3018.96 samples/sec Loss 1.0571 LearningRate 0.0005 Epoch: 18 Global Step: 230190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:03,337-Speed 3037.32 samples/sec Loss 1.1246 LearningRate 0.0005 Epoch: 18 Global Step: 230200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:06,736-Speed 3013.21 samples/sec Loss 
1.0642 LearningRate 0.0005 Epoch: 18 Global Step: 230210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:10,141-Speed 3007.99 samples/sec Loss 1.1229 LearningRate 0.0005 Epoch: 18 Global Step: 230220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:13,585-Speed 2974.45 samples/sec Loss 1.1186 LearningRate 0.0005 Epoch: 18 Global Step: 230230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:16,938-Speed 3056.46 samples/sec Loss 1.0917 LearningRate 0.0005 Epoch: 18 Global Step: 230240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:20,251-Speed 3091.17 samples/sec Loss 1.1227 LearningRate 0.0005 Epoch: 18 Global Step: 230250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:23,594-Speed 3064.13 samples/sec Loss 1.1102 LearningRate 0.0005 Epoch: 18 Global Step: 230260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:26,964-Speed 3039.32 samples/sec Loss 1.1056 LearningRate 0.0005 Epoch: 18 Global Step: 230270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:30,278-Speed 3091.43 samples/sec Loss 1.1305 LearningRate 0.0005 Epoch: 18 Global Step: 230280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:33,667-Speed 3022.28 samples/sec Loss 1.0851 LearningRate 0.0005 Epoch: 18 Global Step: 230290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:32:37,133-Speed 2954.80 samples/sec Loss 1.1091 LearningRate 0.0005 Epoch: 18 Global Step: 230300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:40,594-Speed 2959.76 samples/sec Loss 1.0984 LearningRate 0.0005 Epoch: 18 Global Step: 230310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:44,008-Speed 3000.31 samples/sec Loss 1.1373 LearningRate 0.0005 Epoch: 18 Global Step: 230320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:47,393-Speed 3026.12 samples/sec Loss 1.0765 LearningRate 0.0005 Epoch: 18 Global 
Step: 230330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:50,800-Speed 3006.50 samples/sec Loss 1.1057 LearningRate 0.0005 Epoch: 18 Global Step: 230340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:54,172-Speed 3037.65 samples/sec Loss 1.1056 LearningRate 0.0005 Epoch: 18 Global Step: 230350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:32:57,601-Speed 2986.51 samples/sec Loss 1.0593 LearningRate 0.0005 Epoch: 18 Global Step: 230360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:01,021-Speed 2995.28 samples/sec Loss 1.0917 LearningRate 0.0005 Epoch: 18 Global Step: 230370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:04,366-Speed 3061.88 samples/sec Loss 1.1467 LearningRate 0.0005 Epoch: 18 Global Step: 230380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:07,697-Speed 3075.51 samples/sec Loss 1.1185 LearningRate 0.0005 Epoch: 18 Global Step: 230390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:11,083-Speed 3024.72 samples/sec Loss 1.1154 LearningRate 0.0005 Epoch: 18 Global Step: 230400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:14,466-Speed 3027.53 samples/sec Loss 1.1234 LearningRate 0.0005 Epoch: 18 Global Step: 230410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:17,850-Speed 3026.77 samples/sec Loss 1.1085 LearningRate 0.0005 Epoch: 18 Global Step: 230420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:21,218-Speed 3040.89 samples/sec Loss 1.0867 LearningRate 0.0005 Epoch: 18 Global Step: 230430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:24,573-Speed 3053.23 samples/sec Loss 1.1345 LearningRate 0.0005 Epoch: 18 Global Step: 230440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:27,959-Speed 3025.05 samples/sec Loss 1.0906 LearningRate 0.0005 Epoch: 18 Global Step: 230450 Fp16 Grad Scale: 16384 
Required: 2 hours Training: 2022-04-27 23:33:31,401-Speed 2976.22 samples/sec Loss 1.1457 LearningRate 0.0005 Epoch: 18 Global Step: 230460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:34,792-Speed 3021.01 samples/sec Loss 1.1285 LearningRate 0.0005 Epoch: 18 Global Step: 230470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:38,186-Speed 3018.27 samples/sec Loss 1.0974 LearningRate 0.0005 Epoch: 18 Global Step: 230480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:33:41,541-Speed 3052.79 samples/sec Loss 1.1252 LearningRate 0.0005 Epoch: 18 Global Step: 230490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:44,858-Speed 3087.78 samples/sec Loss 1.1082 LearningRate 0.0005 Epoch: 18 Global Step: 230500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:48,235-Speed 3033.27 samples/sec Loss 1.1722 LearningRate 0.0005 Epoch: 18 Global Step: 230510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:51,580-Speed 3062.41 samples/sec Loss 1.0860 LearningRate 0.0005 Epoch: 18 Global Step: 230520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:54,952-Speed 3037.95 samples/sec Loss 1.1247 LearningRate 0.0005 Epoch: 18 Global Step: 230530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:33:58,340-Speed 3023.03 samples/sec Loss 1.1180 LearningRate 0.0005 Epoch: 18 Global Step: 230540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:01,704-Speed 3045.22 samples/sec Loss 1.0824 LearningRate 0.0005 Epoch: 18 Global Step: 230550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:05,150-Speed 2972.06 samples/sec Loss 1.0908 LearningRate 0.0005 Epoch: 18 Global Step: 230560 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:08,535-Speed 3026.77 samples/sec Loss 1.0588 LearningRate 0.0005 Epoch: 18 Global Step: 230570 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 
23:34:11,931-Speed 3016.84 samples/sec Loss 1.0903 LearningRate 0.0005 Epoch: 18 Global Step: 230580 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:15,325-Speed 3017.16 samples/sec Loss 1.1481 LearningRate 0.0005 Epoch: 18 Global Step: 230590 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:18,726-Speed 3012.69 samples/sec Loss 1.1323 LearningRate 0.0005 Epoch: 18 Global Step: 230600 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:22,118-Speed 3019.87 samples/sec Loss 1.1136 LearningRate 0.0005 Epoch: 18 Global Step: 230610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:25,504-Speed 3025.33 samples/sec Loss 1.1516 LearningRate 0.0005 Epoch: 18 Global Step: 230620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:28,912-Speed 3005.32 samples/sec Loss 1.0956 LearningRate 0.0005 Epoch: 18 Global Step: 230630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:32,270-Speed 3050.11 samples/sec Loss 1.1072 LearningRate 0.0005 Epoch: 18 Global Step: 230640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:35,721-Speed 2967.99 samples/sec Loss 1.0851 LearningRate 0.0005 Epoch: 18 Global Step: 230650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:39,105-Speed 3027.24 samples/sec Loss 1.0878 LearningRate 0.0005 Epoch: 18 Global Step: 230660 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:42,542-Speed 2980.32 samples/sec Loss 1.0719 LearningRate 0.0005 Epoch: 18 Global Step: 230670 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:46,038-Speed 2929.55 samples/sec Loss 1.1301 LearningRate 0.0005 Epoch: 18 Global Step: 230680 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:49,439-Speed 3012.25 samples/sec Loss 1.0775 LearningRate 0.0005 Epoch: 18 Global Step: 230690 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:52,777-Speed 3068.41 samples/sec Loss 
1.1239 LearningRate 0.0005 Epoch: 18 Global Step: 230700 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:34:56,119-Speed 3065.10 samples/sec Loss 1.1107 LearningRate 0.0005 Epoch: 18 Global Step: 230710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:34:59,476-Speed 3051.37 samples/sec Loss 1.1462 LearningRate 0.0005 Epoch: 18 Global Step: 230720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:02,882-Speed 3007.47 samples/sec Loss 1.0864 LearningRate 0.0005 Epoch: 18 Global Step: 230730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:06,239-Speed 3051.60 samples/sec Loss 1.1286 LearningRate 0.0005 Epoch: 18 Global Step: 230740 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:09,606-Speed 3042.03 samples/sec Loss 1.1705 LearningRate 0.0005 Epoch: 18 Global Step: 230750 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:13,017-Speed 3002.57 samples/sec Loss 1.1139 LearningRate 0.0005 Epoch: 18 Global Step: 230760 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:16,504-Speed 2938.08 samples/sec Loss 1.1174 LearningRate 0.0005 Epoch: 18 Global Step: 230770 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:19,910-Speed 3007.13 samples/sec Loss 1.1233 LearningRate 0.0005 Epoch: 18 Global Step: 230780 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:23,271-Speed 3047.54 samples/sec Loss 1.1071 LearningRate 0.0005 Epoch: 18 Global Step: 230790 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:26,714-Speed 2974.21 samples/sec Loss 1.0668 LearningRate 0.0005 Epoch: 18 Global Step: 230800 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:35:30,133-Speed 2996.28 samples/sec Loss 1.1170 LearningRate 0.0005 Epoch: 18 Global Step: 230810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:35:33,466-Speed 3073.40 samples/sec Loss 1.0821 LearningRate 0.0005 Epoch: 18 Global 
Step: 230820 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:36,843-Speed 3032.49 samples/sec Loss 1.0955 LearningRate 0.0005 Epoch: 18 Global Step: 230830 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:40,248-Speed 3009.01 samples/sec Loss 1.1170 LearningRate 0.0005 Epoch: 18 Global Step: 230840 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:43,673-Speed 2989.94 samples/sec Loss 1.1234 LearningRate 0.0005 Epoch: 18 Global Step: 230850 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:47,025-Speed 3055.53 samples/sec Loss 1.1563 LearningRate 0.0005 Epoch: 18 Global Step: 230860 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:50,425-Speed 3013.35 samples/sec Loss 1.1690 LearningRate 0.0005 Epoch: 18 Global Step: 230870 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:53,775-Speed 3057.39 samples/sec Loss 1.1111 LearningRate 0.0005 Epoch: 18 Global Step: 230880 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:35:57,098-Speed 3081.84 samples/sec Loss 1.1113 LearningRate 0.0005 Epoch: 18 Global Step: 230890 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:36:00,474-Speed 3034.47 samples/sec Loss 1.1048 LearningRate 0.0005 Epoch: 18 Global Step: 230900 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:36:03,872-Speed 3014.12 samples/sec Loss 1.0752 LearningRate 0.0005 Epoch: 18 Global Step: 230910 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:36:07,261-Speed 3022.55 samples/sec Loss 1.1049 LearningRate 0.0005 Epoch: 18 Global Step: 230920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:10,593-Speed 3074.22 samples/sec Loss 1.1124 LearningRate 0.0005 Epoch: 18 Global Step: 230930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:13,950-Speed 3051.13 samples/sec Loss 1.1302 LearningRate 0.0005 Epoch: 18 Global Step: 230940 Fp16 Grad Scale: 16384 Required: 2 hours 
Training: 2022-04-27 23:36:17,312-Speed 3046.71 samples/sec Loss 1.1006 LearningRate 0.0005 Epoch: 18 Global Step: 230950 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:20,770-Speed 2961.77 samples/sec Loss 1.1266 LearningRate 0.0005 Epoch: 18 Global Step: 230960 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:24,174-Speed 3008.96 samples/sec Loss 1.0601 LearningRate 0.0005 Epoch: 18 Global Step: 230970 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:27,582-Speed 3005.45 samples/sec Loss 1.1232 LearningRate 0.0005 Epoch: 18 Global Step: 230980 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:30,984-Speed 3011.09 samples/sec Loss 1.1441 LearningRate 0.0005 Epoch: 18 Global Step: 230990 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:34,343-Speed 3050.02 samples/sec Loss 1.0941 LearningRate 0.0005 Epoch: 18 Global Step: 231000 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:37,729-Speed 3024.53 samples/sec Loss 1.1231 LearningRate 0.0005 Epoch: 18 Global Step: 231010 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:36:41,096-Speed 3042.49 samples/sec Loss 1.1676 LearningRate 0.0005 Epoch: 18 Global Step: 231020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:36:44,446-Speed 3057.49 samples/sec Loss 1.0693 LearningRate 0.0005 Epoch: 18 Global Step: 231030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:36:47,884-Speed 2978.98 samples/sec Loss 1.0988 LearningRate 0.0005 Epoch: 18 Global Step: 231040 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:36:51,270-Speed 3025.23 samples/sec Loss 1.1471 LearningRate 0.0005 Epoch: 18 Global Step: 231050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:36:54,635-Speed 3043.84 samples/sec Loss 1.1439 LearningRate 0.0005 Epoch: 18 Global Step: 231060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:36:58,031-Speed 
3016.30 samples/sec Loss 1.0939 LearningRate 0.0005 Epoch: 18 Global Step: 231070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:37:01,395-Speed 3044.04 samples/sec Loss 1.0750 LearningRate 0.0005 Epoch: 18 Global Step: 231080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:37:04,738-Speed 3064.38 samples/sec Loss 1.1253 LearningRate 0.0005 Epoch: 18 Global Step: 231090 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:37:08,123-Speed 3026.27 samples/sec Loss 1.1288 LearningRate 0.0005 Epoch: 18 Global Step: 231100 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:37:11,492-Speed 3040.69 samples/sec Loss 1.0937 LearningRate 0.0005 Epoch: 18 Global Step: 231110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:37:14,841-Speed 3058.34 samples/sec Loss 1.1306 LearningRate 0.0005 Epoch: 18 Global Step: 231120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:37:18,159-Speed 3086.60 samples/sec Loss 1.1070 LearningRate 0.0005 Epoch: 18 Global Step: 231130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:37:21,482-Speed 3082.60 samples/sec Loss 1.1195 LearningRate 0.0005 Epoch: 18 Global Step: 231140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:37:24,813-Speed 3075.11 samples/sec Loss 1.0977 LearningRate 0.0005 Epoch: 18 Global Step: 231150 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:28,181-Speed 3040.68 samples/sec Loss 1.1413 LearningRate 0.0005 Epoch: 18 Global Step: 231160 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:31,580-Speed 3014.21 samples/sec Loss 1.0852 LearningRate 0.0005 Epoch: 18 Global Step: 231170 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:35,005-Speed 2990.61 samples/sec Loss 1.1286 LearningRate 0.0005 Epoch: 18 Global Step: 231180 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:38,376-Speed 3038.42 samples/sec Loss 1.1260 LearningRate 
0.0005 Epoch: 18 Global Step: 231190 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:41,737-Speed 3047.08 samples/sec Loss 1.1447 LearningRate 0.0005 Epoch: 18 Global Step: 231200 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:45,140-Speed 3009.97 samples/sec Loss 1.1456 LearningRate 0.0005 Epoch: 18 Global Step: 231210 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:48,561-Speed 2994.59 samples/sec Loss 1.1635 LearningRate 0.0005 Epoch: 18 Global Step: 231220 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:51,916-Speed 3052.65 samples/sec Loss 1.1077 LearningRate 0.0005 Epoch: 18 Global Step: 231230 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:55,244-Speed 3078.40 samples/sec Loss 1.0909 LearningRate 0.0005 Epoch: 18 Global Step: 231240 Fp16 Grad Scale: 8192 Required: 2 hours Training: 2022-04-27 23:37:58,653-Speed 3004.10 samples/sec Loss 1.0698 LearningRate 0.0005 Epoch: 18 Global Step: 231250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:02,116-Speed 2957.62 samples/sec Loss 1.1024 LearningRate 0.0005 Epoch: 18 Global Step: 231260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:05,458-Speed 3065.72 samples/sec Loss 1.1635 LearningRate 0.0005 Epoch: 18 Global Step: 231270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:08,871-Speed 3000.34 samples/sec Loss 1.0874 LearningRate 0.0005 Epoch: 18 Global Step: 231280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:12,234-Speed 3046.18 samples/sec Loss 1.0839 LearningRate 0.0005 Epoch: 18 Global Step: 231290 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:15,562-Speed 3077.34 samples/sec Loss 1.1152 LearningRate 0.0005 Epoch: 18 Global Step: 231300 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:18,976-Speed 3000.00 samples/sec Loss 1.0885 LearningRate 0.0005 Epoch: 18 Global Step: 231310 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:22,414-Speed 2979.52 samples/sec Loss 1.0975 LearningRate 0.0005 Epoch: 18 Global Step: 231320 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:25,755-Speed 3065.91 samples/sec Loss 1.1079 LearningRate 0.0005 Epoch: 18 Global Step: 231330 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:29,077-Speed 3083.00 samples/sec Loss 1.1313 LearningRate 0.0005 Epoch: 18 Global Step: 231340 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:32,488-Speed 3002.70 samples/sec Loss 1.0749 LearningRate 0.0005 Epoch: 18 Global Step: 231350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:38:35,905-Speed 2997.82 samples/sec Loss 1.1093 LearningRate 0.0005 Epoch: 18 Global Step: 231360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:38:39,348-Speed 2975.10 samples/sec Loss 1.0744 LearningRate 0.0005 Epoch: 18 Global Step: 231370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:38:42,719-Speed 3038.79 samples/sec Loss 1.1151 LearningRate 0.0005 Epoch: 18 Global Step: 231380 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:46,088-Speed 3040.10 samples/sec Loss 1.0993 LearningRate 0.0005 Epoch: 18 Global Step: 231390 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:49,472-Speed 3027.51 samples/sec Loss 1.1258 LearningRate 0.0005 Epoch: 18 Global Step: 231400 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:52,789-Speed 3088.01 samples/sec Loss 1.1242 LearningRate 0.0005 Epoch: 18 Global Step: 231410 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:56,187-Speed 3014.35 samples/sec Loss 1.0770 LearningRate 0.0005 Epoch: 18 Global Step: 231420 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:38:59,578-Speed 3020.68 samples/sec Loss 1.0658 LearningRate 0.0005 Epoch: 18 Global Step: 231430 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-27 23:39:02,947-Speed 3040.20 samples/sec Loss 1.1492 LearningRate 0.0005 Epoch: 18 Global Step: 231440 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:06,321-Speed 3036.22 samples/sec Loss 1.1293 LearningRate 0.0005 Epoch: 18 Global Step: 231450 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:09,669-Speed 3058.86 samples/sec Loss 1.1430 LearningRate 0.0005 Epoch: 18 Global Step: 231460 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:12,996-Speed 3078.37 samples/sec Loss 1.1131 LearningRate 0.0005 Epoch: 18 Global Step: 231470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:16,389-Speed 3019.19 samples/sec Loss 1.1050 LearningRate 0.0005 Epoch: 18 Global Step: 231480 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:19,721-Speed 3074.39 samples/sec Loss 1.0875 LearningRate 0.0005 Epoch: 18 Global Step: 231490 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:23,082-Speed 3047.00 samples/sec Loss 1.1276 LearningRate 0.0005 Epoch: 18 Global Step: 231500 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:26,515-Speed 2983.80 samples/sec Loss 1.1008 LearningRate 0.0005 Epoch: 18 Global Step: 231510 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:29,941-Speed 2989.51 samples/sec Loss 1.1352 LearningRate 0.0005 Epoch: 18 Global Step: 231520 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:33,275-Speed 3072.39 samples/sec Loss 1.1147 LearningRate 0.0005 Epoch: 18 Global Step: 231530 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:36,681-Speed 3007.22 samples/sec Loss 1.0977 LearningRate 0.0005 Epoch: 18 Global Step: 231540 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:40,063-Speed 3028.66 samples/sec Loss 1.1699 LearningRate 0.0005 Epoch: 18 Global Step: 231550 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:39:43,475-Speed 3002.15 
samples/sec Loss 1.1091 LearningRate 0.0005 Epoch: 18 Global Step: 231560 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:46,841-Speed 3042.75 samples/sec Loss 1.0720 LearningRate 0.0005 Epoch: 18 Global Step: 231570 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:50,212-Speed 3038.47 samples/sec Loss 1.0955 LearningRate 0.0005 Epoch: 18 Global Step: 231580 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:53,548-Speed 3070.14 samples/sec Loss 1.1704 LearningRate 0.0005 Epoch: 18 Global Step: 231590 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:39:56,928-Speed 3031.00 samples/sec Loss 1.0992 LearningRate 0.0005 Epoch: 18 Global Step: 231600 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:00,306-Speed 3032.09 samples/sec Loss 1.0959 LearningRate 0.0005 Epoch: 18 Global Step: 231610 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:03,715-Speed 3005.33 samples/sec Loss 1.1071 LearningRate 0.0005 Epoch: 18 Global Step: 231620 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:07,089-Speed 3035.85 samples/sec Loss 1.1036 LearningRate 0.0005 Epoch: 18 Global Step: 231630 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:10,453-Speed 3044.56 samples/sec Loss 1.1257 LearningRate 0.0005 Epoch: 18 Global Step: 231640 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:13,794-Speed 3066.18 samples/sec Loss 1.0838 LearningRate 0.0005 Epoch: 18 Global Step: 231650 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:17,143-Speed 3058.27 samples/sec Loss 1.1323 LearningRate 0.0005 Epoch: 18 Global Step: 231660 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:20,491-Speed 3059.79 samples/sec Loss 1.0933 LearningRate 0.0005 Epoch: 18 Global Step: 231670 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:23,852-Speed 3047.81 samples/sec Loss 1.1274 LearningRate 0.0005 
Epoch: 18 Global Step: 231680 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:27,185-Speed 3073.27 samples/sec Loss 1.1455 LearningRate 0.0005 Epoch: 18 Global Step: 231690 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:30,572-Speed 3024.19 samples/sec Loss 1.1281 LearningRate 0.0005 Epoch: 18 Global Step: 231700 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:33,892-Speed 3084.84 samples/sec Loss 1.1282 LearningRate 0.0005 Epoch: 18 Global Step: 231710 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:37,240-Speed 3059.53 samples/sec Loss 1.0751 LearningRate 0.0005 Epoch: 18 Global Step: 231720 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:40,605-Speed 3044.12 samples/sec Loss 1.1226 LearningRate 0.0005 Epoch: 18 Global Step: 231730 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:40:43,951-Speed 3061.10 samples/sec Loss 1.1138 LearningRate 0.0005 Epoch: 18 Global Step: 231740 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:47,325-Speed 3036.25 samples/sec Loss 1.0861 LearningRate 0.0005 Epoch: 18 Global Step: 231750 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:50,788-Speed 2957.46 samples/sec Loss 1.0972 LearningRate 0.0004 Epoch: 18 Global Step: 231760 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:54,112-Speed 3081.99 samples/sec Loss 1.1306 LearningRate 0.0004 Epoch: 18 Global Step: 231770 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:40:57,460-Speed 3059.75 samples/sec Loss 1.1659 LearningRate 0.0004 Epoch: 18 Global Step: 231780 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:00,883-Speed 2992.14 samples/sec Loss 1.0832 LearningRate 0.0004 Epoch: 18 Global Step: 231790 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:04,273-Speed 3020.81 samples/sec Loss 1.0773 LearningRate 0.0004 Epoch: 18 Global Step: 231800 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:07,606-Speed 3073.51 samples/sec Loss 1.1564 LearningRate 0.0004 Epoch: 18 Global Step: 231810 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:11,060-Speed 2965.18 samples/sec Loss 1.0861 LearningRate 0.0004 Epoch: 18 Global Step: 231820 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:14,471-Speed 3002.76 samples/sec Loss 1.1108 LearningRate 0.0004 Epoch: 18 Global Step: 231830 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:17,842-Speed 3038.43 samples/sec Loss 1.1169 LearningRate 0.0004 Epoch: 18 Global Step: 231840 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:21,183-Speed 3066.28 samples/sec Loss 1.0932 LearningRate 0.0004 Epoch: 18 Global Step: 231850 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:24,558-Speed 3035.09 samples/sec Loss 1.1002 LearningRate 0.0004 Epoch: 18 Global Step: 231860 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:27,897-Speed 3067.50 samples/sec Loss 1.1073 LearningRate 0.0004 Epoch: 18 Global Step: 231870 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:31,229-Speed 3073.83 samples/sec Loss 1.1336 LearningRate 0.0004 Epoch: 18 Global Step: 231880 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:34,647-Speed 2997.08 samples/sec Loss 1.1271 LearningRate 0.0004 Epoch: 18 Global Step: 231890 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:38,012-Speed 3043.98 samples/sec Loss 1.0845 LearningRate 0.0004 Epoch: 18 Global Step: 231900 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:41,440-Speed 2987.47 samples/sec Loss 1.0653 LearningRate 0.0004 Epoch: 18 Global Step: 231910 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:44,855-Speed 2999.39 samples/sec Loss 1.1458 LearningRate 0.0004 Epoch: 18 Global Step: 231920 Fp16 Grad Scale: 16384 Required: 2 hours Training: 
2022-04-27 23:41:48,269-Speed 3000.61 samples/sec Loss 1.1028 LearningRate 0.0004 Epoch: 18 Global Step: 231930 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:41:51,605-Speed 3070.07 samples/sec Loss 1.0987 LearningRate 0.0004 Epoch: 18 Global Step: 231940 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:41:54,926-Speed 3083.82 samples/sec Loss 1.1257 LearningRate 0.0004 Epoch: 18 Global Step: 231950 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:41:58,275-Speed 3058.68 samples/sec Loss 1.0965 LearningRate 0.0004 Epoch: 18 Global Step: 231960 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:01,603-Speed 3078.47 samples/sec Loss 1.1405 LearningRate 0.0004 Epoch: 18 Global Step: 231970 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:04,970-Speed 3042.18 samples/sec Loss 1.1440 LearningRate 0.0004 Epoch: 18 Global Step: 231980 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:08,304-Speed 3072.11 samples/sec Loss 1.1137 LearningRate 0.0004 Epoch: 18 Global Step: 231990 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:11,724-Speed 2994.81 samples/sec Loss 1.0821 LearningRate 0.0004 Epoch: 18 Global Step: 232000 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:15,071-Speed 3059.72 samples/sec Loss 1.0823 LearningRate 0.0004 Epoch: 18 Global Step: 232010 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:18,486-Speed 2999.33 samples/sec Loss 1.1180 LearningRate 0.0004 Epoch: 18 Global Step: 232020 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:21,882-Speed 3016.58 samples/sec Loss 1.1053 LearningRate 0.0004 Epoch: 18 Global Step: 232030 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:25,216-Speed 3072.03 samples/sec Loss 1.1141 LearningRate 0.0004 Epoch: 18 Global Step: 232040 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:42:28,680-Speed 2956.80 
samples/sec Loss 1.1305 LearningRate 0.0004 Epoch: 18 Global Step: 232050 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:32,097-Speed 2997.21 samples/sec Loss 1.1234 LearningRate 0.0004 Epoch: 18 Global Step: 232060 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:35,478-Speed 3029.48 samples/sec Loss 1.1513 LearningRate 0.0004 Epoch: 18 Global Step: 232070 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:38,852-Speed 3035.93 samples/sec Loss 1.1490 LearningRate 0.0004 Epoch: 18 Global Step: 232080 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:42:42,194-Speed 3065.19 samples/sec Loss 1.0935 LearningRate 0.0004 Epoch: 18 Global Step: 232090 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:42:45,631-Speed 2980.37 samples/sec Loss 1.1058 LearningRate 0.0004 Epoch: 18 Global Step: 232100 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:42:49,006-Speed 3034.40 samples/sec Loss 1.0994 LearningRate 0.0004 Epoch: 18 Global Step: 232110 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:42:52,404-Speed 3014.76 samples/sec Loss 1.1162 LearningRate 0.0004 Epoch: 18 Global Step: 232120 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:42:55,737-Speed 3073.09 samples/sec Loss 1.1438 LearningRate 0.0004 Epoch: 18 Global Step: 232130 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:42:59,087-Speed 3056.95 samples/sec Loss 1.1233 LearningRate 0.0004 Epoch: 18 Global Step: 232140 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:02,504-Speed 2998.07 samples/sec Loss 1.1409 LearningRate 0.0004 Epoch: 18 Global Step: 232150 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:05,871-Speed 3041.39 samples/sec Loss 1.1038 LearningRate 0.0004 Epoch: 18 Global Step: 232160 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:09,233-Speed 3047.07 samples/sec Loss 1.0518 LearningRate 0.0004 
Epoch: 18 Global Step: 232170 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:12,574-Speed 3066.24 samples/sec Loss 1.1126 LearningRate 0.0004 Epoch: 18 Global Step: 232180 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:15,928-Speed 3054.02 samples/sec Loss 1.1081 LearningRate 0.0004 Epoch: 18 Global Step: 232190 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:43:19,340-Speed 3001.84 samples/sec Loss 1.1710 LearningRate 0.0004 Epoch: 18 Global Step: 232200 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:22,837-Speed 2929.37 samples/sec Loss 1.0931 LearningRate 0.0004 Epoch: 18 Global Step: 232210 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:26,186-Speed 3058.32 samples/sec Loss 1.1140 LearningRate 0.0004 Epoch: 18 Global Step: 232220 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:29,572-Speed 3024.71 samples/sec Loss 1.0947 LearningRate 0.0004 Epoch: 18 Global Step: 232230 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:32,979-Speed 3005.99 samples/sec Loss 1.1098 LearningRate 0.0004 Epoch: 18 Global Step: 232240 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:36,356-Speed 3034.06 samples/sec Loss 1.0755 LearningRate 0.0004 Epoch: 18 Global Step: 232250 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:39,811-Speed 2964.71 samples/sec Loss 1.1416 LearningRate 0.0004 Epoch: 18 Global Step: 232260 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:43,154-Speed 3063.73 samples/sec Loss 1.0610 LearningRate 0.0004 Epoch: 18 Global Step: 232270 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:46,536-Speed 3028.82 samples/sec Loss 1.0849 LearningRate 0.0004 Epoch: 18 Global Step: 232280 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:49,901-Speed 3044.21 samples/sec Loss 1.1351 LearningRate 0.0004 Epoch: 18 Global Step: 232290 Fp16 Grad 
Scale: 16384 Required: 2 hours Training: 2022-04-27 23:43:53,308-Speed 3006.73 samples/sec Loss 1.1704 LearningRate 0.0004 Epoch: 18 Global Step: 232300 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:43:56,688-Speed 3029.65 samples/sec Loss 1.0847 LearningRate 0.0004 Epoch: 18 Global Step: 232310 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:00,101-Speed 3002.00 samples/sec Loss 1.0956 LearningRate 0.0004 Epoch: 18 Global Step: 232320 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:03,576-Speed 2947.52 samples/sec Loss 1.1128 LearningRate 0.0004 Epoch: 18 Global Step: 232330 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:06,977-Speed 3011.21 samples/sec Loss 1.1440 LearningRate 0.0004 Epoch: 18 Global Step: 232340 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:10,330-Speed 3055.48 samples/sec Loss 1.0994 LearningRate 0.0004 Epoch: 18 Global Step: 232350 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:13,765-Speed 2981.76 samples/sec Loss 1.1625 LearningRate 0.0004 Epoch: 18 Global Step: 232360 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:17,149-Speed 3026.23 samples/sec Loss 1.0661 LearningRate 0.0004 Epoch: 18 Global Step: 232370 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:20,495-Speed 3061.50 samples/sec Loss 1.1167 LearningRate 0.0004 Epoch: 18 Global Step: 232380 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:23,856-Speed 3047.48 samples/sec Loss 1.0633 LearningRate 0.0004 Epoch: 18 Global Step: 232390 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:27,240-Speed 3026.81 samples/sec Loss 1.1286 LearningRate 0.0004 Epoch: 18 Global Step: 232400 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:44:30,623-Speed 3028.27 samples/sec Loss 1.1157 LearningRate 0.0004 Epoch: 18 Global Step: 232410 Fp16 Grad Scale: 65536 Required: 2 hours Training: 
2022-04-27 23:44:33,998-Speed 3034.67 samples/sec Loss 1.0985 LearningRate 0.0004 Epoch: 18 Global Step: 232420 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:44:37,317-Speed 3085.48 samples/sec Loss 1.1305 LearningRate 0.0004 Epoch: 18 Global Step: 232430 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:44:40,657-Speed 3067.13 samples/sec Loss 1.1296 LearningRate 0.0004 Epoch: 18 Global Step: 232440 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:44:44,068-Speed 3002.65 samples/sec Loss 1.1108 LearningRate 0.0004 Epoch: 18 Global Step: 232450 Fp16 Grad Scale: 65536 Required: 2 hours Training: 2022-04-27 23:44:47,388-Speed 3085.67 samples/sec Loss 1.1151 LearningRate 0.0004 Epoch: 18 Global Step: 232460 Fp16 Grad Scale: 32768 Required: 2 hours Training: 2022-04-27 23:44:50,787-Speed 3013.10 samples/sec Loss 1.1030 LearningRate 0.0004 Epoch: 18 Global Step: 232470 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:44:54,205-Speed 2996.75 samples/sec Loss 1.0769 LearningRate 0.0004 Epoch: 18 Global Step: 232480 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:44:57,547-Speed 3065.03 samples/sec Loss 1.1026 LearningRate 0.0004 Epoch: 18 Global Step: 232490 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:00,974-Speed 2988.84 samples/sec Loss 1.1625 LearningRate 0.0004 Epoch: 18 Global Step: 232500 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:04,445-Speed 2950.66 samples/sec Loss 1.0787 LearningRate 0.0004 Epoch: 18 Global Step: 232510 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:07,850-Speed 3008.08 samples/sec Loss 1.0671 LearningRate 0.0004 Epoch: 18 Global Step: 232520 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:11,269-Speed 2996.46 samples/sec Loss 1.0639 LearningRate 0.0004 Epoch: 18 Global Step: 232530 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:14,607-Speed 3067.95 
samples/sec Loss 1.1168 LearningRate 0.0004 Epoch: 18 Global Step: 232540 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:18,022-Speed 2999.59 samples/sec Loss 1.1275 LearningRate 0.0004 Epoch: 18 Global Step: 232550 Fp16 Grad Scale: 16384 Required: 2 hours Training: 2022-04-27 23:45:21,465-Speed 2974.38 samples/sec Loss 1.1276 LearningRate 0.0004 Epoch: 18 Global Step: 232560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:24,786-Speed 3084.83 samples/sec Loss 1.1136 LearningRate 0.0004 Epoch: 18 Global Step: 232570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:45:28,135-Speed 3058.63 samples/sec Loss 1.0906 LearningRate 0.0004 Epoch: 18 Global Step: 232580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:45:31,552-Speed 2997.04 samples/sec Loss 1.0937 LearningRate 0.0004 Epoch: 18 Global Step: 232590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:45:34,901-Speed 3059.16 samples/sec Loss 1.1265 LearningRate 0.0004 Epoch: 18 Global Step: 232600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:38,280-Speed 3031.69 samples/sec Loss 1.1137 LearningRate 0.0004 Epoch: 18 Global Step: 232610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:41,670-Speed 3021.28 samples/sec Loss 1.1242 LearningRate 0.0004 Epoch: 18 Global Step: 232620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:45,042-Speed 3037.77 samples/sec Loss 1.0927 LearningRate 0.0004 Epoch: 18 Global Step: 232630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:48,459-Speed 2997.83 samples/sec Loss 1.1349 LearningRate 0.0004 Epoch: 18 Global Step: 232640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:51,849-Speed 3020.80 samples/sec Loss 1.1545 LearningRate 0.0004 Epoch: 18 Global Step: 232650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:55,177-Speed 3077.83 samples/sec Loss 1.1188 LearningRate 0.0004 
Epoch: 18 Global Step: 232660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:45:58,520-Speed 3064.98 samples/sec Loss 1.0955 LearningRate 0.0004 Epoch: 18 Global Step: 232670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:46:01,962-Speed 2975.19 samples/sec Loss 1.0565 LearningRate 0.0004 Epoch: 18 Global Step: 232680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:46:05,357-Speed 3017.24 samples/sec Loss 1.1019 LearningRate 0.0004 Epoch: 18 Global Step: 232690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:46:08,710-Speed 3054.20 samples/sec Loss 1.0903 LearningRate 0.0004 Epoch: 18 Global Step: 232700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:12,064-Speed 3053.83 samples/sec Loss 1.1018 LearningRate 0.0004 Epoch: 18 Global Step: 232710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:15,463-Speed 3013.36 samples/sec Loss 1.1080 LearningRate 0.0004 Epoch: 18 Global Step: 232720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:18,796-Speed 3073.59 samples/sec Loss 1.0852 LearningRate 0.0004 Epoch: 18 Global Step: 232730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:22,198-Speed 3010.68 samples/sec Loss 1.0759 LearningRate 0.0004 Epoch: 18 Global Step: 232740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:25,685-Speed 2937.27 samples/sec Loss 1.1114 LearningRate 0.0004 Epoch: 18 Global Step: 232750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:29,140-Speed 2965.17 samples/sec Loss 1.0797 LearningRate 0.0004 Epoch: 18 Global Step: 232760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:32,585-Speed 2972.86 samples/sec Loss 1.0911 LearningRate 0.0004 Epoch: 18 Global Step: 232770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:36,063-Speed 2945.10 samples/sec Loss 1.1386 LearningRate 0.0004 Epoch: 18 Global Step: 232780 Fp16 Grad 
Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:39,442-Speed 3030.95 samples/sec Loss 1.1324 LearningRate 0.0004 Epoch: 18 Global Step: 232790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:42,764-Speed 3083.16 samples/sec Loss 1.1486 LearningRate 0.0004 Epoch: 18 Global Step: 232800 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 23:46:46,141-Speed 3033.35 samples/sec Loss 1.1341 LearningRate 0.0004 Epoch: 18 Global Step: 232810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:49,575-Speed 2982.41 samples/sec Loss 1.1287 LearningRate 0.0004 Epoch: 18 Global Step: 232820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:52,921-Speed 3062.11 samples/sec Loss 1.1269 LearningRate 0.0004 Epoch: 18 Global Step: 232830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:56,250-Speed 3076.39 samples/sec Loss 1.1405 LearningRate 0.0004 Epoch: 18 Global Step: 232840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:46:59,588-Speed 3068.81 samples/sec Loss 1.1209 LearningRate 0.0004 Epoch: 18 Global Step: 232850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:02,959-Speed 3038.57 samples/sec Loss 1.1003 LearningRate 0.0004 Epoch: 18 Global Step: 232860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:06,267-Speed 3096.16 samples/sec Loss 1.0940 LearningRate 0.0004 Epoch: 18 Global Step: 232870 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:09,603-Speed 3070.00 samples/sec Loss 1.0955 LearningRate 0.0004 Epoch: 18 Global Step: 232880 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:12,964-Speed 3047.78 samples/sec Loss 1.1767 LearningRate 0.0004 Epoch: 18 Global Step: 232890 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:16,340-Speed 3033.89 samples/sec Loss 1.1566 LearningRate 0.0004 Epoch: 18 Global Step: 232900 Fp16 Grad Scale: 8192 Required: 1 hours Training: 
2022-04-27 23:47:19,775-Speed 2982.39 samples/sec Loss 1.1039 LearningRate 0.0004 Epoch: 18 Global Step: 232910 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:23,244-Speed 2952.01 samples/sec Loss 1.0785 LearningRate 0.0004 Epoch: 18 Global Step: 232920 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:26,623-Speed 3031.91 samples/sec Loss 1.1032 LearningRate 0.0004 Epoch: 18 Global Step: 232930 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:30,032-Speed 3004.63 samples/sec Loss 1.1490 LearningRate 0.0004 Epoch: 18 Global Step: 232940 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:33,404-Speed 3037.61 samples/sec Loss 1.1050 LearningRate 0.0004 Epoch: 18 Global Step: 232950 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:36,746-Speed 3064.99 samples/sec Loss 1.1026 LearningRate 0.0004 Epoch: 18 Global Step: 232960 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:47:40,144-Speed 3014.12 samples/sec Loss 1.1157 LearningRate 0.0004 Epoch: 18 Global Step: 232970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:43,525-Speed 3030.09 samples/sec Loss 1.0837 LearningRate 0.0004 Epoch: 18 Global Step: 232980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:46,885-Speed 3047.84 samples/sec Loss 1.0817 LearningRate 0.0004 Epoch: 18 Global Step: 232990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:50,264-Speed 3031.69 samples/sec Loss 1.1236 LearningRate 0.0004 Epoch: 18 Global Step: 233000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:53,650-Speed 3025.06 samples/sec Loss 1.1537 LearningRate 0.0004 Epoch: 18 Global Step: 233010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:47:57,050-Speed 3012.21 samples/sec Loss 1.1104 LearningRate 0.0004 Epoch: 18 Global Step: 233020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:48:00,432-Speed 3029.00 samples/sec 
Loss 1.1112 LearningRate 0.0004 Epoch: 18 Global Step: 233030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:48:03,758-Speed 3079.84 samples/sec Loss 1.1154 LearningRate 0.0004 Epoch: 18 Global Step: 233040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:48:07,191-Speed 2982.67 samples/sec Loss 1.0915 LearningRate 0.0004 Epoch: 18 Global Step: 233050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:48:10,603-Speed 3001.97 samples/sec Loss 1.1396 LearningRate 0.0004 Epoch: 18 Global Step: 233060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:48:14,036-Speed 2984.07 samples/sec Loss 1.1259 LearningRate 0.0004 Epoch: 18 Global Step: 233070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:17,382-Speed 3061.49 samples/sec Loss 1.1634 LearningRate 0.0004 Epoch: 18 Global Step: 233080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:20,744-Speed 3045.58 samples/sec Loss 1.1467 LearningRate 0.0004 Epoch: 18 Global Step: 233090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:24,129-Speed 3026.25 samples/sec Loss 1.0943 LearningRate 0.0004 Epoch: 18 Global Step: 233100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:27,507-Speed 3032.38 samples/sec Loss 1.1293 LearningRate 0.0004 Epoch: 18 Global Step: 233110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:30,936-Speed 2986.63 samples/sec Loss 1.0762 LearningRate 0.0004 Epoch: 18 Global Step: 233120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:34,267-Speed 3076.07 samples/sec Loss 1.1205 LearningRate 0.0004 Epoch: 18 Global Step: 233130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:37,644-Speed 3032.92 samples/sec Loss 1.1234 LearningRate 0.0004 Epoch: 18 Global Step: 233140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:41,078-Speed 2982.68 samples/sec Loss 1.1322 LearningRate 0.0004 Epoch: 18 
Global Step: 233150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:44,432-Speed 3053.37 samples/sec Loss 1.1081 LearningRate 0.0004 Epoch: 18 Global Step: 233160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:48:47,761-Speed 3076.64 samples/sec Loss 1.0913 LearningRate 0.0004 Epoch: 18 Global Step: 233170 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 23:48:51,129-Speed 3041.97 samples/sec Loss 1.0899 LearningRate 0.0004 Epoch: 18 Global Step: 233180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 23:48:54,553-Speed 2990.79 samples/sec Loss 1.0886 LearningRate 0.0004 Epoch: 18 Global Step: 233190 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 23:48:57,942-Speed 3022.47 samples/sec Loss 1.1217 LearningRate 0.0004 Epoch: 18 Global Step: 233200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:01,280-Speed 3069.35 samples/sec Loss 1.1120 LearningRate 0.0004 Epoch: 18 Global Step: 233210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:04,624-Speed 3062.75 samples/sec Loss 1.1583 LearningRate 0.0004 Epoch: 18 Global Step: 233220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:08,008-Speed 3026.94 samples/sec Loss 1.0680 LearningRate 0.0004 Epoch: 18 Global Step: 233230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:11,391-Speed 3027.78 samples/sec Loss 1.1455 LearningRate 0.0004 Epoch: 18 Global Step: 233240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:14,763-Speed 3037.14 samples/sec Loss 1.1094 LearningRate 0.0004 Epoch: 18 Global Step: 233250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:18,108-Speed 3063.55 samples/sec Loss 1.0895 LearningRate 0.0004 Epoch: 18 Global Step: 233260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:21,473-Speed 3044.09 samples/sec Loss 1.0871 LearningRate 0.0004 Epoch: 18 Global Step: 233270 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-27 23:49:24,886-Speed 3000.26 samples/sec Loss 1.0924 LearningRate 0.0004 Epoch: 18 Global Step: 233280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:28,244-Speed 3050.74 samples/sec Loss 1.1096 LearningRate 0.0004 Epoch: 18 Global Step: 233290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:31,637-Speed 3019.36 samples/sec Loss 1.1094 LearningRate 0.0004 Epoch: 18 Global Step: 233300 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 23:49:35,027-Speed 3021.04 samples/sec Loss 1.0998 LearningRate 0.0004 Epoch: 18 Global Step: 233310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:38,450-Speed 2992.35 samples/sec Loss 1.1556 LearningRate 0.0004 Epoch: 18 Global Step: 233320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:41,845-Speed 3017.28 samples/sec Loss 1.1353 LearningRate 0.0004 Epoch: 18 Global Step: 233330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:45,196-Speed 3056.36 samples/sec Loss 1.1240 LearningRate 0.0004 Epoch: 18 Global Step: 233340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:48,594-Speed 3014.94 samples/sec Loss 1.1502 LearningRate 0.0004 Epoch: 18 Global Step: 233350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:51,993-Speed 3013.04 samples/sec Loss 1.1014 LearningRate 0.0004 Epoch: 18 Global Step: 233360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:49:55,368-Speed 3035.38 samples/sec Loss 1.1252 LearningRate 0.0004 Epoch: 18 Global Step: 233370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:49:58,750-Speed 3028.39 samples/sec Loss 1.0914 LearningRate 0.0004 Epoch: 18 Global Step: 233380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:02,088-Speed 3069.05 samples/sec Loss 1.0640 LearningRate 0.0004 Epoch: 18 Global Step: 233390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
23:50:05,438-Speed 3057.80 samples/sec Loss 1.1585 LearningRate 0.0004 Epoch: 18 Global Step: 233400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:08,832-Speed 3018.06 samples/sec Loss 1.1556 LearningRate 0.0004 Epoch: 18 Global Step: 233410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:12,274-Speed 2974.98 samples/sec Loss 1.0927 LearningRate 0.0004 Epoch: 18 Global Step: 233420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:15,708-Speed 2982.91 samples/sec Loss 1.1278 LearningRate 0.0004 Epoch: 18 Global Step: 233430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:19,098-Speed 3021.90 samples/sec Loss 1.1741 LearningRate 0.0004 Epoch: 18 Global Step: 233440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:22,454-Speed 3052.30 samples/sec Loss 1.1200 LearningRate 0.0004 Epoch: 18 Global Step: 233450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:25,782-Speed 3077.20 samples/sec Loss 1.1137 LearningRate 0.0004 Epoch: 18 Global Step: 233460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:50:29,233-Speed 2968.75 samples/sec Loss 1.0612 LearningRate 0.0004 Epoch: 18 Global Step: 233470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:32,688-Speed 2964.74 samples/sec Loss 1.0706 LearningRate 0.0004 Epoch: 18 Global Step: 233480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:36,103-Speed 2999.39 samples/sec Loss 1.1334 LearningRate 0.0004 Epoch: 18 Global Step: 233490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:39,507-Speed 3008.49 samples/sec Loss 1.1172 LearningRate 0.0004 Epoch: 18 Global Step: 233500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:42,914-Speed 3006.13 samples/sec Loss 1.0731 LearningRate 0.0004 Epoch: 18 Global Step: 233510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:46,215-Speed 3102.85 samples/sec Loss 
1.1278 LearningRate 0.0004 Epoch: 18 Global Step: 233520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:49,631-Speed 2998.92 samples/sec Loss 1.1062 LearningRate 0.0004 Epoch: 18 Global Step: 233530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:52,965-Speed 3072.83 samples/sec Loss 1.0712 LearningRate 0.0004 Epoch: 18 Global Step: 233540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:56,456-Speed 2933.34 samples/sec Loss 1.1530 LearningRate 0.0004 Epoch: 18 Global Step: 233550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:50:59,877-Speed 2994.10 samples/sec Loss 1.0934 LearningRate 0.0004 Epoch: 18 Global Step: 233560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:51:03,238-Speed 3047.87 samples/sec Loss 1.0545 LearningRate 0.0004 Epoch: 18 Global Step: 233570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:51:06,647-Speed 3004.34 samples/sec Loss 1.1491 LearningRate 0.0004 Epoch: 18 Global Step: 233580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:51:09,960-Speed 3092.46 samples/sec Loss 1.1200 LearningRate 0.0004 Epoch: 18 Global Step: 233590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:51:13,436-Speed 2946.63 samples/sec Loss 1.1260 LearningRate 0.0004 Epoch: 18 Global Step: 233600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:51:16,912-Speed 2946.76 samples/sec Loss 1.0965 LearningRate 0.0004 Epoch: 18 Global Step: 233610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:51:20,341-Speed 2986.28 samples/sec Loss 1.1226 LearningRate 0.0004 Epoch: 18 Global Step: 233620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:51:23,691-Speed 3058.34 samples/sec Loss 1.1079 LearningRate 0.0004 Epoch: 18 Global Step: 233630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:51:27,107-Speed 2998.12 samples/sec Loss 1.0510 LearningRate 0.0004 Epoch: 18 Global 
Step: 233640 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:30,422-Speed 3090.12 samples/sec Loss 1.1040 LearningRate 0.0004 Epoch: 18 Global Step: 233650 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:33,777-Speed 3053.01 samples/sec Loss 1.1292 LearningRate 0.0004 Epoch: 18 Global Step: 233660 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:37,108-Speed 3074.71 samples/sec Loss 1.1242 LearningRate 0.0004 Epoch: 18 Global Step: 233670 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:40,508-Speed 3012.77 samples/sec Loss 1.1122 LearningRate 0.0004 Epoch: 18 Global Step: 233680 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:43,940-Speed 2984.68 samples/sec Loss 1.0940 LearningRate 0.0004 Epoch: 18 Global Step: 233690 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:47,334-Speed 3017.97 samples/sec Loss 1.1365 LearningRate 0.0004 Epoch: 18 Global Step: 233700 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:50,762-Speed 2988.15 samples/sec Loss 1.1369 LearningRate 0.0004 Epoch: 18 Global Step: 233710 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:54,157-Speed 3016.34 samples/sec Loss 1.1296 LearningRate 0.0004 Epoch: 18 Global Step: 233720 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:51:57,505-Speed 3060.07 samples/sec Loss 1.1649 LearningRate 0.0003 Epoch: 18 Global Step: 233730 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-27 23:52:00,895-Speed 3021.01 samples/sec Loss 1.1210 LearningRate 0.0003 Epoch: 18 Global Step: 233740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:04,353-Speed 2962.22 samples/sec Loss 1.1133 LearningRate 0.0003 Epoch: 18 Global Step: 233750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:07,686-Speed 3072.68 samples/sec Loss 1.0929 LearningRate 0.0003 Epoch: 18 Global Step: 233760 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-27 23:52:11,105-Speed 2996.27 samples/sec Loss 1.1254 LearningRate 0.0003 Epoch: 18 Global Step: 233770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:14,443-Speed 3068.87 samples/sec Loss 1.0982 LearningRate 0.0003 Epoch: 18 Global Step: 233780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:17,777-Speed 3072.30 samples/sec Loss 1.1745 LearningRate 0.0003 Epoch: 18 Global Step: 233790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:21,103-Speed 3079.33 samples/sec Loss 1.1273 LearningRate 0.0003 Epoch: 18 Global Step: 233800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:24,491-Speed 3022.82 samples/sec Loss 1.1219 LearningRate 0.0003 Epoch: 18 Global Step: 233810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:27,941-Speed 2968.78 samples/sec Loss 1.1361 LearningRate 0.0003 Epoch: 18 Global Step: 233820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:31,317-Speed 3034.10 samples/sec Loss 1.1356 LearningRate 0.0003 Epoch: 18 Global Step: 233830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:52:34,722-Speed 3008.92 samples/sec Loss 1.0817 LearningRate 0.0003 Epoch: 18 Global Step: 233840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:38,083-Speed 3047.23 samples/sec Loss 1.1541 LearningRate 0.0003 Epoch: 18 Global Step: 233850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:41,458-Speed 3034.77 samples/sec Loss 1.1491 LearningRate 0.0003 Epoch: 18 Global Step: 233860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:44,808-Speed 3057.50 samples/sec Loss 1.0617 LearningRate 0.0003 Epoch: 18 Global Step: 233870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:48,193-Speed 3025.86 samples/sec Loss 1.1556 LearningRate 0.0003 Epoch: 18 Global Step: 233880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:51,549-Speed 
3052.76 samples/sec Loss 1.1064 LearningRate 0.0003 Epoch: 18 Global Step: 233890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:54,959-Speed 3003.22 samples/sec Loss 1.1400 LearningRate 0.0003 Epoch: 18 Global Step: 233900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:52:58,359-Speed 3012.43 samples/sec Loss 1.1404 LearningRate 0.0003 Epoch: 18 Global Step: 233910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:01,747-Speed 3023.38 samples/sec Loss 1.1279 LearningRate 0.0003 Epoch: 18 Global Step: 233920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:05,111-Speed 3045.14 samples/sec Loss 1.1050 LearningRate 0.0003 Epoch: 18 Global Step: 233930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:08,480-Speed 3040.12 samples/sec Loss 1.1243 LearningRate 0.0003 Epoch: 18 Global Step: 233940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:11,847-Speed 3043.06 samples/sec Loss 1.0889 LearningRate 0.0003 Epoch: 18 Global Step: 233950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:15,213-Speed 3042.26 samples/sec Loss 1.0846 LearningRate 0.0003 Epoch: 18 Global Step: 233960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:18,551-Speed 3068.57 samples/sec Loss 1.0774 LearningRate 0.0003 Epoch: 18 Global Step: 233970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:21,950-Speed 3014.74 samples/sec Loss 1.1160 LearningRate 0.0003 Epoch: 18 Global Step: 233980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:25,351-Speed 3011.70 samples/sec Loss 1.0989 LearningRate 0.0003 Epoch: 18 Global Step: 233990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:28,738-Speed 3023.67 samples/sec Loss 1.1028 LearningRate 0.0003 Epoch: 18 Global Step: 234000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:32,185-Speed 2971.59 samples/sec Loss 1.1015 
LearningRate 0.0003 Epoch: 18 Global Step: 234010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:35,671-Speed 2938.25 samples/sec Loss 1.1469 LearningRate 0.0003 Epoch: 18 Global Step: 234020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:39,028-Speed 3051.35 samples/sec Loss 1.0982 LearningRate 0.0003 Epoch: 18 Global Step: 234030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:42,373-Speed 3061.91 samples/sec Loss 1.0886 LearningRate 0.0003 Epoch: 18 Global Step: 234040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:53:45,757-Speed 3027.06 samples/sec Loss 1.1170 LearningRate 0.0003 Epoch: 18 Global Step: 234050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:49,159-Speed 3010.49 samples/sec Loss 1.1166 LearningRate 0.0003 Epoch: 18 Global Step: 234060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:52,570-Speed 3003.76 samples/sec Loss 1.1293 LearningRate 0.0003 Epoch: 18 Global Step: 234070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:55,914-Speed 3062.27 samples/sec Loss 1.0785 LearningRate 0.0003 Epoch: 18 Global Step: 234080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:53:59,311-Speed 3015.50 samples/sec Loss 1.1325 LearningRate 0.0003 Epoch: 18 Global Step: 234090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:02,672-Speed 3048.11 samples/sec Loss 1.0779 LearningRate 0.0003 Epoch: 18 Global Step: 234100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:06,019-Speed 3059.67 samples/sec Loss 1.1027 LearningRate 0.0003 Epoch: 18 Global Step: 234110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:09,325-Speed 3102.00 samples/sec Loss 1.1151 LearningRate 0.0003 Epoch: 18 Global Step: 234120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:12,745-Speed 2995.01 samples/sec Loss 1.0904 LearningRate 0.0003 Epoch: 18 Global Step: 
234130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:16,108-Speed 3045.79 samples/sec Loss 1.1056 LearningRate 0.0003 Epoch: 18 Global Step: 234140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:19,551-Speed 2975.05 samples/sec Loss 1.1734 LearningRate 0.0003 Epoch: 18 Global Step: 234150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:23,023-Speed 2950.24 samples/sec Loss 1.1021 LearningRate 0.0003 Epoch: 18 Global Step: 234160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:26,494-Speed 2950.21 samples/sec Loss 1.1227 LearningRate 0.0003 Epoch: 18 Global Step: 234170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:29,921-Speed 2989.63 samples/sec Loss 1.1113 LearningRate 0.0003 Epoch: 18 Global Step: 234180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:33,321-Speed 3012.59 samples/sec Loss 1.1265 LearningRate 0.0003 Epoch: 18 Global Step: 234190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:36,666-Speed 3061.92 samples/sec Loss 1.1298 LearningRate 0.0003 Epoch: 18 Global Step: 234200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:40,111-Speed 2974.18 samples/sec Loss 1.1550 LearningRate 0.0003 Epoch: 18 Global Step: 234210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:43,542-Speed 2985.01 samples/sec Loss 1.1246 LearningRate 0.0003 Epoch: 18 Global Step: 234220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:47,017-Speed 2947.43 samples/sec Loss 1.0807 LearningRate 0.0003 Epoch: 18 Global Step: 234230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:50,453-Speed 2980.65 samples/sec Loss 1.0627 LearningRate 0.0003 Epoch: 18 Global Step: 234240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:54:53,891-Speed 2979.30 samples/sec Loss 1.1195 LearningRate 0.0003 Epoch: 18 Global Step: 234250 Fp16 Grad Scale: 65536 Required: 1 
hours Training: 2022-04-27 23:54:57,323-Speed 2984.83 samples/sec Loss 1.1102 LearningRate 0.0003 Epoch: 18 Global Step: 234260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:00,698-Speed 3035.19 samples/sec Loss 1.1322 LearningRate 0.0003 Epoch: 18 Global Step: 234270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:04,057-Speed 3049.02 samples/sec Loss 1.1286 LearningRate 0.0003 Epoch: 18 Global Step: 234280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:07,456-Speed 3013.82 samples/sec Loss 1.1137 LearningRate 0.0003 Epoch: 18 Global Step: 234290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:10,856-Speed 3014.52 samples/sec Loss 1.1060 LearningRate 0.0003 Epoch: 18 Global Step: 234300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:14,224-Speed 3041.11 samples/sec Loss 1.0929 LearningRate 0.0003 Epoch: 18 Global Step: 234310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:17,663-Speed 2978.23 samples/sec Loss 1.1199 LearningRate 0.0003 Epoch: 18 Global Step: 234320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:21,088-Speed 2990.87 samples/sec Loss 1.0983 LearningRate 0.0003 Epoch: 18 Global Step: 234330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:24,480-Speed 3019.46 samples/sec Loss 1.0978 LearningRate 0.0003 Epoch: 18 Global Step: 234340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:27,828-Speed 3059.37 samples/sec Loss 1.0982 LearningRate 0.0003 Epoch: 18 Global Step: 234350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:31,274-Speed 2972.55 samples/sec Loss 1.0780 LearningRate 0.0003 Epoch: 18 Global Step: 234360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:34,605-Speed 3074.84 samples/sec Loss 1.1012 LearningRate 0.0003 Epoch: 18 Global Step: 234370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
23:55:37,979-Speed 3036.00 samples/sec Loss 1.1115 LearningRate 0.0003 Epoch: 18 Global Step: 234380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:41,364-Speed 3026.32 samples/sec Loss 1.1302 LearningRate 0.0003 Epoch: 18 Global Step: 234390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:44,791-Speed 2988.37 samples/sec Loss 1.0758 LearningRate 0.0003 Epoch: 18 Global Step: 234400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:48,216-Speed 2990.47 samples/sec Loss 1.1587 LearningRate 0.0003 Epoch: 18 Global Step: 234410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:55:51,528-Speed 3093.02 samples/sec Loss 1.1109 LearningRate 0.0003 Epoch: 18 Global Step: 234420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:54,900-Speed 3037.62 samples/sec Loss 1.1165 LearningRate 0.0003 Epoch: 18 Global Step: 234430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:55:58,275-Speed 3034.70 samples/sec Loss 1.1842 LearningRate 0.0003 Epoch: 18 Global Step: 234440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:01,598-Speed 3082.89 samples/sec Loss 1.1231 LearningRate 0.0003 Epoch: 18 Global Step: 234450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:04,934-Speed 3069.86 samples/sec Loss 1.1399 LearningRate 0.0003 Epoch: 18 Global Step: 234460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:08,351-Speed 2997.84 samples/sec Loss 1.0912 LearningRate 0.0003 Epoch: 18 Global Step: 234470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:11,736-Speed 3026.07 samples/sec Loss 1.1292 LearningRate 0.0003 Epoch: 18 Global Step: 234480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:15,174-Speed 2979.01 samples/sec Loss 1.0769 LearningRate 0.0003 Epoch: 18 Global Step: 234490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:18,571-Speed 3015.01 samples/sec Loss 
1.1027 LearningRate 0.0003 Epoch: 18 Global Step: 234500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:21,960-Speed 3023.25 samples/sec Loss 1.0790 LearningRate 0.0003 Epoch: 18 Global Step: 234510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:25,307-Speed 3060.41 samples/sec Loss 1.1109 LearningRate 0.0003 Epoch: 18 Global Step: 234520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:28,760-Speed 2966.60 samples/sec Loss 1.1050 LearningRate 0.0003 Epoch: 18 Global Step: 234530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:32,269-Speed 2919.69 samples/sec Loss 1.1355 LearningRate 0.0003 Epoch: 18 Global Step: 234540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:35,618-Speed 3058.70 samples/sec Loss 1.0854 LearningRate 0.0003 Epoch: 18 Global Step: 234550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:38,987-Speed 3040.77 samples/sec Loss 1.1295 LearningRate 0.0003 Epoch: 18 Global Step: 234560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:42,401-Speed 3000.08 samples/sec Loss 1.1133 LearningRate 0.0003 Epoch: 18 Global Step: 234570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:45,843-Speed 2976.40 samples/sec Loss 1.1286 LearningRate 0.0003 Epoch: 18 Global Step: 234580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:49,257-Speed 3000.37 samples/sec Loss 1.1173 LearningRate 0.0003 Epoch: 18 Global Step: 234590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:56:52,685-Speed 2987.43 samples/sec Loss 1.1049 LearningRate 0.0003 Epoch: 18 Global Step: 234600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:56,028-Speed 3064.26 samples/sec Loss 1.0980 LearningRate 0.0003 Epoch: 18 Global Step: 234610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:56:59,380-Speed 3055.78 samples/sec Loss 1.1199 LearningRate 0.0003 Epoch: 18 Global 
Step: 234620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:02,751-Speed 3038.64 samples/sec Loss 1.1361 LearningRate 0.0003 Epoch: 18 Global Step: 234630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:06,104-Speed 3054.28 samples/sec Loss 1.1024 LearningRate 0.0003 Epoch: 18 Global Step: 234640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:09,564-Speed 2960.96 samples/sec Loss 1.1046 LearningRate 0.0003 Epoch: 18 Global Step: 234650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:12,937-Speed 3036.82 samples/sec Loss 1.1091 LearningRate 0.0003 Epoch: 18 Global Step: 234660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:16,342-Speed 3007.89 samples/sec Loss 1.0724 LearningRate 0.0003 Epoch: 18 Global Step: 234670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:19,726-Speed 3026.78 samples/sec Loss 1.1249 LearningRate 0.0003 Epoch: 18 Global Step: 234680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:57:23,136-Speed 3003.38 samples/sec Loss 1.0815 LearningRate 0.0003 Epoch: 18 Global Step: 234690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:26,607-Speed 2951.12 samples/sec Loss 1.1293 LearningRate 0.0003 Epoch: 18 Global Step: 234700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:30,021-Speed 3000.36 samples/sec Loss 1.1406 LearningRate 0.0003 Epoch: 18 Global Step: 234710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:33,435-Speed 3000.08 samples/sec Loss 1.1476 LearningRate 0.0003 Epoch: 18 Global Step: 234720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:36,920-Speed 2939.26 samples/sec Loss 1.1233 LearningRate 0.0003 Epoch: 18 Global Step: 234730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:40,359-Speed 2978.86 samples/sec Loss 1.1107 LearningRate 0.0003 Epoch: 18 Global Step: 234740 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-27 23:57:43,733-Speed 3034.80 samples/sec Loss 1.1260 LearningRate 0.0003 Epoch: 18 Global Step: 234750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:47,105-Speed 3038.07 samples/sec Loss 1.1147 LearningRate 0.0003 Epoch: 18 Global Step: 234760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:50,511-Speed 3007.28 samples/sec Loss 1.1132 LearningRate 0.0003 Epoch: 18 Global Step: 234770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:53,981-Speed 2951.52 samples/sec Loss 1.0977 LearningRate 0.0003 Epoch: 18 Global Step: 234780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:57:57,412-Speed 2985.79 samples/sec Loss 1.1508 LearningRate 0.0003 Epoch: 18 Global Step: 234790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:00,743-Speed 3074.94 samples/sec Loss 1.0822 LearningRate 0.0003 Epoch: 18 Global Step: 234800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:04,094-Speed 3056.78 samples/sec Loss 1.1302 LearningRate 0.0003 Epoch: 18 Global Step: 234810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:07,397-Speed 3101.17 samples/sec Loss 1.1374 LearningRate 0.0003 Epoch: 18 Global Step: 234820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:10,713-Speed 3089.17 samples/sec Loss 1.1205 LearningRate 0.0003 Epoch: 18 Global Step: 234830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:14,073-Speed 3048.84 samples/sec Loss 1.1065 LearningRate 0.0003 Epoch: 18 Global Step: 234840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:17,400-Speed 3078.51 samples/sec Loss 1.1440 LearningRate 0.0003 Epoch: 18 Global Step: 234850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:20,769-Speed 3040.14 samples/sec Loss 1.1128 LearningRate 0.0003 Epoch: 18 Global Step: 234860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 
23:58:24,205-Speed 2981.61 samples/sec Loss 1.1579 LearningRate 0.0003 Epoch: 18 Global Step: 234870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:27,650-Speed 2973.23 samples/sec Loss 1.0889 LearningRate 0.0003 Epoch: 18 Global Step: 234880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:31,101-Speed 2968.17 samples/sec Loss 1.0993 LearningRate 0.0003 Epoch: 18 Global Step: 234890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:34,521-Speed 2995.16 samples/sec Loss 1.1224 LearningRate 0.0003 Epoch: 18 Global Step: 234900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:37,889-Speed 3041.37 samples/sec Loss 1.1079 LearningRate 0.0003 Epoch: 18 Global Step: 234910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:41,222-Speed 3073.11 samples/sec Loss 1.0816 LearningRate 0.0003 Epoch: 18 Global Step: 234920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:44,635-Speed 3001.91 samples/sec Loss 1.1331 LearningRate 0.0003 Epoch: 18 Global Step: 234930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:48,079-Speed 2973.68 samples/sec Loss 1.0940 LearningRate 0.0003 Epoch: 18 Global Step: 234940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:51,544-Speed 2955.91 samples/sec Loss 1.1072 LearningRate 0.0003 Epoch: 18 Global Step: 234950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:58:54,966-Speed 2993.25 samples/sec Loss 1.0986 LearningRate 0.0003 Epoch: 18 Global Step: 234960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:58:58,387-Speed 2994.45 samples/sec Loss 1.1294 LearningRate 0.0003 Epoch: 18 Global Step: 234970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:01,855-Speed 2953.71 samples/sec Loss 1.0906 LearningRate 0.0003 Epoch: 18 Global Step: 234980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:05,200-Speed 3061.98 samples/sec Loss 
1.1092 LearningRate 0.0003 Epoch: 18 Global Step: 234990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:08,597-Speed 3015.37 samples/sec Loss 1.1110 LearningRate 0.0003 Epoch: 18 Global Step: 235000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:11,902-Speed 3099.42 samples/sec Loss 1.1333 LearningRate 0.0003 Epoch: 18 Global Step: 235010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:15,284-Speed 3028.32 samples/sec Loss 1.0810 LearningRate 0.0003 Epoch: 18 Global Step: 235020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:18,597-Speed 3091.35 samples/sec Loss 1.1358 LearningRate 0.0003 Epoch: 18 Global Step: 235030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:21,985-Speed 3024.03 samples/sec Loss 1.1061 LearningRate 0.0003 Epoch: 18 Global Step: 235040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:25,328-Speed 3063.48 samples/sec Loss 1.0970 LearningRate 0.0003 Epoch: 18 Global Step: 235050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:28,799-Speed 2951.60 samples/sec Loss 1.1261 LearningRate 0.0003 Epoch: 18 Global Step: 235060 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-27 23:59:32,205-Speed 3007.15 samples/sec Loss 1.0977 LearningRate 0.0003 Epoch: 18 Global Step: 235070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:35,580-Speed 3034.67 samples/sec Loss 1.0514 LearningRate 0.0003 Epoch: 18 Global Step: 235080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:38,977-Speed 3016.57 samples/sec Loss 1.0976 LearningRate 0.0003 Epoch: 18 Global Step: 235090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:42,431-Speed 2965.47 samples/sec Loss 1.1204 LearningRate 0.0003 Epoch: 18 Global Step: 235100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:45,854-Speed 2992.56 samples/sec Loss 1.1246 LearningRate 0.0003 Epoch: 18 Global 
Step: 235110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-27 23:59:49,221-Speed 3042.35 samples/sec Loss 1.0962 LearningRate 0.0003 Epoch: 18 Global Step: 235120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:59:52,575-Speed 3053.93 samples/sec Loss 1.1299 LearningRate 0.0003 Epoch: 18 Global Step: 235130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:59:55,941-Speed 3042.48 samples/sec Loss 1.1384 LearningRate 0.0003 Epoch: 18 Global Step: 235140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-27 23:59:59,324-Speed 3027.88 samples/sec Loss 1.0883 LearningRate 0.0003 Epoch: 18 Global Step: 235150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:02,734-Speed 3004.50 samples/sec Loss 1.1317 LearningRate 0.0003 Epoch: 18 Global Step: 235160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:06,084-Speed 3057.39 samples/sec Loss 1.1186 LearningRate 0.0003 Epoch: 18 Global Step: 235170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:09,495-Speed 3002.89 samples/sec Loss 1.1161 LearningRate 0.0003 Epoch: 18 Global Step: 235180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:12,859-Speed 3044.75 samples/sec Loss 1.1086 LearningRate 0.0003 Epoch: 18 Global Step: 235190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:16,258-Speed 3013.72 samples/sec Loss 1.0951 LearningRate 0.0003 Epoch: 18 Global Step: 235200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:19,643-Speed 3026.25 samples/sec Loss 1.0952 LearningRate 0.0003 Epoch: 18 Global Step: 235210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:23,018-Speed 3034.77 samples/sec Loss 1.1446 LearningRate 0.0003 Epoch: 18 Global Step: 235220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:00:26,433-Speed 2999.70 samples/sec Loss 1.1443 LearningRate 0.0003 Epoch: 18 Global Step: 235230 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:00:29,788-Speed 3052.56 samples/sec Loss 1.0969 LearningRate 0.0003 Epoch: 18 Global Step: 235240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:00:33,248-Speed 2960.79 samples/sec Loss 1.0888 LearningRate 0.0003 Epoch: 18 Global Step: 235250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:00:36,630-Speed 3028.75 samples/sec Loss 1.1020 LearningRate 0.0003 Epoch: 18 Global Step: 235260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:00:39,958-Speed 3077.97 samples/sec Loss 1.1376 LearningRate 0.0003 Epoch: 18 Global Step: 235270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:43,333-Speed 3035.57 samples/sec Loss 1.0731 LearningRate 0.0003 Epoch: 18 Global Step: 235280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:46,817-Speed 2939.59 samples/sec Loss 1.0542 LearningRate 0.0003 Epoch: 18 Global Step: 235290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:50,323-Speed 2921.44 samples/sec Loss 1.1306 LearningRate 0.0003 Epoch: 18 Global Step: 235300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:53,669-Speed 3061.19 samples/sec Loss 1.1736 LearningRate 0.0003 Epoch: 18 Global Step: 235310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:00:57,015-Speed 3061.39 samples/sec Loss 1.1198 LearningRate 0.0003 Epoch: 18 Global Step: 235320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:01:00,461-Speed 2972.32 samples/sec Loss 1.1270 LearningRate 0.0003 Epoch: 18 Global Step: 235330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:01:03,954-Speed 2932.43 samples/sec Loss 1.1100 LearningRate 0.0003 Epoch: 18 Global Step: 235340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:01:07,386-Speed 2984.99 samples/sec Loss 1.1239 LearningRate 0.0003 Epoch: 18 Global Step: 235350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:01:10,836-Speed 2969.46 samples/sec Loss 1.0854 LearningRate 0.0003 Epoch: 18 Global Step: 235360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:01:14,281-Speed 2973.08 samples/sec Loss 1.0477 LearningRate 0.0003 Epoch: 18 Global Step: 235370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:17,703-Speed 2992.92 samples/sec Loss 1.1459 LearningRate 0.0003 Epoch: 18 Global Step: 235380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:21,121-Speed 2996.35 samples/sec Loss 1.1117 LearningRate 0.0003 Epoch: 18 Global Step: 235390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:24,504-Speed 3028.11 samples/sec Loss 1.1571 LearningRate 0.0003 Epoch: 18 Global Step: 235400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:27,975-Speed 2951.21 samples/sec Loss 1.1450 LearningRate 0.0003 Epoch: 18 Global Step: 235410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:31,311-Speed 3069.73 samples/sec Loss 1.0437 LearningRate 0.0003 Epoch: 18 Global Step: 235420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:34,652-Speed 3066.19 samples/sec Loss 1.1376 LearningRate 0.0003 Epoch: 18 Global Step: 235430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:38,041-Speed 3022.82 samples/sec Loss 1.1521 LearningRate 0.0003 Epoch: 18 Global Step: 235440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:41,450-Speed 3004.73 samples/sec Loss 1.1437 LearningRate 0.0003 Epoch: 18 Global Step: 235450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:44,843-Speed 3018.80 samples/sec Loss 1.1170 LearningRate 0.0003 Epoch: 18 Global Step: 235460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:48,225-Speed 3028.65 samples/sec Loss 1.1253 LearningRate 0.0003 Epoch: 18 Global Step: 235470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:01:51,644-Speed 2995.23 samples/sec Loss 
1.0995 LearningRate 0.0003 Epoch: 18 Global Step: 235480 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:01:55,038-Speed 3018.23 samples/sec Loss 1.0884 LearningRate 0.0003 Epoch: 18 Global Step: 235490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:01:58,487-Speed 2970.11 samples/sec Loss 1.1418 LearningRate 0.0003 Epoch: 18 Global Step: 235500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:02:01,889-Speed 3010.81 samples/sec Loss 1.1729 LearningRate 0.0003 Epoch: 18 Global Step: 235510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:02:05,248-Speed 3049.24 samples/sec Loss 1.0906 LearningRate 0.0003 Epoch: 18 Global Step: 235520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:02:08,609-Speed 3047.34 samples/sec Loss 1.0926 LearningRate 0.0003 Epoch: 18 Global Step: 235530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:12,013-Speed 3009.25 samples/sec Loss 1.1226 LearningRate 0.0003 Epoch: 18 Global Step: 235540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:15,398-Speed 3026.18 samples/sec Loss 1.1211 LearningRate 0.0003 Epoch: 18 Global Step: 235550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:18,756-Speed 3050.25 samples/sec Loss 1.0994 LearningRate 0.0003 Epoch: 18 Global Step: 235560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:22,190-Speed 2983.01 samples/sec Loss 1.1381 LearningRate 0.0003 Epoch: 18 Global Step: 235570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:25,623-Speed 2983.56 samples/sec Loss 1.1398 LearningRate 0.0003 Epoch: 18 Global Step: 235580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:29,082-Speed 2961.13 samples/sec Loss 1.1074 LearningRate 0.0003 Epoch: 18 Global Step: 235590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:32,532-Speed 2968.75 samples/sec Loss 1.1256 LearningRate 0.0003 Epoch: 18 Global 
Step: 235600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:35,946-Speed 3000.07 samples/sec Loss 1.0925 LearningRate 0.0003 Epoch: 18 Global Step: 235610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:39,281-Speed 3071.32 samples/sec Loss 1.1012 LearningRate 0.0003 Epoch: 18 Global Step: 235620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:42,653-Speed 3037.90 samples/sec Loss 1.1416 LearningRate 0.0003 Epoch: 18 Global Step: 235630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:02:46,000-Speed 3060.86 samples/sec Loss 1.0977 LearningRate 0.0003 Epoch: 18 Global Step: 235640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:49,368-Speed 3041.28 samples/sec Loss 1.1062 LearningRate 0.0003 Epoch: 18 Global Step: 235650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:52,775-Speed 3006.29 samples/sec Loss 1.0933 LearningRate 0.0003 Epoch: 18 Global Step: 235660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:56,107-Speed 3073.60 samples/sec Loss 1.1359 LearningRate 0.0003 Epoch: 18 Global Step: 235670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:02:59,452-Speed 3062.38 samples/sec Loss 1.1036 LearningRate 0.0003 Epoch: 18 Global Step: 235680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:03:02,824-Speed 3037.69 samples/sec Loss 1.1590 LearningRate 0.0003 Epoch: 18 Global Step: 235690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:03:06,194-Speed 3039.08 samples/sec Loss 1.0957 LearningRate 0.0003 Epoch: 18 Global Step: 235700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:03:10,229-Speed 2538.96 samples/sec Loss 1.1000 LearningRate 0.0003 Epoch: 18 Global Step: 235710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:03:13,633-Speed 3008.73 samples/sec Loss 1.1040 LearningRate 0.0003 Epoch: 18 Global Step: 235720 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-28 00:03:17,712-Speed 2510.93 samples/sec Loss 1.1089 LearningRate 0.0003 Epoch: 18 Global Step: 235730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:03:21,638-Speed 2608.81 samples/sec Loss 1.1048 LearningRate 0.0003 Epoch: 18 Global Step: 235740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:24,996-Speed 3050.56 samples/sec Loss 1.1115 LearningRate 0.0003 Epoch: 18 Global Step: 235750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:28,963-Speed 2581.92 samples/sec Loss 1.0911 LearningRate 0.0003 Epoch: 18 Global Step: 235760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:32,319-Speed 3052.41 samples/sec Loss 1.0565 LearningRate 0.0003 Epoch: 18 Global Step: 235770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:35,734-Speed 2998.66 samples/sec Loss 1.0886 LearningRate 0.0003 Epoch: 18 Global Step: 235780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:39,144-Speed 3004.76 samples/sec Loss 1.0919 LearningRate 0.0003 Epoch: 18 Global Step: 235790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:42,590-Speed 2971.68 samples/sec Loss 1.1324 LearningRate 0.0003 Epoch: 18 Global Step: 235800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:45,999-Speed 3005.40 samples/sec Loss 1.1350 LearningRate 0.0003 Epoch: 18 Global Step: 235810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:49,440-Speed 2976.54 samples/sec Loss 1.0993 LearningRate 0.0003 Epoch: 18 Global Step: 235820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:52,779-Speed 3067.16 samples/sec Loss 1.1305 LearningRate 0.0003 Epoch: 18 Global Step: 235830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:03:56,162-Speed 3028.50 samples/sec Loss 1.0910 LearningRate 0.0003 Epoch: 18 Global Step: 235840 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 
00:03:59,530-Speed 3041.42 samples/sec Loss 1.0992 LearningRate 0.0003 Epoch: 18 Global Step: 235850 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:04:02,926-Speed 3016.15 samples/sec Loss 1.1372 LearningRate 0.0003 Epoch: 18 Global Step: 235860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:06,356-Speed 2985.68 samples/sec Loss 1.1053 LearningRate 0.0003 Epoch: 18 Global Step: 235870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:09,774-Speed 2996.92 samples/sec Loss 1.1171 LearningRate 0.0003 Epoch: 18 Global Step: 235880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:13,171-Speed 3015.55 samples/sec Loss 1.1555 LearningRate 0.0003 Epoch: 18 Global Step: 235890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:16,575-Speed 3010.13 samples/sec Loss 1.1206 LearningRate 0.0003 Epoch: 18 Global Step: 235900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:20,019-Speed 2974.70 samples/sec Loss 1.0833 LearningRate 0.0003 Epoch: 18 Global Step: 235910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:23,417-Speed 3013.96 samples/sec Loss 1.1168 LearningRate 0.0003 Epoch: 18 Global Step: 235920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:26,795-Speed 3031.98 samples/sec Loss 1.1404 LearningRate 0.0003 Epoch: 18 Global Step: 235930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:04:30,164-Speed 3040.68 samples/sec Loss 1.0984 LearningRate 0.0003 Epoch: 18 Global Step: 235940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:04:33,619-Speed 2964.67 samples/sec Loss 1.1256 LearningRate 0.0003 Epoch: 18 Global Step: 235950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:04:36,977-Speed 3049.77 samples/sec Loss 1.1095 LearningRate 0.0003 Epoch: 18 Global Step: 235960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:04:40,432-Speed 2965.00 samples/sec Loss 
1.0843 LearningRate 0.0003 Epoch: 18 Global Step: 235970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:04:43,858-Speed 2989.26 samples/sec Loss 1.1428 LearningRate 0.0003 Epoch: 18 Global Step: 235980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:04:47,585-Speed 2748.52 samples/sec Loss 1.1219 LearningRate 0.0003 Epoch: 18 Global Step: 235990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:05:20,223-Speed 313.76 samples/sec Loss 1.0956 LearningRate 0.0002 Epoch: 19 Global Step: 236000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:05:23,845-Speed 2828.39 samples/sec Loss 0.8540 LearningRate 0.0002 Epoch: 19 Global Step: 236010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:05:27,468-Speed 2827.50 samples/sec Loss 0.8530 LearningRate 0.0002 Epoch: 19 Global Step: 236020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:05:31,130-Speed 2799.18 samples/sec Loss 0.8778 LearningRate 0.0002 Epoch: 19 Global Step: 236030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:05:34,504-Speed 3035.41 samples/sec Loss 0.8590 LearningRate 0.0002 Epoch: 19 Global Step: 236040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:38,039-Speed 2897.89 samples/sec Loss 0.8080 LearningRate 0.0002 Epoch: 19 Global Step: 236050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:41,644-Speed 2841.75 samples/sec Loss 0.8558 LearningRate 0.0002 Epoch: 19 Global Step: 236060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:45,200-Speed 2880.73 samples/sec Loss 0.8641 LearningRate 0.0002 Epoch: 19 Global Step: 236070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:48,754-Speed 2881.98 samples/sec Loss 0.8799 LearningRate 0.0002 Epoch: 19 Global Step: 236080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:52,172-Speed 2996.66 samples/sec Loss 0.8695 LearningRate 0.0002 Epoch: 19 Global 
Step: 236090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:55,633-Speed 2959.30 samples/sec Loss 0.8866 LearningRate 0.0002 Epoch: 19 Global Step: 236100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:05:59,514-Speed 2639.39 samples/sec Loss 0.8568 LearningRate 0.0002 Epoch: 19 Global Step: 236110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:02,852-Speed 3068.66 samples/sec Loss 0.8360 LearningRate 0.0002 Epoch: 19 Global Step: 236120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:06,253-Speed 3011.71 samples/sec Loss 0.8445 LearningRate 0.0002 Epoch: 19 Global Step: 236130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:10,268-Speed 2551.00 samples/sec Loss 0.8518 LearningRate 0.0002 Epoch: 19 Global Step: 236140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:13,679-Speed 3003.42 samples/sec Loss 0.8752 LearningRate 0.0002 Epoch: 19 Global Step: 236150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:17,002-Speed 3082.55 samples/sec Loss 0.8762 LearningRate 0.0002 Epoch: 19 Global Step: 236160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:20,396-Speed 3017.36 samples/sec Loss 0.8463 LearningRate 0.0002 Epoch: 19 Global Step: 236170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:23,800-Speed 3009.43 samples/sec Loss 0.8655 LearningRate 0.0002 Epoch: 19 Global Step: 236180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:27,273-Speed 2949.34 samples/sec Loss 0.8670 LearningRate 0.0002 Epoch: 19 Global Step: 236190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:30,743-Speed 2952.71 samples/sec Loss 0.8638 LearningRate 0.0002 Epoch: 19 Global Step: 236200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:34,206-Speed 2957.16 samples/sec Loss 0.8420 LearningRate 0.0002 Epoch: 19 Global Step: 236210 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:06:37,613-Speed 3006.51 samples/sec Loss 0.8969 LearningRate 0.0002 Epoch: 19 Global Step: 236220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:40,994-Speed 3029.57 samples/sec Loss 0.8794 LearningRate 0.0002 Epoch: 19 Global Step: 236230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:44,354-Speed 3048.38 samples/sec Loss 0.8649 LearningRate 0.0002 Epoch: 19 Global Step: 236240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:06:47,664-Speed 3094.91 samples/sec Loss 0.8964 LearningRate 0.0002 Epoch: 19 Global Step: 236250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:51,032-Speed 3040.87 samples/sec Loss 0.8415 LearningRate 0.0002 Epoch: 19 Global Step: 236260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:54,427-Speed 3016.78 samples/sec Loss 0.8747 LearningRate 0.0002 Epoch: 19 Global Step: 236270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:06:57,861-Speed 2983.21 samples/sec Loss 0.8484 LearningRate 0.0002 Epoch: 19 Global Step: 236280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:07:01,373-Speed 2916.39 samples/sec Loss 0.8713 LearningRate 0.0002 Epoch: 19 Global Step: 236290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:07:04,870-Speed 2929.77 samples/sec Loss 0.8614 LearningRate 0.0002 Epoch: 19 Global Step: 236300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:07:08,179-Speed 3094.61 samples/sec Loss 0.8814 LearningRate 0.0002 Epoch: 19 Global Step: 236310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:12,487-Speed 2377.99 samples/sec Loss 0.8211 LearningRate 0.0002 Epoch: 19 Global Step: 236320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:15,868-Speed 3029.34 samples/sec Loss 0.9200 LearningRate 0.0002 Epoch: 19 Global Step: 236330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:07:19,235-Speed 3042.04 samples/sec Loss 0.8593 LearningRate 0.0002 Epoch: 19 Global Step: 236340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:22,652-Speed 2997.35 samples/sec Loss 0.8721 LearningRate 0.0002 Epoch: 19 Global Step: 236350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:25,998-Speed 3061.55 samples/sec Loss 0.8791 LearningRate 0.0002 Epoch: 19 Global Step: 236360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:29,414-Speed 2998.15 samples/sec Loss 0.8739 LearningRate 0.0002 Epoch: 19 Global Step: 236370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:32,799-Speed 3026.53 samples/sec Loss 0.8477 LearningRate 0.0002 Epoch: 19 Global Step: 236380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:36,139-Speed 3066.16 samples/sec Loss 0.8205 LearningRate 0.0002 Epoch: 19 Global Step: 236390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:39,555-Speed 2999.07 samples/sec Loss 0.8814 LearningRate 0.0002 Epoch: 19 Global Step: 236400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:42,884-Speed 3076.80 samples/sec Loss 0.8817 LearningRate 0.0002 Epoch: 19 Global Step: 236410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:07:46,322-Speed 2979.25 samples/sec Loss 0.8531 LearningRate 0.0002 Epoch: 19 Global Step: 236420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:07:49,819-Speed 2929.05 samples/sec Loss 0.8417 LearningRate 0.0002 Epoch: 19 Global Step: 236430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:07:53,236-Speed 2997.84 samples/sec Loss 0.8825 LearningRate 0.0002 Epoch: 19 Global Step: 236440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:56,578-Speed 3064.90 samples/sec Loss 0.9153 LearningRate 0.0002 Epoch: 19 Global Step: 236450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:07:59,992-Speed 3000.27 samples/sec Loss 
0.8550 LearningRate 0.0002 Epoch: 19 Global Step: 236460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:03,399-Speed 3006.60 samples/sec Loss 0.8458 LearningRate 0.0002 Epoch: 19 Global Step: 236470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:06,874-Speed 2949.06 samples/sec Loss 0.8552 LearningRate 0.0002 Epoch: 19 Global Step: 236480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:10,319-Speed 2973.27 samples/sec Loss 0.8800 LearningRate 0.0002 Epoch: 19 Global Step: 236490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:13,661-Speed 3064.98 samples/sec Loss 0.8364 LearningRate 0.0002 Epoch: 19 Global Step: 236500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:17,013-Speed 3055.69 samples/sec Loss 0.8510 LearningRate 0.0002 Epoch: 19 Global Step: 236510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:20,384-Speed 3038.41 samples/sec Loss 0.8491 LearningRate 0.0002 Epoch: 19 Global Step: 236520 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:23,744-Speed 3049.16 samples/sec Loss 0.8186 LearningRate 0.0002 Epoch: 19 Global Step: 236530 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:27,107-Speed 3045.10 samples/sec Loss 0.8881 LearningRate 0.0002 Epoch: 19 Global Step: 236540 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:30,503-Speed 3016.23 samples/sec Loss 0.8079 LearningRate 0.0002 Epoch: 19 Global Step: 236550 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:33,874-Speed 3038.37 samples/sec Loss 0.8872 LearningRate 0.0002 Epoch: 19 Global Step: 236560 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:37,283-Speed 3004.57 samples/sec Loss 0.8598 LearningRate 0.0002 Epoch: 19 Global Step: 236570 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:40,687-Speed 3008.85 samples/sec Loss 0.9039 LearningRate 0.0002 Epoch: 19 Global Step: 
236580 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:44,128-Speed 2976.88 samples/sec Loss 0.8300 LearningRate 0.0002 Epoch: 19 Global Step: 236590 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:47,478-Speed 3057.12 samples/sec Loss 0.8701 LearningRate 0.0002 Epoch: 19 Global Step: 236600 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:50,839-Speed 3048.03 samples/sec Loss 0.8648 LearningRate 0.0002 Epoch: 19 Global Step: 236610 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:08:54,196-Speed 3051.44 samples/sec Loss 0.8475 LearningRate 0.0002 Epoch: 19 Global Step: 236620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:08:57,566-Speed 3038.86 samples/sec Loss 0.8603 LearningRate 0.0002 Epoch: 19 Global Step: 236630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:00,927-Speed 3047.92 samples/sec Loss 0.8517 LearningRate 0.0002 Epoch: 19 Global Step: 236640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:04,296-Speed 3040.29 samples/sec Loss 0.9001 LearningRate 0.0002 Epoch: 19 Global Step: 236650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:07,688-Speed 3023.11 samples/sec Loss 0.8619 LearningRate 0.0002 Epoch: 19 Global Step: 236660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:11,030-Speed 3065.20 samples/sec Loss 0.8790 LearningRate 0.0002 Epoch: 19 Global Step: 236670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:14,388-Speed 3049.83 samples/sec Loss 0.8728 LearningRate 0.0002 Epoch: 19 Global Step: 236680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:17,846-Speed 2962.35 samples/sec Loss 0.8787 LearningRate 0.0002 Epoch: 19 Global Step: 236690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:21,223-Speed 3032.96 samples/sec Loss 0.8365 LearningRate 0.0002 Epoch: 19 Global Step: 236700 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-28 00:09:24,545-Speed 3083.22 samples/sec Loss 0.8691 LearningRate 0.0002 Epoch: 19 Global Step: 236710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:09:27,921-Speed 3034.41 samples/sec Loss 0.8532 LearningRate 0.0002 Epoch: 19 Global Step: 236720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:31,311-Speed 3021.78 samples/sec Loss 0.8583 LearningRate 0.0002 Epoch: 19 Global Step: 236730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:34,653-Speed 3064.84 samples/sec Loss 0.8512 LearningRate 0.0002 Epoch: 19 Global Step: 236740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:38,110-Speed 2962.43 samples/sec Loss 0.8986 LearningRate 0.0002 Epoch: 19 Global Step: 236750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:41,460-Speed 3057.77 samples/sec Loss 0.8442 LearningRate 0.0002 Epoch: 19 Global Step: 236760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:44,834-Speed 3035.90 samples/sec Loss 0.8686 LearningRate 0.0002 Epoch: 19 Global Step: 236770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:48,156-Speed 3083.58 samples/sec Loss 0.8388 LearningRate 0.0002 Epoch: 19 Global Step: 236780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:51,510-Speed 3054.05 samples/sec Loss 0.9069 LearningRate 0.0002 Epoch: 19 Global Step: 236790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:54,833-Speed 3081.89 samples/sec Loss 0.8837 LearningRate 0.0002 Epoch: 19 Global Step: 236800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:09:58,213-Speed 3030.59 samples/sec Loss 0.8791 LearningRate 0.0002 Epoch: 19 Global Step: 236810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:10:01,602-Speed 3022.42 samples/sec Loss 0.8458 LearningRate 0.0002 Epoch: 19 Global Step: 236820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:10:04,958-Speed 
3052.12 samples/sec Loss 0.8738 LearningRate 0.0002 Epoch: 19 Global Step: 236830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:10:08,339-Speed 3029.90 samples/sec Loss 0.8901 LearningRate 0.0002 Epoch: 19 Global Step: 236840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:10:11,733-Speed 3017.67 samples/sec Loss 0.8489 LearningRate 0.0002 Epoch: 19 Global Step: 236850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:10:15,083-Speed 3057.92 samples/sec Loss 0.8496 LearningRate 0.0002 Epoch: 19 Global Step: 236860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:18,464-Speed 3029.66 samples/sec Loss 0.8542 LearningRate 0.0002 Epoch: 19 Global Step: 236870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:21,827-Speed 3045.63 samples/sec Loss 0.8482 LearningRate 0.0002 Epoch: 19 Global Step: 236880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:25,164-Speed 3069.61 samples/sec Loss 0.8434 LearningRate 0.0002 Epoch: 19 Global Step: 236890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:28,537-Speed 3037.04 samples/sec Loss 0.8471 LearningRate 0.0002 Epoch: 19 Global Step: 236900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:31,938-Speed 3011.01 samples/sec Loss 0.9328 LearningRate 0.0002 Epoch: 19 Global Step: 236910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:35,338-Speed 3012.92 samples/sec Loss 0.8993 LearningRate 0.0002 Epoch: 19 Global Step: 236920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:38,688-Speed 3058.52 samples/sec Loss 0.8577 LearningRate 0.0002 Epoch: 19 Global Step: 236930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:42,126-Speed 2978.61 samples/sec Loss 0.8904 LearningRate 0.0002 Epoch: 19 Global Step: 236940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:45,525-Speed 3014.20 samples/sec Loss 0.8512 
LearningRate 0.0002 Epoch: 19 Global Step: 236950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:48,929-Speed 3008.93 samples/sec Loss 0.8740 LearningRate 0.0002 Epoch: 19 Global Step: 236960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:10:52,283-Speed 3053.17 samples/sec Loss 0.8648 LearningRate 0.0002 Epoch: 19 Global Step: 236970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:10:55,574-Speed 3112.44 samples/sec Loss 0.8695 LearningRate 0.0002 Epoch: 19 Global Step: 236980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:10:58,911-Speed 3070.24 samples/sec Loss 0.8705 LearningRate 0.0002 Epoch: 19 Global Step: 236990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:02,270-Speed 3048.66 samples/sec Loss 0.9119 LearningRate 0.0002 Epoch: 19 Global Step: 237000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:05,666-Speed 3016.72 samples/sec Loss 0.8763 LearningRate 0.0002 Epoch: 19 Global Step: 237010 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:09,152-Speed 2938.40 samples/sec Loss 0.8597 LearningRate 0.0002 Epoch: 19 Global Step: 237020 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:12,630-Speed 2944.13 samples/sec Loss 0.8470 LearningRate 0.0002 Epoch: 19 Global Step: 237030 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:15,982-Speed 3056.70 samples/sec Loss 0.8755 LearningRate 0.0002 Epoch: 19 Global Step: 237040 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:19,310-Speed 3077.03 samples/sec Loss 0.8939 LearningRate 0.0002 Epoch: 19 Global Step: 237050 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:22,690-Speed 3030.41 samples/sec Loss 0.8395 LearningRate 0.0002 Epoch: 19 Global Step: 237060 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:26,044-Speed 3053.92 samples/sec Loss 0.8688 LearningRate 0.0002 Epoch: 19 Global Step: 237070 
Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:29,443-Speed 3014.13 samples/sec Loss 0.8519 LearningRate 0.0002 Epoch: 19 Global Step: 237080 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:32,801-Speed 3049.63 samples/sec Loss 0.8218 LearningRate 0.0002 Epoch: 19 Global Step: 237090 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:36,171-Speed 3039.28 samples/sec Loss 0.8891 LearningRate 0.0002 Epoch: 19 Global Step: 237100 Fp16 Grad Scale: 8192 Required: 1 hours Training: 2022-04-28 00:11:39,545-Speed 3036.53 samples/sec Loss 0.8584 LearningRate 0.0002 Epoch: 19 Global Step: 237110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:42,958-Speed 3000.95 samples/sec Loss 0.8264 LearningRate 0.0002 Epoch: 19 Global Step: 237120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:46,341-Speed 3027.46 samples/sec Loss 0.8967 LearningRate 0.0002 Epoch: 19 Global Step: 237130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:49,731-Speed 3021.93 samples/sec Loss 0.8902 LearningRate 0.0002 Epoch: 19 Global Step: 237140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:53,084-Speed 3054.87 samples/sec Loss 0.8688 LearningRate 0.0002 Epoch: 19 Global Step: 237150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:56,406-Speed 3082.61 samples/sec Loss 0.8421 LearningRate 0.0002 Epoch: 19 Global Step: 237160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:11:59,720-Speed 3091.44 samples/sec Loss 0.8810 LearningRate 0.0002 Epoch: 19 Global Step: 237170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:03,102-Speed 3028.45 samples/sec Loss 0.9124 LearningRate 0.0002 Epoch: 19 Global Step: 237180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:06,450-Speed 3059.36 samples/sec Loss 0.8889 LearningRate 0.0002 Epoch: 19 Global Step: 237190 Fp16 Grad Scale: 16384 Required: 1 hours 
Training: 2022-04-28 00:12:09,792-Speed 3065.82 samples/sec Loss 0.8761 LearningRate 0.0002 Epoch: 19 Global Step: 237200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:13,168-Speed 3034.16 samples/sec Loss 0.8694 LearningRate 0.0002 Epoch: 19 Global Step: 237210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:12:16,511-Speed 3064.22 samples/sec Loss 0.8463 LearningRate 0.0002 Epoch: 19 Global Step: 237220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:19,864-Speed 3053.81 samples/sec Loss 0.8869 LearningRate 0.0002 Epoch: 19 Global Step: 237230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:23,314-Speed 2969.06 samples/sec Loss 0.8749 LearningRate 0.0002 Epoch: 19 Global Step: 237240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:26,669-Speed 3053.75 samples/sec Loss 0.8908 LearningRate 0.0002 Epoch: 19 Global Step: 237250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:29,999-Speed 3076.19 samples/sec Loss 0.8523 LearningRate 0.0002 Epoch: 19 Global Step: 237260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:33,560-Speed 2875.97 samples/sec Loss 0.8595 LearningRate 0.0002 Epoch: 19 Global Step: 237270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:36,930-Speed 3039.72 samples/sec Loss 0.8424 LearningRate 0.0002 Epoch: 19 Global Step: 237280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:40,311-Speed 3029.73 samples/sec Loss 0.8488 LearningRate 0.0002 Epoch: 19 Global Step: 237290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:43,652-Speed 3066.30 samples/sec Loss 0.8699 LearningRate 0.0002 Epoch: 19 Global Step: 237300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:47,017-Speed 3043.26 samples/sec Loss 0.8451 LearningRate 0.0002 Epoch: 19 Global Step: 237310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:12:50,426-Speed 
3005.20 samples/sec Loss 0.8539 LearningRate 0.0002 Epoch: 19 Global Step: 237320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:12:53,819-Speed 3018.32 samples/sec Loss 0.8634 LearningRate 0.0002 Epoch: 19 Global Step: 237330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:12:57,151-Speed 3074.01 samples/sec Loss 0.8574 LearningRate 0.0002 Epoch: 19 Global Step: 237340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:00,575-Speed 2991.77 samples/sec Loss 0.8822 LearningRate 0.0002 Epoch: 19 Global Step: 237350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:03,884-Speed 3095.59 samples/sec Loss 0.8313 LearningRate 0.0002 Epoch: 19 Global Step: 237360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:07,213-Speed 3076.54 samples/sec Loss 0.9024 LearningRate 0.0002 Epoch: 19 Global Step: 237370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:10,541-Speed 3078.50 samples/sec Loss 0.8511 LearningRate 0.0002 Epoch: 19 Global Step: 237380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:13,851-Speed 3094.27 samples/sec Loss 0.8560 LearningRate 0.0002 Epoch: 19 Global Step: 237390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:17,246-Speed 3017.65 samples/sec Loss 0.8267 LearningRate 0.0002 Epoch: 19 Global Step: 237400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:20,631-Speed 3025.87 samples/sec Loss 0.9028 LearningRate 0.0002 Epoch: 19 Global Step: 237410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:24,009-Speed 3031.63 samples/sec Loss 0.8707 LearningRate 0.0002 Epoch: 19 Global Step: 237420 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:13:27,496-Speed 2937.57 samples/sec Loss 0.8471 LearningRate 0.0002 Epoch: 19 Global Step: 237430 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:13:30,884-Speed 3023.66 samples/sec Loss 0.8870 
LearningRate 0.0002 Epoch: 19 Global Step: 237440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:34,240-Speed 3051.57 samples/sec Loss 0.8688 LearningRate 0.0002 Epoch: 19 Global Step: 237450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:37,576-Speed 3070.08 samples/sec Loss 0.8283 LearningRate 0.0002 Epoch: 19 Global Step: 237460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:40,965-Speed 3023.27 samples/sec Loss 0.8765 LearningRate 0.0002 Epoch: 19 Global Step: 237470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:44,403-Speed 2978.95 samples/sec Loss 0.8621 LearningRate 0.0002 Epoch: 19 Global Step: 237480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:47,832-Speed 2987.62 samples/sec Loss 0.8701 LearningRate 0.0002 Epoch: 19 Global Step: 237490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:51,294-Speed 2958.91 samples/sec Loss 0.8722 LearningRate 0.0002 Epoch: 19 Global Step: 237500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:54,770-Speed 2946.57 samples/sec Loss 0.8272 LearningRate 0.0002 Epoch: 19 Global Step: 237510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:13:58,155-Speed 3025.65 samples/sec Loss 0.8397 LearningRate 0.0002 Epoch: 19 Global Step: 237520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:01,548-Speed 3021.49 samples/sec Loss 0.8414 LearningRate 0.0002 Epoch: 19 Global Step: 237530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:04,862-Speed 3089.76 samples/sec Loss 0.8836 LearningRate 0.0002 Epoch: 19 Global Step: 237540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:08,204-Speed 3065.15 samples/sec Loss 0.8957 LearningRate 0.0002 Epoch: 19 Global Step: 237550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:11,543-Speed 3068.18 samples/sec Loss 0.8518 LearningRate 0.0002 Epoch: 19 Global Step: 
237560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:14,868-Speed 3080.18 samples/sec Loss 0.8703 LearningRate 0.0002 Epoch: 19 Global Step: 237570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:18,204-Speed 3070.37 samples/sec Loss 0.8631 LearningRate 0.0002 Epoch: 19 Global Step: 237580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:21,610-Speed 3008.06 samples/sec Loss 0.8546 LearningRate 0.0002 Epoch: 19 Global Step: 237590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:25,089-Speed 2943.87 samples/sec Loss 0.8403 LearningRate 0.0002 Epoch: 19 Global Step: 237600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:28,500-Speed 3002.50 samples/sec Loss 0.8603 LearningRate 0.0002 Epoch: 19 Global Step: 237610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:31,848-Speed 3059.29 samples/sec Loss 0.8800 LearningRate 0.0002 Epoch: 19 Global Step: 237620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:35,293-Speed 2973.51 samples/sec Loss 0.8767 LearningRate 0.0002 Epoch: 19 Global Step: 237630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:38,746-Speed 2966.28 samples/sec Loss 0.8998 LearningRate 0.0002 Epoch: 19 Global Step: 237640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:42,104-Speed 3050.21 samples/sec Loss 0.8771 LearningRate 0.0002 Epoch: 19 Global Step: 237650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:45,503-Speed 3013.77 samples/sec Loss 0.8964 LearningRate 0.0002 Epoch: 19 Global Step: 237660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:48,835-Speed 3074.18 samples/sec Loss 0.8826 LearningRate 0.0002 Epoch: 19 Global Step: 237670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:52,229-Speed 3018.12 samples/sec Loss 0.8751 LearningRate 0.0002 Epoch: 19 Global Step: 237680 Fp16 Grad Scale: 32768 Required: 1 
hours Training: 2022-04-28 00:14:55,548-Speed 3086.21 samples/sec Loss 0.8488 LearningRate 0.0002 Epoch: 19 Global Step: 237690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:14:58,931-Speed 3028.23 samples/sec Loss 0.8754 LearningRate 0.0002 Epoch: 19 Global Step: 237700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:02,408-Speed 2945.92 samples/sec Loss 0.8877 LearningRate 0.0002 Epoch: 19 Global Step: 237710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:05,773-Speed 3043.18 samples/sec Loss 0.8663 LearningRate 0.0002 Epoch: 19 Global Step: 237720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:09,194-Speed 2994.93 samples/sec Loss 0.8586 LearningRate 0.0002 Epoch: 19 Global Step: 237730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:12,537-Speed 3064.89 samples/sec Loss 0.8864 LearningRate 0.0002 Epoch: 19 Global Step: 237740 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:15:15,884-Speed 3060.23 samples/sec Loss 0.8867 LearningRate 0.0002 Epoch: 19 Global Step: 237750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:19,260-Speed 3033.73 samples/sec Loss 0.8686 LearningRate 0.0002 Epoch: 19 Global Step: 237760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:22,676-Speed 2998.46 samples/sec Loss 0.8705 LearningRate 0.0002 Epoch: 19 Global Step: 237770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:26,129-Speed 2966.24 samples/sec Loss 0.8316 LearningRate 0.0002 Epoch: 19 Global Step: 237780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:29,454-Speed 3080.92 samples/sec Loss 0.8561 LearningRate 0.0002 Epoch: 19 Global Step: 237790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:32,863-Speed 3004.52 samples/sec Loss 0.8556 LearningRate 0.0002 Epoch: 19 Global Step: 237800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 
00:15:36,322-Speed 2961.29 samples/sec Loss 0.9000 LearningRate 0.0002 Epoch: 19 Global Step: 237810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:39,792-Speed 2951.85 samples/sec Loss 0.8660 LearningRate 0.0002 Epoch: 19 Global Step: 237820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:43,182-Speed 3021.37 samples/sec Loss 0.8267 LearningRate 0.0002 Epoch: 19 Global Step: 237830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:46,565-Speed 3028.19 samples/sec Loss 0.8658 LearningRate 0.0002 Epoch: 19 Global Step: 237840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:49,941-Speed 3033.99 samples/sec Loss 0.8531 LearningRate 0.0002 Epoch: 19 Global Step: 237850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:15:53,274-Speed 3072.76 samples/sec Loss 0.8290 LearningRate 0.0002 Epoch: 19 Global Step: 237860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:15:56,657-Speed 3027.65 samples/sec Loss 0.8572 LearningRate 0.0002 Epoch: 19 Global Step: 237870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:00,113-Speed 2964.09 samples/sec Loss 0.8605 LearningRate 0.0002 Epoch: 19 Global Step: 237880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:03,478-Speed 3043.41 samples/sec Loss 0.8915 LearningRate 0.0002 Epoch: 19 Global Step: 237890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:06,862-Speed 3027.21 samples/sec Loss 0.8808 LearningRate 0.0002 Epoch: 19 Global Step: 237900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:10,247-Speed 3026.47 samples/sec Loss 0.8406 LearningRate 0.0002 Epoch: 19 Global Step: 237910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:13,701-Speed 2964.77 samples/sec Loss 0.9064 LearningRate 0.0002 Epoch: 19 Global Step: 237920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:17,083-Speed 3029.07 samples/sec Loss 
0.8894 LearningRate 0.0002 Epoch: 19 Global Step: 237930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:20,468-Speed 3025.76 samples/sec Loss 0.8688 LearningRate 0.0002 Epoch: 19 Global Step: 237940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:23,869-Speed 3011.64 samples/sec Loss 0.9020 LearningRate 0.0002 Epoch: 19 Global Step: 237950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:27,276-Speed 3006.74 samples/sec Loss 0.8787 LearningRate 0.0002 Epoch: 19 Global Step: 237960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:16:30,642-Speed 3042.75 samples/sec Loss 0.8739 LearningRate 0.0002 Epoch: 19 Global Step: 237970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:16:34,069-Speed 2988.61 samples/sec Loss 0.8599 LearningRate 0.0002 Epoch: 19 Global Step: 237980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:37,405-Speed 3070.29 samples/sec Loss 0.8676 LearningRate 0.0002 Epoch: 19 Global Step: 237990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:40,765-Speed 3048.63 samples/sec Loss 0.8526 LearningRate 0.0002 Epoch: 19 Global Step: 238000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:44,211-Speed 2972.89 samples/sec Loss 0.8781 LearningRate 0.0002 Epoch: 19 Global Step: 238010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:47,697-Speed 2938.21 samples/sec Loss 0.8799 LearningRate 0.0002 Epoch: 19 Global Step: 238020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:51,072-Speed 3034.64 samples/sec Loss 0.8498 LearningRate 0.0002 Epoch: 19 Global Step: 238030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:54,479-Speed 3006.07 samples/sec Loss 0.8898 LearningRate 0.0002 Epoch: 19 Global Step: 238040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:16:57,886-Speed 3006.47 samples/sec Loss 0.8740 LearningRate 0.0002 Epoch: 19 Global 
Step: 238050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:17:01,282-Speed 3016.14 samples/sec Loss 0.8709 LearningRate 0.0002 Epoch: 19 Global Step: 238060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:17:04,597-Speed 3090.36 samples/sec Loss 0.8932 LearningRate 0.0002 Epoch: 19 Global Step: 238070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:17:07,971-Speed 3035.70 samples/sec Loss 0.8550 LearningRate 0.0002 Epoch: 19 Global Step: 238080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:11,403-Speed 2984.22 samples/sec Loss 0.8895 LearningRate 0.0002 Epoch: 19 Global Step: 238090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:14,731-Speed 3078.22 samples/sec Loss 0.8970 LearningRate 0.0002 Epoch: 19 Global Step: 238100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:18,158-Speed 2989.12 samples/sec Loss 0.8505 LearningRate 0.0002 Epoch: 19 Global Step: 238110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:21,518-Speed 3048.23 samples/sec Loss 0.8881 LearningRate 0.0002 Epoch: 19 Global Step: 238120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:24,913-Speed 3017.21 samples/sec Loss 0.8662 LearningRate 0.0002 Epoch: 19 Global Step: 238130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:28,278-Speed 3044.05 samples/sec Loss 0.8855 LearningRate 0.0002 Epoch: 19 Global Step: 238140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:31,661-Speed 3028.43 samples/sec Loss 0.8581 LearningRate 0.0002 Epoch: 19 Global Step: 238150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:34,999-Speed 3068.19 samples/sec Loss 0.8700 LearningRate 0.0002 Epoch: 19 Global Step: 238160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:38,462-Speed 2957.99 samples/sec Loss 0.8665 LearningRate 0.0002 Epoch: 19 Global Step: 238170 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:17:41,929-Speed 2953.80 samples/sec Loss 0.8569 LearningRate 0.0002 Epoch: 19 Global Step: 238180 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:17:45,254-Speed 3081.06 samples/sec Loss 0.8928 LearningRate 0.0002 Epoch: 19 Global Step: 238190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:48,671-Speed 2997.46 samples/sec Loss 0.8584 LearningRate 0.0002 Epoch: 19 Global Step: 238200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:52,032-Speed 3047.48 samples/sec Loss 0.8440 LearningRate 0.0002 Epoch: 19 Global Step: 238210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:55,428-Speed 3016.07 samples/sec Loss 0.8775 LearningRate 0.0002 Epoch: 19 Global Step: 238220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:17:58,841-Speed 3001.04 samples/sec Loss 0.8740 LearningRate 0.0002 Epoch: 19 Global Step: 238230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:02,214-Speed 3039.04 samples/sec Loss 0.8722 LearningRate 0.0002 Epoch: 19 Global Step: 238240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:05,543-Speed 3077.34 samples/sec Loss 0.8768 LearningRate 0.0002 Epoch: 19 Global Step: 238250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:08,860-Speed 3087.33 samples/sec Loss 0.8656 LearningRate 0.0002 Epoch: 19 Global Step: 238260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:12,248-Speed 3023.51 samples/sec Loss 0.8519 LearningRate 0.0002 Epoch: 19 Global Step: 238270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:15,627-Speed 3031.82 samples/sec Loss 0.8748 LearningRate 0.0002 Epoch: 19 Global Step: 238280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:18,991-Speed 3044.71 samples/sec Loss 0.8675 LearningRate 0.0002 Epoch: 19 Global Step: 238290 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 
00:18:22,378-Speed 3024.08 samples/sec Loss 0.8978 LearningRate 0.0002 Epoch: 19 Global Step: 238300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:25,731-Speed 3055.45 samples/sec Loss 0.8728 LearningRate 0.0002 Epoch: 19 Global Step: 238310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:29,100-Speed 3040.26 samples/sec Loss 0.8835 LearningRate 0.0002 Epoch: 19 Global Step: 238320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:32,541-Speed 2976.40 samples/sec Loss 0.8799 LearningRate 0.0002 Epoch: 19 Global Step: 238330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:35,966-Speed 2991.33 samples/sec Loss 0.8441 LearningRate 0.0002 Epoch: 19 Global Step: 238340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:39,432-Speed 2954.57 samples/sec Loss 0.8337 LearningRate 0.0002 Epoch: 19 Global Step: 238350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:42,818-Speed 3025.45 samples/sec Loss 0.8568 LearningRate 0.0002 Epoch: 19 Global Step: 238360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:46,182-Speed 3044.97 samples/sec Loss 0.8022 LearningRate 0.0002 Epoch: 19 Global Step: 238370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:49,521-Speed 3067.25 samples/sec Loss 0.8685 LearningRate 0.0002 Epoch: 19 Global Step: 238380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:52,890-Speed 3040.83 samples/sec Loss 0.8395 LearningRate 0.0002 Epoch: 19 Global Step: 238390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:56,246-Speed 3052.03 samples/sec Loss 0.8853 LearningRate 0.0002 Epoch: 19 Global Step: 238400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:18:59,625-Speed 3031.58 samples/sec Loss 0.9187 LearningRate 0.0002 Epoch: 19 Global Step: 238410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:02,996-Speed 3038.59 samples/sec Loss 
0.8840 LearningRate 0.0002 Epoch: 19 Global Step: 238420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:06,447-Speed 2968.26 samples/sec Loss 0.8871 LearningRate 0.0002 Epoch: 19 Global Step: 238430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:09,778-Speed 3074.37 samples/sec Loss 0.8389 LearningRate 0.0002 Epoch: 19 Global Step: 238440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:13,126-Speed 3060.19 samples/sec Loss 0.8899 LearningRate 0.0002 Epoch: 19 Global Step: 238450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:16,536-Speed 3003.29 samples/sec Loss 0.8602 LearningRate 0.0002 Epoch: 19 Global Step: 238460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:20,003-Speed 2954.64 samples/sec Loss 0.8309 LearningRate 0.0002 Epoch: 19 Global Step: 238470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:23,345-Speed 3064.61 samples/sec Loss 0.8819 LearningRate 0.0002 Epoch: 19 Global Step: 238480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:26,735-Speed 3021.97 samples/sec Loss 0.9302 LearningRate 0.0002 Epoch: 19 Global Step: 238490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:30,120-Speed 3025.51 samples/sec Loss 0.8565 LearningRate 0.0002 Epoch: 19 Global Step: 238500 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:19:33,461-Speed 3065.88 samples/sec Loss 0.8736 LearningRate 0.0002 Epoch: 19 Global Step: 238510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:36,852-Speed 3021.02 samples/sec Loss 0.8567 LearningRate 0.0002 Epoch: 19 Global Step: 238520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:40,226-Speed 3036.03 samples/sec Loss 0.8948 LearningRate 0.0002 Epoch: 19 Global Step: 238530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:43,684-Speed 2962.02 samples/sec Loss 0.8938 LearningRate 0.0002 Epoch: 19 Global 
Step: 238540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:47,057-Speed 3036.14 samples/sec Loss 0.8701 LearningRate 0.0002 Epoch: 19 Global Step: 238550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:50,472-Speed 2999.82 samples/sec Loss 0.8542 LearningRate 0.0002 Epoch: 19 Global Step: 238560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:53,900-Speed 2988.55 samples/sec Loss 0.8663 LearningRate 0.0002 Epoch: 19 Global Step: 238570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:19:57,295-Speed 3016.73 samples/sec Loss 0.8493 LearningRate 0.0002 Epoch: 19 Global Step: 238580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:00,665-Speed 3039.50 samples/sec Loss 0.8181 LearningRate 0.0002 Epoch: 19 Global Step: 238590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:04,091-Speed 2989.99 samples/sec Loss 0.8822 LearningRate 0.0002 Epoch: 19 Global Step: 238600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:07,518-Speed 2988.46 samples/sec Loss 0.8268 LearningRate 0.0002 Epoch: 19 Global Step: 238610 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:20:10,897-Speed 3031.57 samples/sec Loss 0.8938 LearningRate 0.0002 Epoch: 19 Global Step: 238620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:14,274-Speed 3034.20 samples/sec Loss 0.8854 LearningRate 0.0002 Epoch: 19 Global Step: 238630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:17,665-Speed 3020.01 samples/sec Loss 0.8623 LearningRate 0.0002 Epoch: 19 Global Step: 238640 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:21,127-Speed 2958.56 samples/sec Loss 0.8565 LearningRate 0.0002 Epoch: 19 Global Step: 238650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:24,514-Speed 3024.72 samples/sec Loss 0.8741 LearningRate 0.0002 Epoch: 19 Global Step: 238660 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:20:27,887-Speed 3035.76 samples/sec Loss 0.8644 LearningRate 0.0002 Epoch: 19 Global Step: 238670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:31,304-Speed 2997.85 samples/sec Loss 0.9330 LearningRate 0.0002 Epoch: 19 Global Step: 238680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:34,666-Speed 3047.10 samples/sec Loss 0.8780 LearningRate 0.0002 Epoch: 19 Global Step: 238690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:38,114-Speed 2970.37 samples/sec Loss 0.8926 LearningRate 0.0002 Epoch: 19 Global Step: 238700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:41,530-Speed 2998.42 samples/sec Loss 0.8752 LearningRate 0.0002 Epoch: 19 Global Step: 238710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:44,969-Speed 2978.49 samples/sec Loss 0.8550 LearningRate 0.0002 Epoch: 19 Global Step: 238720 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:20:48,293-Speed 3082.44 samples/sec Loss 0.8562 LearningRate 0.0002 Epoch: 19 Global Step: 238730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:51,781-Speed 2936.84 samples/sec Loss 0.8835 LearningRate 0.0002 Epoch: 19 Global Step: 238740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:20:55,194-Speed 3000.46 samples/sec Loss 0.8589 LearningRate 0.0002 Epoch: 19 Global Step: 238750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:20:58,630-Speed 2981.37 samples/sec Loss 0.8680 LearningRate 0.0002 Epoch: 19 Global Step: 238760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:02,020-Speed 3021.35 samples/sec Loss 0.8772 LearningRate 0.0002 Epoch: 19 Global Step: 238770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:05,405-Speed 3030.70 samples/sec Loss 0.8819 LearningRate 0.0002 Epoch: 19 Global Step: 238780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:21:08,870-Speed 2956.42 samples/sec Loss 0.8614 LearningRate 0.0002 Epoch: 19 Global Step: 238790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:12,299-Speed 2987.17 samples/sec Loss 0.8788 LearningRate 0.0001 Epoch: 19 Global Step: 238800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:15,693-Speed 3018.11 samples/sec Loss 0.8807 LearningRate 0.0001 Epoch: 19 Global Step: 238810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:19,026-Speed 3072.83 samples/sec Loss 0.8969 LearningRate 0.0001 Epoch: 19 Global Step: 238820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:22,446-Speed 2995.30 samples/sec Loss 0.8609 LearningRate 0.0001 Epoch: 19 Global Step: 238830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:25,828-Speed 3028.31 samples/sec Loss 0.8514 LearningRate 0.0001 Epoch: 19 Global Step: 238840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:21:29,295-Speed 2954.28 samples/sec Loss 0.8598 LearningRate 0.0001 Epoch: 19 Global Step: 238850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:32,739-Speed 2974.25 samples/sec Loss 0.8597 LearningRate 0.0001 Epoch: 19 Global Step: 238860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:36,162-Speed 2993.16 samples/sec Loss 0.8612 LearningRate 0.0001 Epoch: 19 Global Step: 238870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:39,660-Speed 2928.55 samples/sec Loss 0.8135 LearningRate 0.0001 Epoch: 19 Global Step: 238880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:43,172-Speed 2916.27 samples/sec Loss 0.8550 LearningRate 0.0001 Epoch: 19 Global Step: 238890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:46,654-Speed 2941.69 samples/sec Loss 0.8526 LearningRate 0.0001 Epoch: 19 Global Step: 238900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:50,107-Speed 2966.00 samples/sec Loss 
0.8547 LearningRate 0.0001 Epoch: 19 Global Step: 238910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:53,552-Speed 2973.93 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global Step: 238920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:21:56,909-Speed 3050.40 samples/sec Loss 0.8781 LearningRate 0.0001 Epoch: 19 Global Step: 238930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:00,326-Speed 2997.71 samples/sec Loss 0.8756 LearningRate 0.0001 Epoch: 19 Global Step: 238940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:03,759-Speed 2983.86 samples/sec Loss 0.8320 LearningRate 0.0001 Epoch: 19 Global Step: 238950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:22:07,235-Speed 2946.46 samples/sec Loss 0.8678 LearningRate 0.0001 Epoch: 19 Global Step: 238960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:10,566-Speed 3074.94 samples/sec Loss 0.8678 LearningRate 0.0001 Epoch: 19 Global Step: 238970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:13,928-Speed 3047.32 samples/sec Loss 0.8493 LearningRate 0.0001 Epoch: 19 Global Step: 238980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:17,297-Speed 3040.67 samples/sec Loss 0.8353 LearningRate 0.0001 Epoch: 19 Global Step: 238990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:20,705-Speed 3005.33 samples/sec Loss 0.8467 LearningRate 0.0001 Epoch: 19 Global Step: 239000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:24,095-Speed 3021.93 samples/sec Loss 0.8934 LearningRate 0.0001 Epoch: 19 Global Step: 239010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:27,506-Speed 3002.45 samples/sec Loss 0.8574 LearningRate 0.0001 Epoch: 19 Global Step: 239020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:30,930-Speed 2991.89 samples/sec Loss 0.8527 LearningRate 0.0001 Epoch: 19 Global 
Step: 239030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:34,327-Speed 3014.91 samples/sec Loss 0.9084 LearningRate 0.0001 Epoch: 19 Global Step: 239040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:37,692-Speed 3044.25 samples/sec Loss 0.8823 LearningRate 0.0001 Epoch: 19 Global Step: 239050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:41,103-Speed 3002.90 samples/sec Loss 0.8853 LearningRate 0.0001 Epoch: 19 Global Step: 239060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:44,468-Speed 3044.71 samples/sec Loss 0.8807 LearningRate 0.0001 Epoch: 19 Global Step: 239070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:47,866-Speed 3014.03 samples/sec Loss 0.8837 LearningRate 0.0001 Epoch: 19 Global Step: 239080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:51,237-Speed 3038.37 samples/sec Loss 0.8846 LearningRate 0.0001 Epoch: 19 Global Step: 239090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:54,702-Speed 2956.35 samples/sec Loss 0.8643 LearningRate 0.0001 Epoch: 19 Global Step: 239100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:22:58,067-Speed 3043.13 samples/sec Loss 0.8309 LearningRate 0.0001 Epoch: 19 Global Step: 239110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:01,404-Speed 3069.84 samples/sec Loss 0.8801 LearningRate 0.0001 Epoch: 19 Global Step: 239120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:04,726-Speed 3083.69 samples/sec Loss 0.8375 LearningRate 0.0001 Epoch: 19 Global Step: 239130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:08,004-Speed 3124.93 samples/sec Loss 0.8879 LearningRate 0.0001 Epoch: 19 Global Step: 239140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:11,363-Speed 3048.81 samples/sec Loss 0.8692 LearningRate 0.0001 Epoch: 19 Global Step: 239150 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-28 00:23:14,766-Speed 3010.71 samples/sec Loss 0.8761 LearningRate 0.0001 Epoch: 19 Global Step: 239160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:18,140-Speed 3035.09 samples/sec Loss 0.8713 LearningRate 0.0001 Epoch: 19 Global Step: 239170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:21,497-Speed 3051.86 samples/sec Loss 0.8730 LearningRate 0.0001 Epoch: 19 Global Step: 239180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:24,843-Speed 3060.69 samples/sec Loss 0.8880 LearningRate 0.0001 Epoch: 19 Global Step: 239190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:28,161-Speed 3087.07 samples/sec Loss 0.8733 LearningRate 0.0001 Epoch: 19 Global Step: 239200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:31,556-Speed 3016.85 samples/sec Loss 0.8996 LearningRate 0.0001 Epoch: 19 Global Step: 239210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:34,906-Speed 3058.07 samples/sec Loss 0.8353 LearningRate 0.0001 Epoch: 19 Global Step: 239220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:38,279-Speed 3035.86 samples/sec Loss 0.8410 LearningRate 0.0001 Epoch: 19 Global Step: 239230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:23:41,667-Speed 3023.97 samples/sec Loss 0.8736 LearningRate 0.0001 Epoch: 19 Global Step: 239240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:44,987-Speed 3085.45 samples/sec Loss 0.8872 LearningRate 0.0001 Epoch: 19 Global Step: 239250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:48,352-Speed 3043.79 samples/sec Loss 0.8775 LearningRate 0.0001 Epoch: 19 Global Step: 239260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:51,773-Speed 2994.70 samples/sec Loss 0.8291 LearningRate 0.0001 Epoch: 19 Global Step: 239270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 
00:23:55,189-Speed 2998.39 samples/sec Loss 0.8610 LearningRate 0.0001 Epoch: 19 Global Step: 239280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:23:58,583-Speed 3017.39 samples/sec Loss 0.8572 LearningRate 0.0001 Epoch: 19 Global Step: 239290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:02,018-Speed 2982.06 samples/sec Loss 0.8717 LearningRate 0.0001 Epoch: 19 Global Step: 239300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:05,464-Speed 2972.52 samples/sec Loss 0.8862 LearningRate 0.0001 Epoch: 19 Global Step: 239310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:08,792-Speed 3077.72 samples/sec Loss 0.8945 LearningRate 0.0001 Epoch: 19 Global Step: 239320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:12,223-Speed 2985.16 samples/sec Loss 0.9126 LearningRate 0.0001 Epoch: 19 Global Step: 239330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:15,657-Speed 2991.82 samples/sec Loss 0.8808 LearningRate 0.0001 Epoch: 19 Global Step: 239340 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:24:19,026-Speed 3040.33 samples/sec Loss 0.9081 LearningRate 0.0001 Epoch: 19 Global Step: 239350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:22,409-Speed 3027.56 samples/sec Loss 0.9038 LearningRate 0.0001 Epoch: 19 Global Step: 239360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:24:25,718-Speed 3095.67 samples/sec Loss 0.8652 LearningRate 0.0001 Epoch: 19 Global Step: 239370 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:29,110-Speed 3019.85 samples/sec Loss 0.8802 LearningRate 0.0001 Epoch: 19 Global Step: 239380 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:32,429-Speed 3086.26 samples/sec Loss 0.8358 LearningRate 0.0001 Epoch: 19 Global Step: 239390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:35,742-Speed 3092.69 samples/sec Loss 
0.8790 LearningRate 0.0001 Epoch: 19 Global Step: 239400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:39,235-Speed 2932.05 samples/sec Loss 0.8477 LearningRate 0.0001 Epoch: 19 Global Step: 239410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:42,673-Speed 2979.75 samples/sec Loss 0.8254 LearningRate 0.0001 Epoch: 19 Global Step: 239420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:46,002-Speed 3076.60 samples/sec Loss 0.8665 LearningRate 0.0001 Epoch: 19 Global Step: 239430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:49,399-Speed 3015.22 samples/sec Loss 0.8730 LearningRate 0.0001 Epoch: 19 Global Step: 239440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:52,733-Speed 3072.43 samples/sec Loss 0.8410 LearningRate 0.0001 Epoch: 19 Global Step: 239450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:56,096-Speed 3045.29 samples/sec Loss 0.8874 LearningRate 0.0001 Epoch: 19 Global Step: 239460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:24:59,556-Speed 2960.43 samples/sec Loss 0.9084 LearningRate 0.0001 Epoch: 19 Global Step: 239470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:02,941-Speed 3025.80 samples/sec Loss 0.8472 LearningRate 0.0001 Epoch: 19 Global Step: 239480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:06,287-Speed 3061.74 samples/sec Loss 0.8985 LearningRate 0.0001 Epoch: 19 Global Step: 239490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:09,673-Speed 3024.54 samples/sec Loss 0.8854 LearningRate 0.0001 Epoch: 19 Global Step: 239500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:13,061-Speed 3023.27 samples/sec Loss 0.8809 LearningRate 0.0001 Epoch: 19 Global Step: 239510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:16,484-Speed 2992.46 samples/sec Loss 0.8889 LearningRate 0.0001 Epoch: 19 Global 
Step: 239520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:19,899-Speed 2999.29 samples/sec Loss 0.8716 LearningRate 0.0001 Epoch: 19 Global Step: 239530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:23,319-Speed 2994.56 samples/sec Loss 0.8426 LearningRate 0.0001 Epoch: 19 Global Step: 239540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:26,767-Speed 2971.44 samples/sec Loss 0.9065 LearningRate 0.0001 Epoch: 19 Global Step: 239550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:30,130-Speed 3045.50 samples/sec Loss 0.8468 LearningRate 0.0001 Epoch: 19 Global Step: 239560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:33,556-Speed 2990.05 samples/sec Loss 0.8696 LearningRate 0.0001 Epoch: 19 Global Step: 239570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:36,908-Speed 3054.99 samples/sec Loss 0.8493 LearningRate 0.0001 Epoch: 19 Global Step: 239580 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:40,232-Speed 3082.54 samples/sec Loss 0.8426 LearningRate 0.0001 Epoch: 19 Global Step: 239590 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:43,638-Speed 3006.58 samples/sec Loss 0.8891 LearningRate 0.0001 Epoch: 19 Global Step: 239600 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:25:46,980-Speed 3064.87 samples/sec Loss 0.8454 LearningRate 0.0001 Epoch: 19 Global Step: 239610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:50,356-Speed 3034.74 samples/sec Loss 0.9104 LearningRate 0.0001 Epoch: 19 Global Step: 239620 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:53,768-Speed 3002.44 samples/sec Loss 0.8919 LearningRate 0.0001 Epoch: 19 Global Step: 239630 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:25:57,127-Speed 3049.06 samples/sec Loss 0.9037 LearningRate 0.0001 Epoch: 19 Global Step: 239640 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:26:00,472-Speed 3062.30 samples/sec Loss 0.8668 LearningRate 0.0001 Epoch: 19 Global Step: 239650 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:03,914-Speed 2976.06 samples/sec Loss 0.8996 LearningRate 0.0001 Epoch: 19 Global Step: 239660 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:07,295-Speed 3029.35 samples/sec Loss 0.8566 LearningRate 0.0001 Epoch: 19 Global Step: 239670 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:10,719-Speed 2991.30 samples/sec Loss 0.8450 LearningRate 0.0001 Epoch: 19 Global Step: 239680 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:14,131-Speed 3002.17 samples/sec Loss 0.8498 LearningRate 0.0001 Epoch: 19 Global Step: 239690 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:17,544-Speed 3001.32 samples/sec Loss 0.8892 LearningRate 0.0001 Epoch: 19 Global Step: 239700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:20,980-Speed 2980.32 samples/sec Loss 0.8688 LearningRate 0.0001 Epoch: 19 Global Step: 239710 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:26:24,426-Speed 2972.61 samples/sec Loss 0.8770 LearningRate 0.0001 Epoch: 19 Global Step: 239720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:26:27,835-Speed 3004.58 samples/sec Loss 0.8499 LearningRate 0.0001 Epoch: 19 Global Step: 239730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:31,271-Speed 2981.25 samples/sec Loss 0.9033 LearningRate 0.0001 Epoch: 19 Global Step: 239740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:34,697-Speed 2989.36 samples/sec Loss 0.8658 LearningRate 0.0001 Epoch: 19 Global Step: 239750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:38,047-Speed 3057.84 samples/sec Loss 0.8682 LearningRate 0.0001 Epoch: 19 Global Step: 239760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:26:41,451-Speed 3009.02 samples/sec Loss 0.8709 LearningRate 0.0001 Epoch: 19 Global Step: 239770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:44,838-Speed 3023.84 samples/sec Loss 0.8677 LearningRate 0.0001 Epoch: 19 Global Step: 239780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:48,268-Speed 2986.00 samples/sec Loss 0.8511 LearningRate 0.0001 Epoch: 19 Global Step: 239790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:51,707-Speed 2978.83 samples/sec Loss 0.9061 LearningRate 0.0001 Epoch: 19 Global Step: 239800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:55,110-Speed 3009.89 samples/sec Loss 0.8540 LearningRate 0.0001 Epoch: 19 Global Step: 239810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:26:58,556-Speed 2972.61 samples/sec Loss 0.8914 LearningRate 0.0001 Epoch: 19 Global Step: 239820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:01,935-Speed 3030.81 samples/sec Loss 0.8936 LearningRate 0.0001 Epoch: 19 Global Step: 239830 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:05,307-Speed 3037.50 samples/sec Loss 0.8812 LearningRate 0.0001 Epoch: 19 Global Step: 239840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:08,759-Speed 2967.28 samples/sec Loss 0.8849 LearningRate 0.0001 Epoch: 19 Global Step: 239850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:12,232-Speed 2949.73 samples/sec Loss 0.8695 LearningRate 0.0001 Epoch: 19 Global Step: 239860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:15,553-Speed 3083.74 samples/sec Loss 0.8975 LearningRate 0.0001 Epoch: 19 Global Step: 239870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:18,991-Speed 2979.43 samples/sec Loss 0.8549 LearningRate 0.0001 Epoch: 19 Global Step: 239880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:22,345-Speed 3053.82 samples/sec Loss 
0.8454 LearningRate 0.0001 Epoch: 19 Global Step: 239890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:27:25,742-Speed 3014.85 samples/sec Loss 0.8770 LearningRate 0.0001 Epoch: 19 Global Step: 239900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:29,217-Speed 2947.91 samples/sec Loss 0.8469 LearningRate 0.0001 Epoch: 19 Global Step: 239910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:32,645-Speed 2988.12 samples/sec Loss 0.8631 LearningRate 0.0001 Epoch: 19 Global Step: 239920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:36,091-Speed 2972.41 samples/sec Loss 0.8766 LearningRate 0.0001 Epoch: 19 Global Step: 239930 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:39,455-Speed 3044.99 samples/sec Loss 0.8638 LearningRate 0.0001 Epoch: 19 Global Step: 239940 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:42,822-Speed 3042.25 samples/sec Loss 0.8546 LearningRate 0.0001 Epoch: 19 Global Step: 239950 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:46,189-Speed 3042.10 samples/sec Loss 0.8978 LearningRate 0.0001 Epoch: 19 Global Step: 239960 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:49,605-Speed 2998.38 samples/sec Loss 0.8486 LearningRate 0.0001 Epoch: 19 Global Step: 239970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:53,036-Speed 2985.23 samples/sec Loss 0.9117 LearningRate 0.0001 Epoch: 19 Global Step: 239980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:56,436-Speed 3012.36 samples/sec Loss 0.8449 LearningRate 0.0001 Epoch: 19 Global Step: 239990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:27:59,914-Speed 2945.57 samples/sec Loss 0.8958 LearningRate 0.0001 Epoch: 19 Global Step: 240000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:28:03,363-Speed 2969.27 samples/sec Loss 0.8965 LearningRate 0.0001 Epoch: 19 Global 
Step: 240010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:28:06,776-Speed 3001.71 samples/sec Loss 0.8704 LearningRate 0.0001 Epoch: 19 Global Step: 240020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:10,128-Speed 3055.12 samples/sec Loss 0.8536 LearningRate 0.0001 Epoch: 19 Global Step: 240030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:13,523-Speed 3016.93 samples/sec Loss 0.8664 LearningRate 0.0001 Epoch: 19 Global Step: 240040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:16,930-Speed 3007.93 samples/sec Loss 0.8787 LearningRate 0.0001 Epoch: 19 Global Step: 240050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:20,354-Speed 2991.33 samples/sec Loss 0.9036 LearningRate 0.0001 Epoch: 19 Global Step: 240060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:23,842-Speed 2936.63 samples/sec Loss 0.8386 LearningRate 0.0001 Epoch: 19 Global Step: 240070 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:27,247-Speed 3008.42 samples/sec Loss 0.8727 LearningRate 0.0001 Epoch: 19 Global Step: 240080 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:30,569-Speed 3082.85 samples/sec Loss 0.8532 LearningRate 0.0001 Epoch: 19 Global Step: 240090 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:33,927-Speed 3050.86 samples/sec Loss 0.8813 LearningRate 0.0001 Epoch: 19 Global Step: 240100 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:37,274-Speed 3060.05 samples/sec Loss 0.8862 LearningRate 0.0001 Epoch: 19 Global Step: 240110 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:28:40,645-Speed 3038.42 samples/sec Loss 0.8374 LearningRate 0.0001 Epoch: 19 Global Step: 240120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:28:43,994-Speed 3058.12 samples/sec Loss 0.8741 LearningRate 0.0001 Epoch: 19 Global Step: 240130 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:28:47,334-Speed 3066.77 samples/sec Loss 0.8548 LearningRate 0.0001 Epoch: 19 Global Step: 240140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:28:50,748-Speed 3000.48 samples/sec Loss 0.8948 LearningRate 0.0001 Epoch: 19 Global Step: 240150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:28:54,215-Speed 2954.22 samples/sec Loss 0.8768 LearningRate 0.0001 Epoch: 19 Global Step: 240160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:28:57,668-Speed 2966.26 samples/sec Loss 0.8310 LearningRate 0.0001 Epoch: 19 Global Step: 240170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:01,109-Speed 2976.49 samples/sec Loss 0.8251 LearningRate 0.0001 Epoch: 19 Global Step: 240180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:04,520-Speed 3003.15 samples/sec Loss 0.8698 LearningRate 0.0001 Epoch: 19 Global Step: 240190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:07,909-Speed 3022.37 samples/sec Loss 0.8354 LearningRate 0.0001 Epoch: 19 Global Step: 240200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:11,239-Speed 3075.81 samples/sec Loss 0.9171 LearningRate 0.0001 Epoch: 19 Global Step: 240210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:14,618-Speed 3031.64 samples/sec Loss 0.8684 LearningRate 0.0001 Epoch: 19 Global Step: 240220 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:17,977-Speed 3049.50 samples/sec Loss 0.8588 LearningRate 0.0001 Epoch: 19 Global Step: 240230 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:21,342-Speed 3043.29 samples/sec Loss 0.8577 LearningRate 0.0001 Epoch: 19 Global Step: 240240 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:24,702-Speed 3048.65 samples/sec Loss 0.8425 LearningRate 0.0001 Epoch: 19 Global Step: 240250 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:29:28,124-Speed 2992.98 samples/sec Loss 0.8684 LearningRate 0.0001 Epoch: 19 Global Step: 240260 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:31,613-Speed 2935.89 samples/sec Loss 0.8760 LearningRate 0.0001 Epoch: 19 Global Step: 240270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:35,056-Speed 2975.28 samples/sec Loss 0.9070 LearningRate 0.0001 Epoch: 19 Global Step: 240280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:38,514-Speed 2962.83 samples/sec Loss 0.8601 LearningRate 0.0001 Epoch: 19 Global Step: 240290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:29:41,924-Speed 3003.75 samples/sec Loss 0.9272 LearningRate 0.0001 Epoch: 19 Global Step: 240300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:45,298-Speed 3035.69 samples/sec Loss 0.8899 LearningRate 0.0001 Epoch: 19 Global Step: 240310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:48,650-Speed 3056.29 samples/sec Loss 0.8723 LearningRate 0.0001 Epoch: 19 Global Step: 240320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:52,002-Speed 3055.55 samples/sec Loss 0.8838 LearningRate 0.0001 Epoch: 19 Global Step: 240330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:55,479-Speed 2945.91 samples/sec Loss 0.8701 LearningRate 0.0001 Epoch: 19 Global Step: 240340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:29:58,862-Speed 3027.63 samples/sec Loss 0.8583 LearningRate 0.0001 Epoch: 19 Global Step: 240350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:02,211-Speed 3058.62 samples/sec Loss 0.8947 LearningRate 0.0001 Epoch: 19 Global Step: 240360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:05,596-Speed 3025.62 samples/sec Loss 0.8597 LearningRate 0.0001 Epoch: 19 Global Step: 240370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:09,007-Speed 3003.71 samples/sec Loss 
0.8490 LearningRate 0.0001 Epoch: 19 Global Step: 240380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:12,405-Speed 3013.66 samples/sec Loss 0.8769 LearningRate 0.0001 Epoch: 19 Global Step: 240390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:15,820-Speed 2999.17 samples/sec Loss 0.8799 LearningRate 0.0001 Epoch: 19 Global Step: 240400 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:30:19,200-Speed 3031.08 samples/sec Loss 0.8495 LearningRate 0.0001 Epoch: 19 Global Step: 240410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:22,640-Speed 2976.87 samples/sec Loss 0.8787 LearningRate 0.0001 Epoch: 19 Global Step: 240420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:25,999-Speed 3050.28 samples/sec Loss 0.8844 LearningRate 0.0001 Epoch: 19 Global Step: 240430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:29,369-Speed 3039.04 samples/sec Loss 0.8561 LearningRate 0.0001 Epoch: 19 Global Step: 240440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:32,809-Speed 2977.94 samples/sec Loss 0.8915 LearningRate 0.0001 Epoch: 19 Global Step: 240450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:36,128-Speed 3085.56 samples/sec Loss 0.8805 LearningRate 0.0001 Epoch: 19 Global Step: 240460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:39,502-Speed 3035.81 samples/sec Loss 0.8738 LearningRate 0.0001 Epoch: 19 Global Step: 240470 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:42,947-Speed 2973.46 samples/sec Loss 0.8193 LearningRate 0.0001 Epoch: 19 Global Step: 240480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:46,300-Speed 3054.67 samples/sec Loss 0.8773 LearningRate 0.0001 Epoch: 19 Global Step: 240490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:49,729-Speed 2987.26 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global 
Step: 240500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:53,114-Speed 3025.67 samples/sec Loss 0.8536 LearningRate 0.0001 Epoch: 19 Global Step: 240510 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:30:56,443-Speed 3077.32 samples/sec Loss 0.8357 LearningRate 0.0001 Epoch: 19 Global Step: 240520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:30:59,862-Speed 2995.92 samples/sec Loss 0.8469 LearningRate 0.0001 Epoch: 19 Global Step: 240530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:03,249-Speed 3024.04 samples/sec Loss 0.8777 LearningRate 0.0001 Epoch: 19 Global Step: 240540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:06,622-Speed 3036.81 samples/sec Loss 0.9257 LearningRate 0.0001 Epoch: 19 Global Step: 240550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:09,978-Speed 3051.68 samples/sec Loss 0.8858 LearningRate 0.0001 Epoch: 19 Global Step: 240560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:13,382-Speed 3009.44 samples/sec Loss 0.9325 LearningRate 0.0001 Epoch: 19 Global Step: 240570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:16,812-Speed 2985.76 samples/sec Loss 0.8409 LearningRate 0.0001 Epoch: 19 Global Step: 240580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:20,218-Speed 3007.75 samples/sec Loss 0.8513 LearningRate 0.0001 Epoch: 19 Global Step: 240590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:23,667-Speed 2969.37 samples/sec Loss 0.8505 LearningRate 0.0001 Epoch: 19 Global Step: 240600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:27,077-Speed 3004.14 samples/sec Loss 0.8988 LearningRate 0.0001 Epoch: 19 Global Step: 240610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:31:30,427-Speed 3057.45 samples/sec Loss 0.8276 LearningRate 0.0001 Epoch: 19 Global Step: 240620 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-28 00:31:33,816-Speed 3022.33 samples/sec Loss 0.8549 LearningRate 0.0001 Epoch: 19 Global Step: 240630 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:31:37,155-Speed 3067.42 samples/sec Loss 0.8469 LearningRate 0.0001 Epoch: 19 Global Step: 240640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:31:40,531-Speed 3034.61 samples/sec Loss 0.8679 LearningRate 0.0001 Epoch: 19 Global Step: 240650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:31:43,894-Speed 3045.82 samples/sec Loss 0.8473 LearningRate 0.0001 Epoch: 19 Global Step: 240660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:31:47,295-Speed 3011.54 samples/sec Loss 0.8539 LearningRate 0.0001 Epoch: 19 Global Step: 240670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:31:50,679-Speed 3027.14 samples/sec Loss 0.8785 LearningRate 0.0001 Epoch: 19 Global Step: 240680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:31:54,013-Speed 3072.37 samples/sec Loss 0.8740 LearningRate 0.0001 Epoch: 19 Global Step: 240690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:31:57,359-Speed 3061.42 samples/sec Loss 0.9034 LearningRate 0.0001 Epoch: 19 Global Step: 240700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:00,802-Speed 2974.89 samples/sec Loss 0.9036 LearningRate 0.0001 Epoch: 19 Global Step: 240710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:04,139-Speed 3069.93 samples/sec Loss 0.8750 LearningRate 0.0001 Epoch: 19 Global Step: 240720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:07,539-Speed 3012.37 samples/sec Loss 0.8871 LearningRate 0.0001 Epoch: 19 Global Step: 240730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:10,934-Speed 3017.30 samples/sec Loss 0.8814 LearningRate 0.0001 Epoch: 19 Global Step: 240740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 
00:32:14,367-Speed 2983.32 samples/sec Loss 0.8765 LearningRate 0.0001 Epoch: 19 Global Step: 240750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:17,713-Speed 3063.16 samples/sec Loss 0.8540 LearningRate 0.0001 Epoch: 19 Global Step: 240760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:21,111-Speed 3014.48 samples/sec Loss 0.8331 LearningRate 0.0001 Epoch: 19 Global Step: 240770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:24,491-Speed 3030.10 samples/sec Loss 0.8681 LearningRate 0.0001 Epoch: 19 Global Step: 240780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:27,790-Speed 3105.32 samples/sec Loss 0.8797 LearningRate 0.0001 Epoch: 19 Global Step: 240790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:31,171-Speed 3029.18 samples/sec Loss 0.8478 LearningRate 0.0001 Epoch: 19 Global Step: 240800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:34,497-Speed 3079.94 samples/sec Loss 0.8720 LearningRate 0.0001 Epoch: 19 Global Step: 240810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:32:37,870-Speed 3036.52 samples/sec Loss 0.8656 LearningRate 0.0001 Epoch: 19 Global Step: 240820 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:41,208-Speed 3068.27 samples/sec Loss 0.8420 LearningRate 0.0001 Epoch: 19 Global Step: 240830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:44,581-Speed 3037.01 samples/sec Loss 0.8965 LearningRate 0.0001 Epoch: 19 Global Step: 240840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:47,968-Speed 3024.35 samples/sec Loss 0.8969 LearningRate 0.0001 Epoch: 19 Global Step: 240850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:51,307-Speed 3067.15 samples/sec Loss 0.8421 LearningRate 0.0001 Epoch: 19 Global Step: 240860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:54,631-Speed 3081.35 samples/sec Loss 
0.8682 LearningRate 0.0001 Epoch: 19 Global Step: 240870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:32:57,960-Speed 3077.24 samples/sec Loss 0.8904 LearningRate 0.0001 Epoch: 19 Global Step: 240880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:33:01,374-Speed 3000.16 samples/sec Loss 0.8776 LearningRate 0.0001 Epoch: 19 Global Step: 240890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:33:04,725-Speed 3057.32 samples/sec Loss 0.8832 LearningRate 0.0001 Epoch: 19 Global Step: 240900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:33:08,140-Speed 2999.16 samples/sec Loss 0.8555 LearningRate 0.0001 Epoch: 19 Global Step: 240910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:33:11,585-Speed 2973.83 samples/sec Loss 0.8592 LearningRate 0.0001 Epoch: 19 Global Step: 240920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:15,043-Speed 2961.67 samples/sec Loss 0.8610 LearningRate 0.0001 Epoch: 19 Global Step: 240930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:18,475-Speed 2984.66 samples/sec Loss 0.8950 LearningRate 0.0001 Epoch: 19 Global Step: 240940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:21,892-Speed 2997.09 samples/sec Loss 0.9010 LearningRate 0.0001 Epoch: 19 Global Step: 240950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:25,287-Speed 3017.65 samples/sec Loss 0.8693 LearningRate 0.0001 Epoch: 19 Global Step: 240960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:28,681-Speed 3018.26 samples/sec Loss 0.8443 LearningRate 0.0001 Epoch: 19 Global Step: 240970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:32,040-Speed 3049.47 samples/sec Loss 0.8708 LearningRate 0.0001 Epoch: 19 Global Step: 240980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:35,376-Speed 3070.60 samples/sec Loss 0.8363 LearningRate 0.0001 Epoch: 19 Global 
Step: 240990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:38,777-Speed 3011.68 samples/sec Loss 0.8903 LearningRate 0.0001 Epoch: 19 Global Step: 241000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:42,219-Speed 2975.80 samples/sec Loss 0.8858 LearningRate 0.0001 Epoch: 19 Global Step: 241010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:45,528-Speed 3095.59 samples/sec Loss 0.8611 LearningRate 0.0001 Epoch: 19 Global Step: 241020 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:33:48,888-Speed 3047.91 samples/sec Loss 0.8644 LearningRate 0.0001 Epoch: 19 Global Step: 241030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:52,405-Speed 2912.75 samples/sec Loss 0.8945 LearningRate 0.0001 Epoch: 19 Global Step: 241040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:55,761-Speed 3051.61 samples/sec Loss 0.8553 LearningRate 0.0001 Epoch: 19 Global Step: 241050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:33:59,077-Speed 3089.29 samples/sec Loss 0.9115 LearningRate 0.0001 Epoch: 19 Global Step: 241060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:02,411-Speed 3072.18 samples/sec Loss 0.8565 LearningRate 0.0001 Epoch: 19 Global Step: 241070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:05,800-Speed 3021.82 samples/sec Loss 0.8826 LearningRate 0.0001 Epoch: 19 Global Step: 241080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:09,217-Speed 2998.15 samples/sec Loss 0.9022 LearningRate 0.0001 Epoch: 19 Global Step: 241090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:12,558-Speed 3065.32 samples/sec Loss 0.9395 LearningRate 0.0001 Epoch: 19 Global Step: 241100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:15,920-Speed 3047.02 samples/sec Loss 0.8648 LearningRate 0.0001 Epoch: 19 Global Step: 241110 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:34:19,330-Speed 3004.14 samples/sec Loss 0.9125 LearningRate 0.0001 Epoch: 19 Global Step: 241120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:22,679-Speed 3058.17 samples/sec Loss 0.8857 LearningRate 0.0001 Epoch: 19 Global Step: 241130 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:34:26,017-Speed 3068.82 samples/sec Loss 0.8360 LearningRate 0.0001 Epoch: 19 Global Step: 241140 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:29,426-Speed 3004.57 samples/sec Loss 0.8791 LearningRate 0.0001 Epoch: 19 Global Step: 241150 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:32,808-Speed 3028.23 samples/sec Loss 0.9129 LearningRate 0.0001 Epoch: 19 Global Step: 241160 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:36,166-Speed 3050.71 samples/sec Loss 0.8341 LearningRate 0.0001 Epoch: 19 Global Step: 241170 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:39,508-Speed 3065.07 samples/sec Loss 0.9102 LearningRate 0.0001 Epoch: 19 Global Step: 241180 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:42,905-Speed 3015.10 samples/sec Loss 0.8942 LearningRate 0.0001 Epoch: 19 Global Step: 241190 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:46,293-Speed 3022.97 samples/sec Loss 0.8504 LearningRate 0.0001 Epoch: 19 Global Step: 241200 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:49,611-Speed 3087.46 samples/sec Loss 0.8682 LearningRate 0.0001 Epoch: 19 Global Step: 241210 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:53,001-Speed 3021.43 samples/sec Loss 0.9095 LearningRate 0.0001 Epoch: 19 Global Step: 241220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:34:56,498-Speed 2929.03 samples/sec Loss 0.9153 LearningRate 0.0001 Epoch: 19 Global Step: 241230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 
00:34:59,897-Speed 3014.16 samples/sec Loss 0.8445 LearningRate 0.0001 Epoch: 19 Global Step: 241240 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:35:03,318-Speed 2993.84 samples/sec Loss 0.8687 LearningRate 0.0001 Epoch: 19 Global Step: 241250 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:35:06,741-Speed 2993.03 samples/sec Loss 0.8672 LearningRate 0.0001 Epoch: 19 Global Step: 241260 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:35:10,048-Speed 3097.06 samples/sec Loss 0.8510 LearningRate 0.0001 Epoch: 19 Global Step: 241270 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:13,380-Speed 3074.95 samples/sec Loss 0.8217 LearningRate 0.0001 Epoch: 19 Global Step: 241280 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:16,752-Speed 3037.49 samples/sec Loss 0.8560 LearningRate 0.0001 Epoch: 19 Global Step: 241290 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:20,092-Speed 3065.94 samples/sec Loss 0.8715 LearningRate 0.0001 Epoch: 19 Global Step: 241300 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:23,402-Speed 3094.40 samples/sec Loss 0.8742 LearningRate 0.0001 Epoch: 19 Global Step: 241310 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:26,791-Speed 3023.03 samples/sec Loss 0.8405 LearningRate 0.0001 Epoch: 19 Global Step: 241320 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:30,131-Speed 3066.09 samples/sec Loss 0.8467 LearningRate 0.0001 Epoch: 19 Global Step: 241330 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:33,464-Speed 3073.34 samples/sec Loss 0.9196 LearningRate 0.0001 Epoch: 19 Global Step: 241340 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:36,799-Speed 3071.60 samples/sec Loss 0.8617 LearningRate 0.0001 Epoch: 19 Global Step: 241350 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:40,139-Speed 3066.44 samples/sec Loss 
0.8579 LearningRate 0.0001 Epoch: 19 Global Step: 241360 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:43,493-Speed 3054.26 samples/sec Loss 0.8287 LearningRate 0.0001 Epoch: 19 Global Step: 241370 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:35:46,832-Speed 3067.75 samples/sec Loss 0.8882 LearningRate 0.0001 Epoch: 19 Global Step: 241380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:35:50,232-Speed 3012.21 samples/sec Loss 0.8578 LearningRate 0.0001 Epoch: 19 Global Step: 241390 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:35:53,595-Speed 3046.15 samples/sec Loss 0.8994 LearningRate 0.0001 Epoch: 19 Global Step: 241400 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:35:56,987-Speed 3019.76 samples/sec Loss 0.8688 LearningRate 0.0001 Epoch: 19 Global Step: 241410 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:00,495-Speed 2919.88 samples/sec Loss 0.8585 LearningRate 0.0001 Epoch: 19 Global Step: 241420 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:04,002-Speed 2920.52 samples/sec Loss 0.8642 LearningRate 0.0001 Epoch: 19 Global Step: 241430 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:07,389-Speed 3024.31 samples/sec Loss 0.8448 LearningRate 0.0001 Epoch: 19 Global Step: 241440 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:10,751-Speed 3046.27 samples/sec Loss 0.8801 LearningRate 0.0001 Epoch: 19 Global Step: 241450 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:14,233-Speed 2942.44 samples/sec Loss 0.9043 LearningRate 0.0001 Epoch: 19 Global Step: 241460 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:17,667-Speed 2982.63 samples/sec Loss 0.8749 LearningRate 0.0001 Epoch: 19 Global Step: 241470 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:21,101-Speed 2982.43 samples/sec Loss 0.8571 LearningRate 0.0001 Epoch: 19 Global 
Step: 241480 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:36:24,485-Speed 3026.74 samples/sec Loss 0.9006 LearningRate 0.0001 Epoch: 19 Global Step: 241490 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:27,834-Speed 3058.33 samples/sec Loss 0.8477 LearningRate 0.0001 Epoch: 19 Global Step: 241500 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:31,279-Speed 2973.29 samples/sec Loss 0.8563 LearningRate 0.0001 Epoch: 19 Global Step: 241510 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:34,645-Speed 3043.37 samples/sec Loss 0.8538 LearningRate 0.0001 Epoch: 19 Global Step: 241520 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:37,999-Speed 3053.65 samples/sec Loss 0.9275 LearningRate 0.0001 Epoch: 19 Global Step: 241530 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:41,363-Speed 3044.89 samples/sec Loss 0.8734 LearningRate 0.0001 Epoch: 19 Global Step: 241540 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:44,692-Speed 3077.23 samples/sec Loss 0.8757 LearningRate 0.0001 Epoch: 19 Global Step: 241550 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:48,068-Speed 3033.88 samples/sec Loss 0.8583 LearningRate 0.0001 Epoch: 19 Global Step: 241560 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:51,466-Speed 3014.74 samples/sec Loss 0.8593 LearningRate 0.0001 Epoch: 19 Global Step: 241570 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:54,813-Speed 3059.74 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global Step: 241580 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:36:58,169-Speed 3052.36 samples/sec Loss 0.8577 LearningRate 0.0001 Epoch: 19 Global Step: 241590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:37:01,542-Speed 3036.86 samples/sec Loss 0.8196 LearningRate 0.0001 Epoch: 19 Global Step: 241600 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-28 00:37:04,992-Speed 2968.83 samples/sec Loss 0.8830 LearningRate 0.0001 Epoch: 19 Global Step: 241610 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:08,338-Speed 3061.14 samples/sec Loss 0.8609 LearningRate 0.0001 Epoch: 19 Global Step: 241620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:11,734-Speed 3016.88 samples/sec Loss 0.8693 LearningRate 0.0001 Epoch: 19 Global Step: 241630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:15,227-Speed 2931.95 samples/sec Loss 0.8715 LearningRate 0.0001 Epoch: 19 Global Step: 241640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:18,649-Speed 2993.08 samples/sec Loss 0.8793 LearningRate 0.0001 Epoch: 19 Global Step: 241650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:22,089-Speed 2977.90 samples/sec Loss 0.8923 LearningRate 0.0001 Epoch: 19 Global Step: 241660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:25,453-Speed 3044.37 samples/sec Loss 0.8995 LearningRate 0.0001 Epoch: 19 Global Step: 241670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:28,824-Speed 3039.26 samples/sec Loss 0.8792 LearningRate 0.0001 Epoch: 19 Global Step: 241680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:32,195-Speed 3037.87 samples/sec Loss 0.8727 LearningRate 0.0001 Epoch: 19 Global Step: 241690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:35,540-Speed 3062.44 samples/sec Loss 0.8603 LearningRate 0.0001 Epoch: 19 Global Step: 241700 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:37:38,873-Speed 3073.30 samples/sec Loss 0.8660 LearningRate 0.0001 Epoch: 19 Global Step: 241710 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:37:42,187-Speed 3090.54 samples/sec Loss 0.8581 LearningRate 0.0001 Epoch: 19 Global Step: 241720 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:37:45,581-Speed 3017.58 samples/sec Loss 0.8852 LearningRate 0.0001 Epoch: 19 Global Step: 241730 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:48,946-Speed 3044.07 samples/sec Loss 0.8802 LearningRate 0.0001 Epoch: 19 Global Step: 241740 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:52,323-Speed 3033.24 samples/sec Loss 0.8477 LearningRate 0.0001 Epoch: 19 Global Step: 241750 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:55,733-Speed 3003.45 samples/sec Loss 0.8908 LearningRate 0.0001 Epoch: 19 Global Step: 241760 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:37:59,080-Speed 3060.37 samples/sec Loss 0.8543 LearningRate 0.0001 Epoch: 19 Global Step: 241770 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:02,475-Speed 3017.08 samples/sec Loss 0.8637 LearningRate 0.0001 Epoch: 19 Global Step: 241780 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:05,827-Speed 3055.46 samples/sec Loss 0.8910 LearningRate 0.0001 Epoch: 19 Global Step: 241790 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:09,163-Speed 3071.02 samples/sec Loss 0.8902 LearningRate 0.0001 Epoch: 19 Global Step: 241800 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:12,493-Speed 3076.01 samples/sec Loss 0.9249 LearningRate 0.0001 Epoch: 19 Global Step: 241810 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:15,811-Speed 3086.79 samples/sec Loss 0.8757 LearningRate 0.0001 Epoch: 19 Global Step: 241820 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:38:19,131-Speed 3085.66 samples/sec Loss 0.8811 LearningRate 0.0001 Epoch: 19 Global Step: 241830 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:22,480-Speed 3058.43 samples/sec Loss 0.8493 LearningRate 0.0001 Epoch: 19 Global Step: 241840 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:25,805-Speed 3080.86 samples/sec Loss 
0.8477 LearningRate 0.0001 Epoch: 19 Global Step: 241850 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:29,214-Speed 3004.59 samples/sec Loss 0.8793 LearningRate 0.0001 Epoch: 19 Global Step: 241860 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:32,569-Speed 3053.26 samples/sec Loss 0.9010 LearningRate 0.0001 Epoch: 19 Global Step: 241870 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:35,912-Speed 3063.82 samples/sec Loss 0.8422 LearningRate 0.0001 Epoch: 19 Global Step: 241880 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:39,300-Speed 3023.01 samples/sec Loss 0.8358 LearningRate 0.0001 Epoch: 19 Global Step: 241890 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:42,698-Speed 3014.25 samples/sec Loss 0.8632 LearningRate 0.0001 Epoch: 19 Global Step: 241900 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:46,109-Speed 3003.61 samples/sec Loss 0.8841 LearningRate 0.0001 Epoch: 19 Global Step: 241910 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:49,512-Speed 3009.25 samples/sec Loss 0.9276 LearningRate 0.0001 Epoch: 19 Global Step: 241920 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:38:52,846-Speed 3072.64 samples/sec Loss 0.9393 LearningRate 0.0001 Epoch: 19 Global Step: 241930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:38:56,218-Speed 3037.75 samples/sec Loss 0.8910 LearningRate 0.0001 Epoch: 19 Global Step: 241940 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:38:59,600-Speed 3028.20 samples/sec Loss 0.8828 LearningRate 0.0001 Epoch: 19 Global Step: 241950 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:39:02,974-Speed 3035.71 samples/sec Loss 0.8740 LearningRate 0.0001 Epoch: 19 Global Step: 241960 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:39:06,286-Speed 3092.52 samples/sec Loss 0.8781 LearningRate 0.0001 Epoch: 19 Global 
Step: 241970 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:09,658-Speed 3038.21 samples/sec Loss 0.9045 LearningRate 0.0001 Epoch: 19 Global Step: 241980 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:13,017-Speed 3048.87 samples/sec Loss 0.8879 LearningRate 0.0001 Epoch: 19 Global Step: 241990 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:16,368-Speed 3056.72 samples/sec Loss 0.8884 LearningRate 0.0001 Epoch: 19 Global Step: 242000 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:19,792-Speed 2991.77 samples/sec Loss 0.9460 LearningRate 0.0001 Epoch: 19 Global Step: 242010 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:23,180-Speed 3023.75 samples/sec Loss 0.8331 LearningRate 0.0001 Epoch: 19 Global Step: 242020 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:26,548-Speed 3040.60 samples/sec Loss 0.8672 LearningRate 0.0001 Epoch: 19 Global Step: 242030 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:29,901-Speed 3055.50 samples/sec Loss 0.8715 LearningRate 0.0001 Epoch: 19 Global Step: 242040 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:33,220-Speed 3085.36 samples/sec Loss 0.8963 LearningRate 0.0001 Epoch: 19 Global Step: 242050 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:36,558-Speed 3068.34 samples/sec Loss 0.8761 LearningRate 0.0001 Epoch: 19 Global Step: 242060 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:39:39,949-Speed 3020.51 samples/sec Loss 0.8748 LearningRate 0.0001 Epoch: 19 Global Step: 242070 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:39:43,290-Speed 3066.37 samples/sec Loss 0.8803 LearningRate 0.0001 Epoch: 19 Global Step: 242080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:39:46,656-Speed 3042.27 samples/sec Loss 0.9496 LearningRate 0.0001 Epoch: 19 Global Step: 242090 Fp16 Grad Scale: 32768 
Required: 1 hours Training: 2022-04-28 00:39:50,031-Speed 3035.58 samples/sec Loss 0.9045 LearningRate 0.0001 Epoch: 19 Global Step: 242100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:39:53,384-Speed 3054.86 samples/sec Loss 0.8662 LearningRate 0.0001 Epoch: 19 Global Step: 242110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:39:56,732-Speed 3059.25 samples/sec Loss 0.8596 LearningRate 0.0001 Epoch: 19 Global Step: 242120 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:00,062-Speed 3076.07 samples/sec Loss 0.8949 LearningRate 0.0001 Epoch: 19 Global Step: 242130 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:03,421-Speed 3049.08 samples/sec Loss 0.8569 LearningRate 0.0001 Epoch: 19 Global Step: 242140 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:06,805-Speed 3027.19 samples/sec Loss 0.8804 LearningRate 0.0001 Epoch: 19 Global Step: 242150 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:10,180-Speed 3034.14 samples/sec Loss 0.8670 LearningRate 0.0001 Epoch: 19 Global Step: 242160 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:13,538-Speed 3050.68 samples/sec Loss 0.8747 LearningRate 0.0001 Epoch: 19 Global Step: 242170 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:16,911-Speed 3036.91 samples/sec Loss 0.9096 LearningRate 0.0001 Epoch: 19 Global Step: 242180 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:20,288-Speed 3032.60 samples/sec Loss 0.8948 LearningRate 0.0001 Epoch: 19 Global Step: 242190 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:23,619-Speed 3075.13 samples/sec Loss 0.8632 LearningRate 0.0001 Epoch: 19 Global Step: 242200 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:27,008-Speed 3022.42 samples/sec Loss 0.8498 LearningRate 0.0001 Epoch: 19 Global Step: 242210 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:40:30,402-Speed 3017.50 samples/sec Loss 0.8660 LearningRate 0.0001 Epoch: 19 Global Step: 242220 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:40:33,780-Speed 3032.67 samples/sec Loss 0.8617 LearningRate 0.0001 Epoch: 19 Global Step: 242230 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:40:37,087-Speed 3097.00 samples/sec Loss 0.8506 LearningRate 0.0001 Epoch: 19 Global Step: 242240 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:40:40,499-Speed 3001.87 samples/sec Loss 0.8328 LearningRate 0.0001 Epoch: 19 Global Step: 242250 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:40:43,864-Speed 3044.27 samples/sec Loss 0.8524 LearningRate 0.0001 Epoch: 19 Global Step: 242260 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:40:47,274-Speed 3003.59 samples/sec Loss 0.8293 LearningRate 0.0001 Epoch: 19 Global Step: 242270 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:50,672-Speed 3014.88 samples/sec Loss 0.9024 LearningRate 0.0001 Epoch: 19 Global Step: 242280 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:54,033-Speed 3047.90 samples/sec Loss 0.8709 LearningRate 0.0001 Epoch: 19 Global Step: 242290 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:40:57,487-Speed 2964.87 samples/sec Loss 0.8402 LearningRate 0.0001 Epoch: 19 Global Step: 242300 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:00,807-Speed 3086.25 samples/sec Loss 0.8641 LearningRate 0.0001 Epoch: 19 Global Step: 242310 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:04,181-Speed 3035.05 samples/sec Loss 0.8395 LearningRate 0.0001 Epoch: 19 Global Step: 242320 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:07,501-Speed 3085.27 samples/sec Loss 0.8708 LearningRate 0.0001 Epoch: 19 Global Step: 242330 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:10,866-Speed 3044.17 samples/sec Loss 
0.8913 LearningRate 0.0001 Epoch: 19 Global Step: 242340 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:14,215-Speed 3058.09 samples/sec Loss 0.9375 LearningRate 0.0001 Epoch: 19 Global Step: 242350 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:17,610-Speed 3017.07 samples/sec Loss 0.8718 LearningRate 0.0001 Epoch: 19 Global Step: 242360 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:41:20,995-Speed 3026.20 samples/sec Loss 0.8716 LearningRate 0.0001 Epoch: 19 Global Step: 242370 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:24,350-Speed 3052.13 samples/sec Loss 0.8265 LearningRate 0.0001 Epoch: 19 Global Step: 242380 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:27,772-Speed 2993.81 samples/sec Loss 0.8745 LearningRate 0.0001 Epoch: 19 Global Step: 242390 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:31,105-Speed 3072.81 samples/sec Loss 0.8579 LearningRate 0.0001 Epoch: 19 Global Step: 242400 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:34,431-Speed 3079.70 samples/sec Loss 0.8809 LearningRate 0.0001 Epoch: 19 Global Step: 242410 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:37,778-Speed 3059.69 samples/sec Loss 0.8418 LearningRate 0.0001 Epoch: 19 Global Step: 242420 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:41,171-Speed 3019.78 samples/sec Loss 0.9046 LearningRate 0.0001 Epoch: 19 Global Step: 242430 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:44,586-Speed 2998.99 samples/sec Loss 0.8381 LearningRate 0.0001 Epoch: 19 Global Step: 242440 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:48,040-Speed 2965.60 samples/sec Loss 0.8577 LearningRate 0.0001 Epoch: 19 Global Step: 242450 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:51,539-Speed 2927.14 samples/sec Loss 0.8425 LearningRate 0.0001 Epoch: 19 Global 
Step: 242460 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:41:54,898-Speed 3049.77 samples/sec Loss 0.8966 LearningRate 0.0001 Epoch: 19 Global Step: 242470 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:41:58,254-Speed 3052.44 samples/sec Loss 0.9006 LearningRate 0.0001 Epoch: 19 Global Step: 242480 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:42:01,647-Speed 3018.58 samples/sec Loss 0.8727 LearningRate 0.0001 Epoch: 19 Global Step: 242490 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:04,999-Speed 3055.45 samples/sec Loss 0.8570 LearningRate 0.0001 Epoch: 19 Global Step: 242500 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:08,394-Speed 3017.04 samples/sec Loss 0.8702 LearningRate 0.0001 Epoch: 19 Global Step: 242510 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:11,844-Speed 2968.54 samples/sec Loss 0.8578 LearningRate 0.0001 Epoch: 19 Global Step: 242520 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:15,307-Speed 2958.33 samples/sec Loss 0.8274 LearningRate 0.0001 Epoch: 19 Global Step: 242530 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:18,639-Speed 3073.78 samples/sec Loss 0.8678 LearningRate 0.0001 Epoch: 19 Global Step: 242540 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:22,101-Speed 2958.97 samples/sec Loss 0.8624 LearningRate 0.0001 Epoch: 19 Global Step: 242550 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:25,464-Speed 3045.51 samples/sec Loss 0.8401 LearningRate 0.0001 Epoch: 19 Global Step: 242560 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:28,828-Speed 3045.25 samples/sec Loss 0.8532 LearningRate 0.0001 Epoch: 19 Global Step: 242570 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:32,237-Speed 3004.03 samples/sec Loss 0.8408 LearningRate 0.0001 Epoch: 19 Global Step: 242580 Fp16 Grad Scale: 16384 
Required: 1 hours Training: 2022-04-28 00:42:35,648-Speed 3003.40 samples/sec Loss 0.8574 LearningRate 0.0001 Epoch: 19 Global Step: 242590 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:42:39,071-Speed 2991.82 samples/sec Loss 0.8391 LearningRate 0.0001 Epoch: 19 Global Step: 242600 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:42:42,433-Speed 3047.10 samples/sec Loss 0.8794 LearningRate 0.0001 Epoch: 19 Global Step: 242610 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:42:45,806-Speed 3036.64 samples/sec Loss 0.8548 LearningRate 0.0001 Epoch: 19 Global Step: 242620 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:49,184-Speed 3032.58 samples/sec Loss 0.8315 LearningRate 0.0001 Epoch: 19 Global Step: 242630 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:52,678-Speed 2931.55 samples/sec Loss 0.8869 LearningRate 0.0001 Epoch: 19 Global Step: 242640 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:56,182-Speed 2923.18 samples/sec Loss 0.8556 LearningRate 0.0001 Epoch: 19 Global Step: 242650 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:42:59,657-Speed 2947.29 samples/sec Loss 0.9003 LearningRate 0.0001 Epoch: 19 Global Step: 242660 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:43:03,050-Speed 3019.12 samples/sec Loss 0.8584 LearningRate 0.0001 Epoch: 19 Global Step: 242670 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:43:06,453-Speed 3009.42 samples/sec Loss 0.8733 LearningRate 0.0001 Epoch: 19 Global Step: 242680 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:43:09,819-Speed 3042.78 samples/sec Loss 0.8692 LearningRate 0.0001 Epoch: 19 Global Step: 242690 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:43:13,200-Speed 3030.11 samples/sec Loss 0.8464 LearningRate 0.0001 Epoch: 19 Global Step: 242700 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 
00:43:16,649-Speed 2969.85 samples/sec Loss 0.8600 LearningRate 0.0001 Epoch: 19 Global Step: 242710 Fp16 Grad Scale: 16384 Required: 1 hours Training: 2022-04-28 00:43:20,009-Speed 3048.41 samples/sec Loss 0.8894 LearningRate 0.0001 Epoch: 19 Global Step: 242720 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:23,348-Speed 3067.24 samples/sec Loss 0.8784 LearningRate 0.0001 Epoch: 19 Global Step: 242730 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:26,713-Speed 3044.28 samples/sec Loss 0.8683 LearningRate 0.0001 Epoch: 19 Global Step: 242740 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:30,047-Speed 3072.46 samples/sec Loss 0.8665 LearningRate 0.0001 Epoch: 19 Global Step: 242750 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:33,392-Speed 3062.58 samples/sec Loss 0.8457 LearningRate 0.0001 Epoch: 19 Global Step: 242760 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:36,702-Speed 3093.76 samples/sec Loss 0.8131 LearningRate 0.0001 Epoch: 19 Global Step: 242770 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:40,109-Speed 3006.20 samples/sec Loss 0.8854 LearningRate 0.0001 Epoch: 19 Global Step: 242780 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:43,509-Speed 3012.73 samples/sec Loss 0.8743 LearningRate 0.0001 Epoch: 19 Global Step: 242790 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:46,923-Speed 3000.63 samples/sec Loss 0.8836 LearningRate 0.0001 Epoch: 19 Global Step: 242800 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:50,263-Speed 3066.43 samples/sec Loss 0.8903 LearningRate 0.0001 Epoch: 19 Global Step: 242810 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:43:53,693-Speed 2986.57 samples/sec Loss 0.8724 LearningRate 0.0001 Epoch: 19 Global Step: 242820 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:43:57,124-Speed 2985.50 samples/sec Loss 
0.8931 LearningRate 0.0001 Epoch: 19 Global Step: 242830 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:44:00,532-Speed 3005.66 samples/sec Loss 0.8565 LearningRate 0.0001 Epoch: 19 Global Step: 242840 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:04,011-Speed 2944.25 samples/sec Loss 0.8538 LearningRate 0.0001 Epoch: 19 Global Step: 242850 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:07,448-Speed 2979.39 samples/sec Loss 0.8116 LearningRate 0.0001 Epoch: 19 Global Step: 242860 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:10,825-Speed 3033.95 samples/sec Loss 0.8428 LearningRate 0.0000 Epoch: 19 Global Step: 242870 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:14,245-Speed 2994.68 samples/sec Loss 0.8593 LearningRate 0.0000 Epoch: 19 Global Step: 242880 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:17,588-Speed 3063.75 samples/sec Loss 0.8338 LearningRate 0.0000 Epoch: 19 Global Step: 242890 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:20,959-Speed 3037.92 samples/sec Loss 0.8723 LearningRate 0.0000 Epoch: 19 Global Step: 242900 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:24,307-Speed 3059.57 samples/sec Loss 0.8566 LearningRate 0.0000 Epoch: 19 Global Step: 242910 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:27,709-Speed 3011.25 samples/sec Loss 0.8471 LearningRate 0.0000 Epoch: 19 Global Step: 242920 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:31,170-Speed 2959.55 samples/sec Loss 0.8549 LearningRate 0.0000 Epoch: 19 Global Step: 242930 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:34,566-Speed 3015.52 samples/sec Loss 0.8921 LearningRate 0.0000 Epoch: 19 Global Step: 242940 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:44:37,898-Speed 3074.94 samples/sec Loss 0.8293 LearningRate 0.0000 Epoch: 19 Global 
Step: 242950 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:44:41,353-Speed 2964.25 samples/sec Loss 0.8184 LearningRate 0.0000 Epoch: 19 Global Step: 242960 Fp16 Grad Scale: 65536 Required: 1 hours Training: 2022-04-28 00:44:44,679-Speed 3079.76 samples/sec Loss 0.9256 LearningRate 0.0000 Epoch: 19 Global Step: 242970 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:48,024-Speed 3062.55 samples/sec Loss 0.9010 LearningRate 0.0000 Epoch: 19 Global Step: 242980 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:51,341-Speed 3087.48 samples/sec Loss 0.8911 LearningRate 0.0000 Epoch: 19 Global Step: 242990 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:54,739-Speed 3014.42 samples/sec Loss 0.9175 LearningRate 0.0000 Epoch: 19 Global Step: 243000 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:44:58,107-Speed 3041.60 samples/sec Loss 0.8346 LearningRate 0.0000 Epoch: 19 Global Step: 243010 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:01,463-Speed 3052.02 samples/sec Loss 0.8629 LearningRate 0.0000 Epoch: 19 Global Step: 243020 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:04,779-Speed 3088.88 samples/sec Loss 0.8631 LearningRate 0.0000 Epoch: 19 Global Step: 243030 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:08,219-Speed 2977.22 samples/sec Loss 0.8536 LearningRate 0.0000 Epoch: 19 Global Step: 243040 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:11,604-Speed 3026.57 samples/sec Loss 0.8382 LearningRate 0.0000 Epoch: 19 Global Step: 243050 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:15,056-Speed 2967.92 samples/sec Loss 0.8780 LearningRate 0.0000 Epoch: 19 Global Step: 243060 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:18,455-Speed 3012.96 samples/sec Loss 0.8636 LearningRate 0.0000 Epoch: 19 Global Step: 243070 Fp16 Grad Scale: 65536 
Required: 1 hours Training: 2022-04-28 00:45:21,882-Speed 2988.87 samples/sec Loss 0.9214 LearningRate 0.0000 Epoch: 19 Global Step: 243080 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:25,351-Speed 2952.69 samples/sec Loss 0.8590 LearningRate 0.0000 Epoch: 19 Global Step: 243090 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:28,841-Speed 2935.09 samples/sec Loss 0.8335 LearningRate 0.0000 Epoch: 19 Global Step: 243100 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:32,190-Speed 3058.16 samples/sec Loss 0.8701 LearningRate 0.0000 Epoch: 19 Global Step: 243110 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:35,525-Speed 3071.49 samples/sec Loss 0.8776 LearningRate 0.0000 Epoch: 19 Global Step: 243120 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:38,921-Speed 3016.38 samples/sec Loss 0.8748 LearningRate 0.0000 Epoch: 19 Global Step: 243130 Fp16 Grad Scale: 32768 Required: 1 hours Training: 2022-04-28 00:45:42,283-Speed 3046.60 samples/sec Loss 0.8904 LearningRate 0.0000 Epoch: 19 Global Step: 243140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:45:45,651-Speed 3040.96 samples/sec Loss 0.8754 LearningRate 0.0000 Epoch: 19 Global Step: 243150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:45:49,088-Speed 2980.86 samples/sec Loss 0.8657 LearningRate 0.0000 Epoch: 19 Global Step: 243160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:45:52,471-Speed 3027.12 samples/sec Loss 0.8676 LearningRate 0.0000 Epoch: 19 Global Step: 243170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:45:55,780-Speed 3095.74 samples/sec Loss 0.8211 LearningRate 0.0000 Epoch: 19 Global Step: 243180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:45:59,117-Speed 3068.97 samples/sec Loss 0.8705 LearningRate 0.0000 Epoch: 19 Global Step: 243190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 
00:46:02,494-Speed 3033.58 samples/sec Loss 0.8653 LearningRate 0.0000 Epoch: 19 Global Step: 243200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:46:05,857-Speed 3045.73 samples/sec Loss 0.8725 LearningRate 0.0000 Epoch: 19 Global Step: 243210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:46:09,280-Speed 2992.32 samples/sec Loss 0.9112 LearningRate 0.0000 Epoch: 19 Global Step: 243220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:46:12,686-Speed 3007.33 samples/sec Loss 0.8847 LearningRate 0.0000 Epoch: 19 Global Step: 243230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:46:16,099-Speed 3000.88 samples/sec Loss 0.8405 LearningRate 0.0000 Epoch: 19 Global Step: 243240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:19,514-Speed 2999.64 samples/sec Loss 0.8912 LearningRate 0.0000 Epoch: 19 Global Step: 243250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:22,908-Speed 3017.41 samples/sec Loss 0.8759 LearningRate 0.0000 Epoch: 19 Global Step: 243260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:26,218-Speed 3094.82 samples/sec Loss 0.8360 LearningRate 0.0000 Epoch: 19 Global Step: 243270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:29,539-Speed 3084.75 samples/sec Loss 0.9195 LearningRate 0.0000 Epoch: 19 Global Step: 243280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:32,932-Speed 3018.42 samples/sec Loss 0.8795 LearningRate 0.0000 Epoch: 19 Global Step: 243290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:36,413-Speed 2942.65 samples/sec Loss 0.8578 LearningRate 0.0000 Epoch: 19 Global Step: 243300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:39,729-Speed 3088.39 samples/sec Loss 0.8998 LearningRate 0.0000 Epoch: 19 Global Step: 243310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:43,113-Speed 3026.71 samples/sec Loss 
0.9105 LearningRate 0.0000 Epoch: 19 Global Step: 243320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:46,508-Speed 3017.05 samples/sec Loss 0.8897 LearningRate 0.0000 Epoch: 19 Global Step: 243330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:49,867-Speed 3049.47 samples/sec Loss 0.8638 LearningRate 0.0000 Epoch: 19 Global Step: 243340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:53,221-Speed 3053.87 samples/sec Loss 0.8770 LearningRate 0.0000 Epoch: 19 Global Step: 243350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:46:56,706-Speed 2938.84 samples/sec Loss 0.8687 LearningRate 0.0000 Epoch: 19 Global Step: 243360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:00,074-Speed 3041.44 samples/sec Loss 0.8856 LearningRate 0.0000 Epoch: 19 Global Step: 243370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:03,527-Speed 2966.55 samples/sec Loss 0.8802 LearningRate 0.0000 Epoch: 19 Global Step: 243380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:06,907-Speed 3030.32 samples/sec Loss 0.8676 LearningRate 0.0000 Epoch: 19 Global Step: 243390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:10,240-Speed 3073.11 samples/sec Loss 0.8814 LearningRate 0.0000 Epoch: 19 Global Step: 243400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:13,582-Speed 3065.50 samples/sec Loss 0.8717 LearningRate 0.0000 Epoch: 19 Global Step: 243410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:17,078-Speed 2930.08 samples/sec Loss 0.8758 LearningRate 0.0000 Epoch: 19 Global Step: 243420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:20,499-Speed 2994.17 samples/sec Loss 0.8718 LearningRate 0.0000 Epoch: 19 Global Step: 243430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:47:23,961-Speed 2958.31 samples/sec Loss 0.8787 LearningRate 0.0000 Epoch: 19 Global 
Step: 243440 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:47:27,383-Speed 2993.29 samples/sec Loss 0.8678 LearningRate 0.0000 Epoch: 19 Global Step: 243450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:30,778-Speed 3017.39 samples/sec Loss 0.8703 LearningRate 0.0000 Epoch: 19 Global Step: 243460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:34,241-Speed 2957.73 samples/sec Loss 0.8762 LearningRate 0.0000 Epoch: 19 Global Step: 243470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:37,728-Speed 2937.97 samples/sec Loss 0.8919 LearningRate 0.0000 Epoch: 19 Global Step: 243480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:41,171-Speed 2975.27 samples/sec Loss 0.8566 LearningRate 0.0000 Epoch: 19 Global Step: 243490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:44,559-Speed 3023.11 samples/sec Loss 0.8544 LearningRate 0.0000 Epoch: 19 Global Step: 243500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:47,975-Speed 2998.68 samples/sec Loss 0.8354 LearningRate 0.0000 Epoch: 19 Global Step: 243510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:51,372-Speed 3015.72 samples/sec Loss 0.9141 LearningRate 0.0000 Epoch: 19 Global Step: 243520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:54,769-Speed 3014.86 samples/sec Loss 0.8732 LearningRate 0.0000 Epoch: 19 Global Step: 243530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:47:58,203-Speed 2982.64 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 243540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:48:01,614-Speed 3003.24 samples/sec Loss 0.8328 LearningRate 0.0000 Epoch: 19 Global Step: 243550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:05,012-Speed 3014.07 samples/sec Loss 0.8649 LearningRate 0.0000 Epoch: 19 Global Step: 243560 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-28 00:48:08,379-Speed 3042.05 samples/sec Loss 0.8863 LearningRate 0.0000 Epoch: 19 Global Step: 243570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:11,797-Speed 2996.96 samples/sec Loss 0.8704 LearningRate 0.0000 Epoch: 19 Global Step: 243580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:15,226-Speed 2987.20 samples/sec Loss 0.8376 LearningRate 0.0000 Epoch: 19 Global Step: 243590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:18,736-Speed 2917.67 samples/sec Loss 0.8832 LearningRate 0.0000 Epoch: 19 Global Step: 243600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:22,110-Speed 3035.83 samples/sec Loss 0.8502 LearningRate 0.0000 Epoch: 19 Global Step: 243610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:25,510-Speed 3012.97 samples/sec Loss 0.8491 LearningRate 0.0000 Epoch: 19 Global Step: 243620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:28,971-Speed 2958.97 samples/sec Loss 0.8764 LearningRate 0.0000 Epoch: 19 Global Step: 243630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:32,437-Speed 2955.48 samples/sec Loss 0.8704 LearningRate 0.0000 Epoch: 19 Global Step: 243640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:35,817-Speed 3030.21 samples/sec Loss 0.8583 LearningRate 0.0000 Epoch: 19 Global Step: 243650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:48:39,178-Speed 3047.40 samples/sec Loss 0.8562 LearningRate 0.0000 Epoch: 19 Global Step: 243660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:42,666-Speed 2936.94 samples/sec Loss 0.9058 LearningRate 0.0000 Epoch: 19 Global Step: 243670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:46,081-Speed 2998.90 samples/sec Loss 0.8792 LearningRate 0.0000 Epoch: 19 Global Step: 243680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
00:48:49,537-Speed 2963.70 samples/sec Loss 0.8443 LearningRate 0.0000 Epoch: 19 Global Step: 243690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:52,999-Speed 2958.42 samples/sec Loss 0.8952 LearningRate 0.0000 Epoch: 19 Global Step: 243700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:56,377-Speed 3032.24 samples/sec Loss 0.8916 LearningRate 0.0000 Epoch: 19 Global Step: 243710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:48:59,768-Speed 3020.75 samples/sec Loss 0.8903 LearningRate 0.0000 Epoch: 19 Global Step: 243720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:49:03,207-Speed 2978.49 samples/sec Loss 0.8894 LearningRate 0.0000 Epoch: 19 Global Step: 243730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:49:06,591-Speed 3026.62 samples/sec Loss 0.8906 LearningRate 0.0000 Epoch: 19 Global Step: 243740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:49:09,931-Speed 3066.91 samples/sec Loss 0.8970 LearningRate 0.0000 Epoch: 19 Global Step: 243750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:49:13,351-Speed 2995.26 samples/sec Loss 0.8854 LearningRate 0.0000 Epoch: 19 Global Step: 243760 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:49:16,699-Speed 3059.13 samples/sec Loss 0.8912 LearningRate 0.0000 Epoch: 19 Global Step: 243770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:49:20,059-Speed 3049.02 samples/sec Loss 0.8723 LearningRate 0.0000 Epoch: 19 Global Step: 243780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:23,505-Speed 2972.11 samples/sec Loss 0.8840 LearningRate 0.0000 Epoch: 19 Global Step: 243790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:26,919-Speed 3000.23 samples/sec Loss 0.8539 LearningRate 0.0000 Epoch: 19 Global Step: 243800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:30,352-Speed 2984.04 samples/sec Loss 
0.8411 LearningRate 0.0000 Epoch: 19 Global Step: 243810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:33,702-Speed 3057.09 samples/sec Loss 0.8419 LearningRate 0.0000 Epoch: 19 Global Step: 243820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:37,120-Speed 2997.31 samples/sec Loss 0.8341 LearningRate 0.0000 Epoch: 19 Global Step: 243830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:40,543-Speed 2991.86 samples/sec Loss 0.8313 LearningRate 0.0000 Epoch: 19 Global Step: 243840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:43,891-Speed 3059.52 samples/sec Loss 0.8603 LearningRate 0.0000 Epoch: 19 Global Step: 243850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:47,279-Speed 3023.25 samples/sec Loss 0.8766 LearningRate 0.0000 Epoch: 19 Global Step: 243860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:50,777-Speed 2928.71 samples/sec Loss 0.8503 LearningRate 0.0000 Epoch: 19 Global Step: 243870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:49:54,186-Speed 3004.31 samples/sec Loss 0.8692 LearningRate 0.0000 Epoch: 19 Global Step: 243880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:49:57,636-Speed 2969.08 samples/sec Loss 0.8819 LearningRate 0.0000 Epoch: 19 Global Step: 243890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:01,101-Speed 2956.64 samples/sec Loss 0.8678 LearningRate 0.0000 Epoch: 19 Global Step: 243900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:04,504-Speed 3009.05 samples/sec Loss 0.8524 LearningRate 0.0000 Epoch: 19 Global Step: 243910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:07,925-Speed 2994.88 samples/sec Loss 0.8826 LearningRate 0.0000 Epoch: 19 Global Step: 243920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:11,337-Speed 3001.89 samples/sec Loss 0.8386 LearningRate 0.0000 Epoch: 19 Global 
Step: 243930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:14,839-Speed 2924.74 samples/sec Loss 0.8576 LearningRate 0.0000 Epoch: 19 Global Step: 243940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:18,192-Speed 3054.30 samples/sec Loss 0.9177 LearningRate 0.0000 Epoch: 19 Global Step: 243950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:21,608-Speed 2998.91 samples/sec Loss 0.8546 LearningRate 0.0000 Epoch: 19 Global Step: 243960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:24,997-Speed 3022.74 samples/sec Loss 0.8513 LearningRate 0.0000 Epoch: 19 Global Step: 243970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:28,411-Speed 3000.29 samples/sec Loss 0.8961 LearningRate 0.0000 Epoch: 19 Global Step: 243980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:31,911-Speed 2926.65 samples/sec Loss 0.8561 LearningRate 0.0000 Epoch: 19 Global Step: 243990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:35,376-Speed 2956.09 samples/sec Loss 0.8588 LearningRate 0.0000 Epoch: 19 Global Step: 244000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:38,714-Speed 3068.17 samples/sec Loss 0.8484 LearningRate 0.0000 Epoch: 19 Global Step: 244010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:42,130-Speed 2998.33 samples/sec Loss 0.8466 LearningRate 0.0000 Epoch: 19 Global Step: 244020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:45,502-Speed 3037.53 samples/sec Loss 0.8952 LearningRate 0.0000 Epoch: 19 Global Step: 244030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:48,994-Speed 2933.30 samples/sec Loss 0.8733 LearningRate 0.0000 Epoch: 19 Global Step: 244040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:52,340-Speed 3061.69 samples/sec Loss 0.8732 LearningRate 0.0000 Epoch: 19 Global Step: 244050 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-28 00:50:55,765-Speed 2990.38 samples/sec Loss 0.8460 LearningRate 0.0000 Epoch: 19 Global Step: 244060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:50:59,256-Speed 2933.71 samples/sec Loss 0.8841 LearningRate 0.0000 Epoch: 19 Global Step: 244070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:02,665-Speed 3004.78 samples/sec Loss 0.8514 LearningRate 0.0000 Epoch: 19 Global Step: 244080 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:51:06,135-Speed 2951.83 samples/sec Loss 0.8822 LearningRate 0.0000 Epoch: 19 Global Step: 244090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:51:09,528-Speed 3018.85 samples/sec Loss 0.8534 LearningRate 0.0000 Epoch: 19 Global Step: 244100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:12,843-Speed 3089.76 samples/sec Loss 0.8779 LearningRate 0.0000 Epoch: 19 Global Step: 244110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:16,268-Speed 2990.76 samples/sec Loss 0.8333 LearningRate 0.0000 Epoch: 19 Global Step: 244120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:19,630-Speed 3046.59 samples/sec Loss 0.8610 LearningRate 0.0000 Epoch: 19 Global Step: 244130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:23,035-Speed 3008.75 samples/sec Loss 0.8777 LearningRate 0.0000 Epoch: 19 Global Step: 244140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:26,368-Speed 3073.18 samples/sec Loss 0.8218 LearningRate 0.0000 Epoch: 19 Global Step: 244150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:29,701-Speed 3073.39 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 19 Global Step: 244160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:33,120-Speed 2995.55 samples/sec Loss 0.8661 LearningRate 0.0000 Epoch: 19 Global Step: 244170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
00:51:36,466-Speed 3061.25 samples/sec Loss 0.8652 LearningRate 0.0000 Epoch: 19 Global Step: 244180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:39,825-Speed 3048.74 samples/sec Loss 0.8850 LearningRate 0.0000 Epoch: 19 Global Step: 244190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:43,212-Speed 3024.88 samples/sec Loss 0.8420 LearningRate 0.0000 Epoch: 19 Global Step: 244200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:46,640-Speed 2987.47 samples/sec Loss 0.8739 LearningRate 0.0000 Epoch: 19 Global Step: 244210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:50,123-Speed 2940.88 samples/sec Loss 0.8916 LearningRate 0.0000 Epoch: 19 Global Step: 244220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:53,568-Speed 2972.87 samples/sec Loss 0.8983 LearningRate 0.0000 Epoch: 19 Global Step: 244230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:51:57,008-Speed 2977.80 samples/sec Loss 0.8725 LearningRate 0.0000 Epoch: 19 Global Step: 244240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:00,399-Speed 3020.59 samples/sec Loss 0.8616 LearningRate 0.0000 Epoch: 19 Global Step: 244250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:03,858-Speed 2961.06 samples/sec Loss 0.8599 LearningRate 0.0000 Epoch: 19 Global Step: 244260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:07,369-Speed 2917.69 samples/sec Loss 0.8803 LearningRate 0.0000 Epoch: 19 Global Step: 244270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:10,818-Speed 2969.51 samples/sec Loss 0.8508 LearningRate 0.0000 Epoch: 19 Global Step: 244280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:14,157-Speed 3067.99 samples/sec Loss 0.8677 LearningRate 0.0000 Epoch: 19 Global Step: 244290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:17,520-Speed 3045.81 samples/sec Loss 
0.8750 LearningRate 0.0000 Epoch: 19 Global Step: 244300 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:52:20,940-Speed 2994.86 samples/sec Loss 0.8479 LearningRate 0.0000 Epoch: 19 Global Step: 244310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:52:24,317-Speed 3033.53 samples/sec Loss 0.8437 LearningRate 0.0000 Epoch: 19 Global Step: 244320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:52:27,877-Speed 2877.17 samples/sec Loss 0.8939 LearningRate 0.0000 Epoch: 19 Global Step: 244330 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:52:31,256-Speed 3031.49 samples/sec Loss 0.8192 LearningRate 0.0000 Epoch: 19 Global Step: 244340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:34,604-Speed 3059.12 samples/sec Loss 0.8430 LearningRate 0.0000 Epoch: 19 Global Step: 244350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:38,029-Speed 2990.36 samples/sec Loss 0.8940 LearningRate 0.0000 Epoch: 19 Global Step: 244360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:41,387-Speed 3050.06 samples/sec Loss 0.8635 LearningRate 0.0000 Epoch: 19 Global Step: 244370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:44,801-Speed 3000.75 samples/sec Loss 0.8944 LearningRate 0.0000 Epoch: 19 Global Step: 244380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:48,153-Speed 3054.99 samples/sec Loss 0.8810 LearningRate 0.0000 Epoch: 19 Global Step: 244390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:51,504-Speed 3057.59 samples/sec Loss 0.8781 LearningRate 0.0000 Epoch: 19 Global Step: 244400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:54,883-Speed 3031.19 samples/sec Loss 0.8535 LearningRate 0.0000 Epoch: 19 Global Step: 244410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:52:58,254-Speed 3038.11 samples/sec Loss 0.8809 LearningRate 0.0000 Epoch: 19 Global 
Step: 244420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:01,654-Speed 3012.77 samples/sec Loss 0.8524 LearningRate 0.0000 Epoch: 19 Global Step: 244430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:05,010-Speed 3051.53 samples/sec Loss 0.8422 LearningRate 0.0000 Epoch: 19 Global Step: 244440 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:08,380-Speed 3039.92 samples/sec Loss 0.8365 LearningRate 0.0000 Epoch: 19 Global Step: 244450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:11,846-Speed 2955.40 samples/sec Loss 0.8864 LearningRate 0.0000 Epoch: 19 Global Step: 244460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:15,297-Speed 2967.68 samples/sec Loss 0.8635 LearningRate 0.0000 Epoch: 19 Global Step: 244470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:18,688-Speed 3021.34 samples/sec Loss 0.9106 LearningRate 0.0000 Epoch: 19 Global Step: 244480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:22,084-Speed 3018.42 samples/sec Loss 0.8522 LearningRate 0.0000 Epoch: 19 Global Step: 244490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:25,447-Speed 3045.13 samples/sec Loss 0.8839 LearningRate 0.0000 Epoch: 19 Global Step: 244500 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:28,872-Speed 2991.13 samples/sec Loss 0.8602 LearningRate 0.0000 Epoch: 19 Global Step: 244510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:32,308-Speed 2980.35 samples/sec Loss 0.8534 LearningRate 0.0000 Epoch: 19 Global Step: 244520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:35,762-Speed 2965.80 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 19 Global Step: 244530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:53:39,182-Speed 2994.52 samples/sec Loss 0.8662 LearningRate 0.0000 Epoch: 19 Global Step: 244540 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-28 00:53:42,708-Speed 2904.99 samples/sec Loss 0.8944 LearningRate 0.0000 Epoch: 19 Global Step: 244550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:46,082-Speed 3035.80 samples/sec Loss 0.8757 LearningRate 0.0000 Epoch: 19 Global Step: 244560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:49,466-Speed 3026.74 samples/sec Loss 0.8997 LearningRate 0.0000 Epoch: 19 Global Step: 244570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:52,914-Speed 2971.14 samples/sec Loss 0.9236 LearningRate 0.0000 Epoch: 19 Global Step: 244580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:56,335-Speed 2994.06 samples/sec Loss 0.8484 LearningRate 0.0000 Epoch: 19 Global Step: 244590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:53:59,825-Speed 2934.44 samples/sec Loss 0.8306 LearningRate 0.0000 Epoch: 19 Global Step: 244600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:03,244-Speed 2996.37 samples/sec Loss 0.8632 LearningRate 0.0000 Epoch: 19 Global Step: 244610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:06,642-Speed 3013.92 samples/sec Loss 0.8774 LearningRate 0.0000 Epoch: 19 Global Step: 244620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:10,043-Speed 3012.41 samples/sec Loss 0.9036 LearningRate 0.0000 Epoch: 19 Global Step: 244630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:13,442-Speed 3013.58 samples/sec Loss 0.8796 LearningRate 0.0000 Epoch: 19 Global Step: 244640 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:54:16,781-Speed 3066.93 samples/sec Loss 0.8525 LearningRate 0.0000 Epoch: 19 Global Step: 244650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:20,119-Speed 3069.17 samples/sec Loss 0.8918 LearningRate 0.0000 Epoch: 19 Global Step: 244660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
00:54:23,485-Speed 3043.30 samples/sec Loss 0.8996 LearningRate 0.0000 Epoch: 19 Global Step: 244670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:26,859-Speed 3035.32 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 19 Global Step: 244680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:30,382-Speed 2907.29 samples/sec Loss 0.8293 LearningRate 0.0000 Epoch: 19 Global Step: 244690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:33,806-Speed 2991.11 samples/sec Loss 0.8873 LearningRate 0.0000 Epoch: 19 Global Step: 244700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:37,210-Speed 3009.35 samples/sec Loss 0.8564 LearningRate 0.0000 Epoch: 19 Global Step: 244710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:54:40,700-Speed 2934.63 samples/sec Loss 0.8816 LearningRate 0.0000 Epoch: 19 Global Step: 244720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:54:44,147-Speed 2971.98 samples/sec Loss 0.8982 LearningRate 0.0000 Epoch: 19 Global Step: 244730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:54:47,541-Speed 3017.53 samples/sec Loss 0.8615 LearningRate 0.0000 Epoch: 19 Global Step: 244740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:54:50,938-Speed 3015.39 samples/sec Loss 0.8670 LearningRate 0.0000 Epoch: 19 Global Step: 244750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:54:54,312-Speed 3035.72 samples/sec Loss 0.8330 LearningRate 0.0000 Epoch: 19 Global Step: 244760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:54:57,707-Speed 3016.50 samples/sec Loss 0.8381 LearningRate 0.0000 Epoch: 19 Global Step: 244770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:55:01,081-Speed 3036.40 samples/sec Loss 0.8665 LearningRate 0.0000 Epoch: 19 Global Step: 244780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:55:04,433-Speed 3055.21 samples/sec Loss 
0.8709 LearningRate 0.0000 Epoch: 19 Global Step: 244790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:55:07,786-Speed 3055.26 samples/sec Loss 0.8543 LearningRate 0.0000 Epoch: 19 Global Step: 244800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:55:11,160-Speed 3035.38 samples/sec Loss 0.8298 LearningRate 0.0000 Epoch: 19 Global Step: 244810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:55:14,538-Speed 3032.98 samples/sec Loss 0.8681 LearningRate 0.0000 Epoch: 19 Global Step: 244820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:17,927-Speed 3022.24 samples/sec Loss 0.8810 LearningRate 0.0000 Epoch: 19 Global Step: 244830 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:21,315-Speed 3023.39 samples/sec Loss 0.8585 LearningRate 0.0000 Epoch: 19 Global Step: 244840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:24,710-Speed 3017.21 samples/sec Loss 0.8960 LearningRate 0.0000 Epoch: 19 Global Step: 244850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:28,108-Speed 3013.89 samples/sec Loss 0.8556 LearningRate 0.0000 Epoch: 19 Global Step: 244860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:31,451-Speed 3064.06 samples/sec Loss 0.9173 LearningRate 0.0000 Epoch: 19 Global Step: 244870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:34,802-Speed 3056.58 samples/sec Loss 0.8926 LearningRate 0.0000 Epoch: 19 Global Step: 244880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:38,194-Speed 3019.93 samples/sec Loss 0.8699 LearningRate 0.0000 Epoch: 19 Global Step: 244890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:41,543-Speed 3059.29 samples/sec Loss 0.8257 LearningRate 0.0000 Epoch: 19 Global Step: 244900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:44,906-Speed 3045.72 samples/sec Loss 0.8558 LearningRate 0.0000 Epoch: 19 Global 
Step: 244910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:48,308-Speed 3010.03 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 244920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:51,730-Speed 2994.27 samples/sec Loss 0.8716 LearningRate 0.0000 Epoch: 19 Global Step: 244930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:55,112-Speed 3028.16 samples/sec Loss 0.8788 LearningRate 0.0000 Epoch: 19 Global Step: 244940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:55:58,486-Speed 3035.53 samples/sec Loss 0.8598 LearningRate 0.0000 Epoch: 19 Global Step: 244950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:01,838-Speed 3056.27 samples/sec Loss 0.8681 LearningRate 0.0000 Epoch: 19 Global Step: 244960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:05,194-Speed 3051.64 samples/sec Loss 0.8655 LearningRate 0.0000 Epoch: 19 Global Step: 244970 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:08,530-Speed 3070.29 samples/sec Loss 0.8841 LearningRate 0.0000 Epoch: 19 Global Step: 244980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:11,898-Speed 3041.75 samples/sec Loss 0.8827 LearningRate 0.0000 Epoch: 19 Global Step: 244990 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:15,287-Speed 3021.83 samples/sec Loss 0.8477 LearningRate 0.0000 Epoch: 19 Global Step: 245000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:18,689-Speed 3010.84 samples/sec Loss 0.8543 LearningRate 0.0000 Epoch: 19 Global Step: 245010 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:22,050-Speed 3047.88 samples/sec Loss 0.8477 LearningRate 0.0000 Epoch: 19 Global Step: 245020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:25,455-Speed 3007.61 samples/sec Loss 0.8903 LearningRate 0.0000 Epoch: 19 Global Step: 245030 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-28 00:56:28,822-Speed 3042.15 samples/sec Loss 0.8681 LearningRate 0.0000 Epoch: 19 Global Step: 245040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:56:32,181-Speed 3049.25 samples/sec Loss 0.8176 LearningRate 0.0000 Epoch: 19 Global Step: 245050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:35,557-Speed 3034.17 samples/sec Loss 0.8448 LearningRate 0.0000 Epoch: 19 Global Step: 245060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:38,929-Speed 3037.69 samples/sec Loss 0.8786 LearningRate 0.0000 Epoch: 19 Global Step: 245070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:42,277-Speed 3059.26 samples/sec Loss 0.8774 LearningRate 0.0000 Epoch: 19 Global Step: 245080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:45,689-Speed 3002.12 samples/sec Loss 0.9153 LearningRate 0.0000 Epoch: 19 Global Step: 245090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:49,030-Speed 3065.46 samples/sec Loss 0.8913 LearningRate 0.0000 Epoch: 19 Global Step: 245100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:52,392-Speed 3046.52 samples/sec Loss 0.8560 LearningRate 0.0000 Epoch: 19 Global Step: 245110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:55,802-Speed 3003.66 samples/sec Loss 0.8549 LearningRate 0.0000 Epoch: 19 Global Step: 245120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:56:59,145-Speed 3064.46 samples/sec Loss 0.8735 LearningRate 0.0000 Epoch: 19 Global Step: 245130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:57:02,545-Speed 3012.28 samples/sec Loss 0.8570 LearningRate 0.0000 Epoch: 19 Global Step: 245140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:57:05,872-Speed 3078.55 samples/sec Loss 0.8835 LearningRate 0.0000 Epoch: 19 Global Step: 245150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
00:57:09,344-Speed 2950.79 samples/sec Loss 0.8518 LearningRate 0.0000 Epoch: 19 Global Step: 245160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:57:12,799-Speed 2964.06 samples/sec Loss 0.8482 LearningRate 0.0000 Epoch: 19 Global Step: 245170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:57:16,184-Speed 3026.19 samples/sec Loss 0.8753 LearningRate 0.0000 Epoch: 19 Global Step: 245180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:19,546-Speed 3047.02 samples/sec Loss 0.9030 LearningRate 0.0000 Epoch: 19 Global Step: 245190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:22,973-Speed 2988.61 samples/sec Loss 0.8874 LearningRate 0.0000 Epoch: 19 Global Step: 245200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:26,345-Speed 3037.00 samples/sec Loss 0.8715 LearningRate 0.0000 Epoch: 19 Global Step: 245210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:29,820-Speed 2947.81 samples/sec Loss 0.8507 LearningRate 0.0000 Epoch: 19 Global Step: 245220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:33,209-Speed 3022.14 samples/sec Loss 0.8686 LearningRate 0.0000 Epoch: 19 Global Step: 245230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:36,584-Speed 3035.29 samples/sec Loss 0.8529 LearningRate 0.0000 Epoch: 19 Global Step: 245240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:39,954-Speed 3039.23 samples/sec Loss 0.8366 LearningRate 0.0000 Epoch: 19 Global Step: 245250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:43,354-Speed 3012.95 samples/sec Loss 0.8553 LearningRate 0.0000 Epoch: 19 Global Step: 245260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:46,659-Speed 3098.81 samples/sec Loss 0.8400 LearningRate 0.0000 Epoch: 19 Global Step: 245270 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:57:50,069-Speed 3003.97 samples/sec Loss 
0.8796 LearningRate 0.0000 Epoch: 19 Global Step: 245280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:57:53,403-Speed 3072.32 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 19 Global Step: 245290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:57:56,789-Speed 3024.72 samples/sec Loss 0.8624 LearningRate 0.0000 Epoch: 19 Global Step: 245300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:00,193-Speed 3008.22 samples/sec Loss 0.8560 LearningRate 0.0000 Epoch: 19 Global Step: 245310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:03,633-Speed 2977.84 samples/sec Loss 0.8815 LearningRate 0.0000 Epoch: 19 Global Step: 245320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:07,088-Speed 2964.82 samples/sec Loss 0.8089 LearningRate 0.0000 Epoch: 19 Global Step: 245330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:10,447-Speed 3049.44 samples/sec Loss 0.8214 LearningRate 0.0000 Epoch: 19 Global Step: 245340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:13,910-Speed 2958.22 samples/sec Loss 0.8590 LearningRate 0.0000 Epoch: 19 Global Step: 245350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:17,293-Speed 3027.89 samples/sec Loss 0.8583 LearningRate 0.0000 Epoch: 19 Global Step: 245360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:20,744-Speed 2967.53 samples/sec Loss 0.8810 LearningRate 0.0000 Epoch: 19 Global Step: 245370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:24,175-Speed 2985.70 samples/sec Loss 0.8499 LearningRate 0.0000 Epoch: 19 Global Step: 245380 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:58:27,529-Speed 3053.36 samples/sec Loss 0.8771 LearningRate 0.0000 Epoch: 19 Global Step: 245390 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:58:30,923-Speed 3017.68 samples/sec Loss 0.8951 LearningRate 0.0000 Epoch: 19 Global 
Step: 245400 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:58:34,340-Speed 2997.77 samples/sec Loss 0.8650 LearningRate 0.0000 Epoch: 19 Global Step: 245410 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 00:58:37,695-Speed 3053.03 samples/sec Loss 0.8908 LearningRate 0.0000 Epoch: 19 Global Step: 245420 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:41,079-Speed 3026.95 samples/sec Loss 0.8376 LearningRate 0.0000 Epoch: 19 Global Step: 245430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:44,483-Speed 3009.46 samples/sec Loss 0.8227 LearningRate 0.0000 Epoch: 19 Global Step: 245440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:47,818-Speed 3070.32 samples/sec Loss 0.8672 LearningRate 0.0000 Epoch: 19 Global Step: 245450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:51,160-Speed 3065.65 samples/sec Loss 0.8526 LearningRate 0.0000 Epoch: 19 Global Step: 245460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:54,545-Speed 3025.64 samples/sec Loss 0.8511 LearningRate 0.0000 Epoch: 19 Global Step: 245470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:58:57,933-Speed 3022.76 samples/sec Loss 0.8746 LearningRate 0.0000 Epoch: 19 Global Step: 245480 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:01,340-Speed 3006.66 samples/sec Loss 0.8236 LearningRate 0.0000 Epoch: 19 Global Step: 245490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:04,793-Speed 2966.16 samples/sec Loss 0.8510 LearningRate 0.0000 Epoch: 19 Global Step: 245500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:08,161-Speed 3041.56 samples/sec Loss 0.8833 LearningRate 0.0000 Epoch: 19 Global Step: 245510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:11,531-Speed 3039.47 samples/sec Loss 0.8081 LearningRate 0.0000 Epoch: 19 Global Step: 245520 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-28 00:59:14,902-Speed 3038.74 samples/sec Loss 0.8444 LearningRate 0.0000 Epoch: 19 Global Step: 245530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:18,245-Speed 3063.60 samples/sec Loss 0.8740 LearningRate 0.0000 Epoch: 19 Global Step: 245540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:21,602-Speed 3051.42 samples/sec Loss 0.8920 LearningRate 0.0000 Epoch: 19 Global Step: 245550 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:25,008-Speed 3007.79 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 19 Global Step: 245560 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:28,380-Speed 3037.52 samples/sec Loss 0.8714 LearningRate 0.0000 Epoch: 19 Global Step: 245570 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:31,807-Speed 2988.58 samples/sec Loss 0.8846 LearningRate 0.0000 Epoch: 19 Global Step: 245580 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:35,187-Speed 3030.49 samples/sec Loss 0.9176 LearningRate 0.0000 Epoch: 19 Global Step: 245590 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:38,569-Speed 3028.30 samples/sec Loss 0.8390 LearningRate 0.0000 Epoch: 19 Global Step: 245600 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:41,929-Speed 3048.91 samples/sec Loss 0.8795 LearningRate 0.0000 Epoch: 19 Global Step: 245610 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 00:59:45,394-Speed 2956.29 samples/sec Loss 0.8998 LearningRate 0.0000 Epoch: 19 Global Step: 245620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:48,795-Speed 3011.64 samples/sec Loss 0.8524 LearningRate 0.0000 Epoch: 19 Global Step: 245630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:52,227-Speed 2984.91 samples/sec Loss 0.8801 LearningRate 0.0000 Epoch: 19 Global Step: 245640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
00:59:55,560-Speed 3072.66 samples/sec Loss 0.9158 LearningRate 0.0000 Epoch: 19 Global Step: 245650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 00:59:58,926-Speed 3042.89 samples/sec Loss 0.8740 LearningRate 0.0000 Epoch: 19 Global Step: 245660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:02,330-Speed 3009.67 samples/sec Loss 0.8524 LearningRate 0.0000 Epoch: 19 Global Step: 245670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:05,727-Speed 3015.36 samples/sec Loss 0.8735 LearningRate 0.0000 Epoch: 19 Global Step: 245680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:09,083-Speed 3051.57 samples/sec Loss 0.8696 LearningRate 0.0000 Epoch: 19 Global Step: 245690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:12,415-Speed 3074.68 samples/sec Loss 0.8613 LearningRate 0.0000 Epoch: 19 Global Step: 245700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:15,786-Speed 3038.41 samples/sec Loss 0.8859 LearningRate 0.0000 Epoch: 19 Global Step: 245710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:19,215-Speed 2986.60 samples/sec Loss 0.8313 LearningRate 0.0000 Epoch: 19 Global Step: 245720 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:00:22,572-Speed 3051.24 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 19 Global Step: 245730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:25,940-Speed 3041.48 samples/sec Loss 0.8374 LearningRate 0.0000 Epoch: 19 Global Step: 245740 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:29,337-Speed 3015.66 samples/sec Loss 0.8574 LearningRate 0.0000 Epoch: 19 Global Step: 245750 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:32,711-Speed 3036.43 samples/sec Loss 0.8659 LearningRate 0.0000 Epoch: 19 Global Step: 245760 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:36,056-Speed 3061.96 samples/sec Loss 
0.9077 LearningRate 0.0000 Epoch: 19 Global Step: 245770 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:39,436-Speed 3029.91 samples/sec Loss 0.8687 LearningRate 0.0000 Epoch: 19 Global Step: 245780 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:42,907-Speed 2950.64 samples/sec Loss 0.8418 LearningRate 0.0000 Epoch: 19 Global Step: 245790 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:46,296-Speed 3022.48 samples/sec Loss 0.8692 LearningRate 0.0000 Epoch: 19 Global Step: 245800 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:49,750-Speed 2965.61 samples/sec Loss 0.8820 LearningRate 0.0000 Epoch: 19 Global Step: 245810 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:53,103-Speed 3054.69 samples/sec Loss 0.8831 LearningRate 0.0000 Epoch: 19 Global Step: 245820 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:00:56,443-Speed 3066.32 samples/sec Loss 0.8660 LearningRate 0.0000 Epoch: 19 Global Step: 245830 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:00:59,837-Speed 3019.01 samples/sec Loss 0.8958 LearningRate 0.0000 Epoch: 19 Global Step: 245840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:03,204-Speed 3042.18 samples/sec Loss 0.8605 LearningRate 0.0000 Epoch: 19 Global Step: 245850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:06,588-Speed 3026.82 samples/sec Loss 0.8722 LearningRate 0.0000 Epoch: 19 Global Step: 245860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:09,997-Speed 3004.84 samples/sec Loss 0.8690 LearningRate 0.0000 Epoch: 19 Global Step: 245870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:13,362-Speed 3044.00 samples/sec Loss 0.8713 LearningRate 0.0000 Epoch: 19 Global Step: 245880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:16,734-Speed 3037.26 samples/sec Loss 0.8778 LearningRate 0.0000 Epoch: 19 Global 
Step: 245890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:20,088-Speed 3054.29 samples/sec Loss 0.8624 LearningRate 0.0000 Epoch: 19 Global Step: 245900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:23,547-Speed 2961.30 samples/sec Loss 0.8362 LearningRate 0.0000 Epoch: 19 Global Step: 245910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:26,921-Speed 3035.46 samples/sec Loss 0.8762 LearningRate 0.0000 Epoch: 19 Global Step: 245920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:30,426-Speed 2922.54 samples/sec Loss 0.8669 LearningRate 0.0000 Epoch: 19 Global Step: 245930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:33,842-Speed 2998.38 samples/sec Loss 0.8715 LearningRate 0.0000 Epoch: 19 Global Step: 245940 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:01:37,235-Speed 3019.12 samples/sec Loss 0.8505 LearningRate 0.0000 Epoch: 19 Global Step: 245950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:40,596-Speed 3046.89 samples/sec Loss 0.8678 LearningRate 0.0000 Epoch: 19 Global Step: 245960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:43,940-Speed 3063.68 samples/sec Loss 0.8502 LearningRate 0.0000 Epoch: 19 Global Step: 245970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:47,299-Speed 3049.21 samples/sec Loss 0.8543 LearningRate 0.0000 Epoch: 19 Global Step: 245980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:50,691-Speed 3019.10 samples/sec Loss 0.8812 LearningRate 0.0000 Epoch: 19 Global Step: 245990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:01:54,052-Speed 3047.82 samples/sec Loss 0.8673 LearningRate 0.0000 Epoch: 19 Global Step: 246000 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:01:57,435-Speed 3027.30 samples/sec Loss 0.8718 LearningRate 0.0000 Epoch: 19 Global Step: 246010 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-28 01:02:00,758-Speed 3082.54 samples/sec Loss 0.8943 LearningRate 0.0000 Epoch: 19 Global Step: 246020 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:04,141-Speed 3028.22 samples/sec Loss 0.8547 LearningRate 0.0000 Epoch: 19 Global Step: 246030 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:07,483-Speed 3065.53 samples/sec Loss 0.8359 LearningRate 0.0000 Epoch: 19 Global Step: 246040 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:10,852-Speed 3039.78 samples/sec Loss 0.8827 LearningRate 0.0000 Epoch: 19 Global Step: 246050 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:14,188-Speed 3070.23 samples/sec Loss 0.8768 LearningRate 0.0000 Epoch: 19 Global Step: 246060 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:17,581-Speed 3019.09 samples/sec Loss 0.8906 LearningRate 0.0000 Epoch: 19 Global Step: 246070 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:20,915-Speed 3071.79 samples/sec Loss 0.9052 LearningRate 0.0000 Epoch: 19 Global Step: 246080 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:24,294-Speed 3031.96 samples/sec Loss 0.8744 LearningRate 0.0000 Epoch: 19 Global Step: 246090 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:27,695-Speed 3011.67 samples/sec Loss 0.8439 LearningRate 0.0000 Epoch: 19 Global Step: 246100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:02:31,067-Speed 3037.10 samples/sec Loss 0.8763 LearningRate 0.0000 Epoch: 19 Global Step: 246110 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:34,544-Speed 2946.29 samples/sec Loss 0.8885 LearningRate 0.0000 Epoch: 19 Global Step: 246120 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:37,974-Speed 2986.01 samples/sec Loss 0.8640 LearningRate 0.0000 Epoch: 19 Global Step: 246130 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 
01:02:41,366-Speed 3019.84 samples/sec Loss 0.8549 LearningRate 0.0000 Epoch: 19 Global Step: 246140 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:44,775-Speed 3004.80 samples/sec Loss 0.8572 LearningRate 0.0000 Epoch: 19 Global Step: 246150 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:48,180-Speed 3008.02 samples/sec Loss 0.8161 LearningRate 0.0000 Epoch: 19 Global Step: 246160 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:51,544-Speed 3045.11 samples/sec Loss 0.8876 LearningRate 0.0000 Epoch: 19 Global Step: 246170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:54,892-Speed 3058.93 samples/sec Loss 0.8395 LearningRate 0.0000 Epoch: 19 Global Step: 246180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:02:58,295-Speed 3010.05 samples/sec Loss 0.8467 LearningRate 0.0000 Epoch: 19 Global Step: 246190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:03:01,646-Speed 3057.03 samples/sec Loss 0.9019 LearningRate 0.0000 Epoch: 19 Global Step: 246200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:03:05,088-Speed 2975.46 samples/sec Loss 0.8221 LearningRate 0.0000 Epoch: 19 Global Step: 246210 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:08,498-Speed 3003.02 samples/sec Loss 0.8630 LearningRate 0.0000 Epoch: 19 Global Step: 246220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:11,865-Speed 3042.09 samples/sec Loss 0.8734 LearningRate 0.0000 Epoch: 19 Global Step: 246230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:15,202-Speed 3070.05 samples/sec Loss 0.8986 LearningRate 0.0000 Epoch: 19 Global Step: 246240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:18,549-Speed 3060.20 samples/sec Loss 0.8498 LearningRate 0.0000 Epoch: 19 Global Step: 246250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:21,955-Speed 3006.66 samples/sec Loss 
0.8741 LearningRate 0.0000 Epoch: 19 Global Step: 246260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:25,283-Speed 3078.74 samples/sec Loss 0.8840 LearningRate 0.0000 Epoch: 19 Global Step: 246270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:28,601-Speed 3086.59 samples/sec Loss 0.8575 LearningRate 0.0000 Epoch: 19 Global Step: 246280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:32,004-Speed 3009.72 samples/sec Loss 0.8153 LearningRate 0.0000 Epoch: 19 Global Step: 246290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:35,402-Speed 3014.37 samples/sec Loss 0.8691 LearningRate 0.0000 Epoch: 19 Global Step: 246300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:38,782-Speed 3030.32 samples/sec Loss 0.8682 LearningRate 0.0000 Epoch: 19 Global Step: 246310 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:03:42,124-Speed 3064.77 samples/sec Loss 0.8840 LearningRate 0.0000 Epoch: 19 Global Step: 246320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:45,476-Speed 3057.92 samples/sec Loss 0.8322 LearningRate 0.0000 Epoch: 19 Global Step: 246330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:48,814-Speed 3068.99 samples/sec Loss 0.8253 LearningRate 0.0000 Epoch: 19 Global Step: 246340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:52,205-Speed 3020.57 samples/sec Loss 0.8728 LearningRate 0.0000 Epoch: 19 Global Step: 246350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:55,623-Speed 2996.39 samples/sec Loss 0.8995 LearningRate 0.0000 Epoch: 19 Global Step: 246360 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:03:59,003-Speed 3030.07 samples/sec Loss 0.9132 LearningRate 0.0000 Epoch: 19 Global Step: 246370 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:02,385-Speed 3028.79 samples/sec Loss 0.8113 LearningRate 0.0000 Epoch: 19 Global 
Step: 246380 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:05,785-Speed 3012.57 samples/sec Loss 0.8796 LearningRate 0.0000 Epoch: 19 Global Step: 246390 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:09,174-Speed 3022.07 samples/sec Loss 0.8510 LearningRate 0.0000 Epoch: 19 Global Step: 246400 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:12,505-Speed 3075.74 samples/sec Loss 0.8714 LearningRate 0.0000 Epoch: 19 Global Step: 246410 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:15,888-Speed 3027.68 samples/sec Loss 0.8811 LearningRate 0.0000 Epoch: 19 Global Step: 246420 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:04:19,302-Speed 2999.98 samples/sec Loss 0.8980 LearningRate 0.0000 Epoch: 19 Global Step: 246430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:22,637-Speed 3071.47 samples/sec Loss 0.8583 LearningRate 0.0000 Epoch: 19 Global Step: 246440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:04:25,958-Speed 3084.14 samples/sec Loss 0.8801 LearningRate 0.0000 Epoch: 19 Global Step: 246450 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:29,395-Speed 2980.48 samples/sec Loss 0.8724 LearningRate 0.0000 Epoch: 19 Global Step: 246460 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:32,784-Speed 3022.36 samples/sec Loss 0.8925 LearningRate 0.0000 Epoch: 19 Global Step: 246470 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:36,139-Speed 3053.60 samples/sec Loss 0.8409 LearningRate 0.0000 Epoch: 19 Global Step: 246480 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:39,538-Speed 3012.95 samples/sec Loss 0.8692 LearningRate 0.0000 Epoch: 19 Global Step: 246490 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:42,963-Speed 2990.16 samples/sec Loss 0.8758 LearningRate 0.0000 Epoch: 19 Global Step: 246500 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-28 01:04:46,302-Speed 3068.50 samples/sec Loss 0.8719 LearningRate 0.0000 Epoch: 19 Global Step: 246510 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:49,778-Speed 2946.55 samples/sec Loss 0.8830 LearningRate 0.0000 Epoch: 19 Global Step: 246520 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:53,168-Speed 3021.93 samples/sec Loss 0.8390 LearningRate 0.0000 Epoch: 19 Global Step: 246530 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:56,506-Speed 3068.44 samples/sec Loss 0.8658 LearningRate 0.0000 Epoch: 19 Global Step: 246540 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:04:59,954-Speed 2971.31 samples/sec Loss 0.9215 LearningRate 0.0000 Epoch: 19 Global Step: 246550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:03,299-Speed 3062.35 samples/sec Loss 0.8761 LearningRate 0.0000 Epoch: 19 Global Step: 246560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:06,702-Speed 3009.55 samples/sec Loss 0.8263 LearningRate 0.0000 Epoch: 19 Global Step: 246570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:10,183-Speed 2942.19 samples/sec Loss 0.9012 LearningRate 0.0000 Epoch: 19 Global Step: 246580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:13,514-Speed 3075.21 samples/sec Loss 0.8507 LearningRate 0.0000 Epoch: 19 Global Step: 246590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:16,884-Speed 3039.10 samples/sec Loss 0.8513 LearningRate 0.0000 Epoch: 19 Global Step: 246600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:20,235-Speed 3057.21 samples/sec Loss 0.8710 LearningRate 0.0000 Epoch: 19 Global Step: 246610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:23,622-Speed 3023.60 samples/sec Loss 0.8897 LearningRate 0.0000 Epoch: 19 Global Step: 246620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
01:05:26,992-Speed 3039.33 samples/sec Loss 0.8774 LearningRate 0.0000 Epoch: 19 Global Step: 246630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:30,394-Speed 3011.69 samples/sec Loss 0.8786 LearningRate 0.0000 Epoch: 19 Global Step: 246640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:33,724-Speed 3075.62 samples/sec Loss 0.8839 LearningRate 0.0000 Epoch: 19 Global Step: 246650 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:05:37,116-Speed 3019.47 samples/sec Loss 0.8828 LearningRate 0.0000 Epoch: 19 Global Step: 246660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:40,488-Speed 3037.18 samples/sec Loss 0.8638 LearningRate 0.0000 Epoch: 19 Global Step: 246670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:43,842-Speed 3054.68 samples/sec Loss 0.8496 LearningRate 0.0000 Epoch: 19 Global Step: 246680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:05:47,175-Speed 3073.06 samples/sec Loss 0.8658 LearningRate 0.0000 Epoch: 19 Global Step: 246690 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:05:50,584-Speed 3004.22 samples/sec Loss 0.9206 LearningRate 0.0000 Epoch: 19 Global Step: 246700 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:05:53,967-Speed 3028.03 samples/sec Loss 0.8696 LearningRate 0.0000 Epoch: 19 Global Step: 246710 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:05:57,332-Speed 3043.66 samples/sec Loss 0.8424 LearningRate 0.0000 Epoch: 19 Global Step: 246720 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:00,777-Speed 2973.10 samples/sec Loss 0.8808 LearningRate 0.0000 Epoch: 19 Global Step: 246730 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:04,192-Speed 2999.33 samples/sec Loss 0.9038 LearningRate 0.0000 Epoch: 19 Global Step: 246740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:07,603-Speed 3002.25 samples/sec Loss 
0.8942 LearningRate 0.0000 Epoch: 19 Global Step: 246750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:10,955-Speed 3056.41 samples/sec Loss 0.8585 LearningRate 0.0000 Epoch: 19 Global Step: 246760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:14,381-Speed 2989.80 samples/sec Loss 0.8706 LearningRate 0.0000 Epoch: 19 Global Step: 246770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:17,719-Speed 3068.06 samples/sec Loss 0.8687 LearningRate 0.0000 Epoch: 19 Global Step: 246780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:21,083-Speed 3045.23 samples/sec Loss 0.8651 LearningRate 0.0000 Epoch: 19 Global Step: 246790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:24,410-Speed 3078.86 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 19 Global Step: 246800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:27,760-Speed 3057.35 samples/sec Loss 0.8938 LearningRate 0.0000 Epoch: 19 Global Step: 246810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:31,153-Speed 3018.11 samples/sec Loss 0.8505 LearningRate 0.0000 Epoch: 19 Global Step: 246820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:34,594-Speed 2977.30 samples/sec Loss 0.9154 LearningRate 0.0000 Epoch: 19 Global Step: 246830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:38,022-Speed 2987.77 samples/sec Loss 0.8427 LearningRate 0.0000 Epoch: 19 Global Step: 246840 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:41,419-Speed 3014.83 samples/sec Loss 0.8456 LearningRate 0.0000 Epoch: 19 Global Step: 246850 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:44,849-Speed 2986.11 samples/sec Loss 0.8641 LearningRate 0.0000 Epoch: 19 Global Step: 246860 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:48,221-Speed 3037.76 samples/sec Loss 0.8609 LearningRate 0.0000 Epoch: 19 Global 
Step: 246870 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:51,598-Speed 3033.11 samples/sec Loss 0.8483 LearningRate 0.0000 Epoch: 19 Global Step: 246880 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:06:55,032-Speed 2983.05 samples/sec Loss 0.8682 LearningRate 0.0000 Epoch: 19 Global Step: 246890 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:06:58,442-Speed 3003.78 samples/sec Loss 0.8590 LearningRate 0.0000 Epoch: 19 Global Step: 246900 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:01,827-Speed 3026.50 samples/sec Loss 0.8558 LearningRate 0.0000 Epoch: 19 Global Step: 246910 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:05,191-Speed 3044.97 samples/sec Loss 0.8398 LearningRate 0.0000 Epoch: 19 Global Step: 246920 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:08,584-Speed 3018.39 samples/sec Loss 0.8597 LearningRate 0.0000 Epoch: 19 Global Step: 246930 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:11,991-Speed 3006.35 samples/sec Loss 0.8627 LearningRate 0.0000 Epoch: 19 Global Step: 246940 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:15,321-Speed 3076.49 samples/sec Loss 0.8664 LearningRate 0.0000 Epoch: 19 Global Step: 246950 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:18,707-Speed 3024.48 samples/sec Loss 0.8459 LearningRate 0.0000 Epoch: 19 Global Step: 246960 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:22,088-Speed 3029.50 samples/sec Loss 0.8899 LearningRate 0.0000 Epoch: 19 Global Step: 246970 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:25,457-Speed 3040.25 samples/sec Loss 0.8688 LearningRate 0.0000 Epoch: 19 Global Step: 246980 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:28,903-Speed 2972.35 samples/sec Loss 0.8604 LearningRate 0.0000 Epoch: 19 Global Step: 246990 Fp16 Grad Scale: 65536 
Required: 0 hours Training: 2022-04-28 01:07:32,250-Speed 3063.15 samples/sec Loss 0.8567 LearningRate 0.0000 Epoch: 19 Global Step: 247000 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:07:35,675-Speed 2990.15 samples/sec Loss 0.8444 LearningRate 0.0000 Epoch: 19 Global Step: 247010 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:07:39,004-Speed 3076.60 samples/sec Loss 0.8803 LearningRate 0.0000 Epoch: 19 Global Step: 247020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:42,361-Speed 3051.33 samples/sec Loss 0.8366 LearningRate 0.0000 Epoch: 19 Global Step: 247030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:45,803-Speed 2975.99 samples/sec Loss 0.8814 LearningRate 0.0000 Epoch: 19 Global Step: 247040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:49,332-Speed 2902.47 samples/sec Loss 0.8836 LearningRate 0.0000 Epoch: 19 Global Step: 247050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:52,765-Speed 2983.39 samples/sec Loss 0.9037 LearningRate 0.0000 Epoch: 19 Global Step: 247060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:56,241-Speed 2947.10 samples/sec Loss 0.8864 LearningRate 0.0000 Epoch: 19 Global Step: 247070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:07:59,609-Speed 3041.80 samples/sec Loss 0.8722 LearningRate 0.0000 Epoch: 19 Global Step: 247080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:03,010-Speed 3011.75 samples/sec Loss 0.8552 LearningRate 0.0000 Epoch: 19 Global Step: 247090 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:06,418-Speed 3005.32 samples/sec Loss 0.8494 LearningRate 0.0000 Epoch: 19 Global Step: 247100 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:09,806-Speed 3023.27 samples/sec Loss 0.8416 LearningRate 0.0000 Epoch: 19 Global Step: 247110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
01:08:13,169-Speed 3045.13 samples/sec Loss 0.8578 LearningRate 0.0000 Epoch: 19 Global Step: 247120 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:08:16,531-Speed 3047.13 samples/sec Loss 0.8813 LearningRate 0.0000 Epoch: 19 Global Step: 247130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:19,957-Speed 2989.27 samples/sec Loss 0.9177 LearningRate 0.0000 Epoch: 19 Global Step: 247140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:23,353-Speed 3016.99 samples/sec Loss 0.8148 LearningRate 0.0000 Epoch: 19 Global Step: 247150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:26,746-Speed 3018.15 samples/sec Loss 0.8421 LearningRate 0.0000 Epoch: 19 Global Step: 247160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:08:30,129-Speed 3028.37 samples/sec Loss 0.8795 LearningRate 0.0000 Epoch: 19 Global Step: 247170 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:33,589-Speed 2960.12 samples/sec Loss 0.8816 LearningRate 0.0000 Epoch: 19 Global Step: 247180 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:36,986-Speed 3015.13 samples/sec Loss 0.8766 LearningRate 0.0000 Epoch: 19 Global Step: 247190 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:40,340-Speed 3054.44 samples/sec Loss 0.8369 LearningRate 0.0000 Epoch: 19 Global Step: 247200 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:43,684-Speed 3062.29 samples/sec Loss 0.8463 LearningRate 0.0000 Epoch: 19 Global Step: 247210 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:47,120-Speed 2981.02 samples/sec Loss 0.8608 LearningRate 0.0000 Epoch: 19 Global Step: 247220 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:50,530-Speed 3003.82 samples/sec Loss 0.8496 LearningRate 0.0000 Epoch: 19 Global Step: 247230 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:53,915-Speed 3026.12 samples/sec Loss 
0.8489 LearningRate 0.0000 Epoch: 19 Global Step: 247240 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:08:57,258-Speed 3063.65 samples/sec Loss 0.8857 LearningRate 0.0000 Epoch: 19 Global Step: 247250 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:00,661-Speed 3010.14 samples/sec Loss 0.8519 LearningRate 0.0000 Epoch: 19 Global Step: 247260 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:04,019-Speed 3050.45 samples/sec Loss 0.8728 LearningRate 0.0000 Epoch: 19 Global Step: 247270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:09:07,382-Speed 3046.11 samples/sec Loss 0.8725 LearningRate 0.0000 Epoch: 19 Global Step: 247280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:09:10,747-Speed 3043.71 samples/sec Loss 0.8552 LearningRate 0.0000 Epoch: 19 Global Step: 247290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:09:14,114-Speed 3041.64 samples/sec Loss 0.8906 LearningRate 0.0000 Epoch: 19 Global Step: 247300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:09:17,477-Speed 3045.77 samples/sec Loss 0.8261 LearningRate 0.0000 Epoch: 19 Global Step: 247310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:09:20,833-Speed 3052.18 samples/sec Loss 0.8727 LearningRate 0.0000 Epoch: 19 Global Step: 247320 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:09:24,169-Speed 3071.25 samples/sec Loss 0.8670 LearningRate 0.0000 Epoch: 19 Global Step: 247330 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:27,587-Speed 2996.05 samples/sec Loss 0.8436 LearningRate 0.0000 Epoch: 19 Global Step: 247340 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:31,015-Speed 2988.21 samples/sec Loss 0.8710 LearningRate 0.0000 Epoch: 19 Global Step: 247350 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:34,342-Speed 3078.86 samples/sec Loss 0.8516 LearningRate 0.0000 Epoch: 19 Global 
Step: 247360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:37,713-Speed 3038.62 samples/sec Loss 0.8882 LearningRate 0.0000 Epoch: 19 Global Step: 247370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:41,225-Speed 2917.16 samples/sec Loss 0.8683 LearningRate 0.0000 Epoch: 19 Global Step: 247380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:44,701-Speed 2945.77 samples/sec Loss 0.8281 LearningRate 0.0000 Epoch: 19 Global Step: 247390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:48,098-Speed 3015.90 samples/sec Loss 0.8473 LearningRate 0.0000 Epoch: 19 Global Step: 247400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:51,538-Speed 2978.03 samples/sec Loss 0.8788 LearningRate 0.0000 Epoch: 19 Global Step: 247410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:55,020-Speed 2941.44 samples/sec Loss 0.9154 LearningRate 0.0000 Epoch: 19 Global Step: 247420 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:09:58,344-Speed 3080.99 samples/sec Loss 0.8744 LearningRate 0.0000 Epoch: 19 Global Step: 247430 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:01,751-Speed 3006.71 samples/sec Loss 0.8675 LearningRate 0.0000 Epoch: 19 Global Step: 247440 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:05,234-Speed 2940.71 samples/sec Loss 0.8186 LearningRate 0.0000 Epoch: 19 Global Step: 247450 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:08,669-Speed 2982.02 samples/sec Loss 0.8933 LearningRate 0.0000 Epoch: 19 Global Step: 247460 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:12,041-Speed 3037.85 samples/sec Loss 0.8838 LearningRate 0.0000 Epoch: 19 Global Step: 247470 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:15,447-Speed 3006.69 samples/sec Loss 0.8221 LearningRate 0.0000 Epoch: 19 Global Step: 247480 Fp16 Grad Scale: 32768 
Required: 0 hours Training: 2022-04-28 01:10:18,806-Speed 3049.19 samples/sec Loss 0.8672 LearningRate 0.0000 Epoch: 19 Global Step: 247490 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:22,194-Speed 3024.13 samples/sec Loss 0.9465 LearningRate 0.0000 Epoch: 19 Global Step: 247500 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:25,557-Speed 3045.47 samples/sec Loss 0.8676 LearningRate 0.0000 Epoch: 19 Global Step: 247510 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:28,923-Speed 3043.41 samples/sec Loss 0.8531 LearningRate 0.0000 Epoch: 19 Global Step: 247520 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:32,266-Speed 3063.94 samples/sec Loss 0.8433 LearningRate 0.0000 Epoch: 19 Global Step: 247530 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:10:35,550-Speed 3118.69 samples/sec Loss 0.8528 LearningRate 0.0000 Epoch: 19 Global Step: 247540 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:38,899-Speed 3058.64 samples/sec Loss 0.8397 LearningRate 0.0000 Epoch: 19 Global Step: 247550 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:42,279-Speed 3030.10 samples/sec Loss 0.8778 LearningRate 0.0000 Epoch: 19 Global Step: 247560 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:45,700-Speed 2994.29 samples/sec Loss 0.8305 LearningRate 0.0000 Epoch: 19 Global Step: 247570 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:49,110-Speed 3004.03 samples/sec Loss 0.8314 LearningRate 0.0000 Epoch: 19 Global Step: 247580 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:52,463-Speed 3054.54 samples/sec Loss 0.8608 LearningRate 0.0000 Epoch: 19 Global Step: 247590 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:10:55,867-Speed 3008.99 samples/sec Loss 0.8571 LearningRate 0.0000 Epoch: 19 Global Step: 247600 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 
01:10:59,317-Speed 2968.87 samples/sec Loss 0.8671 LearningRate 0.0000 Epoch: 19 Global Step: 247610 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:02,703-Speed 3025.91 samples/sec Loss 0.8457 LearningRate 0.0000 Epoch: 19 Global Step: 247620 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:06,079-Speed 3034.03 samples/sec Loss 0.8598 LearningRate 0.0000 Epoch: 19 Global Step: 247630 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:09,463-Speed 3026.62 samples/sec Loss 0.8928 LearningRate 0.0000 Epoch: 19 Global Step: 247640 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:12,848-Speed 3026.32 samples/sec Loss 0.8555 LearningRate 0.0000 Epoch: 19 Global Step: 247650 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:16,219-Speed 3037.51 samples/sec Loss 0.8620 LearningRate 0.0000 Epoch: 19 Global Step: 247660 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:19,654-Speed 2982.13 samples/sec Loss 0.8411 LearningRate 0.0000 Epoch: 19 Global Step: 247670 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:22,999-Speed 3062.71 samples/sec Loss 0.8590 LearningRate 0.0000 Epoch: 19 Global Step: 247680 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:26,430-Speed 2984.76 samples/sec Loss 0.7916 LearningRate 0.0000 Epoch: 19 Global Step: 247690 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:29,776-Speed 3062.10 samples/sec Loss 0.8741 LearningRate 0.0000 Epoch: 19 Global Step: 247700 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:33,155-Speed 3030.77 samples/sec Loss 0.8689 LearningRate 0.0000 Epoch: 19 Global Step: 247710 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:36,525-Speed 3039.62 samples/sec Loss 0.8880 LearningRate 0.0000 Epoch: 19 Global Step: 247720 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:39,841-Speed 3089.24 samples/sec Loss 
0.8716 LearningRate 0.0000 Epoch: 19 Global Step: 247730 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:11:43,229-Speed 3022.69 samples/sec Loss 0.8795 LearningRate 0.0000 Epoch: 19 Global Step: 247740 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:11:46,564-Speed 3071.88 samples/sec Loss 0.8736 LearningRate 0.0000 Epoch: 19 Global Step: 247750 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:11:49,981-Speed 2997.23 samples/sec Loss 0.8129 LearningRate 0.0000 Epoch: 19 Global Step: 247760 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:11:53,336-Speed 3052.96 samples/sec Loss 0.8648 LearningRate 0.0000 Epoch: 19 Global Step: 247770 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:11:56,684-Speed 3059.44 samples/sec Loss 0.8706 LearningRate 0.0000 Epoch: 19 Global Step: 247780 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:00,029-Speed 3061.73 samples/sec Loss 0.8913 LearningRate 0.0000 Epoch: 19 Global Step: 247790 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:03,412-Speed 3027.95 samples/sec Loss 0.8784 LearningRate 0.0000 Epoch: 19 Global Step: 247800 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:06,750-Speed 3069.05 samples/sec Loss 0.8767 LearningRate 0.0000 Epoch: 19 Global Step: 247810 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:10,068-Speed 3086.66 samples/sec Loss 0.8618 LearningRate 0.0000 Epoch: 19 Global Step: 247820 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:13,456-Speed 3023.63 samples/sec Loss 0.8897 LearningRate 0.0000 Epoch: 19 Global Step: 247830 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:16,896-Speed 2977.61 samples/sec Loss 0.8864 LearningRate 0.0000 Epoch: 19 Global Step: 247840 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:12:20,238-Speed 3064.98 samples/sec Loss 0.8721 LearningRate 0.0000 Epoch: 19 Global 
Step: 247850 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:12:23,618-Speed 3030.42 samples/sec Loss 0.8893 LearningRate 0.0000 Epoch: 19 Global Step: 247860 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:12:26,980-Speed 3046.34 samples/sec Loss 0.8574 LearningRate 0.0000 Epoch: 19 Global Step: 247870 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:12:30,359-Speed 3031.44 samples/sec Loss 0.9115 LearningRate 0.0000 Epoch: 19 Global Step: 247880 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:12:33,681-Speed 3082.93 samples/sec Loss 0.8982 LearningRate 0.0000 Epoch: 19 Global Step: 247890 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:37,024-Speed 3064.32 samples/sec Loss 0.9015 LearningRate 0.0000 Epoch: 19 Global Step: 247900 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:40,440-Speed 2998.12 samples/sec Loss 0.8183 LearningRate 0.0000 Epoch: 19 Global Step: 247910 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:43,801-Speed 3048.28 samples/sec Loss 0.8955 LearningRate 0.0000 Epoch: 19 Global Step: 247920 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:47,256-Speed 2964.38 samples/sec Loss 0.8328 LearningRate 0.0000 Epoch: 19 Global Step: 247930 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:50,621-Speed 3043.98 samples/sec Loss 0.8281 LearningRate 0.0000 Epoch: 19 Global Step: 247940 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:54,039-Speed 2997.42 samples/sec Loss 0.8796 LearningRate 0.0000 Epoch: 19 Global Step: 247950 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:12:57,402-Speed 3044.98 samples/sec Loss 0.8785 LearningRate 0.0000 Epoch: 19 Global Step: 247960 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:13:00,865-Speed 2958.15 samples/sec Loss 0.8464 LearningRate 0.0000 Epoch: 19 Global Step: 247970 Fp16 Grad Scale: 16384 
Required: 0 hours Training: 2022-04-28 01:13:04,234-Speed 3040.33 samples/sec Loss 0.8637 LearningRate 0.0000 Epoch: 19 Global Step: 247980 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:13:07,592-Speed 3050.82 samples/sec Loss 0.8939 LearningRate 0.0000 Epoch: 19 Global Step: 247990 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:11,035-Speed 2974.78 samples/sec Loss 0.8595 LearningRate 0.0000 Epoch: 19 Global Step: 248000 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:14,371-Speed 3070.45 samples/sec Loss 0.8731 LearningRate 0.0000 Epoch: 19 Global Step: 248010 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:17,762-Speed 3020.66 samples/sec Loss 0.8925 LearningRate 0.0000 Epoch: 19 Global Step: 248020 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:21,152-Speed 3021.35 samples/sec Loss 0.8759 LearningRate 0.0000 Epoch: 19 Global Step: 248030 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:24,581-Speed 2987.00 samples/sec Loss 0.8543 LearningRate 0.0000 Epoch: 19 Global Step: 248040 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:28,006-Speed 2990.18 samples/sec Loss 0.9093 LearningRate 0.0000 Epoch: 19 Global Step: 248050 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:31,332-Speed 3079.96 samples/sec Loss 0.8513 LearningRate 0.0000 Epoch: 19 Global Step: 248060 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:34,722-Speed 3022.01 samples/sec Loss 0.8442 LearningRate 0.0000 Epoch: 19 Global Step: 248070 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:38,077-Speed 3052.13 samples/sec Loss 0.8545 LearningRate 0.0000 Epoch: 19 Global Step: 248080 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:41,397-Speed 3086.11 samples/sec Loss 0.8527 LearningRate 0.0000 Epoch: 19 Global Step: 248090 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 
01:13:44,749-Speed 3055.70 samples/sec Loss 0.8687 LearningRate 0.0000 Epoch: 19 Global Step: 248100 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:13:48,113-Speed 3044.89 samples/sec Loss 0.9154 LearningRate 0.0000 Epoch: 19 Global Step: 248110 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:51,470-Speed 3051.48 samples/sec Loss 0.8269 LearningRate 0.0000 Epoch: 19 Global Step: 248120 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:54,898-Speed 2988.07 samples/sec Loss 0.8703 LearningRate 0.0000 Epoch: 19 Global Step: 248130 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:13:58,233-Speed 3071.54 samples/sec Loss 0.8502 LearningRate 0.0000 Epoch: 19 Global Step: 248140 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:01,629-Speed 3015.25 samples/sec Loss 0.8659 LearningRate 0.0000 Epoch: 19 Global Step: 248150 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:04,966-Speed 3070.40 samples/sec Loss 0.8875 LearningRate 0.0000 Epoch: 19 Global Step: 248160 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:08,307-Speed 3065.47 samples/sec Loss 0.8677 LearningRate 0.0000 Epoch: 19 Global Step: 248170 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:11,668-Speed 3047.91 samples/sec Loss 0.8463 LearningRate 0.0000 Epoch: 19 Global Step: 248180 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:15,029-Speed 3047.62 samples/sec Loss 0.8845 LearningRate 0.0000 Epoch: 19 Global Step: 248190 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:18,430-Speed 3012.02 samples/sec Loss 0.8374 LearningRate 0.0000 Epoch: 19 Global Step: 248200 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:21,790-Speed 3048.41 samples/sec Loss 0.9028 LearningRate 0.0000 Epoch: 19 Global Step: 248210 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:14:25,141-Speed 3057.14 samples/sec Loss 
0.8676 LearningRate 0.0000 Epoch: 19 Global Step: 248220 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:28,506-Speed 3043.33 samples/sec Loss 0.8482 LearningRate 0.0000 Epoch: 19 Global Step: 248230 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:31,876-Speed 3039.49 samples/sec Loss 0.8928 LearningRate 0.0000 Epoch: 19 Global Step: 248240 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:35,205-Speed 3077.32 samples/sec Loss 0.8320 LearningRate 0.0000 Epoch: 19 Global Step: 248250 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:38,553-Speed 3059.87 samples/sec Loss 0.8276 LearningRate 0.0000 Epoch: 19 Global Step: 248260 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:41,954-Speed 3010.93 samples/sec Loss 0.8727 LearningRate 0.0000 Epoch: 19 Global Step: 248270 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:45,337-Speed 3028.29 samples/sec Loss 0.8716 LearningRate 0.0000 Epoch: 19 Global Step: 248280 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:48,747-Speed 3003.78 samples/sec Loss 0.8395 LearningRate 0.0000 Epoch: 19 Global Step: 248290 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:52,190-Speed 2975.65 samples/sec Loss 0.8636 LearningRate 0.0000 Epoch: 19 Global Step: 248300 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:55,596-Speed 3007.10 samples/sec Loss 0.8418 LearningRate 0.0000 Epoch: 19 Global Step: 248310 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:14:58,954-Speed 3050.57 samples/sec Loss 0.9275 LearningRate 0.0000 Epoch: 19 Global Step: 248320 Fp16 Grad Scale: 65536 Required: 0 hours Training: 2022-04-28 01:15:02,276-Speed 3082.97 samples/sec Loss 0.9029 LearningRate 0.0000 Epoch: 19 Global Step: 248330 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:15:05,633-Speed 3051.48 samples/sec Loss 0.8403 LearningRate 0.0000 Epoch: 19 Global 
Step: 248340 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:15:09,009-Speed 3034.04 samples/sec Loss 0.8584 LearningRate 0.0000 Epoch: 19 Global Step: 248350 Fp16 Grad Scale: 32768 Required: 0 hours Training: 2022-04-28 01:15:12,318-Speed 3095.55 samples/sec Loss 0.8625 LearningRate 0.0000 Epoch: 19 Global Step: 248360 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:15:15,735-Speed 2997.70 samples/sec Loss 0.8571 LearningRate 0.0000 Epoch: 19 Global Step: 248370 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:15:19,101-Speed 3042.46 samples/sec Loss 0.8881 LearningRate 0.0000 Epoch: 19 Global Step: 248380 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:15:22,519-Speed 2996.68 samples/sec Loss 0.8693 LearningRate 0.0000 Epoch: 19 Global Step: 248390 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:15:25,989-Speed 2952.44 samples/sec Loss 0.8958 LearningRate 0.0000 Epoch: 19 Global Step: 248400 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:15:29,752-Speed 2722.17 samples/sec Loss 0.8271 LearningRate 0.0000 Epoch: 19 Global Step: 248410 Fp16 Grad Scale: 16384 Required: 0 hours Training: 2022-04-28 01:15:33,093-Speed 3065.85 samples/sec Loss 0.8611 LearningRate 0.0000 Epoch: 19 Global Step: 248420 Fp16 Grad Scale: 16384 Required: -0 hours